Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompose TagFirstLast implementation for speed #643

Merged
merged 3 commits into from
Jan 18, 2023

Conversation

Orace
Copy link
Contributor

@Orace Orace commented Nov 5, 2019

This implementation avoid the creation of a KeyValuePair for each item of the sequence.

It improves evaluation of a 0 returning resultSelector by a factor of 3

2019-11-05_14h13_39

@Orace Orace closed this Nov 5, 2019
@Orace Orace deleted the TagFirstLast branch November 5, 2019 13:38
@Orace Orace restored the TagFirstLast branch November 5, 2019 13:41
@Orace Orace reopened this Nov 5, 2019
@Orace Orace force-pushed the TagFirstLast branch 2 times, most recently from 8d50b73 to e61eaf0 Compare January 16, 2023 11:11
@Orace
Copy link
Contributor Author

Orace commented Jan 16, 2023

Depends on #928 that introduce a test for TagFirstLast.

@codecov
Copy link

codecov bot commented Jan 16, 2023

Codecov Report

Merging #643 (8760100) into master (60ec000) will decrease coverage by 0.01%.
The diff coverage is 93.33%.

@@            Coverage Diff             @@
##           master     #643      +/-   ##
==========================================
- Coverage   92.41%   92.41%   -0.01%     
==========================================
  Files         112      112              
  Lines        3426     3439      +13     
  Branches     1017     1021       +4     
==========================================
+ Hits         3166     3178      +12     
  Misses        199      199              
- Partials       61       62       +1     
Impacted Files Coverage Δ
MoreLinq/TagFirstLast.cs 94.73% <93.33%> (-5.27%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@Orace Orace force-pushed the TagFirstLast branch 2 times, most recently from 600b63b to 02932e8 Compare January 16, 2023 11:37
Copy link
Contributor

@viceroypenguin viceroypenguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Subject to correction of test in #928, approved

@Orace Orace force-pushed the TagFirstLast branch 2 times, most recently from 45aef84 to 0717f2e Compare January 16, 2023 12:36
@atifaziz atifaziz changed the title Improve TagFirstLast Decompose TagFirstLast implementation for speed Jan 16, 2023
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors could have influenced the run.

Also, please bear in mind that TagFirstLast that reuses CountDown also benefits from its optimisations for lists and collections so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

@viceroypenguin
Copy link
Contributor

Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors could have influenced the run.

Also, please bear in mind that TagFirstLast that reuses CountDown also benefits from its optimisations for lists and collections so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

I'd argue (without proof, but reasonable belief) that collection evaluation would not improve performance significantly, if at all, for this operator. Knowing the size of the collection improves performance in one of two cases: when it allows us to avoid iterating the collection at all, or when it allows us to avoid buffering (CountDown being an example).

In this case, we need to iterate the full list anyway (since, we are returning an enumeration that contains every value from the original), and we are only buffering a single value, and that only for a single loop each.

As such, I would find it surprising (but not impossible) to find a performance improvement using CountDown on a collection.

@Orace
Copy link
Contributor Author

Orace commented Jan 17, 2023

Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors could have influenced the run.

I will try to find time for this, sorry for the screenshot, I can't believe I was able to do such a thing 3 years ago 🙄

Also, please bear in mind that TagFirstLast that reuses CountDown also benefits from its optimisations for lists and collections so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

Actually TagFirstLast reuses CountDown after Index:

public static IEnumerable<TResult> TagFirstLast<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool, bool, TResult> resultSelector)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
return source.Index() // count-up
.CountDown(1, (e, cd) => resultSelector(e.Value, e.Key == 0, cd == 0));
}

Since Index use Select and Select returns neither a list nor a collection I don't think there is any optimizations here.

But we can add some over the implementation proposed in this PR.

@atifaziz
Copy link
Member

I will try to find time for this

Cool.

Actually TagFirstLast reuses CountDown after Index:
Since Index use Select and Select returns neither a list nor a collection I don't think there is any optimizations here.

That's a good point.

But we can add some over the implementation proposed in this PR.

Let's benchmark and we can always add those later since the optimisations (if any) weren't getting leveraged due to chaining with Index.

@Orace
Copy link
Contributor Author

Orace commented Jan 17, 2023

I quickly set up some benchmarks (available here).
It tests the old and new implementation against 1 million items in a List or an Enumerable.Range (RangeIterator):

BenchmarkDotNet=v0.13.4, OS=Windows 10 (10.0.19045.2486)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2

Method Source Mean Error StdDev Allocated
TagFirstLast List[Int32] [47] 34.11 ms 0.505 ms 0.472 ms 512 B
TagFirstLastNew List[Int32] [47] 14.12 ms 0.045 ms 0.035 ms 137 B
TagFirstLast RangeIterator [36] 31.26 ms 0.232 ms 0.217 ms 491 B
TagFirstLastNew RangeIterator [36] 11.73 ms 0.102 ms 0.085 ms 137 B

Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sharing the benchmark numbers and code.

Since this operator is now implemented entirely on its own, it's worth updating the tests to use TestingSequence (to ensure disposal and single-pass iteration of the source sequence) like so:

diff --git a/MoreLinq.Test/TagFirstLastTest.cs b/MoreLinq.Test/TagFirstLastTest.cs
index b4bf9cb..acab948 100644
--- a/MoreLinq.Test/TagFirstLastTest.cs
+++ b/MoreLinq.Test/TagFirstLastTest.cs
@@ -26,9 +26,10 @@ public class TagFirstLastTest
         public void TagFirstLastDoesOneLookAhead()
         {
             var source = MoreEnumerable.From(() => 123, () => 456, BreakingFunc.Of<int>());
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .Take(1)
-                  .Consume();
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.Take(1).Consume();
+
         }
 
         [Test]
@@ -41,24 +42,27 @@ public void TagFirstLastIsLazy()
         public void TagFirstLastWithSourceSequenceOfZero()
         {
             var source = Enumerable.Empty<int>();
-            var sut = source.TagFirstLast(BreakingFunc.Of<int, bool, bool, int>());
-            Assert.That(sut, Is.Empty);
+            using var result = source.TagFirstLast(BreakingFunc.Of<int, bool, bool, int>())
+                                     .AsTestingSequence();
+            Assert.That(result, Is.Empty);
         }
 
         [Test]
         public void TagFirstLastWithSourceSequenceOfOne()
         {
             var source = new[] { 123 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true, IsLast = true });
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true, IsLast = true });
         }
 
         [Test]
         public void TagFirstLastWithSourceSequenceOfTwo()
         {
             var source = new[] { 123, 456 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
                                        new { Item = 456, IsFirst = false, IsLast = true });
         }
 
@@ -66,8 +70,9 @@ public void TagFirstLastWithSourceSequenceOfTwo()
         public void TagFirstLastWithSourceSequenceOfThree()
         {
             var source = new[] { 123, 456, 789 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
                                        new { Item = 456, IsFirst = false, IsLast = false },
                                        new { Item = 789, IsFirst = false, IsLast = true  });
         }

With this, we'll be good to merge!

@Orace Orace requested a review from atifaziz January 17, 2023 17:21
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!

Copy link
Contributor

@viceroypenguin viceroypenguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atifaziz I think you recommended the wrong change. .AsTestingSequence() should be done on the source, and then .TagFirstLast() should be done on the TestingSequence returned by .AsTestingSequence(). Tests do not actually test that TagFirstLast() disposes properly.

Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atifaziz I think you recommended the wrong change. .AsTestingSequence() should be done on the source, and then .TagFirstLast() should be done on the TestingSequence returned by .AsTestingSequence(). Tests do not actually test that TagFirstLast() disposes properly.

Right, that's what happens when you do a review/recommendation while in a rush to catch the shops before they close. I've always suffered from mistakes when I go fast, but then when I go slow, the project suffers. Anyway, fortunately, there's more than one pair of eyes on this. @viceroypenguin Thanks for spotting this.

@Orace Sorry for misleading and would appreciate if you could make the changes to chain AsTestingSequence() on the source sequence to the operator rather than the one that's its result.

@Orace Orace requested a review from atifaziz January 18, 2023 09:20
Copy link
Member

@atifaziz atifaziz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Contributor

@viceroypenguin viceroypenguin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@atifaziz atifaziz merged commit fd10ce0 into morelinq:master Jan 18, 2023
@Orace Orace deleted the TagFirstLast branch January 18, 2023 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants