Decompose `TagFirstLast` implementation for speed #643

Orace · 2019-11-05T13:34:29Z

This implementation avoids creating a KeyValuePair for each item of the sequence.

With a resultSelector that always returns 0, it speeds up evaluation by a factor of 3.
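
To illustrate the idea described above, here is a minimal sketch of a direct implementation that keeps one item of look-ahead and two flags instead of pairing every item with an index. The name `TagFirstLastDirect` and the class `TagFirstLastSketch` are hypothetical and this is not necessarily the code in this PR; it only assumes the `TagFirstLast` signature quoted later in this thread.

```csharp
using System;
using System.Collections.Generic;

static class TagFirstLastSketch
{
    // Hypothetical name: an illustration only, not the merged implementation.
    public static IEnumerable<TResult> TagFirstLastDirect<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, bool, bool, TResult> resultSelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));

        return Iterator();

        IEnumerable<TResult> Iterator()
        {
            using var e = source.GetEnumerator();

            // An empty source yields nothing and never calls resultSelector.
            if (!e.MoveNext())
                yield break;

            var current = e.Current;
            var isFirst = true;

            // One item of look-ahead: an item is "last" only when the
            // following MoveNext() returns false.
            while (e.MoveNext())
            {
                yield return resultSelector(current, isFirst, false);
                isFirst = false;
                current = e.Current;
            }

            yield return resultSelector(current, isFirst, true);
        }
    }
}
```

For a source of `123, 456, 789`, the selector would be called with `(123, true, false)`, `(456, false, false)` and `(789, false, true)`, with no per-item KeyValuePair involved.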

Orace · 2023-01-16T11:19:15Z

Depends on #928, which introduces a test for TagFirstLast.

codecov · 2023-01-16T11:21:35Z

Codecov Report

Merging #643 (8760100) into master (60ec000) will decrease coverage by 0.01%.
The diff coverage is 93.33%.

@@            Coverage Diff             @@
##           master     #643      +/-   ##
==========================================
- Coverage   92.41%   92.41%   -0.01%     
==========================================
  Files         112      112              
  Lines        3426     3439      +13     
  Branches     1017     1021       +4     
==========================================
+ Hits         3166     3178      +12     
  Misses        199      199              
- Partials       61       62       +1

Impacted Files	Coverage Δ
MoreLinq/TagFirstLast.cs	`94.73% <93.33%> (-5.27%)`	⬇️

viceroypenguin

Subject to correction of test in #928, approved

atifaziz

Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors that could have influenced the run.

Also, please bear in mind that TagFirstLast, which reuses CountDown, also benefits from its optimisations for lists and collections, so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

viceroypenguin · 2023-01-17T00:30:22Z

> Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors that could have influenced the run.
>
> Also, please bear in mind that TagFirstLast, which reuses CountDown, also benefits from its optimisations for lists and collections, so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

I'd argue (without proof, but with reasonable belief) that special-casing collections would not improve performance significantly, if at all, for this operator. Knowing the size of the collection improves performance in one of two cases: when it allows us to avoid iterating the collection at all, or when it allows us to avoid buffering (CountDown being an example).

In this case, we need to iterate the full list anyway (since we are returning an enumeration that contains every value from the original), and we buffer only a single value at a time, each for a single loop iteration.

As such, I would find it surprising (but not impossible) to find a performance improvement using CountDown on a collection.

Orace · 2023-01-17T10:33:03Z

> Can we publish the performance numbers using BenchmarkDotNet? I'm afraid that a screenshot doesn't say much about the configuration, runtime and other factors that could have influenced the run.

I will try to find time for this. Sorry for the screenshot; I can't believe I did such a thing 3 years ago 🙄

> Also, please bear in mind that TagFirstLast, which reuses CountDown, also benefits from its optimisations for lists and collections, so the implementation in this PR could introduce a performance regression in those cases. The benchmarks should therefore include sequences, lists & collections as sources.

Actually, TagFirstLast reuses CountDown only after Index:

MoreLINQ/MoreLinq/TagFirstLast.cs

Lines 57 to 64 in f4806f5

    
    public static IEnumerable<TResult> TagFirstLast<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool, bool, TResult> resultSelector)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (resultSelector == null) throw new ArgumentNullException(nameof(resultSelector));
        return source.Index() // count-up
                     .CountDown(1, (e, cd) => resultSelector(e.Value, e.Key == 0, cd == 0));
    }

Since Index uses Select, and Select returns neither a list nor a collection, I don't think any of those optimizations apply here.

But we can add some on top of the implementation proposed in this PR.
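
The point about Index hiding the collection type can be checked with a small sketch. `IndexLike` below is a hypothetical stand-in for MoreLINQ's Index, assumed here to be built on the indexed `Select` overload, and `HasKnownCount` is an illustrative helper, not a MoreLINQ API. Whether a given `Select` result implements a collection interface is a runtime implementation detail, so treat this as a probe rather than a guarantee.

```csharp
using System.Collections.Generic;
using System.Linq;

static class IndexShapeSketch
{
    // Hypothetical stand-in for MoreLINQ's Index (assumption: it pairs each
    // item with its position through the indexed Select overload).
    public static IEnumerable<KeyValuePair<int, T>> IndexLike<T>(this IEnumerable<T> source) =>
        source.Select((item, i) => new KeyValuePair<int, T>(i, item));

    // Reports whether a sequence still exposes a cheap Count via ICollection<T>,
    // which is the kind of check a collection-aware operator would rely on.
    public static bool HasKnownCount<T>(IEnumerable<T> sequence) =>
        sequence is ICollection<T>;
}
```

Calling `IndexShapeSketch.HasKnownCount(list)` on a `List<int>` and then on `list.IndexLike()` shows whether the count survives the projection; if it does not, anything chained after Index only ever sees a plain sequence.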

atifaziz · 2023-01-17T11:25:38Z

> I will try to find time for this

Cool.

> Actually, TagFirstLast reuses CountDown only after Index:
> Since Index uses Select, and Select returns neither a list nor a collection, I don't think any of those optimizations apply here.

That's a good point.

> But we can add some on top of the implementation proposed in this PR.

Let's benchmark and we can always add those later since the optimisations (if any) weren't getting leveraged due to chaining with Index.

Orace · 2023-01-17T11:32:46Z

I quickly set up some benchmarks (available here).
They test the old and new implementations against 1 million items in a List or an Enumerable.Range (RangeIterator):

BenchmarkDotNet=v0.13.4, OS=Windows 10 (10.0.19045.2486)
Intel Core i7-8750H CPU 2.20GHz (Coffee Lake), 1 CPU, 12 logical and 6 physical cores
.NET SDK=7.0.100
[Host] : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2

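| Method          | Source             |     Mean |    Error |   StdDev | Allocated |
|-----------------|--------------------|---------:|---------:|---------:|----------:|
| TagFirstLast    | List[Int32] [47]   | 34.11 ms | 0.505 ms | 0.472 ms |     512 B |
| TagFirstLastNew | List[Int32] [47]   | 14.12 ms | 0.045 ms | 0.035 ms |     137 B |
| TagFirstLast    | RangeIterator [36] | 31.26 ms | 0.232 ms | 0.217 ms |     491 B |
| TagFirstLastNew | RangeIterator [36] | 11.73 ms | 0.102 ms | 0.085 ms |     137 B |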

atifaziz

Thanks for sharing the benchmark numbers and code.

Since this operator is now implemented entirely on its own, it's worth updating the tests to use TestingSequence (to ensure disposal and single-pass iteration of the source sequence) like so:

diff --git a/MoreLinq.Test/TagFirstLastTest.cs b/MoreLinq.Test/TagFirstLastTest.cs
index b4bf9cb..acab948 100644
--- a/MoreLinq.Test/TagFirstLastTest.cs
+++ b/MoreLinq.Test/TagFirstLastTest.cs
@@ -26,9 +26,10 @@ public class TagFirstLastTest
         public void TagFirstLastDoesOneLookAhead()
         {
             var source = MoreEnumerable.From(() => 123, () => 456, BreakingFunc.Of<int>());
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .Take(1)
-                  .Consume();
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.Take(1).Consume();
+
         }
 
         [Test]
@@ -41,24 +42,27 @@ public void TagFirstLastIsLazy()
         public void TagFirstLastWithSourceSequenceOfZero()
         {
             var source = Enumerable.Empty<int>();
-            var sut = source.TagFirstLast(BreakingFunc.Of<int, bool, bool, int>());
-            Assert.That(sut, Is.Empty);
+            using var result = source.TagFirstLast(BreakingFunc.Of<int, bool, bool, int>())
+                                     .AsTestingSequence();
+            Assert.That(result, Is.Empty);
         }
 
         [Test]
         public void TagFirstLastWithSourceSequenceOfOne()
         {
             var source = new[] { 123 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true, IsLast = true });
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true, IsLast = true });
         }
 
         [Test]
         public void TagFirstLastWithSourceSequenceOfTwo()
         {
             var source = new[] { 123, 456 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
                                        new { Item = 456, IsFirst = false, IsLast = true });
         }
 
@@ -66,8 +70,9 @@ public void TagFirstLastWithSourceSequenceOfTwo()
         public void TagFirstLastWithSourceSequenceOfThree()
         {
             var source = new[] { 123, 456, 789 };
-            source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
-                  .AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
+            using var result = source.TagFirstLast((item, isFirst, isLast) => new { Item = item, IsFirst = isFirst, IsLast = isLast })
+                                     .AsTestingSequence();
+            result.AssertSequenceEqual(new { Item = 123, IsFirst = true,  IsLast = false },
                                        new { Item = 456, IsFirst = false, IsLast = false },
                                        new { Item = 789, IsFirst = false, IsLast = true  });
         }

With this, we'll be good to merge!

atifaziz

Thanks for this!

viceroypenguin

@atifaziz I think you recommended the wrong change. .AsTestingSequence() should be called on the source, and then .TagFirstLast() should be called on the TestingSequence returned by .AsTestingSequence(). As written, the tests do not actually verify that TagFirstLast() disposes of the source properly.
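
To make the distinction concrete, here is a rough sketch of why the wrapper has to sit on the source. `TrackingSequence<T>` is a hypothetical, simplified stand-in for MoreLINQ's test-only TestingSequence (it is not that type's actual API), and LINQ's `Select` stands in for the operator under test. Because the wrapper counts enumerations and disposals of whatever it wraps, it can only observe the operator's behaviour when the operator consumes the wrapper.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a test helper: counts how often the wrapped
// sequence is enumerated and how often its enumerators are disposed.
sealed class TrackingSequence<T> : IEnumerable<T>
{
    readonly IEnumerable<T> _inner;

    public TrackingSequence(IEnumerable<T> inner) => _inner = inner;

    public int Enumerations { get; private set; }
    public int Disposals { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return new Enumerator(this, _inner.GetEnumerator());
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    sealed class Enumerator : IEnumerator<T>
    {
        readonly TrackingSequence<T> _owner;
        readonly IEnumerator<T> _e;
        bool _disposed;

        public Enumerator(TrackingSequence<T> owner, IEnumerator<T> e)
        {
            _owner = owner;
            _e = e;
        }

        public T Current => _e.Current;
        object IEnumerator.Current => Current;
        public bool MoveNext() => _e.MoveNext();
        public void Reset() => throw new NotSupportedException();

        public void Dispose()
        {
            if (_disposed) return;   // count each enumerator at most once
            _disposed = true;
            _owner.Disposals++;
            _e.Dispose();
        }
    }
}

static class DisposalCheckSketch
{
    // Wraps the SOURCE, applies the operator (Select as a stand-in),
    // stops after the first item, and reports what the source observed.
    public static (int Enumerations, int Disposals) Run()
    {
        var source = new TrackingSequence<int>(new[] { 123, 456, 789 });
        foreach (var unused in source.Select(x => x * 2))
            break;
        return (source.Enumerations, source.Disposals);
    }
}
```

An operator that iterates its source once and disposes the enumerator should leave both counters equal. Wrapping the operator's result instead would only measure how the test itself consumes that result.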

atifaziz

> @atifaziz I think you recommended the wrong change. .AsTestingSequence() should be called on the source, and then .TagFirstLast() should be called on the TestingSequence returned by .AsTestingSequence(). As written, the tests do not actually verify that TagFirstLast() disposes of the source properly.

Right, that's what happens when you do a review/recommendation while in a rush to catch the shops before they close. I've always suffered from mistakes when I go fast, but then when I go slow, the project suffers. Anyway, fortunately, there's more than one pair of eyes on this. @viceroypenguin Thanks for spotting this.

@Orace Sorry for misleading you. I'd appreciate it if you could change the tests to chain AsTestingSequence() on the operator's source sequence rather than on its result.

atifaziz

Thanks!

viceroypenguin

LGTM

Orace closed this Nov 5, 2019

Orace deleted the TagFirstLast branch November 5, 2019 13:38

Orace restored the TagFirstLast branch November 5, 2019 13:41

Orace reopened this Nov 5, 2019

Orace force-pushed the TagFirstLast branch 2 times, most recently from 8d50b73 to e61eaf0 Compare January 16, 2023 11:11

Orace force-pushed the TagFirstLast branch 2 times, most recently from 600b63b to 02932e8 Compare January 16, 2023 11:37

Orace mentioned this pull request Jan 16, 2023

Additional tests for TagFirstLast #928

Merged

viceroypenguin approved these changes Jan 16, 2023

Orace force-pushed the TagFirstLast branch 2 times, most recently from 45aef84 to 0717f2e Compare January 16, 2023 12:36

atifaziz changed the title ~~Improve TagFirstLast~~ Decompose TagFirstLast implementation for speed Jan 16, 2023

atifaziz requested changes Jan 16, 2023

Improve TagFirstLast

8e35d5c

Orace force-pushed the TagFirstLast branch from 0717f2e to 8e35d5c Compare January 17, 2023 11:38

Orace requested a review from atifaziz January 17, 2023 15:04

atifaziz requested changes Jan 17, 2023

Orace requested a review from atifaziz January 17, 2023 17:21

Use TestingSequence in TagFirstLastTest

b200bae

Orace force-pushed the TagFirstLast branch from 11d42ec to b200bae Compare January 17, 2023 17:26

atifaziz approved these changes Jan 17, 2023

viceroypenguin suggested changes Jan 17, 2023

atifaziz requested changes Jan 17, 2023

Fix TestingSequence usages in TagFirstLast tests.

8760100

Orace requested a review from atifaziz January 18, 2023 09:20

atifaziz approved these changes Jan 18, 2023

viceroypenguin approved these changes Jan 18, 2023

atifaziz merged commit fd10ce0 into morelinq:master Jan 18, 2023

Orace deleted the TagFirstLast branch January 18, 2023 14:09
