Skip to content

Releases: neon-sunset/RangeExtensions

2.1.1

09 Jan 16:11
Compare
Choose a tag to compare

Changes

🚀 Enhancements

  • Further speed-up vectorized array/span initialization (improves ToList() too) (9a667be)

Loop body comparison:

| Old                           | New                           | New Aarch64                       |
|-------------------------------|-------------------------------|-----------------------------------|
| M02_L01:                      | M02_L01:                      | G_M000_IG04:                      |
|    vmovd     xmm1,r9d         |    vmovupd   [rax],ymm2       |    str     q18, [x1]              |
|    vpbroadcastd ymm1,xmm1     |    vmovupd   [rax+20],ymm0    |    str     q16, [x1, #0x10]       |
|    vpaddd    ymm1,ymm1,ymm0   |    vpaddd    ymm2,ymm2,ymm1   |    add     v18.4s, v18.4s, v17.4s |
|    add       r9d,r11d         |    vpaddd    ymm0,ymm0,ymm1   |    add     v16.4s, v16.4s, v17.4s |
|    vmovd     xmm2,r9d         |    add       rax,40           |    add     x1, x1, #32            |
|    vpbroadcastd ymm2,xmm2     |    add       r9d,10           |    add     w5, w5, #8             |
|    vpaddd    ymm2,ymm2,ymm0   |    cmp       r11d,r9d         |    cmp     w4, w5                 |
|    add       r9d,r11d         |    jg        short M02_L01    |    bgt     G_M000_IG04            |
|    movsxd    r10,esi          |                               |                                   |
|    vmovupd   [rax+r10*4],ymm1 |                               |                                   |
|    lea       r10d,[rsi+8]     |                               |                                   |
|    movsxd    r10,r10d         |                               |                                   |
|    vmovupd   [rax+r10*4],ymm2 |                               |                                   |
|    add       esi,10           |                               |                                   |
|    cmp       edi,esi          |                               |                                   |
|    jg        short M02_L01    |                               |                                   |
|-------------------------------|-------------------------------|-----------------------------------|

🗒️ Notes

If you are interested, make sure to check the past two releases for more details!
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.0.0
https://github.com/neon-sunset/RangeExtensions/releases/tag/2.1.0

Full Changelog: 2.1.0...2.1.1

Published with dotnet-releaser

2.1.0

08 Jan 03:08
6eb5428
Compare
Choose a tag to compare

Changes

🚀 Enhancements

  • SIMDify array/span initialization which improves performance by appx. 2-4x depending on supported vector width (PR #16)
  • Improve .ToList() performance by 2x via directly passing RangeEnumerable to List<int> constructor which subsequently calls ICollection.ToArray(...)
BenchmarkDotNet=v0.13.3, OS=macOS 13.1 (22C65) [Darwin 22.2.0]
AMD Ryzen 7 5800X 3.80GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK=8.0.100-alpha.1.23056.11
  [Host]     : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.0 (8.0.23.5503), X64 RyuJIT AVX2
Method Length Mean Error Ratio Allocated
RangeToArray 10 8.249 ns 0.1493 ns 1.00 64 B
EnumerableToArray 10 15.416 ns 0.1819 ns 1.87 104 B
RangeToList 10 18.367 ns 0.1284 ns 2.23 120 B
EnumerableToList 10 22.675 ns 0.4197 ns 2.75 136 B
RangeSelectToArray 10 14.934 ns 0.1617 ns 1.81 96 B
EnumerableSelectToArray 10 27.146 ns 0.2635 ns 3.30 152 B
RangeToArray 100 19.584 ns 0.1200 ns 1.00 424 B
EnumerableToArray 100 44.881 ns 0.2353 ns 2.29 464 B
RangeToList 100 29.633 ns 0.2484 ns 1.51 480 B
EnumerableToList 100 97.329 ns 0.7041 ns 4.97 496 B
RangeSelectToArray 100 62.172 ns 0.6928 ns 3.18 456 B
EnumerableSelectToArray 100 75.454 ns 0.8087 ns 3.85 512 B
RangeToArray 10000 690.475 ns 10.5986 ns 1.00 40024 B
EnumerableToArray 10000 3,285.207 ns 22.7170 ns 4.76 40064 B
RangeToList 10000 1,231.378 ns 11.5101 ns 1.78 40080 B
EnumerableToList 10000 7,269.712 ns 75.8317 ns 10.52 40096 B
RangeSelectToArray 10000 5,024.135 ns 49.6868 ns 7.27 40056 B
EnumerableSelectToArray 10000 5,044.309 ns 67.8153 ns 7.30 40112 B

📦 Dependencies

  • Bump BenchmarkDotNet from 0.13.2 to 0.13.3 (PR #15) by @dependabot (bot)

🧰 Misc

Full Changelog: 2.0.0...2.1.0

Published with dotnet-releaser

2.0.0

23 Dec 04:50
234da64
Compare
Choose a tag to compare

Changes

✨ New Features

Select and Where

This release expands the list of Range extensions with several convenience methods that bring it much closer to counterparts in Rust, Python and others.

var floats = (0..100).Select(i => (float)i);
var odd = (0..100).Where(i => i % 2 != 0);

var randomNumbers = (0..1000)
    .Select(_ => Random.Shared.Next())
    .ToArray();

Iterating through these directly is significantly faster when compared to Enumerable.Range.
In fact, with DynamicPGO enabled, on osx-arm64 the loop foreach (var i in (0..1000).Select(i => i * 2)) gets within 1.3-1.5x ratio of a regular plain for loop. Delegate devirtualization has gotten so good!

IList on RangeEnumerable and SelectRange

Now both these types also implement IList.
While RangeEnumerable is effectively a materialized sequence of numbers, SelectRange serves a purpose of "Select View" over that sequence, with exact T values materialized on access similar to plain IEnumerable<T>.

While this does not conform 100% to existing semantics, it was a necessary change to take the advantage of IEnumerable internals for methods that lack bespoke implementations in this library or when RangeExtensions enumerators are boxed.

Bespoke Aggregate, First, Last and more

The library now exposes additional bespoke LINQ method implementations for shorthand usage together with Range that further improve its usability in functional scenarios.

var digits = (0..10)
    .Aggregate(new StringBuilder(), (sb, i) => sb.Append(i))
    .ToString();

Assert.Equal("0123456789", digits);

// None of these allocate, and all of them are O(1)
var fifteenthDecimal = (0..1337)
    .Select(i => (decimal)i)
    .Take(10)
    .ElementAt(4); // or even just [4]

// Efficient scan from the end of the range
var lastEven = (0..1337)
    .Where(i => i % 2 is 0)
    .Last();

Further optimizations on RangeEnumerable

Previously,Range was nested inside of RangeEnumerable only being unwrapped when creating an enumerator.
This was complicating the job for the JIT code generation and preventing it from seeing that we have already verified a few conditions like whether the range is valid and doesn't start from end for either of the indexes, whether its start value is greater or equal to zero, etc (depends on particular loop used with range).

In addition, this approach was causing further issues when RangeEnumerable was boxed or nested in either RangeSelect or RangeWhere which made the initial prototype have performance barely better or sometimes worse than private implementations of Select and Where iterators from BCL.

Therefore, the best solution turned out to be the most straightforward and likely intellectually disappointing to people who read release notes till the end.

First and foremost, reinterpret-casting Indexes nested inside of Range into ints allowed for better constant propagation and dead branch elimination reducing the size and improving the quality of codegen for precondition checks on an enumerable/enumerator creation.

Next, by flattening all nested structs into their parent holders i.e. storing int _start; int _end; instead of Range and copying RangeEnumerable implementation bits to RangeSelect and RangeWhere instead of nesting RangeEnumerable, it was possible to achieve pretty good enregistering of struct fields, effectively removing the abstraction and turning them into locals.

This in combination with delegate inlining (requires .NET 7 and DynamicPGO) allowed to bring the performance of direct foreach (var i in (0..Length).Select(i => i * 2)) loops to within 30% of plain for loop without any delegates. This also applies to (0..100).Aggregate(...) methods and any other that uses LINQ methods to produce a value functional-style.

However, I decided to postpone (or maybe it was small brain cope?) correctly abstracting away duplicate code in .SpeedOpt.cs via either generics or templating/code generation to deliver the update as is. If you are interested, I will be happy to review and accept a PR that closes #14

📦 Dependencies

  • Bump Microsoft.NET.Test.Sdk from 17.3.1 to 17.3.2 (PR #9) by @dependabot (bot)
  • Bump coverlet.collector from 3.1.2 to 3.2.0 (PR #11) by @dependabot (bot)
  • Bump Microsoft.NET.Test.Sdk from 17.3.2 to 17.4.0 (PR #12) by @dependabot (bot)
  • Bump Microsoft.NET.Test.Sdk from 17.4.0 to 17.4.1 (PR #13) by @dependabot (bot)

Full Changelog: 1.2.2...2.0.0

Published with dotnet-releaser

1.2.2

03 Sep 12:02
89df618
Compare
Choose a tag to compare

Changes

🐛 Bug Fixes

  • Fix a typo in ThrowHelpers.EmptyRange() message. (89df618)

📦 Dependencies

  • Bump MinVer from 4.1.0 to 4.2.0 (PR #6) by @dependabot (bot)
  • Bump BenchmarkDotNet from 0.13.1 to 0.13.2 (PR #7) by @dependabot (bot)
  • Bump Microsoft.NET.Test.Sdk from 17.3.0 to 17.3.1 (PR #8) by @dependabot (bot)

Full Changelog: 1.2.1...1.2.2

Published with dotnet-releaser

1.2.1

13 Aug 05:54
Compare
Choose a tag to compare

Changes

🐛 Bug Fixes

  • Fix displayed coverage (PR #5)
  • Fix gap in tests (ebf4883)

📦 Dependencies

  • Bump Microsoft.NET.Test.Sdk from 17.2.0 to 17.3.0 (PR #4) by @dependabot (bot)

🧰 Misc

  • Re-enable Coveralls publishing because it has been fixed (2248bda)
  • Complete coverage to 100% (8d6f42b)
  • Update description (f453133)

Full Changelog: 1.2.0...1.2.1

1.2.0

10 Aug 08:00
Compare
Choose a tag to compare

Changes

✨ New Features

  • Add Span overloads for CopyTo/TryCopyTo for fun (6af9719)
  • Implement ICollection on RangeEnumerable for performance (4318e63)

🐛 Bug Fixes

  • Fix ICollection.CopyTo implementation and complete test coverage (c71c076)

📦 Dependencies

🧰 Misc

  • Make tests compatible with XUnit 2.4.2 (76970d0)
  • Update README, slighly improve Sum() codegen and replace Count() with Count where appropriate (2b3efb6)

Full Changelog: 1.1.0...1.2.0

1.1.0

28 Jul 14:32
Compare
Choose a tag to compare

Changes

✨ New Features

  • Add rangeEnumerable.Sum()

🧰 Misc

  • Further optimizations, cleanup and additional tests (482e7b5)

Full Changelog: 1.0.0...1.1.0

1.0.0

16 Jul 18:36
Compare
Choose a tag to compare

✨Initial Release!

Full Changelog: 6ad37f19b58b2776f800017425ea51f58122e2a4...1.0.0