
Add sorted collections #501

Merged: 77 commits merged on Sep 15, 2021

jackfirth (Owner) commented Apr 27, 2021

This is a work-in-progress change that will add mutable and immutable sorted collections to Rebellion, including:

  • mutable sorted sets
  • persistent immutable sorted sets
  • mutable sorted maps
  • persistent immutable sorted maps
  • mutable range sets
  • extending the existing immutable range sets to support persistence
  • mutable range maps
  • persistent immutable range maps

The immutable collections will also come with builders, which support the use case of building a collection up with many writes before any reads are performed (such as in a stream pipeline, e.g. `(transduce numbers #:into (into-sorted-set natural<=>))`). When created with builders, the immutable collections won't use a persistent representation under the hood; instead they'll use one that supports constant-time random access and efficient iteration. The persistent representation will be created lazily the first time an immutable collection is used with a persistent update function such as `sorted-set-add`. This combines the best of both worlds: use cases that don't need to interleave reads and writes get efficient random-access immutable collections, and use cases that do interleave them have their immutable collections upgraded to persistent collections automatically.
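A minimal Racket sketch of the lazy upgrade described above, assuming a plain `<?` less-than procedure instead of a Rebellion comparator. All names here (`build-sorted-set`, `sorted-set-add`, `force-tree`, and so on) are hypothetical, and this is not Rebellion's implementation; to keep it short, the persistent side is an unbalanced path-copying binary search tree rather than a red-black tree. The builder sorts once into a vector, and the tree is built from that vector and cached only when a persistent update first needs it.

```racket
#lang racket/base
;; Sketch only: hypothetical names, not Rebellion's API.
;; A builder produces a sorted vector; a persistent tree is built lazily,
;; the first time a persistent update such as `sorted-set-add` needs it.

;; Persistent BST node. Unbalanced for brevity; a real version would use a
;; persistent red-black tree.
(struct node (left value right))

;; vec: builder-produced sorted vector, or #f for tree-only sets.
;; tree-cache: the persistent tree, or 'unbuilt until something needs it.
(struct lazy-sorted-set (<? vec [tree-cache #:mutable]))

(define (dedupe-sorted lst <?)
  ;; In a sorted list, x duplicates the last kept element p when (<? p x) is false.
  (let loop ([lst lst] [kept '()])
    (cond [(null? lst) (reverse kept)]
          [(and (pair? kept) (not (<? (car kept) (car lst)))) (loop (cdr lst) kept)]
          [else (loop (cdr lst) (cons (car lst) kept))])))

(define (build-sorted-set <? elems)
  ;; Builder path: sort once and store the elements in a vector; no tree yet.
  (lazy-sorted-set <? (list->vector (dedupe-sorted (sort elems <?) <?)) 'unbuilt))

(define (vector->tree vec lo hi)
  ;; Balanced tree from the sorted slice [lo, hi) in O(n).
  (and (< lo hi)
       (let ([mid (quotient (+ lo hi) 2)])
         (node (vector->tree vec lo mid)
               (vector-ref vec mid)
               (vector->tree vec (add1 mid) hi)))))

(define (force-tree s)
  ;; Upgrade to the persistent representation on first use, then cache it.
  (define cached (lazy-sorted-set-tree-cache s))
  (if (eq? cached 'unbuilt)
      (let* ([vec (lazy-sorted-set-vec s)]
             [tree (vector->tree vec 0 (vector-length vec))])
        (set-lazy-sorted-set-tree-cache! s tree)
        tree)
      cached))

(define (tree-insert t x <?)
  ;; Path-copying insert: returns a new tree and leaves t untouched.
  (cond [(not t) (node #f x #f)]
        [(<? x (node-value t))
         (node (tree-insert (node-left t) x <?) (node-value t) (node-right t))]
        [(<? (node-value t) x)
         (node (node-left t) (node-value t) (tree-insert (node-right t) x <?))]
        [else t]))

(define (sorted-set-add s x)
  ;; The result is tree-only; s keeps its vector.
  (define <? (lazy-sorted-set-<? s))
  (lazy-sorted-set <? #f (tree-insert (force-tree s) x <?)))

(define (tree->list t)
  (if t
      (append (tree->list (node-left t)) (list (node-value t)) (tree->list (node-right t)))
      '()))

(define (sorted-set->list s)
  (define vec (lazy-sorted-set-vec s))
  (if vec (vector->list vec) (tree->list (force-tree s))))

(define (sorted-set-ref s i)
  ;; O(1) while the builder vector exists; O(n) fallback for tree-only sets.
  (define vec (lazy-sorted-set-vec s))
  (if vec (vector-ref vec i) (list-ref (sorted-set->list s) i)))
```

For example, `(sorted-set-add (build-sorted-set < '(5 3 9 3 1)) 4)` returns a tree-backed set containing 1, 3, 4, 5, 9, while the builder-produced set still answers `sorted-set-ref` from its vector in constant time.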

Lots of implementation work is left to do. In particular, I still need to figure out how to represent subset views efficiently, especially for mutable collections.

Also, the mutable collections won't be thread-safe, but they will provide fail-fast behavior on concurrent modification during iteration, like Java's ConcurrentModificationException. That behavior is too useful for detecting data races to ignore. In the future I may offer alternative thread-safe implementations, but the default mutable collection implementations won't change.
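One common way to get that fail-fast behavior is a modification counter, which is roughly how Java's standard collections do it. In this sketch (hypothetical names, not Rebellion's API, with a sorted list standing in for a mutable red-black tree), every structural change bumps the counter, and iteration raises an error as soon as it sees the counter move.

```racket
#lang racket/base
;; Sketch only (hypothetical names, not Rebellion's API): fail-fast iteration
;; via a modification counter, in the spirit of Java's
;; ConcurrentModificationException.

;; items: elements kept in sorted order. A sorted list stands in for the
;; mutable red-black tree a real implementation would use.
(struct mutable-sorted-set (<? [items #:mutable] [mod-count #:mutable]))

(define (make-mutable-sorted-set <?)
  (mutable-sorted-set <? '() 0))

(define (mutable-sorted-set-add! s x)
  (define <? (mutable-sorted-set-<? s))
  (define (insert lst)
    (cond [(or (null? lst) (<? x (car lst))) (cons x lst)]
          [(<? (car lst) x) (cons (car lst) (insert (cdr lst)))]
          [else lst])) ; x already present
  (define old (mutable-sorted-set-items s))
  (define updated (insert old))
  ;; Only structural changes (an element actually added) bump the counter.
  (unless (= (length updated) (length old))
    (set-mutable-sorted-set-items! s updated)
    (set-mutable-sorted-set-mod-count! s (add1 (mutable-sorted-set-mod-count s)))))

(define (mutable-sorted-set-for-each s proc)
  ;; Remember the counter at the start and check it after every step.
  (define expected (mutable-sorted-set-mod-count s))
  (let loop ([lst (mutable-sorted-set-items s)])
    (unless (null? lst)
      (proc (car lst))
      (unless (= expected (mutable-sorted-set-mod-count s))
        (error 'mutable-sorted-set-for-each "set modified during iteration"))
      (loop (cdr lst)))))
```

As with Java's ConcurrentModificationException, this is best-effort detection: an unsynchronized counter catches many data races, but it doesn't make the set thread-safe.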

jackfirth added the enhancement (New feature or request) label Apr 27, 2021
jackfirth added this to In progress in Collections library via automation Apr 27, 2021
Planning to reuse this for other sorted collections.
Insertion is implemented. Removal and membership testing aren't yet.
Nothing's implemented yet. Planning on using Okasaki's implementation from *Purely Functional Data Structures*.
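For reference, the core of Okasaki's insertion is small enough to sketch directly. Below is a transcription into Racket using `racket/match`, with hypothetical names and a `<?` less-than procedure; it is the textbook algorithm, not the code that landed in this PR. New elements go in as red leaves, a single `balance` function fixes the four possible red-red violations, and the root is painted black at the end.

```racket
#lang racket/base
;; Sketch: Okasaki's persistent red-black tree insertion
;; (Purely Functional Data Structures), transcribed to Racket.
;; Names are hypothetical, not Rebellion's API.
(require racket/match)

;; color is 'red or 'black; the empty tree is #f.
(struct rb (color left value right))

(define (balance color l v r)
  ;; The four ways a black node can end up with a red child that has a red
  ;; child all rotate into the same shape: a red node with two black children.
  (match* (color l v r)
    [('black (rb 'red (rb 'red a x b) y c) z d)
     (rb 'red (rb 'black a x b) y (rb 'black c z d))]
    [('black (rb 'red a x (rb 'red b y c)) z d)
     (rb 'red (rb 'black a x b) y (rb 'black c z d))]
    [('black a x (rb 'red (rb 'red b y c) z d))
     (rb 'red (rb 'black a x b) y (rb 'black c z d))]
    [('black a x (rb 'red b y (rb 'red c z d)))
     (rb 'red (rb 'black a x b) y (rb 'black c z d))]
    [(_ _ _ _) (rb color l v r)]))

(define (rb-insert t x <?)
  ;; Insert x as a red leaf, rebalancing on the way back up.
  (define (ins t)
    (match t
      [#f (rb 'red #f x #f)]
      [(rb c l v r)
       (cond [(<? x v) (balance c (ins l) v r)]
             [(<? v x) (balance c l v (ins r))]
             [else t])])) ; already present
  ;; Painting the root black preserves every invariant.
  (match (ins t)
    [(rb _ l v r) (rb 'black l v r)]))

(define (rb-member? t x <?)
  (match t
    [#f #f]
    [(rb _ l v r)
     (cond [(<? x v) (rb-member? l x <?)]
           [(<? v x) (rb-member? r x <?)]
           [else #t])]))

(define (rb->list t)
  ;; In-order traversal yields the elements in sorted order.
  (match t
    [#f '()]
    [(rb _ l v r) (append (rb->list l) (list v) (rb->list r))]))
```

Removal is the harder half of a persistent red-black tree (it's still listed as a remaining task later in this PR) and isn't covered here.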
Mutable sorted sets use mutable red-black trees, and immutable sorted sets use persistent red-black trees. As an optimization, when an immutable sorted set is created with a builder, its elements are placed into an immutable vector and the tree structure (which requires more memory and indirection, but offers persistence) is created lazily. This way, immutable sorted sets built from stream pipelines never pay the cost of persistent data structures, and they get O(1) random access, indirection-free search, and much faster iteration.
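The "indirection-free search" on a builder's vector can be an ordinary binary search over contiguous slots. A sketch with hypothetical names, assuming a `<?` less-than procedure rather than Rebellion's comparators: `sorted-vector-index` finds an exact match, and `sorted-vector-ceiling` finds the least element not less than a key, the kind of query a sorted set answers.

```racket
#lang racket/base
;; Sketch only: hypothetical names, not Rebellion's API.
;; Binary search over a sorted vector, using a `<?` less-than procedure.

(define (sorted-vector-index vec x <?)
  ;; Index of x in vec, or #f if absent. O(log n), no pointer chasing.
  (let loop ([lo 0] [hi (vector-length vec)]) ; search the half-open range [lo, hi)
    (if (>= lo hi)
        #f
        (let* ([mid (quotient (+ lo hi) 2)]
               [m (vector-ref vec mid)])
          (cond [(<? x m) (loop lo mid)]
                [(<? m x) (loop (add1 mid) hi)]
                [else mid])))))

(define (sorted-vector-member? vec x <?)
  (and (sorted-vector-index vec x <?) #t))

(define (sorted-vector-ceiling vec x <?)
  ;; Smallest element not less than x, or #f if every element is less than x.
  ;; Invariant: everything before lo is < x, everything from hi on is >= x.
  (let loop ([lo 0] [hi (vector-length vec)])
    (if (>= lo hi)
        (and (< lo (vector-length vec)) (vector-ref vec lo))
        (let ([mid (quotient (+ lo hi) 2)])
          (if (<? (vector-ref vec mid) x)
              (loop (add1 mid) hi)
              (loop lo mid))))))
```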
Also, move the `cut` data type into its own private module.
This will replace the implementation currently in `rebellion/collection/sorted-set` eventually.
jackfirth (Owner, Author) commented Aug 27, 2021

Remaining tasks for sorted sets:

  • Implement persistent red-black tree element removal
  • Add unmodifiable-sorted-set wrapper for read-only access to mutable sorted sets
  • Add synchronized-sorted-set wrapper for thread-safe access to mutable sorted sets
  • Add managed-sorted-set wrapper for kill-safe access to mutable sorted sets
  • Implement detection of concurrent modification during iteration
  • Add sorted-set-impersonate for chaperones and impersonators
  • Prevent out-of-bounds insertion into mutable sorted subsets
  • Add sorted-set/c contract combinator
  • Prevent passing sorted-subset a range that doesn't use the same comparator as the sorted set
  • Ensure sorted-set-reverse is an involution
  • Ensure sorted-subset with unbounded-range is an identity operation
  • Ensure sorted-subset is idempotent
  • Ensure unmodifiable-sorted-set is idempotent
  • Ensure synchronized-sorted-set is idempotent
  • Ensure managed-sorted-set is idempotent
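Several of these items are algebraic properties of view constructors. One way to get them, sketched here with hypothetical structs standing in for Rebellion's real `sorted-set-reverse`, `sorted-subset`, and `unmodifiable-sorted-set`, is for each constructor to inspect its argument before wrapping: unwrap a reversed view instead of reversing it twice, return an unmodifiable view as-is, return a subset view as-is when the same range is applied again, and treat an unbounded range (`#f` bounds here, standing in for `unbounded-range`) as the identity.

```racket
#lang racket/base
;; Sketch only: hypothetical structs, not Rebellion's implementation.
;; View constructors check what they're wrapping so that repeated
;; application doesn't stack redundant wrappers.

(struct base-set (items))          ; stand-in for a sorted set: a sorted list of numbers
(struct unmodifiable-view (inner)) ; read-only view
(struct reversed-view (inner))     ; reverse-order view
(struct subset-view (inner lo hi)) ; elements x with lo <= x < hi; #f means unbounded

(define (unmodifiable s)
  ;; Idempotent: an unmodifiable view is returned as-is.
  (if (unmodifiable-view? s) s (unmodifiable-view s)))

(define (reverse-view s)
  ;; Involution: reversing a reversed view unwraps it.
  (if (reversed-view? s) (reversed-view-inner s) (reversed-view s)))

(define (subset s lo hi)
  (cond
    ;; Unbounded range: identity.
    [(and (not lo) (not hi)) s]
    ;; Same range applied twice: idempotent.
    [(and (subset-view? s)
          (equal? lo (subset-view-lo s))
          (equal? hi (subset-view-hi s)))
     s]
    [else (subset-view s lo hi)]))

(define (view->list s)
  ;; The elements a view exposes, in its iteration order.
  (cond
    [(base-set? s) (base-set-items s)]
    [(unmodifiable-view? s) (view->list (unmodifiable-view-inner s))]
    [(reversed-view? s) (reverse (view->list (reversed-view-inner s)))]
    [(subset-view? s)
     (let ([lo (subset-view-lo s)] [hi (subset-view-hi s)])
       (filter (lambda (x) (and (or (not lo) (<= lo x)) (or (not hi) (< x hi))))
               (view->list (subset-view-inner s))))]))
```

These checks only cover the exact cases shown; for example, taking a subset of a subset with a different range still stacks a second wrapper, where a fuller version might intersect the two ranges.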

This is needed for the synchronized collection views. Not directly tested for now, since I'm mainly testing it through the synchronized sorted set implementation. Needs thorough tests Eventually™.
Handy version of sequence->sorted-set that doesn't require knowing the comparator.
jackfirth (Owner, Author) commented
I'm going to merge this as-is since it's already huge. All the basics for sorted sets are there sans detection of modification during iteration. I'll work on sorted maps next, since multisets, range sets, and range maps are all just views of sorted maps.
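To illustrate the "views of sorted maps" idea, here is a sketch of a multiset layered over a sorted map from each element to its count. The sorted map is a sorted association list standing in for the tree-based sorted map planned next, and every name is hypothetical rather than Rebellion's API.

```racket
#lang racket/base
;; Sketch only: a multiset as a view over a sorted map from element to count.
;; The "sorted map" is a sorted association list of (key . value) pairs,
;; a hypothetical stand-in for a real tree-based sorted map.

(define (sorted-map-ref m k <? default)
  (cond [(or (null? m) (<? k (caar m))) default]
        [(<? (caar m) k) (sorted-map-ref (cdr m) k <? default)]
        [else (cdar m)]))

(define (sorted-map-put m k v <?)
  ;; Persistent put: returns a new list, sharing the tail after k.
  (cond [(or (null? m) (<? k (caar m))) (cons (cons k v) m)]
        [(<? (caar m) k) (cons (car m) (sorted-map-put (cdr m) k v <?))]
        [else (cons (cons k v) (cdr m))]))

;; Multiset operations are thin wrappers over the map.
(struct multiset (<? counts))

(define (empty-multiset <?) (multiset <? '()))

(define (multiset-add ms x)
  (define <? (multiset-<? ms))
  (define counts (multiset-counts ms))
  (multiset <? (sorted-map-put counts x (add1 (sorted-map-ref counts x <? 0)) <?)))

(define (multiset-frequency ms x)
  (sorted-map-ref (multiset-counts ms) x (multiset-<? ms) 0))

(define (multiset->list ms)
  ;; Expand each (element . count) entry, in key order.
  (for*/list ([entry (in-list (multiset-counts ms))]
              [i (in-range (cdr entry))])
    (car entry)))
```

Range sets could follow a similar pattern, for instance by keying a sorted map on each range's lower bound.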

jackfirth marked this pull request as ready for review September 15, 2021 05:50
jackfirth merged commit ca733d0 into master Sep 15, 2021
jackfirth deleted the sorted-collections branch September 15, 2021 05:50
Collections library automation moved this from In progress to Done Sep 15, 2021
jackfirth mentioned this pull request Sep 17, 2021