Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul #1

Merged
merged 8 commits into from Apr 16, 2024
Merged

Overhaul #1

merged 8 commits into from Apr 16, 2024

Conversation

BurningWitness
Copy link
Collaborator

@BurningWitness BurningWitness commented Apr 14, 2024

This is a full rewrite of the library, featuring, in order of implementation:

  • Data.Patricia.Word.Strict: this is the same definitionally as IntMap, the only meaningful differences are that it uses Words as keys directly and that the API is way more frugal than in containers (also has a few cool functions, like range operations).

    This and every single other tree in the library shamelessly reuses the findings of the recent Optimize IntMap.Bin haskell/containers#995 PR for performance;

  • Data.Patricia.Word.Lazy: slight variation on IntMap, spine-lazy. Almost all the functions are the same as for the strict version though;

  • Data.Zebra.Word: IntSet, but instead of storing packed keys it stores space partitions.

    Technically this is a 2^64-wide bit array.

  • Data.Radix1Tree.Word8.Strict: a spine-strict radix tree with non-empty byte sequences as keys.

    The internal implementation is similar to bytestring-trie, but without "non-trivial invariants". The tree does not have a concrete key type; instead, there are two types of keys: Feed for breaking an input key down into bytes (inlines for performance) and Build to reshape stored keys into new formats.

    Implements all the same functions as Data.Patricia.Word.Strict plus some more, with the exception of Range ones because those functions are gargantuan for radix trees, so they're both a pain in the butt to write and probably would not inline either way.

  • Data.RadixTree.Word8.Strict: an extension to Radix1Tree that allows storing a value at an empty byte sequence.

    The entire interface is duplicated from the Radix1Tree, with the help of inlining.

  • Data.RadixTree.Word8.Lazy and Data.RadixTree.Word8.Strict: spine-lazy radix trees. Same stuff as with Patricia trees: lots of code duplication, different asymptotics.

    These can store values at infinitely long keys, provided you never access them, how droll.

All of this is tested (for result-correctness, but no introspection testing), all of this is documented, none of this is benchmarked (I don't know what case to target, so I didn't bother).

Note that while this theoretically should support GHC 8.4 through 8.10, this is only reachable using a custom build due to the template-haskell >= 2.17 dependency, and as such that has not been tested.

  and avoiding `showHex` so as to not have to support an eldritch Show constraint
  in previous versions
  Note that versions before GHC 9.0 cannot be tested in CI due to the
  template-haskell > 2.17 lower bound.
unhammer added a commit to unhammer/countwords that referenced this pull request Apr 15, 2024
c.f. sergv/radix-tree#1

$ stack build

Error: While constructing the build plan, the following exceptions were encountered:

In the dependencies for radix-tree-1.0.0.0:
    template-haskell-2.16.0.0 from stack configuration does not match >=2.17 && <3  (latest matching version is 2.19.0.0)
needed due to countwordsHS-0.1.0.0 -> radix-tree-1.0.0.0
@sergv
Copy link
Owner

sergv commented Apr 16, 2024

This is great stuff, I'm ready to merge. I can as well add you to github/hackage maintainers so you can release too, if you wish.

Would you consider a couple of suggestions in the meantime?



{-# INLINE splitL0 #-}
splitL0 :: Openness -> Feed -> RadixTree a -> (RadixTree a, RadixTree a)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For strict radix trees the signature is

splitL0 :: Openness -> Feed -> RadixTree a -> Split a a

Would it make sense to copy what containers do here and make strict version of splitL0 also return lazy tuple so that users can switch with no cost? Strict version will be a wrapper around working returning strict tuple and just repackage it into a lazy tuple.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The introduction of Split and SplitLookup was a very late change from my side, the thing that irked me was that the strict splits went through the trouble of creating two completely strict values and then at the very end threw in two boxes in the form of a lazy 2-tuple. I as such prefer a strict tuple because I expect all results in the strict module to be strict.

copy what containers do ... so that users can switch with no cost

Lazy and Strict in this library do not mean the same thing as they do in containers. containers' definition is "everything in Lazy does not evaluate the value before putting into the tree, everything in Strict does", but their underlying implementation is spine-strict in both cases.

The difference between a split on a spine-lazy tree and a split on a spine-strict one is massive, so I don't see the need to have the two be directly interchangeable.

| White
deriving (Show, Eq)

invert :: Color -> (# Color #)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that returning singleton strict tuples in cases like this could be avoided at no cost? Do these singleton tuples serve some purpose, in particular for the invert function?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a nice explanation for what unboxed 1-tuples do on StackOverflow. For this particular case the the idea is that allocations for Black and White are shared, so there never is a need to evaluate the results to WHNF.

Had a little back-and-forth about it here, final STG seems to align with my expectations.

@sergv
Copy link
Owner

sergv commented Apr 16, 2024

NB I'm not particularly worried about supporting GHCs before 9.2, if it works with them then great, if not then let them be.



-- | Result of a tree split.
data Split l r = Split !(RadixTree l) !(RadixTree r)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this is just a strict pair from the strict package. There's also strict Maybe in there so it would make senso switch.

This echoes another suggestion about unifying function types between lazy and strict versions of the API. Even if no unification is to take place, returning strict tuple from strict package seems better thar returning custom type isomorphic to it - the strict tuple has helpers to work with it whereas types like Split don't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those helper functions will be pretty useless when it turns out your code looks like

let !(Radix.Split l r) = Radix.split _ _ a
in _

At the end of the day this is just the return type for a bunch of functions in this library and if I want to unpack or unbang something later on, I want to have that power. Also means we don't have to depend on three extra libraries.


There are broader points here of Haskell suffering from dependency hells and people shunning custom types, but this isn't a good time to rant about it.

@sergv sergv merged commit 7470d34 into sergv:master Apr 16, 2024
@sergv
Copy link
Owner

sergv commented Apr 16, 2024

I've added you to github collaborators for thi srepo. If you want to become Hackage maintainer so you can release then please provide your hackage username.

@BurningWitness
Copy link
Collaborator Author

The maintaner name is OleksiiDivak.

@sergv
Copy link
Owner

sergv commented Apr 20, 2024

I've added you.

@jcranch
Copy link

jcranch commented Apr 21, 2024

Looks great! Are you willing to add it to Stackage? It is preferred that the package maintainer do so, though I am willing to do so if that would be a problem for you.

@BurningWitness
Copy link
Collaborator Author

BurningWitness commented Apr 22, 2024

@jcranch I personally don't care about Stackage, but I'm fine with anyone taking that responsibility, especially if they're already maintaining some other package there. I don't expect this package to break much (few dependencies, very wide bounds), so it shouldn't be a nuisance.

@BurningWitness BurningWitness deleted the overhaul branch April 22, 2024 13:31
jcranch added a commit to jcranch/stackage that referenced this pull request Apr 22, 2024
This is not my package, but I have explicit permission from a maintainer:
sergv/radix-tree#1 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants