Overhaul #1

BurningWitness · 2024-04-14T18:33:31Z

This is a full rewrite of the library, featuring, in order of implementation:

Data.Patricia.Word.Strict: this is the same definitionally as IntMap, the only meaningful differences are that it uses Words as keys directly and that the API is way more frugal than in containers (also has a few cool functions, like range operations).

This and every single other tree in the library shamelessly reuses the findings of the recent Optimize IntMap.Bin haskell/containers#995 PR for performance;
Data.Patricia.Word.Lazy: slight variation on IntMap, spine-lazy. Almost all the functions are the same as for the strict version though;
Data.Zebra.Word: IntSet, but instead of storing packed keys it stores space partitions.

Technically this is a 2^64-wide bit array.
Data.Radix1Tree.Word8.Strict: a spine-strict radix tree with non-empty byte sequences as keys.

The internal implementation is similar to bytestring-trie, but without "non-trivial invariants". The tree does not have a concrete key type; instead, there are two types of keys: Feed for breaking an input key down into bytes (inlines for performance) and Build to reshape stored keys into new formats.

Implements all the same functions as Data.Patricia.Word.Strict plus some more, with the exception of Range ones because those functions are gargantuan for radix trees, so they're both a pain in the butt to write and probably would not inline either way.
Data.RadixTree.Word8.Strict: an extension to Radix1Tree that allows storing a value at an empty byte sequence.

The entire interface is duplicated from the Radix1Tree, with the help of inlining.
Data.RadixTree.Word8.Lazy and Data.RadixTree.Word8.Strict: spine-lazy radix trees. Same stuff as with Patricia trees: lots of code duplication, different asymptotics.

These can store values at infinitely long keys, provided you never access them, how droll.

All of this is tested (for result-correctness, but no introspection testing), all of this is documented, none of this is benchmarked (I don't know what case to target, so I didn't bother).

Note that while this theoretically should support GHC 8.4 through 8.10, this is only reachable using a custom build due to the template-haskell >= 2.17 dependency, and as such that has not been tested.

and avoiding `showHex` so as to not have to support an eldritch Show constraint in previous versions

Note that versions before GHC 9.0 cannot be tested in CI due to the template-haskell > 2.17 lower bound.

c.f. sergv/radix-tree#1 $ stack build Error: While constructing the build plan, the following exceptions were encountered: In the dependencies for radix-tree-1.0.0.0: template-haskell-2.16.0.0 from stack configuration does not match >=2.17 && <3 (latest matching version is 2.19.0.0) needed due to countwordsHS-0.1.0.0 -> radix-tree-1.0.0.0

sergv · 2024-04-16T19:25:58Z

This is great stuff, I'm ready to merge. I can as well add you to github/hackage maintainers so you can release too, if you wish.

Would you consider a couple of suggestions in the meantime?

sergv · 2024-04-16T19:27:32Z

src/Data/RadixNTree/Word8/Lazy.hs

+
+
+{-# INLINE splitL0 #-}
+splitL0 :: Openness -> Feed -> RadixTree a -> (RadixTree a, RadixTree a)


For strict radix trees the signature is

splitL0 :: Openness -> Feed -> RadixTree a -> Split a a

Would it make sense to copy what containers do here and make strict version of splitL0 also return lazy tuple so that users can switch with no cost? Strict version will be a wrapper around working returning strict tuple and just repackage it into a lazy tuple.

The introduction of Split and SplitLookup was a very late change from my side, the thing that irked me was that the strict splits went through the trouble of creating two completely strict values and then at the very end threw in two boxes in the form of a lazy 2-tuple. I as such prefer a strict tuple because I expect all results in the strict module to be strict.

copy what containers do ... so that users can switch with no cost

Lazy and Strict in this library do not mean the same thing as they do in containers. containers' definition is "everything in Lazy does not evaluate the value before putting into the tree, everything in Strict does", but their underlying implementation is spine-strict in both cases.

The difference between a split on a spine-lazy tree and a split on a spine-strict one is massive, so I don't see the need to have the two be directly interchangeable.

sergv · 2024-04-16T19:29:54Z

src/Data/Zebra/Word/Internal.hs

+           | White
+             deriving (Show, Eq)
+
+invert :: Color -> (# Color #)


It seems that returning singleton strict tuples in cases like this could be avoided at no cost? Do these singleton tuples serve some purpose, in particular for the invert function?

There's a nice explanation for what unboxed 1-tuples do on StackOverflow. For this particular case the the idea is that allocations for Black and White are shared, so there never is a need to evaluate the results to WHNF.

Had a little back-and-forth about it here, final STG seems to align with my expectations.

src/Data/RadixNTree/Word8/Lazy.hs

sergv · 2024-04-16T19:32:47Z

NB I'm not particularly worried about supporting GHCs before 9.2, if it works with them then great, if not then let them be.

sergv · 2024-04-16T19:38:19Z

src/Data/RadixNTree/Word8/Strict.hs

+
+
+-- | Result of a tree split.
+data Split l r = Split !(RadixTree l) !(RadixTree r)


It seems this is just a strict pair from the strict package. There's also strict Maybe in there so it would make senso switch.

This echoes another suggestion about unifying function types between lazy and strict versions of the API. Even if no unification is to take place, returning strict tuple from strict package seems better thar returning custom type isomorphic to it - the strict tuple has helpers to work with it whereas types like Split don't.

Those helper functions will be pretty useless when it turns out your code looks like

let !(Radix.Split l r) = Radix.split _ _ a in _

At the end of the day this is just the return type for a bunch of functions in this library and if I want to unpack or unbang something later on, I want to have that power. Also means we don't have to depend on three extra libraries.

There are broader points here of Haskell suffering from dependency hells and people shunning custom types, but this isn't a good time to rant about it.

sergv · 2024-04-16T22:04:22Z

I've added you to github collaborators for thi srepo. If you want to become Hackage maintainer so you can release then please provide your hackage username.

BurningWitness · 2024-04-20T08:38:31Z

The maintaner name is OleksiiDivak.

sergv · 2024-04-20T11:40:44Z

I've added you.

jcranch · 2024-04-21T21:38:42Z

Looks great! Are you willing to add it to Stackage? It is preferred that the package maintainer do so, though I am willing to do so if that would be a problem for you.

BurningWitness · 2024-04-22T13:24:21Z

@jcranch I personally don't care about Stackage, but I'm fine with anyone taking that responsibility, especially if they're already maintaining some other package there. I don't expect this package to break much (few dependencies, very wide bounds), so it shouldn't be a nuisance.

This is not my package, but I have explicit permission from a maintainer: sergv/radix-tree#1 (comment)

BurningWitness added 6 commits April 14, 2024 20:26

Initial

0dd44b9

Purging unused integer show functions

ee1650a

and avoiding `showHex` so as to not have to support an eldritch Show constraint in previous versions

Using GHCup instead of hvr's old database.

e13a1fe

Note that versions before GHC 9.0 cannot be tested in CI due to the template-haskell > 2.17 lower bound.

Fixing CI (1/?)

1102415

Fixing CI (2/?)

749024b

Removing unused Numeric import

f865935

unhammer mentioned this pull request Apr 15, 2024

The fastest libraries are missing: vector-hashtables, clutter, finite haskell-perf/dictionaries#15

Open

Documentation tweaks

73e264d

sergv reviewed Apr 16, 2024

View reviewed changes

src/Data/RadixNTree/Word8/Lazy.hs Outdated Show resolved Hide resolved

sergv reviewed Apr 16, 2024

View reviewed changes

Fix typo

524aa19

sergv merged commit 7470d34 into sergv:master Apr 16, 2024

BurningWitness deleted the overhaul branch April 22, 2024 13:31

jcranch added a commit to jcranch/stackage that referenced this pull request Apr 22, 2024

Add radix-tree

58cda2c

This is not my package, but I have explicit permission from a maintainer: sergv/radix-tree#1 (comment)

jcranch mentioned this pull request Apr 22, 2024

Add radix-tree commercialhaskell/stackage#7390

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Overhaul #1

Overhaul #1

BurningWitness commented Apr 14, 2024 •

edited

sergv commented Apr 16, 2024

sergv Apr 16, 2024

BurningWitness Apr 16, 2024

sergv Apr 16, 2024

BurningWitness Apr 16, 2024

sergv commented Apr 16, 2024

sergv Apr 16, 2024

BurningWitness Apr 16, 2024

sergv commented Apr 16, 2024

BurningWitness commented Apr 20, 2024

sergv commented Apr 20, 2024

jcranch commented Apr 21, 2024

BurningWitness commented Apr 22, 2024 •

edited



		{-# INLINE splitL0 #-}
		splitL0 :: Openness -> Feed -> RadixTree a -> (RadixTree a, RadixTree a)



		-- \| Result of a tree split.
		data Split l r = Split !(RadixTree l) !(RadixTree r)

Overhaul #1

Overhaul #1

Conversation

BurningWitness commented Apr 14, 2024 • edited

sergv commented Apr 16, 2024

sergv Apr 16, 2024

Choose a reason for hiding this comment

BurningWitness Apr 16, 2024

Choose a reason for hiding this comment

sergv Apr 16, 2024

Choose a reason for hiding this comment

BurningWitness Apr 16, 2024

Choose a reason for hiding this comment

sergv commented Apr 16, 2024

sergv Apr 16, 2024

Choose a reason for hiding this comment

BurningWitness Apr 16, 2024

Choose a reason for hiding this comment

sergv commented Apr 16, 2024

BurningWitness commented Apr 20, 2024

sergv commented Apr 20, 2024

jcranch commented Apr 21, 2024

BurningWitness commented Apr 22, 2024 • edited

BurningWitness commented Apr 14, 2024 •

edited

BurningWitness commented Apr 22, 2024 •

edited