tidy data operations with lists as keys #13

Open
ocramz opened this issue May 4, 2021 · 1 comment
Labels
help wanted Extra attention is needed R&D : library UX

Comments

@ocramz
Owner

ocramz commented May 4, 2021

Two internal details of the library must be reconciled, and the UX around them figured out: how can list-valued indexing keys be manipulated with little boilerplate?

  • On one hand, the generic encoding produces Row values which are keyed by lists (since the original values are flattened into a single trie, collecting record names depth-first)

encode :: (Foldable t, Heidi a) => t a -> Frame (Row [TC] VP)
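To make the "flattened into a single trie" point concrete, here is a minimal sketch of how a depth-first traversal of a nested record yields list-valued keys. It is not heidi's encoder: `Nested`, `flatten`, and `example` are hypothetical names, plain `String` field names stand in for `TC`, and a `Data.Map` stands in for the trie.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for a nested record value; not a heidi type.
data Nested
  = Leaf Int
  | Record [(String, Nested)]

-- Depth-first traversal: each leaf is keyed by the path of field names
-- leading to it, which is why the resulting keys are lists.
flatten :: Nested -> M.Map [String] Int
flatten = go []
  where
    go path (Leaf x)        = M.singleton (reverse path) x
    go path (Record fields) = M.unions [go (name : path) v | (name, v) <- fields]

-- A record with one flat field and one nested record.
example :: M.Map [String] Int
example =
  flatten (Record [("a", Leaf 1), ("b", Record [("c", Leaf 2), ("d", Leaf 3)])])
```

Here the nested fields `c` and `d` end up under the two-element keys `["b","c"]` and `["b","d"]`, while `a` gets the one-element key `["a"]`.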

  • On the other, the relational operations are completely polymorphic in the key type (as long as it's TrieKey from generic-trie, i.e. either a primitive type or a list of such etc.)

https://hackage.haskell.org/package/heidi-0.0.0/docs/Heidi-Data-Frame-Algorithms-GenericTrie.html

innerJoin :: (Foldable t, Ord v, TrieKey k, Eq v, Eq k) => k -> k -> t (Row k v) -> t (Row k v) -> Frame (Row k v)
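As an illustration of what "polymorphic in the key type" means in practice, the following sketch implements a join of the same shape over plain `Data.Map` rows. It is an assumption-laden stand-in, not the library's implementation: `Row` here is a type synonym for `Map`, `innerJoinSketch` is a hypothetical name, the constraint is `Ord k` rather than `TrieKey k`, and the nested loop is chosen for brevity rather than speed.

```haskell
import qualified Data.Map.Strict as M

-- Stand-in row type; heidi's Row is trie-backed, this one is a plain Map.
type Row k v = M.Map k v

-- Keep every pair of rows whose value at k1 (left) equals the value at k2 (right),
-- and merge the two rows. Rows lacking the join key are dropped.
innerJoinSketch :: (Ord k, Eq v) => k -> k -> [Row k v] -> [Row k v] -> [Row k v]
innerJoinSketch k1 k2 left right =
  [ M.union l r
  | l <- left
  , Just v1 <- [M.lookup k1 l]
  , r <- right
  , Just v2 <- [M.lookup k2 r]
  , v1 == v2
  ]

-- The same function works with list-valued keys, as produced by a generic encoding.
people, cities :: [Row [String] String]
people =
  [ M.fromList [(["id"], "1"), (["name"], "ann")]
  , M.fromList [(["id"], "2"), (["name"], "bob")]
  ]
cities = [M.fromList [(["uid"], "1"), (["city"], "rome")]]

joined :: [Row [String] String]
joined = innerJoinSketch ["id"] ["uid"] people cities
```

Nothing in `innerJoinSketch` mentions lists; the list-valued keys only appear at the call site, which is where the boilerplate question arises.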
@ocramz ocramz added help wanted Extra attention is needed R&D : library UX labels May 4, 2021
@adamConnerSax

I think it would be useful to have some combinators to build the bits to use gather, spread, join, etc. from simpler things, especially in the Row [TC] VP case. It should be easy, given a known column, to construct the [TC] key required for the various operations as well as the Set [TC] required for gather (and a more general join? I find multi-key-column join to be pretty useful...).

For example:

gatherSet :: (Functor f, Foldable f) => [Heidi.TC] -> f Text -> Set [Heidi.TC]
gatherSet prefixTC = Set.fromList . Foldable.toList . fmap (\t -> reverse $ Heidi.mkTyN (toString t) : prefixTC)

but this assumes that every TC represents a type name only. Maybe there should be a TC -> Text function for use in the various places where this comes up? Also--sorry, OT--why does TC use String rather than Text?

gatherWith requires k -> v. I think there ought to be some reasonable default implementation to cover various cases. I wrote:

tcKeyToTextValue :: [Heidi.TC] -> Heidi.VP
tcKeyToTextValue tcs = Heidi.VPText $ Text.intercalate "_" $ fmap tcAsText tcs where
  tcAsText tc = let n = Heidi.tcTyN tc in toText $ if null n then Heidi.tcTyCon tc else n

But that raises all the same questions about TC.
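To show where the `k -> v` argument comes into play, here is a rough sketch of a gather-style operation on a single row, again over plain `Data.Map` rows. The argument order is only loosely modeled on the description above and is an assumption; `gatherSketch`, `wide`, and `long` are hypothetical names, and `String` path segments stand in for `TC`.

```haskell
import           Data.List       (intercalate)
import qualified Data.Map.Strict as M
import qualified Data.Set        as S

-- Stand-in row type; not heidi's trie-backed Row.
type Row k v = M.Map k v

-- For each gathered column present in the row, emit one output row that keeps the
-- untouched columns and adds two new ones: the old column key (turned into a value
-- by toValue) under keyCol, and the old cell value under valCol.
gatherSketch :: Ord k => (k -> v) -> S.Set k -> k -> k -> Row k v -> [Row k v]
gatherSketch toValue ks keyCol valCol row =
  [ M.insert keyCol (toValue k) (M.insert valCol v rest)
  | k <- S.toList ks
  , Just v <- [M.lookup k row]
  ]
  where
    rest = M.withoutKeys row ks

wide :: Row [String] String
wide = M.fromList [(["id"], "7"), (["y", "2020"], "a"), (["y", "2021"], "b")]

-- Underscore-joined path segments play the role of the default key-to-value function.
long :: [Row [String] String]
long =
  gatherSketch (intercalate "_") (S.fromList [["y", "2020"], ["y", "2021"]]) ["year"] ["val"] wide
```

The key-to-value function is needed because the gathered column keys become cell contents, so a sensible default for the `[TC]` case would remove one argument from the typical call.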

I confess I haven't tried these things yet, so some may be more obvious/simpler than they seem. But that's part of the point, maybe. This would all be more approachable for beginners if the powerful stuff were easy to use in the typical cases. Using [TC] as the usual key is confusing (maybe hide it behind a newtype?), and so is bridging the highly polymorphic functions with the usual case.
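One possible shape for the newtype idea, purely as a sketch: wrap the list key and offer a few small combinators so that users name columns by field rather than assembling the list by hand. Everything here is hypothetical (`ColKey`, `col`, `</>`, `colsUnder`), and `TC` is a simplified stand-in that keeps only a name, not heidi's actual type.

```haskell
import qualified Data.Set as S

-- Simplified stand-in for heidi's TC; the real type carries more information.
newtype TC = TC String
  deriving (Eq, Ord, Show)

-- Hides the list-valued key behind a single name.
newtype ColKey = ColKey { unColKey :: [TC] }
  deriving (Eq, Ord, Show)

-- Key for a top-level column, named by its field.
col :: String -> ColKey
col name = ColKey [TC name]

-- Extend a key with a nested field, e.g. col "address" </> "city".
infixl 5 </>
(</>) :: ColKey -> String -> ColKey
ColKey path </> name = ColKey (path ++ [TC name])

-- Key set for gather-style operations: one key per column name under a shared prefix.
colsUnder :: Foldable f => ColKey -> f String -> S.Set ColKey
colsUnder prefix = foldMap (\name -> S.singleton (prefix </> name))
```

With something like this, `colsUnder (col "y") ["2020", "2021"]` would build the key set for a gather, and `unColKey` recovers the raw list when calling the fully polymorphic functions.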

I do like that they are so polymorphic! I can imagine, once comfortable with the library, using a different set of keys and values for ease of interoperation with, e.g., Frames or hvega or whatever, but still wanting the set and tidy operations available.
