Skip to content
robsimmons edited this page Oct 19, 2011 · 5 revisions

Karl Crary has worked on a NJ-lib replacement library that I (Rob Simmons) plan to hack on for hackday to where it can be released; I think some other CMU people (including Karl) plan to do the same.

What's there now

Typeclasses (as signatures, modular-typeclasses style):

  • HASHABLE (see: HASH_KEY in njlib)
  • ORDERED (see: ORD_KEY njlib)
  • STREAMABLE

Algorithmic basics:

  • SplayTree and RedBlackTree - BST implementations
  • UnionFind (Union-find sets)
  • JenkinsHash and MJHash - incremental hash functions
  • Mergesort

Basic structures:

  • SET (sets: implementations as lists, splay trees, and red-black trees)
  • DICT (dictionaries or maps: same as lists)
  • Susp (memoization; uses the implementation's suspensions if possible)
  • Stream (memoizing streams)
  • Queue and IQueue (functional and imperative queues)
  • PSusp and PStream (pollable streams)
  • Coroutine

Implementation-compatibility layer

  • Weak - weak pointers
  • Cont - continuations

Niceties

  • Pos - position-in-file library
  • Symbol - internalized strings (or whatever implements HASHABLE) with constant-time comparison

What should be added

  • PRINTABLE - for toString-enabled types? Is this needed for anything?
  • More MLton-specific implementations (possibly also)
  • Documentation, documentation, documentation - the problem with NJlib is documentation.

CMLib is essentially an NJLib replacement; there's also the possibility of doing more basis library revision/replacemet. The MLton replacement basis is probably the best source for the latter, whereas tsuyoshi's extended basis might be the best inspiration/code source for the former (https://github.com/standardml/extended-basis).

210-lib compatibility

Another library project at CMU has been developed by the 15-150 and 15-210 course staffs. Karl's lib is called "CMlib", but I'll call it "Karl-lib" in this part of document in order to distinguish it from the merger I'd like to consider creating; I'll refer to the collected course-staff-generated code as "210-lib."

The argument for a merger is, in my mind, primarily that it creates a broader userbase for the combined CMlib - the goal should be to make 15-210 students able to use CMlib as the basis for their projects post-210, and make it very easy for the documentation from 15-210 to help people use CMlib, and also make it easy for 15-210 staff to "backport" things from CMlib to 210-lib. A more ambitious goal would be to have CMlib able to become 210-lib, but after looking at the code I'm not convinced that will work out.

The path I'm suggesting leaves both libraries relatively unchanged; I'm essentially suggesting adding a "210-defaults.sml" file to Karl-lib that will open up a default environment that is mostly or completely inter-operable with the environment people get when they're using 210-lib with "defaults.sml".

Suggested changes to Karl-lib

  • QUASILIST and Quasilist should be able to be completely subsumed by 210Lib's SEQUENCE, which should be integrated (along with an implementation of Sequence that is along the lines of Karl-lib's Quasilist), and QUASILIST should be removed.
  • Lists are currently the default sequence type - I think it is desirable to generalize this to use sequences, and change things like the dictionaries accordingly
  • Add 210-lib's Treaps
  • Add the typeclass-signature EQKEY (possibly rename just EQ, since there's nothing particularly key-like about it)?

Suggested changes to 210-lib

(Note - a suggested change in 210-lib doesn't actually have to be made to 210-lib; it would just represent an incompatibility between CMlib and 210-lib.)

  • Both libraries use a "typeclass-like" approach, but it would be confusing and inconsistent to have the signature HASHKEY in 210-lib exist as a union of HASHABLE and ORDERED. Call this ORDERED_HASH, perhaps? Also see note about EQ above.

DICT and TABLE

Karl-lib's DICT and Dict are broadly similar but incompatible relative to 210-lib's TABLE and BSTTable; both accomplish the same thing. Both DICT and TABLE are an improvement on MAP, which is increasingly used as a verb.

I started out wanting to merge these libraries, but I quickly despaired; the design philosophies are too subtly different. Instead, I want to suggest the following line of attack.

  • Change the Karl-lib thing I called TABLE to ITABLE (it's a persistent table, like for a symbol table in a compiler)
  • Generalize DICT to sequences, not lists
  • Look at what functionality can be transported usefully from TABLE to DICT
  • Make a functor TableFn(Dict : DICT) that provides a 210-lib 'view' to dictionaries. There's no reason not to have a functor MapFn(Dict : DICT) that provides a NJ-lib style interface to DICTs as well as well.