Collections API #1606
It looks like a lot of code is shared between the collection readers and rdd readers in S3, File, Hadoop, and Cassandra. HBase and Accumulo are exceptions because they have
Reading a collection seems exactly like reading a partition. Maybe we can abstract all of that code into something that just reads the ranges, and leave the splitting/filtering of those ranges up to whichever reader is calling it. I'm mostly concerned because this async code is very fiddly and error-prone. I can see us having to refactor it again when we learn something new.
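To make the suggestion concrete, here is a rough sketch of what a shared range reader could look like. Every name in it (`RangeReading`, `readRanges`, `fetch`, `keep`) is made up for illustration and is not an existing API in this codebase. It also assumes a backend can be modelled as a function from an index to the records stored there.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical names throughout; none of these exist in the codebase.
object RangeReading {
  type IndexRange = (Long, Long) // inclusive start and end of an index range

  // Shared part: walk every index in each range, fetch its records, and
  // apply the caller's key filter. How `ranges` was split (one bin per
  // partition for an RDD, a single bin for a collection) is decided by
  // the caller, not here.
  def readRanges[K, V](
    ranges: Seq[IndexRange],
    fetch: Long => Vector[(K, V)], // assumed backend lookup for one index
    keep: K => Boolean             // reader-specific filtering
  )(implicit ec: ExecutionContext): Future[Vector[(K, V)]] =
    Future
      .traverse(ranges.toVector) { case (start, end) =>
        Future((start to end).toVector.flatMap(fetch).filter { case (k, _) => keep(k) })
      }
      .map(_.flatten)
}

object RangeReadingDemo {
  // Small in-memory stand-in for a backend, used only for this demo.
  def demo(): Vector[(Int, String)] = {
    import ExecutionContext.Implicits.global
    val store: Map[Long, Vector[(Int, String)]] =
      Map(1L -> Vector(1 -> "a"), 2L -> Vector(2 -> "b"), 5L -> Vector(5 -> "e"))
    val result = RangeReading.readRanges[Int, String](
      Seq((1L, 2L), (5L, 5L)),
      i => store.getOrElse(i, Vector.empty),
      _ != 2 // drop key 2 to show the filter being applied
    )
    Await.result(result, 10.seconds)
  }
}
```

With something like this, a collection reader would call `readRanges` once with all of its ranges, and an RDD reader would call it once per partition with that partition's bin, so the async part lives in one place.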
This is the list of places where it looks like we should use
They can not be used in:
.. because one or another of those backends reads ranges rather than (K, V). I think that's fine, we don't need to abstract over that.