This repository has been archived by the owner on Oct 16, 2018. It is now read-only.

PostingsList Refactoring #65

Open
prateek opened this issue May 13, 2018 · 0 comments

prateek commented May 13, 2018

We've got quite a few data points indicating we should re-work our postings list implementation/API:

  • Support for postings.ID to be uint64 (Revisit segment.DocID as a uint64 #12)
  • Benchmark numbers showing upstream Roaring is worse than Pilosa
  • We're using Pilosa's serialisation mechanism for the FST segment, and as a result we have to double serialise/deserialise when accessing data, i.e. on write the data goes from our roaring implementation -> Pilosa's -> []byte, and reads go through the same steps in reverse. We can avoid the extra hop by using Pilosa directly.

The points above clearly indicate we should use Pilosa instead of upstream roaring. Further, when re-working the APIs we should consider:

  • Our current postings list implementation wraps all access in a RWMutex. This made sense when all we had was a mutable segment, but it's not required when we've got immutable mmap'd data.
  • Impact of immutability semantics on the postings list itself: the postings list in mutable segments only needs Add() as a mutator.
  • Pooling of postings lists: upstream roaring's implementation does not lend itself to pooling. We should revisit this with Pilosa and with immutability in mind (i.e. if we can pool cheaply, an immutable API would be even more tempting).
  • Implications of Seek() ahead, which is required for efficient postings list merging at query time.
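
To make the API considerations above concrete, here is a minimal sketch of what a split read-only/mutable postings list could look like, with a Seek()-driven intersection. Every name in it (DocID, Iterator, ReadOnlyList, MutableList, sliceList, Intersect) is hypothetical and chosen for illustration; none of it is the existing API or Pilosa's. A sorted slice stands in for the bitmap so the example is self-contained.

```go
package main

import "sort"

// DocID is a hypothetical 64-bit document identifier.
type DocID uint64

// Iterator walks a postings list in ascending ID order.
type Iterator interface {
	Next() bool             // advance by one; false once exhausted
	Seek(target DocID) bool // advance to the first ID >= target; never moves backward
	Current() DocID         // valid only after Next or Seek returned true
}

// ReadOnlyList is a hypothetical immutable view. Because it is never
// mutated, readers need no RWMutex around it.
type ReadOnlyList interface {
	Len() int
	Iterator() Iterator
}

// MutableList adds the single mutator a mutable segment would need.
type MutableList interface {
	ReadOnlyList
	Add(id DocID)
}

// sliceList is a stand-in for a bitmap: a sorted slice of unique IDs.
// It is not safe for concurrent use; a mutable segment would still
// need its own synchronisation around Add.
type sliceList struct{ ids []DocID }

func (l *sliceList) Len() int           { return len(l.ids) }
func (l *sliceList) Iterator() Iterator { return &sliceIter{ids: l.ids, pos: -1} }

// Add inserts id at its sorted position, ignoring duplicates.
func (l *sliceList) Add(id DocID) {
	i := sort.Search(len(l.ids), func(i int) bool { return l.ids[i] >= id })
	if i < len(l.ids) && l.ids[i] == id {
		return
	}
	l.ids = append(l.ids, 0)
	copy(l.ids[i+1:], l.ids[i:])
	l.ids[i] = id
}

type sliceIter struct {
	ids []DocID
	pos int
}

func (it *sliceIter) Next() bool {
	if it.pos < len(it.ids) {
		it.pos++
	}
	return it.pos < len(it.ids)
}

// Seek binary-searches the remaining IDs rather than stepping one by one.
func (it *sliceIter) Seek(target DocID) bool {
	start := it.pos
	if start < 0 {
		start = 0
	}
	rest := it.ids[start:]
	it.pos = start + sort.Search(len(rest), func(i int) bool { return rest[i] >= target })
	return it.pos < len(it.ids)
}

func (it *sliceIter) Current() DocID { return it.ids[it.pos] }

// Intersect returns the IDs present in both lists. Whichever iterator
// is behind seeks ahead to the other's current ID, skipping the gap.
func Intersect(a, b ReadOnlyList) []DocID {
	var out []DocID
	x, y := a.Iterator(), b.Iterator()
	if !x.Next() || !y.Next() {
		return out
	}
	for {
		switch cx, cy := x.Current(), y.Current(); {
		case cx == cy:
			out = append(out, cx)
			if !x.Next() || !y.Next() {
				return out
			}
		case cx < cy:
			if !x.Seek(cy) {
				return out
			}
		default:
			if !y.Seek(cx) {
				return out
			}
		}
	}
}
```

In this sketch, query-time code only ever sees ReadOnlyList, so an mmap'd immutable segment could satisfy it without locking, while the mutable segment exposes Add() through MutableList. How cheaply Seek() can be implemented on top of a real bitmap container layout is exactly the open question raised in the last bullet.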