streams & change notifications #24

dch · 2020-11-27T12:54:02Z

Have you any thoughts on adding a lazy stream interface, or a way to allow processes to subscribe to changed entries?

lucaong · 2020-12-10T17:12:57Z

Hi @dch,
the select/3 function already uses lazy streams internally. For example, if you do something like:

# Integer.is_even/1 is a macro, so Integer must be required first
require Integer

CubDB.select(db, pipe: [
  map: fn {_key, value} -> value end,
  filter: fn x -> Integer.is_even(x) end,
  map: fn x -> x * 2 end,
  take: 3
])

The transformations above would be executed as a lazy stream. In particular, because of the final take: 3, entries are read from disk and processed only until 3 results have been produced, and then reading stops.
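To illustrate that laziness without a database, here is a minimal sketch using plain Elixir streams over an in-memory range. The range and the Agent counter are stand-ins for entries read from disk; nothing here is CubDB API. The counter records how many source entries are pulled before the take stops the pipeline.

```elixir
require Integer

# Counts how many "entries" are pulled from the source (stand-in for disk reads).
{:ok, reads} = Agent.start_link(fn -> 0 end)

# A large source; Stream.map is lazy, so nothing is pulled yet.
entries =
  Stream.map(1..1_000_000, fn i ->
    Agent.update(reads, fn n -> n + 1 end)
    {i, i}
  end)

# Same pipeline as the select/3 example, expressed with Stream functions.
result =
  entries
  |> Stream.map(fn {_key, value} -> value end)
  |> Stream.filter(fn x -> Integer.is_even(x) end)
  |> Stream.map(fn x -> x * 2 end)
  |> Stream.take(3)
  |> Enum.to_list()

IO.inspect(result)                 # [4, 8, 12]
IO.inspect(Agent.get(reads, & &1)) # 6
```

Only 6 of the million source entries are touched: just enough to find three even values. The take bounds the work by the number of results wanted, not necessarily to exactly three reads.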

It would be nice to have an interface such as:

# Note: this is NOT the way CubDB actually works, just illustrating a point
CubDB.select() |> Stream.map(fn x -> ... end) |> Stream.filter(...) |> ...

But, unfortunately, it would be tricky and error-prone. The issue is the following: CubDB allows concurrent reads and writes, so a long-running select can be executing while some other process is performing writes. This works because the select runs against a zero-cost immutable snapshot: it basically "sees" the database frozen in the state it was in when the select started. Eventually, when a compaction operation runs, it cleans up and removes the old, uncompacted database file, but it can only do so once no read operation is still "seeing" that file; otherwise it would remove the file from under the reader's feet. For this reason, CubDB has to internally keep track of all readers, and of which point in the database history each of them is seeing.

The way the select/3 API is designed allows CubDB to perform this internal bookkeeping without user intervention. If the API were instead something like the fake example above, one would have to manually "check in" before processing the stream and "check out" after finishing. If a user forgot to "check out", compaction operations would be blocked indefinitely (or, alternatively, if compaction were allowed to run anyway, it could break slow readers).
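The bookkeeping trade-off can be sketched in a few lines. The ReaderTracker module below is hypothetical and is not how CubDB is implemented internally; it only models the idea: readers are counted per database file, compaction may remove a file only when its count is zero, and a callback-style function (in the spirit of select/3) guarantees the check-out in an after clause, while a manual check_in/check_out pair leaks if the caller forgets the second half. The file name "old.cub" is also made up for illustration.

```elixir
defmodule ReaderTracker do
  # Hypothetical sketch, not CubDB internals: maps each database file to the
  # number of readers currently holding a snapshot of it.

  def start_link, do: Agent.start_link(fn -> %{} end)

  # Manual API: every check_in/2 must be paired with a check_out/2.
  def check_in(tracker, file) do
    Agent.update(tracker, fn readers -> Map.update(readers, file, 1, &(&1 + 1)) end)
  end

  def check_out(tracker, file) do
    Agent.update(tracker, fn readers ->
      case Map.fetch!(readers, file) do
        1 -> Map.delete(readers, file)
        n -> Map.put(readers, file, n - 1)
      end
    end)
  end

  # Callback API: the reader is checked out even if `fun` raises.
  def with_reader(tracker, file, fun) do
    check_in(tracker, file)

    try do
      fun.()
    after
      check_out(tracker, file)
    end
  end

  # Compaction may remove `file` only when no reader still sees it.
  def can_remove?(tracker, file) do
    not Map.has_key?(Agent.get(tracker, & &1), file)
  end
end

{:ok, tracker} = ReaderTracker.start_link()

# Callback style: bookkeeping is automatic.
ReaderTracker.with_reader(tracker, "old.cub", fn -> :read_snapshot end)
IO.inspect(ReaderTracker.can_remove?(tracker, "old.cub")) # true

# Manual style with a forgotten check_out: the file can never be removed.
ReaderTracker.check_in(tracker, "old.cub")
IO.inspect(ReaderTracker.can_remove?(tracker, "old.cub")) # false
```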

In conclusion, performance-wise select/3 is already using lazy streams under the hood, so it will minimize disk operations. It would be nice to have a stream-based API, but that would cause the problems described above.

Regarding allowing processes to subscribe to changes, I am curious: what did you have in mind? It sounds like an interesting option, maybe better implemented as a library on top of CubDB.
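As a minimal sketch of what such a library could look like, assuming a wrapper that broadcasts after each successful write, Elixir's built-in Registry with duplicate keys can serve as a per-key pub/sub mechanism. The ChangeFeed module and its functions are hypothetical and not part of CubDB; only writes that go through the wrapper would be announced.

```elixir
defmodule ChangeFeed do
  # Hypothetical sketch: subscribers register under a key in a Registry with
  # duplicate keys; the writer broadcasts {:changed, key, value} afterwards.

  @registry ChangeFeed.Registry

  def start_link do
    Registry.start_link(keys: :duplicate, name: @registry)
  end

  # Subscribe the calling process to changes of `key`.
  def subscribe(key) do
    {:ok, _owner} = Registry.register(@registry, key, nil)
    :ok
  end

  # Notify every process subscribed to `key`. In a wrapper around CubDB this
  # would run right after a successful CubDB.put(db, key, value).
  def broadcast(key, value) do
    Registry.dispatch(@registry, key, fn entries ->
      for {pid, _} <- entries, do: send(pid, {:changed, key, value})
    end)
  end
end

{:ok, _} = ChangeFeed.start_link()
:ok = ChangeFeed.subscribe(:temperature)

# In a real wrapper this would follow CubDB.put(db, :temperature, 21).
ChangeFeed.broadcast(:temperature, 21)

receive do
  {:changed, :temperature, value} -> IO.inspect(value) # 21
after
  1_000 -> IO.puts("no notification")
end
```

Registry drops a subscriber's entries automatically when its process exits, so dead subscribers do not accumulate.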

lucaong · 2021-09-09T14:44:52Z

Closing this for now as there is no response. Feel free to comment on it and I will reopen if necessary.

lucaong closed this as completed Sep 9, 2021
