streams & change notifications #24

Closed
dch opened this issue Nov 27, 2020 · 2 comments

dch commented Nov 27, 2020

Have you any thoughts on adding a lazy stream interface, or a way to allow processes to subscribe to changed entries?

lucaong (Owner) commented Dec 10, 2020

Hi @dch,
the select/3 function uses lazy streams internally, so if you do, for example, something like:

# Integer.is_even/1 is a macro, so Integer must be required first
require Integer

CubDB.select(db, pipe: [
  map: fn {_key, value} -> value end,
  filter: fn x -> Integer.is_even(x) end,
  map: fn x -> x * 2 end,
  take: 3
])

The transformations above are executed as a lazy stream. In particular, because of the final take: 3, reading stops as soon as three results have been produced, so only as many entries as are needed to yield those three results are read from disk and processed.
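To illustrate the laziness described above without involving CubDB at all, here is a minimal sketch using only the standard Stream and Enum modules. The `entries` stream and the `counter` Agent are hypothetical stand-ins for entries read from disk; they are not part of CubDB.

```elixir
require Integer

# Hypothetical stand-in for disk reads: counts how many entries are pulled
{:ok, counter} = Agent.start_link(fn -> 0 end)

entries =
  Stream.map(1..1_000, fn i ->
    Agent.update(counter, &(&1 + 1))
    {i, i}
  end)

# Same transformations as in the pipe above, expressed with Stream
result =
  entries
  |> Stream.map(fn {_key, value} -> value end)
  |> Stream.filter(fn x -> Integer.is_even(x) end)
  |> Stream.map(fn x -> x * 2 end)
  |> Enum.take(3)

# Number of entries actually pulled from the source
reads = Agent.get(counter, & &1)
```

Here `result` is `[4, 8, 12]`, and `reads` stays far below 1,000, because the pipeline halts as soon as the third result is produced.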

It would be nice to have an interface such as:

# Note: this is NOT the way CubDB actually works, just illustrating a point
CubDB.select() |> Stream.map(fn x -> ... end) |> Stream.filter(...) |> ...

But, unfortunately, it would be tricky and error-prone. The issue is the following: remember that CubDB allows you to perform concurrent reads and writes, so you can be executing a long-running select while some other process is performing writes. This is possible because the select runs against a zero-cost immutable snapshot: it basically "sees" the database frozen in the state it was in when the select started. Eventually, when a compaction operation runs, it will clean up and remove the old un-compacted database file, but it can only do so when no read operation is "seeing" the old file anymore, or it would remove the file from under the reader's feet. For this reason, CubDB has to internally keep track of all readers, and of which point in the database history each of them is seeing.

The way the select/3 API is designed allows CubDB to perform this internal bookkeeping without user intervention. If the API were something like the fake example above instead, one would have to manually "check in" before reading and "check out" after finishing processing the stream. If a user forgot to "check out", compaction operations would be blocked indefinitely (or, alternatively, if compaction were allowed to run anyway, it could break slow readers).
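The bookkeeping argument can be sketched roughly as follows. This is NOT how CubDB is implemented: `ReaderTracker`, `with_snapshot/3` and `can_clean_up?/2` are hypothetical names invented for illustration. The point is only that when reading happens inside a function call, "check in" and "check out" can be paired automatically, even if the reader raises.

```elixir
defmodule ReaderTracker do
  # Hypothetical illustration only, not CubDB's actual internals.
  # State: a map of reader reference => snapshot the reader is seeing.

  def start_link(_opts \\ []) do
    Agent.start_link(fn -> %{} end)
  end

  # Runs `fun` while registered as a reader of `snapshot_id`.
  # The `after` clause guarantees the "check out" always happens.
  def with_snapshot(tracker, snapshot_id, fun) do
    ref = make_ref()
    Agent.update(tracker, &Map.put(&1, ref, snapshot_id))

    try do
      fun.()
    after
      Agent.update(tracker, &Map.delete(&1, ref))
    end
  end

  # An old snapshot can be removed only when no reader still sees it.
  def can_clean_up?(tracker, snapshot_id) do
    Agent.get(tracker, fn readers ->
      snapshot_id not in Map.values(readers)
    end)
  end
end
```

With a stream handed back to the caller, there is no equivalent of the `after` clause: the library cannot know when the caller is done consuming it, which is why the "check out" would have to be manual.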

In conclusion, performance-wise select/3 is already using lazy streams under the hood, so it will minimize disk operations. It would be nice to have a stream-based API, but that would cause the problems described above.

Regarding allowing processes to subscribe to changes, I am curious, what would be your idea? It sounds like an interesting option, maybe better implemented as a library on top of CubDB.

lucaong (Owner) commented Sep 9, 2021

Closing this for now as there is no response. Feel free to comment on it and I will reopen if necessary.

lucaong closed this as completed Sep 9, 2021