View is never computed #256

Closed
FlixCoder opened this issue May 29, 2022 · 7 comments
Labels
bug Something isn't working collections Issues impacting collections or views storage Issues impacting the storage layer

Comments

@FlixCoder

Hi! I found another thing, this time a bug:

I am starting an example with an empty database and regularly poll a view on my collection. Initially it is empty of course, so no need to compute the view.

However, when I add just a single entry, the view is updated only some of the time; in the other cases the map function is never called. This is not a problem when adding many entries, because the view updates often enough then. But after adding just a single entry, the view can stay empty forever even though it should contain an entry (again, map is never called, despite an entry being added to the collection).

I hope this was understandable, please tell me whether you need any more information.

Thank you again! :)

@ecton
Member

ecton commented May 29, 2022

Views are lazy by default, which means until a query is executed, the map function will not be called. If a view is marked as Unique, this no longer applies and the view is updated during the transaction.

In changes that haven't been released in main, views can be marked as "eager" to force them to be updated always without requiring the unique constraint.

Does this match your experience? If not, I'll want to see some code to understand how your usage differs from what I've tested so far.
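
To make the lazy/eager distinction above concrete, here is a minimal toy model in Rust. It is a sketch under stated assumptions, not BonsaiDb's API: `View`, `insert`, `query`, and the length-based `map` function are all hypothetical names chosen for illustration.

```rust
// Toy model only: `View`, `insert`, and `query` are hypothetical names,
// not BonsaiDb's API.
struct View {
    eager: bool,            // if true, index during every insert
    documents: Vec<String>, // collection contents, in insertion order
    indexed: usize,         // how many documents have been mapped so far
    entries: Vec<usize>,    // view entries produced by `map`
    map_calls: usize,       // counts invocations of `map`
}

impl View {
    fn new(eager: bool) -> Self {
        View { eager, documents: Vec::new(), indexed: 0, entries: Vec::new(), map_calls: 0 }
    }

    // Stand-in map function: emits the document's length in bytes.
    fn map(doc: &str) -> usize {
        doc.len()
    }

    // Map every document that has not been indexed yet.
    fn update(&mut self) {
        while self.indexed < self.documents.len() {
            let entry = Self::map(&self.documents[self.indexed]);
            self.entries.push(entry);
            self.map_calls += 1;
            self.indexed += 1;
        }
    }

    // A lazy view only stores the document; an eager view indexes it right away.
    fn insert(&mut self, doc: &str) {
        self.documents.push(doc.to_string());
        if self.eager {
            self.update();
        }
    }

    // Querying always brings the view up to date first.
    fn query(&mut self) -> Vec<usize> {
        self.update();
        self.entries.clone()
    }
}

fn main() {
    let mut lazy = View::new(false);
    lazy.insert("hello");
    assert_eq!(lazy.map_calls, 0); // nothing mapped until a query runs
    assert_eq!(lazy.query(), vec![5]);
    assert_eq!(lazy.map_calls, 1);

    let mut eager = View::new(true);
    eager.insert("hi");
    assert_eq!(eager.map_calls, 1); // mapped during the insert itself
}
```

In this model, a lazy view that has been queried after an insert must always show the new entry, which is why an empty result after a query points to a bug rather than to laziness.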

@FlixCoder
Author

I have tried making a minimalized example, but I failed to replicate what is happening here: https://github.com/FlixCoder/bonsaimq/blob/main/examples/simple.rs

I am very sure that the map function is indeed never called, even when I query the view. Again, it is stochastic, so for the same code, sometimes it updates the view, sometimes it does not.

I will try again to find a smaller example than my whole project, but it might take some time until I find the time.

@FlixCoder
Author

But to save you at least a little time searching through my code:

  • I have a runner that regularly polls a view for documents (I tested even without key, so it is really empty).
  • From another async task I push a new document to the database (via a cloned database handle).
  • So now it should show the new document in the view, but it sometimes appears, sometimes it does not.
  • When pushing 100 new messages, it never happens that the view is not updated, because there are enough changes for one of them to be recognized despite the laziness.

@ecton
Member

ecton commented May 29, 2022

From looking through your code, it seems pretty straightforward, and I can't see how your code would be at fault. It seems like there's an edge case in detecting new changes.

I hesitate to ask you to narrow it down too much now because the entire view indexing system has been rewritten in #250 (not yet merged to main). I don't remember discovering any edge cases like this when doing the rewrite, but the new system reduces the amount of state needed to keep track of what has been and hasn't been indexed, and it improves MVCC guarantees.

Due to my refactor in #250 not solving the performance issues I was seeing, I'm not sure how quickly v0.5.0 will be released -- optimistically it might be the end of June.

I'm currently focusing on how to get BonsaiDb's performance back to where I want it to be, but I'll try to switch in the next few days to see about a fix for v0.4.

@FlixCoder
Author

Alright, thank you!

@ecton ecton added bug Something isn't working collections Issues impacting collections or views storage Issues impacting the storage layer labels May 29, 2022
@ecton
Member

ecton commented May 29, 2022

My mind couldn't let this puzzle go. I tracked it down to a bug I had already fixed but haven't released in Nebari: khonsulabs/nebari@32691e5#diff-00154e62b20bc4b1c2638af3eb876665458523253415bf762974217d56e52c87

There was an edge case where one thread/task causes a mapping job to happen while a transaction is occurring on another thread/task. Nebari's implementation of current_transaction_id was just flat out wrong -- dating back to the early days of the library. Instead of behaving as documented, it would return the most recently allocated transaction id. The view indexer would note the value of current_transaction_id, then look for invalidated documents and find none, because the in-progress transaction had not completed yet.

The next time the query happens, even though the transaction is now complete, the internal state of BonsaiDb thinks that the view is already on the current transaction, so it never calls the mapping function. Once the next transaction completes, the mapper is allowed to run again, which correctly indexes the documents that hadn't been indexed yet as well as the most recently changed ones.
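
The race described above can be sketched with a small, single-threaded toy model in Rust. Everything here is hypothetical (`Log`, `Indexer`, `simulate`, and the `buggy` flag are illustration names, not Nebari's or BonsaiDb's real code), and it assumes the documented behavior is to return the most recently committed transaction id.

```rust
// Toy model only: every type and function here is hypothetical,
// not Nebari's or BonsaiDb's real code.

#[derive(Default)]
struct Log {
    last_allocated: u64,      // id handed out by the latest `begin`
    last_committed: u64,      // id of the latest committed transaction
    invalidated: Vec<String>, // committed documents not yet indexed
}

impl Log {
    // Start a transaction: allocates an id, nothing is visible yet.
    fn begin(&mut self) -> u64 {
        self.last_allocated += 1;
        self.last_allocated
    }

    // Commit a transaction: its document now needs indexing.
    fn commit(&mut self, id: u64, doc: &str) {
        self.invalidated.push(doc.to_string());
        self.last_committed = self.last_committed.max(id);
    }

    // Buggy variant: most recently allocated id.
    // Assumed correct variant: most recently committed id.
    fn current_transaction_id(&self, buggy: bool) -> u64 {
        if buggy { self.last_allocated } else { self.last_committed }
    }
}

#[derive(Default)]
struct Indexer {
    indexed_through: u64, // transaction id the view believes it is current with
    mapped: Vec<String>,  // documents the map function has seen
}

impl Indexer {
    fn update(&mut self, log: &mut Log, buggy: bool) {
        let current = log.current_transaction_id(buggy);
        if current <= self.indexed_through {
            return; // the view believes it is already up to date
        }
        self.mapped.append(&mut log.invalidated); // "call map" on pending documents
        self.indexed_through = current;
    }
}

// Returns (documents mapped after the first commit, after the second commit).
fn simulate(buggy: bool) -> (usize, usize) {
    let mut log = Log::default();
    let mut indexer = Indexer::default();

    let first = log.begin();         // writer task opens a transaction
    indexer.update(&mut log, buggy); // poller queries mid-transaction
    log.commit(first, "first");      // writer commits
    indexer.update(&mut log, buggy); // poller queries again
    let after_first = indexer.mapped.len();

    let second = log.begin();
    log.commit(second, "second");
    indexer.update(&mut log, buggy);
    (after_first, indexer.mapped.len())
}

fn main() {
    assert_eq!(simulate(true), (0, 2));  // first document invisible until the next commit
    assert_eq!(simulate(false), (1, 2)); // first document indexed on the next query
}
```

In the buggy run, the mid-transaction query records the allocated id as already indexed, so the query after the commit is skipped; only the second transaction pushes the id forward and lets both documents be mapped.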

I just released Nebari v0.5.4. Hopefully after you cargo update you will not be able to reproduce this issue anymore.

@FlixCoder
Author

It is fixed indeed, thank you! It was the transaction that was the difference :D
So everything seems to work now :)
