Feature request: check if prefix exists without seeking #12396

Closed
zaidoon1 opened this issue Feb 28, 2024 · 5 comments

Comments

@zaidoon1
Contributor

zaidoon1 commented Feb 28, 2024

Currently, my service performs thousands of prefix matches per second to check whether a given prefix matches any key in a cf/db. My current approach is to seek to the prefix and return true if a matching key is found, false otherwise.

My setup is as follows:

Say my key format is: <uid>:<some dynamic value>
My prefix extractor is set up to extract the <uid>
My prefix-exists checks are of the form <uid>:<some prefix>, so I always have the uid and I benefit from the prefix extractor/prefix bloom. I still have to seek to check whether there exists a key that starts with this prefix.
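
For concreteness, the seek-based check described above can be sketched in a few lines of Python. This is a toy model over a sorted in-memory list, not RocksDB code; the function name `prefix_exists` and the sample keys are assumptions made for illustration.

```python
import bisect

def prefix_exists(sorted_keys, prefix):
    """Return True if any key in the sorted list starts with `prefix`."""
    # "Seek": locate the first key >= prefix in sort order.
    i = bisect.bisect_left(sorted_keys, prefix)
    # Keys sharing a prefix are contiguous in sorted order, so it is enough
    # to inspect the single key the seek lands on.
    return i < len(sorted_keys) and sorted_keys[i].startswith(prefix)

# Hypothetical keys in the <uid>:<some dynamic value> format.
keys = sorted([b"u1:apple", b"u1:banana", b"u2:cherry"])
print(prefix_exists(keys, b"u1:ban"))
print(prefix_exists(keys, b"u1:c"))
```

The check only needs a yes/no answer, yet the seek still has to position itself on the smallest matching key.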

the feature request:
Instead of seeking to the first key that matches the prefix, stop searching as soon as any key matching the prefix is found.
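
A rough way to see the difference, assuming a toy LSM modeled as a list of sorted key lists (newest first) with no deletions, no merge operands, and no bloom filters. The names `seek_first_match` and `any_match` are hypothetical, not RocksDB functions.

```python
import bisect

def first_at_or_after(component, target):
    """Smallest key >= target within one sorted component, or None."""
    i = bisect.bisect_left(component, target)
    return component[i] if i < len(component) else None

def seek_first_match(components, prefix):
    """Merged seek: the smallest key >= prefix across all components."""
    probed, best = 0, None
    for comp in components:
        probed += 1  # every component must be consulted to find the minimum
        cand = first_at_or_after(comp, prefix)
        if cand is not None and (best is None or cand < best):
            best = cand
    return best is not None and best.startswith(prefix), probed

def any_match(components, prefix):
    """Existence check: stop at the first component that holds a match."""
    probed = 0
    for comp in components:
        probed += 1
        cand = first_at_or_after(comp, prefix)
        if cand is not None and cand.startswith(prefix):
            return True, probed
    return False, probed

# Newest component first; keys are illustrative.
components = [[b"u1:m"], [b"u1:a", b"u2:x"], [b"u1:b"]]
print(seek_first_match(components, b"u1:"))  # consults all three components
print(any_match(components, b"u1:"))         # stops after the newest one
```

In this simplified model the saving comes entirely from not having to find the smallest match. When no key matches, both functions still probe every component.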

@zaidoon1
Contributor Author

zaidoon1 commented Mar 15, 2024

@ajkr I wanted to get your opinion on this. Does this sound doable? Will it make as big a difference as I expect? And in terms of complexity, is it a simple thing I can do with some directions/pointers from you, or does it require deeper knowledge of how RocksDB works?

@ajkr
Contributor

ajkr commented Mar 15, 2024

It's a good question. The core of the issue sounds familiar. Users want ways to prevent lookups from searching older LSM components when they already got what they need.

I will need to think about it more. Coming up with a good API for this may be challenging. Here are a couple use cases we heard about recently that felt like the same core issue.

I dealt with such a case yesterday in #12438. Actually, that user would have preferred an iterator interface, but currently the iterator does not support returning unmerged operands it finds (GetMergeOperands()'s feature), so there were too many missing pieces.

There was also a question last week: for a SQL query like SELECT * FROM t LIMIT 10, why does RocksDB need to look at all the LSM components? The typical implementation is to create an iterator, seek to the start of the table, and call Next() until the tenth entry. But if there were an iterator whose primary sort order was the LSM component (new to old) and whose secondary sort order was the key, we could probably get all 10 entries from just the first LSM component (the memtable). The assumption is that the SQL query doesn't care which 10 entries it gets, just that there are ten of them.
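
The two orderings can be contrasted with a small Python sketch. It assumes a toy model in which each LSM component is a sorted list of distinct keys and nothing has been deleted or overwritten; `limit_key_order` and `limit_component_order` are hypothetical names, not RocksDB APIs.

```python
import heapq
import itertools

def limit_key_order(components, n):
    """Key-ordered scan: merge every component, then take the first n keys."""
    touched = len(components)  # the merge has to open all components
    out = list(itertools.islice(heapq.merge(*components), n))
    return out, touched

def limit_component_order(components, n):
    """Component-ordered scan: drain the newest component first, then older ones."""
    out, touched = [], 0
    for comp in components:
        if len(out) >= n:
            break
        touched += 1
        out.extend(comp[: n - len(out)])
    return out, touched

# Illustrative data: a "memtable" with the even keys, one older file with the odd keys.
memtable = [f"k{i:02d}" for i in range(0, 40, 2)]
sst_file = [f"k{i:02d}" for i in range(1, 40, 2)]

by_key, touched_by_key = limit_key_order([memtable, sst_file], 10)
by_component, touched_by_component = limit_component_order([memtable, sst_file], 10)
print(touched_by_key, touched_by_component)
```

Both calls return ten keys, but the component-ordered scan is satisfied by the memtable alone, while the key-ordered scan interleaves keys from both components.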

@ajkr
Contributor

ajkr commented Mar 15, 2024

> Coming up with a good API for this may be challenging.

I remembered @pdillinger's proposal in #11644. Assuming it supports the limit options ("There would likely be limit options on number of keys and value size"), for your use case you could call the hypothetical MultiGetRange() function with your prefix as the range, and the number of keys limit set to one.

The SELECT * FROM t LIMIT 10 case is similar; we just make the range cover the whole table, and set the number of keys limit to 10.
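
As a sketch of how both use cases would map onto such a call: the Python below is a toy stand-in over a plain dict. The signature `multi_get_range(table, start, end, key_limit)` and the helper `prefix_upper_bound` are assumptions for illustration only, since the real API in #11644 is still a proposal, and this toy happens to return keys in sorted order, which the proposal may not require.

```python
import bisect

def prefix_upper_bound(prefix):
    """Smallest byte string greater than every key starting with `prefix`.

    Returns None when no such bound exists (prefix is empty or all 0xFF bytes).
    """
    p = bytearray(prefix)
    while p and p[-1] == 0xFF:
        p.pop()
    if not p:
        return None
    p[-1] += 1
    return bytes(p)

def multi_get_range(table, start, end, key_limit=None):
    """Toy model: up to `key_limit` (key, value) pairs with start <= key < end."""
    keys = sorted(table)
    lo = bisect.bisect_left(keys, start)
    hi = len(keys) if end is None else bisect.bisect_left(keys, end)
    selected = keys[lo:hi]
    if key_limit is not None:
        selected = selected[:key_limit]
    return [(k, table[k]) for k in selected]

table = {b"u1:apple": b"1", b"u1:banana": b"2", b"u2:cherry": b"3"}

# Prefix-exists check: the prefix as the range, key limit of one.
prefix = b"u1:b"
exists = bool(multi_get_range(table, prefix, prefix_upper_bound(prefix), key_limit=1))

# LIMIT 10 case: the range covers the whole table, key limit of ten.
first_ten = multi_get_range(table, b"", None, key_limit=10)
print(exists, len(first_ten))
```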

> And in terms of complexity, is it a simple thing I can do with some directions/pointers from you, or does it require deeper knowledge of how RocksDB works?

I think it's difficult. The whole iterator hierarchy today uses key order; it doesn't consider LSM component order at all for point keys. Once you introduce LSM component order, there are all sorts of challenges. For example, how to tell if keys were overwritten or deleted in an earlier component that you already processed a while ago. Or how to deal with merge operands - one key's merge operands may be spread over multiple LSM components but all are needed to obtain the value.
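
The overwrite/delete challenge can be illustrated with a toy model in Python, assuming each LSM component is a dict from key to value, components are listed newest first, and a sentinel marks deletions. The names `TOMBSTONE`, `naive_any_live_key`, and `any_live_key` are hypothetical, and merge operands are not modeled at all.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker

def naive_any_live_key(components, prefix):
    """Incorrect: looks at each component in isolation."""
    for comp in components:  # newest first
        for key in sorted(comp):
            if key.startswith(prefix) and comp[key] is not TOMBSTONE:
                return key
    return None

def any_live_key(components, prefix):
    """Remembers deletions seen in newer components while scanning older ones."""
    deleted = set()
    for comp in components:  # newest first
        for key in sorted(comp):
            if not key.startswith(prefix) or key in deleted:
                continue  # not a match, or shadowed by a newer deletion
            if comp[key] is not TOMBSTONE:
                return key
            deleted.add(key)
    return None

# The key was written in the older component and deleted in the newer one.
components = [{b"u1:a": TOMBSTONE}, {b"u1:a": b"old"}]
print(naive_any_live_key(components, b"u1:"))  # wrongly reports the deleted key
print(any_live_key(components, b"u1:"))        # correctly finds nothing
```

Even in this simplified form, the correct version has to carry state (`deleted`) across components, and that state grows with the number of tombstones under the prefix.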

@zaidoon1
Contributor Author

zaidoon1 commented Mar 18, 2024

Got it, this makes sense, and thanks for the insight. SELECT * FROM t LIMIT n is a great generalization of the problem I'm trying to solve, where we want to fetch any records but don't care about order. It looks like it's best if I close this ticket in favour of #11644?

@ajkr
Contributor

ajkr commented Mar 18, 2024

Sure, if you agree the MultiGetRange() API would solve this use case, we can close this issue. How we can actually implement it will hopefully be figured out eventually in #11644...
