New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
avoid memcpy from mmapped plain table tables #8375
base: main
Are you sure you want to change the base?
Conversation
4fd0214
to
ac1012e
Compare
It appears I've tickled something in MultiGet. working on sorting that out. the refcount of Version is not atomically updated so I'm guessing I need to hold on to something wider to keep it alive. |
// guessing these files can be used by multiple versions, since they are | ||
// immutable). We can ensure this lives by holding a ref to this | ||
// version. | ||
get_context.set_table_pinner(&table_pinner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking on the problem. I think you are making a very interesting optimization: prevent memcopy for mmapped, uncompressed data (potentially for both of plain table and block based table), so I appreciate your effort. I'm just curious, why do we need another pinner for this, rather than directly registering cleaner to GetContext.pinnable_val_?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. Ignore my previous comment. I misunderstood the code and figured out why it is done this way. Let me think about it a little bit more whether there is a potentially better way.
Agreed, thanks for working on this. I had forgotten we are currently unable to do zero copy point lookups with mmap. Does this limitation apply to read-only mode too? I made public another known issue (#8422) with mmap efficiency that has been sitting a while in case you find it interesting. Perhaps such a MappedRead() could be used on pinned files. |
@ajkr I thought about it and feel this approach is reasonable. Thought? |
I don't believe so -- there are branches for 'immortal' tables that are taken in read-only mode that provide direct pointers into the mapped region without any associated cleaner required (this is my recollection from some months ago).
My understanding is that such a fix for immortal tables won't aid reads from tables in normally opened DBs (i.e. not opened as secondary or RO) because such tables can't be immortal as they may be eventually discarded as compactions cause them to be superseded by new tables. But this is again a months-old recollection, so I may be off? |
Intended to fix #8374.
Things might be a bit rough now (in particularly I've left various comments I left along the way that I don't intend to keep, the commit message is garbage, ...), but I wanted to get feedback on the implementation before I clean it all up.
plain_table_db_test passes as patched (it no longer copies when files are mapped in and is updated as such). I've seen stress testing tools that I intend to run; any suggestions and guidance is very much welcome.