Design Discussion of Async Support #3254
I thought about the code and propose we refactor it in this way, to make the code more maintainable for both the async and sync paths. These are just thoughts; let me know your opinion. Get(): have a global context and pass it everywhere.
Along the path, each function is rewritten so that the I/O step is clearly split out. If at most one I/O is possible, it can look like this:
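A minimal sketch of what that split could look like. Everything here (`GetContext`, `PreIoStep`, `PostIoStep`, the string payloads) is a hypothetical stand-in, not actual RocksDB internals; the point is only that the same post-I/O step serves both the sync path and the async completion callback:

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical context threaded through the call chain.
struct GetContext {
  std::string key;
  std::optional<std::string> cached;  // set when the block cache hits
  std::string result;
};

std::optional<std::string> PreIoStep(GetContext& ctx) {
  // Everything before the (possible) I/O: memtable + cache lookup.
  return ctx.cached;  // nullopt means we must read from disk
}

std::string PostIoStep(GetContext& ctx, const std::string& block) {
  // Everything after the I/O: parse the block, extract the value.
  ctx.result = "value-for-" + ctx.key + "@" + block;
  return ctx.result;
}

// Sync path: the two steps simply run back to back.
std::string GetSync(GetContext& ctx) {
  if (auto hit = PreIoStep(ctx)) return PostIoStep(ctx, *hit);
  std::string block = "disk-block";  // a blocking read stands in here
  return PostIoStep(ctx, block);
}

// Async path: the same post step becomes the I/O completion callback.
void GetAsync(GetContext& ctx, std::function<void(std::string)> done) {
  if (auto hit = PreIoStep(ctx)) { done(PostIoStep(ctx, *hit)); return; }
  auto on_io_finish = [&ctx, done](const std::string& block) {
    done(PostIoStep(ctx, block));  // the callback is just the post step
  };
  on_io_finish("disk-block");  // an async file read would invoke this
}
```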
In this way, the callback OnTableReaderGetFinish() can simply be post_io_step(). Similarly, if there is a loop with a possible I/O per iteration, it can look like this:
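One way the looped case could be sketched: the loop body becomes a single step function that consumes the previous I/O result and decides the next read, so the sync driver is a plain loop while an async completion callback would just re-enter the same step. All names (`SearchState`, `OnIoComplete`, `SearchSync`) are illustrative, not real RocksDB code:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical state for searching a sequence of files for a key,
// where each file inspection may require one I/O.
struct SearchState {
  std::vector<std::pair<std::string, std::string>> files;  // (key, value)
  size_t idx = 0;
  std::string key;
  std::string value;
};

// Consume one I/O result and decide the next step. Returns the index
// of the next file to read, or -1 when the search is finished. This
// single function is the state machine: the async I/O completion
// callback would simply call it again with the read result.
int OnIoComplete(SearchState& s, const std::string& value_read) {
  if (!value_read.empty()) {  // the previous read found the key
    s.value = value_read;
    return -1;
  }
  if (s.idx == s.files.size()) return -1;  // exhausted, not found
  return static_cast<int>(s.idx++);        // issue read of the next file
}

// Sync driver: the state machine degenerates into a natural loop.
std::string SearchSync(SearchState& s) {
  std::string io_result;
  for (int next = OnIoComplete(s, io_result); next >= 0;
       next = OnIoComplete(s, io_result)) {
    const auto& f = s.files[next];
    io_result = (f.first == s.key) ? f.second : "";  // blocking "read"
  }
  return s.value;
}
```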
In this way, I hope the sync code path will be even simpler and the performance regression smaller.
Iterators
In this way, the async path's OnReadDataBlockFinish() can be BlockBasedTableIterator::SearchInDataBlock(). To make the code change more incremental, I suggest we first focus on three things: (1) code refactoring, step by step — each refactoring should by itself improve the maintainability of the code base; (2) converting the data-block-reading step to async first — other possible I/O paths, like index block reads and table reader construction, can be added later; (3) focusing on the block-based table reader first and skipping other table readers for now. |
@siying The general principles are sound. Both (1) and (2) are implemented in the PRs that were submitted. I contemplated introducing the global context but rejected the idea for the following reasons: a) maintenance — it would introduce full-stack dependencies on the same class; b) most of the context would go unused, since many operations are satisfied from cache. |
@yuslepukhin of course it is oversimplified. I just want to show the idea: if it works like a state machine, the callback function can perform the next step of the state machine, while the sync function can just flow in the natural way. How about (3)? I think this can help us move incrementally. |
@siying Can you give an example of code that depicts the async functionality, so I can better understand the idea? Breaking a function into pre- and post- parts is good, but that is still sync code. There are basic design questions here before we move forward.
If we want a separate async path that only reuses parts of the sync code, it would be extremely hard on development, maintenance, and testing. |
@siying I am throwing together a prototype in the form of a unit test that makes use of coroutines. It has many advantages, such as not needing to refactor the code or change its sync nature. |
It is a good idea, but pay attention to how the API for coroutines or fibers is designed in RocksDB: it may not be compatible with JNI extensions. In an old test I tried to use Boost fibers inside JNI; the internal thread state in C++ is not compatible with the JVM thread model, and it crashed the JVM. The better solution is to abstract the library that provides fibers or coroutines, so it can come from Java or from C++ depending on what you are using. You can also find an old discussion of mine about this on Stack Overflow. |
@publicocean0 Thanks for the input, I will check that out. The coroutines proposal as implemented by MS is significantly different from fibers and/or Boost fibers and Boost coroutines. It is probable that at the JNI boundary the chain of coroutines would need to be stopped and converted back either to a sync model or to a callback model as supported by the JVM. In either case, I do not see how it would be possible to reuse the Java thread pool for I/O completion, although internally the JVM does just that. |
The JVM is surely not in a good moment; it is at a crossroads for the future, and Sun is too slow to propose new solutions. Anyway, here we cannot wait for the distant future; the better approach is probably to design a flexible solution. My opinion is that fibers should be embedded at the OS level, not the language level, but seeing that happen in Windows, Linux, and macOS will probably not be fast. For example, there are already experimental OSes that completely remove the old user-level thread concept and replace it with fibers. Actually, the most important problem with fibers is locks: a lock blocks the whole thread, not just the fiber, so it has a bad side effect on concurrency. Java Quasar (a fibers library for Java) implemented a special lock extension (with the same API) for fibers; in any case, all synchronized blocks would have to be replaced. What I verified personally was an attempt to use Boost fibers in C++ and call them via JNI, because I have long wanted to use fibers with RocksDB. Unfortunately, it is not possible to call back into a Java method from inside a C++ fiber: the JVM crashes. It seems the JVM checks stack consistency at the JNI boundary, so with fibers/coroutines it finds an inconsistency relative to the old thread model. You can wait until the result is ready and then return it, but if you wait synchronously at the JNI level, you effectively limit the power offered by fibers/coroutines. I am not sure, but with coroutines the state is not attached to thread-local storage as it is with fibers; maybe with coroutines it is simpler to use the same thread in both C++ and Java without conflicts in the thread's state. In any case, you would have to use two sets of coroutines with isolated schedulers, one in Java and one in C++, in a pipeline. |
Now that coroutines are in C++20 and io_uring is available on Linux, maybe this effort should be revitalized.
Thanks @kennthhz for reviving this discussion. |
Thanks Yanqin. Are you one of the maintainers of RocksDB? Coroutine
support officially ships with C++20 compilers, so it is no longer WIP.
In fact, the entire C++20 spec was finalized a while ago and the
standards committee has moved on to C++23.
I'd like to propose, and collaborate with like-minded people on, adding
support for a coroutine-based async read API for RocksDB with underlying
io_uring support. Thanks
…On Mon, Jul 19, 2021 at 10:31 AM Yanqin Jin ***@***.***> wrote:
Thanks @kennthhz <https://github.com/kennthhz> for reviving this
discussion.
I believe RocksDB will continue to rely on C++11 (let me know otherwise).
Besides, coroutine support is still WIP even for C++20 with a lot of
different opinions.
For io_uring, @siying <https://github.com/siying> and @anand1976
<https://github.com/anand1976> have some support via liburing.
|
IMHO the right thing to do is to create a separate coroutine API chain for non-blocking async operations. We can opportunistically reuse some sync code, but we should reduce the call-stack depth and use nested co_await. It is pretty much like C#. |
I want to fully revive this issue/work. My company uses RocksDB as the storage engine for a large cloud-based NoSQL service, where efficiency and performance are critical, so supporting a non-blocking async I/O API is critical. We believe RocksDB should take advantage of new technologies such as io_uring and C++ coroutines for this. We have a fork that is a work in progress: https://github.com/kennthhz/rocksdb |
@kennthhz RocksDB already supports io_uring; it's enabled by default: rocksdb/build_tools/build_detect_platform Line 174 in e970248
For moving to C++20, RocksDB needs to consider users whose compilers do not yet support C++20.
@jay-zhuang RocksDB's current support for io_uring is not truly async: it still blocks the thread while waiting for I/O completion. That is why it only makes sense for the MultiGet API, not Get. It is also not supported in flushing and compaction. To be truly async, we need to use coroutines. I agree it is premature to require all clients to upgrade to GCC 10; that is why we will use #define guards to isolate the coroutine code. |
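One possible shape for that kind of isolation, sketched with the standard `__cpp_impl_coroutine` feature-test macro; `ROCKSDB_ASYNC_API` and `CompiledApiLevel` are made-up names for illustration, not actual RocksDB build flags:

```cpp
#include <cassert>

// Gate coroutine-based entry points behind a macro so pre-C++20
// compilers still build the sync-only library. Illustrative only.
#if defined(__cpp_impl_coroutine)
#define ROCKSDB_ASYNC_API 1
#else
#define ROCKSDB_ASYNC_API 0
#endif

int CompiledApiLevel() {
#if ROCKSDB_ASYNC_API
  return 2;  // sync + coroutine async entry points compiled in
#else
  return 1;  // sync-only build on older compilers
#endif
}
```

Declarations of the coroutine API (and their definitions) would sit entirely inside the `#if ROCKSDB_ASYNC_API` branch, so older toolchains never see a `co_await`.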
I created, in the past, a fully async implementation of Get/MultiGet that does not block on I/O on Windows. However, there is a very small Windows-specific part in it having to do with async I/O. I faced multiple issues with it. First of all, it required a lot of code refactoring to make it run in both sync and async modes. Making iterators async presented a particular challenge. It worked, but it yielded only marginal improvements under specific circumstances. In exchange, you get a programming paradigm that is unfamiliar to many and difficult to maintain. |
Thanks @yuslepukhin. I did look at your implementation back then. Coroutines are definitely a better programming paradigm, not to mention an actual language standard. My current fork with coroutines + io_uring (https://github.com/kennthhz/rocksdb) doesn't have the callback stack-overflow issue (thanks to promise chaining) and works as expected. It has been almost 4 years since your last effort; coroutine support in the GCC and MS compilers has improved to the point that it can be used in production. Through my ex-Microsoft colleagues I know that is the case at Microsoft. My current focus is coroutines on Linux with io_uring for async I/O.
We use RocksDB with higher performance in mind to support cloud-based multi-tenant partitioned storage (in our new NoSQL DB) at the CPU level. Each DB instance represents a tenant's partition of a collection (i.e. table) to achieve isolation. A node with 10 vCores (we use K8s containers) can host tens or even a hundred-plus partitions (each partition can be as small as 10-20 GB on a 1 TB disk, so a node can host up to 100 RocksDB instances). RocksDB is used as a library, with threading controlled by our DB service's resource-governor logic, because it knows how to manage resource usage better. The threading model is similar to Seastar's: a single worker thread per CPU manages all requests at the task level through a lock-free task queue. So having non-blocking async I/O is critical. Coroutines + io_uring give us much better latency and throughput because we use far fewer threads, achieve higher CPU utilization, and in most cases run in a shared-nothing, lock-free pattern. This architectural approach is becoming more and more common in database systems (ScallyDB, YottaByte, etc.), and RocksDB is increasingly used by DBs as the default storage engine. Hence coroutine support is important to keep up with this evolution. My proposal is
People in this thread, please share your thoughts. |
@kennthhz I haven't checked your implementation yet. A few questions:
|
@riversand963 A coroutine's exception handling has two parts: 1) a handled exception is no different from a normal function's, in that you need to use try-catch; 2) an unhandled exception goes to the coroutine's unhandled-exception hook. But I haven't used the latter, since current RocksDB code uses return codes for error handling; I don't see any internal try-catch. |
@kennthhz After reading the "async get with coroutine" code, I have a question. Thanks. |
Hi, I ported the C++20 coroutine and io_uring changes from the https://github.com/kennthhz/rocksdb fork to the TiKV RocksDB fork, which IIRC has the latest merges from upstream. My one concern was around thread-local variables, and the first one I've come across is the SuperVersion thread-local infrastructure. I've hacked around it, but it's unsatisfactory (for now). This will be a generic issue with any thread-local state that is shared across multiple invocations of Async[Multi]Get() from the same thread, etc. Are there other similar issues I should be aware of? I think this is a very cool feature, and kennthhz has done great work with his prototype; it's a good starting point. Regards, |
@subains RocksDB uses thread-locals in SuperVersion and some other places. SuperVersion doesn't have to be thread-local: as long as it can be fetched efficiently from a pool of SVs, it will be fine. Getting it from a pool rather than thread-local storage has other benefits too. Feel free to give that approach a try; you can also help us develop the pooled SuperVersion implementation. |
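A minimal sketch of the pooled approach suggested above, assuming a simple mutex-guarded free list; `SuperVersionStub` and `SuperVersionPool` are hypothetical stand-ins for the real SuperVersion machinery. Because acquisition no longer depends on the calling thread's identity, a coroutine that resumes on a different thread can still release the object it acquired:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for the real SuperVersion state.
struct SuperVersionStub {
  int epoch = 0;
};

// Fetch/return objects from a shared pool instead of thread-local
// storage, so state survives cross-thread coroutine resumption.
class SuperVersionPool {
 public:
  std::unique_ptr<SuperVersionStub> Acquire() {
    std::lock_guard<std::mutex> g(mu_);
    if (!free_.empty()) {
      auto sv = std::move(free_.back());
      free_.pop_back();
      return sv;
    }
    return std::make_unique<SuperVersionStub>();  // pool empty: allocate
  }

  void Release(std::unique_ptr<SuperVersionStub> sv) {
    std::lock_guard<std::mutex> g(mu_);
    free_.push_back(std::move(sv));  // recycle for the next caller
  }

  size_t FreeCount() {
    std::lock_guard<std::mutex> g(mu_);
    return free_.size();
  }

 private:
  std::mutex mu_;
  std::vector<std::unique_ptr<SuperVersionStub>> free_;
};
```

A production version would likely use a lock-free or sharded free list to keep the fast path cheap, but the interface stays the same.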
@siying thanks for the quick reply. I will give the pool version a try. My current hack passes Valgrind but it's still a hack :-). I'm not very familiar with RocksDB code. My immediate plan is to improve the hack so that I can test the changes in a larger context within TiKV and also test out the C++ <-> Rust bindings with async control flow. |
Even though MS defeated Google in getting their stackless coroutine model accepted by the standards committee, we must admit that the stackful model remains a strong competitor. It has a much longer history and has been widely used, for instance in Boost coroutine, and it is less intrusive to legacy code: we don't have to add the co_await and co_return keywords to every piece of code. This is an experiment we did to integrate stackful coroutines into RocksDB. It is not only focused on MultiGet, but is an overall transformation of locks, I/O, and task scheduling; the async runtime is driven by io_uring. 200 lines of code to rewrite the 600,000-line RocksDB into a coroutine program |
We decided to have the design discussion of async support here. Let's see how it works.
The initial discussion started from the prototyping work by @yuslepukhin in
#3063
and
#2727