Does RocksDB support multi-process read access? #908
Comments
Unfortunately, the answer is no. When I opened the database from two processes, it reported "LOCK: Resource temporarily unavailable", just like LevelDB.
If the database is not being written to when the readers are reading, you can potentially open the database in readonly-mode (https://github.com/facebook/rocksdb/wiki/RocksDB-Basics#readonly-mode). That might allow you to open the same database from multiple processes.
@dhruba, thank you very much for your valuable advice! Yes, the data does not change after it is generated: no inserts, no deletes, no modifications, only reads. I'll test that. UPDATE:
@dhruba Yes, the You saved me a lot of time! Thank you very much!
@strongbanker Great to be of any help. If there is anything else that you need from RocksDB, please do let us know.
@dhruba, if the database were changing, could I keep closing and re-opening it every minute or so? Would that be a viable strategy if the update volume is not high?
@sanjosh How did you end up dealing with the visibility of changes made by the write process in read-only processes? Is re-opening the DB the only way to see changes?
@unoexperto, yes, you have to reopen the DB because the SST files get mapped into per-process memory. The reader cannot see what the writer has written until the DB is reopened. See https://github.com/sanjosh/smallprogs/tree/master/rocksdb_test
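For anyone wanting to try the "close and re-open every minute or so" idea, here is a minimal sketch of how a reader process might structure it. It is not tied to any RocksDB binding: `ReopeningReader`, `open_fn`, `max_age_s` and `clock` are hypothetical names made up for this illustration, and `open_fn` is assumed to return a fresh read-only handle each time it is called.

```python
import time
from typing import Any, Callable, Optional


class ReopeningReader:
    """Hands out a read-only handle and replaces it once it is older than max_age_s."""

    def __init__(self, open_fn: Callable[[], Any], max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._open_fn = open_fn        # hypothetical: returns a fresh read-only handle
        self._max_age_s = max_age_s    # how stale a handle may get before it is replaced
        self._clock = clock            # injectable so the logic can be tested without sleeping
        self._handle: Optional[Any] = None
        self._opened_at = 0.0

    def handle(self) -> Any:
        now = self._clock()
        if self._handle is None or now - self._opened_at >= self._max_age_s:
            # Drop our reference to the old handle, then open a new one so that
            # later reads go through a handle opened after the writer's changes.
            self._handle = None
            self._handle = self._open_fn()
            self._opened_at = now
        return self._handle
```

With python-rocksdb, `open_fn` could be something like `lambda: rocksdb.DB(path, rocksdb.Options(), read_only=True)`, assuming the binding's constructor accepts a `read_only` argument. Whether a one-minute interval is cheap enough depends on how long the open takes for your database, so treat the default as a placeholder.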
Is the comment at #908 (comment) still accurate? That is, a process with a read-only handle will not see any writes made by another process until it re-opens the file, and that the writes will not corrupt the on-disk file from the point of view of the reader? The information in this bug report conflicts with the entry in the FAQ about multiple processes accessing a single rocksdb database.
@dgryski Yes, that still seems to be the case.
@cburgdorf @dgryski , we are currently working on the support for this, allowing multiple processes to read from the same database and tailing the logs of a primary process. :)
@riversand963 ha! Sounds awesome 👍 !
"we are currently working on the support for this, allowing multiple processes to read from the same database and tailing the logs of a primary process. :)"
@riversand963 I'm trying to figure out if the feature I'm looking for is already available and if so how I would use it. I tried digging into #4899 and the changelog but I don't have much understanding of the internals to get the plain English answer that I'm looking for. Can you help me out? Basically, what I need is:
If this is already available (from version 6.0, I'd suspect?), are there any special options that I need to pass when opening the primary or secondary instances? Btw, I'm using python-rocksdb, so chances are that it may not expose any new options, but if I know what to look for I may be able to find a way to get there :)
Hey @cburgdorf, #4899 is available in 6.1, though we have not officially announced this feature yet.
Hey @riversand963 thank you very much for your detailed answer! That is very helpful 👍
I'm currently trying to find out if it's feasible to create a storage backend for Postgres based on RocksDB. Since Postgres uses a multi-process architecture (every connection opens a process), this is one of the problems I'm running into. My current idea is to use the RocksDB Allocator API to make RocksDB allocate in shared memory by giving it an allocator that does that. Do you see any problems with this approach? Just to be clear on the goal: opening the DB as read-only is a no-go, since multiple connections need to be able to write to the database.
@JelteF I cannot answer your question (and frankly priorities shifted so I haven't gotten around trying the
Somewhat related to multi-process read access to one database: I wonder if one can back up a database from another process, i.e. open a database as read-only and then use that for the backup?
Can secondary instances read statistics of the primary? For example, getting the
@ofek currently there is no support for that. A slightly longer version:
Out of curiosity, why read the stats from secondary instance? Is it possible to have the application (in the same process as primary) periodically publish the stats to desired destination?
@riversand963 In our case, we have an external collector of data (the Datadog Agent) that would periodically poll the database for metrics. I also tried opening the path as read-only and triggering some writes but the stats would not update. Are stats per process rather than per database?
If you open the db as read-only, then you won't be able to write. The Can you enable stats collection in the primary instance and check? They should be updated. The stats feature is used broadly and intensively. https://github.com/facebook/rocksdb/wiki/Statistics.
I triggered writes in a different process. Can the
No, these C++ objects are not shared by multiple processes.
Okay, thanks! So for monitoring really the only option is to push the data elsewhere from the writer processes, correct?
I am afraid so: at least that's what comes to my mind. Other people can correct me.
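One way the "push the data elsewhere from the writer process" approach could look is sketched below. This is only an illustration: `publish_stats`, `get_stats` and `dest_path` are hypothetical names, and `get_stats` is assumed to be whatever callable the writer process uses to obtain its own counters as a plain mapping. The snapshot is written to a temporary file and renamed into place so that a collector polling the file never reads a half-written snapshot.

```python
import json
import os
import tempfile
from typing import Callable, Mapping


def publish_stats(get_stats: Callable[[], Mapping[str, float]], dest_path: str) -> None:
    """Write a stats snapshot to dest_path without exposing a partially written file."""
    snapshot = dict(get_stats())  # hypothetical callable supplied by the writer process
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # The temporary file lives in the destination directory so the rename below
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(snapshot, f)
        os.replace(tmp_path, dest_path)  # replaces any previous snapshot in one step
    except BaseException:
        os.unlink(tmp_path)  # do not leave a stray temporary file behind
        raise
```

The writer would call this periodically (for example from a timer thread), and the external collector would read the JSON file instead of opening the database itself.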
Is it possible to check whether a RocksDB database is opened by another process? I want to trigger some
Not atomically. You can check whether the LOCK file is held by another process; internally, RocksDB uses it to make sure that only one process is able to open the DB.
If possible, try to have the DB process handle all the operations, with different threads to handle
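A rough sketch of the LOCK-file check mentioned above, assuming a POSIX system where the lock is an `fcntl`-style advisory lock (consistent with the "Resource temporarily unavailable" error quoted earlier, which is `EAGAIN`). `lock_held_by_other_process` is a hypothetical helper name. As noted, the check is not atomic: the answer can be stale by the time you act on it, and the probe briefly takes the lock itself, so a real open attempted at that instant could fail.

```python
import errno
import fcntl
import os


def lock_held_by_other_process(lock_path: str) -> bool:
    """Return True if another process currently holds a lock on lock_path.

    Do not call this from the process that has the DB open: POSIX record locks
    are per-process, so the probe would succeed, and closing the descriptor
    would also release that process's own lock on the file.
    """
    fd = os.open(lock_path, os.O_RDWR)  # raises FileNotFoundError if there is no such file
    try:
        try:
            # Non-blocking attempt to take an exclusive lock on the whole file.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return True  # somebody else holds it
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)  # we got it, so nobody else had it; give it back
        return False
    finally:
        os.close(fd)
```

Usage would be along the lines of `lock_held_by_other_process(os.path.join(db_path, "LOCK"))`, where `db_path` is your database directory.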
The data contains about 1.3 million records (key/value pairs), with 50 GB of storage in total. I save it as a RocksDB database on a shared NFS file system. In the cluster, I want to use MPI I/O to read this database from multiple client processes. On each slave node, the client process will open the database and read its part according to the cursor position given by the master process. This way, there is no need to transfer the actual data over the network, which reduces the data transfer time.
Is RocksDB suitable for this project?