Does RocksDB support multi-process read access? #908

Closed
strongbanker opened this issue Dec 29, 2015 · 30 comments

Comments

@strongbanker

The data contains about 1.3 million records (key/value pairs), roughly 50 GB in total. I saved it as a RocksDB database on an NFS shared file system. In the cluster, I want to use MPI I/O to read this database from multiple client processes. On each slave node, the client process will open the database and read its part according to the cursor position given by the master process. This way there is no need to transfer the actual data over the network, which reduces the data transfer time.
Is RocksDB suitable for this project?

@strongbanker
Author

Unfortunately, the answer is no. When I opened the database from two processes, it reported "LOCK: Resource temporarily unavailable", just like LevelDB.

@dhruba
Contributor

dhruba commented Dec 29, 2015

If the database is not being written to when the readers are reading, you can potentially open the database in readonly-mode (https://github.com/facebook/rocksdb/wiki/RocksDB-Basics#readonly-mode). That might allow you to open the same database from multiple processes.

@strongbanker
Author

@dhruba , thank you very much for your valuable advice!

Yes, the data does not change after it is generated: no inserts, no deletes, no modifications, only reads. I'll test that.
In fact, I tried LMDB, but on the NFS shared file system, when the disk load is high, the process reading the LMDB database often died unexpectedly, and it often reported "No locks available". So I'm looking for a database that is suitable in this situation.

UPDATE:
I changed Open to OpenForReadOnly, and it works!

@strongbanker
Author

@dhruba Yes, OpenForReadOnly and Seek let me achieve my goal. I also hope the work will run as efficiently as expected.

You saved me a lot of time! Thank you very much!
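
The master/worker split described in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions: keys are ordered bytewise, a sorted Python list stands in for the database, `bisect_left` stands in for the iterator's Seek, and the names `split_points` and `scan_range` are made up for this example. How the master learns the boundary keys is left open.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

KeyRange = Tuple[bytes, Optional[bytes]]


def split_points(sorted_keys: List[bytes], workers: int) -> List[KeyRange]:
    """Master side: cut the key space into contiguous [start, end) ranges."""
    if workers < 1 or not sorted_keys:
        return []
    workers = min(workers, len(sorted_keys))
    step = len(sorted_keys) // workers
    starts = [sorted_keys[i * step] for i in range(workers)]
    # Each range ends where the next one starts; the last one is open-ended.
    ends: List[Optional[bytes]] = starts[1:] + [None]
    return list(zip(starts, ends))


def scan_range(sorted_keys: List[bytes], start: bytes,
               end: Optional[bytes]) -> List[bytes]:
    """Worker side: seek to `start`, then read forward until `end` (exclusive)."""
    pos = bisect_left(sorted_keys, start)  # stand-in for Seek(start)
    found = []
    while pos < len(sorted_keys) and (end is None or sorted_keys[pos] < end):
        found.append(sorted_keys[pos])
        pos += 1
    return found
```

Each worker would receive one `(start, end)` pair from the master, open the database read-only, and scan only its own range, so the ranges together cover every key exactly once.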

@dhruba
Contributor

dhruba commented Dec 29, 2015

@strongbanker Glad to be of help. If there is anything else that you need from RocksDB, please do let us know.

@sanjosh
Contributor

sanjosh commented Oct 24, 2016

@dhruba, if the database were changing, could I keep closing and re-opening it every minute or so? Would that be a viable strategy if the update volume is not high?

@unoexperto

@sanjosh How did you end up dealing with the visibility of changes made by the write process in read-only processes? Is re-opening the DB the only way to see changes?

@sanjosh
Contributor

sanjosh commented Aug 29, 2017

@unoexperto , yes you have to reopen the DB because the SST files get mapped in per-process memory. The reader cannot see what the writer has written until the DB is reopened. See https://github.com/sanjosh/smallprogs/tree/master/rocksdb_test
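
A minimal sketch of the close-and-reopen strategy discussed here, assuming the caller supplies the actual open and close functions. `ReopeningReader`, `open_fn`, and `close_fn` are hypothetical names for this example, not part of any RocksDB binding; with python-rocksdb, `open_fn` might be something like `lambda: rocksdb.DB(path, rocksdb.Options(), read_only=True)`, but that is an assumption to verify against the binding you use.

```python
import time
from typing import Any, Callable


class ReopeningReader:
    """Hands out a read-only handle and replaces it once it is too old."""

    def __init__(self, open_fn: Callable[[], Any],
                 close_fn: Callable[[Any], None],
                 max_age_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._open_fn = open_fn
        self._close_fn = close_fn
        self._max_age = max_age_seconds
        self._clock = clock
        self._handle = None
        self._opened_at = 0.0

    def handle(self) -> Any:
        """Return a handle no older than max_age_seconds, reopening if needed."""
        now = self._clock()
        expired = now - self._opened_at >= self._max_age
        if self._handle is None or expired:
            if self._handle is not None:
                self._close_fn(self._handle)  # drop the stale view first
            self._handle = self._open_fn()
            self._opened_at = now
        return self._handle
```

Readers call `handle()` before each batch of lookups. Writes made by the other process become visible only after the next reopen, so `max_age_seconds` bounds how stale a read can be.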

@dgryski

dgryski commented Feb 15, 2018

Is the comment at #908 (comment) still accurate? That is, a process with a read-only handle will not see any writes made by another process until it re-opens the file, and that the writes will not corrupt the on-disk file from the point of view of the reader? The information in this bug report conflicts with the entry in the FAQ about multiple processes accessing a single rocksdb database.

@cburgdorf

@dgryski Yes, that still seems to be the case.

@riversand963
Contributor

@cburgdorf @dgryski , we are currently working on the support for this, allowing multiple processes to read from the same database and tailing the logs of a primary process. :)

@cburgdorf

@riversand963 ha! Sounds awesome 👍 !

@sayaji15

"we are currently working on the support for this, allowing multiple processes to read from the same database and tailing the logs of a primary process. :)"
@riversand963 Can you please elaborate on this? When will this be available?

@riversand963
Contributor

@sayaji15, there is #4899 to achieve part of the goals. With this PR, a RocksDB instance can tail the MANIFEST of another active instance. I have not implemented the part of WAL-tailing-and-replaying yet, but plan to do so in the near future.

@cburgdorf

@riversand963 I'm trying to figure out if the feature I'm looking for is already available and if so how I would use it. I tried digging into #4899 and the changelog but I don't have much understanding of the internals to get the plain English answer that I'm looking for. Can you help me out?

Basically, what I need is:

  1. One process maintaining a read/write connection to the DB
  2. Multiple read connections that get updates about new data, changed data and deleted data.

If this is already available (from version 6.0, I'd suspect?), are there any special options that I need to pass when connecting the primary or secondary instances? Btw, I'm using python-rocksdb, so chances are that it does not expose any new options, but if I know what to look for I may be able to find a way to get there :)

@riversand963
Contributor

riversand963 commented May 10, 2019

Hey @cburgdorf, #4899 is available in 6.1, though we have not officially announced this feature yet.
If your code always disables WAL, you can already try it out with 6.1. There are two new overloads of rocksdb::DB::OpenAsSecondary that you can call to open a DB in secondary mode. You can check https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L184 for more details.
The secondary (read) instance can see the state of the database up to the most recent MANIFEST write by the primary (read-write) instance. The current limitation is that you need to set max_open_files to -1, because this makes the secondary instance (DBImplSecondary) hold open file descriptors on all SST files, so that they remain accessible to the secondary even after the primary instance deletes them. This also implies that the underlying system must support this behavior (I believe POSIX does). The future plan is to relax this via hard links or efficient copying.
If your system enables WAL, we also have #5282 and #5161 to support WAL tailing (not in 6.1 yet). The current limitation there is that memtable trimming is not implemented yet: if your system keeps running and writing to the WAL, the secondary instance's memtable will keep growing. I plan to support this very soon.
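
The file-system behavior that the max_open_files = -1 requirement relies on can be shown with a small sketch, assuming a POSIX system (on Windows deleting an open file typically fails). The function name and the `000001.sst` file name are made up for this example; no RocksDB code is involved.

```python
import os
import tempfile
from typing import Tuple


def read_after_unlink(payload: bytes) -> Tuple[bool, bytes]:
    """Return (path_still_exists, data_read) after deleting an open file."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "000001.sst")  # hypothetical SST name
        with open(path, "wb") as writer:
            writer.write(payload)
        reader = open(path, "rb")  # the "secondary" keeps a descriptor open
        try:
            os.unlink(path)  # the "primary" deletes the file
            # The name is gone, but the open descriptor still reaches the data.
            return os.path.exists(path), reader.read()
        finally:
            reader.close()
```

The data stays readable until the last descriptor is closed, which is why a secondary that holds every SST file open is not affected when the primary deletes files after compaction.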

@cburgdorf

Hey @riversand963 thank you very much for your detailed answer! That is very helpful 👍
I'll try to see if I can get that working. I have created a tracking issue in python-rocksdb to expose the new OpenAsSecondary function.

@JelteF
Contributor

JelteF commented Jan 20, 2020

I'm currently trying to find out if it's feasible to create a storage backend for Postgres based on RocksDB. Since Postgres uses a multi-process architecture (every connection opens a process), this is one of the problems I'm running into. My current idea is to use the RocksDB Allocator API to make RocksDB allocate in shared memory by giving it an allocator that does that. Do you see any problems with this approach?

Just to be clear on the goal: opening the DB as read-only is a no-go, since multiple connections need to be able to write to the database.

@cburgdorf

@JelteF I cannot answer your question (and frankly, priorities shifted, so I haven't gotten around to trying the OpenAsSecondary approach yet), but if your approach turns out to be viable I'd love it if you reported back here :)

@stereobutter

Somewhat related to multi-process read access to one database: I wonder if one can back up a database from another process, i.e. open the database as read-only and then use that handle for the backup?

@ofek

ofek commented Jan 7, 2022

Can secondary instances read statistics of the primary? For example, getting the rocksdb.stats property https://github.com/facebook/rocksdb/blob/v6.27.3/include/rocksdb/db.h#L770-L772

@riversand963
Contributor

riversand963 commented Jan 7, 2022

@ofek currently there is no support for that. A slightly longer version:

  1. For the primary's in-memory stats, there is no direct way for the secondary to access them if they are in different processes.
  2. If the primary sets options.persist_stats_to_disk to true, then the primary will periodically save the in-memory stats to a column family and the stats data will be flushed to disk at some point. Technically, the secondary can have access to the flushed stats, but support is missing.

Out of curiosity, why read the stats from a secondary instance? Is it possible to have the application (in the same process as the primary) periodically publish the stats to the desired destination?

@ofek

ofek commented Jan 7, 2022

@riversand963 In our case, we have an external collector of data (the Datadog Agent) that would periodically poll the database for metrics.

I also tried opening the path as read-only and triggering some writes but the stats would not update. Are stats per process rather than per database?

@riversand963
Contributor

riversand963 commented Jan 7, 2022

> tried opening the path as read-only and triggering some writes

If you open the db as read-only, then you won't be able to write.

The Statistics object passed to the database when it is opened (https://github.com/facebook/rocksdb/blob/6.28.fb/include/rocksdb/options.h#L590) can be shared by multiple databases in the same process, and the results then cover all of those databases.
If multiple databases are in different processes, you need to aggregate the metrics yourself.

Can you enable stats collection in the primary instance and check? They should be updated. The stats feature is used broadly and intensively. https://github.com/facebook/rocksdb/wiki/Statistics.
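
A minimal sketch of aggregating metrics yourself across processes, assuming each process exports its additive ticker counters as a plain name-to-count mapping. `aggregate_tickers` is a made-up helper, and the approach only suits counters that can be summed; histograms and percentiles cannot be combined this way.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_tickers(per_process: Iterable[Dict[str, int]]) -> Dict[str, int]:
    """Sum additive counters reported by several processes."""
    total: Counter = Counter()
    for snapshot in per_process:
        total.update(snapshot)  # Counter.update adds counts per key
    return dict(total)
```

For example, two processes each reporting a `rocksdb.block.cache.hit` count would yield a single combined count under that name, and a counter reported by only one process is carried over unchanged.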

@ofek

ofek commented Jan 7, 2022

I triggered writes in a different process. Can the Statistics object opened in the main writer process be accessed by the read-only process?

@riversand963
Contributor

No, these C++ objects are not shared by multiple processes.

@ofek

ofek commented Jan 7, 2022

Okay, thanks! So for monitoring really the only option is to push the data elsewhere from the writer processes, correct?

@riversand963
Contributor

I am afraid so: at least that's what comes to my mind. Other people can correct me.
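
One way the writer process could push its stats elsewhere is to publish a snapshot file that an external collector polls. This is a sketch under stated assumptions, not a RocksDB feature: the stats are already available as a JSON-serializable dict, and `publish_stats` and `read_stats` are hypothetical names.

```python
import json
import os
import tempfile


def publish_stats(stats: dict, dest_path: str) -> None:
    """Writer side: atomically replace dest_path with the latest snapshot."""
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(stats, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Rename within one directory, so readers never see a partial file.
        os.replace(tmp_path, dest_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def read_stats(dest_path: str) -> dict:
    """Collector side: load the most recently published snapshot."""
    with open(dest_path) as snapshot:
        return json.load(snapshot)
```

The writer would call `publish_stats` on a timer, and the collector polls `read_stats` at its own pace without ever opening the database.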

@dufferzafar

Is it possible to check whether a RocksDB database is open in another process? I want to trigger some Delete & Compact calls on a DB, but before doing that I want to check if another process has the DB open. In that case, I'd just error out and exit.

@jay-zhuang
Contributor

> Is it possible to check a rocksdb is open by another process?

Not atomically. You can check whether the LOCK file is held by another process; internally RocksDB uses it to make sure only one process is able to open the DB.

> I want to trigger some Delete & Compact calls on a DB

If possible, have the process that owns the DB handle all the operations, using separate threads for the Delete and Compact requests.
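
A rough sketch of the LOCK-file check, assuming a POSIX system where the lock is an advisory `fcntl`-style lock on `<db_dir>/LOCK`; that is an assumption about the platform, and lock behavior on NFS depends on the lock service. `lock_held_by_other_process` is a made-up helper. As noted above, the check is not atomic: another process may open the DB right after it returns, and the probe itself briefly takes the lock.

```python
import errno
import fcntl
import os


def lock_held_by_other_process(db_dir: str) -> bool:
    """Best-effort probe: is <db_dir>/LOCK locked by a different process?"""
    lock_path = os.path.join(db_dir, "LOCK")
    try:
        fd = os.open(lock_path, os.O_RDWR)
    except FileNotFoundError:
        return False  # no LOCK file, so nothing can be holding it
    try:
        try:
            # Non-blocking attempt; fails at once if someone else has the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return True
            raise
        fcntl.lockf(fd, fcntl.LOCK_UN)  # we got it, so release immediately
        return False
    finally:
        os.close(fd)
```

Because of the race, the more robust pattern is to simply attempt the read-write open and treat a lock failure as "already open". The probe would also miss readers that open the database read-only if, as this thread suggests, those do not take the lock.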
