ALTER TABLE ... ENGINE=ROCKSDB uses too much memory #692
We have plans to implicitly use bulk loading mode when rebuilding tables. Meanwhile, could your customer use the following instead? set session rocksdb_bulk_load=1;
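For context, the suggested workaround looks something like this (the table and column names are illustrative, not from the issue):

```sql
-- Enable MyRocks bulk load mode for this session
SET SESSION rocksdb_bulk_load = 1;

-- Copy the data into the RocksDB table (illustrative names);
-- in classic bulk load mode, rows should arrive in primary key order
INSERT INTO t_rocks SELECT * FROM t_innodb ORDER BY pk;

-- Disable bulk load mode; this finalizes the load
SET SESSION rocksdb_bulk_load = 0;
```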
Peter Zaitsev already ran into this same OOM situation on large transactions with out-of-the-box settings. This is not likely to pass as acceptable behavior. If there is an upper limit on the size of a transaction, reaching it should result in a SQL error, not a server assertion. As for bulk load behavior, TokuDB implemented a fairly simple heuristic: if a handler bulk insert is starting and the table is empty, assume it is a bulk load operation and activate the bulk loader. TokuDB also allows out-of-order key insertions. It can handle this because it 'stages' all of the inserts first and sorts the data, then finalizes and actually performs the bulk operation once all inserts have been accepted and sorted. The downside is the performance and storage cost of staging and sorting the data, but the end result is still a ~15% performance improvement over normal insertions. So maybe there is some idea in there that might be beneficial for MyRocks/RocksDB.
@georgelorchpercona We have a session variable rocksdb_max_row_locks. If the number of row locks exceeds that value, the transaction gets a SQL error. The current default is 1B row locks, which is far too high to return an error before hitting OOM. We are discussing internally whether to reduce the default, but have not done so yet; we may reduce it to around 1M. Implicit bulk load on DDL has been discussed as well, and we can implement it, since quite a lot of people have been hitting the OOM on DDL. The idea about out-of-order key insertions is interesting. Thanks for your suggestion!
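As a sketch, lowering the cap per session looks like this (the value shown is illustrative):

```sql
-- Any transaction that tries to take more than this many row locks
-- will fail with a SQL error instead of running the server out of memory
SET SESSION rocksdb_max_row_locks = 1000000;
```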
Longer term, we plan to support long-running transactions in RocksDB by storing intermediate data on disk to prevent OOM.
Thanks for the tip on rocksdb_max_row_locks, @yoshinorim! That is a perfect short-term solution that we can implement ourselves. We have a similar issue with the rocksdb_max_open_files default and are reducing it to sane levels and documenting how to tune it properly. The whole point is that a basic out-of-the-box installation should not fail or assert on simple things. If there are limitations, we should return an error to the client when they are hit, not assert the instance. I realize this is a different thought process than in early development and prototyping, when you don't yet know what the 'correct' values are and want a very in-your-face failure to let you know you hit a limit. Thanks again for the info! I will pass this on to our team, start looking at changing our defaults for these, and ensure we have them documented.
There's also rocksdb_write_batch_max_bytes which offers similar functionality. |
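A hedged sketch of using that variable (the limit shown is an arbitrary example, not a recommendation):

```sql
-- Cap the in-memory write batch of a transaction, in bytes;
-- a write that would exceed the cap errors out rather than growing unbounded
SET SESSION rocksdb_write_batch_max_bytes = 268435456; -- 256 MiB, illustrative
```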
@georgelorchpercona We have recently added support for out-of-order key insertion in bulk load mode. Setting …
Bumping this. Our CTO Vadim hit this today, immediately, while testing a release candidate build. We know about https://github.com/facebook/mysql-5.6/wiki/Migrating-from-InnoDB-to-RocksDB and https://github.com/facebook/mysql-5.6/wiki/data-loading and the memory limitations, but this makes for a terrible user experience: a new user wanting to try out MyRocks immediately hits this and then has to go figure out what happened. Users typically do not read the docs before firing things off, other than to follow the fastest path from 'A' to 'B'. So we now have Sergei Petrunia, Peter Zaitsev, and Vadim Tkachenko, all fairly skilled MySQL people, hitting (or reporting) this out of the box, and all asking, "can't we just not crash and issue some error early on by default?" Please consider some way to provide a reasonable default transaction size threshold that errors back to the client, and try to get the RocksDB folks to fix it (maybe mmap the needed memory from some temp file or similar). Also, please consider this one of the highest-priority issues we are currently facing as we attempt to release MyRocks as GA in Percona Server.
We're going to set the default rocksdb_max_row_locks from 1B to 1M soon. Longer term, we'll automatically use bulk loading mode on ALTER.
Committed e2c6868, which reduced the default rocksdb_max_row_locks to 1M.
@yoshinorim Is there any update on using bulk loading mode on ALTER?
This is a known property, but I am filing it as a bug because it gives a bad user experience.
A trivial example: let's create a non-MyRocks table
Now, suppose one is considering migrating to MyRocks. Their first likely step might be (*):
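The statements in question were stripped from this copy of the issue; they presumably looked roughly like this (names and column layout are illustrative):

```sql
-- A non-MyRocks source table
CREATE TABLE t1 (pk INT PRIMARY KEY, filler VARCHAR(255)) ENGINE=InnoDB;
-- ... populate t1 with a large number of rows ...

-- The first likely migration step (*)
ALTER TABLE t1 ENGINE=ROCKSDB;
```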
Unfortunately, when the table is sufficiently big, this statement causes the server to consume all available memory and then be killed, either by an uncaught std::bad_alloc or by the OOM killer.
This is because MyRocks will try to write the whole table contents as one big transaction.
MyRocks actually already has ways to break big bulk load operations into smaller chunks. The issue here is that they are not enabled for this statement, which gives a bad user experience.
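For example, if memory serves, MyRocks has session variables that chunk large loads into smaller implicit commits; a sketch (variable names as I recall them, worth double-checking against the MyRocks docs):

```sql
-- Commit implicitly every rocksdb_bulk_load_size rows during bulk
-- operations, so no single transaction buffers the whole table in memory
SET SESSION rocksdb_commit_in_the_middle = 1;
SET SESSION rocksdb_bulk_load_size = 1000;
```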
(*) - this bug is actually inspired by a real-world attempt: https://jira.mariadb.org/browse/MDEV-13609 (if you're not a MariaDB member you won't see much in that MDEV as crash details are interspersed with user data and so were made private)
A paste from the debugger proving that ALTER TABLE ... ENGINE=... indeed accumulated all of the table contents in memory: