We allocate CPU_SHARDING_FACTOR*cache_size cache space, not cache_size #1538
Assigning to 1.11 because this can break existing installations in the sense that their performance might degrade significantly. A point release in my opinion should not introduce such changes.
I don't think there's much reason to fix this. Even if it's not in a point release, it's going to be at least a minor annoyance when suddenly their tables are using less memory and performing worse, and it's going to become moot once we have the cache changes in place. There really isn't much upside to fixing this right now, and there's a clear cost. Also, I don't think this was done by mistake. The cache size was always meant to be a per-shard setting. You can certainly claim this isn't the most intuitive definition, but changing it right now and then changing it again a month from now is probably not going to make things much less confusing at this point.
I'd vote to just close this and wait for the real cache changes.
No. If I assign a table 1 GB of cache, then I expect it to consume no more than 1 GB + overhead of memory. Fixing this is a huge improvement.
Also, we are not going to change this definition again, are we? The new interface will certainly have the same semantics (except that it might not be on a per-table but on a per-machine level or something like that. But still it will be a limit on how much memory the cache can use).
I'm wary of waiting for the real cache changes when the current behavior makes it extremely non-obvious how much memory we will use (especially given how many OOM complaints we've had). This is a one-line fix; I think we should just do it.
The new solution isn't going to have a cache limit for tables at all. Also, this is harder than you think it is, and the solution you've proposed won't have the effect you're claiming it will. Again, it isn't worth doing this to give users a month of slightly better behavior until we change the behavior again. Especially since no one has complained about it.
As I said, maybe not per table. But it will have a cache size limit that limits the amount of memory the caches take, on whatever scope. Also, the solution does have exactly the claimed effect. Why do you think it would not have the claimed effect? ...and lots of people complain about out-of-memory issues all the time, don't they?
I think people complain about it by setting an appropriate cache size, running out of memory, then complaining to us that the OOM killer sniped their process. If I were 100% confident that the new cache would be in next and working by the next release, I don't think this would be worth it, but we don't have the best track record with deadlines.
I'm almost certain that the new cache is not going to be in the next release. Swapping out the cache is a pretty huge step.
(This is implemented together with a few other memory-saving changes in https://github.com/rethinkdb/rethinkdb/compare/daniel_writeback_memory_savings, by the way.)
Hmm, it actually might have the correct effect. How do we handle cache size for range shards? I still think that this is going to make more users' lives worse rather than better. People using Rethink in production have cache size settings which are working for them, and now when they upgrade, performance is suddenly going to degrade a lot. This is going to be particularly painful because they can't change the cache size at runtime. Even if this value has a more intuitive meaning, I don't think we're making anyone's lives that much better, because it still doesn't have a very intuitive meaning and is just an unwieldy tool to use, which is why it's going away.
Well, then we can just set the new default cache size to 4096 MB. People will probably have to re-import their data anyway, which wipes any cache_size settings. The behavior will be exactly the same, except that if a user adjusts the cache size manually, what they get is a lot closer to what they probably expect to get. All range shards of a given table use the same CPU_SHARDING_FACTOR caches in the background, don't they?
@danielmewes that still screws over people who have set their cache size.
It will be mentioned in the release notes and blog. We shouldn't forget that a lot of admins (including myself) have set the cache size limit in the belief that it more or less works as described, and will be totally screwed once the data in their tables actually approaches that limit.
I think we should fix this, even if this will go away altogether a month later. We're not yet at a stage where we should worry about being backwards compatible with our own bugs. On a different note, I spent the past week talking to a lot of users over the phone/Skype. When I asked them what they don't like about Rethink, #97 and #1375 came up in almost every conversation. Performance/scalability was a very distant (and much more vague) third complaint. So we should try really hard to get #97 and #1375 out the door ASAP.
I put the fix for this into code review 979 together with two other small improvements to the cache's memory handling (one bug fix for a rather insignificant corner case, and one optimization through which we can release memory earlier during writebacks).
We allocate four times the cache space that the user has specified.
This is trivial to fix. We just apparently forgot to divide the cache_size by CPU_SHARDING_FACTOR before creating the individual btree_store_t.
It should be noted though that some users might experience degraded performance once this is fixed. They should be advised to increase their cache sizes if necessary.