
Performance improvements around dictionary locking #54

Closed
juancampa opened this issue Jun 12, 2019 · 8 comments
juancampa commented Jun 12, 2019

I was just reading the performance issue with locking and I'd like to suggest a possible solution: double buffering.

In a nutshell, write everything to one shared dict (A). Then, when it's time to collect, we redirect all writes to a second dict (B) so they don't block. collect is done only from A's data. For the next scrape, we treat A as B and vice versa.

After collect finishes we'll have to:

  1. Clear A so there's no leftover data.
  2. Since B was "empty" when we swapped, we also need to update B with the data that was previously in A.

(2) is where it can get a bit hairy, but I think it's doable (maybe using CRDTs?).
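For illustration, a minimal sketch of the swap could look like the following (the dict names metrics_a, metrics_b and metrics_state are made up here, and the merge-back from step 2 is only hinted at):

```lua
-- Hypothetical sketch of the double-buffering idea, not part of the library.
-- Assumes nginx.conf declares:
--   lua_shared_dict metrics_a 10m;
--   lua_shared_dict metrics_b 10m;
--   lua_shared_dict metrics_state 1m;
local state = ngx.shared.metrics_state

-- Return the dict that writers should currently use.
local function active_dict()
  local which = state:get("active") or "a"
  return which == "a" and ngx.shared.metrics_a or ngx.shared.metrics_b
end

-- Record a counter increment into whichever dict is currently active.
local function inc(key, value)
  local _, err = active_dict():incr(key, value, 0)  -- init to 0 if the key is new
  if err then
    ngx.log(ngx.ERR, "failed to increment ", key, ": ", err)
  end
end

-- On scrape: flip the active dict, collect from the previous one,
-- then merge its contents back and clear it (steps 1 and 2 above).
local function swap_and_collect()
  local old = state:get("active") or "a"
  state:set("active", old == "a" and "b" or "a")
  local source = old == "a" and ngx.shared.metrics_a or ngx.shared.metrics_b
  for _, key in ipairs(source:get_keys(0)) do
    local value = source:get(key)
    -- ... emit the metric, merge `value` back into the now-active dict ...
    source:delete(key)
  end
end
```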

dolik-rce (Contributor) commented:

This might work in theory... I see a few practical problems:

  1. In step 1, the only way to clear the dict A is to iterate over get_keys, adding even more overhead.
  2. In step 2, copying the previous contents from A to B
    a. should not overwrite gauges, only increase counters
    b. requires (again) iterating over get_keys
  3. Switches between writing to A/B would have to be atomic, i.e. happening in all nginx workers at the same time.
  4. You'd have to make sure that there is always at most one nginx worker calling collect. That means using locks and potentially blocking other requests asking for metrics. This might actually be an acceptable trade-off, since metrics are usually collected just a few times per minute.

The first two problems are probably easy to solve (e.g. by doing everything in one iteration). The other two have solutions too... Locks can be implemented on top of a dict. Atomicity can also be ensured by using some indirection and a dict. The only question is how much slower it would be and how much trouble that is for users.
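For example, a lock on top of a shared dict is typically built from add plus an expiry (lua-resty-lock implements a more robust version of the same idea); a rough sketch, with metrics_state as a hypothetical dict name:

```lua
-- Rough sketch of a dict-based lock; lua-resty-lock provides a production-ready version.
local state = ngx.shared.metrics_state  -- hypothetical lua_shared_dict

-- add() only succeeds if the key does not already exist, so it works as test-and-set.
-- The expiry guards against a worker dying while it holds the lock.
local function try_lock(name, ttl)
  local ok = state:add("lock:" .. name, true, ttl or 5)
  return ok
end

local function unlock(name)
  state:delete("lock:" .. name)
end

-- Ensure at most one worker runs collect() at a time (problem 4 above).
if try_lock("collect") then
  -- ... run collect() ...
  unlock("collect")
end
```

The expiry means a crashed worker cannot hold the lock forever, at the cost of the lock occasionally being stolen if collect takes longer than the TTL.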

Anyway, I would be happy to test it, if you have time to code it.

knyar (Owner) commented Jun 13, 2019

That's a good idea, thanks for suggesting it.

As you mentioned, actually merging results back is the tricky bit; while counters can just be incremented atomically, for gauges there will be a race between the thread copying old data from A to B and other threads changing gauge values. One way to avoid this is to actually have three dicts - two for counters, and a single one for gauges (that will not be swapped). I suspect most users will have significantly more counters than gauges (especially given that histograms are implemented with counters), so this will have the desired performance benefit.
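To illustrate the idea (the dict names are hypothetical, not something the library currently expects), writes would be routed by metric type and only the counter dicts would take part in the swap:

```lua
-- Sketch only: route writes by metric type, as described above.
local state      = ngx.shared.metrics_state       -- tracks which counter dict is active
local gauges     = ngx.shared.metrics_gauges      -- never swapped; always read in place
local counters_a = ngx.shared.metrics_counters_a  -- swapped on every collect
local counters_b = ngx.shared.metrics_counters_b

local function active_counters()
  return state:get("active") == "b" and counters_b or counters_a
end

-- Gauges are overwritten in place, so there is nothing to merge back.
local function set_gauge(key, value)
  gauges:set(key, value)
end

-- Counters only ever grow, so merging the old dict back is a plain addition.
local function inc_counter(key, delta)
  active_counters():incr(key, delta, 0)
end
```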

Note that shared dictionaries are defined outside of the library, so this will not be backwards compatible; users will now need to provide several dicts instead of one.

Finally, we'll need to implement locking to make sure collect is not called concurrently. There are libraries (example) implementing locks on top of dictionaries, but this will add additional complexity.

We'd probably also want to have some sort of performance benchmark added first, and then use it to validate performance improvements from this change.

In general, this is not something I am planning to implement myself, but will be happy to collaborate on design and review the code.

knyar (Owner) commented Jun 13, 2019

Oh nice, @dolik-rce beat me to the first response with a very similar set of concerns :) Thanks!

I agree that switching between dicts atomically is another tricky bit about this.

My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API - for example, an external Redis instance. I suspect this actually might be easier to implement correctly than doing the double dictionary dance.

juancampa (Author) commented:

Thanks for reading and writing back with your concerns @knyar @dolik-rce, that's really helpful. We're thinking about using this library and just wanted to make sure that, if we ever hit a performance wall, there are ways around it. 🙌

I won't be working on this any time soon, unless it affects us, which it hasn't yet. So I'm gonna go ahead and close this one.

Thanks again!

dolik-rce (Contributor) commented:

> My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API

Well, I actually use a patched version of this module with yet another approach to avoid get_keys... If you store the keys as values in the dictionary, under well-known keys (e.g. a sequence of integers), it is quite simple to read them back later using just non-blocking get calls. The trade-off here is that you use more space in the dictionary. The only tricky bit in this approach is storing the keys atomically, which I solved by using locks. But thinking about it now, it might be possible to do it using just the properties of the dict. I'll have to look at it again 😄

knyar (Owner) commented Jun 14, 2019

That's also a nice idea, @dolik-rce. I think you don't need locking if you can accept the same key being stored multiple times when several threads are racing to create a new metric, and then ignore duplicate names while reading the keys.

I will be curious to see your current implementation. The way I would probably do it is:

  1. reserve part of the dict key space for tracking metric names - for example, keys starting with __metric.
  2. store metric names using keys like __metric_{idx} (where idx is an integer counter).
  3. create a __metric_last integer key which will store the index of the last metric name.

Then, while writing a value:

  1. check if there is an existing value stored for this metric.
  2. if there is no value (metric does not exist), increment __metric_last. incr will return the new value (idx).
  3. write metric value.
  4. write metric name as __metric_{idx}.

If multiple threads race between steps 1 & 3, the same metric name will be written with several indexes.

While collecting metrics:

  1. use a separate lua table to keep track of metric names that have been processed.
  2. iterate from 1 to __metric_last to get all metric names. Discard duplicates.
  3. read all metric values.
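Putting the write and collect paths together, a sketch could look like this (the metrics dict name is just for illustration):

```lua
-- Sketch of the index-tracking scheme above; `metrics` is a hypothetical dict name.
local dict = ngx.shared.metrics

-- Write path.
local function observe(name, delta)
  local existing = dict:get(name)                  -- step 1: check for an existing value
  if existing == nil then
    local idx = dict:incr("__metric_last", 1, 0)   -- step 2: reserve an index for the name
    dict:incr(name, delta, 0)                      -- step 3: write the metric value
    dict:set("__metric_" .. idx, name)             -- step 4: record the metric name
    -- If several workers race between these steps, the same name ends up under
    -- more than one index; duplicates are simply skipped at collection time.
  else
    dict:incr(name, delta, 0)
  end
end

-- Collect path.
local function collect()
  local seen = {}                                  -- step 1: names already processed
  local last = dict:get("__metric_last") or 0
  for idx = 1, last do                             -- step 2: walk the name index
    local name = dict:get("__metric_" .. idx)
    if name and not seen[name] then
      seen[name] = true
      ngx.say(name, " ", dict:get(name) or 0)      -- step 3: read the metric value
    end
  end
end
```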

What do you think?

knyar reopened this Jun 14, 2019
knyar changed the title from "Double buffering to prevent complete lock" to "Performance improvements around dictionary locking" Jun 14, 2019
dolik-rce (Contributor) commented:

That is actually almost exactly the algorithm I use, good to see you arrived at the same conclusions :-) The only part I hadn't thought of is resolving the race condition when collecting the metrics (I actually found there is one just last week, when looking at the code because of this issue). Additionally, I used a table to keep all the metric keys in each worker. This table can be quickly updated by iterating only over the newer indexes (since we only add metrics, never modify or remove them), and I used it to implement a nice and fast prometheus::get_keys() method.

I think it could actually be even slightly simpler if you omit __metric_last. You don't actually need it, since you can iterate over the __metric_{idx} keys in a while loop and stop when you get the first nil. Inserting would be slightly more complicated, but I think it might be possible to do it in a safe and lock-less way: I'd keep a variable tracking the last known metric count and just try safe_add until it succeeds. In most cases, all the metrics are created very soon after the server starts, so it would not add much overhead later, once the application is already warmed up.
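A rough sketch of that variant, again with a hypothetical metrics dict:

```lua
-- Sketch of the variant without __metric_last; `metrics` is a hypothetical dict name.
local dict = ngx.shared.metrics
local known_count = 0    -- per-worker: highest index we have seen so far

local function register_name(name)
  local idx = known_count + 1
  while true do
    -- safe_add only succeeds if the slot is still free; if another worker took
    -- it in the meantime, we simply move on to the next index.
    local ok, err = dict:safe_add("__metric_" .. idx, name)
    if ok then
      known_count = idx
      return idx
    elseif err ~= "exists" then
      return nil, err    -- e.g. "no memory"
    end
    idx = idx + 1
  end
end

local function list_names()
  local names, idx = {}, 1
  while true do
    local name = dict:get("__metric_" .. idx)   -- stop at the first nil slot
    if name == nil then break end
    names[#names + 1] = name
    idx = idx + 1
  end
  return names
end
```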

knyar (Owner) commented Apr 17, 2020

FYI, there have been some recent performance improvements for counter metrics that should decrease lock contention: b7c8feb...d8c2813

Counters are now incremented independently by each worker process, with deltas flushed regularly to the shared dictionary asynchronously. Changes are now in master, and I will cut a new release in a few days.
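Roughly, the idea is the following (a simplified illustration, not the library's actual code; the dict name is made up):

```lua
local deltas = {}                        -- per-worker buffer of pending increments
local dict = ngx.shared.metrics          -- hypothetical lua_shared_dict name

-- Hot path: no shared-dict access, so no lock contention.
local function inc(key, value)
  deltas[key] = (deltas[key] or 0) + value
end

-- Flush the buffered deltas to the shared dict; one incr per key per interval.
local function flush(premature)
  if premature then return end
  for key, delta in pairs(deltas) do
    deltas[key] = nil
    dict:incr(key, delta, 0)
  end
end

-- Typically set up from init_worker_by_lua*: flush about once a second.
local ok, err = ngx.timer.every(1, flush)
if not ok then
  ngx.log(ngx.ERR, "failed to create flush timer: ", err)
end
```

The trade-off is that counter values exposed to Prometheus can lag behind by up to one flush interval, which is normally acceptable for scrape intervals of several seconds or more.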

Even though the implementation is different from the one described in this issue, I think it largely addresses the performance concerns. I will close this issue now, but please feel free to reopen (or create a new issue or a PR) if you would like to suggest further improvements.
