
Performance improvements around dictionary locking #54

Closed
juancampa opened this issue Jun 12, 2019 · 8 comments
juancampa commented Jun 12, 2019

I was just reading the performance issue with locking and I'd like to suggest a possible solution: double buffering.

In a nutshell, write everything to one shared dict (A). Then, when it's time to collect, we redirect all writes to a second dict (B) so they don't block. collect is done only from A's data. For the next scrape, we treat A as B and vice versa.

After collect finishes we'll have to:

  1. Clear A so there's no leftover data.
  2. Since B was "empty" when we swapped, we also need to update B with the data that was previously in A.

(2) is where it can get a bit hairy, but I think it's doable (maybe using CRDTs?).
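For illustration, a minimal sketch of the swap could look like the following (the dict names metrics_a, metrics_b and metrics_state are made up here, and the merge-back from step 2 is only hinted at):

```lua
-- Hypothetical sketch of the double-buffering idea, not part of the library.
-- Assumes nginx.conf declares:
--   lua_shared_dict metrics_a 10m;
--   lua_shared_dict metrics_b 10m;
--   lua_shared_dict metrics_state 1m;
local state = ngx.shared.metrics_state

-- Return the dict that writers should currently use.
local function active_dict()
  local which = state:get("active") or "a"
  return which == "a" and ngx.shared.metrics_a or ngx.shared.metrics_b
end

-- Record a counter increment into whichever dict is currently active.
local function inc(key, value)
  local _, err = active_dict():incr(key, value, 0)  -- init to 0 if the key is new
  if err then
    ngx.log(ngx.ERR, "failed to increment ", key, ": ", err)
  end
end

-- On scrape: flip the active dict, collect from the previous one,
-- then merge its contents back and clear it (steps 1 and 2 above).
local function swap_and_collect()
  local old = state:get("active") or "a"
  state:set("active", old == "a" and "b" or "a")
  local source = old == "a" and ngx.shared.metrics_a or ngx.shared.metrics_b
  for _, key in ipairs(source:get_keys(0)) do
    local value = source:get(key)
    -- ... emit the metric, merge `value` back into the now-active dict ...
    source:delete(key)
  end
end
```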

dolik-rce (Contributor) commented:

This might work in theory... I see a few practical problems:

  1. In step 1, the only way to clear the dict A is to iterate over get_keys, adding even more overhead.
  2. In step 2, copying the previous contents from A to B
    a. should not overwrite gauges, only increase counters
    b. requires (again) iterating over get_keys
  3. Switches between writing to A/B would have to be atomic, i.e. happening in all nginx workers at the same time.
  4. You'd have to make sure that there is always at most one nginx worker calling collect. That means using locks and potentially blocking other requests asking for metrics. This might actually be an acceptable trade-off, since metrics are usually collected just a few times per minute.

The first two problems are probably easy to solve (e.g. by doing everything in one iteration). The other two have solutions too... Locks can be implemented on top of a dict. Atomicity can also be ensured by using some indirection and a dict. The only question is how much slower it would be and how much trouble that is for users.
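For example, a lock on top of a shared dict is typically built from add plus an expiry (lua-resty-lock implements a more robust version of the same idea); a rough sketch, with metrics_state as a hypothetical dict name:

```lua
-- Rough sketch of a dict-based lock; lua-resty-lock provides a production-ready version.
local state = ngx.shared.metrics_state  -- hypothetical lua_shared_dict

-- add() only succeeds if the key does not already exist, so it works as test-and-set.
-- The expiry guards against a worker dying while it holds the lock.
local function try_lock(name, ttl)
  local ok = state:add("lock:" .. name, true, ttl or 5)
  return ok
end

local function unlock(name)
  state:delete("lock:" .. name)
end

-- Ensure at most one worker runs collect() at a time (problem 4 above).
if try_lock("collect") then
  -- ... run collect() ...
  unlock("collect")
end
```

The expiry means a crashed worker cannot hold the lock forever, at the cost of the lock occasionally being stolen if collect takes longer than the TTL.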

Anyway, I would be happy to test it, if you have time to code it.

knyar (Owner) commented Jun 13, 2019

That's a good idea, thanks for suggesting it.

As you mentioned, actually merging results back is the tricky bit; while counters can just be incremented atomically, for gauges there will be a race between the thread copying old data from A to B and other threads changing gauge values. One way to avoid this is to actually have three dicts - two for counters, and a single one for gauges (that will not be swapped). I suspect most users will have significantly more counters than gauges (especially given that histograms are implemented with counters), so this will have the desired performance benefit.
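To illustrate the idea (the dict names are hypothetical, not something the library currently expects), writes would be routed by metric type and only the counter dicts would take part in the swap:

```lua
-- Sketch only: route writes by metric type, as described above.
local state      = ngx.shared.metrics_state       -- tracks which counter dict is active
local gauges     = ngx.shared.metrics_gauges      -- never swapped; always read in place
local counters_a = ngx.shared.metrics_counters_a  -- swapped on every collect
local counters_b = ngx.shared.metrics_counters_b

local function active_counters()
  return state:get("active") == "b" and counters_b or counters_a
end

-- Gauges are overwritten in place, so there is nothing to merge back.
local function set_gauge(key, value)
  gauges:set(key, value)
end

-- Counters only ever grow, so merging the old dict back is a plain addition.
local function inc_counter(key, delta)
  active_counters():incr(key, delta, 0)
end
```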

Note that shared dictionaries are defined outside of the library, so this will not be backwards compatible; users will now need to provide several dicts instead of one.

Finally, we'll need to implement locking to make sure collect is not called concurrently. There are libraries (example) implementing locks on top of dictionaries, but this will add additional complexity.

We'd probably also want to have some sort of performance benchmark added first, and then use it to validate performance improvements from this change.

In general, this is not something I am planning to implement myself, but will be happy to collaborate on design and review the code.

knyar (Owner) commented Jun 13, 2019

Oh nice, @dolik-rce beat me to the first response with a very similar set of concerns :) Thanks!

I agree that switching between dicts atomically is another tricky bit about this.

My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API - for example, an external Redis instance. I suspect this actually might be easier to implement correctly than doing the double dictionary dance.

juancampa (Author) commented:

Thanks for reading and writing back with your concerns @knyar @dolik-rce, that's really helpful. We're thinking about using this library and just wanted to make sure that, if we ever hit a performance wall, there are ways around it. 🙌

I won't be working on this any time soon, unless it affects us, which it hasn't yet. So I'm gonna go ahead and close this one.

Thanks again!

dolik-rce (Contributor) commented:

> My previous thinking about resolving the performance issue was to add support for other key/value stores (besides builtin dict) that provide a similar API

Well, I actually use a patched version of this module with yet another approach to avoid get_keys... If you store the keys as values in the dictionary, under well-known keys (e.g. a sequence of integers), it is quite simple to read them back later using just non-blocking get calls. The trade-off here is that you use more space in the dictionary. The only tricky bit in this approach is storing the keys atomically, which I solved by using locks. But thinking about it now, it might be possible to do it using just the properties of the dict. I'll have to look at it again 😄

knyar (Owner) commented Jun 14, 2019

That's also a nice idea, @dolik-rce. I think you don't need locking if you can accept the same key being stored multiple times when several threads are racing to create a new metric, and then ignore duplicate names while reading the keys.

I will be curious to see your current implementation. The way I would probably do it is:

  1. reserve part of the dict key space for tracking metric names - for example, keys starting with __metric.
  2. store metric names using keys like __metric_{idx} (where idx is an integer counter).
  3. create a __metric_last integer key which will store the index of the last metric name.

Then, while writing a value:

  1. check if there is an existing value stored for this metric.
  2. if there is no value (metric does not exist), increment __metric_last. incr will return the new value (idx).
  3. write metric value.
  4. write metric name as __metric_{idx}.

If multiple threads race between steps 1 & 3, the same metric name will be written with several indexes.

While collecting metrics:

  1. use a separate lua table to keep track of metric names that have been processed.
  2. iterate from 1 to __metric_last to get all metric names. Discard duplicates.
  3. read all metric values.
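Putting the write and collect paths together, a sketch could look like this (the metrics dict name is just for illustration):

```lua
-- Sketch of the index-tracking scheme above; `metrics` is a hypothetical dict name.
local dict = ngx.shared.metrics

-- Write path.
local function observe(name, delta)
  local existing = dict:get(name)                  -- step 1: check for an existing value
  if existing == nil then
    local idx = dict:incr("__metric_last", 1, 0)   -- step 2: reserve an index for the name
    dict:incr(name, delta, 0)                      -- step 3: write the metric value
    dict:set("__metric_" .. idx, name)             -- step 4: record the metric name
    -- If several workers race between these steps, the same name ends up under
    -- more than one index; duplicates are simply skipped at collection time.
  else
    dict:incr(name, delta, 0)
  end
end

-- Collect path.
local function collect()
  local seen = {}                                  -- step 1: names already processed
  local last = dict:get("__metric_last") or 0
  for idx = 1, last do                             -- step 2: walk the name index
    local name = dict:get("__metric_" .. idx)
    if name and not seen[name] then
      seen[name] = true
      ngx.say(name, " ", dict:get(name) or 0)      -- step 3: read the metric value
    end
  end
end
```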

What do you think?

knyar reopened this Jun 14, 2019
knyar changed the title from "Double buffering to prevent complete lock" to "Performance improvements around dictionary locking" Jun 14, 2019
dolik-rce (Contributor) commented:

That is actually almost exactly the algorithm I use, good to see you arrived at the same conclusions :-) The only part I hadn't thought of is resolving the race condition when collecting the metrics (I actually found there is one just last week, when looking at the code because of this issue). Additionally, I used a table to keep all the metric keys in each worker. This table can be quickly updated by iterating only over the newer indexes (since we only add metrics, never modify or remove them), and I used it to implement a nice and fast prometheus::get_keys() method.

I think it could actually be even slightly simpler if you omit __metric_last. You don't actually need it, since you can iterate over the __metric_{idx} keys in a while loop and stop when you get the first nil. Inserting would be slightly more complicated, but I think it might be possible to do it in a safe and lock-less way: I'd keep a variable tracking the last known metric count and just try safe_add until it succeeds. In most cases, all the metrics are created very soon after the server starts, so it would not add much overhead later, once the application is already warmed up.
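A rough sketch of that variant, again with a hypothetical metrics dict:

```lua
-- Sketch of the variant without __metric_last; `metrics` is a hypothetical dict name.
local dict = ngx.shared.metrics
local known_count = 0    -- per-worker: highest index we have seen so far

local function register_name(name)
  local idx = known_count + 1
  while true do
    -- safe_add only succeeds if the slot is still free; if another worker took
    -- it in the meantime, we simply move on to the next index.
    local ok, err = dict:safe_add("__metric_" .. idx, name)
    if ok then
      known_count = idx
      return idx
    elseif err ~= "exists" then
      return nil, err    -- e.g. "no memory"
    end
    idx = idx + 1
  end
end

local function list_names()
  local names, idx = {}, 1
  while true do
    local name = dict:get("__metric_" .. idx)   -- stop at the first nil slot
    if name == nil then break end
    names[#names + 1] = name
    idx = idx + 1
  end
  return names
end
```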

knyar (Owner) commented Apr 17, 2020

FYI, there have been some recent performance improvements for counter metrics that should decrease lock contention: b7c8feb...d8c2813

Counters are now incremented independently by each worker process, with deltas flushed regularly to the shared dictionary asynchronously. Changes are now in master, and I will cut a new release in a few days.
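Roughly, the idea is the following (a simplified illustration, not the library's actual code; the dict name is made up):

```lua
local deltas = {}                        -- per-worker buffer of pending increments
local dict = ngx.shared.metrics          -- hypothetical lua_shared_dict name

-- Hot path: no shared-dict access, so no lock contention.
local function inc(key, value)
  deltas[key] = (deltas[key] or 0) + value
end

-- Flush the buffered deltas to the shared dict; one incr per key per interval.
local function flush(premature)
  if premature then return end
  for key, delta in pairs(deltas) do
    deltas[key] = nil
    dict:incr(key, delta, 0)
  end
end

-- Typically set up from init_worker_by_lua*: flush about once a second.
local ok, err = ngx.timer.every(1, flush)
if not ok then
  ngx.log(ngx.ERR, "failed to create flush timer: ", err)
end
```

The trade-off is that counter values exposed to Prometheus can lag behind by up to one flush interval, which is normally acceptable for scrape intervals of several seconds or more.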

Even though the implementation is different from the one described in this issue, I think it largely addresses the performance concerns. I will close this issue now, but please feel free to reopen (or create a new issue or a PR) if you would like to suggest further improvements.
