
bugfix: limit lookup table size #151

Merged: 8 commits, Nov 1, 2022

Conversation

@kingluo (Contributor) commented Oct 17, 2022:

Each label values combination inserts a new subtree into the lookup table.
If such a combination is later discarded while new combinations keep being created, the lookup table grows forever and causes a memory leak.

See this apisix issue as an example:

apache/apisix#7949 (comment)

The simplest solution is to truncate the whole table when it exceeds a maximum size, and to make that limit configurable.

Related to #150
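(For illustration only: a minimal sketch of the proposed safeguard, using the lookup_size / lookup_max_size fields this PR adds to each metric; this is not the exact patch.)

```lua
-- Sketch: once a metric's lookup cache holds more label-value combinations
-- than allowed, drop it wholesale and let it be rebuilt lazily on demand.
local function maybe_truncate_lookup(metric)
  if metric.lookup_size > metric.lookup_max_size then
    metric.lookup = {}
    metric.lookup_size = 0
  end
end
```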

@knyar (Owner) left a comment:


Would you be interested in also adding a test for this?

prometheus.lua Outdated
```diff
@@ -855,18 +866,18 @@ do
 end

 -- Public function to register a counter.
-function Prometheus:counter(name, help, label_names)
-  return register(self, name, help, label_names, nil, TYPE_COUNTER)
+function Prometheus:counter(name, help, label_names, lookup_max_size)
```
@knyar (Owner):

Unless this is something that we would expect many users to need to set separately for each metric, I would suggest instead adding this as a global parameter passed as part of Prometheus.init (alongside error_metric_name and sync_interval).
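(For illustration, the suggested global setting could sit alongside the existing init() options; the lookup_max_size option name below mirrors this PR and is an assumption about the final API.)

```lua
-- Sketch of the suggested configuration style: one global option passed to
-- init(), rather than a per-metric argument on counter()/gauge()/histogram().
local prometheus = require("prometheus").init("prometheus_metrics", {
  error_metric_name = "nginx_metric_errors_total",
  sync_interval = 1,
  lookup_max_size = 1000,  -- proposed cap on each metric's lookup cache
})
```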

@kingluo (Contributor, Author):

Global setting is ok.

prometheus.lua Outdated
```diff
@@ -807,6 +816,8 @@ local function register(self, name, help, label_names, buckets, typ)
     -- ['my.net']['200'][LEAF_KEY] = 'http_count{host="my.net",status="200"}'
     -- ['my.net']['500'][LEAF_KEY] = 'http_count{host="my.net",status="500"}'
     lookup = {},
+    lookup_size = 0,
+    lookup_max_size = lookup_max_size or 100,
```
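(To illustrate the structure described in those comments: a sketch, not the library's exact code; LEAF_KEY here is just a stand-in sentinel.)

```lua
-- How the nested lookup cache resolves a list of label values into a
-- pre-built full metric name. Each new combination adds nodes to this tree,
-- which is why an unbounded stream of distinct values grows it forever.
local LEAF_KEY = {}  -- any unique sentinel value works for this sketch

local function cached_full_name(lookup, label_values)
  local node = lookup
  for _, value in ipairs(label_values) do
    node = node[value]
    if node == nil then return nil end  -- combination not cached yet
  end
  return node[LEAF_KEY]  -- e.g. 'http_count{host="my.net",status="200"}'
end
```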
@knyar (Owner):

Any thoughts on whether 100 will be sufficient for most users? On my personal web server I currently have ~500 metric label combinations for the nginx_http_requests_total metric, which has two fields (server name and response code).

Given that most users will probably have a small number of metrics but potentially large metric cardinality (at least host + status), I would suggest a much higher default (at least 1000).

@dolik-rce (Contributor):

It is definitely not enough 🙂 Our servers range from 200 to about 3000 time series in total, and we have quite simple APIs with just a few endpoints (but a lot of various metrics).

@kingluo (Contributor, Author):

I agree.

@dolik-rce (Contributor):

I'm not sure if I understand the problem correctly, but it seems there is something amiss. I see at least two issues here:

First, the linked apisix issue mentions that they create and discard large numbers of metrics. That is definitely not regular, intended usage of this plugin, and it's not surprising that it might cause unexpected problems.

Second, if the lookup table grows indefinitely when metrics are added (and does not shrink when metrics are removed), wouldn't it be better to fix that rather than put a hard limit on its size? If there is a bug that causes old data to remain in the lookup table, it seems to me that the correct fix is to make sure that the outdated data is discarded properly.

@kingluo (Contributor, Author) commented Oct 20, 2022:

> Would you be interested in also adding a test for this?

Yes, should I add code into the existing integration test?

@kingluo (Contributor, Author) commented Oct 20, 2022:

> I'm not sure if I understand the problem correctly, but it seems there is something amiss. I see at least two issues here:
>
> First, the linked apisix issue mentions that they create and discard large numbers of metrics. That is definitely not regular, intended usage of this plugin, and it's not surprising that it might cause unexpected problems.
>
> Second, if the lookup table grows indefinitely when metrics are added (and does not shrink when metrics are removed), wouldn't it be better to fix that rather than put a hard limit on its size? If there is a bug that causes old data to remain in the lookup table, it seems to me that the correct fix is to make sure that the outdated data is discarded properly.

I don't think so.
The lookup table is an auxiliary tree used to speed up generation of full metric names. It should be size-limited, because you cannot assume how many label value combinations there will be. Sometimes there is no old data at all: for example, if you add a session_id label to represent the unique id of an HTTP session, a busy site may have tens of thousands of long-lived sessions, and the number of sessions keeps growing. You should not blow up memory usage just for the sake of string-concatenation efficiency; after all, correctness and robustness come first, right?

Of course, the referenced case is an extreme one, because data keeps being created and deleted. We could also work around it on top of this library, e.g. by resetting the metric. But it is always good to improve the robustness of the library, and note that this change would not break normal use cases, because the number of combinations is usually small.
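(As a concrete sketch of that application-level workaround, assuming the library's documented reset() method; metric names are illustrative.)

```lua
-- Workaround outside the library: periodically reset a high-churn metric so
-- stale label combinations (and their lookup entries) do not accumulate.
local prometheus = require("prometheus").init("prometheus_metrics")
local requests = prometheus:counter(
    "http_requests_total", "Number of HTTP requests", {"host", "status"})

-- e.g. from a periodic ngx.timer: drop all series for this metric;
-- active ones are re-created on the next increment.
requests:reset()
```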

@dolik-rce (Contributor):

> for example, if you add a session_id label to represent the unique id of an HTTP session

That is exactly what I'm saying: this is not an intended use of Prometheus, so this plugin is not optimized for it. Even the official Prometheus docs warn that you should not use unbounded IDs or similar high-cardinality values in labels. See for example: https://prometheus.io/docs/practices/naming/#labels.

@kingluo (Contributor, Author) commented Oct 20, 2022:

> > for example, if you add a session_id label to represent the unique id of an HTTP session
>
> That is exactly what I'm saying: this is not an intended use of Prometheus, so this plugin is not optimized for it. Even the official Prometheus docs warn that you should not use unbounded IDs or similar high-cardinality values in labels. See for example: https://prometheus.io/docs/practices/naming/#labels.

I checked the doc, and I agree that we should design the business logic that uses metrics carefully.

But in reality, how should the session_id case be handled? You cannot simply say it is wrong, because it is a real requirement. An API gateway serves many upstream web servers and gathers metrics for them, so this issue is more significant there.

I think we should distinguish between the memory that stores the metric data and the internal cache. The former must not be optimized away, because it stores the real data we need. The latter, however, should be kept under a reasonable threshold; after all, it is just a lookup cache, which can be removed safely. The internal cache is an implementation detail only.

@knyar (Owner) commented Oct 20, 2022:

This lookup table is indeed a cache introduced to decrease latency, and I agree that we should either invalidate it explicitly or build some other mechanism to avoid unbounded growth.

@dolik-rce is totally right that Prometheus is typically not the best choice for high-cardinality labels, but in practice the Prometheus metric exposition format is becoming pretty standard in the industry. I can imagine that some alternatives to the original Prometheus server might be built with better support for high-cardinality use cases, and users of such systems might find this library useful.

The lookup table is stored per worker, and while in theory I guess you could expand del to remove full metric names from it when the corresponding labels are removed, this would require some cross-worker coordination that would increase the complexity of the library. Truncating the table when it gets too large seems much simpler, and if we choose lookup_max_size to be large enough, it should not impact most users, who have a sensible number of metric labels.

@knyar (Owner) commented Oct 20, 2022:

> It is definitely not enough 🙂 Our servers range from 200 to about 3000 time series in total, and we have quite simple APIs with just a few endpoints (but a lot of various metrics).

@dolik-rce – note that the proposed max size limit is per metric. Do you have any individual metrics with a total number of time series > 1000? I am trying to get a feel for what a good default value for this should be.

> Yes, should I add code into the existing integration test?

Not sure if the integration test is the right place for this, unless there is a good way to actually test for memory leaks. But at a minimum we should update the Lua tests to make sure that the table gets cleaned up when it grows too large, for all metric types.
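(A sketch of the kind of assertion such a Lua test could make. It uses luaunit with a plain-table stand-in for the cache-fill path rather than the library's internals, so all names below are illustrative.)

```lua
local luaunit = require('luaunit')

TestLookupLimit = {}

function TestLookupLimit:testLookupIsTruncated()
  -- stand-in for a metric object with the fields this PR introduces
  local metric = { lookup = {}, lookup_size = 0, lookup_max_size = 3 }
  local function cache_label_value(m, value)
    if m.lookup_size >= m.lookup_max_size then
      m.lookup, m.lookup_size = {}, 0  -- behaviour under test: full truncation
    end
    m.lookup[value] = true
    m.lookup_size = m.lookup_size + 1
  end
  for i = 1, 10 do cache_label_value(metric, "value" .. i) end
  luaunit.assertTrue(metric.lookup_size <= metric.lookup_max_size)
end

os.exit(luaunit.LuaUnit.run())
```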

@dolik-rce (Contributor):

> Do you have any individual metrics with a total number of time series > 1000? I am trying to get a feel for what a good default value for this should be.

From a quick look at our servers, the biggest single metric I found has around 1200 unique label combinations right now. This is for a request duration histogram, with only 25 distinct endpoints and 14 HTTP status codes in 12 buckets. I can imagine that some real-life servers might have even more endpoints, which would make the number grow even larger if they are all called over time.

@knyar (Owner) commented Oct 21, 2022:

> From a quick look at our servers, the biggest single metric I found has around 1200 unique label combinations right now. This is for a request duration histogram, with only 25 distinct endpoints and 14 HTTP status codes in 12 buckets.

Thank you! I will note that for histograms the lookup table value is a list of metrics for each "real" label combination, so in your case it should contain 350 entries (25 * 14) with 14 metric names in each.
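(Restating that accounting as a quick calculation, assuming the 14 names per entry come from the 12 bucket series plus the _count and _sum series.)

```lua
-- Quick check of the numbers above for the 12-bucket request duration histogram.
local endpoints, statuses, buckets = 25, 14, 12
local lookup_entries = endpoints * statuses  -- 350 "real" label combinations
local names_per_entry = buckets + 2          -- bucket series + _count + _sum = 14
print(lookup_entries, names_per_entry)       -- prints: 350   14
```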

@kingluo – thanks again for investigating this issue and for preparing the PR. What I would suggest next is:

  • move this configuration parameter into the options_or_prefix argument of Prometheus.init.
  • set the default value to 1000.
  • expand tests to cover this behavior.

@kingluo (Contributor, Author) commented Nov 1, 2022:

@knyar Please review. I added a test case to check that lookup_size is updated and that the lookup table gets truncated.


```go
info := get_lookup_size(client)
j := i%1000 + 1
if info.Counter != j || info.Gauge != j || info.Histogram != j {
```
@knyar (Owner):

I don't think this will actually work, since the lookup table is per worker, and clients cannot control which worker a request will be processed by.

A simple Lua test should be sufficient here, I believe.

@kingluo (Contributor, Author):

@knyar Sorry, my mistake. In fact, I tested it in my local env with worker_processes 1. :)
