
bugfix: limit lookup table size #151

Merged: 8 commits, Nov 1, 2022

Conversation

@kingluo (Contributor) commented Oct 17, 2022:

Each label values combination inserts a new subtree into the lookup table.
If such a combination is later discarded while new combinations keep being created, the lookup table grows forever and causes a memory leak.

See this apisix issue as an example:

apache/apisix#7949 (comment)

The simplest solution is to truncate the whole table when it exceeds a maximum size, and to make that limit configurable.

Related to #150
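(For illustration only: a minimal sketch of the proposed safeguard, using the lookup_size / lookup_max_size fields this PR adds to each metric; this is not the exact patch.)

```lua
-- Sketch: once a metric's lookup cache holds more label-value combinations
-- than allowed, drop it wholesale and let it be rebuilt lazily on demand.
local function maybe_truncate_lookup(metric)
  if metric.lookup_size > metric.lookup_max_size then
    metric.lookup = {}
    metric.lookup_size = 0
  end
end
```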

@knyar (Owner) left a comment:


Would you be interested in also adding a test for this?

prometheus.lua Outdated
```diff
@@ -855,18 +866,18 @@ do
 end

 -- Public function to register a counter.
-function Prometheus:counter(name, help, label_names)
-  return register(self, name, help, label_names, nil, TYPE_COUNTER)
+function Prometheus:counter(name, help, label_names, lookup_max_size)
```
@knyar (Owner):

Unless this is something that we would expect many users to need to set separately for each metric, I would suggest instead adding this as a global parameter passed as part of Prometheus.init (alongside error_metric_name and sync_interval).
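(For illustration, the suggested global setting could sit alongside the existing init() options; the lookup_max_size option name below mirrors this PR and is an assumption about the final API.)

```lua
-- Sketch of the suggested configuration style: one global option passed to
-- init(), rather than a per-metric argument on counter()/gauge()/histogram().
local prometheus = require("prometheus").init("prometheus_metrics", {
  error_metric_name = "nginx_metric_errors_total",
  sync_interval = 1,
  lookup_max_size = 1000,  -- proposed cap on each metric's lookup cache
})
```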

@kingluo (Contributor, Author):

Global setting is ok.

prometheus.lua Outdated
```diff
@@ -807,6 +816,8 @@ local function register(self, name, help, label_names, buckets, typ)
     -- ['my.net']['200'][LEAF_KEY] = 'http_count{host="my.net",status="200"}'
     -- ['my.net']['500'][LEAF_KEY] = 'http_count{host="my.net",status="500"}'
     lookup = {},
+    lookup_size = 0,
+    lookup_max_size = lookup_max_size or 100,
```
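(To illustrate the structure described in those comments: a sketch, not the library's exact code; LEAF_KEY here is just a stand-in sentinel.)

```lua
-- How the nested lookup cache resolves a list of label values into a
-- pre-built full metric name. Each new combination adds nodes to this tree,
-- which is why an unbounded stream of distinct values grows it forever.
local LEAF_KEY = {}  -- any unique sentinel value works for this sketch

local function cached_full_name(lookup, label_values)
  local node = lookup
  for _, value in ipairs(label_values) do
    node = node[value]
    if node == nil then return nil end  -- combination not cached yet
  end
  return node[LEAF_KEY]  -- e.g. 'http_count{host="my.net",status="200"}'
end
```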
@knyar (Owner):

Any thoughts on whether 100 will be sufficient for most users? On my personal web server I currently have ~500 metric label combinations for the nginx_http_requests_total metric, which has two fields (server name and response code).

Given that most users will probably have a small number of metrics but potentially large metric cardinality (at least host + status), I would suggest a much higher default (at least 1000).

@dolik-rce (Contributor):

It is definitely not enough 🙂 Our servers range from 200 to about 3000 time series in total, and we have quite simple APIs with just a few endpoints (but a lot of various metrics).

@kingluo (Contributor, Author):

I agree.

@dolik-rce (Contributor):

I'm not sure if I understand the problem correctly, but it seems there is something amiss. I see at least two issues here:

First, the linked apisix issue mentions that they create and discard large numbers of metrics. That is definitely not regular, intended usage of this plugin, and it's not surprising that it might cause unexpected problems.

Second, if the lookup table grows indefinitely when metrics are added (and does not shrink when metrics are removed), wouldn't it be better to fix that rather than put a hard limit on its size? If there is a bug that causes old data to remain in the lookup table, it seems to me that the correct fix is to make sure that the outdated data is discarded properly.

@kingluo (Contributor, Author) commented Oct 20, 2022:

> Would you be interested in also adding a test for this?

Yes, should I add code into the existing integration test?

@kingluo (Contributor, Author) commented Oct 20, 2022:

> I'm not sure if I understand the problem correctly, but it seems there is something amiss. I see at least two issues here:
>
> First, the linked apisix issue mentions that they create and discard large numbers of metrics. That is definitely not regular, intended usage of this plugin, and it's not surprising that it might cause unexpected problems.
>
> Second, if the lookup table grows indefinitely when metrics are added (and does not shrink when metrics are removed), wouldn't it be better to fix that rather than put a hard limit on its size? If there is a bug that causes old data to remain in the lookup table, it seems to me that the correct fix is to make sure that the outdated data is discarded properly.

I don't think so.
The lookup table is an auxiliary tree used to speed up generation of full metric names. It should be size-limited, because you cannot assume how many label value combinations there will be. Sometimes there is no old data at all: for example, if you add a session_id label to represent the unique id of an HTTP session, a busy site may have tens of thousands of long-lived sessions, and the number of sessions keeps growing. You should not blow up memory usage just for the sake of string-concatenation efficiency; after all, correctness and robustness come first, right?

Of course, the referenced case is an extreme one, because data keeps being created and deleted. We could also work around it on top of this library, e.g. by resetting the metric. But it is always good to improve the robustness of the library, and note that this change would not break normal use cases, because the number of combinations is usually small.
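(As a concrete sketch of that application-level workaround, assuming the library's documented reset() method; metric names are illustrative.)

```lua
-- Workaround outside the library: periodically reset a high-churn metric so
-- stale label combinations (and their lookup entries) do not accumulate.
local prometheus = require("prometheus").init("prometheus_metrics")
local requests = prometheus:counter(
    "http_requests_total", "Number of HTTP requests", {"host", "status"})

-- e.g. from a periodic ngx.timer: drop all series for this metric;
-- active ones are re-created on the next increment.
requests:reset()
```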

@dolik-rce (Contributor):

> for example, if you add a session_id label to represent the unique id of an HTTP session

That is exactly what I'm saying: this is not an intended use of Prometheus, so this plugin is not optimized for it. Even the official Prometheus docs warn that you should not use unbounded IDs or similar high-cardinality values in labels. See for example: https://prometheus.io/docs/practices/naming/#labels.

@kingluo (Contributor, Author) commented Oct 20, 2022:

> > for example, if you add a session_id label to represent the unique id of an HTTP session
>
> That is exactly what I'm saying: this is not an intended use of Prometheus, so this plugin is not optimized for it. Even the official Prometheus docs warn that you should not use unbounded IDs or similar high-cardinality values in labels. See for example: https://prometheus.io/docs/practices/naming/#labels.

I checked the doc, and I agree that we should design the business logic that uses metrics carefully.

But in reality, how should the session_id case be handled? You cannot simply say it is wrong, because it is a real requirement. An API gateway serves many upstream web servers and gathers metrics for them, so this issue is more significant there.

I think we should distinguish between the memory that stores the metric data and the internal cache. The former must not be optimized away, because it stores the real data we need. The latter, however, should be kept under a reasonable threshold; after all, it is just a lookup cache, which can be removed safely. The internal cache is an implementation detail only.

@knyar (Owner) commented Oct 20, 2022:

This lookup table is indeed a cache introduced to decrease latency, and I agree that we should either invalidate it explicitly or build some other mechanism to avoid unbounded growth.

@dolik-rce is totally right that Prometheus is typically not the best choice for high-cardinality labels, but in practice the Prometheus metric exposition format is becoming pretty standard in the industry. I can imagine that some alternatives to the original Prometheus server might be built with better support for high-cardinality use cases, and users of such systems might find this library useful.

The lookup table is stored per worker, and while in theory I guess you could expand del to remove full metric names from it when the corresponding labels are removed, this would require some cross-worker coordination that would increase the complexity of the library. Truncating the table when it gets too large seems much simpler, and if we choose lookup_max_size to be large enough, it should not impact most users, who have a sensible number of metric labels.

@knyar (Owner) commented Oct 20, 2022:

> It is definitely not enough 🙂 Our servers range from 200 to about 3000 time series in total, and we have quite simple APIs with just a few endpoints (but a lot of various metrics).

@dolik-rce – note that the proposed max size limit is per metric. Do you have any individual metrics with a total number of time series > 1000? I am trying to get a feel for what a good default value for this should be.

> Yes, should I add code into the existing integration test?

Not sure if the integration test is the right place for this, unless there is a good way to actually test for memory leaks. But at a minimum we should update the Lua tests to make sure that the table gets cleaned up when it grows too large, for all metric types.
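(A sketch of the kind of assertion such a Lua test could make. It uses luaunit with a plain-table stand-in for the cache-fill path rather than the library's internals, so all names below are illustrative.)

```lua
local luaunit = require('luaunit')

TestLookupLimit = {}

function TestLookupLimit:testLookupIsTruncated()
  -- stand-in for a metric object with the fields this PR introduces
  local metric = { lookup = {}, lookup_size = 0, lookup_max_size = 3 }
  local function cache_label_value(m, value)
    if m.lookup_size >= m.lookup_max_size then
      m.lookup, m.lookup_size = {}, 0  -- behaviour under test: full truncation
    end
    m.lookup[value] = true
    m.lookup_size = m.lookup_size + 1
  end
  for i = 1, 10 do cache_label_value(metric, "value" .. i) end
  luaunit.assertTrue(metric.lookup_size <= metric.lookup_max_size)
end

os.exit(luaunit.LuaUnit.run())
```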

@dolik-rce (Contributor):

> Do you have any individual metrics with a total number of time series > 1000? I am trying to get a feel for what a good default value for this should be.

From a quick look at our servers, the biggest single metric I found has around 1200 unique label combinations right now. This is for a request duration histogram, with only 25 distinct endpoints and 14 HTTP status codes in 12 buckets. I can imagine that some real-life servers might have even more endpoints, which would make the number grow even larger if they are all called over time.

@knyar (Owner) commented Oct 21, 2022:

> From a quick look at our servers, the biggest single metric I found has around 1200 unique label combinations right now. This is for a request duration histogram, with only 25 distinct endpoints and 14 HTTP status codes in 12 buckets.

Thank you! I will note that for histograms the lookup table value is a list of metrics for each "real" label combination, so in your case it should contain 350 entries (25 * 14) with 14 metric names in each.
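(Restating that accounting as a quick calculation, assuming the 14 names per entry come from the 12 bucket series plus the _count and _sum series.)

```lua
-- Quick check of the numbers above for the 12-bucket request duration histogram.
local endpoints, statuses, buckets = 25, 14, 12
local lookup_entries = endpoints * statuses  -- 350 "real" label combinations
local names_per_entry = buckets + 2          -- bucket series + _count + _sum = 14
print(lookup_entries, names_per_entry)       -- prints: 350   14
```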

@kingluo – thanks again for investigating this issue and for preparing the PR. What I would suggest next is:

  • move this configuration parameter into the options_or_prefix argument of Prometheus.init.
  • set the default value to 1000.
  • expand tests to cover this behavior.

@kingluo (Contributor, Author) commented Nov 1, 2022:

@knyar Please review. I added a test case to check that lookup_size is updated and that the lookup table gets truncated.


```go
info := get_lookup_size(client)
j := i%1000 + 1
if info.Counter != j || info.Gauge != j || info.Histogram != j {
```
@knyar (Owner):

I don't think this will actually work, since the lookup table is per worker, and clients cannot control which worker a request will be processed by.

A simple Lua test should be sufficient here, I believe.

@kingluo (Contributor, Author):

@knyar Sorry, my mistake. In fact, I tested it in my local env with worker_processes 1. :)
