Ever increasing 'stats_mysql_query_digest' results in sporadic client timings #4531

Open · flopex opened this issue Apr 26, 2024 · 5 comments

flopex commented Apr 26, 2024

We recently ran into what seemed to be very high connection and execution timings from clients to ProxySQL. After much debugging it turned out to be caused by the ever-increasing size of the stats_mysql_query_digest data, and running TRUNCATE resolved the issue.

Stat metric at the time of the issue:

| query_digest_memory          | 5264587687     |
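
For reference, query_digest_memory can also be read from the stats_memory_metrics table on the admin interface. A minimal sketch (assuming pymysql and the default admin endpoint 127.0.0.1:6032 with admin/admin credentials):

```python
# Read query_digest_memory from the ProxySQL admin interface.
# Assumes pymysql and the default admin endpoint/credentials.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=6032,
                       user="admin", password="admin")
try:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT Variable_Value FROM stats_memory_metrics "
            "WHERE Variable_Name = 'query_digest_memory'"
        )
        print("query_digest_memory:", cur.fetchone()[0])
finally:
    conn.close()
```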

What we found interesting was the behaviour of the badly performing requests: there was a clear pattern as to when a slow request would happen.

This is the debugging test we ran: a series of 5000 requests against a ProxySQL host, each executing a simple SELECT 1 from Python, with the resulting timings in seconds.
[screenshot: per-request timings showing periodic latency spikes]
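
The test loop was roughly equivalent to the sketch below (host, port and credentials are placeholders; each iteration times connect + execute, so both connection and execution latency show up):

```python
# Simplified version of the timing test: 5000 round trips of "SELECT 1"
# through ProxySQL, each iteration timing connect + execute in seconds.
# Host, port and credentials are placeholders.
import time
import pymysql

timings = []
for i in range(5000):
    start = time.monotonic()
    conn = pymysql.connect(host="proxysql-host", port=6033,
                           user="app_user", password="app_pass", db="test")
    with conn.cursor() as cur:
        cur.execute("SELECT 1")
        cur.fetchall()
    conn.close()
    timings.append(time.monotonic() - start)

# Print the slowest requests to make the periodic spikes visible.
for idx, t in sorted(enumerate(timings), key=lambda x: -x[1])[:10]:
    print(f"request {idx}: {t:.3f}s")
```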

I would be curious to know if there is an explanation for the pattern seen during these tests. What is the suggested mitigation/setup for keeping stats_mysql_query_digest at a reasonable size?

ProxySQL Version: 2.3.2
OS: Ubuntu 18.04

I'm aware of the other reports surrounding this issue, but wanted to get more insight into the expected behaviour and possible "fixes".
Ref:
#3482 (comment)
#2368 (comment)

renecannao (Contributor) commented

Hi @flopex, thank you for your report.
The memory structure in which query digest information is stored is a hash table: the best-case and worst-case complexity of lookup and insert in a hash table are well documented, and I won't try to summarize them here.
But in a nutshell, both lookup and insert in a hash table should be very fast.

Why can a large query_digest_memory affect performance?
As explained earlier, lookup and insert in it shouldn't be a problem, although other causes are possible.
For example, a large query_digest_memory may result in excessive memory usage and swapping. If your system is swapping, everything becomes extremely slow. This doesn't seem to be your case.

Your test with 5000 simple SELECT 1 queries shows an interesting result.
The vast majority of queries don't seem to face any performance issue. Yet, at regular intervals, in a pattern that is almost predictable, you have a spike in execution time (note the emphasis on "at regular intervals").
Is the spike in execution time really caused by the query digest table?
In this GH issue there is not enough evidence to be 100% sure, but I will assume this is true and evaluate possible reasons.

If the query digest table is responsible for these stalls, it means that the query digest table itself is blocked: some thread is performing an operation other than a lookup or an insert.
There are essentially 5 possible operations against the query digest table:

  1. lookup of a single entry (to then update its counters)
  2. insert of a new entry
  3. traverse of the full table to generate memory information (query_digest_memory)
  4. traverse of the full table to delete all its entries
  5. traverse of the full table to generate stats_mysql_query_digest table on SQLite

The first 2 operations are extremely fast.
The other 3 operations are a lot slower, and they are also ordered in terms of execution time (as of ProxySQL 2.3.2; things changed in newer versions).

Because you have spikes at regular intervals, I assume that the table is not being truncated, so point 4 shouldn't apply to you.
Therefore the stalls are either caused by queries on stats_mysql_query_digest (point 5), or by queries on stats_memory_metrics or the Prometheus exporter (point 3).
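
One quick way to check is to time a full read of stats_mysql_query_digest on the admin interface while your client test is running, and see whether the spikes line up with it. A minimal sketch (assuming pymysql and the default admin endpoint 127.0.0.1:6032 with admin/admin credentials):

```python
# Time how long a read of stats_mysql_query_digest takes on the admin
# interface. On 2.3.2 this traverse blocks the in-memory digest table,
# so its duration approximates the stall window seen by clients.
# Assumes pymysql and the default admin endpoint/credentials.
import time
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=6032,
                       user="admin", password="admin")
with conn.cursor() as cur:
    start = time.monotonic()
    cur.execute("SELECT COUNT(*) FROM stats_mysql_query_digest")
    (rows,) = cur.fetchone()
    print(f"{rows} digest entries, generated in {time.monotonic() - start:.2f}s")
conn.close()
```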

> I would be curious to know if there is an explanation for the pattern seen during these tests.

I hope the above explains it.

> What is the suggested mitigation/setup for keeping stats_mysql_query_digest at a reasonable size?

I think the first step is to identify why stats_mysql_query_digest is so large.
What query patterns are stored that cause so many entries in it?
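
For example, grouping entries by a prefix of digest_text can show which query families account for most of the rows. A minimal sketch (assuming pymysql, the default admin endpoint, and that the stats tables are served by the embedded SQLite, so substr()/GROUP BY work as usual):

```python
# Group digest entries by a prefix of digest_text to find which query
# families dominate stats_mysql_query_digest.
# Assumes pymysql and the default admin endpoint/credentials.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=6032,
                       user="admin", password="admin")
with conn.cursor() as cur:
    cur.execute(
        "SELECT substr(digest_text, 1, 40) AS pattern, COUNT(*) AS digests "
        "FROM stats_mysql_query_digest "
        "GROUP BY pattern ORDER BY digests DESC LIMIT 20"
    )
    for pattern, digests in cur.fetchall():
        print(f"{digests:>8}  {pattern}")
conn.close()
```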

And finally, as you would expect, "upgrade" is often a solution :)

In ProxySQL 2.4.0 the parsing of queries and of query digests was extensively improved, and at least 3 new variables were introduced to control how query digests are generated.
In the latest ProxySQL (2.6.2, for future reference) there are 10 variables that control how query digests are generated: tuning them may reduce the size of stats_mysql_query_digest.
Thus an upgrade and the tuning of these variables can reduce the size of the query digest table.
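
For example, something along these lines (an illustrative sketch, assuming pymysql and the default admin endpoint; the exact set of mysql-query_digests_* variables depends on your version, so check global_variables for what your build supports):

```python
# Tune two digest-related variables from the admin interface and persist them.
# Variable names used here (mysql-query_digests_no_digits,
# mysql-query_digests_max_digest_length) exist in recent versions; list the
# full set with:
#   SELECT * FROM global_variables WHERE variable_name LIKE 'mysql-query_digests%';
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=6032,
                       user="admin", password="admin")
with conn.cursor() as cur:
    # Strip digits from digest text so unique identifiers collapse together.
    cur.execute("UPDATE global_variables SET variable_value='true' "
                "WHERE variable_name='mysql-query_digests_no_digits'")
    # Cap how much of each query is kept in the stored digest text.
    cur.execute("UPDATE global_variables SET variable_value='1024' "
                "WHERE variable_name='mysql-query_digests_max_digest_length'")
    cur.execute("LOAD MYSQL VARIABLES TO RUNTIME")
    cur.execute("SAVE MYSQL VARIABLES TO DISK")
conn.close()
```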

Also, in ProxySQL 2.5.2 there is an improvement that makes point 5 (the generation of stats_mysql_query_digest) almost non-blocking, using an adaptive algorithm that blocks the query digest table only for a very short time.
And again in ProxySQL 2.5.2, TRUNCATE on stats_mysql_query_digest is a non-blocking operation.
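
If periodically clearing the stats is acceptable, a minimal mitigation sketch (assuming pymysql and the default admin endpoint; the interval is arbitrary, and on versions before 2.5.2 the TRUNCATE itself briefly blocks the table, so schedule it during quiet periods):

```python
# Periodically truncate stats_mysql_query_digest so the in-memory digest
# table stays small. Interval and credentials are placeholders.
import time
import pymysql

while True:
    conn = pymysql.connect(host="127.0.0.1", port=6032,
                           user="admin", password="admin")
    with conn.cursor() as cur:
        # "SELECT * FROM stats_mysql_query_digest_reset" also reads and
        # clears the in-memory data, if you want to archive it first.
        cur.execute("TRUNCATE TABLE stats_mysql_query_digest")
    conn.close()
    time.sleep(3600)  # once per hour, for example
```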

As a final note, in the latest ProxySQL the 5 operations on the query digest table, ordered by blocking time, should be:

  1. TRUNCATE
  2. lookup of a single entry (to then update its counters)
  3. insert of a new entry
  4. traverse of the full table to generate stats_mysql_query_digest table on SQLite
  5. traverse of the full table to generate memory information (query_digest_memory)


flopex commented Apr 29, 2024

Thank you for the very detailed response; it all makes more sense now. We'll look into testing these further to find the root cause, but the Prometheus exporter does sound like an interesting candidate: we do run it, and it could explain the predictable timings.


flopex commented May 8, 2024

Finally had some time to test this in a dev environment. I can confirm that it was the proxysql_exporter scrape interval that was causing the stalls.

Series of 5000 tests with no spikes after disabling the exporter.
[screenshot: per-request timings with no spikes after disabling the exporter]

I was able to reproduce the stalls by disabling query_digests_no_digits, running a bunch of bogus SAVEPOINTs with unique identifiers, and watching query_digest_memory grow in size (roughly as sketched below).
[screenshot: query_digest_memory growing during the SAVEPOINT reproduction]
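
The load generator was roughly equivalent to this sketch (endpoint and credentials are placeholders; with mysql-query_digests_no_digits disabled, each unique SAVEPOINT name stays in the digest text, so every statement creates a new digest entry):

```python
# Generate bogus SAVEPOINT statements with unique names to bloat the query
# digest table; watch query_digest_memory grow via stats_memory_metrics.
# Endpoint and credentials are placeholders.
import uuid
import pymysql

conn = pymysql.connect(host="proxysql-host", port=6033,
                       user="app_user", password="app_pass", db="test")
with conn.cursor() as cur:
    for _ in range(500):
        cur.execute("BEGIN")
        for _ in range(1000):
            name = "sp_" + uuid.uuid4().hex  # unique name => unique digest
            cur.execute(f"SAVEPOINT {name}")
        cur.execute("ROLLBACK")  # nothing is persisted
conn.close()
```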

renecannao (Contributor) commented

Hi @flopex

> [...] a bunch of bogus SAVEPOINTs with unique identifiers [...]

I am not sure if this was only for testing purposes, or if you identified that this (a lot of SAVEPOINTs with unique identifiers) is the reason why you have such a large table.
But if that is the case, these "unique entries" should have a last_seen that is probably never updated.
This gave me an idea, described in #4543 .


flopex commented May 9, 2024

Not only for testing: we also noticed that the query digest bloat was caused by Django, similar to this other issue: https://groups.google.com/g/proxysql/c/MeCodsiqlo0

We found this to be the easiest way to fill up the stats metrics to levels similar to those seen in prod.
