
Overhead from the pg_stat_kcache extension #41

Closed

vitabaks opened this issue Jan 18, 2024 · 7 comments
@vitabaks

vitabaks commented Jan 18, 2024

Please take a look at the following results of a synthetic (read-only) pgbench test, which we ran on these servers: c3-standard-176 (176 vCPU Intel, 704 GB memory) and c3d-standard-360 (360 vCPU AMD, 1440 GB memory).

We observe degradation once the number of clients exceeds 100:

[c1|workload_pgbench] 2024-01-05 04:03:30 tps = 12177
[c50|workload_pgbench] 2024-01-05 04:09:01 tps = 565902
[c100|workload_pgbench] 2024-01-05 04:14:34 tps = 696601
[c150|workload_pgbench] 2024-01-05 04:20:07 tps = 402113
[c200|workload_pgbench] 2024-01-05 04:25:40 tps = 314192
[c250|workload_pgbench] 2024-01-05 04:31:14 tps = 289959
[c300|workload_pgbench] 2024-01-05 04:36:47 tps = 281305
[c350|workload_pgbench] 2024-01-05 04:42:20 tps = 285510
[c400|workload_pgbench] 2024-01-05 04:47:53 tps = 277419

Analyzing the wait event profile (based on pg_wait_sampling), we see the number of LWLock:pg_stat_kcache wait events grow as the client count increases, until pg_stat_kcache eventually becomes the TOP-1 event in the profile (a sketch for pulling such a profile follows the list below):

1 client:

  • No 'pg_stat_kcache' wait event

50 clients:

  • 'pg_stat_kcache' wait event is present

100 clients:

  • 'pg_stat_kcache' wait event in TOP-5

150 clients:

  • 'pg_stat_kcache' wait event is TOP-1
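
For reference, here is a minimal sketch (not part of the original report) of pulling such a top-waits profile via libpq, assuming pg_wait_sampling is installed and exposes its pg_wait_sampling_profile view; the connection string is a placeholder:

```c
/* Build: gcc top_waits.c -o top_waits -I$(pg_config --includedir) -lpq */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=postgres");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* Aggregate sampled wait events across all backends. */
    PGresult *res = PQexec(conn,
        "SELECT event_type, event, sum(count) AS samples "
        "  FROM pg_wait_sampling_profile "
        " WHERE event IS NOT NULL "
        " GROUP BY event_type, event "
        " ORDER BY samples DESC LIMIT 10");

    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }

    for (int i = 0; i < PQntuples(res); i++)
        printf("%-10s %-35s %s\n",
               PQgetvalue(res, i, 0),    /* event_type, e.g. LWLock    */
               PQgetvalue(res, i, 1),    /* event, e.g. pg_stat_kcache */
               PQgetvalue(res, i, 2));   /* number of samples          */

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```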

In the attachment you will find artifacts including settings, postgres stats, logs, and more:

@NikolayS

NikolayS commented Jan 18, 2024

Thanks @vitabaks for posting! Worth noting that this comes from our (postgres.ai) new bot activities; it's built on top of GPT-4 Turbo with lots of additional components: https://twitter.com/samokhvalov/status/1743151620555477083

@anayrat raised a couple of questions in https://twitter.com/Adrien_nayrat/status/1744288348217151991

The whole pipeline is here; we collect 70+ artifacts for each iteration, browsable here (or just see the .zip provided above, it's the same). The non-AI part of the automation is here. We can quickly reproduce things if additional checks are needed, but it should be straightforward on any machine.

It's also interesting that pgss (pg_stat_statements) demonstrates noticeable overhead at this scale for trivial pgbench workloads as well: https://twitter.com/postgres_ai/status/1747690825709215793. Obviously, this is contention when updating the stats for a single query record. But the pgss overhead is much, much lower than pgsk's, so the question is: why such a significant difference?
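
For context, a simplified sketch (hypothetical names, not the actual pg_stat_statements source) of the counter-update pattern commonly described for pgss: the hash table is locked in shared mode, and each entry carries its own spinlock that is held only for a handful of increments, which may be why its contention stays lower than a single exclusive LWLock would:

```c
#include "postgres.h"
#include "storage/spin.h"

/* Hypothetical, simplified stand-in for a per-query stats entry. */
typedef struct StatsEntry
{
    int64   calls;              /* number of executions           */
    double  total_exec_time;    /* cumulative execution time, ms  */
    slock_t mutex;              /* protects only the fields above */
} StatsEntry;

/*
 * Called once per query execution. The hash lookup that found `entry`
 * is assumed to run under a shared (not exclusive) LWLock, so backends
 * only serialize here, on the per-entry spinlock, and only when they
 * execute the same query: exactly the single-record pgbench case
 * discussed above.
 */
static void
update_entry_counters(StatsEntry *entry, double exec_time)
{
    volatile StatsEntry *e = entry;

    SpinLockAcquire(&e->mutex);     /* held only for a few increments */
    e->calls += 1;
    e->total_exec_time += exec_time;
    SpinLockRelease(&e->mutex);
}
```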

@vitabaks vitabaks changed the title Overhead from the pg_kstat_cache extension Overhead from the pg_stat_kcache extension Jan 18, 2024
@rjuju rjuju self-assigned this Jan 19, 2024
@rjuju
Member

rjuju commented Jan 19, 2024

Hi @vitabaks

I'm assuming the bottleneck comes from the internal lock that protects the array where we store the queryid for each backend, in case parallel workers are launched. That lock was initially added as a precaution, but there shouldn't be any risk of concurrent modification while reading the value, so I don't think it's necessary. The lock should have been harmless, but with a high client count I can indeed see how it would affect performance.
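
To illustrate the idea, a minimal sketch with hypothetical names (not the actual pg_stat_kcache code): before the change, every set of a backend's queryid went through an exclusive LWLock; the branch turns that into a plain store into the slot:

```c
#include "postgres.h"
#include "storage/lwlock.h"

/*
 * Hypothetical, simplified version of the shared structure discussed
 * above: one queryid slot per backend, readable by parallel workers.
 */
typedef struct SharedQueryIds
{
    LWLock *lock;                               /* the lock being removed */
    uint64  queryids[FLEXIBLE_ARRAY_MEMBER];    /* one slot per backend   */
} SharedQueryIds;

static SharedQueryIds *shared;

/* Before: every executor hook invocation serializes on one LWLock. */
static void
set_queryid_locked(int backend_id, uint64 queryid)
{
    LWLockAcquire(shared->lock, LW_EXCLUSIVE);
    shared->queryids[backend_id] = queryid;
    LWLockRelease(shared->lock);
}

/*
 * After: a plain store. Only the owning backend writes its own slot,
 * and, per the reasoning above, there is no risk of concurrent
 * modification while readers fetch the value, so the lock is dropped.
 */
static void
set_queryid_lockless(int backend_id, uint64 queryid)
{
    shared->queryids[backend_id] = queryid;
}
```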

Can you try with the "remove_queryids_lock" branch that I just pushed? https://github.com/powa-team/pg_stat_kcache/tree/remove_queryids_lock

@vitabaks
Author

vitabaks commented Jan 19, 2024

> Can you try with the "remove_queryids_lock" branch that I just pushed? https://github.com/powa-team/pg_stat_kcache/tree/remove_queryids_lock

@rjuju Thanks for the quick response.

I tested your patch (on c3d-standard-360); here is the result:

Without the patch:

[c200_kcache_release|workload_pgbench] 2024-01-19 09:12:46 tps = 358295
  • 'pg_stat_kcache' wait event is TOP-1

With the patch:

[c200_kcache_remove_queryids_lock|workload_pgbench] 2024-01-19 09:23:11 tps = 976167
  • No 'pg_stat_kcache' wait event
  • 'Timeout:SpinDelay' remains, but (as it turned out) this relates to the pg_stat_statements extension.

Artifacts: https://gitlab.com/postgres-ai/postgresql-consulting/tests-and-benchmarks/-/jobs/5965632917/artifacts/browse/ARTIFACTS/

Conclusions

By eliminating LWLock:pg_stat_kcache, we achieved a performance increase of more than 2.7x (976167 / 358295 ≈ 2.72). In other words, the patched run shows a 172.5% increase in TPS over the unpatched one, well over double the transaction throughput.

@rjuju
Member

rjuju commented Jan 19, 2024

@vitabaks thanks for the testing! And it's great news that removing the lock is enough to eliminate the overhead.

I just merged the commit into the main branch. I'll wait a bit, just in case, and do a release early next week.

@anayrat
Member

anayrat commented Jan 19, 2024

Thanks, guys!

@vitabaks
Author

Thanks!

@rjuju
Member

rjuju commented Jan 24, 2024

I just released version 2.2.3! Thanks again for the report and testing the patch!

@rjuju rjuju closed this as completed Jan 24, 2024