Expose LSN and replication delay as metrics #7610

Merged May 8, 2024 (4 commits)

save-buffer
Contributor

Problem

We currently have no way to see what the current LSN of a compute is, and in the case of read replicas, we don't know what the difference in LSNs is.
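To illustrate what "the difference in LSNs" means, here is a minimal sketch, not code from this PR. It assumes the usual PostgreSQL textual LSN format: two hexadecimal numbers separated by a slash, giving the high and low 32 bits of a 64-bit WAL byte position. The helper names `parse_lsn` and `lsn_lag_bytes` are hypothetical. In practice the two values would presumably come from something like `pg_current_wal_lsn()` on the primary and `pg_last_wal_replay_lsn()` on the replica.

```python
def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' into a 64-bit byte position.

    Assumes the standard 'HIGH/LOW' hexadecimal format printed by PostgreSQL.
    """
    high, low = text.strip().split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Return how many bytes of WAL the replica is behind the primary.

    Clamped at 0 so a replica that is caught up never reports negative lag.
    """
    return max(parse_lsn(primary_lsn) - parse_lsn(replica_lsn), 0)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from a real compute.
    print(lsn_lag_bytes("0/3000060", "0/3000000"))
```

Exposing the LSN as a plain number rather than as the `HIGH/LOW` string is what would let a metrics backend subtract one series from another.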

Summary of changes

Adds these metrics


github-actions bot commented May 3, 2024

2886 tests run: 2759 passed, 0 failed, 127 skipped (full report)


Flaky tests (3)

Postgres 16

  • test_gc_aggressive: debug
  • test_vm_bit_clear_on_heap_lock: debug

Postgres 14

Code coverage* (full report)

  • functions: 31.4% (6241 of 19881 functions)
  • lines: 47.1% (46742 of 99293 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
057672c at 2024-05-06T22:59:58.713Z

@hlinnaka
Contributor

hlinnaka commented May 5, 2024

What is the difference between the neon_collector and neon_collector_autoscaling sections in the yaml file? Which metrics are supposed to go in which?

@hlinnaka
Contributor

hlinnaka commented May 5, 2024

> What is the difference between the neon_collector and neon_collector_autoscaling sections in the yaml file? Which metrics are supposed to go in which?

Ok, found the explanation in the commit message of commit 4b55dad:

> As discussed in neondatabase/autoscaling#895, we want to have a separate sql_exporter for simple metrics to avoid overloading the database because the autoscaling agent needs to scrape at a higher interval. The new exporter is exposed at port 9499.

So if I understand correctly, the point of the "autoscaling" metrics is to expose metrics that autoscaling will use to make scaling decisions. The replication lag metrics don't seem necessary for that. So I think these new metrics should only be added to the neon_collector metrics, not neon_collector_autoscaling.

@save-buffer
Contributor Author

> So I think these new metrics should only be added to the neon_collector metrics, not neon_collector_autoscaling.

Got it, good catch! I just kind of assumed they were there because we for some reason had different specs for pods and vms. Just removed it.

@andreasscherbaum andreasscherbaum added the c/storage/compute Component: storage: compute label May 8, 2024
@save-buffer save-buffer merged commit 21e1a49 into main May 8, 2024
56 checks passed
@save-buffer save-buffer deleted the sasha_compute_metrics branch May 8, 2024 15:49
a-masterov pushed a commit that referenced this pull request May 20, 2024
## Problem
We currently have no way to see what the current LSN of a compute is,
and in case of read replicas, we don't know what the difference in LSNs
is.

## Summary of changes
Adds these metrics