feat(server): OpenTelemetry integration #7356
Latest commit: 90166eb
Status: ✅ Deploy successful!
Preview URL: https://43e1673d.immich.pages.dev
Branch Preview URL: https://feat-server-otlp.immich.pages.dev
Would it make sense to have this be something that is turned on or off from the administration settings? Can it be controlled at runtime?
To me this is something that is technical enough (also requiring deployment of other tools in the environment and such) that enabling it through an env var seems more logical.
Awesome stuff! Really looking forward to this
What other tools are required? You basically just need to point a pre-existing Prometheus instance at the /metrics endpoint.
It looks like the functionality I'm looking for to turn spans into metrics comes with the Span Metrics Connector. But that would mean pushing to an OpenTelemetry Collector and integrating that with Prometheus. Hmm.
Alright, after some wrestling I finally managed to export the execution time metrics to Prometheus. It uses histograms for metrics rather than spans/traces.
Also, anything returning a Promise has now been made async because this is the only way for the decorator to know it should await the call.
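To illustrate why the async keyword matters here, the following is a minimal sketch and not the code from this PR. It assumes a hypothetical in-memory `DurationHistogram` in place of a real OpenTelemetry histogram, and a hypothetical `timed` wrapper that stands in for the decorator. The wrapper inspects the function's constructor name to decide whether to await it. A plain function that merely returns a Promise would look synchronous to this check, so its duration would be recorded before the Promise settles. The check also assumes the code is compiled to a target that keeps native async functions (ES2017 or later).

```typescript
// Hypothetical in-memory histogram; a real setup would use an OpenTelemetry meter.
class DurationHistogram {
  count = 0;
  sum = 0;
  readonly boundsMs: number[];
  readonly bucketCounts: number[];

  constructor(boundsMs: number[]) {
    this.boundsMs = boundsMs;
    // One bucket per upper bound, plus a final overflow bucket.
    this.bucketCounts = new Array<number>(boundsMs.length + 1).fill(0);
  }

  record(ms: number): void {
    this.count += 1;
    this.sum += ms;
    const i = this.boundsMs.findIndex((bound) => ms <= bound);
    this.bucketCounts[i === -1 ? this.boundsMs.length : i] += 1;
  }
}

type AnyFn = (...args: any[]) => any;

// Only functions declared with `async` report this constructor name.
function isAsyncFunction(fn: AnyFn): boolean {
  return fn.constructor.name === 'AsyncFunction';
}

// Hypothetical wrapper: records the execution time of `fn` into `histogram`.
function timed<F extends AnyFn>(fn: F, histogram: DurationHistogram): F {
  if (isAsyncFunction(fn)) {
    const wrappedAsync = async function (this: unknown, ...args: any[]) {
      const start = Date.now();
      try {
        // Awaiting here means the recorded time covers the whole operation.
        return await fn.apply(this, args);
      } finally {
        histogram.record(Date.now() - start);
      }
    };
    return wrappedAsync as unknown as F;
  }

  const wrappedSync = function (this: unknown, ...args: any[]) {
    const start = Date.now();
    try {
      return fn.apply(this, args);
    } finally {
      histogram.record(Date.now() - start);
    }
  };
  return wrappedSync as unknown as F;
}

// Usage: the names below are made up for illustration.
const assetHistogram = new DurationHistogram([5, 50, 500]);
const getAssets = timed(async (ownerId: string) => [`asset-of-${ownerId}`], assetHistogram);
getAssets('user-1').then((assets) => console.log(assets, assetHistogram.count));
```

An alternative would be to check at call time whether the return value is thenable, which avoids relying on the async keyword at the cost of a check on every call.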
The SDK can be turned on or off, but it also has a bunch of env variables built into it already so any configuration we add is kinda redundant.
Great stuff!
Mert, please feel free to merge this after resolving the conflicts.
I just like being able to turn it on without having to restart the instance. I'm thinking of a troubleshooting situation where I turn on the SDK from the Admin Settings, then go to the problematic view, generate the monitoring data, then go back and turn it off. It just seems like it could be more convenient than having to set an env var and restart it, but that's just my opinion at least.
fix typing
formatting
Co-authored-by: Daniel Dietzler <36593685+danieldietzler@users.noreply.github.com>
remove prometheus data
disable nestjs-otel stuff by default update imports
formatting formatting
That makes sense - we can possibly integrate this into the config repo in a later PR. But I think there are some questions to answer around how the SDK actually handles reconfiguration and restarts, whether it's okay to always wrap repo methods with the instrumentation decorator so it can be turned on or off dynamically, etc.
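As a rough sketch of the "always wrap, toggle dynamically" idea raised above, and not something this PR implements: the wrapper is applied unconditionally, but it consults a flag on every call and falls through to the original function when telemetry is off. The `telemetry` object, the `recorded` array and the `instrument` helper are all hypothetical names, and the example only covers synchronous functions to stay short.

```typescript
// Hypothetical runtime switch; in practice this could be driven by the system config.
const telemetry = { enabled: false };

// Hypothetical sink for measurements, standing in for a real metrics exporter.
const recorded: { name: string; ms: number }[] = [];

// Always wraps `fn`, but only measures when the flag is on at call time.
function instrument<A extends unknown[], R>(name: string, fn: (...args: A) => R): (...args: A) => R {
  return (...args: A): R => {
    if (!telemetry.enabled) {
      // Disabled path: one boolean check, then the original call.
      return fn(...args);
    }
    const start = Date.now();
    try {
      return fn(...args);
    } finally {
      recorded.push({ name, ms: Date.now() - start });
    }
  };
}

// Usage: the wrapped function is created once and reused across toggles.
const countAssets = instrument('asset.count', (ids: string[]) => ids.length);

countAssets(['a', 'b']); // nothing recorded while disabled
telemetry.enabled = true;
countAssets(['a', 'b', 'c']); // recorded while enabled
telemetry.enabled = false;
```

The cost of the disabled path is a single flag check per call, which is the trade-off to weigh against wrapping methods only when the SDK is enabled at startup. How the SDK itself behaves when reconfigured at runtime is a separate question this sketch does not address.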
Description
This PR greatly simplifies profiling performance issues by using OpenTelemetry. The current state of the PR is the addition of a /metrics endpoint for Prometheus that includes duration, count and sum metrics for repository methods and HTTP requests as well as host metrics. Additionally, instrumentation for Nest, Postgres and Redis is available if an OTLP exporter is configured through env variables.
To do: