New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFE/Question: Using unique worker ID inplace of pidhash
label
#76
Comments
@amitsaha I agree that could, in theory, lead to a metric explosion. In practice, we don't have enough churn for this to occur. For this discussion, I assume you refer to the process ID when you say "worker id". I tried to recall the rationale behind Suggestions for other solutions to this problem are welcome! |
Hi @estahn thanks for getting back on this. Regarding the worker ID, I meant something like this. Each worker process irrespective of it's process ID is one of the N workers at any given point of time. So, if we assign a worker ID (like, 1, 2...N) to each worker, we can use that as an identifier. Then, we have a stable and a finite set of unique label values. However, as you mention that idea breaks for counter metrics. It would be fine for non-counter metrics of course since we are dealing with observations. However, may be we don't need counter metrics since prometheus gives us the count of each summary/histogram metrics for example automatically. What do you think? |
@amitsaha I see what you mean. In this case, how would you determine the number of killed/restarted processes over time? Is there a way to count the counter reset? |
@amitsaha After thinking about this a little another question came to mind. Assuming we implement this solution, how would you assign a worker ID (or index)? I see quite a couple of issues with keeping track of these indices, e.g.
If the process "Process ID 70" gets killed and a new process is forked (e.g. Process ID 130) would that be Index 2 or something else? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
* fix: high cardanality of pid_hash fix (#110) issue, by removing pid_hash and replace by slot label instead. fixed like prometheus best practices want see https://prometheus.io/docs/practices/naming/#labels * fix: change slot by child label fix (#110) issue, by removing pid_hash and replace by child label instead. fixed like prometheus best practices want see https://prometheus.io/docs/practices/naming/#labels * fix (#110) issue, remove "child" prefix from label "child" value. fixes #110 #76 @all-contributors please add itcsoft54 for code
* fix: high cardanality of pid_hash fix (#110) issue, by removing pid_hash and replace by slot label instead. fixed like prometheus best practices want see https://prometheus.io/docs/practices/naming/#labels * fix: change slot by child label fix (#110) issue, by removing pid_hash and replace by child label instead. fixed like prometheus best practices want see https://prometheus.io/docs/practices/naming/#labels * fix (#110) issue, remove "child" prefix from label "child" value. fixes #110 #76 @all-contributors please add itcsoft54 for code
Hi, thank you for the project. We can see that in #8, we started using
pid_hash
derived from PID and start time of the process. This will lead to a prometheus metric explosion (as per the caution in https://prometheus.io/docs/practices/naming/#labels). Could we instead use something like a "worker id" as a label instead? That way, we only have a finite number of label values. Happy to contribute a PR as well.Additional labels can be used to distinguish between multiple different instances of an application or across applications.
The text was updated successfully, but these errors were encountered: