Confusion in the documentation: How to store labels with many different values? #1015

Closed
xiegeo opened this Issue Aug 20, 2015 · 9 comments

xiegeo commented Aug 20, 2015

In http://prometheus.io/docs/practices/naming/, there is a large warning:

> CAUTION: Remember that every unique key-value label pair represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

But if not labels, what should I use?
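To make the quoted caution concrete, here is a minimal sketch of how unique label combinations turn into separate time series. It is plain Python, not the Prometheus client API; the `distinct_series` helper, the label names, and the 1,000 synthetic requests are all assumptions made up for illustration.

```python
def distinct_series(samples):
    """Return the set of unique label combinations seen.

    Each unique combination of label values corresponds to one time series.
    """
    return {tuple(sorted(labels.items())) for labels in samples}


# Hypothetical data: 1,000 requests, each from a different user, on 2 paths.
requests = [
    {"path": "/a" if i % 2 else "/b", "user_id": str(i)}
    for i in range(1000)
]

# Labelling by path only keeps the series count bounded.
by_path = distinct_series({"path": r["path"]} for r in requests)

# Adding user_id as a label creates one series per user.
by_path_and_user = distinct_series(requests)

print(len(by_path))           # 2
print(len(by_path_and_user))  # 1000
```

The bounded label stays at two series no matter how many requests arrive, while the `user_id` label grows with every new user.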

Member

brian-brazil commented Aug 20, 2015

http://prometheus.io/docs/practices/instrumentation/#do-not-overuse-labels

You should use a general purpose processing system, such as Hadoop or Spark. Prometheus is intended as a real-time monitoring system that you can easily run and depend on in an emergency. High cardinality metrics tend to only grow and require work disproportionate to the benefit they bring you in such a system, and dealing with them in a less-realtime fashion is more appropriate.

Member

juliusv commented Aug 20, 2015

@xiegeo How large is the cardinality of your intended label values? What matters most in the end is the total cardinality of what you're selecting in queries...

Author

xiegeo commented Aug 20, 2015

For my current project the values are going to be small. I am mostly concerned with doing things the correct way, so that the tools can present and analyze my data without too much modification, and with treating this as a learning experience.

If I understand correctly, Prometheus is for someone who already has a Hadoop cluster or similar to do the number crunching for the business side and uses Prometheus to monitor such clusters. My problem is different but the solution can be similar, so I want to give Prometheus a try.

I have 2 servers for redundancy and locality. I will put the Prometheus client on them since anything interesting the users do will go through them. The two servers run independently of each other. I want to build a dashboard that shows what is going on on both servers using Prometheus.

I only have around 10 users (associates in a highly specialized field, so it will never get large). I also need to know where they are connecting from (IP addresses in the 100s, preferably also allowing me to follow each TCP connection) and what projects (10s per user) they are working on. Other events (errors, commands of different types, bandwidth usage, network delay) should be easy to add when needed.
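A rough worst-case estimate for the setup described above can be sketched as follows. The exact figures (300 source IPs, 20 projects per user, 5 metric names) are assumptions standing in for "100s", "10s", and the listed event types; the real number of series would likely be lower, since not every user connects from every IP.

```python
from math import prod


def worst_case_series(dimensions):
    """Upper bound on series per metric: product of distinct values per label."""
    return prod(dimensions.values())


# Assumed figures based on the description: 2 servers, ~10 users,
# "10s" of projects per user, "100s" of source IP addresses.
without_ip = {"server": 2, "user": 10, "projects_per_user": 20}
with_ip = dict(without_ip, source_ip=300)

metric_names = 5  # assumed: errors, commands, bandwidth, delay, plus one more

print(worst_case_series(without_ip))              # 400
print(worst_case_series(with_ip))                 # 120000
print(worst_case_series(with_ip) * metric_names)  # 600000
```

A per-connection label is not included because TCP connections are unbounded: each new connection would add series for as long as the servers run.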

Member

brian-brazil commented Aug 21, 2015

I think you're going to be near the edge of what a single Prometheus server can handle, and I wouldn't track individual TCP connections. If you're looking for network flow tracking, there are tools out there designed for that purpose. More generally, I'd recommend something like the ELK stack, as this sounds more like an event logging use case than a timeseries monitoring use case.

Author

xiegeo commented Aug 21, 2015

So there is no way to put unique metadata per event in Prometheus? I am open to writing something myself that can take advantage of the Go process monitoring, data collection and some of the graphing features of Prometheus, without the weight of a large mammal sized stack.

Member

brian-brazil commented Aug 22, 2015

> So there is no way to put unique metadata per event in Prometheus?

No, Prometheus isn't an event logging database. Have you considered InfluxDB?
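The difference between event logging and time series monitoring can be illustrated with a small sketch. This is plain Python, not Prometheus or InfluxDB code; the event fields, the `conn_id` values, and the `to_counters` helper are hypothetical. An event store keeps every record with its unique metadata, while a metrics-style view keeps only counts per bounded label combination.

```python
from collections import Counter

# Hypothetical events, each carrying unique per-event metadata (conn_id).
events = [
    {"user": "alice", "kind": "error", "conn_id": "c-1001"},
    {"user": "alice", "kind": "command", "conn_id": "c-1001"},
    {"user": "bob", "kind": "error", "conn_id": "c-1002"},
    {"user": "alice", "kind": "error", "conn_id": "c-1003"},
]

# Only labels with a small, bounded set of values are kept.
BOUNDED_LABELS = ("user", "kind")


def to_counters(events):
    """Aggregate events into counts per bounded label combination.

    Unique fields such as conn_id are dropped, so the number of
    counters does not grow with the number of events.
    """
    counts = Counter()
    for event in events:
        counts[tuple(event[label] for label in BOUNDED_LABELS)] += 1
    return counts


counters = to_counters(events)
print(counters[("alice", "error")])  # 2
print(len(counters))                 # 3
```

After aggregation there is no way to ask which connection produced a given error; answering that kind of question needs the per-event records.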

Author

xiegeo commented Aug 22, 2015

No, I did not. In my preliminary searches I dismissed InfluxDB as a cloud database provider. Now that you have prompted me to take a better look, it does look like what I need.

Member

brian-brazil commented Aug 23, 2015

Great, glad that we could help you.


lock bot commented Mar 24, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Mar 24, 2019
