Questions about performance query and storage cost #2345
Replies: 1 comment 5 replies
-
The answers will depend on your setup, but I might be able to give you some hints (hopefully someone will correct me if I'm wrong). First off, there are some capacity planning tips here: https://grafana.com/docs/mimir/v2.1.x/operators-guide/running-production-environment/planning-capacity/

On storage: both Mimir and Thanos store data in the Prometheus TSDB (v2) format, unlike solutions such as VictoriaMetrics, so there is no additional compression beyond what that format already provides. There is, however, a compactor component (https://grafana.com/docs/mimir/v2.1.x/operators-guide/architecture/components/compactor/) that makes the long-term storage (LTS) more efficient. If you're already running Prometheus, you can get a rough estimate by scaling your current disk usage to your desired retention period: for example, if you're storing 15 days in Prometheus right now and you want a year, multiplying accordingly gives you a worst-case figure (see the first sketch below).

As for query performance against the LTS, you can configure how far back queries should reach into it. In other words, queries hit the ingesters first, and the cut-over values are configurable: if you find most of your queries cover the last 8 hours, those can easily be satisfied by the ingesters, which have their own persistent storage, or by Memcached with the query-frontend, depending on cost concerns (see the second sketch below). I believe Grafana found most of their own queries were for under 2 hours of data. The query-frontend can also cache query results; I don't think Thanos offers this, as Thanos can cache metadata but not query results.

There's also a tool for comparing query performance, query-tee (https://grafana.com/docs/mimir/v2.1.x/operators-guide/tools/query-tee/), so if you do set up a PoC you can confirm your findings.
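To make that worst-case estimate concrete, here is a minimal back-of-the-envelope sketch. All of the numbers are placeholders to replace with your own measurements, and the treatment of replication is a simplifying assumption of mine rather than anything taken from the Mimir docs:

```python
# Rough, worst-case storage estimate for long-term retention, based on the
# "scale what Prometheus uses today" approach described above.
# All inputs are placeholders; replace them with your own measurements.

current_tsdb_gib = 120          # size of your Prometheus TSDB today (GiB)
current_retention_days = 15     # retention configured on that Prometheus
target_retention_days = 365     # retention you want in long-term storage
replication_factor = 3          # assumed ingester replication factor

# Worst case: linear scaling of today's usage to the target retention.
daily_gib = current_tsdb_gib / current_retention_days
worst_case_gib = daily_gib * target_retention_days

# Rough model only: compacted blocks in object storage are deduplicated
# across replicas, so the LTS figure is not multiplied by the replication
# factor here; only the ingesters' local disks scale with it.
ingester_local_gib = current_tsdb_gib * replication_factor

print(f"~{daily_gib:.1f} GiB/day derived from current usage")
print(f"~{worst_case_gib:.0f} GiB object storage for {target_retention_days} days (worst case)")
print(f"~{ingester_local_gib:.0f} GiB total ingester local disk (replicated)")
```

Treat the object-storage figure as an upper bound; the compactor merges and deduplicates blocks, so the real footprint is typically smaller. The capacity planning page linked above is the better reference for sizing the individual components.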
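And to illustrate the ingester/LTS split described above, here is a simplified sketch of how a cut-over window decides where a range query is served from. The flag names in the comments follow the Mimir documentation (`-querier.query-store-after`, `-querier.query-ingesters-within`), but the routing logic itself is my own simplification, not Mimir's actual implementation:

```python
# Simplified illustration (not Mimir's code) of how a cut-over window splits
# a query between ingesters (recent data) and long-term object storage.
from datetime import datetime, timedelta, timezone

# Assumed settings, roughly corresponding to -querier.query-store-after
# and -querier.query-ingesters-within in the Mimir docs.
QUERY_STORE_AFTER = timedelta(hours=12)       # older than this: read from object store
QUERY_INGESTERS_WITHIN = timedelta(hours=13)  # newer than this: read from ingesters

def plan_query(start: datetime, end: datetime, now: datetime) -> list[str]:
    """Return which backends a [start, end] range query would touch."""
    backends = []
    if end > now - QUERY_INGESTERS_WITHIN:
        backends.append("ingesters")
    if start < now - QUERY_STORE_AFTER:
        backends.append("store-gateways (object storage)")
    return backends

now = datetime.now(timezone.utc)
print(plan_query(now - timedelta(hours=8), now, now))   # ['ingesters']
print(plan_query(now - timedelta(days=30), now, now))   # ['ingesters', 'store-gateways (object storage)']
```

The point is simply that queries covering only recent hours never need to touch the object store at all, which is why tuning these windows to your actual query patterns matters for both performance and cost.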
-
Hello
In my organization, we are evaluating solutions for long-term storage and a global view across our Prometheus infrastructure. Among them, we are looking closely at Mimir and Thanos.
There are two major questions we don't yet have the answers to:
Are there any calculation rules available to estimate the amount of disk needed to store historical data for a given retention period, based on the number of samples ingested into Prometheus? Does Mimir apply any compression to the data?
Regarding performance, are there any benchmarks or reports evaluating the execution time of Mimir queries against data in long-term storage?
Without these elements, it is difficult to know if Mimir (or Thanos) is a good candidate for our needs.
Best regards