Reduce output size of metrics endpoint #152

Closed · mikelorant opened this issue Nov 10, 2023 · 3 comments · Fixed by #153

@mikelorant (Contributor)
### Problem

Currently, when collecting stats for 201 services after running the exporter for 13 days with 12 shards, the metrics endpoint output size is as follows:

| Shard | Services | Payload (KB) |
|-------|----------|--------------|
| 1     | 18       | 30,792       |
| 2     | 25       | 55,378       |
| 3     | 16       | 40,243       |
| 4     | 15       | 29,123       |
| 5     | 21       | 34,345       |
| 6     | 22       | 40,100       |
| 7     | 10       | 19,948       |
| 8     | 19       | 47,790       |
| 9     | 11       | 20,234       |
| 10    | 15       | 40,499       |
| 11    | 19       | 37,366       |
| 12    | 19       | 29,092       |
| Total | 210      | 424,910      |

With a scrape interval of 60 seconds, the bandwidth requirement becomes 424,910 KB / 60 s ≈ 7,082 KB/s. In terms of storage, 60 scrapes per hour over 24 hours is 1,440 scrapes per day, so the raw data amounts to 424,910 KB × 60 × 24 ≈ 584 GB per day.
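For reference, a minimal sketch in plain Go (nothing exporter-specific; the constants are simply the figures above) that reproduces the arithmetic:

```go
package main

import "fmt"

func main() {
	payloadKB := 424910.0 // total payload per scrape across all shards
	intervalSec := 60.0   // Prometheus scrape interval

	rateKBps := payloadKB / intervalSec                  // sustained bandwidth
	scrapesPerDay := (60.0 / intervalSec) * 60 * 24      // 1,440 scrapes/day
	dailyGB := payloadKB * scrapesPerDay / (1024 * 1024) // KB -> GB (binary)

	fmt.Printf("rate: %.0f KB/s, daily: %.0f GB\n", rateKBps, dailyGB)
	// rate: 7082 KB/s, daily: 584 GB
}
```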

A payload this large can have a considerable impact on Prometheus scraping performance.

### Proposal

Currently, each datacenter is a label, which multiplies the number of series for every metric. When combined with a metric that also carries a status_code label, the number of series returned can explode.

A possible solution to reduce the output size of the metrics endpoint would be to aggregate across datacenters.
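As a rough illustration of the idea, the sketch below drops the datacenter label from a client_golang CounterVec so that deltas from every POP fold into one series. The metric name, label names, and `record` helper are assumptions for the sketch, not the exporter's actual schema:

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Per-datacenter: one series per (service, datacenter, status_code) tuple.
// Dozens of datacenters multiplied by dozens of status codes can mean
// thousands of series per service for this one metric alone.
var perDC = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "fastly_rt_requests_total",
		Help: "Total requests, partitioned by datacenter.",
	},
	[]string{"service_id", "datacenter", "status_code"},
)

// Aggregated: the datacenter label is dropped, so deltas from every POP
// accumulate into a single series per (service, status_code) pair.
var aggregated = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "fastly_rt_requests_total",
		Help: "Total requests, aggregated across datacenters.",
	},
	[]string{"service_id", "status_code"},
)

// record folds one datacenter's delta into the aggregated counter;
// the datacenter argument is deliberately ignored.
func record(serviceID, datacenter, statusCode string, delta float64) {
	aggregated.WithLabelValues(serviceID, statusCode).Add(delta)
}

func main() {
	prometheus.MustRegister(aggregated) // only the aggregated variant is exposed
	record("example-service-id", "SYD", "200", 42)
	record("example-service-id", "MEL", "200", 17) // lands in the same series
}
```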

Analysis of how this might impact the output size for the earlier example is as follows:

| Shard | Services | Payload (KB) |
|-------|----------|--------------|
| 1     | 17       | 645          |
| 2     | 25       | 934          |
| 3     | 16       | 607          |
| 4     | 14       | 531          |
| 5     | 21       | 796          |
| 6     | 21       | 786          |
| 7     | 10       | 394          |
| 8     | 18       | 686          |
| 9     | 10       | 398          |
| 10    | 15       | 582          |
| 11    | 19       | 718          |
| 12    | 15       | 569          |
| Total | 201      | 7,646        |

With a scrape interval of 60 seconds, the bandwidth requirement becomes 7,646 KB / 60 s ≈ 127 KB/s. In terms of storage, at the same 1,440 scrapes per day, this is 7,646 KB × 60 × 24 ≈ 11 GB of raw data per day.

Comparing these results against individual datacenter metrics shows the following improvement:

| Datacenter | Payload (KB) | Rate (KB/s) | Storage (GB/day) | Reduction |
|------------|--------------|-------------|------------------|-----------|
| Individual | 424,910      | 7,082       | 584              |           |
| Aggregated | 7,646        | 127         | 11               | 98%       |

A side effect of aggregating datacenter metrics is that memory consumption should also be reduced. It is hard to determine the exact impact, but there should certainly be some improvement.

### Conclusion

Aggregated datacenter metrics would give users a way to reduce the metrics endpoint output size. Providing this as an option (not the default) would let users decide whether the smaller output outweighs the loss of individual datacenter metrics; a sketch of how such an opt-in could be wired follows below.
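A purely hypothetical sketch of the opt-in; the flag name and label sets are illustrative assumptions, not the exporter's actual interface:

```go
package main

import (
	"flag"
	"fmt"
)

// Hypothetical flag; the real exporter's interface may differ.
var aggregateDatacenters = flag.Bool("aggregate-datacenters", false,
	"fold per-datacenter stats into one series per service")

// labelsFor returns the label set used when registering metrics,
// defaulting to the current per-datacenter behaviour.
func labelsFor(aggregate bool) []string {
	if aggregate {
		return []string{"service_id", "status_code"}
	}
	return []string{"service_id", "datacenter", "status_code"}
}

func main() {
	flag.Parse()
	fmt.Println("metric labels:", labelsFor(*aggregateDatacenters))
}
```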

@mikelorant (Contributor, Author)

I have worked with @matthope to create a preliminary implementation of this feature. Before I take the final steps to turn this into an open pull request, I wanted to discuss whether this is a feature that would be beneficial to add to the Fastly exporter.

The initial implementation is based on the work done by @matthope in 2020 and required some effort to bring forward to the head of the master branch.

I then took the opportunity to refine the implementation based on his feedback and our discussions.

This means the work will need to be combined into 2 commits, each attributed to the developer who wrote the code. As there are no contributing guidelines, some clarity is needed on how this work should be submitted.

The current state of the implementation is a combination of 2 stacked branches layered on top of master.

The final diff report can be viewed here:
main...fairfaxmedia:fastly-exporter:feature/aggregate-datacenter-improve

Any feedback would be greatly appreciated.

@mikelorant (Contributor, Author)

Preliminary pull request #153 created.

@mikelorant (Contributor, Author)

This pull request is being split into multiple pull requests.

The first change refactors the way labels are implemented, allowing the default label set to be changed easily; see pull request #167.
