Add additional metrics reporting size of artifacts stored #4603

Closed
Tracked by #4604
bmbouter opened this issue Oct 20, 2023 · 4 comments · Fixed by #4989
@bmbouter
Member

Feature

When running Pulp on cloud services, data storage costs can be high, so understanding them is key. It would be excellent to know the answers to these questions:

  • How many Bytes is a RepositoryVersion consuming?
  • How many Bytes are all of the RepoVersions consuming for a given Repository?
  • How many Bytes are all Repositories of a given domain consuming?
  • Will deleting old versions actually help me to reduce my storage costs?

Proposal

Have saving a RepositoryVersion emit the following OpenTelemetry data:

  • Bytes of the Artifacts that are new to this RepositoryVersion
  • Bytes of Artifacts that are removed from this RepositoryVersion
  • The Repository name and URL this RepositoryVersion is a member of
  • The Domain (if any) this Repository is a member of.
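A minimal sketch of how the added and removed byte counts could be derived when a version is saved, assuming each version is modeled as a set of artifacts identified by SHA-256 digest. `Artifact`, `version_delta_bytes`, and `version_metric` are hypothetical names, not pulpcore APIs, and the record keys are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Artifact:
    # Hypothetical model: an artifact is identified by its digest and has a size in bytes.
    sha256: str
    size: int


def version_delta_bytes(previous: set, current: set) -> tuple:
    """Return (added_bytes, removed_bytes) between two consecutive versions."""
    added = current - previous    # artifacts new to this version
    removed = previous - current  # artifacts dropped by this version
    return sum(a.size for a in added), sum(a.size for a in removed)


def version_metric(repo_name: str, repo_url: str, domain: Optional[str],
                   previous: set, current: set) -> dict:
    """Bundle the proposed data points into one record (keys are illustrative)."""
    added, removed = version_delta_bytes(previous, current)
    return {
        "added_bytes": added,
        "removed_bytes": removed,
        "repository_name": repo_name,
        "repository_url": repo_url,
        "domain": domain,  # None when the repository is not in a domain
    }


v1 = {Artifact("aaa", 100), Artifact("bbb", 50)}
v2 = {Artifact("bbb", 50), Artifact("ccc", 30)}
print(version_delta_bytes(v1, v2))  # (30, 100)
```

Set differences keep the computation independent of how many content units point at the same artifact.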

With the data above, aggregating reports (not part of this ticket) could be run in Prometheus to answer questions like:

  • How many Bytes does all the history of a specific Repository use?
  • How many Bytes does all the data in a specific domain use?
  • If I deleted N RepositoryVersions how many Bytes would I save?
@bmbouter
Member Author

bmbouter commented Nov 29, 2023

Thinking over the details, there is an accuracy issue with this plan: Artifacts that are new to a repository version may already belong to another repository version in the same domain, so they aren't really new to Pulp.

Say rpm Foo is 10MB. Repo A gets rpm Foo in version 1, and then repo B also gets rpm Foo in version 1. Both repo versions would report 10MB, so summing them suggests 20MB of storage is used, but in reality the artifact is de-duplicated and uses only 10MB. That is a problem if the goal is to sum these values and get a correct total.
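The overcount in the rpm Foo example can be reproduced in a few lines, assuming each repository version is modeled as a mapping from artifact digest to size in bytes and that storage de-duplicates artifacts by digest within a domain, as described above. `naive_total` and `deduplicated_total` are hypothetical helper names.

```python
MB = 1_000_000


def naive_total(versions: dict) -> int:
    """Sum every version's artifact sizes, counting shared artifacts repeatedly."""
    return sum(sum(arts.values()) for arts in versions.values())


def deduplicated_total(versions: dict) -> int:
    """Count each artifact digest once, as de-duplicated storage would."""
    unique = {}
    for arts in versions.values():
        unique.update(arts)  # the same digest maps to the same key, so it is stored once
    return sum(unique.values())


# Version 1 of repo A and version 1 of repo B both contain rpm Foo (10 MB).
versions = {
    "A/1": {"sha256-of-foo": 10 * MB},
    "B/1": {"sha256-of-foo": 10 * MB},
}
print(naive_total(versions) // MB, deduplicated_total(versions) // MB)  # 20 10
```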

lubosmj self-assigned this on Dec 4, 2023
@ipanova
Member

ipanova commented Dec 4, 2023

There is a desire to implement these two use cases:

  1. I operate Pulp and want a better understanding of how much each domain is costing.
  2. I operate a domain, supply my own storage, and want a better understanding of what each repo and repo version is costing.
    By seeing shared/exclusive space at the repo level and the version level:
    • I will know how much space I can get back when deleting something
    • shared-space information tells me the 'impact' it's having, even if that cost is not entirely attributable to that repo/version
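The shared/exclusive split could be sketched with a reference count per artifact, assuming each version is modeled as a mapping from artifact digest to size in bytes and that every version in the mapping belongs to the same domain. `space_report` is a hypothetical helper, not part of pulpcore.

```python
from collections import Counter


def space_report(versions: dict) -> dict:
    """Split each version's bytes into exclusive and shared space.

    versions maps a version id to {artifact digest: size in bytes}.
    An artifact is exclusive when no other version in the mapping references it.
    """
    refcount = Counter(digest for arts in versions.values() for digest in arts)
    report = {}
    for vid, arts in versions.items():
        exclusive = sum(size for d, size in arts.items() if refcount[d] == 1)
        shared = sum(size for d, size in arts.items() if refcount[d] > 1)
        report[vid] = {"exclusive": exclusive, "shared": shared}
    return report


versions = {
    "repo-a/1": {"foo": 10, "bar": 5},
    "repo-b/1": {"foo": 10},
}
print(space_report(versions))
# {'repo-a/1': {'exclusive': 5, 'shared': 10}, 'repo-b/1': {'exclusive': 0, 'shared': 10}}
```

Exclusive bytes approximate what deleting a single version would free, assuming unreferenced artifacts are later removed by orphan cleanup. Deleting several versions together can free more than the sum of their exclusive bytes, because an artifact shared only among them also becomes unreferenced.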

@bmbouter
Member Author

bmbouter commented Dec 6, 2023

@ipanova I think your summary of the discussion is good.

@ALL I'd like to add that only the Pulp operator (not a domain operator) consumes the OpenTelemetry data.

Since this ticket covers OpenTelemetry data only, I propose we report only the total storage size per domain, and nothing at the repo or repo version level.

When to report it? Ideally we'd emit the OTEL metric whenever the amount of data stored for a domain changes. That would include sync, orphan cleanup, and the pulp-content app streaming data (while also storing it); really, any operation that could change the amount of data stored for a domain.

How to report it? To me, reporting the total in Bytes as an integer, along with the domain's "name" and its "pulp_href", would be sufficient.
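A stdlib-only sketch of the proposed payload: one integer byte total per domain, tagged with the domain's name and pulp_href. It assumes the input lists each stored artifact once with its owning domain, so the totals reflect de-duplicated storage. `Domain`, `domain_storage_observations`, the attribute keys, and the href strings are hypothetical. With the OpenTelemetry Python API, each pair could be wrapped in `opentelemetry.metrics.Observation(value, attributes)` and yielded from an observable gauge callback; that wiring is left out so the sketch runs without extra packages.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Domain:
    # Hypothetical stand-in for a Pulp domain record.
    name: str
    pulp_href: str


def domain_storage_observations(stored: Iterable) -> list:
    """Aggregate (domain, artifact_size) pairs into (total_bytes, attributes) per domain.

    Each stored artifact is expected to appear once, so totals reflect
    de-duplicated storage rather than per-repository sums.
    """
    totals = {}
    for domain, size in stored:
        totals[domain] = totals.get(domain, 0) + size
    return [
        (total, {"domain_name": d.name, "domain_href": d.pulp_href})
        for d, total in totals.items()
    ]


default = Domain("default", "hypothetical-href/default/")
team = Domain("team-a", "hypothetical-href/team-a/")
obs = domain_storage_observations([(default, 100), (team, 7), (default, 50)])
print(obs)
```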

This would leave tooling that helps API users see how much space specific repos or repository versions use as a separate topic for another time.

What do others think about this?

@lubosmj
Member

lubosmj commented Jan 29, 2024

The earlier plan to emit the OTEL metric whenever a domain's stored data changes (on sync, orphan cleanup, or pulp-content streaming) no longer applies. We decided to report the metrics periodically instead.
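Periodic reporting can be pictured as a loop that recomputes the per-domain totals and hands them to an emitter on a fixed interval. This is an illustrative, stdlib-only sketch; `report_periodically` is a hypothetical helper. In an actual OpenTelemetry setup, the SDK's periodic metric reader invokes observable-instrument callbacks on each export interval, so a hand-written loop like this would not normally be needed.

```python
import threading
from typing import Callable


def report_periodically(collect: Callable[[], list],
                        emit: Callable[[list], None],
                        interval_seconds: float,
                        stop: threading.Event) -> None:
    """Recompute the per-domain totals and emit them every interval until stopped."""
    while not stop.is_set():
        emit(collect())
        # Event.wait acts as an interruptible sleep: it returns early once stop is set.
        stop.wait(interval_seconds)


emitted = []
stop = threading.Event()


def emit(observations):
    emitted.append(observations)
    if len(emitted) == 3:
        stop.set()  # stop after three reports so the example terminates


report_periodically(lambda: [(150, {"domain_name": "default"})], emit, 0, stop)
print(len(emitted))  # 3
```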

lubosmj added a commit that referenced this issue Feb 20, 2024
At the moment, it is not possible to destroy instruments that send
metrics. Therefore, when a user removes a domain, metrics about it
might still be emitted. A temporary workaround is to restart the
pulpcore-api process to reload meters.

Ref: open-telemetry/opentelemetry-specification#2232

closes #4603