
(PE-37294) Remove duplicate storage of catalog resource parameters #3922

Closed

Conversation


@austb (Contributor) commented Dec 13, 2023

  • make a public GH issue for this change, as well as the regex op bug
  • run A/B performance tests

Resolves #3923
Resolves #3924

resource_params and resource_params_cache store the same data. Remove
the resource_params table in favor of the "cache": given PuppetDB's
high write load, storing each resource's parameters once as a single
jsonb row is preferable to the flattened one-row-per-parameter storage.
Also remove the storage code that interacted with the resource_params
table.

Adjust the tests to use
  :parameters (sutils/munge-jsonb-for-storage {})
instead of nil. The wire format does not allow nil parameters, and the
nils were causing issues with the jsonb queries.
Replace resource_params queries with resource_params_cache queries,
preferring the @> and ? operators where possible because both are
supported by the GIN index on the parameters column.
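The operator choice matters because a GIN index on a jsonb column accelerates the containment (@>) and key-existence (?) operators, while plain extraction operators such as -> do not use it. A minimal sketch of the pattern described above (the index name and the parameter values are illustrative; the table and column names follow the PR's resource_params_cache schema):

```sql
-- GIN index over the whole jsonb parameters document (index name illustrative)
CREATE INDEX idx_resource_params_cache_parameters
    ON resource_params_cache USING gin (parameters);

-- Containment (@>): resources whose "ensure" parameter equals "running".
-- This predicate can be answered from the GIN index.
SELECT resource
  FROM resource_params_cache
 WHERE parameters @> '{"ensure": "running"}'::jsonb;

-- Key existence (?): resources that set a "mode" parameter at all.
-- Also GIN-indexable; extraction operators like parameters->'mode' are not.
SELECT resource
  FROM resource_params_cache
 WHERE parameters ? 'mode';
```

Note that in JDBC-based clients the literal ? operator often collides with parameter placeholders and has to be escaped or invoked via its underlying function, jsonb_exists(parameters, 'mode').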
@austb force-pushed the pe-37294/main/unify-resource-parameters branch from 34518e9 to 8478d93 on December 13, 2023 at 18:29
@austb (Contributor, Author) commented Dec 28, 2023

This change is of marginal benefit: for catalogs that took roughly 200 ms each to process, it saved less than 5 ms per catalog, so we would expect no more than a 1-2% improvement in catalog storage times. It also appeared to cause some performance regressions in querying.

In the initial storage path, most of the time is spent inserting catalog resources, not parameters. At ~100 ms the insert does not seem abnormally slow for Postgres, and we do appear to be constructing our batched insertions properly, but we should look into optimization options there.
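"Batched insertion" here means sending one multi-row INSERT per chunk of resources rather than one round trip per resource, which amortizes parsing, planning, and network latency. A sketch of the shape (the column names and values are abbreviated and illustrative, not PuppetDB's exact schema):

```sql
-- One statement, many rows: one round trip instead of three.
INSERT INTO catalog_resources (certname_id, type, title, resource_hash)
VALUES
    (1, 'File',    '/etc/motd', 'hash-a'),
    (1, 'Service', 'sshd',      'hash-b'),
    (1, 'Package', 'openssh',   'hash-c');
```

In practice a client builds these VALUES lists in fixed-size chunks so a very large catalog does not produce a single enormous statement.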

After initial storage, about 25% of the time (in my testing, approx. 25 ms of 95 ms) is spent calculating the catalog-similarity-hash to determine whether the catalog hash changed. In this run rand-perc was set to 100%, so there was always an update, and another ~12 ms was spent calculating the resource hashes. The remaining time was spent:
  • ~10 ms updating metadata
  • ~45 ms updating resources, which is itself mostly a set of <10 ms operations
Given that most of the time catalogs do not change, improving the catalog similarity hash would yield the best overall performance improvement in the steady state.
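The similarity-hash cost comes from recomputing a digest over every resource on each catalog submission so that unchanged catalogs can be detected cheaply. Conceptually it is a hash over a canonically ordered serialization of the catalog's contents, something like the following (illustrative only, not PuppetDB's actual implementation; column names are assumed):

```sql
-- Illustrative: a stable digest over one certname's resources,
-- ordered canonically so that identical catalogs hash identically.
SELECT md5(string_agg(type || '/' || title || ':' || resource_hash,
                      ',' ORDER BY type, title))
  FROM catalog_resources
 WHERE certname_id = 1;
```

Any scheme like this must hash a deterministic ordering; without the ORDER BY, two identical catalogs could produce different digests and force spurious updates.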
