
(PE-37294) Remove duplicate storage of catalog resource parameters #3922

Closed

Conversation


@austb (Contributor) commented Dec 13, 2023

  • make a public GH issue for this change, as well as the regex op bug
  • run A/B performance tests

Resolves #3923
Resolves #3924

resource_params and resource_params_cache store the same data. Remove
the resource_params table in favor of the "cache": given PuppetDB's
high write load, storing each resource's parameters once as a single
jsonb row is preferable to the flattened one-row-per-parameter storage.
Also remove the storage code that interacted with the resource_params
table.

Adjust the tests to use
  :parameters (sutils/munge-jsonb-for-storage {})
instead of nil. The wire format does not allow nil parameters, and the
nils were causing issues with the jsonb queries.
Replace resource_params queries with resource_params_cache queries,
preferring the @> and ? operators where possible because both are
supported by the GIN index on the parameters column.
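The operator choice matters because a GIN index on a jsonb column accelerates the containment (@>) and key-existence (?) operators, while plain extraction operators such as -> do not use it. A minimal sketch of the pattern described above (the index name and the parameter values are illustrative; the table and column names follow the PR's resource_params_cache schema):

```sql
-- GIN index over the whole jsonb parameters document (index name illustrative)
CREATE INDEX idx_resource_params_cache_parameters
    ON resource_params_cache USING gin (parameters);

-- Containment (@>): resources whose "ensure" parameter equals "running".
-- This predicate can be answered from the GIN index.
SELECT resource
  FROM resource_params_cache
 WHERE parameters @> '{"ensure": "running"}'::jsonb;

-- Key existence (?): resources that set a "mode" parameter at all.
-- Also GIN-indexable; extraction operators like parameters->'mode' are not.
SELECT resource
  FROM resource_params_cache
 WHERE parameters ? 'mode';
```

Note that in JDBC-based clients the literal ? operator often collides with parameter placeholders and has to be escaped or invoked via its underlying function, jsonb_exists(parameters, 'mode').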
@austb force-pushed the pe-37294/main/unify-resource-parameters branch from 34518e9 to 8478d93 on December 13, 2023 at 18:29
@austb (Contributor, Author) commented Dec 28, 2023

This change is of marginal benefit: for catalogs that took roughly 200 ms each to process, it saved less than 5 ms per catalog, so we would expect no more than a 1-2% improvement in catalog storage times. It also appeared to cause some performance regressions in querying.

In the initial storage path, most of the time is spent inserting catalog resources, not parameters. At ~100 ms the insert does not seem abnormally slow for Postgres, and we do appear to be constructing our batched insertions properly, but we should look into optimization options there.
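"Batched insertion" here means sending one multi-row INSERT per chunk of resources rather than one round trip per resource, which amortizes parsing, planning, and network latency. A sketch of the shape (the column names and values are abbreviated and illustrative, not PuppetDB's exact schema):

```sql
-- One statement, many rows: one round trip instead of three.
INSERT INTO catalog_resources (certname_id, type, title, resource_hash)
VALUES
    (1, 'File',    '/etc/motd', 'hash-a'),
    (1, 'Service', 'sshd',      'hash-b'),
    (1, 'Package', 'openssh',   'hash-c');
```

In practice a client builds these VALUES lists in fixed-size chunks so a very large catalog does not produce a single enormous statement.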

After initial storage, about 25% of the time (in my testing, approx. 25 ms of 95 ms) is spent calculating the catalog-similarity-hash to determine whether the catalog hash changed. In this run rand-perc was set to 100%, so there was always an update, and another ~12 ms was spent calculating the resource hashes. The remaining time was spent:
  • ~10 ms updating metadata
  • ~45 ms updating resources, which is itself mostly a set of <10 ms operations
Given that most of the time catalogs do not change, improving the catalog similarity hash would yield the best overall performance improvement in the steady state.
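The similarity-hash cost comes from recomputing a digest over every resource on each catalog submission so that unchanged catalogs can be detected cheaply. Conceptually it is a hash over a canonically ordered serialization of the catalog's contents, something like the following (illustrative only, not PuppetDB's actual implementation; column names are assumed):

```sql
-- Illustrative: a stable digest over one certname's resources,
-- ordered canonically so that identical catalogs hash identically.
SELECT md5(string_agg(type || '/' || title || ':' || resource_hash,
                      ',' ORDER BY type, title))
  FROM catalog_resources
 WHERE certname_id = 1;
```

Any scheme like this must hash a deterministic ordering; without the ORDER BY, two identical catalogs could produce different digests and force spurious updates.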
