Skip to content

Support reading critical path information from local JSON files #4755

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 10, 2022

Conversation

AdamWill
Copy link
Contributor

@AdamWill AdamWill commented Sep 30, 2022

I have follow-ups to this PR to actually do interesting things with the grouped data, but that will be a rather bigger change (including db schema change) and this is useful in itself, so I figured I'd submit it first.

Doing this would give us two significant things: we can potentially stop using PDC for critical path information (which is one of the things blocking PDC's retirement), and we wouldn't have to restart Bodhi for changes to the critical path to take effect. I'm imagining to implement this we'd write a trivial cron job that runs the critpath.py script and outputs the data with the right filenames every day.

I did write this to theoretically support non-RPM component types, like the existing code does, but I don't think we actually do generate critical path data for anything besides RPMs in the main release collections right now, unless someone knows otherwise.

@nirik @smooge for info.

@AdamWill AdamWill requested a review from a team as a code owner September 30, 2022 15:51
@AdamWill AdamWill force-pushed the critpath-json branch 2 times, most recently from cd06e3e to c521ea7 Compare September 30, 2022 16:04
@AdamWill
Copy link
Contributor Author

Test failure looks unrelated (the F35 integration tests ran into an authentication prompt in unrelated code).

@AdamWill AdamWill force-pushed the critpath-json branch 2 times, most recently from f0d9192 to 8a12d75 Compare September 30, 2022 21:16
@mattiaverga
Copy link
Contributor

Looks good, just a two pedantic comments from me:

  • the critpath_components = [] on line 217 doesn't seem needed as it is set on line 207
  • shall we cache read_critpath_json? I think changes in critpath files would be deployed by ansible, thus reloading Bodhi server and clearing the cache.

I'm also thinking about dropping memoized which seems not maintained anymore. With python3 it seems best to use functools.lru_cache.

@AdamWill
Copy link
Contributor Author

AdamWill commented Oct 8, 2022

Thanks, I'll look at the first note in a bit.

My intent for how this could be used is that we could run a cron job (or systemd timer service, whatever) on the Bodhi backend server that runs the critpath generation script every 24 hours and drops it in the appropriate location. I guess the script could also restart Bodhi, but is it a great idea to do that every 24 hours? It might cause unexpected effects for anyone who happens to be using Bodhi at that moment, after all. So I kinda intended to drop the caching, it makes the code simpler, it avoids potential weird caching issues (caching always seems to have weird issues), and since opening and reading a local JSON file is a fairly trivial operation and AFAICS only happens when an update is created or edited, doesn't really seem worthwhile at least to me.

I guess ideally in future I'd like to drop the other two critpath 'backends' which would allow this code to be much simpler. While the PDC backend is still around you can say "well it's certainly sensible to cache that, and if we're caching that one we may as well cache the others", but since we're trying to drop PDC anyway, I'm kinda expecting we'll drop that code in future.

@AdamWill
Copy link
Contributor Author

AdamWill commented Oct 8, 2022

and yeah, if there's a stdlib implementation of the caching, I certainly agree with switching to that as long as we actually are doing caching, but that could probably be done as a separate PR.

@mattiaverga
Copy link
Contributor

I almost forgot: can you add a note to be added to release notes?

@AdamWill
Copy link
Contributor Author

AdamWill commented Oct 10, 2022

Technical note on the caching: after just the first commit in this series, the JSON path is cached, because at that point, caching is still applied to the umbrella get_critpath_components. The second commit actually specifically removes caching for the JSON path (and the 'just read it from config' path), by moving the caching to get_critpath_components_from_pdc. So if you really really want to keep the caching, we just drop the second commit.

This adds another valid setting for critpath.type, 'json', which
means to read the critpath information from JSON files. These
are expected to be produced by the `critpath.py` script from
the releng repo (which I recently updated to output this data).
An additional setting, `critpath.jsonpath`, tells Bodhi where
to find the files. The files must be named for each Release's
'branch' value, which is the name of the dist-git branch for
that release.

Ultimately this has two benefits: let us stop using PDC for this
(we can just have a cron job to run `critpath.py` nightly or
whatever), which is one step to not needing PDC any more and
avoids unnecessary network traffic and server load, and allow
us to do critpath stuff by group. The JSON data as output by
`critpath.py` is grouped. This commit doesn't take advantage of
that yet - it flattens the data into a single big list when
reading it - but follow-up commits will add additional code
paths to determine which groups a given update is critpath for.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
This moves the `@memoized` decorator to the function for getting
critpath from PDC. The effect of this is we will not cache
the critpath when retrieving it from a local source (JSON files
or configuration). There's no real point in re-reading it from
the configuration on every run of the function, but it also
doesn't hurt much - it's probably no more resource-intensive
than running through the `@memoized` decorator anyway. There is
a benefit to re-reading the JSON file on every run: it allows
us to update the critical path regularly without restarting
Bodhi. This function is called only when creating or editing
updates, so I don't think we're going to overwhelm servers with
disk reads or anything by doing this.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
@AdamWill
Copy link
Contributor Author

Rebased, dropped the unnecessary critpath_components setting, and added a news file.

Copy link
Contributor

@mattiaverga mattiaverga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if you really really want to keep the caching, we just drop the second commit.

I was just thinking that having the json file open and parsed for every update creation is not really efficient, since the critpath list is rarely to be updated. Maybe the cronjob used to update the critpath files can also fire cache_clear().

However, this PR is fine as it is, I'll try to look to porting the cache from memoize to functools in another PR.

@mergify mergify bot merged commit 6c91471 into fedora-infra:develop Oct 10, 2022
@AdamWill
Copy link
Contributor Author

I agree it's not super efficient, but practically speaking, I don't think we create / edit updates enough that the trivial amount of resources needed to open and parse a JSON file each time we do are really worth worrying about :) It's just intuition, though. I guess if we actually do notice increased resource usage in production after deploying the change we can always revisit it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants