Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use pkg_infos() to speed up rpm_build_info() #2287

Closed
wants to merge 2 commits into from

Conversation

chrstphrchvz
Copy link
Contributor

Fixes: #2285

@chrstphrchvz
Copy link
Contributor Author

The existing approach of running shell commands might be easier for ongoing tinkering with which ones to use, and for spying on subprocesses (as I've been doing); and in this case, rpm --query --queryformat lends itself well for use by scripts. But I am curious if Rockstor is allowed to use/depend on the librpm Python bindings, and if so, would it prefer using them over running commands and parsing their output?

@phillxnet
Copy link
Member

@chrstphrchvz Thanks for this rather nice simplification. I hope it does pan out that we can go this way, and it will be nice to start dropping the prior requirement of having to also be CentOS yum compatible as well as openSUSE dnf-yum compatible.

As this is fairly critical path stuff, we really don't want to break updates to save a few seconds, or more like a single second on actual targeted hardware, could you take a look at how these changes pan out in our existing tests for this area of the code:

https://github.com/rockstor/rockstor-core/blob/master/src/rockstor/system/tests/test_pkg_mgmt.py

The following forum thread has info on how to run these once you have a working source build:

https://forum.rockstor.com/t/built-on-opensuse-dev-notes-and-status/5662

We have to be super careful on the update code. Especially if it's just for performance reasons. As I indicated in response to another great suggestion of yours recent, I'm nervous about making changes in this area of the code if it's currently working. But if we have reasonable testing coverage and we can do an end to end test (any source install will consider any rpm install to be an update and will auto-wipe the db while at it) then it's far less of a worry and we get the goodness of speedups.

Let us know how you get on with the unit test or if you can improve them if need be. As you see this area of the code is both old and new but definitely clunky and so all the more likely to hide surprises. But again this also makes it ripe for improvement such as your suggestion here.

In summary I'm not happy saving a second or two for any risk to all users of the rpm update track. But I'm also not happy with the current state of some of that code. So we should have good coverage of these changes and tests against regressions as if we break updates we break the vast majority of our users ability to be able to update their way out of any other problem we also have to address bar being sub-optimal on the speed front.

You may be interested to know that we are also planning to move to Python 3, which will in turn offer speed-ups:
#1877
so there is hope and also further requirement to improve our test coverage as and when as it where.

Thanks for submitting this pr and don't be put off by any means. I just wasn't sure if you were aware of our existing, all be it insufficient, tests; and the potential project failure a breakage in update can represent. We had one a few years ago and are still dealing with the fall out. See for context the nuance in:
version and date incorrectly reported re update info #1870
and it's associated pr:
#1871
and having to deal with such things as identical changelogs etc.

Cheers.

@chrstphrchvz
Copy link
Contributor Author

Thanks for the response, will look into those.

In this case, since I'm not sure whether dnf/yum output is intended more for human than machine readability and won't change over time (Rockstor already accommodates some differences), I would think it is more reassuring to rely on rpm --query --queryformat. Maybe the librpm Python API is just as stable.

If there is concern that rpm might return different information about installed packages than dnf/yum, I am currently not aware of that being possible.

@phillxnet
Copy link
Member

@chrstphrchvz Re:

I would think it is more reassuring to rely on rpm --query --queryformat. Maybe the librpm Python API is just as stable.

Yes, quite possibly. Let us know how you get on after looking at/running our current test as that may well help, both with proving your changes and identifying more what stuff is used for.

Cheers.

@chrstphrchvz
Copy link
Contributor Author

test_rpm_build_info does need to be rewritten for this proposed change (it will likely become much simpler), but I am still trying to figure out a good way of doing so.

In test_pkg_mgmt.py, the pkg_infos function is currently being mocked; should I modify it by setting mock_pkg_info.side_effect to a function e.g. fake_pkg_infos?

Or should this instead exercise the actual pkg_infos function? That approach would need to accommodate run_command being called more than once per rpm_build_info call; at the moment I think that would involve setting mock_run_command.side_effect to an iterable rather than setting mock_run_command.return_value.

@phillxnet
Copy link
Member

@chrstphrchvz Well done on making progress here.
Can't response fully just yet as haven't had a proper look but:
Re:

That approach would need to accommodate run_command being called more than once per rpm_build_info call;

@FroggyFlox rand into something similar in the network code. The neater solution would be to refactor, if possible, the tested code, such that each testable element only contains a single call to run_command. This may not be possible, but if it is then we further increase our potential to fully test without having overly complex (read fragile) tests in the first place. And hopefully further simplify the code under test also.

Hope that helps.

@chrstphrchvz
Copy link
Contributor Author

The neater solution would be to refactor, if possible, the tested code, such that each testable element only contains a single call to run_command.

I think that can be done in this case by rewriting/extending pkg_infos to accept multiple tags to retrieve from rpm -q --queryformat …. If that sounds acceptable, are there input/output types which would be preferred/make the most sense?

@phillxnet
Copy link
Member

@chrstphrchvz Apologies for the delayed response on this one.

I think that can be done in this case by rewriting/extending pkg_infos to accept multiple tags to retrieve from rpm -q --queryformat …. If that sounds acceptable, are there input/output types which would be preferred/make the most sense?

That sounds just the ticket. Re 'types', at first glance, I'd say the input of multiple tags would fit well with being moved to a list. But whatever works really. But changing the nature of what is returned may end up opening a can of worms so bit by bit and see how you get on.

Really looking forward to type hints in Python 3 but again, bit by bit.

It would be nice to have better test coverage in this area because as stated previously, it's current state has been haunting us for a while and given it's critical nature we have to be very careful. Especially if we are just chasing performance. But if we can turn that into greater robustness/simplicity then great.

Thanks again for looking so closely as this stuff. Quite the concern given it's pivotal nature in our ability to deliver fixes.

@chrstphrchvz
Copy link
Contributor Author

I realized that if we want to run rpm -q --queryformat … only once by using multiple tags, then there needs to be a way of delimiting the output for each tag, and a naïve approach of delimiting with newlines will not always work. Some possible alternatives might be to have the output for DESCRIPTION (which can be multiple lines long) always be output last, or to use a control character like ASCII BEL (\a) that normally shouldn't appear in any tags' output, though I'm not certain those approaches are reliable either. Maybe using the librpm Python bindings avoids this problem entirely, but it may involve even more extensive rewriting of the tests.

@phillxnet
Copy link
Member

@chrstphrchvz I'll close this for now as we are now based on Py3.11: so you may want to pick-up on this effort from scratch again now.

Thanks again for having a go at this. We have picked up quite the speed improvement of late so it may also be useful to use some of the new Python profiling that is now available to us, regarding finding our slow-spots as it were.

I also suspect that our newer zypper can retrieve changelogs form uninstalled packages: if so that would be a great improvement as we can dump all our dnf nonsense just to get our changelog. Super confusing for folks when they see that installed but unable to do anything. Just a though.

Thanks again for your improvements/speed-ups to date. Hope to see you again in these parts :).
We are now moving towards release candidate status in current testing so major changes are now curtailing, but package stuff could be re-done in the next testing if you fancy popping by again.

Cheers.

@phillxnet phillxnet closed this Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Web UI delay/timeout while waiting on yum info installed
2 participants