Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repoquery: fix rich deps matching by using provide expansion from libdnf #1432

Closed
wants to merge 6 commits into from

Conversation

lukash
Copy link
Contributor

@lukash lukash commented Jul 15, 2019

This is meant to fix a bug where rich dependencies would match a package even if the require specified a different version range than doesn't match the version of the provide.

Uses new libdnf (PR linked below) that can expand package provides while filtering so that it doesn't haveto be implemented in dnf in a non-straightforward way that was causing it to match rich dependencies incorrectly. Further description is with the commits, along with some shortcomings this currently has.

rpm-software-management/libdnf#762

Tests for this are here:
rpm-software-management/ci-dnf-stack#577

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034

@packit-as-a-service
Copy link

There was an error while running a copr build. You can re-trigger copr build by adding a comment (/packit copr-build) into this pull request.

@j-mracek
Copy link
Member

j-mracek commented Nov 8, 2019

@lukash Did you make any performance measurements?

@lukash
Copy link
Contributor Author

lukash commented Nov 8, 2019

@j-mracek I did not. From what I remember the command from the bug was always quite slow - but I might remember that wrong. Are you concerned about anything in particular?

@j-mracek
Copy link
Member

j-mracek commented Nov 8, 2019

I would like to hear that there is no performance issue with the code change. The original code is slow, but it could be even worse.

@lukash
Copy link
Contributor Author

lukash commented Nov 8, 2019

I would like to hear that there is no performance issue with the code change. The original code is slow, but it could be even worse.

I'll check it.

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably 5ccde9f) made this pull request unmergeable. Please resolve the merge conflicts.

@lukash
Copy link
Contributor Author

lukash commented Dec 10, 2019

I've done some rudimentary performance measurements. As a summary, it's mostly the same, the new version is slightly slower. There's a big performance hit for --whatconflicts, --whatrecommends, --whatenhances, --whatsupplements and --whatsuggests, because I've added NEVRA resolving for their arguments. Otherwise the new rich dependency resolving couldn't be used for them and they would have inconsistent behavior with the other switches.

Here's a table with the measurements. I've picked the --whatrequires and --whatconflicts options for the testing. The command run was dnf repoquery --OPTION QUERY. The options are in second to fourth column, 'old' is original dnf, 'new' is with the patches. Old dnf repoquery --whatconflicts QUERY was always around 2s, so that's where the major performance difference is. The added NEVRA resolving is causing it.

 QUERY     | old --whatrequires | new --whatrequires | new --whatconflicts |
-----------+--------------------+--------------------+---------------------+
 *server*  | 5m 5s              | 5m 14s             | 1m 25s              |
 *dnf*     | 1m 6s              | 1m 17s             | 24s                 |
 *libsolv* | 25s                | 30s                | 12s                 |
 libsolv   | 2s                 | 2s                 | 3s                  |

@lukash
Copy link
Contributor Author

lukash commented Dec 10, 2019

Oh, just to add: The stars in the QUERIES are there to increase the amount of matches for the queries, so that the times are longer and performance differences more significant. They're not the usual queries users would run most of the time, obviously.

@j-mracek
Copy link
Member

@lukash Please could you rebase it?

@j-mracek
Copy link
Member

I have discovered a performance problem with --whatconflicts:

time sudo dnf repoquery -C --disableplugin=* --whatconflicts $(rpm -qa --queryformat="%{name},") |

The original dnf is able to perform the command in 3.835s but with patches in 18.262s.
Please could you provide improvements?

@lukash
Copy link
Contributor Author

lukash commented Dec 13, 2019

@j-mracek:

I have discovered a performance problem with --whatconflicts

Please kindly read my #1432 (comment). Thanks.

@packit-as-a-service
Copy link

There was an error while running a copr build:

Request is not in JSON format, there is probably a bug in the API code.

You can re-trigger copr build by adding a comment (/packit copr-build) into this pull request.

depquery = depquery.union(query.filter(requires=resolved_nevras))

if all_dep_types:
# TODO this is actually very inefficient, as it builds the same list of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the comment. I would recommend to remove it because it has not much informative value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment has a lot of value, it just may not be phrased well. The problem is each call to query.filter IIRC resolves the names to packages or does some similarly expensive operation (sorry, I forgot the details, I can check them if you find it important here), which is the same for all the five total calls that are happening here. Therefore, if the code was written in a way this same operation was done only once, the code could be five times faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please could you remove the comment anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to give me a good reason for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple I don't think that the code should be used as a wish list. Additionally I don't think that there is any real impact on performance. Also the implementation is limited by API available in libsolv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a wish, it is a performance issue with the code that I've discovered.

There is a real impact on performance. I've done a very rudimentary performance check (as I kind of hope it will be enough for you) and for a repoquery like:

dnf3 repoquery --whatdepends "libdnf*"

The pyseq_to_reldeplist() function (https://github.com/rpm-software-management/libdnf/blob/20ba333f9a54e73e95257e1a78f0f55bb6e88809/python/hawkey/iutil-py.cpp#L237) is being called 5 times with the same arguments and the same result. It is taking about 50% of the total time (I've eyeballed the time tbh., didn't go through measuring it exactly...) of the command run time.

This is obviously not a libsolv limitation, it is a shortcoming of the current API design and the comment is there to bring attention to it, because it is not apparent at first glance this may be happening.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please enhance the comment and provide additional information or remove it. The issue is with resolving string as a glob to reldep. It could be enhance by providing API for the conversion and then using list of reldeps for query. To general notes have nearly no value and the author of the note will not remember details. Just remember how many time you already mentioned that you do not remember details about your code, in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've adjusted the comment, sorry about being imprecise in the first place.

FTR I think the right API fix is to have support for depends, instead of separating out the resolving of globs. I think that's better for a high-level API.

return done

def by_all_deps(self, names, query, all_dep_types=False):
# in case of arguments being NEVRAs, resolve them to packages
resolved_nevras = self._resolve_nevras(names, query)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using nevra could be confusing because we have NEVRA object. Option parser use REQ for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is confusing, the naming? The comment? I'm not sure what you mean...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolved_nevras is confusing. Inside the variable there is a query. What about nevras_query, REQs_query, specs_query, subjects_query or anything else that is better than resolved_nevras.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed it to resolved_nevras_query, as the _query suffix seems to be what you really want and while a bit long, it still fits on the lines.

Lukáš Hrázký added 3 commits January 14, 2020 10:24
Moves the check for alldeps/exactdeps switches to the configure() method
along with the other checks.

Raises dnf.cli.CliError instead of dnf.exceptions.Error.
Fixes a bug introduced in bd00c04, where the by_all_deps() arguments
were changed to an array, but in tree_seed() it still passes a single
string.

Untested, I have actually no idea what the code does there, it looks
kind of suspect.
Separates whatrequires and whatdepends handling into two separate code
branches, the original was overly complex, confusing and presented
multiple unnecessary corner cases.

The by_all_deps() arguments now also have a much better defined role.
@j-mracek
Copy link
Member

I believe that information in doc/api_queries.rst is outdated. @lukash Please could you take a look?

@lukash
Copy link
Contributor Author

lukash commented Jan 14, 2020

I believe that information in doc/api_queries.rst is outdated. @lukash Please could you take a look?

Could you be more specific?

@j-mracek
Copy link
Member

I believe that information in doc/api_queries.rst is outdated. @lukash Please could you take a look?

Could you be more specific?

At lease please could you update a table between lines 90 and 115. For requires an argument type query/package sequence is not described.
I also believe that it is worth to mentioned that requires=[pkg] could provides a different result from requires=pkg.provides + pkg.files.

@lukash
Copy link
Contributor Author

lukash commented Jan 14, 2020

At lease please could you update a table between lines 90 and 115. For requires an argument type query/package sequence is not described.

Ok, I see, I'll try to add that. I also see there's no mention of conflicts and the rich deps filters, I guess I should add those... But it's also going to get a bit verbose and repetitive given how the table is structured...

I also believe that it is worth to mentioned that requires=[pkg] could provides a different result from requires=pkg.provides + pkg.files.

I suppose, but I'm not entirely sure how to phrase that and I'm also having a bit of a hard time remembering what exactly are the differences here (rich deps evaluation being the main difference I'm introducing), do you have any suggestions what to mention?

@j-mracek
Copy link
Member

j-mracek commented Jan 14, 2020

At lease please could you update a table between lines 90 and 115. For requires an argument type query/package sequence is not described.

Ok, I see, I'll try to add that. I also see there's no mention of conflicts and the rich deps filters, I guess I should add those... But it's also going to get a bit verbose and repetitive given how the table is structured...

I also believe that it is worth to mentioned that requires=[pkg] could provides a different result from requires=pkg.provides + pkg.files.

I suppose, but I'm not entirely sure how to phrase that and I'm also having a bit of a hard time remembering what exactly are the differences here (rich deps evaluation being the main difference I'm introducing), do you have any suggestions what to mention?

That rich deps are evaluated differently between arguments type. Like package my provides my = 2
then other packages requires (my > 0 with my < 1) and other package requires (my > 0 with my < 3). Then query filter requires=[pkg_my] will provides a different result like requires=["my = 2"].

@lukash
Copy link
Contributor Author

lukash commented Jan 14, 2020

I've added a commit updating the documentation in api_queries.rst.

Lukáš Hrázký added 3 commits January 16, 2020 14:14
Uses the new feature of filtering by solvables to properly resolve rich
dependencies for packages. The new code now, along with serching for the
arguments as reldeps, also searches for them as NEVRAs and then looks up
the dependencies for the packages it found.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034
…RhBug:1534123,1698034)

To properly list the conflicts, suggests, recommends, enhances and
supplements dependencies of packages defined through provides or even
rich dependencies, it is required to resolve them to packages and query
for them, because that's where the dependencies are defined.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034
Add the new types of arguments supported for dependency filtering.
@j-mracek
Copy link
Member

@rh-atomic-bot r+

@rh-atomic-bot
Copy link

📌 Commit 757f9ad has been approved by j-mracek

@rh-atomic-bot
Copy link

⌛ Testing commit 757f9ad with merge b6d99fa...

rh-atomic-bot pushed a commit that referenced this pull request Jan 27, 2020
Fixes a bug introduced in bd00c04, where the by_all_deps() arguments
were changed to an array, but in tree_seed() it still passes a single
string.

Untested, I have actually no idea what the code does there, it looks
kind of suspect.

Closes: #1432
Approved by: j-mracek
rh-atomic-bot pushed a commit that referenced this pull request Jan 27, 2020
Separates whatrequires and whatdepends handling into two separate code
branches, the original was overly complex, confusing and presented
multiple unnecessary corner cases.

The by_all_deps() arguments now also have a much better defined role.

Closes: #1432
Approved by: j-mracek
rh-atomic-bot pushed a commit that referenced this pull request Jan 27, 2020
Uses the new feature of filtering by solvables to properly resolve rich
dependencies for packages. The new code now, along with serching for the
arguments as reldeps, also searches for them as NEVRAs and then looks up
the dependencies for the packages it found.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034

Closes: #1432
Approved by: j-mracek
rh-atomic-bot pushed a commit that referenced this pull request Jan 27, 2020
…RhBug:1534123,1698034)

To properly list the conflicts, suggests, recommends, enhances and
supplements dependencies of packages defined through provides or even
rich dependencies, it is required to resolve them to packages and query
for them, because that's where the dependencies are defined.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034

Closes: #1432
Approved by: j-mracek
rh-atomic-bot pushed a commit that referenced this pull request Jan 27, 2020
Add the new types of arguments supported for dependency filtering.

Closes: #1432
Approved by: j-mracek
@rh-atomic-bot
Copy link

☀️ Test successful - status-papr
Approved by: j-mracek
Pushing b6d99fa to master...

inknos pushed a commit that referenced this pull request May 7, 2020
Moves the check for alldeps/exactdeps switches to the configure() method
along with the other checks.

Raises dnf.cli.CliError instead of dnf.exceptions.Error.

Closes: #1432
Approved by: j-mracek
inknos pushed a commit that referenced this pull request May 7, 2020
Fixes a bug introduced in bd00c04, where the by_all_deps() arguments
were changed to an array, but in tree_seed() it still passes a single
string.

Untested, I have actually no idea what the code does there, it looks
kind of suspect.

Closes: #1432
Approved by: j-mracek
inknos pushed a commit that referenced this pull request May 7, 2020
Separates whatrequires and whatdepends handling into two separate code
branches, the original was overly complex, confusing and presented
multiple unnecessary corner cases.

The by_all_deps() arguments now also have a much better defined role.

Closes: #1432
Approved by: j-mracek
inknos pushed a commit that referenced this pull request May 7, 2020
Uses the new feature of filtering by solvables to properly resolve rich
dependencies for packages. The new code now, along with serching for the
arguments as reldeps, also searches for them as NEVRAs and then looks up
the dependencies for the packages it found.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034

Closes: #1432
Approved by: j-mracek
inknos pushed a commit that referenced this pull request May 7, 2020
…RhBug:1534123,1698034)

To properly list the conflicts, suggests, recommends, enhances and
supplements dependencies of packages defined through provides or even
rich dependencies, it is required to resolve them to packages and query
for them, because that's where the dependencies are defined.

https://bugzilla.redhat.com/show_bug.cgi?id=1534123
https://bugzilla.redhat.com/show_bug.cgi?id=1698034

Closes: #1432
Approved by: j-mracek
inknos pushed a commit that referenced this pull request May 7, 2020
Add the new types of arguments supported for dependency filtering.

Closes: #1432
Approved by: j-mracek
@lukash lukash deleted the rich-deps branch September 30, 2020 13:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants