
Reintroduce the preresolved dependency optimization #610

Closed
woodruffw opened this issue May 5, 2023 · 10 comments · Fixed by #626
Labels: component:dep-sources (Dependency sources), performance (Something isn't as fast or responsive as it should be.)

@woodruffw
Member

We removed the preresolved dependency optimization in #540 for correctness reasons.

However, removing it seems to have caused serious performance regressions in brew-pip-audit's automation: audits that were previously bound only by vulnerability API lookup time are now much slower, because they call into pip and run its dependency resolution machinery.

We should consider reintroducing a variant of this optimization, potentially gated behind a scary flag.

CC @tetsuo-cpp for thoughts.

CC @alex for visibility.

woodruffw added the component:dep-sources (Dependency sources) label May 5, 2023
@pauloxnet

In our projects, we pinned pip-audit to version 2.5.2 because later versions are much slower:

Version 2.5.2

$ time python3 -m pip_audit --require-hashes --requirement requirements/remote.txt
No known vulnerabilities found

real    0m0,624s
user    0m0,563s
sys     0m0,062s

Version 2.5.5

$ time python3 -m pip_audit --require-hashes --requirement requirements/remote.txt
No known vulnerabilities found

real    0m15,915s
user    0m11,808s
sys     0m1,037s

@tetsuo-cpp
Contributor

@woodruffw I wonder whether this would be solved by passing --no-deps to pip whenever the user gives us --require-hashes. One key aspect of pip's behaviour is that --require-hashes doesn't imply --no-deps: pip still performs full dependency resolution to ensure that every requirement is listed in the file and hashed.

I need to do some testing with pip first, but hopefully passing --no-deps whenever the user gives us --require-hashes stops dependency resolution from running.
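A minimal sketch of that idea, assuming pip-audit assembles a `pip install` command line for its temporary virtual environment. `build_pip_install_args` is a hypothetical helper written for illustration, not pip-audit's actual code; `--requirement`, `--require-hashes` and `--no-deps` are real `pip install` options.

```python
from __future__ import annotations


def build_pip_install_args(requirements_path: str, require_hashes: bool) -> list[str]:
    """Hypothetical helper: assemble a `pip install` invocation for the audit venv."""
    args = ["pip", "install", "--requirement", requirements_path]
    if require_hashes:
        # A hashed file is expected to be fully pinned, so tell pip not to
        # resolve dependencies beyond what the file already lists.
        args += ["--require-hashes", "--no-deps"]
    return args


hashed_args = build_pip_install_args("requirements/remote.txt", require_hashes=True)
plain_args = build_pip_install_args("requirements/remote.txt", require_hashes=False)
print(hashed_args)
```

Note that pip still downloads and installs every listed package in this mode, so skipping resolution alone may not remove most of the cost.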

@tetsuo-cpp
Contributor

@pauloxnet If you run that command with --no-deps, does that help performance at all?

@pauloxnet

pauloxnet commented May 16, 2023

I can confirm that we create our hashed requirements files with pip-compile:

python3 -m piptools compile --generate-hashes --no-header --quiet --resolver=backtracking --upgrade --output-file requirements/test.txt requirements/test.in

Using --no-deps does not improve performance.

As far as I can tell, the main difference is that pip-audit up to 2.5.2 checks versions using only the requirements file, while later versions install the requirements into an isolated environment. That is slower, and it can pull in more dependencies than the requirements file lists.

@tetsuo-cpp
Copy link
Contributor

@woodruffw Based on that, we might just have to bring that logic back. I don't think it's a big deal as long as it isn't the default behaviour when someone uses --require-hashes. It only has problems with local requirements, and even then the offending requirements are listed in the "skipped" summary.
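A rough sketch of the pre-2.5.3 "preresolved" approach discussed here: read `name==version` pins straight from a hashed requirements file instead of installing anything, and report anything that isn't a simple pin (such as a local path) as skipped. `parse_pinned_requirements` is a hypothetical, stdlib-only parser written for illustration; real requirements files allow much more syntax (extras, URLs, includes) than it handles, and the hashes in `SAMPLE` are placeholders.

```python
from __future__ import annotations

import re

# A simple "name==version" pin at the start of a logical line. Extras, URLs and
# other requirement forms are deliberately not handled in this sketch.
PIN_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s;\\]+)")


def parse_pinned_requirements(text: str) -> tuple[dict[str, str], list[str]]:
    """Hypothetical helper: collect pins from a requirements file without running pip."""
    # Join backslash continuations so each requirement and its --hash options
    # form a single logical line.
    logical = text.replace("\\\n", " ")
    pins: dict[str, str] = {}
    skipped: list[str] = []
    for raw in logical.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop inline "# via ..." comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # blank line, comment, or an option line such as --index-url or -r
        match = PIN_RE.match(line)
        if match:
            # Normalize the name as in PEP 503: lowercase, runs of "-", "_", "." -> "-".
            name = re.sub(r"[-_.]+", "-", match.group(1)).lower()
            pins[name] = match.group(2)
        else:
            skipped.append(line)  # e.g. a local path or an unpinned requirement
    return pins, skipped


SAMPLE = """\
#
# This file is autogenerated by pip-compile
#
Django==4.2.1 \\
    --hash=sha256:aaa \\
    --hash=sha256:bbb
    # via -r requirements.in
typing_extensions==4.6.2 \\
    --hash=sha256:ccc
./vendor/localpkg
"""

pins, skipped = parse_pinned_requirements(SAMPLE)
print(pins, skipped)
```

The resulting pins could then be sent directly to the vulnerability service, which is why this path is bound only by API lookup time.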

@woodruffw
Copy link
Member Author

@tetsuo-cpp Agreed; I'll update this issue.

woodruffw changed the title from "Consider reintroducing (a variant of) the preresolved dependency optimization" to "Reintroduce the preresolved dependency optimization" May 16, 2023
woodruffw added the performance (Something isn't as fast or responsive as it should be.) label May 16, 2023
@trottomv
Contributor

trottomv commented May 23, 2023

Hi everyone, I can confirm what @pauloxnet reported about the performance drop in versions after 2.5.2. Specifically, it occurs when a requirements.txt file built with pip-compile and containing hashes is passed to pip-audit. The cause is that, starting with version 2.5.3, pip-audit installs all of the parsed dependencies into a temporary virtual environment, and there is no option to disable this behavior.

After examining pip-audit's code, I agree that installing the packages into a temporary virtual environment is logically correct, because it ensures that the audit covers all subdependencies. The main (perhaps only) exception is when the requirements were compiled with pip-compile, whether or not hashes are included, since pip-compile pins the full dependency tree. Hashes alone are not enough: some tools generate hashes in requirements.txt without guaranteeing that every subdependency is listed.

So in general, I think it's appropriate to keep installing into the temporary virtual environment even when --require-hashes or --no-deps is passed, because the presence of hashes does not guarantee that the requirements file lists every dependency that would actually be installed.
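To make the completeness point concrete: hashes pin what is in the file, but whether the file lists every transitive dependency can only be checked against each package's dependency metadata, which is what pip's resolution step reads. In this toy sketch that metadata is a hand-written dict; `DEPENDS_ON`, `find_missing_dependencies` and the package names are all hypothetical.

```python
from __future__ import annotations

# Toy dependency metadata for hypothetical packages. In reality this comes from
# each distribution's Requires-Dist entries, which pip reads during resolution.
DEPENDS_ON: dict[str, set[str]] = {
    "webapp": {"httpclient", "templating"},
    "httpclient": {"certstore"},
    "templating": set(),
    "certstore": set(),
}


def find_missing_dependencies(listed: set[str], depends_on: dict[str, set[str]]) -> set[str]:
    """Return transitive dependencies of `listed` that the requirements file omits."""
    missing: set[str] = set()
    seen: set[str] = set()
    stack = list(listed)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for dep in depends_on.get(name, set()):
            if dep not in listed:
                missing.add(dep)
            stack.append(dep)  # keep walking: omitted packages have dependencies too
    return missing


# A fully hashed file could pin only these two and still be incomplete.
missing = find_missing_dependencies({"webapp", "httpclient"}, DEPENDS_ON)
print(sorted(missing))
```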

I would suggest adding a way to tell pip-audit whether the requirements file was compiled with pip-compile, perhaps through a new argument such as --skip-venv or --requirement requirements.txt --with-pip-compile.

Alternatively, pip-audit could parse the requirements file itself: if its header mentions pip-compile, it could take a faster path that skips installing the dependencies into the temporary virtual environment. However, pip-compile can be run with --no-header, so files compiled with pip-compile would not always benefit from the faster path.
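The header-based heuristic could look roughly like this. It assumes the header contains the phrase "autogenerated by pip-compile", which pip-tools writes at the top of its output by default; `looks_pip_compiled` is a hypothetical helper, and as noted above it cannot recognize files generated with --no-header.

```python
from __future__ import annotations


def looks_pip_compiled(text: str) -> bool:
    """Hypothetical heuristic: inspect only the leading comment block of a requirements file."""
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # allow blank lines inside the header
        if not stripped.startswith("#"):
            break  # first requirement reached, so the header is over
        if "autogenerated by pip-compile" in stripped:
            return True
    return False


with_header = "#\n# This file is autogenerated by pip-compile with Python 3.11\n#\nidna==3.4\n"
without_header = "idna==3.4 \\\n    --hash=sha256:aaa\n"
print(looks_pip_compiled(with_header), looks_pip_compiled(without_header))
```

Because the check stops at the first non-comment line, a stray "pip-compile" mention further down the file does not trigger the fast path.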

@trottomv
Contributor

@woodruffw I've opened a pull request ☝️ proposing a new option to skip dependency resolution when the requirements are compiled with pip-compile.

@woodruffw
Member Author

The work in #626 has been released with 2.6.0. Thanks for your hard work here @trottomv!

@pauloxnet

Thanks to @woodruffw and @trottomv for the new version.

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jul 19, 2023
## [2.6.0]

### Added

* Added option to skip dependency resolution via `pip` with the `--disable-pip`
  flag. This option can only be used with hashed requirements files or when the
  `--no-deps` flag has been provided
  ([#610](pypa/pip-audit#610))
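The constraint in this changelog entry can be sketched as an argument check. This is an illustrative sketch using `argparse` with only the two flags involved plus a hypothetical `all_requirements_hashed` input; it restates the rule as written in the changelog and is not pip-audit's actual source.

```python
from __future__ import annotations

import argparse


def make_parser() -> argparse.ArgumentParser:
    # Only the flags relevant to the --disable-pip rule; everything else is omitted.
    parser = argparse.ArgumentParser(prog="pip-audit")
    parser.add_argument("--disable-pip", action="store_true")
    parser.add_argument("--no-deps", action="store_true")
    return parser


def check_disable_pip(args: argparse.Namespace, all_requirements_hashed: bool) -> str | None:
    """Return an error message if --disable-pip is used without a fully pinned input."""
    if not args.disable_pip:
        return None
    if args.no_deps or all_requirements_hashed:
        return None
    return "--disable-pip requires a hashed requirements file or --no-deps"


parser = make_parser()
ok = check_disable_pip(parser.parse_args(["--disable-pip", "--no-deps"]), all_requirements_hashed=False)
bad = check_disable_pip(parser.parse_args(["--disable-pip"]), all_requirements_hashed=False)
print(ok, bad)
```

Either condition gives pip-audit a set of pins it can audit without asking pip to resolve anything.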