Expand lockfile documentation to its own page. #18471

Merged
merged 2 commits into from Mar 11, 2023
271 changes: 271 additions & 0 deletions docs/markdown/Python/python/python-lockfiles.md
@@ -0,0 +1,271 @@
---
title: "Lockfiles"
slug: "python-lockfiles"
excerpt: "Securely locking down your third-party dependencies."
hidden: false
createdAt: "2023-03-11T01:50:46.369Z"
updatedAt: "2023-03-11T01:50:46.369Z"
---
Third-party dependencies are typically specified via a range of allowed versions, known as "requirements", in a file such as `requirements.txt` or `pyproject.toml`. A dependency resolution tool like Pip or Poetry then takes these initial requirements and attempts to find and download a consistent set of transitive dependencies that are compatible with each other and with the target Python interpreter version.

When used naively, this dependency resolution process is unstable: if you run a resolve, and then some time later run another resolve on the same inputs, you may end up with a different resulting set of dependencies. This is because new versions of direct or transitive dependencies may have been published (or, in rare cases, yanked) between the two runs.
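
As a toy illustration of this instability (a minimal sketch, not how Pip or Poetry actually resolve; the `parse` and `resolve` helpers and the index contents are hypothetical), the same requirement can pick a different version once a new release appears:

```python
from __future__ import annotations


def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "3.2.1" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve(available: list[str], lower: str, upper: str) -> str:
    """Pick the newest available version v with lower <= v < upper."""
    candidates = [v for v in available if parse(lower) <= parse(v) < parse(upper)]
    if not candidates:
        raise ValueError("no version satisfies the requirement")
    return max(candidates, key=parse)


# Hypothetical snapshots of a package index at two points in time.
index_monday = ["3.1.0", "3.2.0", "4.0.0"]
index_friday = index_monday + ["3.2.1"]  # a new release was published mid-week

# The same requirement (>=3.1.0,<4) resolved against each snapshot.
monday = resolve(index_monday, "3.1.0", "4")
friday = resolve(index_friday, "3.1.0", "4")
print(monday, friday)
```

The inputs never changed, yet the second resolve selects a different version. A lockfile records the first answer so that later builds reuse it.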

This is an issue both for correctness (your code may not be compatible with the new versions) and security (a new version may contain a vulnerability). A further security concern is that repeatedly downloading even the _same_ versions exposes you to greater risk if one of those versions is later compromised.

Dependency resolution can also be a performance bottleneck, with the same complex version compatibility logic running repeatedly, and unnecessarily.

Pants offers a solution to these issues that ensures stable, hermetic, secure builds over time, in the form of _lockfiles_.

### What are lockfiles?

A lockfile is a metadata file that enumerates specific pinned versions of every transitive third-party dependency. It also provides the expected SHA256 hashes of the downloadable artifacts (sdists and wheels) for each dependency. A lockfile can contain dependency version information that is valid across multiple platforms and Python interpreter versions. Lockfiles can be large and complex, but fortunately Pants will generate them for you!

If you use lockfiles, and we highly recommend that you do, then Pants will use the locked transitive dependency versions in every build, and only change them when you deliberately update your lockfiles. Pants will also verify the downloaded artifacts against their expected hashes, to ensure that they haven't been compromised after the lockfile was generated.
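
The hash check can be pictured with a small sketch. This only illustrates the idea of comparing an artifact against a recorded SHA256 digest; it is not Pex's implementation, and the `verify_artifact` helper and the artifact bytes are hypothetical:

```python
import hashlib


def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True if the artifact's SHA256 digest matches the locked value."""
    return hashlib.sha256(data).hexdigest() == expected_sha256


# Stand-in for the bytes of a downloaded wheel or sdist.
artifact = b"pretend this is the content of a wheel"

# The digest that would have been recorded when the lockfile was generated.
locked_hash = hashlib.sha256(artifact).hexdigest()

# A later download with identical bytes passes the check...
print(verify_artifact(artifact, locked_hash))

# ...while any modification to the artifact fails it.
tampered = artifact + b"!"
print(verify_artifact(tampered, locked_hash))
```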

Pants supports multiple lockfiles for different parts of your repo, via the mechanism of "resolves" - logical names given to lockfiles so that they are easy to reference.

> 📘 Lockfiles are generated by Pex
>
> Pants delegates lockfile creation and consumption to the [Pex](https://github.com/pantsbuild/pex) tool. So you may see standard lockfiles referred to as "Pex-style" lockfiles.
>

### Getting started with resolves

First, you'll need to turn on the resolves functionality for the repo:

```toml pants.toml
[python]
enable_resolves = true
```



Initially, Pants will assume a single resolve named `python-default` which references a lockfile at `3rdparty/python/default.lock`. You can change the name of the default resolve, and/or the location of its lockfile, via:

```toml pants.toml
[python]
enable_resolves = true
default_resolve = "myresolve"

[python.resolves]
myresolve = "path/to/mylockfile"
```



You generate the lockfile as follows:

```shell Bash
$ pants generate-lockfiles
19:00:39.26 [INFO] Completed: Generate lockfile for python-default
19:00:39.29 [INFO] Wrote lockfile for the resolve `python-default` to 3rdparty/python/default.lock
```



The inputs used to generate a lockfile are third-party dependencies in your repo, expressed via [`python_requirement` targets](doc:python-third-party-dependencies), or the `python_requirements` / `poetry_requirements` generator targets. In this case, since you haven't yet explicitly mapped your requirement targets to a resolve, they will all map to `python-default`, and so all serve as inputs to the default lockfile.

### Multiple lockfiles

It's generally simpler to have a single resolve for the whole repository, if you can get away with it. But sometimes you may need more than one resolve, if you genuinely have conflicting requirements in different parts of your repo. For example, you may have both Django 3 and Django 4 projects in your repo.

If you need multiple resolves, you declare them in your config file:

```toml pants.toml
[python]
enable_resolves = true
default_resolve = "data_science"

[python.resolves]
data_science = "3rdparty/python/data_science.lock"
webapps_django3 = "3rdparty/python/webapps_django3.lock"
webapps_django4 = "3rdparty/python/webapps_django4.lock"
```



Then, you partition your requirements targets across these resolves using the `resolve` field, and possibly the [parametrize](doc:targets#parametrizing-targets) mechanism:

```python 3rdparty/python/BUILD
python_requirement(
    name="django3",
    requirements=["django>=3.1.0,<4"],
    resolve="webapps_django3",
)

python_requirement(
    name="django4",
    requirements=["django>=4.0.0,<5"],
    resolve="webapps_django4",
)

python_requirements(
    name="webapps_shared",
    source="webapps-shared-requirements.txt",
    resolve=parametrize("webapps_django3", "webapps_django4"),
)

poetry_requirements(
    name="data_science_requirements",
)
```



Any requirements targets that don't specify an explicit `resolve=` will be associated with the default resolve.

As before, you run `pants generate-lockfiles` to generate the lockfiles. You can use the `--resolve` flag to generate just a subset of lockfiles. For example:

```shell Bash
$ pants generate-lockfiles --resolve=webapps_django3 --resolve=webapps_django4
19:00:39.26 [INFO] Completed: Generate lockfile for webapps_django3
19:00:39.29 [INFO] Completed: Generate lockfile for webapps_django4
19:00:40.02 [INFO] Wrote lockfile for the resolve `webapps_django3` to 3rdparty/python/webapps_django3.lock
19:00:40.17 [INFO] Wrote lockfile for the resolve `webapps_django4` to 3rdparty/python/webapps_django4.lock
```



Finally, you update your first-party code targets, such as `python_sources`, `python_tests`, and `pex_binary` to set their `resolve=` field (which, as before, defaults to the default resolve).

```python my/project/BUILD
python_sources(
    resolve="webapps_django3",
)

python_tests(
    name="tests",
    resolve="webapps_django3",
    # You can use `overrides` to change certain generated targets
    overrides={"test_django4.py": {"resolve": "webapps_django4"}},
)
```



If a first-party target is compatible with multiple resolves, such as utility code, you can use the [parametrize](doc:targets#parametrizing-targets) mechanism with the `resolve=` field.

> 📘 Transitive dependencies must use the same resolve
>
> All transitive dependencies of a source target must use the same resolve. Pants's dependency inference already handles this for you by only inferring dependencies between targets that share the same resolve.
>
> If you manually add a dependency across different resolves, Pants will error with a helpful message when you try to use that dependency.

To reiterate an important distinction: The `resolve=` field on a third-party requirements target specifies that these requirements are _inputs_ to the lockfile generator for that resolve. The `resolve=` field on a first-party source target specifies that this target will _consume_ the generated lockfile for that resolve.

### Interpreter constraints

A lockfile will contain dependencies for all requested Python versions. By default these are the global constraints specified by the [\[python\].interpreter_constraints](doc:reference-python#interpreter_constraints) option. You can override this per-lockfile using the [\[python\].resolves_to_interpreter_constraints](doc:reference-python#resolves_to_interpreter_constraints) option.

### Modifying lockfile generation behavior

You can use the following options to affect how the lockfile generator resolves dependencies for each resolve:

- [\[python\].resolves_to_constraints_file](doc:reference-python#resolves_to_constraints_file): For each resolve, a path to a [Pip constraints file](https://pip.pypa.io/en/stable/user_guide/#constraints-files) to use when resolving that lockfile.
- [\[python\].resolves_to_no_binary](doc:reference-python#resolves_to_no_binary): For each resolve, a list of projects that must only resolve to sdists and not wheels. Use the value `[":all:"]` to disable wheels for all packages.
- [\[python\].resolves_to_only_binary](doc:reference-python#resolves_to_only_binary): For each resolve, a list of projects that must only resolve to wheels and not sdists. Use the value `[":all:"]` to disable sdists for all packages.

You can use the key `__default__` to set the value for all resolves at once.

### Updating lockfiles

If you modify the third-party requirements of a resolve then you must regenerate its lockfile by running the `generate-lockfiles` goal. Pants will display an error if a lockfile is no longer compatible with its updated requirements.

In theory, whenever you generate a lockfile you should audit it for bugs, compliance, and security concerns. In practice this is intractable to do manually. We would like to integrate with automated auditing tools and services in the future, so watch this space for updates, or feel free to [reach out on Slack](doc:the-pants-community) if this is important to you and you'd like to work on it.

### Lockfile subsetting

When consuming a lockfile, Pants uses only the necessary subset of its transitive dependencies in each situation.

For example, when running a test, only the requirements actually used (transitively) by that test will be present on the `sys.path`. This means that a test run won't be invalidated if unrelated requirements have changed, which improves cache hit rates. The same holds true when running and packaging code.
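
Conceptually, subsetting amounts to taking the transitive closure of the requirements a target actually uses. The sketch below shows that idea only; the `transitive_closure` helper and the dependency graph are hypothetical and do not reflect Pants or Pex internals:

```python
from __future__ import annotations


def transitive_closure(roots: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Collect the roots plus everything reachable from them in the graph."""
    needed: set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(deps.get(name, ()))
    return needed


# Hypothetical locked dependency graph: package -> its direct dependencies.
locked = {
    "django": {"asgiref", "sqlparse"},
    "asgiref": set(),
    "sqlparse": set(),
    "requests": {"urllib3", "idna"},
    "urllib3": set(),
    "idna": set(),
}

# A test that only imports django needs only django's part of the lockfile.
subset = transitive_closure({"django"}, locked)
print(sorted(subset))
```

Because `requests` and its dependencies fall outside the subset, changing their locked versions would not affect this test's inputs.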

You can override this subsetting behavior by setting the [\[python\].run_against_entire_lockfile](doc:reference-python#run_against_entire_lockfile) option.

### Lockfiles for tools

Pants's Python support typically involves invoking underlying tools, such as `pytest`, `mypy`, `black` etc. in subprocesses. Almost all these tools are themselves written in Python and thus depended on via requirement strings, just like your third-party import dependencies.

It is strongly recommended that these tools be installed from a hermetic lockfile, for the same security and stability reasons stated above. In fact, Pants ships with built-in lockfiles for every Python tool it uses, and uses them automatically.

The only time you need to think about this is if you want to customize the tool requirements that Pants uses. This might be the case if you want to modify the version of a tool or add extra requirements (e.g., tool plugins).

If you want a tool to be installed from some resolve, instead of from the built-in lockfile, you set the resolve on the tool's config section:

```toml pants.toml
[python.resolves]
pytest = "3rdparty/python/pytest.lock"

[pytest]
resolve = "pytest"
```



Then set up the resolve's inputs:

```python 3rdparty/python/BUILD
python_requirements(
    source="pytest-requirements.txt",
    resolve="pytest",
)
```
```Text 3rdparty/python/pytest-requirements.txt
pytest==7.1.1
pytest-cov>=2.12,!=2.12.1,<3.1
pytest-xdist>=2.5,<3
pytest-myplugin>=1.2.0,<2
```



And generate its custom lockfile:

```shell Bash
$ pants generate-lockfiles --resolve=pytest
19:00:39.26 [INFO] Completed: Generate lockfile for pytest
19:00:39.29 [INFO] Wrote lockfile for the resolve `pytest` to 3rdparty/python/pytest.lock
```



Note that some tools, such as Flake8 and Bandit, must run on a Python interpreter that is compatible with the code they operate on. In this case you must ensure that the interpreter constraints for the tool's resolve are the same as those for the code in question.

### Sharing lockfiles between tools and code

In some cases a tool also provides a runtime library. For example, `pytest` is run as a tool in a subprocess, but your tests can also `import pytest` to access testing functionality.

Rather than repeat the same requirement in two different resolves, you can point the tool at an existing resolve that you also use for your code:

```toml pants.toml
[pytest]
resolve = "python-default"
```



Of course you have to make sure that this resolve does in fact provide appropriate versions of the tool.

You can have a single resolve for all your tools, or even a single resolve for all your tools and code! This may be useful if you want to [export](doc:reference-export) a virtualenv that includes all your dependencies and all the tools you use.

But note that the more frequently you update a lockfile, the more likely it is that unrelated updates will come along for the ride, since Pants does not yet support an "only-if-needed" upgrade strategy.

> 🚧 The previous way of generating tool lockfiles is deprecated!
>
> There is an older way of generating tool lockfiles, by setting the `version` and `extra_requirements` fields on a tool's config. This method is deprecated in favor of the standard one described above.
>
> If you're using this deprecated tool lockfile generation mechanism, please switch to using the one described here as soon as possible!

### Manually generating lockfiles

Rather than using `generate-lockfiles` to generate Pex-style lockfiles, you can generate lockfiles manually. This can be useful when adopting Pants in a repository that already uses Poetry, where you can export the existing lock by running `poetry export --dev`.

Manually generated lockfiles must either use Pex's JSON format or use pip's `requirements.txt`-style format (ideally with `--hash` entries for better supply chain security).
For example:

```text 3rdparty/user_lock.txt
freezegun==1.2.0 \
--hash=sha256:93e90676da3... \
--hash=sha256:e19563d0b05...
```

To use a manually generated lockfile for a resolve, point the resolve to that lockfile's path in [`[python].resolves`](doc:reference-python#resolves). Then set [`[python].resolves_generate_lockfiles`](doc:reference-python#resolves_generate_lockfiles) to `false`.

Warning: manually generated lockfiles will likely be slower to install than Pex-style ones, because Pants cannot as efficiently extract the subset of requirements used for a particular task; see the option [`[python].run_against_entire_lockfile`](doc:reference-python#run_against_entire_lockfile).
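
Before adopting a manually generated `requirements.txt`-style lockfile, it can help to check that every entry carries at least one `--hash`. The following is a rough sketch under simplifying assumptions: the `parse_hashed_requirements` helper is hypothetical, the hash values are placeholders, and it handles only the simple `name==version` form with backslash continuations, not the full pip requirements syntax:

```python
from __future__ import annotations


def parse_hashed_requirements(text: str) -> dict[str, list[str]]:
    """Map each pinned requirement to the list of hashes declared for it."""
    # Join lines that end with a backslash continuation.
    joined = text.replace("\\\n", " ")
    result: dict[str, list[str]] = {}
    for line in joined.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        requirement, *options = line.split()
        prefix = "--hash="
        result[requirement] = [
            opt[len(prefix):] for opt in options if opt.startswith(prefix)
        ]
    return result


# Placeholder content; real entries carry full SHA256 digests.
example = """\
freezegun==1.2.0 \\
    --hash=sha256:aaaa \\
    --hash=sha256:bbbb
six==1.16.0
"""

locks = parse_hashed_requirements(example)
# Entries without any hash weaken supply chain protection.
unhashed = [req for req, hashes in locks.items() if not hashes]
print(locks)
print(unhashed)
```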