Description
What's the problem this feature will solve?
A major selling point of the Bazel build system is its support for both local and remote caching.
For that caching to work, though, Bazel targets must build deterministically, so that the same target always produces the same content-addressable hash.
Currently pip wheel is non-deterministic, so our Python Bazel targets miss the cache whenever they depend on something built with pip wheel.
Describe the solution you'd like
Note: The following is the output of a Bazel execution log. It is not directly related to the pip wheel command, but it shows the relevant information.
inputs {
path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/LICENSE"
digest {
hash: "a2adb9c959b797494a0ef80bdf60e22db2749ee3e0c0908556e3eb548f967c56"
size_bytes: 1101
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/METADATA"
digest {
hash: "df7bc0c7cbd2ce350c5c61ceda3a74bbcb6f82446a7c01f7f8e1034a98df231f"
size_bytes: 1704
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/RECORD"
digest {
hash: "6fe803b74ab4fcab1f23e96060cf062d12779598af7e72692c492c2dd7cad0ed"
size_bytes: 1701
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/WHEEL"
digest {
hash: "cdf2c8f141bc498ae490a88870d655dd174abe3db8c1f57562224b168930c624"
size_bytes: 104
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/PyYAML-5.1.dist-info/top_level.txt"
digest {
hash: "ae98f42153138ac02387fd6f1b709c7fdbf98e9090c00cfa703d48554e597614"
size_bytes: 11
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/_yaml.cpython-36m-x86_64-linux-gnu.so"
digest {
hash: "a7f3774015f839ccee5e2281bbfdf22a42e0e1dafaac33ef4c91db83a07210d9"
size_bytes: 1133288
hash_function_name: "SHA-256"
}
}
inputs {
path: "external/pypi__PyYAML_5_1/yaml/__init__.py"
digest {
hash: "2af8b6dbcb1df5c63597f215421cad02f2317e291061b181b0f7bbf4f71ac0dd"
size_bytes: 12012
hash_function_name: "SHA-256"
}
}
The log above lists a subset of the build outputs of the PyYAML package. Of these outputs, the RECORD file and the _yaml.cpython-36m-x86_64-linux-gnu.so shared object file have hashes that change from build to build. I inspected the RECORD file and found that it contains the hash of the .so file, so it is non-deterministic because of the .so file, and I think only because of that.
So the problem is the .so file.
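To illustrate why a changing .so file forces a changing RECORD file: the wheel format lists each file in RECORD as path, hash, and size, with the hash written as the URL-safe base64 encoding of the SHA-256 digest, padding stripped. The sketch below is not pip's code. record_entry is a hypothetical helper, and the two byte strings are made-up stand-ins for .so files from two builds that differ only in an embedded temporary path.

```python
import base64
import hashlib


def record_entry(path: str, data: bytes) -> str:
    """Build a RECORD-style line: path,sha256=<urlsafe-b64-digest>,<size>."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{path},sha256={encoded},{len(data)}"


# Made-up stand-ins for the same extension module from two separate builds.
# Only the embedded temporary directory name differs; the size is identical.
so_build_a = b"\x7fELF...code.../tmp/pip-wheel-aaaaaaaa/pyyaml\x00"
so_build_b = b"\x7fELF...code.../tmp/pip-wheel-bbbbbbbb/pyyaml\x00"

so_name = "_yaml.cpython-36m-x86_64-linux-gnu.so"
entry_a = record_entry(so_name, so_build_a)
entry_b = record_entry(so_name, so_build_b)

print(entry_a)
print(entry_b)
print("RECORD lines differ:", entry_a != entry_b)
```

Because the RECORD line embeds the digest of the .so file, any byte that differs in the .so file changes the RECORD file's own hash as well, which matches the two non-deterministic inputs seen in the execution log.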
I ran the strings program on the .so file and found this printable string: /tmp/pip-wheel-_bd8v3f2/pyyaml. That path comes from here:
pip/src/pip/_internal/wheel.py, line 649 in 6af9de9:
    with TempDirectory(kind="wheel") as temp_dir:
So while I found other differences between builds of _yaml.cpython-36m-x86_64-linux-gnu.so, the leak of this temporary directory path into the binary is by itself sufficient to break determinism.
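For anyone who wants to repeat the check without the strings program, here is a rough Python sketch that scans a binary for embedded pip temporary build paths. It rests on assumptions: find_pip_temp_paths is a hypothetical helper, the /tmp/pip-wheel- prefix is taken from the string I found (the prefix will differ if the temporary directory lives elsewhere), and the sample bytes are a made-up stand-in for a real .so file.

```python
import re
from typing import List

# Assumption: temp build dirs look like /tmp/pip-wheel-<random>/..., as in the
# string found above. [\x21-\x7e] matches printable, non-space ASCII bytes.
PIP_TEMP_PATH = re.compile(rb"/tmp/pip-wheel-[\x21-\x7e]+")


def find_pip_temp_paths(data: bytes) -> List[str]:
    """Return the distinct pip temp paths embedded in a binary blob."""
    matches = {m.group().decode("ascii") for m in PIP_TEMP_PATH.finditer(data)}
    return sorted(matches)


# Made-up stand-in for the contents of a compiled extension module.
sample = b"\x7fELF\x00\x01/tmp/pip-wheel-_bd8v3f2/pyyaml\x00ext/_yaml.c\x00"
print(find_pip_temp_paths(sample))
# prints ['/tmp/pip-wheel-_bd8v3f2/pyyaml']
```

To run it against a real file, pass the result of pathlib.Path(path).read_bytes() instead of sample. A non-empty result on any build means the randomly named directory has leaked into the artifact.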
Additional context
rules_python issue discussing this problem: bazel-contrib/rules_python#154
rules_python repo: https://github.com/bazelbuild/rules_python