Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserving read-only flags on package data causing access denied when building #1424

Closed
miuliano opened this issue Jul 12, 2018 · 3 comments · Fixed by #1607
Closed

Preserving read-only flags on package data causing access denied when building #1424

miuliano opened this issue Jul 12, 2018 · 3 comments · Fixed by #1607
Labels
Needs Triage Issues that need to be evaluated for severity and status.

Comments

@miuliano
Copy link

I'm working with a read-only source directory on Windows and running into access denied errors when calling python setup.py bdist_wheel. I noticed that it was only files included in package_data that were causing this, which helped me track it down to:

setuptools/setuptools/command/build_py.py
outf, copied = self.copy_file(srcfile, target)

which is slightly different from its distutils version:

cpython/Lib/distutils/command/build_py.py
self.copy_file(os.path.join(src_dir, filename), target, preserve_mode=False)

So it looks like it comes down to this preserve_mode flag.

I don't know if there is a good reason to preserve things like read-only flags? There's a comment about preserving mode further up in the distutils code, which seems to suggest not preserving is the way to go.

@pganssle pganssle added the Needs Triage Issues that need to be evaluated for severity and status. label Oct 19, 2018
@jorikdima
Copy link
Contributor

Let me vote up. I was going to report about this issue, but @miuliano has already done it.
My version control system keeps all unchecked files in RO mode, so when I build a wheel it copies sources to build folder. But ONLY package data is keeping mode then and build wheel crashes as it can't delete it then.

@acerv
Copy link

acerv commented Feb 18, 2019

Hello everyone,

this is actually a major issue in Windows systems, where some versioning tools might stick with read-only files by default when creating their local repo.

And this is the case of Perforce, where all files are read-only by default. I tried to put my own project inside the CI system using Jenkins. After bdist_wheel command, packaging failed because copied files with MANIFEST.in setup had read-only flag and it was not possible to removed them after build folder was created.

My 2 cents.

Andrea

@miuliano
Copy link
Author

I finally had some time to come back to this issue and dig a bit deeper. It's definitely messier than it looks, and @jorikdima did a great job breaking it down in the PR. For simplicity's sake, there are really two issues here:

  1. setuptools implementation of the build_py command, unlike the distutils version, preserves file modes for package_data when copying to the build folder (but not for the package itself). However, on Windows the read-only attribute is the only supported one as os.chmod doesn't support anything else (see https://docs.python.org/3/library/os.html#os.chmod). Read-only on Windows isn't the same as having the write bits unset on Linux - it also means the file can't be deleted. This prevents deleting the file in the build folder as well as prevents copying an updated version if it has changed in source.

  2. The bdist_wheel command finishes by attempting to call shutil.rmtree on the folder containing the intermediate files used to generate the wheel. These files are created via the install (install_lib) command, which copies them from the build folder. This always preserves file modes so that the read-only flag gets propagated over. rmtree comes with the caveat that attempting to delete read-only files on Windows will cause an exception (https://docs.python.org/3/library/shutil.html#rmtree-example), but the wheel package doesn't handle this correctly so it errors out.

As for solutions... My gut feeling is that preserving modes on Windows during the build step is just a gotcha with very few benefits. If setuptools is open to a Windows-specific hack then I'd turn off copy_file's preserve_mode for just that platform. The wheel package also needs better rmtree error handling, so I'll log a bug there.

jaraco added a commit that referenced this issue Mar 21, 2020
HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
…ackage test

### What changes were proposed in this pull request?

For a bit of background,
PIP packaging test started to fail (see [this logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of  setuptools 46.1.0 release. In pypa/setuptools#1424, they decided to don't keep the modes in `package_data`.

In PySpark pip installation, we keep the executable scripts in `package_data` https://github.com/apache/spark/blob/fc4e56a54c15e20baf085e6061d3d83f5ce1185d/python/setup.py#L199-L200, and expose their symbolic links as executable scripts.

So, the symbolic links (or copied scripts) executes the scripts copied from `package_data`, which doesn't have the executable permission in its mode:

```
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied
```

The current issue is being tracked at pypa/setuptools#2041

</br>

For what this PR proposes:
It sets the upper bound in PR builder for now to unblock other PRs.  _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_

### Why are the changes needed?

It currently affects users who uses the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also pypa/setuptools#2041 (comment)

### Does this PR introduce any user-facing change?

It makes CI pass for now. No user-facing change yet.

### How was this patch tested?

Jenkins will test.

Closes #27995 from HyukjinKwon/investigate-pip-packaging.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
…ackage test

### What changes were proposed in this pull request?

For a bit of background,
PIP packaging test started to fail (see [this logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of  setuptools 46.1.0 release. In pypa/setuptools#1424, they decided to don't keep the modes in `package_data`.

In PySpark pip installation, we keep the executable scripts in `package_data` https://github.com/apache/spark/blob/fc4e56a54c15e20baf085e6061d3d83f5ce1185d/python/setup.py#L199-L200, and expose their symbolic links as executable scripts.

So, the symbolic links (or copied scripts) executes the scripts copied from `package_data`, which doesn't have the executable permission in its mode:

```
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied
```

The current issue is being tracked at pypa/setuptools#2041

</br>

For what this PR proposes:
It sets the upper bound in PR builder for now to unblock other PRs.  _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_

### Why are the changes needed?

It currently affects users who uses the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also pypa/setuptools#2041 (comment)

### Does this PR introduce any user-facing change?

It makes CI pass for now. No user-facing change yet.

### How was this patch tested?

Jenkins will test.

Closes #27995 from HyukjinKwon/investigate-pip-packaging.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit c181c45)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
…ackage test

### What changes were proposed in this pull request?

For a bit of background,
PIP packaging test started to fail (see [this logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of  setuptools 46.1.0 release. In pypa/setuptools#1424, they decided to don't keep the modes in `package_data`.

In PySpark pip installation, we keep the executable scripts in `package_data` https://github.com/apache/spark/blob/fc4e56a54c15e20baf085e6061d3d83f5ce1185d/python/setup.py#L199-L200, and expose their symbolic links as executable scripts.

So, the symbolic links (or copied scripts) executes the scripts copied from `package_data`, which doesn't have the executable permission in its mode:

```
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied
```

The current issue is being tracked at pypa/setuptools#2041

</br>

For what this PR proposes:
It sets the upper bound in PR builder for now to unblock other PRs.  _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_

### Why are the changes needed?

It currently affects users who uses the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also pypa/setuptools#2041 (comment)

### Does this PR introduce any user-facing change?

It makes CI pass for now. No user-facing change yet.

### How was this patch tested?

Jenkins will test.

Closes #27995 from HyukjinKwon/investigate-pip-packaging.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
(cherry picked from commit c181c45)
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
sjincho pushed a commit to sjincho/spark that referenced this issue Apr 15, 2020
…ackage test

### What changes were proposed in this pull request?

For a bit of background,
PIP packaging test started to fail (see [this logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of  setuptools 46.1.0 release. In pypa/setuptools#1424, they decided to don't keep the modes in `package_data`.

In PySpark pip installation, we keep the executable scripts in `package_data` https://github.com/apache/spark/blob/fc4e56a54c15e20baf085e6061d3d83f5ce1185d/python/setup.py#L199-L200, and expose their symbolic links as executable scripts.

So, the symbolic links (or copied scripts) executes the scripts copied from `package_data`, which doesn't have the executable permission in its mode:

```
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied
```

The current issue is being tracked at pypa/setuptools#2041

</br>

For what this PR proposes:
It sets the upper bound in PR builder for now to unblock other PRs.  _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_

### Why are the changes needed?

It currently affects users who uses the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also pypa/setuptools#2041 (comment)

### Does this PR introduce any user-facing change?

It makes CI pass for now. No user-facing change yet.

### How was this patch tested?

Jenkins will test.

Closes apache#27995 from HyukjinKwon/investigate-pip-packaging.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage Issues that need to be evaluated for severity and status.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants