Packaged script execute permissions lost from v46.1.0 onwards #2041

Closed
dhallam opened this issue Mar 22, 2020 · 7 comments · Fixed by #2046

Comments

dhallam commented Mar 22, 2020

Issue

We use pipenv and, as a result, are forced onto the latest version of setuptools, since pipenv internally pins to the latest release.

We observed that executable scripts that are part of Python packages have lost their execute flag from setuptools v46.1.0 onwards. The example below demonstrates the bug with pyspark, which ships a number of executable scripts in its package.

The issue was introduced by commit 7843688, where the copy_file() function is now called with preserve_mode=False. The changelog states the reason for the change as:

Prevent keeping files mode for package_data build. It may break a build if user's package data has read only flag.

Unfortunately, this has the side effect of stripping all execute permissions from files, meaning users can't use the scripts "out of the box" - they have to set the execute permissions manually.
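The mechanism can be illustrated with stdlib calls alone: a mode-preserving copy behaves like `shutil.copy`, while `copy_file(..., preserve_mode=False)` behaves like `shutil.copyfile`, which writes a fresh file whose mode comes from `open()`'s default (never executable). A minimal sketch, using `shutil` as a stand-in for the distutils call:

```python
import os
import shutil
import stat
import tempfile

# Create a script with the execute bit set, like pyspark/bin/spark-submit.
src_dir = tempfile.mkdtemp()
src = os.path.join(src_dir, "spark-submit")
with open(src, "w") as f:
    f.write("#!/bin/bash\necho hello\n")
os.chmod(src, 0o755)

# preserve_mode=False copies content only, like shutil.copyfile:
dst_plain = os.path.join(src_dir, "copied-without-mode")
shutil.copyfile(src, dst_plain)

# preserve_mode=True also carries the mode across, like shutil.copy:
dst_mode = os.path.join(src_dir, "copied-with-mode")
shutil.copy(src, dst_mode)

def is_executable(path):
    return bool(os.stat(path).st_mode & stat.S_IXUSR)

print(is_executable(dst_plain))  # False: execute bit lost
print(is_executable(dst_mode))   # True: execute bit preserved
```

The content-only copy can never be executable, because the destination is created with `open()`'s default mode of 0o666 before the umask is applied.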

Demonstration Script

#!/bin/bash
set -eu

wget -nc https://files.pythonhosted.org/packages/9a/5a/271c416c1c2185b6cb0151b29a91fff6fcaed80173c8584ff6d20e46b465/pyspark-2.4.5.tar.gz

for version in "46.0.0" "46.1.0"; do
    rm -rf .venv pyspark-2.4.5
    tar xzf pyspark-2.4.5.tar.gz
    virtualenv -q -p /usr/bin/python3.7 .venv
    . .venv/bin/activate
    python3 -m pip install --upgrade setuptools="==${version}" wheel
    pushd pyspark-2.4.5
    python3 setup.py -q bdist_wheel
    pushd dist
    unzip -q pyspark-2.4.5-py2.py3-none-any.whl
    echo -e "\n\n${version}: Here are the permissions for spark-submit:\n"
    ls -l ./pyspark/bin/spark-submit
    echo -e "\n\n"
    popd
    popd
done

Expected result

-rwxr-xr-x 1 dave dave 1040 Feb  2 19:35 ./pyspark/bin/spark-submit

Actual result

-rw-rw-r-- 1 dave dave 1040 Feb  2 19:35 ./pyspark/bin/spark-submit
tibdex commented Mar 23, 2020

I'm facing this issue too on macOS 10.15.3 and Pipenv 2018.11.26. This code was working fine in 46.0.0 but started failing in 46.1.0:

from pyspark.sql import SparkSession # pyspark==2.4.5

spark = SparkSession.builder.appName("Test").getOrCreate()

mattclay commented Mar 24, 2020

In the Ansible community we've started seeing this issue testing Ansible Collections, since ansible-test depends on internal executable scripts which are not entry points.

See #1607 (comment)
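A rough illustration of the distinction (all names here are hypothetical): scripts exposed as `console_scripts` entry points get freshly generated executable wrappers at install time, whereas files shipped via `package_data` are copied verbatim by `build_py`, and it is that copy step that lost the execute bit in v46.1.0:

```python
# Hypothetical setup() arguments, for illustration only.
setup_kwargs = dict(
    name="example-collection",
    packages=["example"],
    # Copied by build_py via copy_file(); from v46.1.0 the execute bit is dropped.
    package_data={"example": ["bin/internal-helper.sh"]},
    # Unaffected: wrapper scripts are generated with executable permissions
    # by the installer, independent of the source file's mode.
    entry_points={"console_scripts": ["example-cli = example.cli:main"]},
)
```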

leekillough commented Mar 24, 2020

The bug seems to have been introduced in the last 3 days, when we started seeing permission problems, most notably the lack of execution permissions in scripts.

Someone briefly mentioned 512565e, but that appears to be only whitespace changes, and the original change in 7843688 dates back to 2018, long before this regression.

We can work around this issue by changing the permissions of installed scripts, but I'm not sure that this project is the root cause of this recent regression.
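One shape such a post-install workaround could take (a sketch, not part of setuptools; the directory name is hypothetical) is to walk the affected scripts directory and re-grant an execute bit for every read bit already set, e.g. 0o644 becomes 0o755:

```python
import os
import stat
import tempfile

def restore_execute_bits(directory):
    """Add an execute bit wherever the matching read bit (u/g/o) is set.

    A blunt workaround: it marks every file under `directory` executable,
    so point it only at a scripts directory such as .../pyspark/bin.
    """
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode | ((mode & 0o444) >> 2))

# Demo on a throwaway directory standing in for site-packages/pyspark/bin.
demo = tempfile.mkdtemp()
script = os.path.join(demo, "spark-submit")
with open(script, "w") as f:
    f.write("#!/bin/bash\n")
os.chmod(script, 0o644)  # the permissions v46.1.0 leaves behind

restore_execute_bits(demo)
print(oct(stat.S_IMODE(os.stat(script).st_mode)))  # 0o755
```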

@jaraco @KOLANICH

mattclay commented Mar 24, 2020

@leekillough #1607 (which contained 7843688) was merged into master only two days ago. The change can be seen in the release history for version v46.1.0, released on March 21:

https://setuptools.readthedocs.io/en/latest/history.html#v46-1-0

#1424: Prevent keeping files mode for package_data build. It may break a build if user’s package data has read only flag.

leekillough commented Mar 24, 2020

> @leekillough #1607 (which contained 7843688) was merged into master only 2 days ago with the change. The change can be seen in the release history for version v46.1.0, released on March 21:
>
> https://setuptools.readthedocs.io/en/latest/history.html#v46-1-0
>
> #1424: Prevent keeping files mode for package_data build. It may break a build if user’s package data has read only flag.

Sorry, I missed that PR. I thought I went through them all, but for some reason I could never find commit 7843688.

HyukjinKwon commented Mar 24, 2020

Hi @jaraco and @jorikdima, is this going to be fixed?

The Apache Spark community has hit this issue and CI started to break. We're setting an upper bound on setuptools to work around the CI breakage for now.
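Such a pin would look something like the following (the exact file it lives in, requirements constraint vs. Pipfile, depends on the project; the cap below the regressed release is the operative part):

```
setuptools < 46.1.0
```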

HyukjinKwon commented Mar 24, 2020

I also filed SPARK-31231 on the Spark side to track this.

HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
…ackage test

### What changes were proposed in this pull request?

For a bit of background,
the PIP packaging test started to fail (see [these logs](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120218/testReport/)) as of the setuptools 46.1.0 release. In pypa/setuptools#1424, they decided not to keep file modes for `package_data`.

In the PySpark pip installation, we keep the executable scripts in `package_data` (https://github.com/apache/spark/blob/fc4e56a54c15e20baf085e6061d3d83f5ce1185d/python/setup.py#L199-L200) and expose symbolic links to them as executable scripts.

So the symbolic links (or copied scripts) execute the scripts copied from `package_data`, which no longer carry the executable permission:

```
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: Permission denied
/tmp/tmp.UmkEGNFdKF/3.6/bin/spark-submit: line 27: exec: /tmp/tmp.UmkEGNFdKF/3.6/lib/python3.6/site-packages/pyspark/bin/spark-class: cannot execute: Permission denied
```

The current issue is being tracked at pypa/setuptools#2041


What this PR proposes:
it sets an upper bound on setuptools in the PR builder for now to unblock other PRs. _This PR does not solve the issue yet. I will make a fix after monitoring https://github.com/pypa/setuptools/issues/2041_

### Why are the changes needed?

It currently affects users who use the latest setuptools. So, _users seem unable to use PySpark with the latest setuptools._ See also pypa/setuptools#2041 (comment)

### Does this PR introduce any user-facing change?

It makes CI pass for now. No user-facing change yet.

### How was this patch tested?

Jenkins will test.

Closes #27995 from HyukjinKwon/investigate-pip-packaging.

Authored-by: HyukjinKwon <gurwls223@apache.org>
Signed-off-by: HyukjinKwon <gurwls223@apache.org>
HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
HyukjinKwon added a commit to apache/spark that referenced this issue Mar 24, 2020
jillr added a commit to jillr/amazon.aws that referenced this issue Mar 24, 2020
jillr added a commit to ansible-collections/amazon.aws that referenced this issue Mar 25, 2020
* Rename collection

Content collections may not be published to the ansible namespace. As such,
rename collection, imports, and other relevant paths to amazon.aws.

* Remove unused GH action from readme

* Workaround setuptools bug pypa/setuptools#2041

* Missed some paths that needed to change in unit tests

* Test pulling community collection from a PR with modified imports
jaraco added a commit that referenced this issue Mar 25, 2020
sjincho pushed a commit to sjincho/spark that referenced this issue Apr 15, 2020