Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

semgrep thows exception on python setup.py install #2054

Closed
ajinabraham opened this issue Nov 16, 2020 · 29 comments · Fixed by #2326
Closed

semgrep thows exception on python setup.py install #2054

ajinabraham opened this issue Nov 16, 2020 · 29 comments · Fixed by #2326
Assignees
Labels
bug Something isn't working priority:high Issue requires immediate attention user:external requested by someone outside of r2c

Comments

@ajinabraham
Copy link
Contributor

Describe the bug
A clear and concise description of what the bug is.

Processing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl
Installing semgrep-0.31.1-cp36.cp37.cp38.py36.py37.py38-none-macosx_10_14_x86_64.whl to /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages
Adding semgrep 0.31.1 to easy-install.pth file
Traceback (most recent call last):
  File "setup.py", line 56, in <module>
    install_requires=requirements,
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/__init__.py", line 144, in setup
    return distutils.core.setup(**attrs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 67, in run
    self.do_egg_install()
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/install.py", line 117, in do_egg_install
    cmd.run(show_deprecation=False)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 424, in run
    self.easy_install(spec, not self.no_deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 673, in easy_install
    return self.install_item(None, spec, tmpdir, deps, True)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
    self.process_distribution(spec, dist, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 765, in process_distribution
    [requirement], self.local_index, self.easy_install
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 783, in resolve
    replace_conflicting=replace_conflicting
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1066, in best_match
    return self.obtain(req, installer)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1078, in obtain
    return installer(requirement)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 692, in easy_install
    return self.install_item(spec, dist.location, tmpdir, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 720, in install_item
    self.process_distribution(spec, dist, deps)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 745, in process_distribution
    self.install_egg_scripts(dist)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/setuptools/command/easy_install.py", line 619, in install_egg_scripts
    dist.get_metadata('scripts/' + script_name)
  File "/Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1426, in get_metadata
    return value.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcf in position 0: invalid continuation byte in scripts/spacegrep file at path: /Users/ajinabraham/Code/Mobile-Security-Framework-MobSF/v/lib/python3.7/site-packages/semgrep-0.31.1-py3.7-macosx-10.9-x86_64.egg/EGG-INFO/scripts/spacegrep

To Reproduce
Steps to reproduce the behavior, ideally a link to https://semgrep.dev:

Add semgrep to a setup.py script and run python setup.py install

Expected behavior
install gracefully

Screenshots
If applicable, add screenshots to help explain your problem.

What is the priority of the bug to you?
NA

Environment
pypi, 0.31.1

@ajinabraham
Copy link
Contributor Author

ajinabraham commented Nov 16, 2020

Setup in question https://github.com/MobSF/Mobile-Security-Framework-MobSF/blob/setup/setup.py

Repro

python3 -m venv v
. v/bin/activate
python setup.py install

@brendongo brendongo added the bug Something isn't working label Nov 16, 2020
@mschwager
Copy link
Contributor

Thanks for the heads up @ajinabraham - we'll look into this shortly 👍

@brendongo
Copy link
Member

brendongo commented Nov 16, 2020

Was able to reproduce this locally but not entirely sure what's happening. @mschwager do you know have any ideas why python is trying to read spacegrep as utf-8 :D

@brendongo
Copy link
Member

Was able to reproduce this with semgrep==0.27.0 so it's not just spacegrep, python tries to read semgrep-core as utf-8 as well.

@brendongo
Copy link
Member

Might be relevant: https://github.com/pypa/setuptools/blob/master/setuptools/command/easy_install.py#L591
I think we might not want to name the folder we put the binaries in scripts/ but I'm not sure if we need to do extra work somewhere to make sure the files are included in the wheel if we put them in a different folder.

@minusworld minusworld self-assigned this Nov 30, 2020
@nbrahms nbrahms added the user:external requested by someone outside of r2c label Dec 1, 2020
@minusworld
Copy link
Member

minusworld commented Dec 2, 2020

@brendongo Do you know where we put the binaries in scripts/ offhand?

Found it.

Looks like we were originally installing to bin/ but there's a comment linking to this issue that suggests we can't for some reason: pypa/setuptools#210 (comment)

The recommendation is to move over to scripts/ after install, which is what we do now.

Then, easy_install makes assumptions about things in the scripts/ directory, which tries to parse the binaries as text... 😢

@minusworld
Copy link
Member

This data_files thing might be an option: https://docs.python.org/3/distutils/setupscript.html#installing-additional-files

Could also consider an upstream patch to easy_install 🤷‍♂️

@minusworld
Copy link
Member

minusworld commented Dec 2, 2020

Indeed, if I rename scripts/ in the .egg, easy_install quits complaining:

(venv) ➜  site-packages git:(master) ✗ find . -type d -name 'scripts'
./semgrep-0.32.0-py3.8.egg/EGG-INFO/scripts
# RENAME TO bin/
(venv) ➜  site-packages git:(master) ✗ find . -type d -name 'bin'
./semgrep-0.32.0-py3.8.egg/EGG-INFO/bin
(venv) ➜  pyspider git:(master) ✗ python3 setup.py install
# ...
Searching for semgrep==0.32.0
Best match: semgrep 0.32.0
Processing semgrep-0.32.0-py3.8.egg
semgrep 0.32.0 is already the active version in easy-install.pth
Installing semgrep script to /Users/grayson/sandbox/pyspider/venv/bin
# ...
Finished processing dependencies for pyspider==0.3.2
(venv) ➜  pyspider git:(master) ✗

@DrewDennison
Copy link
Member

Huzzah! Nice work @minusworld

@minusworld
Copy link
Member

Heh, thanks -- we still actually have to solve the issue 🤷‍♂️
Now comes the hard part 😅

@minusworld
Copy link
Member

@ajinabraham As a temporary workaround, you may try installing without easy_install via another method. This isn't ideal, so we will keep working on this issue.

@brendongo @DrewDennison Any suggestions for how to proceed? I see two main options:

  1. Figure out an alternative to scripts/
  2. Try to fix easy_install upstream to check the file type, or something similar

@ajinabraham
Copy link
Contributor Author

This is out of my control. This came as a requirement to install and package MobSF(which uses libsast that has a requirement on semgrep) on a pentesting distribution. The maintainer of that project expects a setup.py like most tools. MobSF/Mobile-Security-Framework-MobSF#1585 (comment)

@ajinabraham
Copy link
Contributor Author

Note sure if it's relevant, but have you looked into bundling data files or binaries using MANIFEST.in ?
https://packaging.python.org/guides/using-manifest-in/

@minusworld
Copy link
Member

OK gotcha. MANIFEST.in looks like it could be an option. 👍 We'll look into it and see if it works.

@DrewDennison DrewDennison added priority:high Issue requires immediate attention and removed priority:medium labels Dec 8, 2020
@underyx
Copy link
Member

underyx commented Dec 10, 2020

@minusworld will we be able to try this solution by 0.35?

@mschwager
Copy link
Contributor

TL;DR: After further investigation I can't find a way to make both pip and setuptools/easy_install work with a bundled semgrep-core binary. The Python documentation recommends 3 ways for packaging additional data files with a package. Each had a shortcoming in my testing.

  • scripts - pip: works, setuptools tries to interpret the binary as a text script, which causes the UnicodeDecodeError.
  • package_data - pip: works, setuptools does not retain the execute bit on the binary after installation.
  • data_files - pip: works, setuptools does not retain the execute bit on the binary after installation.

The execute bit problem means we receive a PermissionError running semgrep-core any time we run semgrep after that installation method. I don't think it's feasible to change the execute bit after installation if we notice it's missing. E.g. if someone installs with sudo we won't have the correct privileges. And more generally, trying to change file permission post-installation seems fragile and poor practice.

So, I see a few options:

  • Move to pip installation instead of setuptools (setup.py install).
  • Users perform some manual post-installation commands if running setuptools.
  • Other? Am I missing anything?

Perhaps it'd be best to pull in @blshkv and see what's feasible. Is there a reason you cannot use pip for installing this package? Am I missing anything? Have other packages managed to successfully bundle an executable binary and have installation work with both pip and setuptools?

@blshkv
Copy link

blshkv commented Dec 17, 2020

pip is a higher-level interface on top of setuptools or distribute, it might even call setup.py (never used pip, so not sure). So please focus on setuptools solution

@ajinabraham
Copy link
Contributor Author

I guess semgrep needs to support pip as that's the easiest and widely used way for people to install semgrep. Other tools that integrate semgrep also have to install semgrep which may not use pip at all.

@mschwager In the ideal scenario, instead of semgrep-core as a binary in a python package, there should have been shared objects targeted for multiple arch and OS. So I wouldn't see any issues with setting execute bit as a part of the install process and not a post or manual process.
Alternatively, you could invoke semgrep-core without execute bit using dynamic linker /lib/x86_64-linux-gnu/ld-2.27.so <ELF binary> if it does not have too many external dependencies.

@mschwager
Copy link
Contributor

Can you point me to where you're seeing the original issue? I can reproduce it, but I can't see where/why this is an issue vs. simply using pip. Is it a CI system, or install instructions somewhere that can be updated?

As mentioned above in my testing, setuptools is unable to handle this situation, unless I'm missing something. If you can point me to a packaging mechanism that allows for binary data and retains the execute bit then I'd be happy to try it out.

We could package semgrep-core differently, but that's a large undertaking for little gain at this point. We haven't seen users using setup.py, and we only officially support pip for Python package installation at this point. As mentioned above, another option is some post-install commands for this situation. For example, chmod +x the binaries, or a "bring your own semgrep-core and spacegrep" approach with setup.py.

@ajinabraham
Copy link
Contributor Author

@mschwager @blshkv can give more information on that. He had a use case of integrating/bundling tools with Pentoo, a security-focused Linux Distro. For python projects, the standard approach followed by the project is python setup.py install it seems. When I require semgrep in setup.py, and uses install problem occurs. Maybe for non-pip and problematic use cases, it should fail gracefully telling, bring your own binary and make it available in PATH rather than triggering an error related to semgrep-core and interrupting the setup.

@blshkv
Copy link

blshkv commented Dec 20, 2020

You can also have a look at https://snarky.ca/what-the-heck-is-pyproject-toml/ for the complete picture.

@blshkv
Copy link

blshkv commented Dec 20, 2020

I think the closest software is frida where we solved а similar problem, see:
frida/frida-python#136
See also https://github.com/frida/frida-core and https://github.com/frida/frida-python (setup.py)

I'm happy with the things as they are now (0.28.0) in general.
We download semgrep-0.28.0-cp36.cp37.cp38.py36.py37.py38-none-any.whl binary and install semgrep python using setuptools. Any other standard way like pyproject is fine too. However, pip is a python package manager designed of the end user. Each Linux distribution has each own package manager and everything has to be ported to it usually

@mschwager
Copy link
Contributor

Thanks for the additional information!

We download semgrep-0.28.0-...-none-any.whl binary and install semgrep python using setuptools.

Can you point me to where this is being done? That will help with my testing and learning about this issue.

I'm trying to brainstorm the best solution here. Would bringing your own semgrep-core and spacegrep binaries then running setup.py work for you? They're included in the .whl artifacts, so you could easily grab them there. Something like the following:

$ wget https://files.pythonhosted.org/semgrep.whl # Or somehow obtain the .whl artifact
$ unzip semgrep.whl
$ cp /path/to/unzipped/semgrep-core /somewhere/in/PATH
$ cp /path/to/unzipped/spacegrep /somewhere/in/PATH
$ SEMGREP_SKIP_BIN=true python setup.py
$ semgrep -h # Will now use binaries installed in PATH

Would that installation workflow work for you?

Unfortunately, splitting Semgrep into two packages and using extension modules is more work then we can take on right now.

@blshkv
Copy link

blshkv commented Dec 24, 2020

At Pentoo/Gentoo we build all software from source usually and create ebuilds using bash (pretty much) for each package.
It is common to have multiple repositories for each different modules. It definitely would help to create cleaner ebuilds because the Gentoo API expects modelised software.

Here is the current Pentoo implementation:
https://github.com/pentoo/pentoo-overlay/tree/master/dev-util/semgrep-core-bin
https://github.com/pentoo/pentoo-overlay/tree/master/dev-python/semgrep

SEMGREP_SKIP_BIN=true is a workaround and it should work fine too.

@mschwager
Copy link
Contributor

Hi @blshkv,

I've just merged #2326, which should fix this. This change will go out in the next release in the next few days. The install process should remain similar, but you'll have to make a few changes (note we now have two binaries: semgrep-core and spacegrep):

  • PRECOMPILED_LOCATION no longer exists. You can now use the SEMGREP_SKIP_BIN env variable during installation to avoid installing the binaries. semgrep will then look for the binaries in your PATH via shutil.which.

We've also made the SEMGREP_CORE_BIN and SPACEGREP_BIN env variables available, but I suspect those won't work because they rely on package_data, which fails to retain the execute bit with setuptools/easy_install. Another option here would be using those env variables, then doing a chmod +x after installation, but before running semgrep.

@mschwager
Copy link
Contributor

@blshkv, heads up, I've confirmed the fix, which went out in v0.36.0:

$ SEMGREP_SKIP_BIN=true easy_install https://files.pythonhosted.org/packages/2b/63/4ed5b3c8c7f871fbcbabd6c9d2f16dcb45f70104155aba6f59da65afc8ed/semgrep-0.36.0.tar.gz
$ echo '1 == 1' | semgrep -l python -e '$X == $X' -
/tmp/tmp46ax8fv0
1:1 == 1
ran 1 rules on 1 files: 1 findings

If you don't include SEMGREP_SKIP_BIN you'll see something like:

$ easy_install https://files.pythonhosted.org/packages/2b/63/4ed5b3c8c7f871fbcbabd6c9d2f16dcb45f70104155aba6f59da65afc8ed/semgrep-0.36.0.tar.gz
...
Exception: Could not find 'semgrep-core' executable, tried 'SEMGREP_CORE_BIN' and system 'semgrep-core'

If you don't have semgrep-core available on your system you'll see something like:

$ echo '1 == 1' | semgrep -l python -e '$X == $X' -
...
Exception: Could not locate 'semgrep-core' binary

Note that we now include an additional binary called spacegrep which will have to be installed similar to semgrep-core. We include those instruction here: https://github.com/returntocorp/semgrep/blob/develop/CONTRIBUTING.md#generic-pattern-matching. cc @mjambon who can help if you run into trouble building this binary.

@blshkv
Copy link

blshkv commented Jan 8, 2021

I have adjusted our ebuilds, SEMGREP_SKIP_BIN works fine. Thanks a lot guys

@mschwager
Copy link
Contributor

@blshkv,

In the 0.37.0 release (#2383) we removed the setup.py dependency on the wheel package unless you're explicitly building the wheels. I noticed your ebuild was including that implicit dependency, and it shouldn't be needed anymore 👍

@blshkv
Copy link

blshkv commented Jan 14, 2021

Thanks, I was too lazy to ask for it ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:high Issue requires immediate attention user:external requested by someone outside of r2c
Development

Successfully merging a pull request may close this issue.

8 participants