Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove the --key/cipher bytecode encryption. #6999

Merged
merged 1 commit into from Jun 29, 2023

Conversation

bwoodsend
Copy link
Member

@bwoodsend bwoodsend commented Jul 29, 2022

Bytecode encryption, given that the decryption key has to be stored somewhere in the built application for the application to be able to function, was only ever a mild deterrent against prying eyes. It could be cracked by anyone willing to dig around PyInstaller's source code for the exact layout of the executable archive and a quick hexdump to get the key once you know where to look.

These days however, PyInstaller reverse engineering tools like PyExtractor have this all built in. For example, in the steps below, our would be prying user doesn't even need to know that the application they are trying to break open is encrypted, let alone have to do anything clever to decrypt it.

git clone https://github.com/Rdimo/PyExtractor.git
cd PyExtractor
pip install -r requirements.txt
python main.py some/pyinstaller/application

So since the knowledge barrier to reverse engineer an encrypted build is now identical to that of a regular one, and because users are being misled into thinking that an encrypted PyInstaller build is a safe place to put things like API keys, and since adding further code obfuscation will eventually lead to the same outcome, remove the encryption feature entirely.

Users looking for a replacement should look for code obfuscation methods that don't require lossless de-obfuscation at runtime in order for the code to be runable. This means PyArmour or any DIY bytecode encryption scheme should be avoided for the same reasons that this feature is being dropped. Instead, you can use pyminifier's obfuscation feature which mangles variable names or if (and only if) you understand the perils of Linux ABI compatibility, are aware of the macOS deployment target and understand that PyInstaller can't detect imports made by C extensions (i.e. you will need to use --hidden-import/--collect-submodules a lot) then you may consider running Cython on the more confidential Python files in your project.

@Legorooj
Copy link
Member

While I agree with the general direction of this PR, we can't remove this till v6, as it's a breaking change. What we need to instead do, if it's our intention to remove the encryption features in v6, is add a warning in the code when it's used - ideally one that stands out from the other warnings the logger gives - and add warnings all over the docs about it being deprecated due to it being ineffective.

@bwoodsend
Copy link
Member Author

Do you actually find deprecation warnings helpful? I always find that they just prolong the pain rather than lessen it.

@Legorooj
Copy link
Member

@bwoodsend yes, I do find them helpful, because in a case like PyInstaller, most people won't read the docs on a regular basis. Once they know what they're doing they don't check docs, so a deprecation warning is the only way that people really know it's going to stop working.

@rokm
Copy link
Member

rokm commented Jul 30, 2022

I kind of doubt people who are familiar with PyInstaller and its warning verbosity are going to notice those deprecation warnings either... especially for builds that are done on the CI. So in practice, people will likely really notice only when it stops working.

That said, I see no reason for not introducing breaking changes in "point" releases, because we lately have a sort of a rolling-release model anyway, with each next release containing changes accumulated up until that point (which I think is good, because we get feedback sooner). If this is about some unwritten expectations, the next release can be 6.0 instead of 5.3. Or we can switch to firefox/chrome-like versioning.

But... I am not comfortable with this change hitting everyone, i.e., the people who do not use encryption. Which is unfortunate consequence of how the spec was generated up until now. Perhaps we could raise an error for specs that try to use encryption, but downgrade it to just a warning if cipher is passed but is None? I wouldn't mind quietly ignoring the cipher = None case, either...

@bwoodsend
Copy link
Member Author

bwoodsend commented Jul 30, 2022

I'm with Rok in thinking that no one is going to notice another warning. (I personally tend to set --log-level=CRITICAL or redirect 2>/dev/null to keep my terminal clean.) And if they did notice, either they'll treat it like an error and insist on fixing it or they'll pin pyinstaller < 6.0 in their requirements file - both of which routes require exactly the same amount of work as if their build had failed outright rather than emitting a deprecation warning.

And yes, the more I think about it, the less of a problem having a dead cipher=None in a spec file seems to me. I'll change it to silently ignoring None.

@Legorooj
Copy link
Member

Rok is correct, and we are effectively on a rolling release model, so I guess it matters slightly less. However, I still think we should stagger this somewhat. Either go down the deprecation route, or remove the actual encryption functionality and give a warning that passing the arguments will soon cause an error, and that the functionality itself has been removed.

bwoodsend added a commit to bwoodsend/pyinstaller that referenced this pull request Nov 30, 2022
* Add a new log level called DEPRECATION with higher priority than
  WARNING in the hopes that it'll be more visisble.
* Add deprecation log commands to instantiating
  pyi_crypto.PyiBlockCipher() and the --key option. This puts the
  warning near the top when running both with or without a spec.
* Add a deprecation category to the changelog.
bwoodsend added a commit to bwoodsend/pyinstaller that referenced this pull request Nov 30, 2022
* Add a new log level called DEPRECATION with higher priority than
  WARNING in the hopes that it'll be more visisble.
* Add deprecation log commands to instantiating
  pyi_crypto.PyiBlockCipher() and the --key option. This puts the
  warning near the top when running both with or without a spec.
* Add a deprecation category to the changelog.
bwoodsend added a commit to bwoodsend/pyinstaller that referenced this pull request Nov 30, 2022
* Add a new log level called DEPRECATION with higher priority than
  WARNING in the hopes that it'll be more visisble.
* Add deprecation log commands to instantiating
  pyi_crypto.PyiBlockCipher() and the --key option. This puts the
  warning near the top when running both with or without a spec.
* Add a deprecation category to the changelog.
bwoodsend added a commit to bwoodsend/pyinstaller that referenced this pull request Dec 1, 2022
* Add a new log level called DEPRECATION with higher priority than
  WARNING in the hopes that it'll be more visisble.
* Add deprecation log commands to instantiating
  pyi_crypto.PyiBlockCipher() and the --key option. This puts the
  warning near the top when running both with or without a spec.
* Add a deprecation category to the changelog.
bwoodsend added a commit that referenced this pull request Dec 2, 2022
* Add a new log level called DEPRECATION with higher priority than
  WARNING in the hopes that it'll be more visisble.
* Add deprecation log commands to instantiating
  pyi_crypto.PyiBlockCipher() and the --key option. This puts the
  warning near the top when running both with or without a spec.
* Add a deprecation category to the changelog.
@bwoodsend bwoodsend force-pushed the remove-cipher branch 3 times, most recently from c8d7ac3 to 4e0c031 Compare December 4, 2022 01:28
@bwoodsend bwoodsend marked this pull request as ready for review December 4, 2022 14:01
@bwoodsend
Copy link
Member Author

That's me done everything I want to do with this PR. I don't want to do dual branching like we did with v4 so I'll just not merge this until we make a v6 release.

@marcelomanzo
Copy link

One thing that would be awesome, is that the encryption is replaced by the pyminifier

So the pyinstaller would obfuscate the code as part of the build, config file driven, of course...

@mastercoms
Copy link

I was under the impression that --key helped with AV false positives, but I could have been misguided in this belief.

@bwoodsend
Copy link
Member Author

I see that pyminifier is basically abandoned and no longer installs on modern versions of Python due to its never truly being ported to Python 3. That just leaves Cython - I really don't want to have to recommend that people use cython.

I was under the impression that --key helped with AV false positives, but I could have been misguided in this belief.

The bootloader (the part that's common to all PyInstaller applications) is the AV magnet. That part is never encrypted because otherwise we'd need a second unencrypted bootloader to do the decryption and then we'd go full circle.

@mastercoms
Copy link

mastercoms commented Feb 3, 2023

I already had a privately modified pyinstaller bootloader, which helped with false positives. But then one day, we were getting them again and adding --key fixed it for some reason.

@rokm
Copy link
Member

rokm commented Feb 3, 2023

I suppose it could help, if AV's pattern matcher fixated on patterns belonging to .pyc files for collected stdlib modules (assuming zlib output is deterministic, which I think it is...).

For onefile builds, though, I imagine that commonly-collected binaries, such as the python shared library, represent much more attractive target for matching (even if compressed, again assuming that compression is deterministic).

@mastercoms
Copy link

I did use onefile and UPX

@pzloty
Copy link

pzloty commented Feb 17, 2023

I did a variant of PyInstaller (~v4) that loads the key from the environment variable, so it's not included in the build.
The main reason was encryption in rest.
Example use-cases:
You provide a user executable and unique key. If the executable or key is leaked, you can easily map it back to the user.
The executable is widely available, but the key is provided in a secure environment only to selected users.

Would this change (loading key from env. var) justify to keep the bytecode encryption feature in PyInstaller?

@bwoodsend
Copy link
Member Author

This was suggested before in #5317. I still think that just popping the executable into an encrypted archive is nicer for both the user (they only need the key to unpack it then they can forget about it) and for PyInstaller since PyInstaller doesn't have to do anything to support it.

zip -r -e your-app.zip your-app/

@bwoodsend
Copy link
Member Author

Another issue with the env var approach: macOS has no concept of custom environment variables that desktop apps can see. You'd have to tell your mac users to put this variable in their bashrc and launch your application from the terminal.

@neuhaus
Copy link

neuhaus commented Feb 24, 2023

Thanks for the depreciation notice pointing to this PR! 👏🏼

@da-woods
Copy link

then you may consider running Cython on the more confidential Python files in your project.

That just leaves Cython - I really don't want to have to recommend that people use cython

Very quick comment semi-related to this PR. As a Cython developer, obfuscation isn't a use-case we really care about supporting. It's mostly just a natural side effect. But it has its weaknesses (a lot of plain text strings end up built into the binary for example) and we really have no interest in trying to make that more robust.

@bwoodsend
Copy link
Member Author

That sounds like the polite way of saying we don't want to receive piles of support/feature requests from disgruntled PyInstaller users wanting to use Cython this way? I suppose that that would be fair enough.

@da-woods
Copy link

That sounds like the polite way of saying we don't want to receive piles of support/feature requests from disgruntled PyInstaller users wanting to use Cython this way? I suppose that that would be fair enough.

A little ;).

It's also a hard problem to get right, especially for a non-expert (as you people probably know, given that you're in the process of removing some encryption that probably didn't add a whole lot of security). I doubt that Cython-compiled code is that hard to reverse engineer (it's a lot of calls to PyObject_GetAttr with easily identifiably strings). So personally I'd rather just say "we don't really know about this"

@gaamaaresosa
Copy link

gaamaaresosa commented May 21, 2023

Don't you guys find some better logic to hide the --Key some where on the binary garbage strings pool and retain this crypto which helps at least some extent from intermediate crackers ?
When we always speak about everything is crackable or reversable then why still crypto/cyber security/obfuscation/Steganography are surviving.
Miilions of researcher still happening on this to make it tuffer every day.

Note: You can still give a warning as the --Key is not 100% safer and secured.
So that who ever wanted some security can use it and others don't set --key

But we must maintain some secrets for the sake of goodness even on the opensource.

@bwoodsend
Copy link
Member Author

Don't you guys find some better logic to hide the --Key some where on the binary garbage strings pool and retain this crypto which helps at least some extent from intermediate crackers ?

Even if we made it super complex, the code PyInstaller uses to store the key, encrypt the code, retrieve the key and decrypt the code is all open source. It would only raise the bar up to someone who can follow a bit of PyInstaller's source code -- at least until someone writes a script to do it for them, in which case the bar drops back down to any old noob who knows how to run a Python script.

@gmfull2014
Copy link

Wouldn't there be a possibility that the password does not travel with the program and that it must be entered by whoever uses the program?

@bwoodsend
Copy link
Member Author

Wouldn't there be a possibility that the password does not travel with the program and that it must be entered by whoever uses the program?

See canned response #42.

PyInstaller/building/api.py Outdated Show resolved Hide resolved
Bytecode encryption, given that the decryption key has to be stored somewhere in
the built application for the application to be able to function, was only ever
a mild deterrent against prying eyes. It could be cracked by anyone willing to
dig around PyInstaller's source code for the exact layout of the executable
archive and a quick hexdump to get the key once you know where to look.

These days however, PyInstaller reverse engineering tools like PyExtractor have
this all built in. For example, in the steps below, our would be prying user
doesn't even need to know that the application they are trying to break open is
encrypted, let alone have to do anything clever to decrypt it.

    git clone https://github.com/Rdimo/PyExtractor.git
    cd PyExtractor
    pip install -r requirements.txt
    python main.py some/pyinstaller/application

So since the knowledge barrier to reverse engineer an encrypted build is now
identical to that of a regular one, and because users are being misled into
thinking that an encrypted PyInstaller build is a safe place to put things like
API keys, and since adding further code obfuscation will eventually lead to the
same outcome, remove the encryption feature entirely.

Users looking for a replacement should look for code obfuscation methods that
don't require lossless de-obfuscation at runtime in order for the code to be
runable. This means PyArmour or any DIY bytecode encryption scheme should be
avoided for the same reasons that this feature is being dropped. Instead, you
can use pyminifier's obfuscation feature which mangles variable names or if (and
only if) you understand the perils of Linux ABI compatibility, are aware of the
macOS deployment target and understand that PyInstaller can't detect imports
made by C extensions (i.e. you will need to use
--hidden-import/--collect-submodules a lot) then you may consider running Cython
on the more confidential Python files in your project.
@bwoodsend bwoodsend merged commit f5adf29 into pyinstaller:develop Jun 29, 2023
18 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 29, 2023
@bwoodsend bwoodsend deleted the remove-cipher branch May 7, 2024 20:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants