New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the --key/cipher bytecode encryption. #6999
Conversation
While I agree with the general direction of this PR, we can't remove this till v6, as it's a breaking change. What we need to instead do, if it's our intention to remove the encryption features in v6, is add a warning in the code when it's used - ideally one that stands out from the other warnings the logger gives - and add warnings all over the docs about it being deprecated due to it being ineffective. |
Do you actually find deprecation warnings helpful? I always find that they just prolong the pain rather than lessen it. |
@bwoodsend yes, I do find them helpful, because in a case like PyInstaller, most people won't read the docs on a regular basis. Once they know what they're doing they don't check docs, so a deprecation warning is the only way that people really know it's going to stop working. |
I kind of doubt people who are familiar with PyInstaller and its warning verbosity are going to notice those deprecation warnings either... especially for builds that are done on the CI. So in practice, people will likely really notice only when it stops working. That said, I see no reason for not introducing breaking changes in "point" releases, because we lately have a sort of a rolling-release model anyway, with each next release containing changes accumulated up until that point (which I think is good, because we get feedback sooner). If this is about some unwritten expectations, the next release can be 6.0 instead of 5.3. Or we can switch to firefox/chrome-like versioning. But... I am not comfortable with this change hitting everyone, i.e., the people who do not use encryption. Which is unfortunate consequence of how the spec was generated up until now. Perhaps we could raise an error for specs that try to use encryption, but downgrade it to just a warning if |
I'm with Rok in thinking that no one is going to notice another warning. (I personally tend to set And yes, the more I think about it, the less of a problem having a dead |
Rok is correct, and we are effectively on a rolling release model, so I guess it matters slightly less. However, I still think we should stagger this somewhat. Either go down the deprecation route, or remove the actual encryption functionality and give a warning that passing the arguments will soon cause an error, and that the functionality itself has been removed. |
* Add a new log level called DEPRECATION with higher priority than WARNING in the hopes that it'll be more visisble. * Add deprecation log commands to instantiating pyi_crypto.PyiBlockCipher() and the --key option. This puts the warning near the top when running both with or without a spec. * Add a deprecation category to the changelog.
* Add a new log level called DEPRECATION with higher priority than WARNING in the hopes that it'll be more visisble. * Add deprecation log commands to instantiating pyi_crypto.PyiBlockCipher() and the --key option. This puts the warning near the top when running both with or without a spec. * Add a deprecation category to the changelog.
* Add a new log level called DEPRECATION with higher priority than WARNING in the hopes that it'll be more visisble. * Add deprecation log commands to instantiating pyi_crypto.PyiBlockCipher() and the --key option. This puts the warning near the top when running both with or without a spec. * Add a deprecation category to the changelog.
* Add a new log level called DEPRECATION with higher priority than WARNING in the hopes that it'll be more visisble. * Add deprecation log commands to instantiating pyi_crypto.PyiBlockCipher() and the --key option. This puts the warning near the top when running both with or without a spec. * Add a deprecation category to the changelog.
* Add a new log level called DEPRECATION with higher priority than WARNING in the hopes that it'll be more visisble. * Add deprecation log commands to instantiating pyi_crypto.PyiBlockCipher() and the --key option. This puts the warning near the top when running both with or without a spec. * Add a deprecation category to the changelog.
c8d7ac3
to
4e0c031
Compare
That's me done everything I want to do with this PR. I don't want to do dual branching like we did with v4 so I'll just not merge this until we make a v6 release. |
One thing that would be awesome, is that the encryption is replaced by the pyminifier So the pyinstaller would obfuscate the code as part of the build, config file driven, of course... |
I was under the impression that --key helped with AV false positives, but I could have been misguided in this belief. |
I see that pyminifier is basically abandoned and no longer installs on modern versions of Python due to its never truly being ported to Python 3. That just leaves Cython - I really don't want to have to recommend that people use cython.
The bootloader (the part that's common to all PyInstaller applications) is the AV magnet. That part is never encrypted because otherwise we'd need a second unencrypted bootloader to do the decryption and then we'd go full circle. |
I already had a privately modified pyinstaller bootloader, which helped with false positives. But then one day, we were getting them again and adding |
I suppose it could help, if AV's pattern matcher fixated on patterns belonging to .pyc files for collected stdlib modules (assuming zlib output is deterministic, which I think it is...). For onefile builds, though, I imagine that commonly-collected binaries, such as the python shared library, represent much more attractive target for matching (even if compressed, again assuming that compression is deterministic). |
I did use onefile and UPX |
I did a variant of PyInstaller (~v4) that loads the key from the environment variable, so it's not included in the build. Would this change (loading key from env. var) justify to keep the bytecode encryption feature in PyInstaller? |
This was suggested before in #5317. I still think that just popping the executable into an encrypted archive is nicer for both the user (they only need the key to unpack it then they can forget about it) and for PyInstaller since PyInstaller doesn't have to do anything to support it.
|
Another issue with the env var approach: macOS has no concept of custom environment variables that desktop apps can see. You'd have to tell your mac users to put this variable in their bashrc and launch your application from the terminal. |
Thanks for the depreciation notice pointing to this PR! 👏🏼 |
Very quick comment semi-related to this PR. As a Cython developer, obfuscation isn't a use-case we really care about supporting. It's mostly just a natural side effect. But it has its weaknesses (a lot of plain text strings end up built into the binary for example) and we really have no interest in trying to make that more robust. |
That sounds like the polite way of saying we don't want to receive piles of support/feature requests from disgruntled PyInstaller users wanting to use Cython this way? I suppose that that would be fair enough. |
A little ;). It's also a hard problem to get right, especially for a non-expert (as you people probably know, given that you're in the process of removing some encryption that probably didn't add a whole lot of security). I doubt that Cython-compiled code is that hard to reverse engineer (it's a lot of calls to PyObject_GetAttr with easily identifiably strings). So personally I'd rather just say "we don't really know about this" |
Don't you guys find some better logic to hide the --Key some where on the binary garbage strings pool and retain this crypto which helps at least some extent from intermediate crackers ? Note: You can still give a warning as the --Key is not 100% safer and secured. But we must maintain some secrets for the sake of goodness even on the opensource. |
Even if we made it super complex, the code PyInstaller uses to store the key, encrypt the code, retrieve the key and decrypt the code is all open source. It would only raise the bar up to someone who can follow a bit of PyInstaller's source code -- at least until someone writes a script to do it for them, in which case the bar drops back down to any old noob who knows how to run a Python script. |
Wouldn't there be a possibility that the password does not travel with the program and that it must be entered by whoever uses the program? |
See canned response #42. |
Bytecode encryption, given that the decryption key has to be stored somewhere in the built application for the application to be able to function, was only ever a mild deterrent against prying eyes. It could be cracked by anyone willing to dig around PyInstaller's source code for the exact layout of the executable archive and a quick hexdump to get the key once you know where to look. These days however, PyInstaller reverse engineering tools like PyExtractor have this all built in. For example, in the steps below, our would be prying user doesn't even need to know that the application they are trying to break open is encrypted, let alone have to do anything clever to decrypt it. git clone https://github.com/Rdimo/PyExtractor.git cd PyExtractor pip install -r requirements.txt python main.py some/pyinstaller/application So since the knowledge barrier to reverse engineer an encrypted build is now identical to that of a regular one, and because users are being misled into thinking that an encrypted PyInstaller build is a safe place to put things like API keys, and since adding further code obfuscation will eventually lead to the same outcome, remove the encryption feature entirely. Users looking for a replacement should look for code obfuscation methods that don't require lossless de-obfuscation at runtime in order for the code to be runable. This means PyArmour or any DIY bytecode encryption scheme should be avoided for the same reasons that this feature is being dropped. Instead, you can use pyminifier's obfuscation feature which mangles variable names or if (and only if) you understand the perils of Linux ABI compatibility, are aware of the macOS deployment target and understand that PyInstaller can't detect imports made by C extensions (i.e. you will need to use --hidden-import/--collect-submodules a lot) then you may consider running Cython on the more confidential Python files in your project.
Bytecode encryption, given that the decryption key has to be stored somewhere in the built application for the application to be able to function, was only ever a mild deterrent against prying eyes. It could be cracked by anyone willing to dig around PyInstaller's source code for the exact layout of the executable archive and a quick hexdump to get the key once you know where to look.
These days however, PyInstaller reverse engineering tools like PyExtractor have this all built in. For example, in the steps below, our would be prying user doesn't even need to know that the application they are trying to break open is encrypted, let alone have to do anything clever to decrypt it.
So since the knowledge barrier to reverse engineer an encrypted build is now identical to that of a regular one, and because users are being misled into thinking that an encrypted PyInstaller build is a safe place to put things like API keys, and since adding further code obfuscation will eventually lead to the same outcome, remove the encryption feature entirely.
Users looking for a replacement should look for code obfuscation methods that don't require lossless de-obfuscation at runtime in order for the code to be runable. This means PyArmour or any DIY bytecode encryption scheme should be avoided for the same reasons that this feature is being dropped. Instead, you can use pyminifier's obfuscation feature which mangles variable names or if (and only if) you understand the perils of Linux ABI compatibility, are aware of the macOS deployment target and understand that PyInstaller can't detect imports made by C extensions (i.e. you will need to use
--hidden-import
/--collect-submodules
a lot) then you may consider running Cython on the more confidential Python files in your project.