[discussion] should we remove .pyc files from all packages? #5278

Closed
lynxis opened this issue Dec 14, 2017 · 15 comments

@lynxis
Member

lynxis commented Dec 14, 2017

maintainers: @jefferyto @kissg1988 @commodo @dangowrt
(I've picked a random selection of Python maintainers; nobody should feel excluded here)

While working on reproducible builds, I noticed that many (>50) Python packages are unreproducible.
These packages ship .pyc files, which are themselves unreproducible [0].

  • A .pyc is a "compiled" Python script file.
  • Usually the .pyc is generated when the .py file is imported for the first time. (LEDE doesn't do this at runtime, only at build time via make)
  • The .pyc shortens Python's startup time.
  • A .pyc can be used without the .py source file.
  • A .pyc is ignored if the .py exists and its timestamp differs from the timestamp embedded in the .pyc.

When packaging the resulting python-foo.ipk, the timestamps of all files in the archive are set to $SOURCE_DATE_EPOCH [1]. So the timestamp of the .py no longer matches the one embedded in the .pyc.
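
To make the mismatch concrete, here is a minimal sketch of it. It assumes the Python 3.7+ .pyc layout (magic, flags, mtime, size; on 2.7 the timestamp follows the magic number directly), and the helper names and the two example timestamps are made up for illustration. This is not the LEDE build code.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def embedded_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc (3.7+ layout)."""
    with open(pyc_path, "rb") as f:
        magic, flags, mtime, _size = struct.unpack("<4sIII", f.read(16))
    if magic != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was written by a different Python version")
    if flags & 0x1:
        raise ValueError("hash-based pyc: no timestamp stored")
    return mtime


def is_stale(py_path, pyc_path):
    """True if the mtime of the .py differs from the one embedded in the .pyc."""
    return int(os.stat(py_path).st_mtime) != embedded_mtime(pyc_path)


def demo(build_time=1513209600, source_date_epoch=1500000000):
    """Compile a file, then clamp the source mtime the way packaging would."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        pyc = os.path.join(tmp, "hello.pyc")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        # Hypothetical build-time mtime of the source file.
        os.utime(src, (build_time, build_time))
        py_compile.compile(
            src, cfile=pyc, doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        stale_after_build = is_stale(src, pyc)
        # Packaging step (assumed): every file gets the SOURCE_DATE_EPOCH mtime.
        os.utime(src, (source_date_epoch, source_date_epoch))
        stale_after_packaging = is_stale(src, pyc)
    return stale_after_build, stale_after_packaging
```

Calling `demo()` reports the .pyc as fresh right after compilation and as stale once the source mtime has been rewritten, which is the situation described above when both files are shipped.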

What are the reasons for .pyc files in LEDE packages?
Should we remove them?
Should we fix the .pyc timestamp?

PS: I might not know all the facts about .pyc files.

[0] https://tests.reproducible-builds.org/lede/lede_ar71xx.html (search for unreproducible)
[1] https://reproducible-builds.org/specs/source-date-epoch/

@commodo
Contributor

commodo commented Dec 15, 2017

Hey,

So, the .py vs .pyc thing boils down to performance.
That is why there are now some python-<name>-src packages that package the .py files.

There are some discussions about this:

  1. This is about where it started [with me as maintainer], when .pyc generation was disabled: Python 2.7: No module named urllib #474
  2. Then I got a few reports about performance; one, sent to my private email, said that [on some slow SoC] a simple HelloWorld.py script runs in ~70 msecs from .pyc versus 500 msecs from .py [with bytecodes disabled]. So I changed Python to ship bytecodes instead of sources.
    2a. There is another thread regarding performance:
    http://openwrt-devel.openwrt.narkive.com/2hc56Q30/openwrt-how-to-include-byte-compile-pyc-files-into-sysupgrade-image
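
For anyone who wants to get a feel for the difference on their own device, a rough sketch follows. It only compares compiling source text against unmarshalling an already compiled code object inside one process; it does not reproduce the ~70 msecs / 500 msecs figures above, which include interpreter startup and all imports. The function name and the repeat count are arbitrary choices.

```python
import marshal
import time


def compile_vs_load(source, repeats=200):
    """Return average seconds to (compile source, unmarshal compiled code)."""
    code = compile(source, "<bench>", "exec")
    blob = marshal.dumps(code)  # the payload a .pyc carries after its header

    start = time.perf_counter()
    for _ in range(repeats):
        compile(source, "<bench>", "exec")
    t_compile = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        marshal.loads(blob)
    t_load = (time.perf_counter() - start) / repeats

    return t_compile, t_load
```

Feeding it the text of a real module (for example `open("some_module.py").read()`, path hypothetical) gives a per-module estimate of what shipping bytecode saves on import.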

Personal conclusions/opinions [ feel free to argue/disagree ]:

  • you cannot have the best of both worlds with Python on embedded devices [micropython may be another thing] ; it would be nice to have the script files there, but then the slowness of Python interpreting them hits harder when you scale down the CPU performance
  • for some reason, people like Python/Python3 [the bloat-y one] enough to run it on OpenWrt/LEDE
  • I have not had complaints about Python/Python3 shipping only bytecodes ; so it seems [or I can assume] people don't care if we ship .py/.pyc as long as Python/Python3 works

I guess I got carried away with the background :)

In any case, I would choose to fix .pyc timestamps.
I will have to look into this, but if there are any suggestions, I'm open to them.
AFAIK: shipping only Python bytecodes is specific mostly to OpenWrt/LEDE [in all public Python packages I know of], so, it's possible that this is new territory.

But, if we fix this in the python/python3 packages, the reproducibility issues should be fixed for all other python/python3 packages as well.
A lot [but not all] of the Python/Python3 packages use build rules exported by the python/python3 interpreter packages, which helps unify some things.

Thanks
Alex

commodo added a commit to commodo/packages that referenced this issue Dec 19, 2017
See:
openwrt#5278

This should make Python & Python3 packages reproducible
when building.
In my local tests, I got the same sha256 for a sample
.pyc file, so likely this is the solution that should address
this.

Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>
@commodo
Contributor

commodo commented Dec 27, 2017

@lynxis
ping

[ i know it's the holidays now ; will ping again later :) ]

@lynxis
Member Author

lynxis commented Dec 28, 2017

no. chaos communication congress ;)

@neocturne
Member

What do other reproducible distros do with .pyc files?

@commodo
Contributor

commodo commented Dec 28, 2017 via email

@commodo
Contributor

commodo commented Dec 29, 2017

@NeoRaider

So, for a sampling of a few distros, see below:

  1. FreeBSD:
  • https://tests.reproducible-builds.org/freebsd/freebsd.html - does not seem to have Python in the list, or I cannot find it there
  • but looking through pkg.freebsd.org/freebsd:12:x86:64/latest/All/python27-2.7.14_1.txz, it does look like Python bytecodes are shipped along with the sources ; AFAICT, the timestamp inside the file may not be consistent, but I could be wrong
  2. Fedora seems to be behind on this: https://tests.reproducible-builds.org/rpms/fedora-23.html
    The last run I saw was from March 2016, and many Python packages were not reproducible, though I did not see any Python interpreter package there.
  3. Debian: https://tests.reproducible-builds.org/debian/reproducible.html
    Seems to be at the forefront of reproducibility.
  4. Arch Linux: https://tests.reproducible-builds.org/archlinux/archlinux.html

All in all, I am not sure whether other distros have gotten around to caring about Python bytecodes, so we could be in the lead with this, and we could push this for consideration to the Python project.

If this goes in, I'll re-check the link that @lynxis sent, and if all is good I can send this upstream for consideration.

@commodo
Contributor

commodo commented Jan 3, 2018

Hey,

An update: it seems I overlooked the discussions within Python [upstream] about reproducibility.

Yesterday, I checked and found this [and joined in]: https://bugs.python.org/issue29708

Seems that PEP-552 was created to address this officially.
It's now in Python 3.7.
No idea yet about other Python versions [specifically 2.7].

Will try to keep track of that until it settles.
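
For reference, a small sketch of what PEP 552 makes possible, assuming Python 3.7 or newer. With a hash-based .pyc the header carries a hash of the source instead of its mtime, so the output should no longer depend on file timestamps. The file names and mtimes below are made up, and this is not the change from the commit referenced above.

```python
import os
import py_compile
import tempfile

MODE = py_compile.PycInvalidationMode


def compile_with_mtime(src, pyc, mtime, mode):
    """Set the source mtime, compile, and return the raw .pyc bytes."""
    os.utime(src, (mtime, mtime))
    py_compile.compile(src, cfile=pyc, doraise=True, invalidation_mode=mode)
    with open(pyc, "rb") as f:
        return f.read()


def compare_modes(mtime_a=1500000000, mtime_b=1513209600):
    """For each mode, report whether two builds with different mtimes match."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.py")
        with open(src, "w") as f:
            f.write("GREETING = 'hello'\n")
        result = {}
        for mode in (MODE.TIMESTAMP, MODE.UNCHECKED_HASH):
            a = compile_with_mtime(src, os.path.join(tmp, "a.pyc"), mtime_a, mode)
            b = compile_with_mtime(src, os.path.join(tmp, "b.pyc"), mtime_b, mode)
            result[mode.name] = (a == b)
    return result
```

In this sketch `compare_modes()` reports differing bytes for `TIMESTAMP` and identical bytes for `UNCHECKED_HASH`. With `UNCHECKED_HASH` the interpreter trusts the .pyc without checking the source, which seems to fit the case where only bytecode is shipped; `CHECKED_HASH` would re-validate against the .py when it is present.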

@commodo
Contributor

commodo commented Jan 6, 2018

More stuff for reproducibility:
#5360

pprindeville pushed a commit to pprindeville/packages that referenced this issue Jan 6, 2018
salzmdan pushed a commit to salzmdan/packages that referenced this issue Jan 8, 2018
jow- pushed a commit to jow-/packages that referenced this issue Jan 15, 2018
@commodo
Contributor

commodo commented Jan 19, 2018

@lynxis
is there a way to re-kick the build here sooner:
https://tests.reproducible-builds.org/lede/lede_ar71xx.html

@commodo
Contributor

commodo commented Mar 14, 2018

@lynxis

seems the job has stopped running ; last run was on the ~20th of January ;
https://tests.reproducible-builds.org/lede/lede_ar71xx.html

any thoughts on when it would be back ?

@commodo
Contributor

commodo commented Mar 14, 2018

fwiw: i've added a few changes that may help Python3 be reproducible and i'd be curious what the build says

lynxis pushed a commit to lynxis/packages that referenced this issue Jan 3, 2019
@neheb
Contributor

neheb commented Jan 12, 2020

What's the progress on this?

@commodo
Contributor

commodo commented Jan 13, 2020

So, there was a build that also included packages feed.
I can't seem to find it.

I found this:
https://tests.reproducible-builds.org/openwrt/openwrt.html

As far as I remember, I did some work to make Py3 more reproducible, and I think the results were pretty good.
But I am not sure now how many packages remained unreproducible.

It would be great to have a view regarding all packages, where we can see this.
Then we can have a better answer.

@lynxis
Any thoughts?

@jefferyto
Member

Current tests on reproducible-builds.org are building base/core packages only. Building all packages was "temporarily disabled" in Oct 2018, re-enabled then disabled again on the same day in Feb 2019. I assume there are problems building all packages but there are no details in the commit messages.

@commodo
Contributor

commodo commented May 27, 2023

this seems to have died

closing;

feel free to re-open if i'm wrong

@commodo commodo closed this as completed May 27, 2023