[discussion] should we remove .pyc files from all packages? #5278

Open
lynxis opened this Issue Dec 14, 2017 · 11 comments

lynxis commented Dec 14, 2017

maintainers: @jefferyto @kissg1988 @commodo @dangowrt
(I've picked a random selection of Python maintainers; nobody should feel excluded here.)

While working on reproducible builds, I noticed that many (>50) Python packages are unreproducible.
These packages ship .pyc files, and the .pyc files are what make them unreproducible [0].

  • A .pyc is a "compiled" Python script file.
  • Usually the .pyc is generated when the .py file is interpreted for the first time. (LEDE doesn't do this at runtime, only at build time via make.)
  • The .pyc shortens Python's startup time.
  • A .pyc can be used without the .py source file.
  • A .pyc is ignored if the .py exists and its timestamp differs from the timestamp embedded in the .pyc.
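
To make the last point concrete, here is a minimal sketch of how the embedded timestamp can be read and compared with the source file. It assumes the CPython 3.7+ header layout (magic, flags, mtime, size; 4 bytes each, little-endian); older interpreters use a shorter header, so the offsets would differ there. The helper names `embedded_mtime`, `pyc_is_stale` and `demo` are made up for illustration.

```python
import os
import py_compile
import struct
import tempfile


def embedded_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header.

    Assumes the CPython 3.7+ layout: magic, flags, mtime, size
    (4 bytes each, little-endian).
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, _size = struct.unpack("<III", header[4:16])
    if flags != 0:
        raise ValueError("hash-based .pyc: no timestamp embedded")
    return mtime


def pyc_is_stale(py_path, pyc_path):
    """True if the source mtime no longer matches the one in the .pyc."""
    source_mtime = int(os.stat(py_path).st_mtime) & 0xFFFFFFFF
    return embedded_mtime(pyc_path) != source_mtime


def demo():
    """Compile a file, then change its mtime the way a packaging step might."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        pyc = os.path.join(tmp, "hello.pyc")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        py_compile.compile(
            src, cfile=pyc, doraise=True,
            invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
        )
        before = pyc_is_stale(src, pyc)
        # Simulate the archive step rewriting the .py timestamp.
        os.utime(src, (1500000000, 1500000000))
        after = pyc_is_stale(src, pyc)
    return before, after
```

Calling `demo()` compiles a throwaway file and then rewrites the source mtime; the second check reports the .pyc as stale even though the source content never changed.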

When the resulting python-foo.ipk is packaged, the timestamps of all files in the archive are set to $SOURCE_DATE_EPOCH [1], so the timestamp of the .py won't match the one embedded in the .pyc.
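
One way to keep the two timestamps in agreement is to clamp the source mtime to SOURCE_DATE_EPOCH before byte-compiling, so that the .pyc embeds the same value the archive will carry later. The sketch below illustrates that idea only; it is not necessarily what any patch referenced in this thread does. `compile_with_clamped_mtime` is a hypothetical helper, the fallback epoch is an arbitrary value, and the explicit `invalidation_mode` argument assumes Python 3.7.2 or newer.

```python
import hashlib
import os
import py_compile
import tempfile


def compile_with_clamped_mtime(py_path, pyc_path, epoch):
    """Set the source mtime to `epoch`, byte-compile, return the .pyc sha256."""
    os.utime(py_path, (epoch, epoch))
    py_compile.compile(
        py_path, cfile=pyc_path, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def demo():
    # Fall back to an arbitrary fixed value when SOURCE_DATE_EPOCH is unset.
    epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1513209600"))
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        first = compile_with_clamped_mtime(src, os.path.join(tmp, "a.pyc"), epoch)
        # Pretend a second build touched the source at a different time.
        os.utime(src, (epoch + 3600, epoch + 3600))
        second = compile_with_clamped_mtime(src, os.path.join(tmp, "b.pyc"), epoch)
    return first, second
```

Both digests returned by `demo()` should be equal, because the only build-dependent input (the source mtime) is forced to the same value each time. Note that the source path is also recorded in the bytecode, so a real build would need to keep that stable too.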

What are the reasons for .pyc files in LEDE packages?
Should we remove it?
Should we fix the .pyc timestamp?

PS: I might not know all the facts about .pyc files.

[0] https://tests.reproducible-builds.org/lede/lede_ar71xx.html (search for unreproducible)
[1] https://reproducible-builds.org/specs/source-date-epoch/

commodo Dec 15, 2017

Contributor

Hey,

So, the .py vs .pyc thing boils down to performance.
That is why there are now some python-<name>-src packages that package the .py files.

There are some discussions about this:

  1. this is about where it started [with me as maintainer], and .pyc generation was disabled: #474
  2. Then I got a few reports about performance; one, sent to my private email, said that [on some slow SoC] a simple HelloWorld.py script runs in: .pyc == ~70 msecs, .py == 500 msecs [with bytecode disabled]; so I then changed Python to ship bytecode instead of sources
    2a. There is another thread regarding performance:
    http://openwrt-devel.openwrt.narkive.com/2hc56Q30/openwrt-how-to-include-byte-compile-pyc-files-into-sysupgrade-image

Personal conclusions/opinions [ feel free to argue/disagree ]:

  • you cannot have the best of both worlds with Python on embedded devices [micropython may be another thing]; it would be nice to have the script files there, but the slowness of Python interpreting them hits harder as you scale down CPU performance
  • for some reason, people like Python/Python3 [the bloat-y one] enough to run it on OpenWrt/LEDE
  • I have not had complaints about Python/Python3 shipping only bytecode; so it seems [or I can assume] people don't care whether we ship .py or .pyc as long as Python/Python3 works

I guess I got carried away with the background :)

In any case, I would choose to fix the .pyc timestamps.
I will have to look into this, but if there are any suggestions, I'm open to them.
AFAIK, shipping only Python bytecode is mostly specific to OpenWrt/LEDE [among all public Python packages I know of], so it's possible that this is new territory.

But if we fix this in the python/python3 packages, the reproducibility issues should be fixed for all other python/python3 packages as well.
A lot [but not all] of the Python/Python3 packages use build rules exported by the python/python3 interpreter packages, which helps unify some things.

Thanks
Alex

commodo added a commit to commodo/packages that referenced this issue Dec 19, 2017

python,python3: add support for SOURCE_DATE_EPOCH var
See:
openwrt#5278

This should make Python & Python3 packages reproducible
when building.
In my local tests, I got the same sha256 for a sample
.pyc file, so likely this is the solution that should address
this.

Signed-off-by: Alexandru Ardelean <ardeleanalex@gmail.com>

commodo Dec 27, 2017

Contributor

@lynxis
ping

[ i know it's the holidays now ; will ping again later :) ]

lynxis Dec 28, 2017

Contributor

no. chaos communication congress ;)

NeoRaider Dec 28, 2017

Contributor

What do other reproducible distros do with .pyc files?

commodo Dec 28, 2017

Contributor

commodo Dec 29, 2017

Contributor

@NeoRaider

So, for a sampling of a few distros, see below:

  1. FreeBSD:
  • https://tests.reproducible-builds.org/freebsd/freebsd.html - Python does not seem to be in the list, or I cannot find it there
  • but looking through pkg.freebsd.org/freebsd:12:x86:64/latest/All/python27-2.7.14_1.txz, it does look like Python bytecode is shipped together with the sources; AFAICT, the timestamp inside the files may not be consistent, but I could be wrong

  2. Fedora seems to be behind on this: https://tests.reproducible-builds.org/rpms/fedora-23.html
    I saw that March 2016 was the last time it was run, and many Python packages were not reproducible, though I did not see any Python interpreter package there.

  3. Debian: https://tests.reproducible-builds.org/debian/reproducible.html
    Seems to be at the forefront of reproducibility.

  4. Arch Linux: https://tests.reproducible-builds.org/archlinux/archlinux.html

All in all, I am not sure whether other distros have had to care about Python bytecode yet, so we could be in the lead here, and we could push this to the Python project for consideration.

If this goes in, I'll re-check the link that @lynxis sent, and if all is good I can send this upstream for consideration.

commodo Jan 3, 2018

Contributor

Hey,

An update: it seems I had forgotten to check for existing discussions within Python [upstream] about reproducibility.

Yesterday, I checked and found this [and joined in]: https://bugs.python.org/issue29708

Seems that PEP-552 was created to address this officially.
It's now in Python 3.7.
No idea yet about other Python versions [specifically 2.7].

Will try to keep track of that until it settles.
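
For reference, a minimal sketch of what PEP 552 offers on Python 3.7 and later: `py_compile` can write a hash-based .pyc whose header carries a hash of the source instead of its mtime, so the output no longer depends on file timestamps. The helper name `compile_hash_based` and the demo around it are illustrative only.

```python
import os
import py_compile
import struct
import tempfile


def compile_hash_based(py_path, pyc_path):
    """Byte-compile with a source-hash header (PEP 552) and return the bytes."""
    py_compile.compile(
        py_path, cfile=pyc_path, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc_path, "rb") as f:
        return f.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "hello.py")
        with open(src, "w") as f:
            f.write("print('hello')\n")
        os.utime(src, (1500000000, 1500000000))
        first = compile_hash_based(src, os.path.join(tmp, "a.pyc"))
        # A different source mtime should not change the output.
        os.utime(src, (1513209600, 1513209600))
        second = compile_hash_based(src, os.path.join(tmp, "b.pyc"))
    # Bytes 4..8 of the header hold the flags; bit 0 marks a hash-based .pyc.
    (flags,) = struct.unpack("<I", first[4:8])
    return first == second, bool(flags & 1)
```

With `CHECKED_HASH` the interpreter still validates the .pyc against the source when the source is present; `PycInvalidationMode.UNCHECKED_HASH` skips that check, which may suit images that ship bytecode only.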

commodo Jan 6, 2018

Contributor

More work on reproducibility:
#5360

pprindeville added a commit to pprindeville/packages that referenced this issue Jan 6, 2018

python,python3: add support for SOURCE_DATE_EPOCH var

salzmdan added a commit to salzmdan/packages that referenced this issue Jan 8, 2018

python,python3: add support for SOURCE_DATE_EPOCH var

jow- added a commit to jow-/packages that referenced this issue Jan 15, 2018

python,python3: add support for SOURCE_DATE_EPOCH var

commodo Jan 19, 2018

Contributor

@lynxis
is there a way to re-kick the build here sooner:
https://tests.reproducible-builds.org/lede/lede_ar71xx.html

commodo Mar 14, 2018

Contributor

@lynxis

Seems the job has stopped running; the last run was around the 20th of January:
https://tests.reproducible-builds.org/lede/lede_ar71xx.html

Any thoughts on when it will be back?

commodo Mar 14, 2018

Contributor

FWIW: I've added a few changes that may help Python3 be reproducible, and I'd be curious what the build says.
