New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fake timestamp in pyc-header #3007

Merged
merged 4 commits into from Mar 20, 2018

Conversation

Projects
2 participants
@bauerj
Contributor

bauerj commented Nov 17, 2017

Directly bundling pyc-files makes the build output dependent on the modification time of pyinstaller source files. This change removes that relation and re-enables deterministic builds.

I didn't find a better place to add this than here. We need low-level access to the file to change the header so it can't be done after compressing it.

PEP552 actually made this a bit harder as it changed the pyc-header format. Since we don't know the magic number for the outstanding release that adds PEP552 support, I checked for the presence of the bit-field instead:

The pyc header currently consists of 3 32-bit words. We will expand it to 4. The first word will continue to be the magic number, versioning the bytecode and pyc format. The second word, conceptually the new word, will be a bit field. The interpretation of the rest of the header and invalidation behavior of the pyc depends on the contents of the bit field.

Basically, there are three possible pyc file headers:

magic | timestamp                        | size               (old style)
magic | 00000000000000000000000000000000 | timestamp | size   (new style)
magic | 00000000000000000000000000000001 | hash      | size   (new style)

Since the bit-field is either 0 or 1 we can just check if that's the content of the second word to find out if this a new-style header. Of course this misses an edge-case where the pyinstaller files were last changed on 01/01/1970 around midnight but I guess we can ignore that.


Closes: #3000

while 1:
buf = fh.read(16*1024)
if first:

This comment has been minimized.

@bauerj

bauerj Nov 17, 2017

Contributor

This is a bit ugly. But I wanted to keep it simple.

This comment has been minimized.

@htgoebel

htgoebel Nov 17, 2017

Member

It's okay.

But I suggest to move this behind the if not buf clause. Otherwise it would fail if the fail is empty. (This should not happen, but we can avoid a crash here if it would happen :-)

@htgoebel

Thanks. Please see my inline comments.

@@ -393,8 +393,12 @@ def add(self, entry):
self.lib.write(comprobj.compress(code_data))

This comment has been minimized.

@htgoebel

htgoebel Nov 17, 2017

Member

Don't we need to fake the timestamp here, too? Not not, please add a comment why not.

This comment has been minimized.

@bauerj

bauerj Nov 17, 2017

Contributor

I'm not sure. What does flag == 1 mean semantically?

This comment has been minimized.

@htgoebel

htgoebel Nov 17, 2017

Member

flag == 1 means the content needs to be compressed. But the more important thing would be that we have code here.

But I'm wrong: code_data is the code-object only (see line 376), which does not include a time-stamp.

while 1:
buf = fh.read(16*1024)
if first:

This comment has been minimized.

@htgoebel

htgoebel Nov 17, 2017

Member

It's okay.

But I suggest to move this behind the if not buf clause. Otherwise it would fail if the fail is empty. (This should not happen, but we can avoid a crash here if it would happen :-)

# timestamp is further ahead
start, end = 8, 12
ts = b'pyi0' # So people know where this comes from

This comment has been minimized.

@htgoebel
"""
current_ts = buf[4:8]
start, end = 4, 8
if current_ts == b'\x00\x00\x00\x01':

This comment has been minimized.

@htgoebel

htgoebel Nov 17, 2017

Member

Fine you took care about the upcoming format. But I'm afraid, the detection mechanism is not reliable. Some other tool may have manipulated the timestamp and then this code will destroy the bytecode.

The normal behavior would be to compare the magic which determines the file-format. Since PEP 552 is not yet implemented, we don't know the magic to test for. Anyway, since the new format may only occur with Python >= 3.7, it should be okay to run code-branch if compat.py_37. (Since the code has not yet landed, I suggest using if False and compat.py_37: # TODO py_27, to make the nightly builds still pass. And we don't care about alpha-versions later on.)

Also this 32-bit word is defined to be a bit field. For now PEP 552 only defined the lowest two bits (hash_based and check_source), so you need to compere the bit values. Furter PEPs may define other bits, and the check will fail.

So the code should be something like

if False and compat.py_37: # TODO py_37 Python 3.7
   (flags,) = struct.unpack_from(">I", buf, 4)
  If flags & 1:
    # hash-based, clear "check_source" flag, since there is no source
    buf[4:8] = strict.pack(">I", flags ^ 2)
  else:
    # timestamp-base, clear timestamp
    …
else:
  # old format
 ...
@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 17, 2017

What about the failing tests? Are those related?

@htgoebel

This comment has been minimized.

Member

htgoebel commented Nov 19, 2017

I checked on test which failed with

LOADER: Python library: /var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/_MEIpt1T7x/Python
Error loading Python lib '/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/_MEIpt1T7x/Python': dlopen: dlopen(/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/_MEIpt1T7x/Python, 10): no suitable image found.  Did find:
	/var/folders/my/m6ynh3bn6tq06h7xr3js0z7r0000gn/T/_MEIpt1T7x/Python: mach-o, but wrong architecture

This looks like a different problem, so I restarted the tests. We'll see.

@bauerj bauerj force-pushed the bauerj:pyc-file-timestamps branch 2 times, most recently from 9902f79 to 885e26e Nov 19, 2017

@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 19, 2017

Thanks, I updated my code.

@htgoebel

This comment has been minimized.

Member

htgoebel commented Nov 20, 2017

Did you try running the test-suite on your machine? If not, please do so. Instructions are in test/README.

The errors look like this code is corrupting binary files. I missed the point that at this place, not only Python bytecode is coming along. See https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/building/api.py#L230for a list of possible item types. I assume you need to check whether this actually is Python code ("M", "m" and "s"). Please also check the loop around https://github.com/pyinstaller/pyinstaller/blob/develop/PyInstaller/building/api.py#L230 to ensure all the new code covers all used cases.

As an safety-belt, I suggest to check for compat.BYTECODE_MAGIC in the new fake_pyc_timestamp() function. I'd use (scratch!) assert buf[:4] == compat.BYTECODE_MAGIC to catch programming errors (like the one we – as I assume – currently have) quickly.

Thanks.

@bauerj bauerj force-pushed the bauerj:pyc-file-timestamps branch 2 times, most recently from 488b5ac to bbc0a4a Nov 23, 2017

@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 23, 2017

Alright, I fixed the error and now the test suite seems to run fine.

@htgoebel

Thanks, this looks good. One minor change left.

while 1:
buf = fh.read(16*1024)
if not buf:
break
if first and typcd in ('M', 'm', 's'):
first = False

This comment has been minimized.

@htgoebel

htgoebel Nov 23, 2017

Member

If typcd is not in this tuple, first will never be set to False and the test will be performed for every block. Works, but is slower.

I suggest using

# … some comment explaining what this *means*
first = typcd in ('M', 'm', 's')

in line 397 above.

@bauerj bauerj force-pushed the bauerj:pyc-file-timestamps branch from bbc0a4a to 62f3bfe Nov 23, 2017

@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 23, 2017

Indeed, I changed that.

@ghost

This comment has been minimized.

ghost commented Nov 23, 2017

Looks like OSX had a problem. You could try disabling this code for that platform.

@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 23, 2017

Hmm, I don't know. The test ran just fine in 2734. Maybe it is just flaky?

@ghost

This comment has been minimized.

ghost commented Nov 23, 2017

Well, there's no output so flakiness is a distinct possibility.

Fake timestamp in pyc-header
Directly bundling pyc-files makes the build output dependent
on the modification time of pyinstaller source files. This
change removes that relation and re-enables deterministic
builds.

@bauerj bauerj force-pushed the bauerj:pyc-file-timestamps branch from 62f3bfe to 06ce7f3 Nov 24, 2017

@bauerj

This comment has been minimized.

Contributor

bauerj commented Nov 25, 2017

Restarted Travis builds and it worked.

@htgoebel htgoebel added this to the PyInstaller 3.4 milestone Dec 7, 2017

@htgoebel htgoebel added the PY3.7 label Mar 1, 2018

@htgoebel htgoebel added this to Todo in Support Python 3.7 Mar 12, 2018

htgoebel added some commits Mar 18, 2018

@htgoebel htgoebel changed the base branch from develop to python37 Mar 20, 2018

@htgoebel htgoebel merged commit 36a3a84 into pyinstaller:python37 Mar 20, 2018

0 of 2 checks passed

continuous-integration/appveyor/pr AppVeyor build failed
Details
continuous-integration/travis-ci/pr The Travis CI build failed
Details

Support Python 3.7 automation moved this from Todo to Done Mar 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment