New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: bpo-34990: year 2038 problem in compileall.py #9892

Open
wants to merge 12 commits into
base: master
from

Conversation

Projects
None yet
8 participants
@matrixise
Contributor

matrixise commented Oct 15, 2018

Use 'L' as an unsigned long for the timestamp

https://bugs.python.org/issue34990

bpo-34990: year 2038 problem in compileall.py
Use 'L' as an unsigned long for the timestamp
@bmwiedemann

This comment has been minimized.

Contributor

bmwiedemann commented Oct 15, 2018

Lib/test/test_compileall.py also contains 2 instances of <4sll that probably should be updated accordingly.
and you could add the test code provided by xtreak in the bug.

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 15, 2018

@bmwiedemann yep, I was in the plane from PySS and I just commited my code, and now, I just saw the test of @tirkarthi . I am going to include it.

matrixise added some commits Oct 15, 2018

Add a test with compileall and the .pyc file with a timestamp >= 2018…
…-01-20

Co-authored-by: Karthikeyan Singaravelan <tir.karthi@gmail.com>
@tirkarthi

This comment has been minimized.

Contributor

tirkarthi commented Oct 15, 2018

@matrixise Thanks for the PR. I think it's a NEWS entry worthy fix to add a short note that compileall now supports files with modification time greater than or equal to 19-01-2038 (I guess)

@tirkarthi

This comment has been minimized.

Contributor

tirkarthi commented Oct 15, 2018

@matrixise Sorry, I just noticed you have added one :)

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 15, 2018

@tirkarthi please, could you check my PR with the last commits.

@bmwiedemann

This comment has been minimized.

Contributor

bmwiedemann commented Oct 15, 2018

btw: the first bad mtime is 0x80000000 or 2147483648 or 2038-01-19T03:14:08+00:00

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 15, 2018

Contributor

not sure, but maybe this is not a good place in this file to add the test in, because SOURCE_DATE_EPOCH changes the default .pyc type to be hash-based, which would not hit the error condition. See #9607

This comment has been minimized.

@matrixise

matrixise Oct 15, 2018

Contributor

I am not sure, just copy the code of @tirkarthi . I will wait for a review from @vstinner or an other core-dev.

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

It should be moved to CompileallTestsBase.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

done

@tirkarthi

This comment has been minimized.

Contributor

tirkarthi commented Oct 15, 2018

@matrixise Thanks, Looks good to me from the context of the issue reported but I have limited understanding about the code so I hope someone else might have a better look. Seems there was some discussion about handling 2038 problem as part of the original PR where this code was added : #4575 (comment)

@bmwiedemann

This comment has been minimized.

Contributor

bmwiedemann commented Oct 15, 2018

I think the discussion in PR 4575 was about future-proofing the new .pyc format, but it failed to notice that by 2038, only 31 bits are exhausted and bit 32 (which was erroneously interpreted as the sign-bit here), allows to keep the format alive until 2106 - which we can safely assume out-of-scope for today.

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 15, 2018

This is the reason why I move to 'L' instead of 'l'

@serhiy-storchaka

Check also other parts of the stdlib that work with mtime in pyc files. For example zipimport and checkpyc.py need updates. Maybe there are other parts.

@@ -139,7 +139,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
expect = struct.pack('<4slL', importlib.util.MAGIC_NUMBER,
0, mtime)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 16, 2018

Member

mtime & 0xFFFFFFFF

@bedevere-bot

This comment has been minimized.

bedevere-bot commented Oct 16, 2018

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be poked with soft cushions!

@@ -139,7 +139,7 @@ def compile_file(fullname, ddir=None, force=False, rx=None, quiet=0,
if not force:
try:
mtime = int(os.stat(fullname).st_mtime)
expect = struct.pack('<4sll', importlib.util.MAGIC_NUMBER,
expect = struct.pack('<4slL', importlib.util.MAGIC_NUMBER,

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 16, 2018

Member

Also, it would be better to handle the second field as an unsigned integer too.

This comment has been minimized.

@vstinner

vstinner Oct 16, 2018

Member

importlib uses int(mtime) & 0xFFFFFFFF, I suggest... int(mtime) & 0xFFFF_FFFF for readability :-D

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 17, 2018

@serhiy-storchaka last update for Tools/scripts/cherrypyc.py by in June 2013, we only read the magic number and timestamp, just that: https://github.com/python/cpython/blob/master/Tools/scripts/checkpyc.py#L45-L46

In https://github.com/python/cpython/blob/master/Lib/zipimport.py#L602 you convert the dword as a uint32 and LGTM.

@matrixise matrixise requested a review from python/import-team as a code owner Oct 17, 2018

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 17, 2018

@vstinner and @serhiy-storchaka I have fixed mtime with 0xFFFF_FFFF and use an unsigned long, where I found <4slL but for the other files, I am not an expert of the API and I don't know where I could search, zipimport and checkpy.py seem to be good.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 17, 2018

Contributor

This is actually a NOP, because mtime_str is just 4 bytes, that cannot encode anything larger than 0xFFFF_FFFF

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

thanks, I am not familiar with the operations on the bits.

@serhiy-storchaka

zipimport needs truncating the timestamp to 32 bit for comparison.

Also update _code_to_timestamp_pyc() in _bootstrap_external.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018

Member

mtime can't be -1 after adding & 0xFFFF_FFFF.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

maybe add the 0xFFFF_FFFF on the line 57?

This comment has been minimized.

@serhiy-storchaka
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018

Member

I think the above few lines can be removed.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

the lines from 38->41?

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime), size & 0xFFFFFFFF) + data)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018

Member

Signed format units are used.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 17, 2018

Member

Sees the definition of get_long(). It returns an unsigned 32-bit integer, and returns -1 only on error. You need to convert not it, but the file timestamp.

@matrixise matrixise closed this Oct 17, 2018

@matrixise matrixise reopened this Oct 17, 2018

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)

This comment has been minimized.

@bmwiedemann

bmwiedemann Oct 17, 2018

Contributor

why this revert?

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

just because @serhiy-storchaka indicated that the signed was used here.

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

I meant that you need to update the format string.

@@ -586,7 +586,7 @@ def _code_to_timestamp_pyc(code, mtime=0, source_size=0):
"Produce the data for a timestamp-based pyc."
data = bytearray(MAGIC_NUMBER)
data.extend(_pack_uint32(0))
data.extend(_pack_uint32(mtime))
data.extend(_pack_uint32(mtime) & 0xFFFF_FFFF)

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

This change is useless, _pack_uint32() already does "& 0xFFFFFFFF". importlib can be left unchanged.

@@ -53,8 +53,8 @@ def add_bad_source_file(self):
def timestamp_metadata(self):
with open(self.bc_path, 'rb') as file:
data = file.read(12)
mtime = int(os.stat(self.source_path).st_mtime)
compare = struct.pack('<4sll', importlib.util.MAGIC_NUMBER, 0, mtime)
mtime = int(os.stat(self.source_path).st_mtime) & 0xFFFF_FFF

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

That's why I like the underscore: it's easier to spot bugs. You missed something :-)

def test_compile_all_2038(self):
with open(self.source_path, 'r') as f:
os.utime(f.name, (2147558400, 2147558400)) # Jan 20, 2038 as touch
self.assertTrue(compileall.compile_file(pathlib.Path(self.source_path)))

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

It should be moved to CompileallTestsBase.

@@ -0,0 +1 @@
Use an unsigned long for the timestamp for the .pyc file in compileall.py

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

You describe the fix, but you should describe the fixed bug instead: compileall now supports timestamps newer than year 2038.

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

+1

@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:

This comment has been minimized.

@vstinner

vstinner Oct 17, 2018

Member

I don't know if this code should be modified, but it is modified: should we also cast to int()?

This comment has been minimized.

@matrixise

matrixise Oct 17, 2018

Contributor

like that?

elif int(mtime) != int(st[ST_MTIME] & 0xFFFF_FFFF)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

mtime is already int. But st[ST_MTIME] is float.

matrixise added some commits Oct 17, 2018

@matrixise

This comment has been minimized.

Contributor

matrixise commented Oct 17, 2018

@serhiy-storchaka can you continue the review with me, because I am a little bit like a chicken without head ;-) I try to understand with 0xFFFF_FFFF ;-)

matrixise added some commits Oct 17, 2018

Revert my modification in Lib/importlib/_bootstrap_external.py: useless,
_pack_uint32() already does "& 0xFFFFFFFF"
@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +
struct.pack("<iii", 0, int(mtime) & 0xFFFF_FFFF, size & 0xFFFF_FFFF) + data)
struct.pack("<iii", 0, int(mtime), size & 0xFFFF_FFFF) + data)

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

I meant that you need to update the format string.

@@ -51,7 +51,7 @@ def main():
print('Bad MAGIC word in ".pyc" file', end=' ')
print(repr(name_c))
continue
mtime = get_long(mtime_str)
mtime = get_long(mtime_str) & 0xFFFF_FFFF
if mtime in {0, -1}:

This comment has been minimized.

@serhiy-storchaka
@@ -54,7 +54,7 @@ def main():
mtime = get_long(mtime_str)
if mtime in {0, -1}:
print('Bad ".pyc" file', repr(name_c))
elif mtime != st[ST_MTIME]:

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

mtime is already int. But st[ST_MTIME] is float.

@@ -0,0 +1 @@
compileall now supports timestamps newer than year 2038

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

Please use punctuation.

@@ -40,7 +40,7 @@ def make_pyc(co, mtime, size):
else:
mtime = int(-0x100000000 + int(mtime))
pyc = (importlib.util.MAGIC_NUMBER +

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

Yes. They are roughly equivalent to mtime = int(mtime) & 0xFFFF_FFFF if use an unsigned integer format.

class CompileallTestsWithSourceEpoch(CompileallTestsBase,
unittest.TestCase,
metaclass=SourceDateEpochTestMeta,
source_date_epoch=True):
pass

This comment has been minimized.

@serhiy-storchaka

serhiy-storchaka Oct 18, 2018

Member

Unrelated change.

@matrixise matrixise changed the title from bpo-34990: year 2038 problem in compileall.py to WIP: bpo-34990: year 2038 problem in compileall.py Oct 25, 2018

@brettcannon

This comment has been minimized.

Member

brettcannon commented Nov 2, 2018

Due to the WIP in the title I added the DO-NOT-MERGE label.

@matrixise

This comment has been minimized.

Contributor

matrixise commented Nov 3, 2018

@brettcannon Thank you, I will continue to work on this issue on the next week, this week I am at the hospital with my daughter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment