-
-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When iterating over Zipfiles, always use the Unix file separator to fix a Windows issue #638
When iterating over Zipfiles, always use the Unix file separator to fix a Windows issue #638
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yorinasub17, thank you for submitting this patch!
I believe this PR can be simplified. Rather than ever using os.sep
, we should be able to always use /
as this is a zip file, which as you state is how zip files work everywhere, including in Windows.
To land this PR, could you please do two things:
- Research and confirm that this assumption about zip files always using
/
is indeed 100% true. It would be great to find the spec claiming this and to link to this in the PR description. Because we don't have any CI testing for Windows, this would be a good way to verify correctness. - Simplify the code to restore the original change and simply swap
os.sep
with/
, along with a brief comment explaining how we can do this thanks to zip files' standard (maybe include the link from prior step).
Thank you!
@Eric-Arellano Thanks for the PR review! I am currently bogged down with my other work, but will take a look at this when I free up later in the month. |
It appears that the "official spec" of zip files is the one published by PKWARE (See https://support.pkware.com/display/PKZIP/APPNOTE, the reference to that in wikipedia, and the reference to the appnote file from winzip). From there, I checked the latest spec and confirmed that all paths must use forward slashes (section 4.4.17):
Based on the above, I'll go through the process of simplifying the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the research to confirm this! That's very helpful.
Couple suggestions and then this should be ready to merge 🎉
It also looks like CI is failing due to the style check. After making the above changes, run |
@Eric-Arellano I believe I addressed all the comments, but please don't merge this yet. I haven't had a chance to test on my windows box yet and I have a suspicion the updated version won't work because I haven't touched the |
Sorry about the style error. I thought tox succeeded, but I was fooled because it actually failed on the
|
@Eric-Arellano Thanks for your patience with me through this review. Ok after testing on my windows box, it turns out the updated version doesn't work for two reasons:
I attempted to address both of these, but it sort of reverts back to what I had originally and I am not sure if the current way is actually simpler. What do you think? Any ideas to improve this? EDIT: Note that I confirmed the latest version at commit 3241d07 works on my windows box. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for continuing to iterate on this!
Indeed this is now closer to the original PR, but with the requested changes it should still be simpler than checking if we're using Windows, plus the comment you are adding is quite useful.
if zipfile.is_zipfile(path): | ||
return cls(zipfile_path=path, prefix=prefix + os.sep if prefix else '') | ||
prefix = os.path.join(prefix, os.path.basename(path)) | ||
return cls(zipfile_path=path, prefix='{}/'.format(prefix) if prefix else '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any guarantee prefix
will never already have os.sep
in it? That's an earnest question - I couldn't figure it out from a very quick search for _ZipIterator
.
Either way, for consistency and safety, might be better to change this line to this idiom:
return cls(zipfile_path=path, prefix='' if not prefix else '{}{}'.format(prefix, os.sep).replace(os.sep, '/')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any guarantee prefix will never already have os.sep in it?
I think this can be guaranteed because prefix
starts off as ''
outside the while loop, and only contains parts constructed with os.path.basename
. os.path.basename
takes the last path element, so it is equivalent to arg.split(os.sep)[-1]
, which will mean the result won't contain the os.sep
character.
That said, there is something that doesn't make sense to me about this while loop. This threw me off at first, but I believe the while loop is repeatedly walking up root
directory by directory until it hits a zip file. E.g ignoring prefix
for a minute, the loop boils down to:
while path:
if zipfile.is_zipfile(path):
return
# if path = foo/bar/baz, then os.path.dirname(path) = foo/bar
path = os.path.dirname(path)
Given that, I think the prefix
is actually backwards. If we go into the loop with foo.zip/bar/baz
, assuming foo.zip
is the zip file, prefix
will become baz/bar
, because after the first iteration, prefix
will be basename of the path (which is baz
), and the second iteration joins that with basename of the path again (the original was prefix = os.path.join(prefix, os.path.basename(path))
, so os.path.join('baz', os.path.basename('foo.zip/bar')) = 'baz/bar'
). Am I missing something with my logic here?
This must be working because the tests pass, but I wonder if this is actually an uncaught bug? I haven't had a chance to walk through all the test cases to know if there is any test case for this, or if the caller of _ZipIterator
somehow guarantees this loop only goes one iteration deep.
Also this loop is an infinite loop if root
is ever an absolute path that does not contain a zip file, because os.path.dirname('/') == '/'
, so it will never be falsy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right about the bug - and it skates by because items are exactly one level deep. The prefix = os.path.join(prefix, os.path.basename(path))
line should be prefix = os.path.join(os.path.basename(path), prefix)
.
As far as the infinite loop goes, the one call of containing
ensures root
is not a dir (but not that it's not a file). It probably makes sense to either guard the infinite loop case or else assert early that root is either a zipfile or else not a file and not a directory (foo.zip/bar/baz
is not either of these).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great investigation @yorinasub17 and thanks @jsirois for confirming.
I think it'd be best to save this bug fix for a separate PR, as this one is focused on Windows support. If you're up for it, that'd be great for you to open this PR. No worries if you're busy - we can do it otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the confirmation. I made the fix for the prefix here, but have not addressed the infinite loop.
I won't be able to address it today though. I'll check back again when I have time and if it still hasn't been fixed yet, I'll open a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your continued work to land this! Merging now.
Thanks for your continued support on this @jsirois and @Eric-Arellano. I made the changes requested and tested it out on my windows box and confirmed it still works. |
Hello!
I know pex doesn't officially support windows, but there seems to be a regression in pex 1.6.0 with windows with respect to the way the vendoring works. Specifically, the regex for iterating the zip path in the vendoring uses the os separator, which has two problems on windows (where the separator is
\
):\
is an escape sequence, so unless the separator is added twice in the string, it uses it to escape the character immediately after the separator, causing the regex compilation to fail./
), so using the os separator is actually wrong.This PR addresses this by updating the regex to replace windows separators (
\
) in the prefix var with the unix separator.