Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix list zip #68

Merged
merged 7 commits into from
Oct 8, 2019
Merged

Bugfix list zip #68

merged 7 commits into from
Oct 8, 2019

Conversation

belldandyxtq
Copy link
Member

This PR fixes the two bugs mentioned in #66
Closes #66

This commit fixes the two bugs mentioned in #66
Add tests to cover the cases in issue #66
@belldandyxtq belldandyxtq added the cat:bug Bug report or fix. label Oct 2, 2019
Copy link
Member

@kuenishi kuenishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bugfix must include a test (case) that reproduces the bug.

tests/container_tests/test_zip_container.py Outdated Show resolved Hide resolved
tests/container_tests/test_zip_container.py Outdated Show resolved Hide resolved
tests/container_tests/test_zip_container.py Outdated Show resolved Hide resolved
tests/container_tests/test_zip_container.py Outdated Show resolved Hide resolved
@kuenishi
Copy link
Member

kuenishi commented Oct 3, 2019

Also, this introduces non-compat change on trailing slash, so it would be nice mentioned somewhere at the release time

This commit refactors the test code to use expected_list to improve
readability.
Copy link
Member

@kuenishi kuenishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems good at a glance, but it's actually still broken.

$ unzip -l tmp3.zip 
Archive:  tmp3.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2019-10-04 10:13   tmp/
    62672  2019-09-13 10:53   tmp/tmp.zip
        0  2019-10-04 10:13   tmp/in/
    62980  2019-10-04 10:13   tmp/in/tmp2.zip
---------                     -------
   125652                     4 files
>>> import chainerio as chio
>>> z = chio.open_as_container('tmp3.zip')
>>> list(z.list('tmp', recursive=True))
['tmp/', 'tmp/tmp.zip', 'tmp/in/', 'tmp/in/tmp2.zip']
>>> list(z.list('tmp/in', recursive=True))
['tmp/', 'tmp/tmp.zip', 'tmp/in/', 'tmp/in/tmp2.zip']
>>> list(z.list('tmp//in', recursive=True))
['tmp/', 'tmp/tmp.zip', 'tmp/in/', 'tmp/in/tmp2.zip']
>>> list(z.list('tmp//in'))
[]

I think the double-slash bug also resides in HDFS listing code, which also does prefix matching.

@kuenishi
Copy link
Member

kuenishi commented Oct 4, 2019

A few more examples:

>>> list(z.list('t'))
['mp']
>>> list(z.list('a'))
['tmp']
>>> list(z.list('/'))
['tmp']
>>> list(z.list('tm'))
['p']
>>> list(z.list('tmp/i'))
['tmp', 'n']
>>> list(z.list('tmp/in/tm'))
['tmp', 'p2.zip']
>>> list(z.list('tmp/tm'))
['tmp', 'p.zip']

@belldandyxtq
Copy link
Member Author

For the recursive case, it is known and supposed to be fixed in #72, which has conflicts with this PR and will be rebased after this is merged

@belldandyxtq
Copy link
Member Author

belldandyxtq commented Oct 4, 2019

I do not think double-slash bug exists in hdfs list as it is not using "prefix matching" as we did here.

>>> list(chainerio.list('hdfs:///user/tianqi//tesetdir///'))
['test']

It first checks the state of the file and used the "converted and normalized path" as the prefix.
https://github.com/chainer/chainerio/blob/af3b45b5d1f84db87cf73fee4bce9cc10a6d1039/chainerio/filesystems/hdfs.py#L145

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:bug Bug report or fix.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug in list of directory in zip
2 participants