Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove FileObject Wrapper #38

Merged
merged 8 commits into from
Aug 9, 2019
Merged

Remove FileObject Wrapper #38

merged 8 commits into from
Aug 9, 2019

Conversation

belldandyxtq
Copy link
Member

@belldandyxtq belldandyxtq commented Aug 6, 2019

This PR implements the redesigned open wrapper discussed in #37,
and replace the file object with a fileobject_returner
to wrap the file object only when necessary.
For more details, please refer to (Issue #37)

This solves problem 1 and partially for problem 2 (except for Python <= 3.6) in #37

@belldandyxtq belldandyxtq added cat:bug Bug report or fix. cat:enhancement Implementation that does not break interfaces. labels Aug 6, 2019
@@ -84,6 +108,7 @@ def test_makedirs(self):
self.assertRaises(io.UnsupportedOperation,
handler.makedirs, "test/test")

@pytest.mark.timeout(1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this timeout is introduced? Introducing non-determinism makes tests very flaky thus must be carefully designed and intended.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I stated in the comment message, I want to make sure the pickle problem is solved in pytest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timeout value need to be chosen again probably

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to another PR

chainerio/io.py Outdated
@@ -31,7 +29,26 @@ def __init__(self, io_profiler: Optional[IOProfiler] = None,
self.io_profiler = io_profiler
self.type = "BASEIO"
self.root = root
self.fileobj_class = FileObject

def _fileobject_returner(self, file_obj: Type['IOBase'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_wrap_fileobject would be more suitable name?

chainerio/io.py Outdated
replace the file object returned by underly system.
Derived class overrides this function in order to
add functionalities when needed or match the behaviour.
In the default case, just forward the given file object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term 'forward' is not common IMO.

if self.zip_file_obj is None:
zip_file = self.base_handler.open(self.base, "rb")
if isinstance(self.base_handler, ZipContainer) \
and sys.version_info < (3, 7, ):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion was not like this; to reopen B.zip and replace the file object right before opening A in B.zip in C.zip, not when opening B.zip in C.zip.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I am thinking in a wrong way.

to reopen B.zip and replace the file object right before opening A in B.zip in C.zip, not when opening B.zip in C.zip.

I don't think this is possible because zip file module doesn't do lazy open, hence they check the seekable upon creating the zipfile.ZipFile object. If the file object passed to the zipfile.ZipFile is not seekable, an exception is thrown at that point. So we need to make the file object seekable at open. But since we are using lazy open for now, the actual open is delayed until user open A in B.zip in C.zip. so the behavior to user matches your expectation for now I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we discussed offline, we will move this issue to another PR

chainerio/filesystems/hdfs.py Show resolved Hide resolved
@belldandyxtq belldandyxtq force-pushed the new_wrapper branch 3 times, most recently from 0a3482a to 1f9d02d Compare August 7, 2019 05:15
@belldandyxtq belldandyxtq removed the cat:bug Bug report or fix. label Aug 7, 2019
Copy link
Member

@kuenishi kuenishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this should be merged after the CI fix.

if self.zip_file_obj is None:
zip_file = self.base_handler.open(self.base, "rb")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get why this line is moved above. Could you explain your intention?

chainerio/io.py Outdated
replace the file object returned by underly system.
Derived class overrides this function in order to
add functionalities when needed or match the behaviour.
In the default case, just return the given file object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "In the default case, it just returns the given file object."

tests/filesystem_tests/test_hdfs_handler.py Show resolved Hide resolved
@belldandyxtq
Copy link
Member Author

pfnci, test this please

@pfn-ci-bot
Copy link

Successfully created a job for commit ccd5661:

This commit implements the redesigned open wrapper,
and replace the file object with a `fileobject_returner`
to wrap the file object only when necessary.
For more details, please refer to (Issue #37)

A support for nested zip for Python < 3.7 is added in zipfile.
The readline test in hdfs `test_read_bytes` is removed.
The original purpose of this test was to support pickle,
while such functionality is already tested in `test_pickle`.
Having `readline` when opening with 'rb' is weird.
@kuenishi kuenishi changed the title Redesign Open Wrapper Remove FileObject Wrapper Aug 9, 2019
@kuenishi kuenishi merged commit 65a5d26 into master Aug 9, 2019
@kuenishi kuenishi deleted the new_wrapper branch August 9, 2019 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cat:enhancement Implementation that does not break interfaces.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants