Skip to content

PYTHON-2824 Make GridOut implement full io.IOBase spec #677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

henrifroese
Copy link
Contributor

No description provided.

@henrifroese henrifroese requested a review from ShaneHarvey July 24, 2021 12:14
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! A few minor comments. I'm looking into the test failures. It's possible they are unrelated to your changes. Feel free to add yourself to doc/contributors.rst as well.

Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've narrowed down the test failures to be caused by this change:

- class GridOut(object):
+ class GridOut(io.IOBase):

Reverting this line causes the test_retryable_reads tests to pass again but I have no idea why. This needs more investigation.

The other test failure (test_collection.TestCollection.test_manual_last_error) will be fixed if you rebase your changes using git pull --rebase upstream master.

…erited methods.

Make GridOut inherit from io.IOBase to be a fully "file-like" object (https://docs.python.org/3/glossary.html#term-file-like-object).
Implement missing methods `readlines`, `writelines`,
`writable`, `fileno`, `flush`, `isatty`, `truncate`,
and property `closed`, following the spec
(https://docs.python.org/3/library/io.html#io.IOBase.writable).
Add test asserting exceptions / false for `writelines`,
`writable`, `fileno`, `flush`, `isatty`, `truncate`.

Add unittests for `readlines` and `closed`.
… iterator conform to IOBase interface

GridOut does not need its own `__closed` attribute and property `closed`, as base class
IOBase already provides that and thus, a call to `super().close()` in GridOut `close` suffices.

GridOut `readlines` is also already implemented
in IOBase using `readline`, so GridOut does not
need its own implementation.

Iterating over GridOut previously returned
chunks, but IOBase specifies
that lines should be returned. Thus, the
`GridOutIterator` returning chunks is removed
and GridOut simply uses the existing IOBase
iterator implementation (returning `self`
in `__iter__` and using `readline` in `__next__`).

Additionally, iterating over GridOut previously
did not move the "file pointer" along, i.e.
`next(iter(some_grid_out_object))` always gave
the same result (the first chunk of the file)
as it would create a new iterator starting at the
top of the file. This is now fixed as well, so
a first call to `next(iter(some_grid_out_object))`
gives the first line, and subsequent calls return
the subsequent lines.

The test_grid_file.py `test_iterator` is changed
accordingly.
@henrifroese henrifroese force-pushed the PYTHON-2824/grid-out-implement-iobase-interface branch from a266c94 to 1a5a0f4 Compare July 31, 2021 09:58
@henrifroese
Copy link
Contributor Author

Interesting; I've made the other changes and will now have a look at the tests. (As an aside, it took some time to get the test_retryable_reads.py running locally, would be great if there was some docker-compose somewhere in the repo to make it easier for others.)

@henrifroese
Copy link
Contributor Author

henrifroese commented Jul 31, 2021

So the first failure (test gridfs_download_Download_fails_on_first_attempt) (and all the others as well) seems to be due to this 🔎 :

IOBase has a destructor (see code here: https://github.com/python/cpython/blob/main/Lib/_pyio.py#L432)
that accesses attribute _IOBase__closed. The attribute access however actually calls GridOut.__getattr__,
which in turn calls GridOut._ensure_file, and ensure_file notices self._file is not set (as the first setting failed and is not retried - that's exactly what the test is testing: first find throws an error and retryReads is disabled) and fetches it again from the db.

Overriding __del__:

def __del__(self):
    pass

makes all tests pass.

As far as I can see, we don't need the destructor - there're no system resources or anything we want to release come garbage collection

@henrifroese
Copy link
Contributor Author

One possibility would now be to just override the destructor to pass. Another would be to change __getattr__ to not call _ensure_file for every argument.

@henrifroese henrifroese requested a review from ShaneHarvey July 31, 2021 22:33
@ShaneHarvey
Copy link
Member

Great find! I am still confused as to why self.__closed would trigger __getattr__. Normally __getattr__ is called only if the name isn't part of the instance/class already (ie only for "unknown" attribute names). But self.__closed should be an attribute that's already defined. Can you explain why __getattr__ is called at all in this case?

Another would be to change getattr to not call _ensure_file for every argument.

We could but this would be yet another subtle breaking change and I'd like to avoid adding more of these.

@henrifroese
Copy link
Contributor Author

Actually not sure about that. Minimal example to reproduce would be this:

>>> import io
>>> class A(io.IOBase):
...     def __getattr__(self, item):
...             print(f"__getattr__ called with {item=}")
... 
>>> 
>>> a = A()
>>> a.closed
__getattr__ called with item='__IOBase_closed'
True
>>> 
>>> del a
__getattr__ called with item='__IOBase_closed'

@ShaneHarvey
Copy link
Member

ShaneHarvey commented Aug 19, 2021

Thanks @henrifroese, apologies for the delay. My plan is to create a Python bug report for the IOBase.__closed + getattr issue but we don't need to hold up this PR for it. I suggest going with your override __del__ approach with an explanatory comment:

    # Override IOBase.__del__ otherwise it will lead to __getattr__ on
    # __IOBase_closed which calls _ensure_file and potentially performs I/O.
    # We cannot do I/O in __del__ since it can lead to a deadlock.
    def __del__(self):
        pass

IOBase `__del__` calls `__getattr__` on `__IOBase_closed` which calls `_ensure_file and potentially performs I/O.
We cannot do I/O in `__del__` since it can lead to a deadlock.
It is thus overridden to just `pass`.
@henrifroese
Copy link
Contributor Author

Makes sense, done 👌

@ShaneHarvey ShaneHarvey merged commit fa9531b into mongodb:master Aug 23, 2021
@ShaneHarvey
Copy link
Member

Thanks @henrifroese! I opened https://jira.mongodb.org/browse/PYTHON-2874 to make sure GridIn also implements io.IOBase.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants