Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make #107 compatible and found HttpReadStream resource leak #117

Closed
wants to merge 5 commits into from
Closed

Make #107 compatible and found HttpReadStream resource leak #117

wants to merge 5 commits into from

Conversation

rctzeng
Copy link

@rctzeng rctzeng commented Mar 22, 2017

Hello, I'm Ruo-Chun Tzeng, who'd like to participant in GSoC 2017.
As a test, I was assigned to fix #112.
There're 2 changes:

  1. I add unit tests for checking file-descriptor leak of local compressed files
    and caught gzip, bz2 are not properly closed on python 2.6.9 (failed my file-closing test)
  2. I do a quick fix to make Add generic HTTP and HTTPS streaming support. #107 compatible with make_closing function. so that HttpReadStream work with compressed formats
    However, I found the HttpReadStream is not properly closed, the contents is still readable after closing the stream.
    with smart_open.smart_open('https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/tests/test_data/crlf_at_1k_boundary.warc.gz', 'r') as fin:
        print(fin.read())
    fin.read()
    

@tmylk
Copy link
Contributor

tmylk commented Mar 23, 2017

Thanks for investigating closing but the most important of this PR is that new unit tests need to be added when some gzipped/bz2 file is opened via http.

@rctzeng rctzeng changed the title Make #107 compatible and found resource of HttpReadStream Make #107 compatible and found HttpReadStream resource leak Mar 23, 2017
@rctzeng
Copy link
Author

rctzeng commented Mar 26, 2017

OK, my bad, I've added unit tests for read/write mode tests and close tests for bz/gz2 over http.
Also, since there's file-handle leak on

  • local bz/gz2(python2.6.9) and
  • bz/gz2 over http,

the unit test for file-handle leak tests are commented.

return make_closing(BZ2File)(filename, mode)

elif ext == '.gz':
from gzip import GzipFile
if 'http' in root:
return make_closing(HttpReadStream)(filename, mode)
return make_closing(GzipFile)(filename, mode)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't fix the resource leak -- it just ignores the compression extension and then re-opens the URL, downloading it a second time. That's not correct.

@responses.activate
def test_read_write_http_gz(self):
"""Does modes correctly for gz over http?"""
text= 'line1\nline2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not gzipped content. Please test with gzipped content. There are examples of archived files in https://github.com/RaRe-Technologies/smart_open/tree/master/smart_open/tests/test_data

@responses.activate
def test_read_write_http_bz2(self):
"""Does modes correctly for bz2 over http?"""
text= 'line1\nline2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not bz2 content. Please use actual bz2 content

Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use archived content in tests

@responses.activate
def test_close_http_gz(self):
"""Is gzip over http closed correctly?"""
text= 'line1\nline2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use gzip content

@responses.activate
def test_close_http_bz2(self):
"""Is bz2 over http closed correctly?"""
text= 'line1\nline2'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use bz2 content

self.assertRaises(ValueError, fin.read)

if os.path.isfile(test_file):
os.unlink(test_file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test that a filehandle hasn't been closed -- it just tests if fin.read honors that the filehandle isn't closed. You could argue that HttpReadStream should raise ValueError if someone tries to read() after close() has been called, but the fact that it allows it isn't, by itself, a resource leak.

@tmylk
Copy link
Contributor

tmylk commented Mar 27, 2017

@Lubebul Please see tests and code in #112

@tmylk
Copy link
Contributor

tmylk commented Apr 12, 2017

Fixed in #112

@tmylk tmylk closed this Apr 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants