
urllib: urlretrieve() does not close file objects on failure #49786

Closed
petrdolezal mannequin opened this issue Mar 22, 2009 · 2 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

petrdolezal mannequin commented Mar 22, 2009

BPO 5536
Nosy @benjaminp

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

assignee = None
closed_at = <Date 2009-03-22.17:45:19.605>
created_at = <Date 2009-03-22.12:28:54.086>
labels = ['type-bug', 'library']
title = 'urllib: urlretrieve() does not close file objects on failure'
updated_at = <Date 2009-03-22.17:45:19.546>
user = 'https://bugs.python.org/petrdolezal'

bugs.python.org fields:

activity = <Date 2009-03-22.17:45:19.546>
actor = 'benjamin.peterson'
assignee = 'none'
closed = True
closed_date = <Date 2009-03-22.17:45:19.605>
closer = 'benjamin.peterson'
components = ['Library (Lib)']
creation = <Date 2009-03-22.12:28:54.086>
creator = 'petr.dolezal'
dependencies = []
files = []
hgrepos = []
issue_num = 5536
keywords = []
message_count = 2.0
messages = ['83970', '83978']
nosy_count = 2.0
nosy_names = ['benjamin.peterson', 'petr.dolezal']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue5536'
versions = ['Python 3.0']

petrdolezal mannequin (author) commented Mar 22, 2009

urllib.request.urlretrieve() does not close the file object created for
the retrieval when it fails while processing the incoming data and
raises an exception (e.g. on an HTTP 404 response). The file therefore
remains open until the process terminates and the OS itself closes the
orphaned file handle.

This behaviour may leave orphaned temporary/incomplete files behind. It
is not just a resource leak; it has another bad side effect on the
Windows platform (at least): because of the creation mode used, the file
cannot be deleted before its handle is closed. But the entire file
object, including the handle, is lost when the exception propagates, so
nobody (including the process itself) can delete the file until the
process terminates.

Consider this code snippet demonstrating the described behaviour:
import os
import urllib.request
FILENAME = 'nonexistent.html'
try:
    # The host must be valid, otherwise address resolution fails
    # before the target file is even created. An existing host with a
    # non-existent resource is exactly the problem case.
    NON_EXISTENT_URL = 'http://www.python.org/nonexistent.html'
    urllib.request.urlretrieve(NON_EXISTENT_URL, FILENAME)
except Exception:
    if os.path.exists(FILENAME):
        print('File exists! Attempting to delete.')
        os.unlink(FILENAME)
        print('Succeeded.')

On Windows, the following output appears:
File exists! Attempting to delete.
Traceback (most recent call last):
  File "test.py", line 6, in <module>
    urllib.request.urlretrieve(NON_EXISTENT_URL, FILENAME)
  File "C:\Program Files\Python\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Program Files\Python\lib\urllib\request.py", line 1502, in retrieve
    block = fp.read(bs)
  File "C:\Program Files\Python\lib\io.py", line 572, in read
    self._checkClosed()
  File "C:\Program Files\Python\lib\io.py", line 450, in _checkClosed
    if msg is None else msg)
ValueError: I/O operation on closed file.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    os.unlink(FILENAME)
WindowsError: [Error 32] The process cannot access the file because it is being used by another process: 'nonexistent.html'

As a quick fix, it is possible to ensure that both the source and the
target file objects are closed in finally blocks. I also assume the
function should delete the target file on an exception: not only is the
file incomplete, but its name is also unknown to the client code when
urlretrieve() creates the temporary file itself. If the client code is
interested in partial downloads, it should probably retrieve the
resource some other way, as the urlretrieve() interface does not appear
to support anything like a partial download.
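
A minimal sketch of that quick fix, assuming the copy loop looks roughly like the one in URLopener.retrieve(); the helper name copy_and_cleanup and its parameters are illustrative, not the actual patch:

import os

def copy_and_cleanup(fp, tfp, filename, bs=1024 * 8):
    # fp: source (response) object, tfp: target file object,
    # filename: path to remove if the copy fails partway (assumed names).
    ok = False
    try:
        while True:
            block = fp.read(bs)
            if not block:
                break
            tfp.write(block)
        ok = True
    finally:
        tfp.close()
        fp.close()
        if not ok and os.path.exists(filename):
            # The download did not complete; remove the partial file so no
            # orphaned temporary/incomplete file is left behind.
            os.unlink(filename)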

Anyway, the proposed solution is still not the optimal one: a ValueError
with the message "I/O operation on closed file" is really not what I
would expect as the error when downloading a non-existent web page. I
guess a check on the source file object before reading begins would
discover the problem early and raise a more appropriate IOError or
something like that.
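
A small sketch of that early check, assuming a hypothetical helper the copy loop could call before reading; the attribute access and message are assumptions, not the actual code:

def check_source_open(fp, url):
    # Hypothetical guard: fail with IOError up front instead of letting a
    # later read() raise "ValueError: I/O operation on closed file".
    if getattr(fp, 'closed', False):
        raise IOError('cannot retrieve %r: source is already closed' % url)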

Note:
This bug report probably applies to older versions of urllib as well,
but I can't verify that now. I know I spotted it at least in 2.6, just
before I upgraded to 3.0.1.

@petrdolezal petrdolezal mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Mar 22, 2009
benjaminp (Contributor) commented Mar 22, 2009

Fixed in r70521.

@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022