tarfile module next() method hides exceptions #71777
I have seen a similar ticket, but it was opened two years ago and contains nothing more than a brief description, so I opened this new one in the hope of getting some answers.

A tarfile.TarFile object is iterable and has a next() method. next() parses the next header and saves the parsed information. During parsing, many checks are performed to make sure the header is valid, and an exception is raised if something is wrong with it. next() catches many of these exceptions but does not re-raise them in all cases.

I have a tgz file in which one of the headers is corrupted: its checksum field is wrong. During parsing, InvalidHeaderError is raised; next() catches it and silently discards it. From the source code (https://hg.python.org/cpython/file/2.7/Lib/tarfile.py#l2335), we can see that InvalidHeaderError is ONLY raised if it occurs at the beginning of the tar file.

In fact, the tarfile module hides many exceptions this way: it simply treats them as marking the end of the tarball. Why does the tarfile module hide so many exceptions? In other words, why does it treat these exceptions as the end marker of the tarball rather than as errors? Is it because of this from GNU doc:

Thanks!
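The behaviour reported above can be reproduced without a real damaged file. The following is a minimal sketch, not taken from the report: it assumes a small uncompressed USTAR archive built in memory, and the helper names (build_tar, corrupt_checksum, list_names) are made up for illustration.

```python
import io
import tarfile

BLOCK = 512  # tar archives are made of 512-byte blocks


def build_tar(names):
    """Build an uncompressed USTAR archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name in names:
            data = name.encode()          # tiny payload: one data block per member
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def corrupt_checksum(archive, header_offset):
    """Overwrite the checksum field (bytes 148-155 of a header) with zeros."""
    damaged = bytearray(archive)
    damaged[header_offset + 148:header_offset + 156] = b"0000000\0"
    return bytes(damaged)


def list_names(archive):
    """Open the archive without compression detection and list its members."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        return tf.getnames()


archive = build_tar(["a.txt", "b.txt", "c.txt"])
# Each member here is one header block plus one data block,
# so the second header starts at offset 2 * BLOCK.
damaged = corrupt_checksum(archive, 2 * BLOCK)

print(list_names(archive))   # all three members
print(list_names(damaged))   # stops before b.txt, and no exception is raised
```

In this sketch the damaged archive simply looks shorter than it is: the listing ends at the last member before the bad header, which is the silent truncation the report describes.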
That would be my guess. If we are reading along and we hit garbage data, we assume we've reached the end of the tar. That doesn't mean there isn't room for improvement, for example issuing a warning message explaining why we think we hit the end of the tar. What is the issue number of the other issue? If it is still open, we should consolidate the issues if appropriate.
The other issue is
OK, I've closed bpo-16858 in favor of this one, since we at least had some discussion here. I see you selected 2.7. Does Python 3 have the same issues? (I'm guessing it does, though there has been some work done on the module.)
Yeah, I just tried on Python 3.5 and it didn't report any errors either.
Lars Gustäbel did most of the work on this and it would be nice to get his thoughts. The exception swallowing is explicit here rather than accidental. See http://bugs.python.org/issue6123
The question is what you're trying to accomplish. If you just want to prevent tarfile from stopping at the first invalid header in order to extract everything following it, you may use the ignore_zeros=True keyword argument.
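To illustrate the ignore_zeros suggestion, here is a small sketch under the same kind of assumptions as any synthetic test: an uncompressed USTAR archive built in memory, with the second header's checksum field deliberately overwritten. The helper names (make_damaged_tar, names_in) are hypothetical; only tarfile.open() and its ignore_zeros argument come from the module itself.

```python
import io
import tarfile

BLOCK = 512  # tar block size


def make_damaged_tar():
    """Three small members; the second header gets a wrong checksum (illustrative)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name in ("a.txt", "b.txt", "c.txt"):
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    raw = bytearray(buf.getvalue())
    header = 2 * BLOCK                      # header + data block per member
    raw[header + 148:header + 156] = b"0000000\0"
    return bytes(raw)


def names_in(archive, **open_kwargs):
    """List member names, passing extra keyword arguments to tarfile.open()."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:", **open_kwargs) as tf:
        return tf.getnames()


damaged = make_damaged_tar()
default_names = names_in(damaged)                       # stops at the bad header
lenient_names = names_in(damaged, ignore_zeros=True)    # skips bad blocks, keeps going
print(default_names, lenient_names)
```

With ignore_zeros=True the reader advances block by block past anything it cannot parse, so members after the damaged header are still found, while the damaged member itself is lost without an error in either mode.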
I do want the tarfile module to stop at the first invalid header. My question is why the tarfile module does NOT raise an exception about the error in the header, and instead just hides it silently.
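One possible way to get an error instead of a silent stop is to walk the headers yourself before handing the file to tarfile. This is only a rough sketch under several assumptions: the archive is uncompressed and seekable, every member's data occupies its size rounded up to whole 512-byte blocks, and strict_names and demo_archive are hypothetical helpers, not part of tarfile. It relies on the documented TarInfo.frombuf() classmethod, which raises tarfile.HeaderError for an invalid block.

```python
import io
import tarfile

BLOCK = 512  # tar block size


def strict_names(fileobj):
    """Return member names; raise tarfile.HeaderError at the first bad header."""
    names = []
    while True:
        block = fileobj.read(BLOCK)
        # Physical end of file or an all-zero block: treat as end of archive.
        if not block or block == bytes(BLOCK):
            return names
        # Raises a tarfile.HeaderError subclass on a bad checksum or short block.
        info = tarfile.TarInfo.frombuf(block, "utf-8", "surrogateescape")
        names.append(info.name)
        # Skip the member's data, rounded up to a whole number of blocks.
        fileobj.seek((info.size + BLOCK - 1) // BLOCK * BLOCK, io.SEEK_CUR)


def demo_archive(damage_second=False):
    """Build a three-member USTAR archive, optionally with a bad second checksum."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tf:
        for name in ("a.txt", "b.txt", "c.txt"):
            data = name.encode()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    raw = bytearray(buf.getvalue())
    if damage_second:
        raw[2 * BLOCK + 148:2 * BLOCK + 156] = b"0000000\0"
    return bytes(raw)


try:
    strict_names(io.BytesIO(demo_archive(damage_second=True)))
except tarfile.HeaderError as exc:
    print("invalid header:", exc)
```

The sketch does not interpret pax or GNU extension headers (they would show up as ordinary entries) and it does not replace TarFile; it only shows that the header checks can be made to fail loudly when run outside next().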
After all these years, it is not that easy to say why the decision to swallow this exception was made. One part surely was a lack of experience with the tar format itself and all of its implementations. The other part, I guess, was that it was supposed to avoid problems in case users did not use TarFile as an iterator. tarfile was developed on Python 2.2, which was the first release to feature iterators. The problem with random access on a tarfile, or with calling TarFile.getmembers(), is that all the headers must be collected first. If this fails somewhere in the middle, there is no way to resume the current operation and you get nothing out of the archive.