Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tarfile missing entries due to omitted uid/gid fields #60062

Closed
tlynn mannequin opened this issue Sep 3, 2012 · 11 comments
Closed

tarfile missing entries due to omitted uid/gid fields #60062

tlynn mannequin opened this issue Sep 3, 2012 · 11 comments
Assignees
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@tlynn
Copy link
Mannequin

tlynn mannequin commented Sep 3, 2012

BPO 15858
Nosy @gustaebel, @vadmium
Superseder
  • bpo-24514: tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
  • Files
  • bad.tar: Example tarfile with one dir and one unlisted file
  • patch.py
  • tarfile-patch.txt
  • test_tarfile.patch: New test to test_tarfile.py to test this issue and its solution
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/gustaebel'
    closed_at = <Date 2015-12-08.21:40:46.987>
    created_at = <Date 2012-09-03.19:47:40.382>
    labels = ['type-bug', 'library']
    title = 'tarfile missing entries due to omitted uid/gid fields'
    updated_at = <Date 2015-12-08.21:40:46.985>
    user = 'https://bugs.python.org/tlynn'

    bugs.python.org fields:

    activity = <Date 2015-12-08.21:40:46.985>
    actor = 'martin.panter'
    assignee = 'lars.gustaebel'
    closed = True
    closed_date = <Date 2015-12-08.21:40:46.987>
    closer = 'martin.panter'
    components = ['Library (Lib)']
    creation = <Date 2012-09-03.19:47:40.382>
    creator = 'tlynn'
    dependencies = []
    files = ['27129', '27130', '32576', '34208']
    hgrepos = []
    issue_num = 15858
    keywords = ['patch']
    message_count = 11.0
    messages = ['169799', '169822', '169868', '169887', '169888', '202641', '208225', '212057', '224013', '256115', '256128']
    nosy_count = 5.0
    nosy_names = ['tlynn', 'lars.gustaebel', 'efloehr', 'martin.panter', 'whitemice']
    pr_nums = []
    priority = 'normal'
    resolution = 'out of date'
    stage = 'resolved'
    status = 'closed'
    superseder = '24514'
    type = 'behavior'
    url = 'https://bugs.python.org/issue15858'
    versions = ['Python 2.7', 'Python 3.3', 'Python 3.4']

    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Sep 3, 2012

    The tarfile module silently truncates the list of entries when reading a tar file if it sees an entry with a uid/gid field containing only spaces/NULs. I got such a tarball from Java Maven/plexus-archiver. I don't know whether they write such fields deliberately, but it seems reasonable to me, especially since they were providing the user/group names textually.

    I'd like to see two fixes - a None/-1/0 value for the uid/gid and not silently swallowing HeaderErrors in TarFile.next() (or at least documenting why it's being done). 0 would be consistent with the default value when writing, but None seems more honest. -1 seems hard to defend.

    Only tested on silly Python versions (2.6, PyPy-1.8), sorry. It's what I've got to hand, but I think this issue also applies to recent Python too going by looking at the hg trunk.

    @tlynn tlynn mannequin added stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error labels Sep 3, 2012
    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Sep 4, 2012

    I think the default has to be 0 for consistency with how other empty numeric fields are handled.

    In theory spaces and NULs are supposed to be equivalent terminators in numeric fields, but I've just noticed that plexus-archiver is also using leading spaces rather than leading zeros (against the spec), e.g. ' 10422\x00 '. tarfile currently supports this, which I think is good, so I think the right approach is to lstrip(' ') fields and then treat space as a terminator.

    @gustaebel
    Copy link
    Mannequin

    gustaebel mannequin commented Sep 5, 2012

    Could you provide some sample data and code? I see the problem, but I cannot quite reproduce the behaviour you describe. In all of my testcases tarfile either throws an exception or successfully reads the archive, but never silently stops.

    @gustaebel gustaebel mannequin self-assigned this Sep 5, 2012
    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Sep 5, 2012

    See attached bad.tar.

    $ less bad.tar | cat
    drwxr-xr-x 0/0               0 2012-09-05 20:04 foo/
    -rw-rw-r-- uname/gname       0 2012-09-05 20:04 foo/a
    $ python -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
    ['foo']
    $ python -c 'import tarfile, patch; patch.patch_tarfile(); print (tarfile.open("bad.tar").getnames())'
    ['foo', 'foo/a']

    I'm only allowed to attach one file via the tracker web UI, so patch.py will follow.

    Creation code for bad.tar, largely for my benefit:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.codehaus.plexus.archiver.tar.TarOutputStream;
    import org.codehaus.plexus.archiver.tar.TarEntry;

    class TarTest {
    public static void main(String[] args) throws IOException {
    FileOutputStream fos = new FileOutputStream("bad.tar");
    TarOutputStream tos = new TarOutputStream(fos);

        TarEntry entry = new TarEntry("foo/");
        entry.setMode(16877); // 0o40755
        entry.setUserId(0);
        entry.setGroupId(0);
        entry.setUserName("");
        entry.setGroupName("");
        tos.putNextEntry(entry);
    
        TarEntry entry2 = new TarEntry("foo/a");
        entry2.setMode(33204); // 0o100664
        entry2.setUserId(-1);  // XXX: dodgy
        entry2.setGroupId(-1); // XXX: dodgy
        entry2.setUserName("uname");
        entry2.setGroupName("gname");
        tos.putNextEntry(entry2);
    
        tos.close();
        fos.close();
    }
    

    }

    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Sep 5, 2012

    patch.py attached - what I'm using as a workaround at the moment.

    @akuchling
    Copy link
    Member

    Here's the changes from patch.py, put into patch format. I took out the inlining for ntb() and support for different tarfile APIs, and also replaced the use of .split(,1)[0] by .partition().

    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Jan 16, 2014

    The secondary issue, which the patch doesn't address, is that TarFile.next() seems unpythonic; it treats any {Invalid,Empty,Truncated}HeaderError after offset 0 as EOF rather than propagating the exception. It looks deliberate, but I'm not sure why it would be done like that or if it should be changed.

    @efloehr
    Copy link
    Mannequin

    efloehr mannequin commented Feb 24, 2014

    I have attached a patch file for test_tarfile.py that uses the attached 'bad.tar' to test this issue.

    After applying this test patch, put 'bad.tar' in Lib/test with the name 'bpo-15858.tar'.

    This tests Tarfile.next(), but the issue can be seen via the command line interface, which uses .next() under the covers:

    Pre-patch:
    $ ./python -m tarfile -l Lib/test/issue15858.tar
    foo/

    Post-patch:
    $ ./python -m tarfile -l Lib/test/issue15858.tar
    foo/
    foo/a

    @whitemice
    Copy link
    Mannequin

    whitemice mannequin commented Jul 25, 2014

    test fails for me with provided bad.tar [as described in comment] but test passed after applying patch to tarfile.

    @tlynn
    Copy link
    Mannequin Author

    tlynn mannequin commented Dec 8, 2015

    I think bpo-24514 (fixed in Py2.7.11) is a duplicate of this issue.

    @vadmium
    Copy link
    Member

    vadmium commented Dec 8, 2015

    Yes I think you are right. With Python 3.5.0:

    $ python3 -c 'import tarfile; print(tarfile.open("bad.tar").getnames())'
    ['foo', 'foo/a']
    $ python3 -m tarfile --list bad.tar
    foo/ 
    foo/a 

    The proposed fix here is slightly different: truncate from the first space, rather than trailing whitespace.

    @vadmium vadmium closed this as completed Dec 8, 2015
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants