Work with an extra field of gzip and zip files #61881

Open
serhiy-storchaka opened this issue Apr 9, 2013 · 8 comments
Labels
3.8 (only security fixes) · stdlib (Python modules in the Lib dir) · type-feature (a feature request or enhancement)

Comments

@serhiy-storchaka
Member

BPO 17681
Nosy @bsergean, @serhiy-storchaka
Files
  • gzip_extra.diff
  • zipfile_extra.diff
  • README.dz
  • README.zip

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2013-04-09.15:03:01.346>
    labels = ['3.8', 'type-feature', 'library']
    title = 'Work with an extra field of gzip and zip files'
    updated_at = <Date 2021-05-06.07:45:21.873>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2021-05-06.07:45:21.873>
    actor = 'nikratio'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2013-04-09.15:03:01.346>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['32653', '32654', '32655', '32656']
    hgrepos = []
    issue_num = 17681
    keywords = ['patch']
    message_count = 8.0
    messages = ['186423', '190295', '190301', '203077', '365626', '391612', '393052', '393053']
    nosy_count = 5.0
    nosy_names = ['Benjamin.Sergeant', 'serhiy.storchaka', 'dmi.baranov', 'Jason Williams', 'amijalis']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue17681'
    versions = ['Python 3.8']

    @serhiy-storchaka
    Member Author

    Gzip files can contain an extra field, and some applications use it to extend the gzip format. The current GzipFile implementation ignores this field on input and does not allow creating a new file with an extra field.

    I propose saving the extra field data on reading as a GzipFile attribute and adding a new parameter to the GzipFile constructor for creating a new file with an extra field.
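
To make the proposal concrete, here is a minimal sketch (not the attached patch) of what reading and writing the extra field involves, assuming the fixed 10-byte header layout from RFC 1952. The helper names `read_gzip_extra` and `add_gzip_extra` are hypothetical and are not part of the gzip module.

```python
import gzip
import struct

FEXTRA = 0x04  # FLG bit that signals an extra field (RFC 1952)
FHCRC = 0x02   # FLG bit that signals a header CRC16


def _check_header(data: bytes) -> int:
    """Validate the fixed 10-byte gzip header and return the FLG byte."""
    if len(data) < 10:
        raise ValueError("too short for a gzip header")
    magic, method, flags = struct.unpack_from("<HBB", data, 0)
    if magic != 0x8B1F or method != 8:
        raise ValueError("not a deflate-compressed gzip stream")
    return flags


def read_gzip_extra(data: bytes) -> bytes:
    """Return the raw extra field of the first gzip member, or b'' if absent."""
    flags = _check_header(data)
    if not flags & FEXTRA:
        return b""
    (xlen,) = struct.unpack_from("<H", data, 10)  # XLEN follows the fixed header
    return data[12:12 + xlen]


def add_gzip_extra(data: bytes, extra: bytes) -> bytes:
    """Return a copy of a gzip stream whose header carries `extra`.

    Assumption: the input has neither FEXTRA nor FHCRC set, so the header
    can be patched without recomputing a header checksum.
    """
    flags = _check_header(data)
    if flags & (FEXTRA | FHCRC):
        raise ValueError("header already has an extra field or a header CRC")
    header = bytearray(data[:10])
    header[3] |= FEXTRA
    return bytes(header) + struct.pack("<H", len(extra)) + extra + data[10:]


if __name__ == "__main__":
    tagged = add_gzip_extra(gzip.compress(b"hello"), b"AP\x03\x00abc")
    print(read_gzip_extra(tagged))
    print(gzip.decompress(tagged))  # the stdlib reader skips the extra field
```

The subfield `b"AP\x03\x00abc"` is an illustrative value only. A real implementation inside GzipFile would read the field while parsing the header rather than re-scanning the bytes.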

    @serhiy-storchaka serhiy-storchaka added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 9, 2013
    @dmibaranov
    Mannequin

    dmibaranov mannequin commented May 29, 2013

    I'll be glad to do it, but I have some questions to discuss.

    First, about the FEXTRA format: it consists of a series of subfields [1], and the current Lib/test/test_gzip.py :: test_read_with_extra has a slightly incorrect extra field, at least for anyone following the format from RFC 1952. Do you have real samples with an extra field?
    Should we parse subfields here (I have already asked Jean-Loup Gailly, the maintainer of the registry of subfield IDs, for the current registry values and am waiting for a reply), or just provide the extra header as a byte string?

    Next, about GzipFile's public interface: GzipFile(...).extra looks ugly. Should I extend this ticket to support all metadata headers? FNAME, FCOMMENT, FHCRC, etc. are read correctly now, but there is no way to access them from outside (and no way to create a file with FCOMMENT or FHCRC).

    E.g., something like this:
    GzipFile(...).metadata.FNAME == 'sample.gz'
    GzipFile(..., extra=b'AP6Test', comment='comment')

    [1] http://tools.ietf.org/html/rfc1952#section-2.3.1.1
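
For reference, the subfield layout in RFC 1952 section 2.3.1.1 is a two-byte ID (SI1, SI2), a two-byte little-endian length, and then the data. The following is a rough sketch of what "parsing subfields" could mean, under that reading of the RFC; the function names are hypothetical and this is not the interface of any patch in this issue.

```python
import struct


def parse_gzip_subfields(extra: bytes) -> list[tuple[bytes, bytes]]:
    """Split a gzip extra field into (subfield_id, data) pairs.

    Raises ValueError if the bytes do not follow the SI1/SI2/LEN layout.
    """
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        subfield_id = extra[pos:pos + 2]                 # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)  # LEN
        pos += 4
        if pos + length > len(extra):
            raise ValueError("subfield data runs past the extra field")
        subfields.append((subfield_id, extra[pos:pos + length]))
        pos += length
    return subfields


def build_gzip_subfields(subfields: list[tuple[bytes, bytes]]) -> bytes:
    """Inverse of parse_gzip_subfields: serialize (id, data) pairs."""
    out = bytearray()
    for subfield_id, data in subfields:
        if len(subfield_id) != 2:
            raise ValueError("subfield ID must be exactly two bytes")
        out += subfield_id + struct.pack("<H", len(data)) + data
    return bytes(out)


if __name__ == "__main__":
    raw = build_gzip_subfields([(b"AP", b"abc")])  # illustrative ID and data
    print(raw)
    print(parse_gzip_subfields(raw))
```

A strict parser like this rejects unstructured values such as `b'AP6Test'`, because the bytes after the ID are read as a length that exceeds the remaining data. That is one argument for also exposing the raw bytes.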

    @serhiy-storchaka
    Member Author

    I have an almost ready patch, but I have doubts about the interface. It can be discussed. ZIP file entries have a similar extra field, and I'm planning to add a similar feature to the zipfile module too.

    Here are preliminary patches.

    @serhiy-storchaka serhiy-storchaka changed the title Work with an extra field of gzip files Work with an extra field of gzip and zip files May 29, 2013
    @serhiy-storchaka
    Member Author

    Some examples:

    >>> import zipfile
    >>> z = zipfile.ZipFile('README.zip')
    >>> z.filelist[0].extra
    b'UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00'
    >>> z.filelist[0].extra_map
    <zipfile.ExtraMap object at 0xb6fe8bec>
    >>> list(z.filelist[0].extra_map.items())
    [(21589, b'\x03\xe0\xc3\x87R'), (30837, b'\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00')]
    >>> import gzip
    >>> gz = gzip.open('README.dz')
    >>> gz.extra_bytes
    b''
    >>> gz.extra_map
    <gzip.ExtraMap object at 0xb6fd04ac>
    >>> list(gz.extra_map.items())
    []
    >>> gz.read(1)
    b'T'
    >>> gz.extra_bytes
    b'RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b'
    >>> list(gz.extra_map.items())
    [(b'RA', b'\x01\x00\xcb\xe3\x01\x00T\x0b')]
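
The `extra_map` attribute above comes from the proposed patch and does not exist in the released zipfile module. As a sketch of what it computes for ZIP entries, the raw `ZipInfo.extra` bytes can be split by hand, assuming the usual ZIP layout of a little-endian 16-bit header ID followed by a little-endian 16-bit data size. The function name `parse_zip_extra` is hypothetical.

```python
import struct


def parse_zip_extra(extra: bytes) -> list[tuple[int, bytes]]:
    """Split a ZIP extra field into (header_id, data) pairs."""
    items = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated extra field header")
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("extra field data runs past the end")
        items.append((header_id, extra[pos:pos + size]))
        pos += size
    return items


if __name__ == "__main__":
    # The value of z.filelist[0].extra from the session above.
    sample = (b'UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03'
              b'\x00\x00\x04\xe8\x03\x00\x00')
    for header_id, data in parse_zip_extra(sample):
        print(header_id, data)
```

On the sample bytes this yields the same two pairs as `list(z.filelist[0].extra_map.items())` in the session: IDs 21589 and 30837.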

    @serhiy-storchaka serhiy-storchaka added the 3.8 (only security fixes) label Jul 13, 2018
    @JasonWilliams
    Mannequin

    JasonWilliams mannequin commented Apr 2, 2020

    What's needed to get this integrated? It would be great not to have to fork the gzip module.

    @amijalis
    Mannequin

    amijalis mannequin commented Apr 22, 2021

    Agreed, it would be really nice to integrate these changes. These special fields are found in gzipped .bam files, a common DNA sequence alignment format used in the bioinformatics community. It would be nice to be able to read and write them with the standard library.
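
As an illustration of the BAM use case: as far as I understand the SAM/BAM specification, each BGZF block is a gzip member whose extra field holds a `BC` subfield (SI1=66, SI2=67) with a 2-byte payload storing the total block size minus one. Treat that layout as an assumption to check against the specification. The helper below is a hypothetical sketch of what currently has to be done by hand because GzipFile does not expose the extra field.

```python
import struct

FEXTRA = 0x04  # FLG bit that signals an extra field (RFC 1952)


def bgzf_block_size(block: bytes) -> int:
    """Return the total size of a BGZF block, read from its 'BC' subfield."""
    if len(block) < 12:
        raise ValueError("too short for a gzip header with an extra field")
    magic, method, flags = struct.unpack_from("<HBB", block, 0)
    if magic != 0x8B1F or method != 8 or not flags & FEXTRA:
        raise ValueError("not a gzip member with an extra field")
    (xlen,) = struct.unpack_from("<H", block, 10)
    pos, end = 12, 12 + xlen
    if end > len(block):
        raise ValueError("extra field runs past the supplied bytes")
    # Walk the subfields until the 'BC' entry is found.
    while pos + 4 <= end:
        si1, si2, slen = struct.unpack_from("<BBH", block, pos)
        pos += 4
        if pos + slen > end:
            raise ValueError("subfield runs past the extra field")
        if (si1, si2) == (66, 67) and slen == 2:
            (bsize,) = struct.unpack_from("<H", block, pos)
            return bsize + 1  # stored value is the block size minus one
        pos += slen
    raise ValueError("no 'BC' subfield found")
```

With the size known, a reader can jump from block to block without decompressing, which is the kind of random access that the extra field is used for here.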

    @bsergean
    Mannequin

    bsergean mannequin commented May 5, 2021

    There is a comment field too, which would be nice to support.

    The Go gzip package has a Header struct that describes all the metadata. I see that in 3.8 mtime was made configurable, so hopefully we can add comment and extra.

    https://golang.org/pkg/compress/gzip/#Header

    For our purposes we'd like to put arbitrary data in a gzip file, but it is complicated to do so. I might take the patch here and apply it to the Python gzip module, but that feels a bit hackish.

    @bsergean
    Mannequin

    bsergean mannequin commented May 5, 2021

    type Header struct {
        Comment string    // comment
        Extra   []byte    // "extra data"
        ModTime time.Time // modification time
        Name    string    // file name
        OS      byte      // operating system type
    }

    This is what the header and extra fields look like, for reference.
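
For comparison, a Python analogue of that struct might look like the sketch below. `GzipHeader` and `write_member` are hypothetical names, not stdlib APIs; the code assembles a single gzip member by hand following RFC 1952, which is roughly the workaround needed today when a comment or extra field is required.

```python
import gzip
import struct
import zlib
from dataclasses import dataclass

FEXTRA, FNAME, FCOMMENT = 0x04, 0x08, 0x10  # FLG bits from RFC 1952


@dataclass
class GzipHeader:
    """Hypothetical analogue of Go's gzip.Header; not a stdlib class."""
    comment: str = ""
    extra: bytes = b""
    mtime: int = 0
    name: str = ""
    os: int = 255  # 255 means "unknown" in RFC 1952


def write_member(payload: bytes, header: GzipHeader) -> bytes:
    """Build one gzip member with the given metadata (no FHCRC)."""
    flags = 0
    optional = b""
    # Optional fields must appear in this order: extra, name, comment.
    if header.extra:
        flags |= FEXTRA
        optional += struct.pack("<H", len(header.extra)) + header.extra
    if header.name:
        flags |= FNAME
        optional += header.name.encode("latin-1") + b"\x00"
    if header.comment:
        flags |= FCOMMENT
        optional += header.comment.encode("latin-1") + b"\x00"
    fixed = struct.pack("<BBBBLBB", 0x1F, 0x8B, 8, flags,
                        header.mtime, 0, header.os)
    # Negative wbits selects raw deflate, without a zlib wrapper.
    deflater = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION,
                                zlib.DEFLATED, -zlib.MAX_WBITS)
    body = deflater.compress(payload) + deflater.flush()
    trailer = struct.pack("<LL", zlib.crc32(payload),
                          len(payload) & 0xFFFFFFFF)
    return fixed + optional + body + trailer


if __name__ == "__main__":
    blob = write_member(
        b"payload",
        GzipHeader(comment="comment", extra=b"AP\x03\x00abc", name="sample"),
    )
    print(gzip.decompress(blob))  # the stdlib reader skips all three fields
```

The sketch assumes the name and comment contain no NUL bytes and fit in Latin-1, and the extra value is an illustrative subfield only.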
