Work with an extra field of gzip and zip files #61881

serhiy-storchaka · 2013-04-09T15:03:01Z

BPO	17681
Nosy	@bsergean, @serhiy-storchaka
Files	gzip_extra.diff zipfile_extra.diff README.dz README.zip

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2013-04-09.15:03:01.346>
labels = ['3.8', 'type-feature', 'library']
title = 'Work with an extra field of gzip and zip files'
updated_at = <Date 2021-05-06.07:45:21.873>
user = 'https://github.com/serhiy-storchaka'

bugs.python.org fields:

activity = <Date 2021-05-06.07:45:21.873>
actor = 'nikratio'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2013-04-09.15:03:01.346>
creator = 'serhiy.storchaka'
dependencies = []
files = ['32653', '32654', '32655', '32656']
hgrepos = []
issue_num = 17681
keywords = ['patch']
message_count = 8.0
messages = ['186423', '190295', '190301', '203077', '365626', '391612', '393052', '393053']
nosy_count = 5.0
nosy_names = ['Benjamin.Sergeant', 'serhiy.storchaka', 'dmi.baranov', 'Jason Williams', 'amijalis']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue17681'
versions = ['Python 3.8']

serhiy-storchaka · 2013-04-09T15:03:01Z

Gzip files can contain an extra field, and some applications use it to extend the gzip format. The current GzipFile implementation ignores this field on input and provides no way to create a new file with an extra field.

I propose saving the extra field data as a GzipFile attribute on reading, and adding a new parameter to the GzipFile constructor for creating a new file with an extra field.
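
To make the read side concrete, here is a minimal sketch of pulling the raw extra field out of a gzip header by hand, following the RFC 1952 layout (10 fixed bytes, then a 2-byte little-endian XLEN and the payload when the FEXTRA flag is set). The function name `read_gzip_extra` is hypothetical and is not part of the gzip module or of the attached patch.

```python
import struct

FEXTRA = 0x04  # bit 2 of the FLG byte (RFC 1952)


def read_gzip_extra(data: bytes) -> bytes:
    """Return the raw extra field of the first gzip member, or b'' if absent.

    Hypothetical helper for illustration only.
    """
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    if not flags & FEXTRA:
        return b""
    if len(data) < 12:
        raise ValueError("truncated gzip header")
    # XLEN follows the 10-byte fixed header as a little-endian uint16.
    (xlen,) = struct.unpack_from("<H", data, 10)
    extra = data[12:12 + xlen]
    if len(extra) != xlen:
        raise ValueError("truncated extra field")
    return extra
```

Only the header bytes are needed, so reading the first few hundred bytes of a file is usually enough; the hard upper bound is 12 + 65535 bytes.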

dmibaranov · 2013-05-29T12:07:24Z

I'd be glad to work on this, but I have some questions to discuss first.

First, about the FEXTRA format: it consists of a series of subfields [1], and the extra field in the current Lib/test/test_gzip.py :: test_read_with_extra is slightly incorrect, at least for anyone following the format from RFC 1952. Do you have real samples with an extra field?
Should we parse the subfields here (I have already asked Jean-Loup Gailly, the maintainer of the registry of subfield IDs, for the current registry values and am waiting for a reply), or just provide the extra header as a byte string?
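
For reference, a rough sketch of what parsing the subfields would involve. RFC 1952 lays out each subfield as two ID bytes (SI1, SI2), a 2-byte little-endian length, and then the payload. The name `parse_gzip_subfields` is made up for this example, and the sample bytes are the README.dz value quoted later in this thread.

```python
import struct


def parse_gzip_subfields(extra: bytes):
    """Split a raw gzip extra field into (subfield_id, payload) pairs.

    Hypothetical helper; raises ValueError on malformed input.
    """
    subfields = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated subfield header")
        subfield_id = extra[pos:pos + 2]          # SI1, SI2
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        pos += 4
        payload = extra[pos:pos + length]
        if len(payload) != length:
            raise ValueError("truncated subfield payload")
        subfields.append((subfield_id, payload))
        pos += length
    return subfields


sample = b"RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b"
print(parse_gzip_subfields(sample))
# [(b'RA', b'\x01\x00\xcb\xe3\x01\x00T\x0b')]
```

An extra field that does not follow the subfield layout makes this raise, which is an argument for always exposing the raw bytes as well.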

Next, about GzipFile's public interface: GzipFile(...).extra looks ugly. Should I extend this ticket to support all metadata headers? FNAME, FCOMMENT, FHCRC, etc. are read correctly now, but there is no way to access them from outside (and no way to create a file with FCOMMENT or FHCRC).

E.g., something like this:
GzipFile(...).metadata.FNAME == 'sample.gz'
GzipFile(..., extra=b'AP6Test', comment='comment')

[1] http://tools.ietf.org/html/rfc1952#section-2.3.1.1

serhiy-storchaka · 2013-05-29T12:44:35Z

I have an almost-ready patch, but I have doubts about the interface; it can be discussed. ZIP file entries have a similar extra field, and I'm planning to add a similar feature to the zipfile module too.

Here are preliminary patches.

serhiy-storchaka · 2013-11-16T19:24:33Z

Some examples:

>>> import zipfile
>>> z = zipfile.ZipFile('README.zip')
>>> z.filelist[0].extra
b'UT\x05\x00\x03\xe0\xc3\x87Rux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00'
>>> z.filelist[0].extra_map
<zipfile.ExtraMap object at 0xb6fe8bec>
>>> list(z.filelist[0].extra_map.items())
[(21589, b'\x03\xe0\xc3\x87R'), (30837, b'\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00')]
>>> import gzip
>>> gz = gzip.open('README.dz')
>>> gz.extra_bytes
b''
>>> gz.extra_map
<gzip.ExtraMap object at 0xb6fd04ac>
>>> list(gz.extra_map.items())
[]
>>> gz.read(1)
b'T'
>>> gz.extra_bytes
b'RA\x08\x00\x01\x00\xcb\xe3\x01\x00T\x0b'
>>> list(gz.extra_map.items())
[(b'RA', b'\x01\x00\xcb\xe3\x01\x00T\x0b')]
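
The `extra_map` attribute above comes from the patch, but `ZipInfo.extra` already exists in the standard library, so the ZIP pairs can be reproduced without it. Below is a minimal sketch, assuming the usual ZIP extra layout of a 2-byte little-endian header ID, a 2-byte little-endian size, and then the data. The name `parse_zip_extra` is hypothetical.

```python
import struct


def parse_zip_extra(extra: bytes):
    """Split a ZipInfo.extra value into (header_id, payload) pairs.

    Hypothetical helper; raises ValueError on malformed input.
    """
    records = []
    pos = 0
    while pos < len(extra):
        if pos + 4 > len(extra):
            raise ValueError("truncated extra record header")
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        payload = extra[pos:pos + size]
        if len(payload) != size:
            raise ValueError("truncated extra record payload")
        records.append((header_id, payload))
        pos += size
    return records


# The README.zip value shown in the session above.
sample = (b"UT\x05\x00\x03\xe0\xc3\x87R"
          b"ux\x0b\x00\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00")
print(parse_zip_extra(sample))
# [(21589, b'\x03\xe0\xc3\x87R'), (30837, b'\x01\x04\xe8\x03\x00\x00\x04\xe8\x03\x00\x00')]
```

The IDs 21589 and 30837 are simply the bytes `UT` and `ux` read as little-endian 16-bit integers.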

JasonWilliams · 2020-04-02T20:51:49Z

What's needed to get this integrated? It would be great not to have to fork the gzip module.

amijalis · 2021-04-22T16:45:40Z

Agreed, it would be really nice to integrate these changes. These special fields are found in gzipped .bam files, a common DNA sequence alignment format used in the bioinformatics community. It would be nice to be able to read and write them with the standard library.

bsergean · 2021-05-05T23:23:53Z

There is a comment field too which would be nice to support.

The Go gzip module has a Header struct that describes all the metadata. I see that mtime was made configurable in 3.8, so hopefully we can add comment and extra.

https://golang.org/pkg/compress/gzip/#Header

For our purposes, we'd like to put arbitrary data in a gzip file, but that is complicated to do. I might take the patch here and apply it to the Python gzip module, but that feels a bit hackish.
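
As a stopgap that avoids patching the module, a gzip member with an extra field and a comment can be assembled by hand from `zlib` and `struct`. This is only a sketch under the RFC 1952 layout, not the approach taken by the attached patch: the function name `gzip_with_extra` is hypothetical, and so is the `AP` subfield ID in the usage lines.

```python
import gzip
import struct
import zlib

FEXTRA = 0x04
FCOMMENT = 0x10


def gzip_with_extra(payload: bytes, extra: bytes = b"", comment: str = "") -> bytes:
    """Build a single gzip member carrying optional FEXTRA and FCOMMENT fields.

    Hypothetical helper for illustration only.
    """
    flags = 0
    optional = b""
    if extra:
        if len(extra) > 0xFFFF:
            raise ValueError("extra field longer than 65535 bytes")
        flags |= FEXTRA
        optional += struct.pack("<H", len(extra)) + extra
    if comment:
        flags |= FCOMMENT
        # RFC 1952 specifies ISO 8859-1 (Latin-1), zero-terminated.
        optional += comment.encode("latin-1") + b"\x00"
    # ID1, ID2, CM=8 (deflate), FLG, MTIME=0, XFL=0, OS=255 (unknown)
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, flags, 0, 0, 255)
    # Negative wbits produces a raw deflate stream without a zlib wrapper.
    compressor = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED,
                                  -zlib.MAX_WBITS)
    body = compressor.compress(payload) + compressor.flush()
    trailer = struct.pack("<II", zlib.crc32(payload), len(payload) & 0xFFFFFFFF)
    return header + optional + body + trailer


# One subfield: hypothetical ID b"AP", length 4, data b"Test".
blob = gzip_with_extra(b"hello", extra=b"AP\x04\x00Test", comment="demo")
print(gzip.decompress(blob))
```

The existing gzip module skips both fields on input, so the result still decompresses normally; what is missing is a way to read the fields back.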

bsergean · 2021-05-05T23:33:10Z

type Header struct {
    Comment string    // comment
    Extra   []byte    // "extra data"
    ModTime time.Time // modification time
    Name    string    // file name
    OS      byte      // operating system type
}

This is what the header/extra things look like for reference.

serhiy-storchaka added stdlib Python modules in the Lib dir type-feature A feature request or enhancement labels Apr 9, 2013

serhiy-storchaka changed the title ~~Work with an extra field of gzip files~~ Work with an extra field of gzip and zip files May 29, 2013

serhiy-storchaka added the 3.8 only security fixes label Jul 13, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

encukou mentioned this issue Aug 21, 2023

tarfiles can't open tgz files with gzip features like FEXTRA & FCOMMENT when mode='r|*' #107398

Open
