Tarfile.add with bytes path is failing #70185

Closed
PatrikDufresne mannequin opened this issue Jan 2, 2016 · 7 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@PatrikDufresne
Mannequin

PatrikDufresne mannequin commented Jan 2, 2016

BPO 25997
Nosy @gustaebel, @vstinner, @ezio-melotti, @bitdancer, @vadmium

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2016-01-03.15:33:45.814>
created_at = <Date 2016-01-02.19:53:25.238>
labels = ['type-bug', 'library']
title = 'Tarfile.add with bytes path is failing'
updated_at = <Date 2016-01-03.15:33:45.813>
user = 'https://bugs.python.org/PatrikDufresne'

bugs.python.org fields:

activity = <Date 2016-01-03.15:33:45.813>
actor = 'Patrik Dufresne'
assignee = 'none'
closed = True
closed_date = <Date 2016-01-03.15:33:45.814>
closer = 'Patrik Dufresne'
components = ['Library (Lib)']
creation = <Date 2016-01-02.19:53:25.238>
creator = 'Patrik Dufresne'
dependencies = []
files = []
hgrepos = []
issue_num = 25997
keywords = []
message_count = 7.0
messages = ['257355', '257356', '257357', '257381', '257386', '257388', '257422']
nosy_count = 6.0
nosy_names = ['lars.gustaebel', 'vstinner', 'ezio.melotti', 'r.david.murray', 'martin.panter', 'Patrik Dufresne']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue25997'
versions = ['Python 3.5', 'Python 3.6']

@PatrikDufresne
Mannequin Author

PatrikDufresne mannequin commented Jan 2, 2016

With Python 3.4, TarFile doesn't properly support adding files with a bytes path. Only unicode (str) paths are supported. It fails with an exception similar to:

tar.add(os.path.join(dirpath, filename), filename)

File "/usr/lib/python3.4/tarfile.py", line 1907, in add
tarinfo = self.gettarinfo(name, arcname)
File "/usr/lib/python3.4/tarfile.py", line 1767, in gettarinfo
arcname = arcname.replace(os.sep, "/")
TypeError: expected bytes, bytearray or buffer compatible object

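The code uses os.sep and u"/", which are both str, so replace() fails when arcname is bytes. Instead, it should choose the separator based on the argument type, using something like posixpath.py:_get_sep(path).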
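For illustration, here is a minimal sketch of the failure and of the type-aware separator idea suggested above. The names `get_sep` and `normalize_arcname` are hypothetical helpers written for this example; they are not part of tarfile, and this is not a proposed patch.

```python
import os


def get_sep(path):
    """Hypothetical helper: return "/" in the same type (bytes or str) as path."""
    return b"/" if isinstance(path, bytes) else "/"


def normalize_arcname(arcname):
    """Replace the OS separator with "/" without mixing bytes and str."""
    # os.sep is always str ("/" or "\\"), so encode it when arcname is bytes.
    os_sep = os.sep.encode("ascii") if isinstance(arcname, bytes) else os.sep
    return arcname.replace(os_sep, get_sep(arcname))


# The same kind of call as the failing line in gettarinfo(): bytes.replace()
# with str arguments raises TypeError.
try:
    b"dir/file".replace(os.sep, "/")
    mixed_ok = True
except TypeError:
    mixed_ok = False

bytes_result = normalize_arcname(b"dir/file")
str_result = normalize_arcname("dir/file")
```

This only shows why the replace() call rejects bytes. Whether tarfile should accept bytes names at all is a separate question, discussed in the comments below.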

@PatrikDufresne PatrikDufresne mannequin added topic-unicode type-bug An unexpected behavior, bug, or error labels Jan 2, 2016
@SilentGhost SilentGhost mannequin added stdlib Python modules in the Lib dir and removed topic-unicode labels Jan 2, 2016
@bitdancer
Member

See also bpo-21996.

@bitdancer
Member

Does using a surrogateescape encoded filename work? (You won't get the error you report...my question is, does that do the right thing when building the archive?)

@vadmium
Member

vadmium commented Jan 2, 2016

Is the tarfile module designed to support bytes for file names in general? The documentation doesn’t seem to mention bytes anywhere relevant. This seems more like a new feature rather than a bug to me.

@vadmium vadmium changed the title Tarfile.add with bytes path is failling Tarfile.add with bytes path is failing Jan 2, 2016
@PatrikDufresne
Mannequin Author

PatrikDufresne mannequin commented Jan 2, 2016

> Is the tarfile module designed to support bytes for file names in general? The documentation doesn’t seem to mention bytes anywhere relevant. This seems more like a new feature rather than a bug to me.

I'm using bytes on Unix to represent a path. From the os.path docs: "The path parameters can be passed as either strings, or bytes. Applications are encouraged to represent file names as (Unicode) character strings. Unfortunately, some file names may not be representable as strings on Unix, so applications that need to support arbitrary file names on Unix should use bytes objects to represent path names. Vice versa, using bytes objects cannot represent all file names on Windows (in the standard mbcs encoding), hence Windows applications should use string objects to access all files."

As such, I'm expecting to use bytes to represent a path with tarfile.

Also, the tar file format doesn't define any specific encoding for file names. I'm expecting to be able to put any kind of bytes data in a given file name, since this was working in tarfile with py2.

> Does using a surrogateescape encoded filename work? (You won't get the error you report...my question is, does that do the right thing when building the archive?)

I will need to take a closer look at surrogateescape. I read somewhere that it was an experimental feature, so I didn't try it.

Thanks to you both for your quick feedback during the holidays.

@vadmium
Member

vadmium commented Jan 3, 2016

It looks like surrogate-escaped bytes should be supported thanks to bpo-8390, although this is not so useful if you use the “pax” format (which always uses UTF-8 internally).

To generate a surrogate-escaped string, you can “decode” it with the following error handler:

>>> b"non-as\xA9ii".decode("ascii", "surrogateescape")
'non-as\udca9ii'
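As a rough sketch of how this can be applied to tarfile, the example below stores a member whose name contains a non-UTF-8 byte and reads it back. It makes several assumptions that are not from this thread: the archive is built in memory with `TarInfo` and `addfile()` instead of `add()`, the GNU format is chosen explicitly, and `encoding="utf-8"` with `errors="surrogateescape"` is passed explicitly rather than relying on defaults.

```python
import io
import tarfile

# A file name that is not valid UTF-8 (0xA9 is a stray continuation byte).
raw_name = b"non-as\xa9ii.txt"

# "Decode" the bytes into a str; the undecodable byte becomes a lone surrogate.
str_name = raw_name.decode("utf-8", "surrogateescape")

payload = b"hello"
buf = io.BytesIO()

# Assumption: GNU format, with the error handler spelled out explicitly.
with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT,
                  encoding="utf-8", errors="surrogateescape") as tar:
    info = tarfile.TarInfo(name=str_name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Read the archive back with the same encoding settings.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r",
                  encoding="utf-8", errors="surrogateescape") as tar:
    stored_names = tar.getnames()

# Encoding with the same handler recovers the original bytes.
round_tripped = stored_names[0].encode("utf-8", "surrogateescape")
```

When adding real files with `add()`, `os.fsdecode()` is one way to turn a bytes path into such a str on Unix, though its behavior depends on the platform and the filesystem encoding.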

@PatrikDufresne
Mannequin Author

PatrikDufresne mannequin commented Jan 3, 2016

It's a bit tricky, but with help of surrogateescape I get the expected result.

I'm closing this bug.

Thanks

@PatrikDufresne PatrikDufresne mannequin closed this as completed Jan 3, 2016
@ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022