tarball header mods choke on non-ascii file names #335
awood
added a commit
to awood/tito
that referenced
this issue
Apr 8, 2019
…racters in the name. Following the principles of https://nedbatchelder.com/text/unipain.html, our goal is to decode bytes into unicode as soon as we read them and encode unicode data to bytes at the last second. The specific problem we were seeing was caused by calling "encode" on a byte string rather than a unicode string. Python attempts to be "helpful" and tries to decode the bytes as ASCII in order to provide a unicode string to the encode function. Since the bytes aren't ASCII, the decode fails and we get the UnicodeDecodeError despite the fact that we never explicitly asked for a decode at all.
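To make the failure mode concrete, here is a minimal sketch in Python 3 that writes out the step Python 2 performed implicitly. The helper names (py2_style_encode, to_text, to_bytes) are hypothetical and are not taken from tito, and the assumption that the file name bytes are UTF-8 is mine.

```python
# Python 3 sketch; Python 2's implicit ASCII decode is written out explicitly.
# All helper names are hypothetical, not tito's.

# Bytes as they might be read from a tarball header (assumed to be UTF-8).
raw_name = "utf8_tést_app.rb".encode("utf-8")


def py2_style_encode(data: bytes) -> bytes:
    """Mimic Python 2's str.encode("utf8"): an ASCII decode happens first."""
    return data.decode("ascii").encode("utf-8")


def to_text(data: bytes) -> str:
    """Decode as soon as the bytes are read."""
    return data.decode("utf-8")


def to_bytes(text: str) -> bytes:
    """Encode only when the data is written back out."""
    return text.encode("utf-8")


try:
    py2_style_encode(raw_name)
    failed = False
except UnicodeDecodeError:
    # The decode nobody asked for fails on the non-ASCII bytes.
    failed = True

name = to_text(raw_name)        # work with text internally
round_tripped = to_bytes(name)  # back to bytes at the last second
```

Keeping the decode and encode at the edges means no code in the middle ever calls "encode" on something that is already bytes.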
awood
added a commit
to awood/tito
that referenced
this issue
Apr 9, 2019
…racters in the name. Following the principles of https://nedbatchelder.com/text/unipain.html, our goal is to decode bytes into unicode as soon as we read them and encode unicode data to bytes at the last second. The specific problem we were seeing was caused by calling "encode" on a byte string rather than a unicode string. Python attempts to be "helpful" and tries to decode the bytes as ASCII in order to provide a unicode string to the encode function. Since the bytes aren't ASCII, the decode fails and we get the UnicodeDecodeError despite the fact that we never explicitly asked for a decode at all.
awood
added a commit
to awood/tito
that referenced
this issue
Apr 10, 2019
…racters in the name. Following the principles of https://nedbatchelder.com/text/unipain.html, our goal is to decode bytes into unicode as soon as we read them and encode unicode data to bytes at the last second. The specific problem we were seeing was caused by calling "encode" on a byte string rather than a unicode string. Python attempts to be "helpful" and tries to decode the bytes as ASCII in order to provide a unicode string to the encode function. Since the bytes aren't ASCII, the decode fails and we get the UnicodeDecodeError despite the fact that we never explicitly asked for a decode at all. Also, calculate checksums correctly for tarballs that have files with UTF8 characters in the file name.
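For context on the checksum part, here is a sketch of the standard ustar header checksum, which is computed over raw bytes, so a UTF-8 file name contributes one value per encoded byte rather than per character. This is not tito's code: header_checksum is a hypothetical helper, and the header is built with the standard library's tarfile module purely for illustration.

```python
import tarfile


def header_checksum(header: bytes) -> int:
    """Sum the 512 header bytes, counting the 8-byte checksum field as spaces."""
    block = header[:148] + b" " * 8 + header[156:512]
    return sum(block)


# Build a ustar header for a UTF-8 file name using the standard library.
info = tarfile.TarInfo(name="utf8_tést_app.rb")
header = info.tobuf(format=tarfile.USTAR_FORMAT, encoding="utf-8")

# The checksum field holds octal digits padded with NUL and space.
stored = int(header[148:156].rstrip(b"\0 ").decode("ascii"), 8)
computed = header_checksum(header)
```

If a header is edited after the fact, the checksum has to be recomputed over the final encoded bytes, otherwise tar will reject the entry.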
jmrodri
added a commit
that referenced
this issue
Sep 20, 2019
xsuchy
added a commit
to xsuchy/tito
that referenced
this issue
Oct 3, 2019
…ls with UTF8 characters in the name." This partially reverts commit 03509b3. It removes just the test and keeps the functionality. The test cannot be there right now because tito 0.6.11 and older will choke on it and produce a damaged tarball. The test can be added back later when all devel has tito at version 0.6.12 or higher. Resolves: rpm-software-management#337
dgoodwin
added a commit
that referenced
this issue
Oct 3, 2019
Partial revert "Fix #335. Handle source tarballs with UTF8 characters…
I ran tito on a project that introduced a file with the name utf8_tést_app.rb. When tito went to edit the source tarball, it choked on this filename being in the headers.

In python2 there's an implicit decode("ascii"), which is what it's complaining about.

If I assume that the text is utf8 (which includes ascii) and explicitly decode("utf8"), then this code does not raise an exception, but tar does not like the output on the next step.

So it seems likely that the text needs to be decoded into unicode at some point before this code, but it's not clear to me where. Also I'm not sure what encoding tar assumes.
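On the encoding question, a small experiment with Python's standard tarfile module may help. As far as I know, the pax format stores non-ASCII path names in an extended header encoded as UTF-8, while the older ustar name field is just bytes with no declared encoding. The sketch below assumes pax and an in-memory archive; the payload is made up, and this says nothing about which format tito's input tarballs actually use.

```python
import io
import tarfile

name = "utf8_tést_app.rb"
payload = b"puts 'hello'\n"  # made-up file content for illustration

# Write an in-memory archive; pax is chosen here as an assumption.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

archive = buf.getvalue()

# The raw archive carries the name as UTF-8 bytes.
name_in_archive = name.encode("utf-8") in archive

# Reading the archive back yields the name as text again.
with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
    names = tar.getnames()
```

So when headers are rewritten by hand, treating the name as UTF-8 text internally and encoding it back to UTF-8 bytes on output should keep tar happy, provided the checksum is updated to match.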