tarball header mods choke on non-ascii file names #335

Closed
sosiouxme opened this issue Mar 28, 2019 · 1 comment

Comments

@sosiouxme
Contributor

I ran tito on a project that introduced a file with the name utf8_tést_app.rb. When tito went to edit the source tarball, it choked on this filename being in the headers:

  File "/bin/tito", line 23, in <module>
    CLI().main(sys.argv[1:])
  File "/usr/lib/python2.7/site-packages/tito/cli.py", line 202, in main
    return module.main(argv)
  File "/usr/lib/python2.7/site-packages/tito/cli.py", line 593, in main
    scratch=self.options.scratch)
  File "/usr/lib/python2.7/site-packages/tito/release/distgit.py", line 73, in release
    self._git_release()
  File "/usr/lib/python2.7/site-packages/tito/release/distgit.py", line 90, in _git_release
    self.builder.tgz()
  File "/usr/lib/python2.7/site-packages/tito/builder/main.py", line 484, in tgz
    self._setup_sources()
  File "/usr/lib/python2.7/site-packages/tito/builder/main.py", line 519, in _setup_sources
    os.path.join(self.rpmbuild_sourcedir, self.tgz_filename))
  File "/usr/lib/python2.7/site-packages/tito/common.py", line 972, in create_tgz
    tarfixer.fix()
  File "/usr/lib/python2.7/site-packages/tito/tar.py", line 331, in fix
    self.process_chunk(chunk)
  File "/usr/lib/python2.7/site-packages/tito/tar.py", line 314, in process_chunk
    self.process_header(chunk_props)
  File "/usr/lib/python2.7/site-packages/tito/tar.py", line 203, in process_header
    chunk_props['checksum'] = self.calculate_checksum(chunk_props)
  File "/usr/lib/python2.7/site-packages/tito/tar.py", line 241, in calculate_checksum
    values = self.encode_header(chunk_props)
  File "/usr/lib/python2.7/site-packages/tito/tar.py", line 198, in encode_header
    pack_values.append(chunk_props[member].encode("utf8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 75: ordinal not in range(128)

In Python 2, calling encode on a byte string first triggers an implicit decode("ascii"), and that implicit decode is what raises the error. The failing line is equivalent to:

    pack_values.append(chunk_props[member].decode("ascii").encode("utf8"))
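
The same failure can be reproduced on Python 3 by writing out the decode that Python 2 performs implicitly. This is only a minimal sketch: `py2_style_encode` and `raw_name` are illustrative names, not part of tito, and the file name is assumed to be stored as UTF-8 bytes.

```python
# Bytes as they would sit in a tar header name field (assumed UTF-8).
raw_name = "utf8_t\u00e9st_app.rb".encode("utf-8")


def py2_style_encode(value: bytes) -> bytes:
    """Mimic Python 2's str.encode("utf8") on a byte string.

    Python 2 first decodes the bytes as ASCII to obtain a unicode
    object, then applies the requested encoding.
    """
    return value.decode("ascii").encode("utf-8")


error = None
try:
    py2_style_encode(raw_name)
except UnicodeDecodeError as exc:
    # The decode step fails at the first non-ASCII byte.
    error = exc
    print(exc)

# Decoding explicitly as UTF-8 round-trips without error.
assert raw_name.decode("utf-8").encode("utf-8") == raw_name
```

The byte offset in the message differs from the one in the traceback above because this sketch decodes only the bare file name rather than the full path stored in the header.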

If I assume the text is UTF-8 (a superset of ASCII) and explicitly call decode("utf8"), this code no longer raises an exception, but tar rejects the output in the next step:

Error running command: [...] tar xzf openshift-git-0.9896b19.tar.gz

Status code: 512

Command output: tar: Skipping to next header
tar: Exiting with failure status due to previous errors

So it seems likely that the text needs to be decoded into unicode at some point before this code, but it's not clear to me where. Also I'm not sure what encoding tar assumes.
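
On the encoding question: the classic ustar header has no field that records the encoding of a name, so names are stored as raw bytes. The header checksum is the sum of the bytes of the 512-byte block, with the checksum field itself counted as eight spaces. A checksum computed over decoded characters instead of encoded bytes would therefore disagree with what tar expects, which is one plausible but unconfirmed cause of the "Skipping to next header" failure. The sketch below uses Python 3's standard `tarfile` module to build a header and recompute its checksum byte by byte; `header_checksum` is a hypothetical helper, not tito's code.

```python
import tarfile

# Offsets of the 8-byte checksum field inside a 512-byte ustar header.
CHECKSUM_START, CHECKSUM_END = 148, 156


def header_checksum(header: bytes) -> int:
    """Sum every byte of the header, treating the checksum field as spaces."""
    if len(header) != 512:
        raise ValueError("a tar header block is exactly 512 bytes")
    blanked = header[:CHECKSUM_START] + b" " * 8 + header[CHECKSUM_END:]
    return sum(blanked)


name = "utf8_t\u00e9st_app.rb"
info = tarfile.TarInfo(name=name)
header = info.tobuf(format=tarfile.USTAR_FORMAT, encoding="utf-8",
                    errors="surrogateescape")

# tarfile stores the checksum as six octal digits at the start of the field.
stored = int(header[CHECKSUM_START:CHECKSUM_START + 6].decode("ascii"), 8)
assert header_checksum(header) == stored

# Summing code points instead of encoded bytes gives a different total
# as soon as the name contains a non-ASCII character.
byte_sum = sum(name.encode("utf-8"))
char_sum = sum(ord(ch) for ch in name)
print(byte_sum, char_sum)
```

The last two sums differ only because of the "é", which is a single code point but two bytes in UTF-8.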

@sosiouxme
Contributor Author

@awood @xsuchy

awood added a commit to awood/tito that referenced this issue Apr 8, 2019
…racters in the name.

Following the principles of https://nedbatchelder.com/text/unipain.html
our goal is to decode bytes into unicode as soon as we read them and
encode unicode data to bytes at the last second.

The specific problem we were seeing was caused by calling "encode" on a
byte string rather than a unicode string.  Python attempts to be
"helpful" and tries to decode the bytes as ASCII in order to provide a
unicode string to the encode function.  Since the bytes aren't ASCII,
the decode fails and we get the UnicodeDecodeError despite the fact that
we never explicitly asked for a decode at all.
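
The "decode early, encode late" rule described in this commit message could look roughly like the following for a single header field. This is a sketch under stated assumptions, not the code from the commit: `read_name`, `write_name`, and `NAME_FIELD_SIZE` are hypothetical names, names are assumed to be UTF-8, and the 100-byte width is that of the ustar name field.

```python
NAME_FIELD_SIZE = 100  # width of the ustar name field, in bytes


def read_name(field: bytes) -> str:
    """Decode a NUL-padded name field to text as soon as it is read."""
    return field.rstrip(b"\0").decode("utf-8")


def write_name(name: str) -> bytes:
    """Encode text to bytes only at write time, then NUL-pad the field."""
    encoded = name.encode("utf-8")
    if len(encoded) > NAME_FIELD_SIZE:
        raise ValueError("name does not fit in the header field")
    return encoded.ljust(NAME_FIELD_SIZE, b"\0")


# Bytes come in, text is manipulated in the middle, bytes go out.
field = write_name("utf8_t\u00e9st_app.rb")
name = read_name(field)
new_field = write_name("prefix/" + name)
```

Because all manipulation happens on text, no step ever calls encode on a value that is already bytes, which is the situation that triggered the implicit ASCII decode.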
awood added a commit to awood/tito that referenced this issue Apr 9, 2019
awood added a commit to awood/tito that referenced this issue Apr 10, 2019
…racters in the name.

Following the principles of https://nedbatchelder.com/text/unipain.html
our goal is to decode bytes into unicode as soon as we read them and
encode unicode data to bytes at the last second.

The specific problem we were seeing was caused by calling "encode" on a
byte string rather than a unicode string.  Python attempts to be
"helpful" and tries to decode the bytes as ASCII in order to provide a
unicode string to the encode function.  Since the bytes aren't ASCII,
the decode fails and we get the UnicodeDecodeError despite the fact that
we never explicitly asked for a decode at all.

Also, calculate checksums correctly for tarballs that have files with
UTF8 characters in the file name.
jmrodri added a commit that referenced this issue Sep 20, 2019
xsuchy added a commit to xsuchy/tito that referenced this issue Oct 3, 2019
…ls with UTF8 characters in the name."

This partially reverts commit 03509b3.
It removes only the test and keeps the functionality.

The test cannot be there right now because tito 0.6.11 and older will choke on it and produce a damaged tarball.

The test can be added back later, once all developers have tito 0.6.12 or higher.

Resolves: rpm-software-management#337
dgoodwin added a commit that referenced this issue Oct 3, 2019
Partial revert "Fix #335. Handle source tarballs with UTF8 characters…