Error on pull: bsdtar: Pathname can't be converted from UTF-8 to current locale. #355

Closed
cespare opened this issue Apr 8, 2013 · 18 comments

@cespare
Contributor

cespare commented Apr 8, 2013

I've had this happen a couple of times now.

I cannot pull this image; docker fails in the middle.

$ docker pull ooyala/quantal64-go1.1beta1
Pulling repository ooyala/quantal64-go1.1beta1
Pulling tag ooyala/quantal64-go1.1beta1:latest
Pulling 826b66604a035b468163b248de5f4eb3c9d292b78bce1af5e8b406225ada3681 metadata
Pulling 826b66604a035b468163b248de5f4eb3c9d292b78bce1af5e8b406225ada3681 fs layer
29286400/29286400 (100%)
Error: exit status 1: bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Pathname can't be converted from UTF-8 to current locale.
bsdtar: Linkname can't be converted from UTF-8 to current locale.
bsdtar: Error exit delayed from previous errors.

You can try pulling that image if you want. I built this image on my desktop, and I have two EC2 machines A and B with no meaningful difference between them (that I know of). A can pull this image just fine; B fails with this error. All machines involved have $LANG set to en_US.UTF-8.

@sa2ajj
Contributor

sa2ajj commented Apr 8, 2013

It's possible that the actual locale -- en_US.UTF-8 -- is available on one machine and is not on the other...

@cespare
Contributor Author

cespare commented Apr 8, 2013

How do I know whether this is the case, and how do I fix it?

Also, why is the docker export/import dependent on locale? That seems...wrong.

@sa2ajj
Contributor

sa2ajj commented Apr 8, 2013

for example, you can run locale -a to see what locales are currently available.

as for the second question, it might depend because you have some filenames which use characters outside of ascii page (0-127)...
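A minimal sketch of that availability check, for reference. The helper names normalize_locale and locale_is_available are made up for illustration, and the normalization (lowercase, dashes dropped) is an assumption based on glibc's locale -a printing en_US.UTF-8 as en_US.utf8:

```shell
# Lowercase a locale name and drop dashes so that "en_US.UTF-8" and
# "en_US.utf8" compare equal (assumed to be enough for this sketch).
normalize_locale() {
    printf '%s\n' "$1" | tr 'A-Z' 'a-z' | tr -d '-'
}

# Read a locale list on stdin (one name per line, as printed by `locale -a`)
# and succeed if the requested locale is in it.
locale_is_available() {
    want=$(normalize_locale "$1")
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$want" ] && return 0
    done
    return 1
}

# Check the LANG of the current shell against the installed locales.
if locale -a 2>/dev/null | locale_is_available "${LANG:-}"; then
    echo "LANG=${LANG} is available"
else
    echo "LANG=${LANG:-<unset>} is not in the output of locale -a"
fi
```

Note that this only inspects the shell it runs in; a daemon started by an init system may see a different LANG.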

@jpetazzo
Contributor

jpetazzo commented Apr 8, 2013

IMHO, it wouldn't be unreasonable to refuse to create images containing files having names outside of the ASCII-7 charset (unless some force/override flag is set).

@cespare
Contributor Author

cespare commented Apr 8, 2013

Huh, I wonder what I did to create such files. This is basically the image base plus a few ubuntu packages installed plus the Go toolchain.

@cespare
Contributor Author

cespare commented Apr 8, 2013

@sa2ajj locale -a shows identical output on machine A (where the pull works) and machine B (where it doesn't).

C
C.UTF-8
en_US.utf8
POSIX

@shykes
Contributor

shykes commented Apr 8, 2013

After discussing with @cespare on IRC, it seems the problem appeared only when Docker was run under Runit. Runit passed a value of LANG which caused bsdtar to misbehave.

Question: is there something Docker can do to prevent this, or at least help debug it? Should Docker force a certain value of LANG when calling bsdtar?

@cespare
Contributor Author

cespare commented Apr 8, 2013

Oh yeah, forgot to update this issue.

The issue was that runit runs the command with a sort of blank slate environment that didn't have $LANG set, even though the shell where I was running the docker commands did have it set to en_US.UTF-8.

My fix was to do something like this in the runit run script:

exec env LANG="en_US.UTF-8" docker -d 2>&1

Closing this ticket as my original question was answered.
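The blank-slate effect can be reproduced without runit. The sketch below uses env -i as a stand-in for a supervisor that starts its child with an empty environment; that equivalence is an assumption for illustration, not a description of runit internals:

```shell
# The interactive shell has LANG set, as in the report above.
LANG=en_US.UTF-8
export LANG

# A normal child process inherits LANG from the shell.
inherited=$(/bin/sh -c 'printf "%s" "${LANG:-unset}"')

# env -i starts the child with an empty environment, so LANG is gone.
clean=$(env -i /bin/sh -c 'printf "%s" "${LANG:-unset}"')

# Passing LANG explicitly restores it, which is what the run script fix does.
fixed=$(env -i LANG="en_US.UTF-8" /bin/sh -c 'printf "%s" "${LANG:-unset}"')

printf 'inherited=%s clean=%s fixed=%s\n' "$inherited" "$clean" "$fixed"
```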

@teepark

teepark commented Apr 17, 2013

I get this issue with a fresh "precise64" vagrant VM -- no runit, and setting LANG in the daemon's environment doesn't fix it. Full reproduction steps:

$ vagrant box add precise64 http://files.vagrantup.com/precise64.box
$ mkdir dockerbox
$ cd dockerbox
$ vagrant init
$ vi Vagrantfile # change config.vm.box to "precise64"
$ vagrant up
$ vagrant ssh

vagrant@precise64:~$ sudo apt-get update
vagrant@precise64:~$ sudo apt-get install lxc wget bsdtar curl linux-image-extra-3.2.0-23-virtual
vagrant@precise64:~$ wget http://get.docker.io/builds/$(uname -s)/$(uname -m)/docker-master.tgz
vagrant@precise64:~$ tar -xf docker-master.tgz
vagrant@precise64:~$ cd docker-master/
vagrant@precise64:~/docker-master$ sudo ./docker -d &
vagrant@precise64:~/docker-master$ ./docker pull shykes/pybuilder

@jpetazzo jpetazzo reopened this Apr 18, 2013
@jpetazzo
Contributor

We investigated this with @mzdaniel and found out the following:

  • by default, bsdtar will try to create archives using the pax format
  • the pax format will use a special kind of header to encode file names
  • this pax header stores the file name as a UTF-8 string (instead of a raw binary string)
  • the point of this pax header is to allow portability across systems using different encoding formats for special characters
  • if bsdtar cannot convert to UTF-8 (when packing) or from UTF-8 (when unpacking) it will disregard the UTF-8 name contained in the pax header, and use the raw name (and it will work fine), but it will display a warning (and docker will consider that things failed)
  • gnutar doesn't exhibit the problem, since it doesn't use that pax header (and will therefore always store the file name in binary form)
  • it is possible to instruct bsdtar to use gnutar format, which removes the warnings
  • the only downside seems to be portability issues with systems using different internal encodings (note the subtlety: even if you use filesystems with non-UTF encodings, you will be fine, because the OS will translate on the fly; you will be in trouble only if your system doesn't use UTF-8 internally—tested with a vfat mount using latin-1 charset). Since we support only Linux, this doesn't seem to be a realistic issue.

Bottom line:

  • either we ignore the warnings and we're fine,
  • or we create archives with --format=gnutar (which is equivalent to "ignore the warnings")

Additional notes: we also found out that virtually all real-world base images will exhibit the problem, because most distros will store root CA certs with their full names; e.g. on Ubuntu and Debian, /usr/share/ca-certificates/mozilla (among others) contains files named like T?B?TAK_UEKAE_K?k_Sertifika_Hizmet_Sa?lay?c?s?_-_S?r?m_3.crt or NetLock_Arany_=Class_Gold=_F?tan?s?tv?ny.crt. (With ? being some non-ASCII7 character, obviously!)
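A rough way to compare the two archive formats by hand is sketched below. It assumes bsdtar (libarchive) is installed and uses a made-up file name with UTF-8 bytes in it; whether and how bsdtar warns depends on the libarchive version, so the script only prints whatever bsdtar reports:

```shell
# Scratch directories: one holding the file to archive, one for the archives.
src=$(mktemp -d)
out=$(mktemp -d)

# Create an empty file whose name contains the UTF-8 bytes for "ü" (0xC3 0xBC).
name=$(printf 'S\303\274r\303\274m_3.crt')
: > "$src/$name"

if command -v bsdtar >/dev/null 2>&1; then
    for fmt in pax gnutar; do
        # LC_ALL=C mimics a daemon started without a UTF-8 locale.
        warnings=$(LC_ALL=C bsdtar -cf "$out/$fmt.tar" --format="$fmt" -C "$src" . 2>&1) || true
        printf '%s: %s\n' "$fmt" "${warnings:-no warnings}"
    done
else
    echo "bsdtar not found; skipping the comparison"
fi
```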

@cespare
Contributor Author

cespare commented Apr 18, 2013

@jpetazzo Thanks for the detailed investigation!

@ojii

ojii commented May 12, 2013

Does anyone have a workaround for this for docker pull?

@kencochrane
Contributor

OK, I think I solved the problem. With @ojii's help we reproduced the problem on a couple of servers (his and mine), and after some trial and error I found that if you change your init script to the one below, the pull works correctly.

We should make sure that the get.docker.io and the debian packages include the fix.

We also need to confirm that the path to docker below is correct; it seems to live sometimes in /usr/local/bin and sometimes in /usr/bin.

/etc/init/docker.conf

description     "Run docker"

start on runlevel [2345]
stop on starting rc RUNLEVEL=[016]
respawn

script
    test -f /etc/default/locale && . /etc/default/locale || true
    LANG=$LANG LC_ALL=$LANG /usr/local/bin/docker -d
end script
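
One possible way to handle both open points (the locale and the docker path) in plain shell is sketched here. The helper names first_executable and load_locale are hypothetical, and the en_US.UTF-8 fallback is an assumption rather than something agreed on in this thread:

```shell
# Print the first argument that is an executable file; fail if none is.
first_executable() {
    for candidate in "$@"; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Source a locale file if it exists, then export LANG and LC_ALL.
# Falls back to en_US.UTF-8 (an assumption) when the file sets nothing.
load_locale() {
    if [ -f "$1" ]; then
        . "$1"
    fi
    LANG=${LANG:-en_US.UTF-8}
    LC_ALL=$LANG
    export LANG LC_ALL
}

load_locale /etc/default/locale
if docker_bin=$(first_executable /usr/local/bin/docker /usr/bin/docker); then
    echo "would run: $docker_bin -d with LANG=$LANG"
else
    echo "docker not found in /usr/local/bin or /usr/bin"
fi
```

In an init script the echo would be replaced by an exec of "$docker_bin" -d.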

/cc @jpetazzo @mzdaniel

@vieux
Contributor

vieux commented May 31, 2013

FYI I still have this issue on my dev VM.

I have to launch the daemon like this to fix the problem:

sudo -E LANG=en_US.utf-8 LC_ALL=en_US.utf-8 docker -d

@shykes
Contributor

shykes commented Jun 1, 2013

Is there a way to reproduce this? I would like to add a fix directly into docker (by hardcoding env variables passed to bsdtar), instead of depending on the init script. I would like to test the result.
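
The idea of pinning the locale for the bsdtar call only, independent of the daemon's environment, can be sketched in shell as follows. This is not the change made in #777; the wrapper name with_utf8_locale and the choice of en_US.UTF-8 are assumptions for illustration:

```shell
# Run a command with LANG and LC_ALL forced to a UTF-8 locale, regardless of
# what the caller's environment contains.
with_utf8_locale() {
    env LANG=en_US.UTF-8 LC_ALL=en_US.UTF-8 "$@"
}

# Demonstration: the child sees the forced values even if the caller uses C.
LANG=C
export LANG
with_utf8_locale /bin/sh -c 'echo "child sees LANG=$LANG LC_ALL=$LC_ALL"'

# Hypothetical use with bsdtar (layer.tar and rootfs are placeholder names):
#   with_utf8_locale bsdtar -xf layer.tar -C rootfs
```

The forced locale still has to exist on the host, so this does not help on systems where en_US.UTF-8 has not been generated.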

@shykes
Contributor

shykes commented Jun 1, 2013

See #777

@jpetazzo
Contributor

jpetazzo commented Jun 1, 2013

+1, indeed

@shykes
Contributor

shykes commented Jun 1, 2013

#777 has been merged in master, closing tentatively.

@shykes shykes closed this as completed Jun 1, 2013
TomSweeneyRedHat pushed a commit to TomSweeneyRedHat/moby that referenced this issue Aug 27, 2019
kolyshkin pushed a commit to kolyshkin/moby that referenced this issue Sep 23, 2019
…ve_TestSearchCmdOption

[18.09 backport] Revert "Remove TestSearchCmdOptions test"
roman-neuhauser pushed a commit to roman-neuhauser/VoidWSL that referenced this issue Apr 23, 2020
i was getting this error from bsdtar in travis-provided ubuntu version::

  sudo bsdtar -xpmf base.tar.xz -C rootfs
  bsdtar: Ignoring malformed pax extended attribute
  bsdtar: Error exit delayed from previous errors.
  Makefile:39: recipe for target 'rootfs' failed
  make: *** [rootfs] Error 1

i assumed it was something that's fixed in newer bsdtar as i didn't get
that on my laptop.  since i can't be arsed to invest in travis-ci.com,
and since i need to learn about github actions anyway, i went with the
latter, which yielded the same.  switched to using a voidlinux container
inside github actions runner, where i got a different error message::

  sudo bsdtar -xpmf base.tar.xz -C rootfs
  bsdtar: Pathname can't be converted from UTF-8 to current locale.
  bsdtar: Pathname can't be converted from UTF-8 to current locale.
  bsdtar: Pathname can't be converted from UTF-8 to current locale.
  bsdtar: Pathname can't be converted from UTF-8 to current locale.
  bsdtar: Error exit delayed from previous errors.
  make: *** [Makefile:44: rootfs] Error 1

turns out the container was running with LANG=POSIX whereas it needed
LANG=en_US.UTF-8 (i'm guessing C.UTF-8 would've been fine but the
voidlinux/voidlinux image from hub.docker.com doesn't have that
(which, btw, wtf?  the patch is there:
https://github.com/void-linux/void-packages/blob/master/srcpkgs/glibc/patches/glibc-c-utf8-locale.patch))

which also means the whole container saga was perhaps unnecessary...

the following is from
moby/moby#355 (comment)

> We investigated this with @mzdaniel and found out the following:
>
> * by default, bsdtar will try to create archives using the pax format
> * the pax format will use a special kind of header to encode file names
> * this pax header stores the file name as an UTF-8 string (instead of a raw
>   binary string)
> * the point of this pax header is to allow portability across systems using
>   different encoding formats for special characters
> * if bsdtar cannot convert to UTF-8 (when packing) or from UTF-8 (when
>   unpacking) it will disregard the UTF-8 name contained in the pax header, and
>   use the raw name (and it will work fine), but it will display a warning (and
>   docker will consider that things failed)
> * gnutar doesn't exhibit the problem, since it doesn't use that pax header (and
>   will therefore always store the file name in binary form)
> * it is possible to instruct bsdtar to use gnutar format, which removes the
>   warnings
> * the only downside seems to be portability issues with systems using different
>   internal encodings (note the subtlety: even if you use filesystems with
>   non-UTF encodings, you will be fine, because the OS will translate on the
>   fly; you will be in trouble only if your system doesn't use UTF-8
>   internally—tested with a vfat mount using latin-1 charset). Since we support
>   only Linux, this doesn't seem to be a realistic issue.
>
> Bottom line:
>
> * either we ignore the warnings and we're fine,
> * or we create archives with --format=gnutar (which is equivalent to
>   "ignore the warnings")
>
> Additional notes: we also found out that virtually all real-world base images
> will exhibit the problem, because most distros will store root CA certs with
> their full names; i.e. on Ubuntu and Debian, /usr/share/ca-certificates/mozilla
> (among others) contains files named like
> T?B?TAK_UEKAE_K?k_Sertifika_Hizmet_Sa?lay?c?s?_-_S?r?m_3.crt or
> NetLock_Arany_=Class_Gold=_F?tan?s?tv?ny.crt. (With ? being some non-ASCII7
> character, obviously!)

meanwhile, i started getting this error from xbps-install inside the rootfs::

  sudo chroot rootfs /sbin/xbps-install --debug --sync --update --yes xbps
  [DEBUG] XBPS: 0.53 API: 20180730 GIT: UNSET
  ...
  [*] Updating `https://alpha.de.repo.voidlinux.org/current/x86_64-repodata' ...
  ...
  x86_64-repodata: 1658KB [avg rate: 30GB/s]
  [DEBUG] [repo] `//var/db/xbps/https___alpha_de_repo_voidlinux_org_current/x86_64-repodata' failed to open repodata archive Invalid or incomplete multibyte or wide character
  make: *** [Makefile:43: rootfs] Error 95

the answer to which is:

> Void added zstd compression for packages and repodata around 6 months back
> and made it default a few weeks ago. Your xbps version doesn't support zstd
> (it was added in version 0.54, you are on 0.53).

(https://old.reddit.com/r/voidlinux/comments/fpfhrq/help_void_wont_update/)

luckily, the libressl fuckup ("libcrypto45-3.0.2_2 (update) breaks installed
pkg `libressl-2.9.2_1'") is gone, so the Makefile uses the 20191109 snapshot
again.