UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 20: ordinal not in range(128) #1292

Open
akavel opened this issue Mar 9, 2015 · 12 comments

Comments

@akavel

akavel commented Mar 9, 2015

On Ubuntu, when trying to recursively put() a directory with a file which contains accented characters in the filename (specifically, the filename is: Fiat 500 - ciesz się małymi rzeczami!-720.mp4), I got an exception like below when run with --show=debug:

Fatal error: put() encountered an exception while uploading 'salt_master/root'

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 395, in put
    mirror_local_mode, mode, temp_dir)
  File "/usr/local/lib/python2.7/dist-packages/fabric/sftp.py", line 317, in put_dir
    n = posixpath.join(rcontext, f)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 20: ordinal not in range(128)


Aborting.
Disconnecting from localhost... done.
put() encountered an exception while uploading 'salt_master/root'

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 395, in put
    mirror_local_mode, mode, temp_dir)
  File "/usr/local/lib/python2.7/dist-packages/fabric/sftp.py", line 317, in put_dir
    n = posixpath.join(rcontext, f)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 20: ordinal not in range(128)

Additional info:

$ echo $LANG
en_US.UTF-8
@georgepsarakis

Could you add 2 debug lines in /usr/local/lib/python2.7/dist-packages/fabric/sftp.py before line 317:

print type(rcontext), repr(rcontext)
print type(f), repr(f)

and paste the output here?

@bitprophet
Member

Gross. I'm guessing this is because rcontext (containing your filename w/ non-ASCII-friendly chars) is being added to the regular string '/' within posixpath and that's triggering an attempt at encoding it (Python 2 has a number of spots where Unicode strings will be automatically encoded using the 'ascii' default encoding and this is likely one of them - and yes, even tho it says 'DecodeError' there is still implicit encoding going on apparently).

Offhand I don't see a great way to handle this on our end besides possibly trying a more lenient, explicit encoding step before calling posixpath.join, but that has its own issues (i.e. your filename would get munged unexpectedly).

Alternately, we could trap this error in sftp.py (and other uses of posixpath probably) and attempt a manual join using u'/', if that avoids the issue. Non ideal but probably better than kaboom.

FTR, Google finds that this is a widespread problem with posixpath.join, including but not limited to sphinx-doc/sphinx#1163

@akavel
Author

akavel commented Mar 10, 2015

FYI: As of now, after some googling, I've applied a "workaround hack" as mentioned in http://stackoverflow.com/questions/2276200/changing-default-encoding-of-python (putting reload(sys) and stuff in first lines of my fabfile.py). Seems to "Work For Me Now(tm)", but unfortunately it's reported as "dangerous, may break basic stuff in language" according to a comment in the stackoverflow thread. This particular script is currently non-critical for me, so I can live with the risk.

@georgepsarakis :

<type 'unicode'> u'/home/CENSOREDX/FOOBAR/salt_master/root/./CENSORED/fiat'
<type 'str'> 'Fiat 500 - ciesz si\xc4\x99 ma\xc5\x82ymi rzeczami!-720.mp4'

Fatal error: put() encountered an exception while uploading 'salt_master/root/.'

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 395, in put
    mirror_local_mode, mode, temp_dir)
  File "/usr/local/lib/python2.7/dist-packages/fabric/sftp.py", line 319, in put_dir
    n = posixpath.join(rcontext, f)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 20: ordinal not in range(128)


Aborting.
Disconnecting from localhost... done.
put() encountered an exception while uploading 'salt_master/root/.'

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/fabric/operations.py", line 395, in put
    mirror_local_mode, mode, temp_dir)
  File "/usr/local/lib/python2.7/dist-packages/fabric/sftp.py", line 319, in put_dir
    n = posixpath.join(rcontext, f)
  File "/usr/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 20: ordinal not in range(128)
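
The debug output above shows a unicode `rcontext` and a byte-string `f`. That fits the position in the error message: `posixpath.join` builds `'/' + b` first, so the `0xc4` byte sits at index 20 of that segment rather than index 19 of the filename. A minimal sketch of just that decode step, using an explicit `.decode('ascii')` as a stand-in for Python 2's implicit coercion so it behaves the same on Python 2 and 3:

```python
# Filename bytes exactly as printed by repr(f) in the debug output.
name = b'Fiat 500 - ciesz si\xc4\x99 ma\xc5\x82ymi rzeczami!-720.mp4'

# posixpath.join builds '/' + b before appending it to the (unicode) path.
segment = b'/' + name

try:
    # Stand-in for Python 2's implicit str -> unicode coercion.
    segment.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Prints the same codec, byte, and position as the traceback.
print(error)
```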

@georgepsarakis

From what I tested with os.path.join, I think that this will work in the files loop (https://github.com/fabric/fabric/blob/master/fabric/sftp.py#L318):

try:
    f = f.decode('utf-8')
except:
    pass

@akavel do you want to patch the file and retry?

@georgepsarakis

@bitprophet I believe the problem starts with os.walk: normally it returns str, but if unicode_literals is imported, file and directory names come back as type unicode, at least from what I tested. As you note in your comment, that is perfectly fine until you hit an implicit encoding/decoding when concatenating str and unicode.

Perhaps importing unicode_literals in the fabfile (or file using Fabric as a library) has this effect? I think it might.
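
A likely mechanism, offered as an assumption rather than something confirmed in this thread: `os.walk` returns names of the same string type as the path it is given, and `unicode_literals` turns a path literal into unicode. The sketch below shows the equivalent behavior on Python 3, where the two types are `str` and `bytes`. It is Python 3 only because of `os.fsencode`, and `walk_name_types` is a hypothetical helper, not part of Fabric.

```python
import os
import shutil
import tempfile


def walk_name_types(top):
    """Return the set of types os.walk yields for file names under top."""
    return {type(name) for _, _, files in os.walk(top) for name in files}


# Build a throwaway directory with one file in it.
tmp = tempfile.mkdtemp()
try:
    with open(os.path.join(tmp, 'example.txt'), 'w') as handle:
        handle.write('x')

    text_types = walk_name_types(tmp)               # text path in
    byte_types = walk_name_types(os.fsencode(tmp))  # bytes path in
finally:
    shutil.rmtree(tmp)

print(text_types, byte_types)
```

If the same holds in the Python 2 fabfile, a unicode remote path combined with a byte-string local walk would produce exactly the mixed pair seen in the debug output.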

@akavel
Author

akavel commented Mar 16, 2015

@georgepsarakis Yes, seemed to work for me ok with the patch!

@georgepsarakis

Thanks @akavel for testing!
@bitprophet what do you think? Wouldn't this solve all cases?
I would be happy to submit a pull request (with some tests of course) if you think this is a proper solution.

@bitprophet
Member

My understanding is that trying to use UTF-8 as a catchall will still cause issues elsewhere (e.g. folks using other encodings, such as UK Windows users) and possibly even for folks who wouldn't otherwise have encountered this.

Also really not a fan of bare except - I think I get what you're going for but it still makes me squeamish (I've encountered so many issues where well-meaning "try a thing, bare except and continue" setups caused lots of debugging pain :()

I think what would be ideal is to merge the two situations together:

for f in files:
    try:
        n = posixpath.join(rcontext, f)
    except UnicodeDecodeError:
        n = posixpath.join(rcontext, f.decode('utf-8'))
    # ...

This way we ensure the decode only fires in the presence of an otherwise unrecoverable error. Defaulting to UTF-8 is still not perfect but when limited like this I think it's probably okay for now - iterative bugfixing is a thing :)

If that change still works for @akavel and/or others, I'll go ahead and merge a copy of it.
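
For anyone who wants to try the idea outside Fabric, here is a self-contained sketch of the same fallback. `join_remote` is a hypothetical name, not a Fabric function. Catching `TypeError` as well is an addition of mine so the example also runs on Python 3, which raises `TypeError` rather than `UnicodeDecodeError` when text and bytes are mixed in `posixpath.join`.

```python
import posixpath


def join_remote(rcontext, f, encoding='utf-8'):
    """Join two path parts, decoding f only if the plain join fails.

    Hypothetical helper illustrating the fallback discussed above.
    """
    try:
        return posixpath.join(rcontext, f)
    except (UnicodeDecodeError, TypeError):
        # Python 2: implicit ASCII decode of a non-ASCII byte string failed.
        # Python 3: text and bytes cannot be mixed at all.
        if isinstance(f, bytes):
            return posixpath.join(rcontext, f.decode(encoding))
        raise


# Shortened stand-ins for the values from the debug output.
joined = join_remote(u'/home/user/root', b'ciesz si\xc4\x99.mp4')
print(repr(joined))
```

The decode still assumes UTF-8, so a filename in another encoding would either raise `UnicodeDecodeError` from the fallback or decode to the wrong characters.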

@georgepsarakis

@bitprophet I agree on the bare except remark, it may indeed be far more difficult to debug, since it is too generic. Your solution is certainly better :) .

Perhaps a broader approach would be to use getlocale to read the remote locale settings and use that encoding instead of defaulting to utf-8, as you pointed out? I'm not sure how reliable that would be though, especially cross-platform.

I still don't entirely understand why unicode_literals changes the type os.walk returns from str to unicode.
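
A rough sketch of the locale idea, with caveats: `locale.getpreferredencoding` reports the encoding of the machine running the code, not the remote host, so this only covers local filenames. `decode_filename` and its parameters are hypothetical names used for illustration.

```python
import locale


def decode_filename(raw, encoding=None, fallback='utf-8'):
    """Decode a byte-string filename, preferring the locale's encoding.

    Hypothetical helper; text input is returned unchanged.
    """
    if not isinstance(raw, bytes):
        return raw  # already text
    if encoding is None:
        # Local machine's preferred encoding, e.g. from LANG/LC_ALL.
        encoding = locale.getpreferredencoding(False)
    try:
        return raw.decode(encoding)
    except (UnicodeDecodeError, LookupError):
        # Bytes not valid in that encoding, or the encoding is unknown.
        return raw.decode(fallback)


print(repr(decode_filename(b'ciesz si\xc4\x99', encoding='ascii')))
```

One limitation: single-byte encodings such as latin-1 accept any byte sequence, so a wrong locale guess can silently produce the wrong characters instead of falling back.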

@ghenadied

ghenadied commented Dec 29, 2017

Hi,

The same issue occurs when simply running a remote command whose output contains some French characters:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "build/bdist.linux-x86_64/egg/fabric/network.py", line 350, in outputter
    out_stream.write("%s: %s\n" % (prefix, line)),
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 32: ordinal not in range(128)

As an example, we were trying to run TestNG (6.11) from Ant on a remote machine.
Here is the output containing the character that causes trouble:

[testng] ... [testng] ... TestNG 6.11 by Cédric Beust (cedric@beust.com) [testng] ... [testng]

To reproduce, simply run this command on a file with the above content:
run('cat file.txt')

I forgot to note the Fabric version: 0.9.3. It's probably old.
I'll try upgrading, since per issue 1180 it might be fixed.

Thanks
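
One way this kind of error can arise in Python 2 is `%` formatting that mixes a unicode value with a byte string containing non-ASCII bytes. Whether that is what happens in `outputter` in 0.9.3 is an assumption, not something verified here. A minimal sketch of decoding explicitly before formatting, with hypothetical helpers `to_text` and `format_line`:

```python
def to_text(value, encoding='utf-8'):
    """Decode byte strings explicitly; pass text through unchanged."""
    if isinstance(value, bytes):
        # 'replace' avoids raising on bytes that are invalid in encoding.
        return value.decode(encoding, 'replace')
    return value


def format_line(prefix, line):
    """Build an output line from operands that may be text or bytes."""
    return u"%s: %s\n" % (to_text(prefix), to_text(line))


formatted = format_line(u'host', b'TestNG 6.11 by C\xc3\xa9dric Beust')
print(repr(formatted))
```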

@ploxiln

ploxiln commented Dec 29, 2017

Is it possible that your terminal LANG or related vars are not UTF-8?

$ echo LANG=$LANG
LANG=en_US.UTF-8

@ghenadied

Hi,

It's UTF-8.
After more research, I concluded that I have to upgrade Fabric first.

Thanks a lot!
