support multibyte key name on getting object operation. #92

Merged 1 commit into s3tools:master on Feb 14, 2014



ksauzz commented Dec 6, 2012

No description provided.

I also ran into this issue recently while trying to do a rename; a PUT with a Unicode key works fine, but a GET or a rename fails. Applying the patch fixes both issues.

Both PUT and GET were failing for me with a Unicode filename. I made a similar change, but in some additional places, to get both working.


I pulled the changes MartinMReed made and they have been working for me. There are some functions that don't support Unicode, but I haven't hit them while running this code yet.


mdomsch commented Feb 7, 2014

Please try the upstream master branch now. e44d9b7 suggests that the failures may be related to the LANG environment variable not being set to a Unicode encoding.
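(A quick sketch of that check; the exact locale name varies by system, and en_US.UTF-8 is just one common Unicode-capable choice:)

```shell
# Verify that the current locale specifies a Unicode encoding; if it
# does not, export a UTF-8 locale before invoking s3cmd.
echo "$LANG"
case "$LANG" in
    *.UTF-8|*.utf8) echo "LANG is Unicode-capable" ;;
    *)              export LANG=en_US.UTF-8 ;;
esac
```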


ksauzz commented Feb 14, 2014

Thank you for the update. Unfortunately, the current master branch still doesn't work for me without my change.

According to the error message, the root cause is that the byte sequence (the URI string) is still being read as single-byte characters (ASCII) rather than as multibyte characters (Unicode). IIRC, the str function reads a byte sequence as single-byte characters even when it contains multibyte characters, so whenever we read multibyte characters we should use the unicode function instead of str.

Please let me know if anything is unclear. I should have explained this issue more thoroughly at first.
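(To illustrate the str-vs-unicode point, here is a sketch using Python 3's explicit bytes/str split, which makes the Python 2 behaviour described above visible; the key name is just an example:)

```python
# In Python 2, str() keeps a UTF-8 key name as raw bytes; any implicit
# coercion to text attempts an ASCII decode and fails on the first
# multibyte character. Decoding explicitly as UTF-8 (what unicode() did
# under a proper LANG) succeeds.
key = "utf8ファイル.txt".encode("utf-8")  # raw bytes, as returned on the wire

try:
    key.decode("ascii")  # what the implicit coercion attempted
except UnicodeDecodeError as e:
    print("ASCII decode failed:", e.reason)

print(key.decode("utf-8"))  # explicit Unicode decode: the fix
```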

Error message

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 76: ordinal not in range(128)

Quick fix

diff --git a/S3/ b/S3/
index 66d00b4..540151c 100644
--- a/S3/
+++ b/S3/
@@ -418,7 +418,7 @@ def fetch_remote_list(args, require_attribs = False, recursive = None):
                 remote_list.record_md5(key, objectlist.get_md5(key))
         for uri in remote_uris:
-            uri_str = str(uri)
+            uri_str = unicode(uri)
             ## Wildcards used in remote URI?
             ## If yes we'll need a bucket listing...
             wildcard_split_result = re.split("\*|\?", uri_str, maxsplit=1)

My test log on master branch

% ./s3cmd get s3://unicode-test/utf8ファイル.txt

    An unexpected error has occurred.
    Please try reproducing the error using
    the latest s3cmd code from the git master
    branch found at:
    If the error persists, please report the
    following lines (removing any private
    info as necessary) to:

You have encountered a UnicodeEncodeError.  Your environment
variable LANG=ja_JP.UTF-8 may not specify a Unicode encoding (e.g. UTF-8).
Please set LANG=en_US.UTF-8 or similar in your environment before
invoking s3cmd.


Traceback (most recent call last):
  File "./s3cmd", line 2381, in <module>
    report_exception(e, msg)
  File "./s3cmd", line 2288, in report_exception
    sys.stderr.write("""Invoked as: %s""" % s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/", line 351, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 76: ordinal not in range(128)

ksauzz commented Feb 14, 2014

Rebased to the latest master branch.

mdomsch added a commit that referenced this pull request Feb 14, 2014

Merge pull request #92 from ksauzz/support-get-operation-to-multibyte…

support multibyte key name on getting object operation.

@mdomsch mdomsch merged commit b2b5a65 into s3tools:master Feb 14, 2014


mdomsch commented Feb 14, 2014

Thanks for explaining further and updating the patch. Pulled onto master now.


ksauzz commented Feb 14, 2014

Wow, Thanks!!
