Skip to content

Conversation

@thebeeland
Copy link
Contributor

If a unicode string that couldn't be encoded as ascii was given to the upload method, it would cause an unhandled UnicodeEncodeError to be raised. We now detect the unicode string paths and attempt to encode them as utf-8 strings before uploading.

@thebeeland thebeeland requested a review from jfboismenu June 21, 2018 02:25
if isinstance(path, unicode):
try:
path = path.encode("utf-8")
except UnicodeEncodeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but I don't think we need this exception check. All unicode characters can be turned into utf-8 (UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. - see https://en.wikipedia.org/wiki/UTF-8)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I was doing it out of an over abundance of caution, but if it's completely unnecessary then I'll remove it!

@coveralls
Copy link

coveralls commented Jun 21, 2018

Pull Request Test Coverage Report for Build 951

  • 15 of 16 (93.75%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 66.059%

Changes Missing Coverage Covered Lines Changed/Added Lines %
shotgun_api3/shotgun.py 15 16 93.75%
Totals Coverage Status
Change from base Build 934: 0.2%
Covered Lines: 1304
Relevant Lines: 1974

💛 - Coveralls

@thebeeland thebeeland requested a review from manneohrstrom June 21, 2018 22:23
Copy link
Contributor

@manneohrstrom manneohrstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I think there may be a second case where if we pass utf-8 encoded string input to the API it will still fail. I suggest adding more test coverage so we test all different path types - see comments in PR!

# to call open on the unicode path, but we'll use the utf-8 encoded
# string for everything else.
path_to_open = path
if isinstance(path, unicode):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am guessing if someone now uploads a path on windows and passes a utf-8 encoded string to the API, it will still fail, right? Maybe we can add a test for that and add a little bit more logic. Would be great if we tested in our unit tests upload with ascii string, utf-8 encoded string (with special characters), unicode only containing 8 bit ascii chars and unicode containing special characters). If those tests pass we should be ok (i think!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on a side note, what happens if you pass a shift-jis encoded string to the API - might be worth adding a test for that too. This is not uncommon as a locale still in japan for example. Would be nice to validate that if the input is not utf-8 and is not unicode, we provide a sensible error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good things to check. Thinking about it, you're probably right about getting a utf-8 encoded string on Windows and it failing.

Copy link
Contributor

@manneohrstrom manneohrstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!!

raise ShotgunError(
"Could not upload the given file path. It is encoded as "
"something other than utf-8 or ascii. To upload this file, "
"it can be encoded as utf-8, or given as unicode: %s" % path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: To be extra clear, i suggest we say "it can be a string ecoded as utf-8, or given as unicode"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

self.ticket['id'],
u_path,
'attachments',
tag_list="monkeys, everywhere, send, help"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@thebeeland thebeeland merged commit d589f4b into master Jun 26, 2018
@thebeeland thebeeland deleted the SG-4483_unicode_file_upload branch June 26, 2018 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants