SG-4483: Handles non-ascii unicode string paths gracefully during upload. #179
shotgun_api3/shotgun.py
```python
if isinstance(path, unicode):
    try:
        path = path.encode("utf-8")
    except UnicodeEncodeError:
```
I might be wrong, but I don't think we need this exception check. All unicode characters can be encoded as utf-8 ("UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes", see https://en.wikipedia.org/wiki/UTF-8).
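A quick way to check this claim is to encode every Unicode scalar value. The sketch below assumes Python 3, where `str` plays the role of Python 2's `unicode`. It skips the 2,048 surrogate code points (U+D800 to U+DFFF), which are not scalar values. Under Python 3 the strict `utf-8` codec rejects a lone surrogate with `UnicodeEncodeError`, so that is the one input for which an `except` clause like this could still fire there. `utf8_coverage` and `encodes_as_utf8` are illustrative names, not part of shotgun_api3.

```python
def utf8_coverage():
    """Encode every Unicode scalar value as utf-8.

    Returns (number of code points encoded, longest encoding in bytes).
    """
    encoded = 0
    longest = 0
    for cp in range(0x110000):  # U+0000 .. U+10FFFF
        if 0xD800 <= cp <= 0xDFFF:
            # Surrogates are not scalar values, so skip them.
            continue
        longest = max(longest, len(chr(cp).encode("utf-8")))
        encoded += 1
    return encoded, longest


def encodes_as_utf8(text):
    """Return True if text has a strict utf-8 encoding (Python 3 semantics)."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True
```

`utf8_coverage()` returns `(1112064, 4)`, which matches the figure quoted above. `encodes_as_utf8("\ud800")` returns `False` because a lone surrogate has no strict utf-8 encoding.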
Cool. I was doing it out of an overabundance of caution, but if it's completely unnecessary, I'll remove it!
Pull Request Test Coverage Report for Build 951
💛 - Coveralls
manneohrstrom left a comment
Looks good! I think there may be a second case that will still fail: passing a utf-8 encoded string to the API. I suggest adding more test coverage so we test all the different path types (see my comments in the PR)!
```python
# to call open on the unicode path, but we'll use the utf-8 encoded
# string for everything else.
path_to_open = path
if isinstance(path, unicode):
```
I am guessing that if someone on Windows now passes a utf-8 encoded string path to the API, the upload will still fail, right? Maybe we can add a test for that and a little more logic. It would be great if our unit tests covered upload with an ascii string, a utf-8 encoded string (with special characters), unicode containing only ascii characters, and unicode containing special characters. If those tests pass, we should be OK (I think!)
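The four path types above can be written as a small table-driven check. This is a sketch in Python 3 terms, where a "utf-8 encoded string" is a `bytes` object and "unicode" is a `str`. `normalize_path` is a hypothetical stand-in for whatever normalization the upload code performs, and the filenames are made up for illustration.

```python
def normalize_path(path):
    """Hypothetical normalizer: return the path as text (str).

    Byte paths are assumed to be utf-8. ascii is a subset of utf-8,
    so plain ascii byte strings decode unchanged.
    """
    if isinstance(path, bytes):
        return path.decode("utf-8")
    return path


# (label, input path, expected text path) for the four cases from the review.
CASES = [
    ("ascii byte string", b"monkeys.txt", "monkeys.txt"),
    ("utf-8 byte string, non-ascii", "über_monkeys.txt".encode("utf-8"), "über_monkeys.txt"),
    ("unicode, ascii only", "monkeys.txt", "monkeys.txt"),
    ("unicode, non-ascii", "über_monkeys.txt", "über_monkeys.txt"),
]


def failing_cases():
    """Return the labels of any cases whose normalized path is wrong."""
    return [
        label
        for label, given, expected in CASES
        if normalize_path(given) != expected
    ]
```

An empty list from `failing_cases()` means all four inputs normalize to the same text path. A real suite would also push each case through an actual upload.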
On a side note, what happens if you pass a Shift-JIS encoded string to the API? It might be worth adding a test for that too. Shift-JIS is still a common locale in Japan, for example. It would be nice to validate that if the input is neither utf-8 nor unicode, we provide a sensible error message.
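The Shift-JIS case can be exercised without a Japanese locale by encoding a filename explicitly. In the sketch below (Python 3), `UploadPathError` and `normalize_upload_path` are hypothetical names standing in for `ShotgunError` and the upload code. Bytes that are not valid utf-8 produce a readable error instead of a bare `UnicodeDecodeError`. The Shift-JIS encoding of "テスト" begins with byte 0x83, which is a continuation byte in utf-8, so decoding it as utf-8 fails.

```python
class UploadPathError(Exception):
    """Hypothetical stand-in for ShotgunError in this sketch."""


def normalize_upload_path(path):
    """Return the path as str, or raise UploadPathError for non-utf-8 bytes."""
    if isinstance(path, str):
        return path
    try:
        return path.decode("utf-8")
    except UnicodeDecodeError:
        # Replace the low-level decode error with one that tells the caller
        # how to fix the input; "from None" hides the chained traceback.
        raise UploadPathError(
            "Could not upload the given file path. It is encoded as "
            "something other than utf-8 or ascii. To upload this file, "
            "it can be a string encoded as utf-8, or given as unicode: %r" % (path,)
        ) from None


# A Japanese filename encoded the way a Shift-JIS locale would hand it over.
sjis_path = "テスト.txt".encode("shift_jis")
```

Calling `normalize_upload_path(sjis_path)` raises `UploadPathError` with the message above. The same name passed as `str` or as utf-8 bytes goes through unchanged.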
Good things to check. Thinking about it, you're probably right that a utf-8 encoded string on Windows would still fail.
…es non-ascii string paths on Windows.
…aks around the same.
manneohrstrom left a comment
Awesome!!
shotgun_api3/shotgun.py
```python
raise ShotgunError(
    "Could not upload the given file path. It is encoded as "
    "something other than utf-8 or ascii. To upload this file, "
    "it can be encoded as utf-8, or given as unicode: %s" % path
)
```
Nitpick: to be extra clear, I suggest we say "it can be a string encoded as utf-8, or given as unicode".
👍
```python
self.ticket['id'],
u_path,
'attachments',
tag_list="monkeys, everywhere, send, help"
```
😍
If a unicode string that couldn't be encoded as ascii was given to the upload method, it would cause an unhandled UnicodeEncodeError to be raised. We now detect unicode string paths and attempt to encode them as utf-8 strings before uploading.
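The split described in the diff comment (call `open` on the unicode path, use the utf-8 encoded string for everything else) can be sketched in Python 3 terms. There, Python 2's `unicode` corresponds to `str` and Python 2's byte `str` corresponds to `bytes`. `split_upload_path` is a hypothetical helper for illustration, not part of shotgun_api3.

```python
def split_upload_path(path):
    """Hypothetical helper: return (path_to_open, path_bytes).

    path_to_open is text, suitable for open() on any platform.
    path_bytes is the utf-8 encoded form used for everything else.
    """
    if isinstance(path, str):
        # Every Unicode scalar value has a utf-8 encoding, so this only
        # fails for lone surrogates.
        return path, path.encode("utf-8")
    if isinstance(path, bytes):
        # Byte paths must already be utf-8 (ascii is a subset);
        # decoding raises UnicodeDecodeError otherwise.
        return path.decode("utf-8"), path
    raise TypeError("path must be str or bytes, not %s" % type(path).__name__)
```

Either form of the same name yields the same pair. For example, both `split_upload_path("café.txt")` and `split_upload_path(b"caf\xc3\xa9.txt")` return `("café.txt", b"caf\xc3\xa9.txt")`. This means callers can pass whichever form they have.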