Support for scripts with unicode content #1389
changelog.d/1366.change.rst
@@ -1 +1,2 @@
 In package_index, fixed handling of encoded entities in URLs.
+Scripts which have unicode content are now sopported
This should be in its own changelog file, changelog.d/1389.change.rst
Also s/sopported/supported
fixed
setuptools/command/easy_install.py
@@ -108,7 +108,7 @@ def isascii(s):
 else:
 
     def _to_ascii(s):
-        return s.encode('ascii')
+        return s.encode('utf8')
Hm... Looking at what this does I think I agree with this change (though I don't know nearly enough about unicode issues to fully judge it), but maybe we should also change _to_ascii to _to_bytes?
Also, this change definitely needs tests.
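For readers following the thread, here is a minimal sketch of the behaviour difference the diff above introduces. The script text and the helper names `encode_ascii` and `encode_utf8` are made up for illustration and are not taken from the PR or from setuptools; only the two `str.encode` calls mirror the diff.

```python
# Hypothetical script text containing one non-ASCII character (not from the PR).
script_text = u"#!/usr/bin/env python\nprint(u'caf\u00e9')\n"


def encode_ascii(s):
    # Mirrors the old line in the diff: strict ASCII encoding.
    return s.encode('ascii')


def encode_utf8(s):
    # Mirrors the new line in the diff: UTF-8 can represent any unicode text.
    return s.encode('utf8')


# The strict ASCII codec rejects the accented character.
try:
    encode_ascii(script_text)
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The UTF-8 codec accepts the same text and round-trips it losslessly.
utf8_bytes = encode_utf8(script_text)
```

Under these assumptions, the ASCII variant raises `UnicodeEncodeError` on the accented character, while the UTF-8 variant returns a byte string that decodes back to the original text.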
You mean having only the _to_bytes() function?
BTW, tests added
Well just _to_ascii sounds like it turns something to an encoded ASCII string, but this is actually returning a utf-8-encoded byte string, so it should probably be called _to_bytes() instead of _to_ascii().
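A small sketch of the naming argument, assuming a stand-alone `_to_bytes` that only mirrors the name suggested here (it is not the setuptools function itself, and the sample strings are invented): for pure-ASCII input the UTF-8 result is byte-for-byte identical to the ASCII encoding, so the new name describes the return type without implying a change for ASCII-only scripts.

```python
def _to_bytes(s):
    # Stand-in using the name proposed in the review; returns UTF-8 bytes.
    return s.encode('utf8')


# Hypothetical inputs, one ASCII-only and one with a non-ASCII character.
ascii_only = u"print('hello')\n"
mixed = u"print('h\u00e9llo')\n"

# ASCII is a subset of UTF-8, so ASCII-only text encodes identically.
same_for_ascii = _to_bytes(ascii_only) == ascii_only.encode('ascii')

# The result is a byte string regardless of the input's content.
result_type = type(_to_bytes(mixed))

# The accented character takes two bytes in UTF-8, one more than its
# single code point in the text.
extra_bytes = len(_to_bytes(mixed)) - len(mixed)
```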
done
@pganssle fixed
e239095 to d94437c
I have rebased this and cleaned up the history a bit. Will merge when CI passes.
d94437c to c43d0f6
This also renames the _to_ascii function to better reflect its purpose.
Summary of changes
Makes the _to_ascii() function able to handle scripts whose contents are in unicode format.
Closes #761
Pull Request Checklist