Towards supporting unicode #242
|
Looks good so far. I note that |
|
the only case we don't support yet is for e.g.
>>> import chardet
>>> chardet.detect("تست".encode())
{'encoding': 'utf-8', 'confidence': 0.87625, 'language': ''}
>>> chardet.detect("Test".encode())
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
Do you prefer adding this kind of dependency to
EDIT: For
--- ~/.pyenv/versions/2.7.16/lib/python2.7/site-packages/xdis/unmarshal.py Sat May 25 05:08:37 2019
+++ ~/.pyenv/versions/2.7.16/lib/python2.7/site-packages/xdis/unmarshal.py Sat May 25 05:37:41 2019
@@ -57,7 +57,7 @@
# found it and this code via
# https://www.peterbe.com/plog/unicode-to-ascii where it is a
# dead link. That can potentially do better job in converting accents.
- return unicodedata.normalize('NFKD', u).encode('ascii', 'ignore')
+ return unicodedata.normalize('NFKD', u)
else:
return str(u)
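To make the effect of the one-line change above concrete, here is a minimal sketch of why the old return value loses data. The function names `to_ascii_lossy` and `normalize_only` are hypothetical and are not part of xdis; the sketch is written for Python 3, whereas the patched file targets Python 2.7, where `encode` returns a `str` rather than `bytes`.

```python
import unicodedata


def to_ascii_lossy(u):
    """Mimic the old behaviour (hypothetical helper, not the xdis function).

    NFKD splits accented letters into base letter + combining mark, then
    encode(..., 'ignore') silently drops everything that is not ASCII.
    """
    return unicodedata.normalize("NFKD", u).encode("ascii", "ignore")


def normalize_only(u):
    """Mimic the patched behaviour: normalize, but keep the text as unicode."""
    return unicodedata.normalize("NFKD", u)


if __name__ == "__main__":
    persian = "\u062a\u0633\u062a"  # the "تست" string from the chardet example
    print(to_ascii_lossy("caf\u00e9"))        # accent is dropped, base letter kept
    print(to_ascii_lossy(persian))            # nothing survives the ASCII filter
    print(normalize_only(persian) == persian)  # text is preserved
```

With the old line, a string made up entirely of non-Latin characters comes back empty, which would explain why such strings could not round-trip through decompilation.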
|
Yeah, this was expedient and flaky. Please put in a PR to fix xdis. You should also have an invite to that project.
Sure! (I assume you do too, since you suggested it?) chardet goes back to 2.1 and looks pretty cool and well written. However, there might also be an option to indicate an encoding so the program doesn't have to guess. And while on the topic of options processing, the 2.7 option-processing branch would be greatly cleaned up if we started using click. For decompile3 there's no excuse not to use it (other than not having gotten around to it). |
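As a rough illustration of the "option to indicate an encoding" idea, here is a minimal sketch. It uses the standard-library `argparse` rather than click so that it runs without extra dependencies, and the names `--encoding`, `build_parser`, and `write_source` are assumptions made for illustration, not the project's actual command-line interface.

```python
import argparse
import io


def build_parser():
    """Build a hypothetical command-line parser with an explicit encoding option."""
    parser = argparse.ArgumentParser(description="hypothetical decompiler front end")
    parser.add_argument(
        "--encoding",
        default="utf-8",
        help="encoding used when writing decompiled source (assumed option name)",
    )
    return parser


def write_source(text, stream, encoding):
    """Encode the decompiled text with the caller-chosen encoding and write it."""
    stream.write(text.encode(encoding))


if __name__ == "__main__":
    args = build_parser().parse_args(["--encoding", "utf-8"])
    out = io.BytesIO()
    write_source("s = '\u062a\u0633\u062a'\n", out, args.encoding)
    print(out.getvalue())
```

Passing the encoding explicitly means the tool never has to guess, which avoids both the chardet dependency and its confidence-based heuristics.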
Nice, this option is a better alternative. I'll commit the option. Added |
|
@rocky, I made changes for |
|
@x0ret You have my admiration for noticing such things and thinking about them. I don't know if I can be of help other than to be a sounding board for ideas. I have thought about this for the last 10 minutes or so. When you say:
do you mean that normally uncompyle6 would try to turn this into ASCII, but here it might change behavior and turn it into unicode instead? The
If I have this right, you had suggested using chardet, which can detect whether there are non-ASCII characters in there, and thus whether the text may need to be in unicode. Right? Would chardet work or help?
When I feel like I know something, I'll say so. But when I don't think I know or understand, I won't hesitate to admit it. So I leave it to you how you want to ultimately proceed.
The way I see this, and the way I have been working, is to make a stab in the right direction even if it is flaky or incomplete or I don't understand the problem fully. Almost always that is better than doing nothing. And if there is a problem, unless this is a massive and difficult change to revert (which you are in a better position to know than me), having moved hopefully forward (or at least in a particular direction), we are in a better position to assess what the right or better thing to do is.
As you may have seen, I am not even afraid to make the wrong decision and own up to it. Hence I'll leave those "FIXME" or "TODO" comments. Sorry I can't be of more help. |
|
@rocky, thanks for your comment.
Yes, I was worried about breaking generated source in Python 2. However, after another shot, I am convinced that when someone uses explicit unicode chars in source, using
In this case we do not need
There was only an issue in code which I used
Besides, based on your suggestion, I added another option
Please review the commits and let me know if you prefer changes.
Also, no changes are required for Unicode strings like
This is so valuable; I have learned a lot since working on this project. Thanks.
Anyway, your words encouraged me to recheck again. |
rocky
left a comment
Looks good to me - thanks!
(I will be committing the change to xdis soon, since I gather there are no objections to that.)
Support unicode docstring
Support unicode strings
There is an issue, I think: using a Python 2.7 env to decompile a Python 3.7 pyc results in a \n\n docstring, which I think is an xdis-related issue. This is WIP.
fixes #241.