-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Solved Problem with Non-ASCII .gitignore Files #2229
Conversation
When .gitignore file in the user's project directory contained non-alphabetical characters(Japanese, Korean, Chinese, etc), Nothing works and printed this weird message in the console('cp949' is the encoding for Korean characters in this case). It even blocks VSCode's formatting from working. This commit solves the problem. Traceback (most recent call last): File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File "C:\Users\username\anaconda3\envs\project-name\Scripts\black.exe\__main__.py", line 7, in <module> File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 1056, in patched_main main() File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 829, in __call__ return self.main(*args, **kwargs) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 782, in main rv = self.invoke(ctx) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 1066, in invoke return ctx.invoke(self.callback, **ctx.params) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 610, in invoke return callback(*args, **kwargs) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\decorators.py", line 21, in new_func return f(get_current_context(), *args, **kwargs) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 394, in main stdin_filename=stdin_filename, File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 445, in get_sources gitignore = get_gitignore(root) File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\files.py", line 122, in get_gitignore lines = gf.readlines() UnicodeDecodeError: 'cp949' codec can't decode byte 0xb0 in position 13: illegal multibyte sequence
Is there a standard for the encoding of gitignore files? This might break users who use non-utf8 encodings. |
I doubt it, but it's pretty much standard to always use UTF-8 in terms of encoding, especially in non-alphabetical countries. Furthurmore, there are people unable to use VS Code formatting due to this issue. My case was the same. I'll embed the links. https://stackoverflow.com/questions/65101442/formatter-black-is-not-working-on-my-vscode-but-why As you can see, they had to use workarounds to use Black.. I assume these people are having this problem because of this. The country in their profile is each set to 'Korea' and 'Israel', which is both non-alphabetical countries. It is important to make this usable to everyone, though there might be some corner-cases. Right now this blocks quite-frequent-cases from working. |
Let me add that modern editors always use UTF-8 by default, while Python's open() function uses outdated encoding(cp949, shift-jis...) depending on system language by default. This difference makes the issue. It is always recommended to add 'encoding="utf-8"' argument to the open() function in non-alphabetical Python communities. |
I believe you when you say that this will solve problems for some users, but I'm still not convinced it won't break things for other users. Intuitively it seems like we should match whatever logic git uses to decide the encoding for the gitignore file. |
I dug into other files and found the way they detected the encoding. I applied the logic here too. I just confirmed it working well even with .gitignore files with non-alphabetical characters. Is it better now? |
I made the open function use Documentation: https://docs.python.org/3.8/library/tokenize.html#tokenize.open |
Oh I see it's my mistake, since this uses |
…ting" This reverts commit 50dd804.
I'm sorry that I made a long way around. Actually it seems that it's safe to assume that .gitignore file is always encoded in UTF-8. It's said that making it use Unicode BOM or any other encoding method results in Git not detecting .gitignore file properly. Therefore it should be safe to say that .gitignore is UTF-8. https://stackoverflow.com/questions/11451535/gitignore-is-ignored-by-git |
It does indeed seem that utf-8 is the standard, per the mention in https://git-scm.com/docs/git-commit-tree#_discussion. That means this change is OK to go in. However, you should add a changelong entry to CHANGES.md. |
Okay, I just wrote the changes to CHANGES.md file. |
Thanks! I have a suggestion to improve the wording. After that I'll wait for a little while for feedback from other maintainers before merging this PR. |
Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
I've also updated the title of this thread :) |
Note that this is Windows-specific. |
Co-authored-by: Richard Si <63936253+ichard26@users.noreply.github.com>
Any objections? While I'm still not 100% sure this is the right move, I won't block the merge and I'll merge sometime soon (maybe in a few hours, although I'll probably forget till tomorrow). Oh @temeddix, by the way, you can use special keywords in the PR description so GitHub will automatically close an issue upon merge. You can read more here: https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword. This makes sure any issue that should be closed are in fact closed (and I won't have to go proactively hunt for such issues!), keeping our very chaotic issue tracker ever so slightly less chaotic and scary :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me to get consistency.
Given that all of the active maintainers don't have any objections, merging (and look I did not forget about this lol). |
Thank you so much for your contribution! This project is only possible by contributions like these 🖤. You're awesome, @temeddix. This will definitely unblock certain Windows users from using Black which is always a good thing. FYI, I'm in the process of trying to improve the contributing experience but I need feedback, so it would be great if you take some time and comment about it, more details can be found here: GH-2238 |
When .gitignore file in the user's project directory contained non-alphabetical characters(Japanese, Korean, Chinese, etc), Nothing worked and printed this weird message in the console('cp949' is the encoding for Korean characters in this case). It even blocked VSCode's formatting feature from working. This commit solves the problem.
Traceback (most recent call last):
File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 193, in run_module_as_main
"main", mod_spec)
File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "C:\Users\username\anaconda3\envs\project-name\Scripts\black.exe_main.py", line 7, in
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init.py", line 1056, in patched_main
main()
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 829, in call
return self.main(*args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init_.py", line 394, in main
stdin_filename=stdin_filename,
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init_.py", line 445, in get_sources
gitignore = get_gitignore(root)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\files.py", line 122, in get_gitignore
lines = gf.readlines()
UnicodeDecodeError: 'cp949' codec can't decode byte 0xb0 in position 13: illegal multibyte sequence