Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solved Problem with Non-ASCII .gitignore Files #2229

Merged
merged 12 commits into from May 24, 2021
Merged

Solved Problem with Non-ASCII .gitignore Files #2229

merged 12 commits into from May 24, 2021

Conversation

@temeddix
Copy link
Contributor

@temeddix temeddix commented May 13, 2021

When .gitignore file in the user's project directory contained non-alphabetical characters(Japanese, Korean, Chinese, etc), Nothing worked and printed this weird message in the console('cp949' is the encoding for Korean characters in this case). It even blocked VSCode's formatting feature from working. This commit solves the problem.

Traceback (most recent call last):
File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 193, in run_module_as_main
"main", mod_spec)
File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "C:\Users\username\anaconda3\envs\project-name\Scripts\black.exe_main
.py", line 7, in
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init
.py", line 1056, in patched_main
main()
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 829, in call
return self.main(*args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init_.py", line 394, in main
stdin_filename=stdin_filename,
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black_init_.py", line 445, in get_sources
gitignore = get_gitignore(root)
File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\files.py", line 122, in get_gitignore
lines = gf.readlines()
UnicodeDecodeError: 'cp949' codec can't decode byte 0xb0 in position 13: illegal multibyte sequence

When .gitignore file in the user's project directory contained non-alphabetical characters(Japanese, Korean, Chinese, etc), Nothing works and printed this weird message in the console('cp949' is the encoding for Korean characters in this case). It even blocks VSCode's formatting from working. This commit solves the problem.

Traceback (most recent call last):
  File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "c:\users\username\anaconda3\envs\project-name\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\username\anaconda3\envs\project-name\Scripts\black.exe\__main__.py", line 7, in <module>
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 1056, in patched_main      
    main()
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\click\decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 394, in main
    stdin_filename=stdin_filename,
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\__init__.py", line 445, in get_sources        
    gitignore = get_gitignore(root)
  File "c:\users\username\anaconda3\envs\project-name\lib\site-packages\black\files.py", line 122, in get_gitignore
    lines = gf.readlines()
UnicodeDecodeError: 'cp949' codec can't decode byte 0xb0 in position 13: illegal multibyte sequence
@JelleZijlstra
Copy link
Collaborator

@JelleZijlstra JelleZijlstra commented May 13, 2021

Is there a standard for the encoding of gitignore files? This might break users who use non-utf8 encodings.

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 13, 2021

Is there a standard for the encoding of gitignore files? This might break users who use non-utf8 encodings.

I doubt it, but it's pretty much standard to always use UTF-8 in terms of encoding, especially in non-alphabetical countries. Furthurmore, there are people unable to use VS Code formatting due to this issue. My case was the same. I'll embed the links.

https://stackoverflow.com/questions/65101442/formatter-black-is-not-working-on-my-vscode-but-why
microsoft/vscode-python#4842

As you can see, they had to use workarounds to use Black..

I assume these people are having this problem because of this. The country in their profile is each set to 'Korea' and 'Israel', which is both non-alphabetical countries. It is important to make this usable to everyone, though there might be some corner-cases. Right now this blocks quite-frequent-cases from working.

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 13, 2021

Let me add that modern editors always use UTF-8 by default, while Python's open() function uses outdated encoding(cp949, shift-jis...) depending on system language by default. This difference makes the issue. It is always recommended to add 'encoding="utf-8"' argument to the open() function in non-alphabetical Python communities.

Loading

@JelleZijlstra
Copy link
Collaborator

@JelleZijlstra JelleZijlstra commented May 13, 2021

I believe you when you say that this will solve problems for some users, but I'm still not convinced it won't break things for other users.

Intuitively it seems like we should match whatever logic git uses to decide the encoding for the gitignore file.

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 13, 2021

I dug into other files and found the way they detected the encoding. I applied the logic here too. I just confirmed it working well even with .gitignore files with non-alphabetical characters. Is it better now?

Loading

src/black/files.py Outdated Show resolved Hide resolved
Loading
@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 14, 2021

I made the open function use tokenize.open() instead of open(), which detects encoding automatically. This way it is okay to have non-alphabetical characters in .gitignore file. It worked well.

Documentation: https://docs.python.org/3.8/library/tokenize.html#tokenize.open

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 14, 2021

Oh I see it's my mistake, since this uses detect_encoding() internally. It doesn't seem to solve the problem.

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 14, 2021

I'm sorry that I made a long way around. Actually it seems that it's safe to assume that .gitignore file is always encoded in UTF-8. It's said that making it use Unicode BOM or any other encoding method results in Git not detecting .gitignore file properly. Therefore it should be safe to say that .gitignore is UTF-8.

https://stackoverflow.com/questions/11451535/gitignore-is-ignored-by-git
https://sierrasoftworks.com/2019/07/12/gitignore-unicode/#background
git/git-scm.com#663

Loading

@JelleZijlstra
Copy link
Collaborator

@JelleZijlstra JelleZijlstra commented May 14, 2021

It does indeed seem that utf-8 is the standard, per the mention in https://git-scm.com/docs/git-commit-tree#_discussion. That means this change is OK to go in.

However, you should add a changelong entry to CHANGES.md.

Loading

@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 14, 2021

Okay, I just wrote the changes to CHANGES.md file.

Loading

CHANGES.md Outdated Show resolved Hide resolved
Loading
@JelleZijlstra
Copy link
Collaborator

@JelleZijlstra JelleZijlstra commented May 14, 2021

Thanks! I have a suggestion to improve the wording. After that I'll wait for a little while for feedback from other maintainers before merging this PR.

Loading

Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>
@temeddix temeddix changed the title Solved Problem with Non-alphabetical .gitignore Files Solved Problem with Non-ASCII .gitignore Files May 14, 2021
@temeddix
Copy link
Contributor Author

@temeddix temeddix commented May 14, 2021

I've also updated the title of this thread :)

Loading

@ambv
Copy link
Collaborator

@ambv ambv commented May 16, 2021

Note that this is Windows-specific.

Loading

CHANGES.md Outdated Show resolved Hide resolved
Loading
Co-authored-by: Richard Si <63936253+ichard26@users.noreply.github.com>
@ichard26
Copy link
Collaborator

@ichard26 ichard26 commented May 23, 2021

Any objections? While I'm still not 100% sure this is the right move, I won't block the merge and I'll merge sometime soon (maybe in a few hours, although I'll probably forget till tomorrow).

Oh @temeddix, by the way, you can use special keywords in the PR description so GitHub will automatically close an issue upon merge. You can read more here: https://docs.github.com/en/github/managing-your-work-on-github/linking-a-pull-request-to-an-issue#linking-a-pull-request-to-an-issue-using-a-keyword. This makes sure any issue that should be closed are in fact closed (and I won't have to go proactively hunt for such issues!), keeping our very chaotic issue tracker ever so slightly less chaotic and scary :)

Loading

Copy link
Collaborator

@cooperlees cooperlees left a comment

Makes sense to me to get consistency.

Loading

@ichard26
Copy link
Collaborator

@ichard26 ichard26 commented May 24, 2021

Given that all of the active maintainers don't have any objections, merging (and look I did not forget about this lol).

Loading

@ichard26 ichard26 merged commit 3759b85 into psf:main May 24, 2021
33 checks passed
Loading
@ichard26
Copy link
Collaborator

@ichard26 ichard26 commented May 24, 2021

Thank you so much for your contribution! This project is only possible by contributions like these 🖤. You're awesome, @temeddix. This will definitely unblock certain Windows users from using Black which is always a good thing. FYI, I'm in the process of trying to improve the contributing experience but I need feedback, so it would be great if you take some time and comment about it, more details can be found here: GH-2238

Loading

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

5 participants