lint-commit-messages: count multibyte characters intelligently #44

Merged (1 commit) on Jun 16, 2021

davidchambers (Member)

If LC_CTYPE is not set appropriately, grep miscounts the characters in this commit message:

$ echo 'provide custom ‘inspect’ behaviour in Node without using ‘require’' | LC_CTYPE=POSIX grep "^.\\{74\\}$"
provide custom ‘inspect’ behaviour in Node without using ‘require’

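The commit message actually contains 66 characters rather than 74. Each of its four curly quotes occupies three bytes in UTF-8, and in the POSIX locale grep counts every byte as a character. With multibyte characters counted correctly: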

$ echo 'provide custom ‘inspect’ behaviour in Node without using ‘require’' | LC_CTYPE="$(locale -a | grep '^C[.]UTF-8$' || echo "$LC_CTYPE")" grep "^.\\{66\\}$"
provide custom ‘inspect’ behaviour in Node without using ‘require’
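
The difference between 74 and 66 can be checked without depending on any locale. The following is a minimal sketch, not part of this pull request: it assumes the script is saved as UTF-8 and that the curly quotes are the only non-ASCII characters in the message. The variable names are illustrative.

```shell
msg='provide custom ‘inspect’ behaviour in Node without using ‘require’'

# wc -c counts bytes regardless of locale.
bytes=$(printf '%s' "$msg" | wc -c | tr -d ' ')

# Delete every ASCII byte; what remains are the bytes of the multibyte characters.
non_ascii=$(printf '%s' "$msg" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')

# Assumption: every non-ASCII character here is a 3-byte curly quote.
chars=$(( bytes - non_ascii + non_ascii / 3 ))

echo "$bytes bytes, $chars characters"
# prints "74 bytes, 66 characters"
```

The eight-byte gap is exactly two extra bytes for each of the four quotes, which is why the 74 pattern matches under LC_CTYPE=POSIX.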

The fix provided in this pull request is to specify LC_CTYPE=C.UTF-8 if C.UTF-8 is one of the available locales.
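
The approach can be sketched as a pair of shell functions. This is an illustration under stated assumptions rather than the code merged in this pull request: `pick_ctype` and `subject_fits` are hypothetical names, and the limit of 72 in the usage line is an arbitrary example value.

```shell
# Hypothetical helper: print C.UTF-8 if `locale -a` lists it,
# otherwise print the current value of LC_CTYPE (possibly empty).
pick_ctype() {
  locale -a 2>/dev/null | grep '^C[.]UTF-8$' || printf '%s\n' "${LC_CTYPE:-}"
}

# Hypothetical helper: succeed if subject line $1 has at most $2 characters.
# Note: LC_ALL, if set in the environment, takes precedence over LC_CTYPE.
subject_fits() {
  printf '%s\n' "$1" | LC_CTYPE="$(pick_ctype)" grep -q "^.\\{0,$2\\}\$"
}

if subject_fits 'fix typo' 72; then
  echo 'subject OK'
fi
```

On a system without C.UTF-8 the function falls back to whatever LC_CTYPE already holds, so the count is only as accurate as the existing configuration.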

Another solution would be to encourage users to set LC_CTYPE locally and on their continuous integration servers, but I think we should make an effort to count characters correctly no matter how the system is configured.

davidchambers requested a review from a team June 16, 2021 16:10
davidchambers merged commit b2f74c4 into master Jun 16, 2021
davidchambers deleted the davidchambers/encoding branch June 16, 2021 16:47