Force UTF-8 for filenames in breakdown analysis #4465
I'm not sure if this issue is connected with the encoding errors I get on OpenBSD. Here's what $ type brake
brake is aliased to `LANG=en_AU.UTF-8 bundle exec rake' This seems to fix the encoding errors. I'm still not sure what's going on, or why this issue only happens on OpenBSD...
@tenderlove If you've got a free moment, I'd like your thoughts on this fix based on your comment at #4028 (comment), hence I've requested a review from you.
This looks good to me. As I said in a comment, we probably want to scrub the string after force encoding, but this will work.
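A minimal sketch of the "scrub after force encoding" idea mentioned above. The helper name `utf8_filename` and the sample filenames are hypothetical and are not taken from Linguist's code; the sketch only illustrates how `String#force_encoding` and `String#scrub` behave together.

```ruby
# Hypothetical helper (not Linguist's actual code): relabel a filename's bytes
# as UTF-8, then replace any byte sequences that are not valid UTF-8.
def utf8_filename(name)
  # dup so the caller's string is untouched; force_encoding only changes the
  # encoding tag, it does not convert or validate the bytes.
  name.dup.force_encoding(Encoding::UTF_8).scrub
end

raw = "caf\xC3\xA9.txt".b   # valid UTF-8 bytes, but tagged ASCII-8BIT
bad = "bad\xFFname.txt".b   # \xFF can never appear in valid UTF-8

clean_raw = utf8_filename(raw)
clean_bad = utf8_filename(bad)

puts clean_raw.encoding          # UTF-8
puts clean_bad.valid_encoding?   # true, the invalid byte was replaced
```

Without the `scrub` step, the second string would be tagged UTF-8 but still report `valid_encoding?` as false, which is the case the review comment is guarding against.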
@Alhadis the I hope that helps!
Ah, that would make sense. Still, isn't there any way we could detect problematic
🤔 this is more of a user env issue than a Linguist issue. We could put some sort of locale check at the beginning of abort "A UTF-8 compatible locale is required to run github-linguist" unless Encoding.default_external.to_s =~ /UTF-8/ But we may need to think carefully about this. Definitely a topic for another issue, though you're the only person I know who's hit this issue 😉.
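The locale check floated above could be sketched roughly as follows. The method names `utf8_locale?` and `require_utf8_locale!` are hypothetical, and where such a check would actually live in Linguist is left open, as in the comment itself.

```ruby
# Hypothetical predicate: true when the given encoding's name mentions UTF-8.
# Defaults to Ruby's external encoding, which is derived from the locale
# (LANG / LC_ALL) at startup.
def utf8_locale?(encoding = Encoding.default_external)
  encoding.name.match?(/UTF-8/)
end

# Hypothetical guard mirroring the one-liner in the comment above.
def require_utf8_locale!(encoding = Encoding.default_external)
  return if utf8_locale?(encoding)
  abort "A UTF-8 compatible locale is required to run github-linguist"
end

# A script would call the guard once, early on:
#   require_utf8_locale!
```

Taking the encoding as a parameter is only a convenience for trying the check without changing the process locale.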
Is it? Every time I see that warning emitted by In any case, I've grown so used to running my ... but don't blame me though, there's no Docker port for OpenBSD, and I barely know a single damn thing about Docker. 😢
Oh, I get those too and everything I do is in UTF-8 😀
I know, I see the warnings emitted on TravisCI as well. When I did some digging, I remember reading something about it deferring to the first locale it could find in the site's ANYWAYYYYY, don't mind me. :D I'm happy with my Perl and JavaScript, where locale is more sane than, erm, Ruby's. 😀 (I also realised I've been side-tracked the whole day and haven't finished tending to #2988 yet. Time to get another oldie off our lingering issues list. 👍)
As reported in #4028, the JSON output for repos which contain filenames with Unicode characters fails with an encoding error like this:
Performing a breakdown analysis behind the scenes on GitHub.com will result in a 500 error there too.
As pointed out by @tenderlove:
... and he proposed two possible solutions. I've gone for the latter in this PR and am forcing UTF-8 on all filenames when performing a breakdown.
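As a rough illustration of forcing UTF-8 on filenames before producing JSON, here is a small sketch. The `breakdown` hash and its contents are made up for the example, and the code is not the change made in this PR; it only assumes that filenames arrive as byte strings tagged ASCII-8BIT.

```ruby
require "json"

# Hypothetical breakdown: language name => filenames as raw byte strings.
breakdown = { "Ruby" => ["caf\xC3\xA9.rb".b, "plain.rb".b] }

# Relabel every filename as UTF-8 before serialising. force_encoding changes
# only the encoding tag, so the underlying bytes are left as they are.
utf8_breakdown = breakdown.transform_values do |files|
  files.map { |file| file.dup.force_encoding(Encoding::UTF_8) }
end

json = JSON.generate(utf8_breakdown)
puts json
```

Once the names are tagged UTF-8 and contain valid UTF-8 bytes, the JSON generator can serialise them without attempting a conversion from a binary encoding.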
@tenderlove do you see any issues with this approach? Seems too easy 😉.
Now analysis of my example produces:
I've added a new fixture file to the test/attributes branch in #4464 (will need to change the SHA in this PR when that PR has been merged) in order to test this encoding enforcement. I've gone with this as we don't care about the content of the file and can't add it to the samples as the content is deliberately descriptive about the purpose of the file. As test/fixtures is vendored by default, the normal testing wouldn't pick up the file, so I've piggy-backed onto the repo tests when we un-vendor test/fixtures.
If the enforcement doesn't happen, the test fails as such:
Fixes #4028