Improve binary file check #205

Closed
rasa opened this issue Jun 19, 2022 · 3 comments
Labels
feature a feature which should be implemented help wanted

Comments

@rasa
Contributor

rasa commented Jun 19, 2022

Currently, we determine whether a file is binary by sniffing the content type from the first 512 bytes and checking whether it's text/ or application/octet-stream. This produces lots of false positives, such as UTF-16 files being treated as binary.

I think a better solution would be to determine the charset and then check for binary characters. I found that the tool dos2unix uses the characters \x00-\x08, \x0b, and \x0e-\x1f to determine if a file is binary. This is a very widely used tool, and this check seems reasonable. Thoughts?
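
To make the proposal concrete, here is a rough sketch of what such a check could look like in Go. It is not the project's actual code: the names `IsBinary`, `decodeText`, and `isBinaryRune` are hypothetical, and the charset step is reduced to recognizing a UTF-16 byte order mark, which is an assumption made only to keep the example short. Real charset detection would need to handle more cases, such as UTF-16 without a BOM.

```go
package main

import (
	"encoding/binary"
	"unicode/utf16"
)

// isBinaryRune reports whether r is one of the control characters
// listed above: \x00-\x08, \x0b, or \x0e-\x1f.
// Tab (\x09), LF (\x0a), FF (\x0c), and CR (\x0d) are not in the set.
func isBinaryRune(r rune) bool {
	return (r >= 0x00 && r <= 0x08) || r == 0x0b || (r >= 0x0e && r <= 0x1f)
}

// decodeText turns raw bytes into runes before the check runs.
// Assumption: only a UTF-16 byte order mark is recognized here.
// Anything else is inspected byte by byte, which is enough for ASCII
// and UTF-8 because their multi-byte sequences never use bytes below 0x80.
func decodeText(data []byte) []rune {
	if len(data) >= 2 {
		var order binary.ByteOrder
		switch {
		case data[0] == 0xff && data[1] == 0xfe:
			order = binary.LittleEndian
		case data[0] == 0xfe && data[1] == 0xff:
			order = binary.BigEndian
		}
		if order != nil {
			body := data[2:] // skip the BOM itself
			units := make([]uint16, 0, len(body)/2)
			for i := 0; i+1 < len(body); i += 2 {
				units = append(units, order.Uint16(body[i:]))
			}
			return utf16.Decode(units)
		}
	}
	runes := make([]rune, len(data))
	for i, b := range data {
		runes[i] = rune(b)
	}
	return runes
}

// IsBinary reports whether the decoded content contains any of the
// control characters that the proposal treats as a sign of binary data.
func IsBinary(data []byte) bool {
	for _, r := range decodeText(data) {
		if isBinaryRune(r) {
			return true
		}
	}
	return false
}
```

With this sketch, a UTF-16 file that starts with a BOM is decoded first, so the zero bytes that pad its ASCII characters no longer count as binary, while a stray \x00 in an undecoded file still does.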

mstruebing added the help wanted and feature (a feature which should be implemented) labels Jun 20, 2022
@mstruebing
Member

Sounds good. I know that this part of the application is actually a weak point, so improving it in any way would be great.

@rasa
Contributor Author

rasa commented Jun 20, 2022

OK, I will see if I can craft a PR that addresses this. Kinda a pain point for us. Thanks for being open to this.

@mstruebing
Member

Thank you very much. A PR would be really great, as I am no longer able to spend as much time on this tool as it sometimes deserves 👍

rasa added a commit to rasa/editorconfig-checker that referenced this issue Jun 26, 2022
rasa added a commit to rasa/editorconfig-checker that referenced this issue Jun 27, 2022