A lot of warnings for mo files. #11
Comments
Technically pygments identifies mo files as Modelica source code:

    >>> import pygments.lexers
    >>> lexer = pygments.lexers.get_lexer_for_filename('some.mo')
    >>> lexer.name
    'Modelica'

Automatically excluding binary files from analysis makes sense in theory. However, detecting binaries is nontrivial. The most common approach seems to be checking for zero bytes, as used by Subversion and gitattributes. The actual code from git is:

    #define FIRST_FEW_BYTES 8000
    int buffer_is_binary(const char *ptr, unsigned long size)
    {
        if (FIRST_FEW_BYTES < size)
            size = FIRST_FEW_BYTES;
        return !!memchr(ptr, 0, size);
    }

In the case of pygments it would make sense to treat files with a UTF-16 or UTF-32 header (BOM) as text despite the many zero bytes in them.
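To illustrate why the BOM matters, here is a small Python sketch that ports the zero-byte check above and applies it to UTF-16 text. It is not git's or pygments' actual code; the name `looks_binary_like_git` and the sample string are made up for this example.

```python
FIRST_FEW_BYTES = 8000  # same limit as in the git snippet above


def looks_binary_like_git(data: bytes) -> bool:
    """Return True if a zero byte occurs within the first 8000 bytes."""
    return b"\0" in data[:FIRST_FEW_BYTES]


# The same (hypothetical) source line in two encodings.
utf8_text = "model Example end Example;".encode("utf-8")
utf16_text = "model Example end Example;".encode("utf-16")  # starts with a BOM

print(looks_binary_like_git(utf8_text))   # False
print(looks_binary_like_git(utf16_text))  # True, although it is plain text
```

Every ASCII character in UTF-16 carries a zero byte, so the plain heuristic alone would misclassify such files as binary.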
Nice analysis! Looks like diff from diffutils 3.3 does the same:
So diff reports them as binary files:
Looks like your code already sniffs the BOM, so just skipping files with zeros after BOM detection should be enough?
Added detection of binary files and excluded them from the analysis. In particular, compiled gettext message catalogs (``*.mo``), as used by Django, are no longer considered Modelica source code.
I implemented the proposed solution in v0.9 (check for a BOM first, then for zero bytes within the initial 8K). It's already available from PyPI so you can give it a try.
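For readers who want to see the idea in code, the approach described above (BOM first, then zero bytes in the initial 8K) could look roughly like the following sketch. This is an illustration only, not the code released in v0.9; the names `is_probably_binary` and `is_probably_binary_file` are hypothetical, and the 8000-byte limit is borrowed from the git snippet.

```python
import codecs

# BOMs that mark data as text even if it contains zero bytes.
_TEXT_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
)


def is_probably_binary(data: bytes, limit: int = 8000) -> bool:
    """Heuristic: a BOM means text; otherwise any zero byte means binary."""
    head = data[:limit]
    if head.startswith(_TEXT_BOMS):
        return False
    return b"\0" in head


def is_probably_binary_file(path: str, limit: int = 8000) -> bool:
    """Apply the heuristic to the first `limit` bytes of a file."""
    with open(path, "rb") as file:
        return is_probably_binary(file.read(limit), limit)


# UTF-16 text is full of zero bytes but starts with a BOM.
print(is_probably_binary("model Example end Example;".encode("utf-16")))  # False
# Bytes shaped like the start of a compiled gettext catalog (assumed sample).
print(is_probably_binary(b"\xde\x12\x04\x95\x00\x00\x00\x00"))  # True
```

The heuristic can still be fooled, for example by UTF-16 text without a BOM, which it would classify as binary.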
I think that mo files should just be ignored as they are not plain text, while I'm getting loads of: