f6034b8
#435
Removing these samples effectively removes support for disambiguating `.pl` files, resulting in all Prolog code being misclassified as Perl code. Linguist cannot rely on an extension, `.prolog`, that no one uses. Nor does linguist/GitHub get to dictate to a programming language community what its primary extension is. Prolog was using `.pl` as the de facto extension 15 years before Perl appeared. If there are issues disambiguating Prolog code, they need to be solved by improving the disambiguation process, possibly by adding more code samples, not by eliminating them.
+1. Put the samples back.
Can we get a technical explanation about why these samples were removed?
+1. Put the samples back.
Ping?
@nox ping?
@luxe Hopefully someone is receiving a mail when we comment. Hence the ping.
++inbox;
Got it. I'm looking into it, but it'll probably be sometime next week before any progress is made.
Complaints about linguist misclassifying files are (relatively) common. This thread is just one example. A simple solution would be to allow linguist's guesses to be overridden by the contents of a file (say, `.github_file_extensions_map`) at the root of a repository. This file would provide a mapping between file name extensions and programming languages. This way, repository owners could fix these issues themselves, with linguist providing the default mappings. As a bonus, programming language statistics would become more reliable. I'm no Ruby programmer, so I cannot easily contribute such a feature, but I would expect it to be simple to implement.
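To make the proposal concrete, here is a rough Python sketch of the lookup, not linguist's actual code (which is Ruby). Everything in it is an assumption for illustration: the one-pair-per-line file format, the `parse_extension_map` and `classify` helpers, and the tiny `DEFAULT_LANGUAGES` table standing in for linguist's built-in guesses.

```python
from pathlib import PurePosixPath

# Hypothetical defaults standing in for linguist's built-in extension guesses.
DEFAULT_LANGUAGES = {".pl": "Perl", ".rb": "Ruby"}


def parse_extension_map(text):
    """Parse '<extension> <language>' lines; blank and '#' comment lines are skipped."""
    overrides = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # ignore malformed lines rather than fail
        extension, language = parts
        overrides[extension.lower()] = language.strip()
    return overrides


def classify(path, overrides, defaults=DEFAULT_LANGUAGES):
    """Return the repository override for the file's extension, else the default guess."""
    extension = PurePosixPath(path).suffix.lower()
    return overrides.get(extension, defaults.get(extension))


# Assumed contents of a repository-level map file.
overrides = parse_extension_map("# every .pl file here is Prolog\n.pl Prolog\n")
print(classify("library/lists.pl", overrides))  # Prolog
print(classify("tools/build.rb", overrides))    # Ruby
```

The repository file is consulted first and the defaults only fill the gaps, so a repository with no map file would behave exactly as it does today.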
Yep, see #777. I am definitely interested in that solution.
My understanding of #777 is that it only covers ignoring directories or files. A good idea, but I think it's orthogonal to my proposal, and the two features should not be mixed. E.g. my logtalk3 repository doesn't contain any Perl code, but it does contain a number of Prolog files. Simply being able to say that, for this repository, `.pl` means Prolog would allow it to be correctly classified. For that, and for most cases, a file at the root of the repository that linguist reads would be enough.
@bkeepers, @pmoura, I could imagine a `.language.yml` file that contains both custom "vendor" directories and a simple mapping from extension to the language name in github/linguist, e.g.:
PS: Some work has been done in #1023.
Yeah, sorry for my hastiness. #777 only describes half of the problem.
I still don't see an explanation about why the samples were removed in the first place.
@nox is right, @tnm did not bother to leave an explanation of why he removed the Prolog samples. It does seem a tad strange IMHO.
Put the samples back! I have a 2,700-byte Prolog file and it thinks the whole thing is Perl.
A solution is to add a `.gitattributes` file to the root of your repo with contents like:
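For example, assuming Linguist's documented `linguist-language` override attribute, a single line along these lines should mark every `.pl` file in the repository as Prolog. The `*.pl` pattern is an illustrative choice; adjust it if only some directories hold Prolog code.

```
# .gitattributes at the repository root
*.pl linguist-language=Prolog
```

The override applies to language statistics and syntax highlighting on GitHub once the file is committed and pushed.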