Delete primary_extension from language data #985
|
Thanks @nox |
|
Btw, why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line? |
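For illustration, a minimal sketch of what such a check could look like with Ruby's standard `yaml` library. It assumes the pedantic tests verify ordering (here, that each language's `extensions` list is sorted); the method name and the sample entries are hypothetical and not taken from Linguist.

```ruby
require 'yaml'

# Returns the names of languages whose extension list is not sorted.
# Assumption: each entry may carry an "extensions" list, as in the thread.
def languages_with_unsorted_extensions(yaml_text)
  index = YAML.safe_load(yaml_text)
  index.select do |_name, attrs|
    exts = attrs.fetch('extensions', [])
    exts != exts.sort
  end.keys
end

# Illustrative entries only, not copied from Linguist's real index.
sample = <<~EOS
  Mercury:
    extensions:
    - .m
    - .moo
  Ruby:
    extensions:
    - .rb
    - .gemspec
EOS

p languages_with_unsorted_extensions(sample)
```

Parsing the whole document first means the check works on real data structures, so indentation or comment changes in the index cannot confuse it the way line-by-line reading can.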
|
"…why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line." Because actual software engineering takes time away from twirling lariats, barrel races and steer wrestling. |
|
You've kept the concept of "one of the file extensions is special", but moved it from being explicit in a separate property to being implicit at the top of the list. I would suggest keeping the primary_extension property but removing the constraint that it must be unique, or removing the concept of a favoured extension entirely. I'm not familiar enough with the rest of the codebase to suggest which one would be better. |
Of course I kept it! But the order is irrelevant from Linguist's point of view. This property is special because some closed source code at GitHub depends on it, how could I handle such code without looking at it? |
|
👍 |
|
It sounds like keeping it is a good idea (it may help gist etc) but while |
|
Ping? |
|
Can we have an answer? Could GitHub get its act together and actually maintain their FOSS projects properly? |
|
Ping? |
64 bytes from github: time=16 days
We're working on getting a team together to be more responsive to this project, but please cut us a little slack. We have a lot of FOSS and not a lot of resources to keep on top of it. We're all just trying to do our best.
I don't see any tests to show that it fixes this, which makes me a little nervous. It also makes me nervous that it has other adverse effects. Traditionally, changes to linguist have been really difficult for us to handle, because improvements for detection of one language usually come at the cost of another language. That makes some people really happy and others really upset. So we all need to figure out ways to do the classification even better. |
What? All the tests still run, as shown here by Travis. Among them, there is a test that uses As for any adverse effects in your own closed code, I can't really do anything about them if I don't get any answer about what they could be. |
Maybe I misunderstood. Does this fix the detection of Mercury?
There's not really any other closed source code to go along with this. We use linguist directly. The problem is just when we change the heuristics, it affects the way repositories are classified. As I said, this makes some people happy and others upset. So we just want to be very diligent about changes. |
Did you actually read the comments in the linked PR? |
|
I guess you didn't. |
|
Dear Brandon @bkeepers,
Are you sure this is a problem? I mean all the tests are |
|
I want to start with clarifying that this is not a dispute at all. I just want to make sure I understand. I like you all and want to make everything better. Now, to clarify…
Yes. What I'm confused about is, does this PR actually fix it for Mercury? Or is it purely to pave the way for a future fix? If it's fixing something, I'd love to see a test changed so we can be sure we don't break it in the future. |
|
It fixes Linguist's broken design which relies on a primary extension unique per language. The very reason the other PR was rejected more or less as a WONTFIX. What are you confused about in this? |
|
I'm confused about why you haven't responded to their point about adding a test for the fix. |
I'm confused as to what you don't understand in the following sentence:
You can also explain to me why you people are calling this a fix when this is a design improvement that happens to make primary extensions' conflicts disappear. See @arfon's comment on the other PR:
What I remove is a "constraint of the implementation", also known as a "design bug". |
|
How do I test Mercury's detection when there is no Mercury thing in Linguist yet because there can't be without that branch? |
|
Ping? |
|
Hey @nox, I just wrote a response to @DX-MON over here that I think is relevant for this thread. While it's possible to remove the Our focus over the past couple of weeks has been to triage outstanding pull requests that are a straightforward merge (new languages etc.); this one is less straightforward, I'm afraid. |
I see you are still not admitting that the classification is already broken and that my patch can't make it worse. Still not admitting that the primary extension lookup does not shortcut anything. You are afraid to admit that this PR is just as straightforward, that's all. |
As an aside, I greatly appreciate this! |
|
Thanks @joneshf 😄 |
|
That patch can't misclassify anything because it doesn't change the list of looked-up languages for any file extension. |
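A rough sketch of that claim, using hypothetical data and helper names rather than Linguist's actual code: if the former primary extension is simply moved to the front of the extensions list, the set of candidate languages per extension stays the same. The comparison below uses sets, so it deliberately ignores ordering.

```ruby
require 'set'

# Old shape: candidates come from the primary extension plus the other extensions.
def old_candidates(data)
  index = {}
  data.each do |name, attrs|
    ([attrs[:primary]] + attrs[:extensions]).each do |ext|
      (index[ext] ||= Set.new) << name
    end
  end
  index
end

# New shape: candidates come from a single extensions list.
def new_candidates(data)
  index = {}
  data.each do |name, exts|
    exts.each { |ext| (index[ext] ||= Set.new) << name }
  end
  index
end

# Hypothetical migration: the former primary extension becomes the first list entry.
def merge_primary(data)
  data.transform_values { |attrs| [attrs[:primary]] + attrs[:extensions] }
end

# Illustrative data only, not Linguist's real index.
old_shape = {
  'Objective-C' => { primary: '.m', extensions: ['.h'] },
  'Matlab'      => { primary: '.matlab', extensions: ['.m'] }
}

puts old_candidates(old_shape) == new_candidates(merge_primary(old_shape))
```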
|
So I've written a small script that outputs the list of candidate languages for all extensions known by Linguist:

    require 'linguist'

    # Print every known extension followed by the languages Linguist considers for it.
    Linguist::Language.all
      .map { |lang| [lang.primary_extension].concat(lang.extensions) }
      .flatten.sort.uniq
      .each do |ext|
        print ext, ' -> ',
              Linguist::Language.find_by_filename("foo#{ext}")
                .map { |lang| lang.name }.join(', '),
              "\n"
      end

Here are the differences between the two lists with and without

I don't think checking these corner cases is as difficult a job as you claim it to be. And if such order differences do indeed change the classification of some projects, it just points out the existence of another design fault. |
The language attribute is still maintained as the first extension found. This allows Mercury to be properly detected by Linguist, as per github-linguist#748.
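As a sketch of the idea only (the `LanguageEntry` struct and the extension lists below are hypothetical, not Linguist's real class or data), deriving the attribute from the first listed extension might look like this:

```ruby
# Hypothetical stand-in for a language entry; Linguist's real class differs.
LanguageEntry = Struct.new(:name, :extensions) do
  # The "primary" extension is simply whichever extension is listed first.
  def primary_extension
    extensions.first
  end
end

# Illustrative extension lists, not Linguist's actual data.
objective_c = LanguageEntry.new('Objective-C', ['.m', '.h'])
mercury     = LanguageEntry.new('Mercury', ['.m', '.moo'])

# Both entries can report the same primary extension; nothing enforces uniqueness.
puts objective_c.primary_extension == mercury.primary_extension
```

Because the value is derived rather than declared, two languages can share it without any uniqueness constraint being involved.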
|
@sebgod Rebased it as you requested. Weird that your comment disappeared. All tests pass, all those which pass currently on master that is. |
|
@nox, my comment appeared twice, so I deleted one, seems like that deleted both. Thanks a lot :) |