Delete primary_extension from language data by nox · Pull Request #985 · github-linguist/linguist

nox · 2014-03-13T00:22:01Z

The language attribute is still maintained as the first extension found.

This allows Mercury to be properly detected by Linguist, as per #748.

PaulBone · 2014-03-13T01:12:11Z

Thanks @nox

nox · 2014-03-13T01:13:48Z

Btw, why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line?

ghost · 2014-03-13T01:41:39Z

"…why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line."

Because actual software engineering takes time away from twirling lariats, barrel races and steer wrestling.

aJanuary · 2014-03-14T12:01:28Z

You've kept the concept of "one of the file extensions is special", but moved it from being explicit in a separate property to being implicit at the top of the list.

I would suggest keeping the primary_extension property but removing the constraint that it must be unique, or removing the concept of a favoured extension entirely. I'm not familiar enough with the rest of the codebase to suggest which one would be better.

nox · 2014-03-14T12:04:16Z

You've kept the concept of "one of the file extensions is special", but moved it from being explicit in a separate property to being implicit at the top of the list.

Of course I kept it! But the order is irrelevant from Linguist's point of view. This property is special because some closed source code at GitHub depends on it, how could I handle such code without looking at it?

joneshf · 2014-03-14T13:15:40Z

👍

PaulBone · 2014-03-15T01:08:49Z

It sounds like keeping it is a good idea (it may help gist etc) but while
removing the uniqueness requirement.

nox · 2014-03-20T09:38:28Z

Ping?

nox · 2014-03-31T09:44:54Z

Can we have an answer?

Could GitHub get its act together and actually maintain their FOSS projects properly?

nox · 2014-04-16T07:46:59Z

Ping?

bkeepers · 2014-04-16T13:07:20Z

Ping?

64 bytes from github: time=16 days

Could GitHub get its act together and actually maintain their FOSS projects properly?

We're working on getting a team together to be more responsive to this project, but please cut us a little slack. We have a lot of FOSS and not a lot of resources to keep on top of it. We're all just trying to do our best.

This allows Mercury to be properly detected by Linguist, as per #748.

I don't see any tests to show that it fixes this, which makes me a little nervous. It also makes me nervous that it has other adverse effects.

Traditionally, changes to linguist have been really difficult for us to handle, because improvements for detection of one language usually come at the cost of another language. That makes some people really happy and others really upset. So we all need to figure out ways to do the classification even better.

nox · 2014-04-16T13:10:55Z

I don't see any tests to show that it fixes this, which makes me a little nervous. It also makes me nervous that it has other adverse effects.

What? All the tests still run, as shown here by Travis. Among them, there is a test that uses @primary_extension. Please tell me which test to add for you to be satisfied and I will do it.

As for any adverse effects in your own closed code, I can't really do anything about them if I don't get any answer about what they could be.

bkeepers · 2014-04-16T13:28:54Z

Please tell me which test to add for you to be satisfied and I will do it.

Maybe I misunderstood. Does this fix the detection of Mercury?

As for any adverse effects in your own closed code, I can't really do anything about them if I don't get any answer about what they could be.

There's not really any other closed source code to go along with this. We use linguist directly. The problem is just when we change the heuristics, it affects the way repositories are classified. As I said, this makes some people happy and others upset.

So we just want to be very dilligent changes.

nox · 2014-04-16T13:29:38Z

Maybe I misunderstood. Does this fix the detection of Mercury?

Did you actually read the comments in the linked PR?

nox · 2014-04-18T08:03:47Z

I guess you didn't.

sebgod · 2014-04-18T14:49:09Z

Dear Brandon @bkeepers,
you wrote

Traditionally, changes to linguist have been really difficult for us to handle, because improvements for detection of one language usually come at the cost of another language.

Are you sure this is a problem? I mean all the tests are passing in Travis CI, AFAIK all languages are being recognised properly (if the tests are reasonably complete)
In my PR #1049 (based on #748 @PaulBone ) pull request I added enough samples to disambiguate all potential conflicts I could think off.
As I merged the PR with the current upstream/master, the Travis CI is passing, all Obj-C, Moocode, Mercury is properly recognised, I'd propose that you'd first accept my PR if you find it appropriate (the changes are non-intrusive to your detection logic and you mentioned that resources are sparse at the moment), and if all goes well after some testing you merge this PR #985 by @nox once you have finished testing.
I hope then we can finally settle this dispute 👍

bkeepers · 2014-04-18T17:09:12Z

I want to start with clarifying that this is not a dispute at all. I just want to make sure I understand. I like you all and want to make everything better.

Now, to clarify…

Did you actually read the comments in the linked PR?

Yes. What I'm confused about is, does this PR actually fix it for Mercury? Or is it purely to pave the way for a future fix? If it's fixing something, I'd love to see a test changed so we can be sure we don't break it in the future.

nox · 2014-04-18T17:14:33Z

It fixes Linguist's broken design which relies on a primary extension unique per language. The very reason the other PR was rejected more or less as a WONTFIX. What are you confused about in this?

aJanuary · 2014-04-19T09:29:46Z

I'm confused about why you haven't responded to their point about adding a test for the fix.

nox · 2014-04-19T11:40:29Z

@aJanuary

I'm confused about why you haven't responded to their point about adding a test for the fix.

I'm confused as to what you don't understand in the following sentence:

Please tell me which test to add for you to be satisfied and I will do it.

You can also explain to me why you people are calling this a fix when this is a design improvement that happens to make primary extensions' conflicts disappear.

See @arfon's comment on the other PR:

We're working here within the constraints of the implementation of Linguist I'm afraid.

What I remove is a "constraint of the implementation", also known as a "design bug".

nox · 2014-04-19T11:41:39Z

How do I test Mercury's detection when there is no Mercury thing in Linguist yet because there can't be without that branch?

nox · 2014-05-01T13:25:02Z

Ping?

arfon · 2014-05-01T16:39:36Z

Hey @nox, I just wrote a response to @DX-MON over here that I think is relevant for this thread.

While it's possible remove the primary_extension and keep the current test suite passing before we can merge this we would like to assess the impact of this change across a large number of repos. Setting up an environment to assess these changes is going to take us a while longer sorry.

Our focus over the past couple of weeks has been to triage outstanding pull requests that are a straightforward merge (new languages etc), this one is less straightforward I'm afraid.

nox · 2014-05-01T16:41:54Z

While it's possible remove the primary_extension and keep the current test suite passing before we can merge this we would like to assess the impact of this change across a large number of repos. Setting up an environment to assess these changes is going to take us a while longer sorry.

I see you are still not admitting that the classification is already broken and that my patch can't make it worse. Still not admitting that the primary extension lookup does not shortcut anything.

You are afraid to admit that this PR is just as straightforward, that's all.

joneshf · 2014-05-01T16:47:45Z

Our focus over the past couple of weeks has been to triage outstanding pull requests that are a straightforward merge (new languages etc)

As an aside, I greatly appreciate this!

arfon · 2014-05-01T16:48:19Z

Thanks @joneshf 😄

nox · 2014-05-01T16:51:00Z

That patch can't misclassify anything because it doesn't change the list of looked-up languages for any file extension.

nox · 2014-05-01T17:31:58Z

So I've written a small script that outputs the list of candidate languages for all extensions known by Linguist:

require 'linguist'

Linguist::Language.all
    .map {| name| [name.primary_extension].concat(name.extensions) }
    .flatten.sort.uniq
        .each do |ext|
            print ext, ' -> ',
                Linguist::Language.find_by_filename("foo#{ext}")
                    .map { |lang| lang.name }.join(', '),
                "\n"
        end

Here are the differences between the two lists with and without primary_extension:

$ diff -u with*
--- with-primary-extension  2014-05-01 19:23:34.000000000 +0200
+++ without-primary-extension   2014-05-01 19:23:44.000000000 +0200
@@ -87,7 +87,7 @@
 .chs -> C2hs Haskell
 .cirru -> Cirru
 .ck -> ChucK
-.cl -> OpenCL, Common Lisp
+.cl -> Common Lisp, OpenCL
 .cl2 -> Clojure
 .clixml -> XML
 .clj -> Clojure
@@ -109,7 +109,7 @@
 .cppobjdump -> Cpp-ObjDump
 .cproject -> XML
 .cpy -> COBOL
-.cr -> Crystal, Cirru
+.cr -> Cirru, Crystal
 .creole -> Creole
 .cs -> C#
 .csh -> Tcsh
@@ -301,7 +301,7 @@
 .ltx -> TeX
 .lua -> Lua
 .ly -> LilyPond
-.m -> Objective-C, M, Matlab
+.m -> M, Matlab, Objective-C
 .mak -> Makefile
 .mako -> Mako
 .man -> Groff
@@ -398,7 +398,7 @@
 .pm6 -> Perl6
 .pmod -> Pike
 .po -> Gettext Catalog
-.pod -> Pod, Perl
+.pod -> Perl, Pod
 .podsl -> Common Lisp
 .podspec -> Ruby
 .pogo -> PogoScript
@@ -508,7 +508,7 @@
 .sublime_metrics -> JSON
 .sublime_session -> JSON
 .svg -> XML
-.t -> Turing, Perl
+.t -> Perl, Turing
 .tcc -> C++
 .tcl -> Tcl
 .tcsh -> Tcsh
@@ -536,7 +536,7 @@
 .ui -> XML
 .upc -> Unified Parallel C
 .urdf -> XML
-.v -> Verilog, Coq
+.v -> Coq, Verilog
 .vala -> Vala
 .vapi -> Vala
 .vark -> Gosu

I don't think checking these corner cases is such a difficult job you claim it to be. And if such order differences do indeed change the classification of some projects, it just points out the existence of another design fault.

The language attribute is still maintained as the first extension found. This allows Mercury to be properly detected by Linguist, as per github-linguist#748.

nox · 2014-05-01T19:42:24Z

@sebgod Rebased it as you requested. Weird that your comment disappeared.

All tests pass, all those which pass currently on master that is.

$ git ls-files samples/{Mercury,M,Objective-C,Mathematica} | xargs -n 1 ruby -I lib bin/linguist

sebgod · 2014-05-01T19:44:07Z

@nox, my comment appeared twice, so I deleted one, seems like that deleted both. Thanks a lot :)

nox mentioned this pull request Mar 14, 2014

Add the Mercury language to linguist #748

Merged

nox mentioned this pull request Mar 18, 2014

MATLAB uses .m files not .matlab. #989

Merged

bkeepers added the Triage label Apr 21, 2014

sebgod mentioned this pull request Apr 24, 2014

Mercury noconflict #1049

Merged

Delete primary_extension from language data

2d82071

The language attribute is still maintained as the first extension found. This allows Mercury to be properly detected by Linguist, as per github-linguist#748.

arfon merged commit 2d82071 into github-linguist:master May 5, 2014

bkeepers mentioned this pull request May 16, 2014

Groovy: Add .gvy, .grt, and .gtpl to the list of extensions #1188

Merged

nox deleted the rm-primary_extension branch March 24, 2016 14:03

PaulBone mentioned this pull request Apr 4, 2021

Plasma file extension clashes with P Language PlasmaLang/plasma#351

Closed

github-linguist locked as resolved and limited conversation to collaborators Jun 18, 2024

Conversation

nox commented Mar 13, 2014

Uh oh!

PaulBone commented Mar 13, 2014

Uh oh!

nox commented Mar 13, 2014

Uh oh!

ghost commented Mar 13, 2014

Uh oh!

aJanuary commented Mar 14, 2014

Uh oh!

nox commented Mar 14, 2014

Uh oh!

joneshf commented Mar 14, 2014

Uh oh!

PaulBone commented Mar 15, 2014

Uh oh!

nox commented Mar 20, 2014

Uh oh!

nox commented Mar 31, 2014

Uh oh!

nox commented Apr 16, 2014

Uh oh!

bkeepers commented Apr 16, 2014

Uh oh!

nox commented Apr 16, 2014

Uh oh!

bkeepers commented Apr 16, 2014

Uh oh!

nox commented Apr 16, 2014

Uh oh!

nox commented Apr 18, 2014

Uh oh!

sebgod commented Apr 18, 2014

Uh oh!

bkeepers commented Apr 18, 2014

Uh oh!

nox commented Apr 18, 2014

Uh oh!

aJanuary commented Apr 19, 2014

Uh oh!

nox commented Apr 19, 2014

Uh oh!

nox commented Apr 19, 2014

Uh oh!

nox commented May 1, 2014

Uh oh!

arfon commented May 1, 2014

Uh oh!

nox commented May 1, 2014

Uh oh!

joneshf commented May 1, 2014

Uh oh!

arfon commented May 1, 2014

Uh oh!

nox commented May 1, 2014

Uh oh!

nox commented May 1, 2014

Uh oh!

nox commented May 1, 2014

Uh oh!

sebgod commented May 1, 2014

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

7 participants