Skip to content

Delete primary_extension from language data#985

Merged
arfon merged 1 commit intogithub-linguist:masterfrom
nox:rm-primary_extension
May 5, 2014
Merged

Delete primary_extension from language data#985
arfon merged 1 commit intogithub-linguist:masterfrom
nox:rm-primary_extension

Conversation

@nox
Copy link
Contributor

@nox nox commented Mar 13, 2014

The language attribute is still maintained as the first extension found.

This allows Mercury to be properly detected by Linguist, as per #748.

@PaulBone
Copy link
Contributor

Thanks @nox

@nox
Copy link
Contributor Author

nox commented Mar 13, 2014

Btw, why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line?

@ghost
Copy link

ghost commented Mar 13, 2014

"…why aren't the pedantic tests parsing the languages index with an actual YAML parser instead of clumsily reading it line by line."

Because actual software engineering takes time away from twirling lariats, barrel races and steer wrestling.

@aJanuary
Copy link

You've kept the concept of "one of the file extensions is special", but moved it from being explicit in a separate property to being implicit at the top of the list.

I would suggest keeping the primary_extension property but removing the constraint that it must be unique, or removing the concept of a favoured extension entirely. I'm not familiar enough with the rest of the codebase to suggest which one would be better.

@nox
Copy link
Contributor Author

nox commented Mar 14, 2014

You've kept the concept of "one of the file extensions is special", but moved it from being explicit in a separate property to being implicit at the top of the list.

Of course I kept it! But the order is irrelevant from Linguist's point of view. This property is special because some closed source code at GitHub depends on it, how could I handle such code without looking at it?

@joneshf
Copy link

joneshf commented Mar 14, 2014

👍

@PaulBone
Copy link
Contributor

It sounds like keeping it is a good idea (it may help gist etc) but while
removing the uniqueness requirement.

@nox
Copy link
Contributor Author

nox commented Mar 20, 2014

Ping?

@nox
Copy link
Contributor Author

nox commented Mar 31, 2014

Can we have an answer?

Could GitHub get its act together and actually maintain their FOSS projects properly?

@nox
Copy link
Contributor Author

nox commented Apr 16, 2014

Ping?

@bkeepers
Copy link
Contributor

Ping?

64 bytes from github: time=16 days

Could GitHub get its act together and actually maintain their FOSS projects properly?

We're working on getting a team together to be more responsive to this project, but please cut us a little slack. We have a lot of FOSS and not a lot of resources to keep on top of it. We're all just trying to do our best.

This allows Mercury to be properly detected by Linguist, as per #748.

I don't see any tests to show that it fixes this, which makes me a little nervous. It also makes me nervous that it has other adverse effects.

Traditionally, changes to linguist have been really difficult for us to handle, because improvements for detection of one language usually come at the cost of another language. That makes some people really happy and others really upset. So we all need to figure out ways to do the classification even better.

@nox
Copy link
Contributor Author

nox commented Apr 16, 2014

I don't see any tests to show that it fixes this, which makes me a little nervous. It also makes me nervous that it has other adverse effects.

What? All the tests still run, as shown here by Travis. Among them, there is a test that uses @primary_extension. Please tell me which test to add for you to be satisfied and I will do it.

As for any adverse effects in your own closed code, I can't really do anything about them if I don't get any answer about what they could be.

@bkeepers
Copy link
Contributor

Please tell me which test to add for you to be satisfied and I will do it.

Maybe I misunderstood. Does this fix the detection of Mercury?

As for any adverse effects in your own closed code, I can't really do anything about them if I don't get any answer about what they could be.

There's not really any other closed source code to go along with this. We use linguist directly. The problem is just when we change the heuristics, it affects the way repositories are classified. As I said, this makes some people happy and others upset.

So we just want to be very dilligent changes.

@nox
Copy link
Contributor Author

nox commented Apr 16, 2014

Maybe I misunderstood. Does this fix the detection of Mercury?

Did you actually read the comments in the linked PR?

@nox
Copy link
Contributor Author

nox commented Apr 18, 2014

I guess you didn't.

@sebgod
Copy link
Contributor

sebgod commented Apr 18, 2014

Dear Brandon @bkeepers,
you wrote

Traditionally, changes to linguist have been really difficult for us to handle, because improvements for detection of one language usually come at the cost of another language.

Are you sure this is a problem? I mean all the tests are passing in Travis CI, AFAIK all languages are being recognised properly (if the tests are reasonably complete)
In my PR #1049 (based on #748 @PaulBone ) pull request I added enough samples to disambiguate all potential conflicts I could think off.
As I merged the PR with the current upstream/master, the Travis CI is passing, all Obj-C, Moocode, Mercury is properly recognised, I'd propose that you'd first accept my PR if you find it appropriate (the changes are non-intrusive to your detection logic and you mentioned that resources are sparse at the moment), and if all goes well after some testing you merge this PR #985 by @nox once you have finished testing.
I hope then we can finally settle this dispute 👍

@bkeepers
Copy link
Contributor

I want to start with clarifying that this is not a dispute at all. I just want to make sure I understand. I like you all and want to make everything better.

Now, to clarify…

Did you actually read the comments in the linked PR?

Yes. What I'm confused about is, does this PR actually fix it for Mercury? Or is it purely to pave the way for a future fix? If it's fixing something, I'd love to see a test changed so we can be sure we don't break it in the future.

@nox
Copy link
Contributor Author

nox commented Apr 18, 2014

It fixes Linguist's broken design which relies on a primary extension unique per language. The very reason the other PR was rejected more or less as a WONTFIX. What are you confused about in this?

@aJanuary
Copy link

I'm confused about why you haven't responded to their point about adding a test for the fix.

@nox
Copy link
Contributor Author

nox commented Apr 19, 2014

@aJanuary

I'm confused about why you haven't responded to their point about adding a test for the fix.

I'm confused as to what you don't understand in the following sentence:

Please tell me which test to add for you to be satisfied and I will do it.

You can also explain to me why you people are calling this a fix when this is a design improvement that happens to make primary extensions' conflicts disappear.

See @arfon's comment on the other PR:

We're working here within the constraints of the implementation of Linguist I'm afraid.

What I remove is a "constraint of the implementation", also known as a "design bug".

@nox
Copy link
Contributor Author

nox commented Apr 19, 2014

How do I test Mercury's detection when there is no Mercury thing in Linguist yet because there can't be without that branch?

@sebgod sebgod mentioned this pull request Apr 24, 2014
@nox
Copy link
Contributor Author

nox commented May 1, 2014

Ping?

@arfon
Copy link
Contributor

arfon commented May 1, 2014

Hey @nox, I just wrote a response to @DX-MON over here that I think is relevant for this thread.

While it's possible remove the primary_extension and keep the current test suite passing before we can merge this we would like to assess the impact of this change across a large number of repos. Setting up an environment to assess these changes is going to take us a while longer sorry.

Our focus over the past couple of weeks has been to triage outstanding pull requests that are a straightforward merge (new languages etc), this one is less straightforward I'm afraid.

@nox
Copy link
Contributor Author

nox commented May 1, 2014

While it's possible remove the primary_extension and keep the current test suite passing before we can merge this we would like to assess the impact of this change across a large number of repos. Setting up an environment to assess these changes is going to take us a while longer sorry.

I see you are still not admitting that the classification is already broken and that my patch can't make it worse. Still not admitting that the primary extension lookup does not shortcut anything.

You are afraid to admit that this PR is just as straightforward, that's all.

@joneshf
Copy link

joneshf commented May 1, 2014

Our focus over the past couple of weeks has been to triage outstanding pull requests that are a straightforward merge (new languages etc)

As an aside, I greatly appreciate this!

@arfon
Copy link
Contributor

arfon commented May 1, 2014

Thanks @joneshf 😄

@nox
Copy link
Contributor Author

nox commented May 1, 2014

That patch can't misclassify anything because it doesn't change the list of looked-up languages for any file extension.

@nox
Copy link
Contributor Author

nox commented May 1, 2014

So I've written a small script that outputs the list of candidate languages for all extensions known by Linguist:

require 'linguist'

Linguist::Language.all
    .map {| name| [name.primary_extension].concat(name.extensions) }
    .flatten.sort.uniq
        .each do |ext|
            print ext, ' -> ',
                Linguist::Language.find_by_filename("foo#{ext}")
                    .map { |lang| lang.name }.join(', '),
                "\n"
        end

Here are the differences between the two lists with and without primary_extension:

$ diff -u with*
--- with-primary-extension  2014-05-01 19:23:34.000000000 +0200
+++ without-primary-extension   2014-05-01 19:23:44.000000000 +0200
@@ -87,7 +87,7 @@
 .chs -> C2hs Haskell
 .cirru -> Cirru
 .ck -> ChucK
-.cl -> OpenCL, Common Lisp
+.cl -> Common Lisp, OpenCL
 .cl2 -> Clojure
 .clixml -> XML
 .clj -> Clojure
@@ -109,7 +109,7 @@
 .cppobjdump -> Cpp-ObjDump
 .cproject -> XML
 .cpy -> COBOL
-.cr -> Crystal, Cirru
+.cr -> Cirru, Crystal
 .creole -> Creole
 .cs -> C#
 .csh -> Tcsh
@@ -301,7 +301,7 @@
 .ltx -> TeX
 .lua -> Lua
 .ly -> LilyPond
-.m -> Objective-C, M, Matlab
+.m -> M, Matlab, Objective-C
 .mak -> Makefile
 .mako -> Mako
 .man -> Groff
@@ -398,7 +398,7 @@
 .pm6 -> Perl6
 .pmod -> Pike
 .po -> Gettext Catalog
-.pod -> Pod, Perl
+.pod -> Perl, Pod
 .podsl -> Common Lisp
 .podspec -> Ruby
 .pogo -> PogoScript
@@ -508,7 +508,7 @@
 .sublime_metrics -> JSON
 .sublime_session -> JSON
 .svg -> XML
-.t -> Turing, Perl
+.t -> Perl, Turing
 .tcc -> C++
 .tcl -> Tcl
 .tcsh -> Tcsh
@@ -536,7 +536,7 @@
 .ui -> XML
 .upc -> Unified Parallel C
 .urdf -> XML
-.v -> Verilog, Coq
+.v -> Coq, Verilog
 .vala -> Vala
 .vapi -> Vala
 .vark -> Gosu

I don't think checking these corner cases is such a difficult job you claim it to be. And if such order differences do indeed change the classification of some projects, it just points out the existence of another design fault.

The language attribute is still maintained as the first extension found.

This allows Mercury to be properly detected by Linguist, as per github-linguist#748.
@nox
Copy link
Contributor Author

nox commented May 1, 2014

@sebgod Rebased it as you requested. Weird that your comment disappeared.

All tests pass, all those which pass currently on master that is.

$ git ls-files samples/{Mercury,M,Objective-C,Mathematica} | xargs -n 1 ruby -I lib bin/linguist

@sebgod
Copy link
Contributor

sebgod commented May 1, 2014

@nox, my comment appeared twice, so I deleted one, seems like that deleted both. Thanks a lot :)

@arfon arfon merged commit 2d82071 into github-linguist:master May 5, 2014
@nox nox deleted the rm-primary_extension branch March 24, 2016 14:03
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 18, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants