Add the Mercury language to linguist #748

Merged
merged 1 commit into from Apr 21, 2014

PaulBone
Contributor

Mercury is a logical/functional language. It was first developed at The University of Melbourne and is now almost 20 years old. Projects on github include:

https://github.com/Mercury-Language/mercury
https://github.com/PaulBone/protobuf-mercury
https://github.com/PaulBone/pbone_thesis
https://github.com/wangp/bower
https://github.com/juliensf/mercury-csv
https://github.com/juliensf/mercury-misc
https://github.com/juliensf/mercury-json
https://github.com/petdr/venus

Thanks.

lib/linguist/languages.yml:
Add the declaration for the language.

samples/Mercury:
Add samples for the classifier as Mercury shares its filename extension
with several other languages.

@PaulBone
Contributor Author

PaulBone commented Dec 6, 2013

bump


type: programming
# This is the background colour on the web page.
color: "#abcdef"
primary_extension: .m
Contributor

This is going to be problematic as it clashes with the Objective C primary_extension. Is there anything else we could use here?

Contributor Author

Isn't that the purpose of including sample code? So that the code
classifier has a reasonable chance of guessing which language some code
belongs to? As it stands at the moment github thinks that all the Mercury
repositories are Objective C.

Mercury has existed since late 1993, over 20 years. There has been little
real confusion over the .m file extension in that time. I'm not saying that
Objective C should change its file extension (it has existed since 1983).
But that changing the file extension for a language so that it's convenient
for github is not the right way to solve this problem. I'm also confident
that linguist handles other conflicting file extensions without any
problems: I'm sure this is not the first conflict.

Contributor Author

Addendum: We've never used any file extension other than .m, so
there's no 'alternative' that would make sense within linguist. If this is
a technical limitation within linguist then I believe that is where the
problem is.

Contributor

It is to a point but the issue we're going to run into here is with the fact that https://github.com/github/linguist/blob/master/lib/linguist/language.rb#L83-L85 is going to raise an error.

We're working here within the constraints of the implementation of Linguist I'm afraid.

Contributor Author

Okay. It sounds like that bug blocks pulling this patch. I don't mind
leaving the pull request open in the meantime.

@ghost

ghost commented Mar 12, 2014

When the people who wrote Linguist ended a day of cattle herding, got down off their horses, set aside their six-guns, and started coding did they forget that there are a lot of programming languages and that extension clashes are not only inevitable, they're commonplace?

How does Linguist handle Prolog (.pl) and Perl (.pl)? Or the various flavours of assembler (.asm)? Or any number of other possible clashes? Is it "first hipster language past the post wins"?

@arfon
Contributor

arfon commented Mar 12, 2014

@PaulBone requiring a unique primary_extension isn't really a 'bug', rather it's a consequence of how language detection works in Linguist.

There are two places where file extensions are defined, primary_extension and the extensions array (see example for Lisp here). Linguist builds a lookup hash for all of these primary and secondary extensions at run time.

If you take a look at the detect method here Linguist first tries to identify the language by its filename (see find_by_filename). If the extension of the filename is unique (both in the primary_extension and the extensions definitions for all of the known languages) then that language is returned. This is essentially shortcutting the other classifier options and means that languages such as Assembly can be identified efficiently by their unique file extension (.asm).
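The lookup-and-shortcut step described above can be sketched roughly as follows. This is a minimal illustration, not Linguist's code: `LANGUAGES`, `build_extension_index` and `detect_by_extension` are hypothetical names, and the `.mercury` primary extension is a made-up placeholder standing in for the unique-primary-extension workaround discussed in this thread.

```ruby
# Hypothetical, trimmed-down language table (not Linguist's real data).
LANGUAGES = {
  "Assembly"    => { primary_extension: ".asm",     extensions: [] },
  "Objective-C" => { primary_extension: ".m",       extensions: [] },
  # ".mercury" is a placeholder primary extension; ".m" is listed as secondary.
  "Mercury"     => { primary_extension: ".mercury", extensions: [".m"] }
}.freeze

# Maps every primary and secondary extension to the languages that claim it.
def build_extension_index(languages)
  index = {}
  languages.each do |name, attrs|
    [attrs[:primary_extension], *attrs[:extensions]].compact.uniq.each do |ext|
      (index[ext] ||= []) << name
    end
  end
  index
end

# Returns a language name when the extension is claimed by exactly one
# language (the shortcut); returns nil when the caller has to fall back to
# the other detection methods because the extension is ambiguous or unknown.
def detect_by_extension(index, filename)
  candidates = index.fetch(File.extname(filename), [])
  candidates.size == 1 ? candidates.first : nil
end

index = build_extension_index(LANGUAGES)
p detect_by_extension(index, "boot.asm") # => "Assembly"
p detect_by_extension(index, "main.m")   # => nil (Objective-C or Mercury)
p index[".m"]                            # => ["Objective-C", "Mercury"]
```

The shortcut only fires when an extension is unique across all languages, so as soon as two languages list `.m` (in either field), files with that extension go down the slower fallback path.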

If more than one possible language is identified by the find_by_filename method then there are a collection of other methods available for language detection including:

This is a very long-winded way of saying that if you'd like Mercury language detection on GitHub then with the current implementation of Linguist you need to pick a different (unique as Objective-C already defines this) primary_extension and add .m to the extensions array which will force Linguist into using the other detection methods mentioned above.

Cheers
Arfon

@ghost

ghost commented Mar 12, 2014

I would submit that requiring a unique primary_extension is in itself a bug in the whole approach taken by the Linguist coders. Clashes in file extensions go back to ... oh ... probably a couple of decades before Github's founders were even born.

@PaulBone: I recommend using a primary_extension of .not_a_linguist_bug_but_rather_a_feature and then do the .m thing in extensions so that the other detection methods can come into play.

@pchaigno
Contributor

If there is no other extension for Mercury shouldn't he add an exception to the primary_extension test rather than adding a bogus extension?

@PaulBone In the meantime, I think you can add some heuristics to differentiate Objective-C and Mercury files. Whatever the outcome of this discussion, you will always need one. You could take the Prolog heuristic as an example as its syntax is close to that of Mercury.
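As a rough illustration of the kind of heuristic suggested here, a Mercury/Objective-C check could look for each language's characteristic top-level declarations. This is a hypothetical sketch: `disambiguate_dot_m` and both regexes are assumptions, not an existing Linguist heuristic. Since Prolog also uses `:-` directives, it only separates Mercury from Objective-C, not from Prolog.

```ruby
# Hypothetical heuristic for telling Mercury and Objective-C ".m" files apart.
# Mercury modules usually open with ":- module" and related declarations;
# Objective-C files typically contain preprocessor imports and @-directives.
MERCURY_DECLARATION = /^:-\s*(?:module|interface|implementation|import_module|use_module|pred|func|type|mode)\b/
OBJC_DIRECTIVE      = /^\s*(?:#import|#include|@interface|@implementation|@protocol|@end)\b/

# Returns "Mercury", "Objective-C", or nil when neither pattern matches
# (leaving the decision to a later stage such as a statistical classifier).
def disambiguate_dot_m(source)
  return "Mercury" if source.match?(MERCURY_DECLARATION)
  return "Objective-C" if source.match?(OBJC_DIRECTIVE)
  nil
end

mercury_src = <<~'MERCURY'
  :- module hello.
  :- interface.
  :- import_module io.
  :- pred main(io::di, io::uo) is det.
MERCURY

objc_src = <<~'OBJC'
  #import <Foundation/Foundation.h>
  @interface Greeter : NSObject
  @end
OBJC

p disambiguate_dot_m(mercury_src) # => "Mercury"
p disambiguate_dot_m(objc_src)    # => "Objective-C"
```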

@arfon I find your explanation on how linguist works much more complete and clear than the one in the README. Maybe it should replace it?

@nox
Contributor

nox commented Mar 12, 2014

If you take a look at the detect method here Linguist first tries to identify the language by its filename (see find_by_filename). If the extension of the filename is unique (both in the primary_extension and the extensions definitions for all of the known languages) then that language is returned. This is essentially shortcutting the other classifier options and means that languages such as Assembly can be identified efficiently by their unique file extension (.asm).

Given that primary_extension and extensions are mixed together when looking up in find_by_filename, what is the point of having a special primary extension at all?

@nox
Contributor

nox commented Mar 12, 2014

Is the improvement of Linguist hampered by its ties to your own closed source code which we don't have access to, as mentioned here and there?

@nox
Contributor

nox commented Mar 12, 2014

Isn't that weird comment related to that Linguist design bug?

  primary_extension: .mustache # TODO: This is incorrect

@nox
Contributor

nox commented Mar 12, 2014

It should be noted that an invariant doesn't seem to be tested; these languages repeat their primary_extension in extensions:

INI:
  type: data
  extensions:
  - .ini # <-
  - .prefs
  - .properties
  primary_extension: .ini # <-
Literate Agda:
  type: programming
  group: Agda
  primary_extension: .lagda # <-
  extensions:
    - .lagda # <-
Makefile:
  aliases:
  - make
  extensions:
  - .mak # <-
  - .mk
  primary_extension: .mak # <-
Puppet:
  type: programming
  color: "#cc5555"
  primary_extension: .pp # <-
  extensions:
  - .pp # <-
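The duplication highlighted above could be caught by a simple check over the parsed data file. A minimal sketch using Ruby's bundled YAML (Psych) parser, run over an abridged copy of the quoted entries; `duplicated_primary_extensions` is a hypothetical helper, and whether such duplicates should be rejected or merely flagged is left open.

```ruby
require "yaml"

# Abridged copy of the entries quoted above (only the fields this check uses).
LANGUAGES_YML = <<~'YML'
  INI:
    primary_extension: .ini
    extensions:
      - .ini
      - .prefs
      - .properties
  Literate Agda:
    primary_extension: .lagda
    extensions:
      - .lagda
  Makefile:
    primary_extension: .mak
    extensions:
      - .mak
      - .mk
  Puppet:
    primary_extension: .pp
    extensions:
      - .pp
YML

# Hypothetical check: names of languages whose primary_extension is
# repeated in their extensions list.
def duplicated_primary_extensions(languages)
  languages.select do |_name, attrs|
    Array(attrs["extensions"]).include?(attrs["primary_extension"])
  end.keys
end

languages = YAML.safe_load(LANGUAGES_YML)
puts duplicated_primary_extensions(languages).inspect
```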

@whitten
Contributor

whitten commented Mar 14, 2014

Just weighing in to the conversation, the M or MUMPS language has an implementation (GT.M) that has used .m for an extension since at least the early 1980's. So there have been several languages that have used .m as an extension in different parts of the computer field for a very long time. That doesn't mean one language or the other has exclusive control over the extension.

@lewellyn

(tl;dr of the below: wtf github)

I'd like to weigh in that Limbo has used the .m extension since the mid 1990's. While it's unlikely that there are large amounts of Limbo code on Github, there's really no way to know with how terrible the source code language detection and searching is in general on the service (and I understand that Linguist is at fault, and that this issue is a step toward fixing it). 😦 Amusingly, I know of many Limbo projects on Google Code and Bitbucket, though.

For those reading this (needlessly) way-too-long PR and who are unfamiliar with Limbo, Limbo was written for the Inferno operating system which was designed by the same people we know and love from UNIX and C lore and seems to have heavily inspired Go (which is currently extremely fashionable). In fact, the document describing the language was written by none other than the esteemed Dennis M. Ritchie (may he rest in peace). http://www.vitanuova.com/inferno/papers/limbo.html

The fact this PR has been open for 5 months (for something with such a deceptively trivial title), with such facepalm-inducing comments as it's managed to garner, is a testament to just how much Github focuses on the trendy languages, ignoring the fact that there are dozens of languages that are in wide use which they marginalize because the cool kids haven't ever even heard of them. Telling people to choose a different extension for code that has been around far longer than Github itself, just because a popular language also uses it, is pure hubris.

I hate ranting in comments, but really, I find the position against this PR totally unpalatable and an example of the inherent flaws in current software development. Marginalizing things you don't know is traditionally a sign that you are too closed-minded to even use what you do know effectively. I would hope that a site like Github would at least make a small attempt to encourage people to expand their horizons and accommodate programmers of all sorts. Instead, it appears that it's no better than the rest of the "cool kids", and this saddens me a bit.

Basically, Github needs to be accepting of programmers of all stripes, or they are destined to be irrelevant (or at least doing lots of scrambling) once the trendy kids move on from the trendy things they're doing and the currently-popular languages start falling out of style with a reversion to a previous status quo. Github needs to accept that there is a vast wealth of code out there which predates it and which will easily postdate it. Not just accept it, embrace it. Telling people with code that's 20 years old that "you need to pick a different [extension]", as @arfon did above, is the exact opposite of what a healthy code repository steward should be doing.

There's an old saying that you can catch more flies with honey than you can with vinegar. I'm hoping that Github isn't going to try to use carrion to see if that's even more effective.

@semiessessi

I would submit that requiring a unique primary_extension is in itself a bug in the whole approach taken by the Linguist coders.

The popular tool 'make' uses timestamps to detect file changes, which is a 'rookie mistake' of this nature as well - however, it's still used in a load of places and constantly recommended. It's the classic example of 'quick hacky crap' winding up in production - unfortunately this is rather common and the reality is that 99.9% of the time it's fine for end users.

This is a classic example of a 'technical debt'. The current implementation creates the illusion of a more robust or finished product than is actually present. This is often fine for the end users, but creates problems for engineers when maintaining or improving things as well as for users with fringe requirements.

It's easy to criticise these things with hindsight - but solving them after the fact can be tricky because to truly be successful you need to 'repay' that technical debt without degrading functionality by creating a solution which is less practical than the original hack.

This means replacing the faulty logic... Relying on file extension should probably be at the bottom of the pile of detection mechanisms... the optimisation argument is great, but thoroughly invalid imo because working correctly is considerably more important than being fast. Fast and wrong is not much help... however if the whole thing becomes unusably slow as a result then other optimisations will need to be found etc...

@nox
Contributor

nox commented Mar 14, 2014

It's easy to criticise these things with hindsight - but solving them after the fact can be tricky because to truly be successful you need to 'repay' that technical debt without degrading functionality by creating a solution which is less practical than the original hack.

What do you mean? #985 fixes this easily.

@bfontaine
Contributor

@lewellyn you should make a PR to this repo if you want it to be fixed.

@nox
Contributor

nox commented Mar 14, 2014

@lewellyn you should make a PR to this repo if you want it to be fixed.

Read previous comments, I did a PR.

@lewellyn

@lewellyn you should make a PR to this repo if you want it to be fixed.

Read previous comments, I did a PR.

Indeed, there's this one, #985, and #989 all interrelated. Adding another for the same file extension is putting the cart before the horse and would only serve to confuse the actual issue: they need to fix linguist. Again, it's very telling that their competitors make it a lot easier to find projects by language.

@lgarron

lgarron commented Mar 14, 2014

While we're at discussing conflicting languages for the .m extension: Mathematica has used .m for package files since its first release in 1988.
Practically all the .m files in projects I have created are Mathematica files, which are also not detected correctly (they are actually detected as... MATLAB files).

Reference: <<CombinatorialFunctions.m in the Mathematica 1 documentation for packages.

@PaulBone
Contributor Author

I'd be happy to use @nox's contribution PR #985 to solve this problem. I'd test it myself but I don't currently have a ruby environment.

@PaulBone
Contributor Author

Also, we seem to be beating this dead horse quite a lot, and many people are making good points. However, I'd like to point out that the problem is more than 26 times larger than we've discussed so far: there are 25 other letters in the alphabet that may conflict, and other longer file name extensions that may also conflict. Thanks.

@lewellyn

However, I'd like to point out that the problem is more than 26 times larger than we've discussed so far: there are 25 other letters in the alphabet that may conflict, and other longer file name extensions that may also conflict. Thanks.

That was the elephant in the room I was trying to motion in the direction of, without explicitly pointing at it, yes. Even two and three letter extensions have a high collision rate. For example, .rb is everyone's beloved Ruby. But it's also the primary extension for RealBasic programs (though some people use things like .rbp and .rbbas now). Again, it's not the world's most popular language, but it's very easy to find programs written in them on competing code hosting; they don't even get classified as having a language here.
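The scale of the clash problem is easy to see by inverting a language-to-extensions table. The table below only records the clashes mentioned in this thread (it is not Linguist's data file), and `extension_collisions` is a hypothetical helper.

```ruby
# Extensions as claimed by commenters in this thread; NOT Linguist's data file.
CLAIMED_EXTENSIONS = {
  "Objective-C" => [".m"],
  "Mercury"     => [".m"],
  "MUMPS"       => [".m"],
  "Limbo"       => [".m"],
  "Mathematica" => [".m"],
  "MATLAB"      => [".m"],
  "Perl"        => [".pl"],
  "Prolog"      => [".pl"],
  "Ruby"        => [".rb"],
  "RealBasic"   => [".rb"],
  "Assembly"    => [".asm"]
}.freeze

# Inverts language => extensions and keeps only extensions claimed by
# more than one language.
def extension_collisions(claims)
  by_extension = Hash.new { |hash, ext| hash[ext] = [] }
  claims.each do |language, extensions|
    extensions.each { |ext| by_extension[ext] << language }
  end
  by_extension.select { |_ext, languages| languages.size > 1 }
end

extension_collisions(CLAIMED_EXTENSIONS).each do |ext, languages|
  puts "#{ext}: #{languages.join(', ')}"
end
```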

This issue needs to be fixed so that Github can become better than it is now. It should also help their analytics if it becomes possible for people to easily submit PRs which extend support to languages which are currently unrecognized. And better analytics due to correct code helps everyone.

@PaulBone
Contributor Author

It shouldn't even be about popularity. This shouldn't even be part of the question. We still have a problem if two arguably equally-popular languages share the same file extension: the problem of conflicting extensions still exists.

@ghost

ghost commented Mar 14, 2014

Let's not forget the bewildering variety of syntaxes that would be tagged with .asm there guys...

@joneshf

joneshf commented Mar 14, 2014

tl;dr; .m is already tainted, so there's no valid reason to not merge this.

I'm not sure what's stopping this PR from being merged. MUMPS has been doing this with .m since 44638e1 and it hasn't seemed to have caused an uproar.

According to the algorithm outlined by @arfon, linguist should be handling this alright. If it's not, it's broken and needs to be fixed. If that means more samples, then get more samples. If that means actually fixing linguist, then fix linguist, but merging this doesn't seem to be related to whether or not obj-c has a certain file extension. It's apparently not unique, as shown above.

@PaulBone: I recommend using a primary_extension of .not_a_linguist_bug_but_rather_a_feature and then do the .m thing in extensions so that the other detection methods can come into play.

If nothing else changes, this might be the best you can do.

@lewellyn

@PaulBone: I recommend using a primary_extension of .not_a_linguist_bug_but_rather_a_feature and then do the .m thing in extensions so that the other detection methods can come into play.

If nothing else changes, this might be the best you can do.

How does this affect gists, out of curiosity? If it means that all Mercury gists get the extension of .not_a_linguist_bug_but_rather_a_feature, I'm all for it. But at the same time, I know that means this will simply be revisited... But as developer support issues instead of a pull request.

That may not be the most cost effective solution for Github.

@PaulBone
Contributor Author

Thanks @arfon, I hope that #985 is helpful or at least a good starting
point. Let me know if I can help with anything, keeping in mind that I
don't actually know ruby.

Thanks.

@anaisbetts
Contributor

You have to remember, this code is run for literally every file of every repo on GitHub, for every push (i.e. group of commits). While there are certainly a ton of ways to make Linguist more correct, many of them would be pretty damaging to site performance.

@nox
Contributor

nox commented Mar 15, 2014

Given that the primary_extension lookup doesn't shortcut the other ones, what's your point @paulcbetts?

@nox
Contributor

nox commented Mar 15, 2014

And why is Linguist used on push by the way? That sounds like a design issue. Or do you mean to determine the repos' languages? If that is done synchronously, that explains why pushing is so slow, and sounds like something else to fix.

@bartosz-witkowski

While there are certainly a ton of ways to make Linguist more correct, many of them would be pretty damaging to site performance.

But is giving bad results faster the ultimate goal? "return ruby" is probably fastest but isn't considered as an alternative - yet the status quo is acceptable?

As a user I can't stress enough how irritating misclassification is - because currently it's impossible to upload a Mercury file and mark it as a text file by hand - it will be classified as an Objective C file and nothing can be done about that.

Classifying by primary extension is broken by design - this isn't "Worse Is Better" - because the method *doesn't* work most of the time.

If Linguist cannot be correct, it would be nice if at least it weren't the ultimate authority: a more complicated classification scheme could be requested through some "request proper classification" feature, or the language could at least be entered by hand.

@nox
Contributor

nox commented Mar 15, 2014

Before:

      langs = [@primary_extension_index[extname]] +
              @filename_index[basename] +
              @extension_index[extname]

After:

      langs = @filename_index[basename] +
              @extension_index[extname]

Leaving alone that a wrong result can be obtained infinitely faster, the whole performance argument is as irrelevant here as processed cheese slices are to a Roquefort cheesemaker.
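The equivalence being claimed can be checked with a self-contained simulation of the two versions. The three hashes are hypothetical stand-ins for Linguist's indexes, and the sketch assumes (as the earlier question about mixing the two fields implies) that the extension index already contains each language's primary extension. The `compact.uniq` step is needed in the "before" version because wrapping the primary lookup in an array yields `[nil]` when no language has that primary extension.

```ruby
# Hypothetical stand-ins for the three indexes in the snippet above.
PRIMARY_EXTENSION_INDEX = { ".m" => "Objective-C", ".asm" => "Assembly" }.freeze
FILENAME_INDEX          = { "Makefile" => ["Makefile"] }.freeze
# Assumed to already include every primary extension as well.
EXTENSION_INDEX         = { ".m" => ["Objective-C", "Mercury"], ".asm" => ["Assembly"] }.freeze

def candidates_before(filename)
  basename = File.basename(filename)
  extname  = File.extname(filename)
  langs = [PRIMARY_EXTENSION_INDEX[extname]] +
          FILENAME_INDEX.fetch(basename, []) +
          EXTENSION_INDEX.fetch(extname, [])
  langs.compact.uniq # drop the nil from a missing primary entry, and duplicates
end

def candidates_after(filename)
  basename = File.basename(filename)
  extname  = File.extname(filename)
  (FILENAME_INDEX.fetch(basename, []) + EXTENSION_INDEX.fetch(extname, [])).uniq
end

%w[main.m boot.asm Makefile notes.txt].each do |file|
  puts "#{file}: #{candidates_before(file).inspect} vs #{candidates_after(file).inspect}"
end
```

With this data both versions return identical candidate lists; the primary index could only change the order of candidates, when a language's primary extension is not already first in the extension index.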

@semiessessi

@nox: in response to

What do you mean? #985 fixes this easily.

I am in no way against the fix suggested in #985 - assuming that it fixes the problem (I can't be bothered to dig into the associated code or test anything - the change looks fairly passive and I can't understand how it is a fix without more work I don't want to do... so I take your word). What it obviously doesn't do though is solve the problem of using file extensions at all, which is extremely well known to be bad practice for this sort of thing.

Mainly I commented because I do not enjoy the "these guys are hipsters living in the clouds" comments... even if I wholeheartedly agree that they probably are, it has no real benefit.

What I advocate is acknowledging the technical debt here and that any "true" fix should have its behaviour weighed against the efficacy of the existing hack, especially regarding performance. #985 is not a "true" fix of this nature at all because it doesn't address the underlying issue - only the undesirable behaviour.

It is of course worth considering that the core issue might never really need fixing if enough quick and easy hacks can make the software usable enough for complaints to go away...

@nox
Contributor

nox commented Mar 16, 2014

What efficacy? The primary_extension mechanism does not bring any more performance to the table. Fixing just the uniqueness requirement by removing it would fix all the aforementioned cases of ambiguous language detection, given that someone provides for the cited languages better heuristics than the shameful C++ header disambiguation.

Not relying on file extensions at all is left as an exercise to the Githubber reader.

@nox
Contributor

nox commented Mar 16, 2014

Also, I don't see anyone here arguing for the removal of the extensions mechanism altogether, it seems quite reasonable to me to limit the set of languages to actually test beforehand based on the file name.

At the very least, as explained before, its complete removal isn't required for the problem at hand.

Your team seems to agree, as primary_extension is explicitly marked as bothersome and deprecated, while extensions isn't.

@semiessessi

@nox I'm obviously failing to communicate very well.

The crux of what I'm saying is "don't allow a regression, and by the way there is a technical debt here which should be identified and treated as such".

This is extremely unimportant.

My original intent was simply to shut up the moaning with a classic example of a similar 'rookie mistake' persisting in very heavily used production code, whilst suggesting the correct way to treat the problem. Namely that a technical debt is identified and treated as such (which might mean never fixing it at all because it is pointless to fix...) and that if it is fixed that some basic checks are done to ensure that the change is not a regression.

I'm sorry to have caused confusion.

@nox
Contributor

nox commented Mar 16, 2014

I understand your point, but I also understand that rookie mistakes in such a touted feature from GitHub should be fixed and not dismissed with a handwaving suggestion to rename all files of a 20-year-old language.

So if the final decision is to not fix this, they are indeed cowboys.

@PaulBone
Contributor Author

@semiessessi, you seem to be saying that:

  1. The primary_extension bug isn't a problem, or isn't a bug. In other
    cases you refer to it as technical debt, which is a type of problem.
    I'm not sure whether you agree that it's either a bug or a problem.
    This thread has clearly demonstrated that it is indeed a (design) bug
    and a problem.

  2. You seem to be saying that it's difficult to fix. Obviously we can't
    read the closed source code at github, but at least from the outside
    it seems as if it's straightforward. If this is indeed difficult to
    fix then please explain why.

    Additionally, just because something is difficult doesn't mean it's
    not the right thing to do. So if this point is true, it's somewhat
    irrelevant, but indeed may affect developers' priorities.

  3. You also seem to be saying that the proposed solution in #985 (Delete
    primary_extension from language data) will make linguist slower. @nox
    has shown why it shouldn't, and in fact that it should be faster. The
    real test of whether something is faster or slower is measuring it.
    Please benchmark the alternatives before making a claim.

  4. When you defend performance you're implying that performance is more
    important than correctness. In general cases this is not true. It
    is also not true in this specific case because using incorrect
    primary extensions breaks the behaviour of things such as gist and
    causes language metrics to be incorrect.

Additionally, I don't agree that make's use of timestamps is a problem at
all. But that's off topic. I mention it because I don't find your
comparison with make convincing, but I understand what you're trying to say.

Thanks.

@semiessessi

@PaulBone I am clearly failing to communicate at all. I do not understand how you reach any of those conclusions.

1 ) there is a bug reported here that files are misidentified - and #785 is probably a fix for that bug. the related issue which many of the comments are hinting at without being specific or direct, or by being dismissive and derogatory - is that there is an underlying technical debt. this is nothing to do with any implementation detail with primary_extension or using lists of extensions - simply from using a file extension as a way to determine file content.

2 ) I never said it would be difficult to fix. I simply advocated testing a fix by measuring it. This is a necessary requirement for a fix to be a fix at all - it's a bit of a tautology. Clearly my ability to use language has failed me, because this is an extremely fundamental concept.

3 ) Not sure where I said that. I may imply that we should measure if it makes anything slower (rather than e.g. using arguments with reason). The key thing is that degraded functionality should not be allowed as a fix. This is very well established as best practise (regression testing).

4 ) I actually explicitly stated that correctness is extremely important. Performance considerations are pointless if the code doesn't actually work. To quote my first comment:

"the optimisation argument is great, but thoroughly invalid imo because working correctly is considerably more important than being fast."

Additionally, I don't agree that make's use of timestamps is a problem at all.

This is an aside but this comment is a little alarming if I am interpreting it correctly.

It's a measurable problem in production environments. Using source control and working across timezones causes confusion. Aside from that there is a clean theoretical argument that the timestamp is not guaranteed to even be the time of file modification. The one correct solution is to actually detect changes to the file - in practice we compromise and use hashes. If you were writing make in 'the dark ages' and only had to worry about a very controlled environment, then using timestamps was absolutely fine as a shortcut. Problems like this are generally best solved with a balance between the CS perfect theoretical ideals and the software engineer's practical solutions, and often for performance reasons. This is why we use hashes instead of meticulously checking every byte...
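The hash-based alternative to timestamps mentioned above can be sketched in a few lines. `ContentChangeTracker` is a hypothetical class (not part of make or any build tool) that uses SHA-256 digests from Ruby's standard `digest` library, so touching a file without changing its content does not count as a change.

```ruby
require "digest"
require "tmpdir"

# Detects file changes by content digest instead of modification time.
class ContentChangeTracker
  def initialize
    @digests = {} # path => last seen SHA-256 hex digest
  end

  # Returns the paths whose content differs from the last recorded digest
  # (paths seen for the first time count as changed), then records the
  # current digests for the next call.
  def changed(paths)
    paths.select do |path|
      digest = Digest::SHA256.file(path).hexdigest
      differs = @digests[path] != digest
      @digests[path] = digest
      differs
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "hello.m")
  File.write(path, ":- module hello.\n")
  tracker = ContentChangeTracker.new
  p tracker.changed([path]).size # => 1 (first sighting)
  File.utime(Time.now, Time.now + 3600, path) # new mtime, same content
  p tracker.changed([path]).size # => 0
end
```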

File extensions not encoding file content is a specific case of this more general class of problem. Where something is theoretically useless but of real practical value. The only reason to draw attention to this is the various disparaging remarks about linguist development practice...

Above everything else measurement is king. This is a very important part of 'the scientific method' - that we demonstrate the strength of an argument or theory by repeatable experiment. nox may well understand why his fix improves performance and can make convincing arguments for it, and tbh it's a simple case and I'm inclined to agree. Thinking when you don't have to, though, is pointless - it's easier and safer to actually measure it. Again I'm just trying to suggest extremely well established best practise...
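In the spirit of measuring rather than arguing, a comparison harness could check that two implementations agree before timing them. This is a minimal sketch: `compare_implementations` and the two lambdas are hypothetical (toy versions of the before/after `find_by_filename` lookups quoted earlier in the thread), and timings come from Ruby's monotonic clock via `Process.clock_gettime`, so they are only meaningful relative to each other on the same machine.

```ruby
# Wall-clock seconds spent running the block, measured on a monotonic clock.
def elapsed_seconds
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Checks that every implementation returns the same results on `inputs`
# (correctness first), then times each one over `iterations` passes.
def compare_implementations(inputs, iterations: 10_000, **impls)
  results = impls.transform_values { |impl| inputs.map { |input| impl.call(input) } }
  unless results.values.uniq.size == 1
    raise ArgumentError, "implementations disagree: #{results.inspect}"
  end

  impls.transform_values do |impl|
    elapsed_seconds { iterations.times { inputs.each { |input| impl.call(input) } } }
  end
end

# Toy indexes mirroring the before/after lookups (hypothetical data).
PRIMARY = { ".m" => "Objective-C", ".asm" => "Assembly" }.freeze
BY_EXT  = { ".m" => ["Objective-C", "Mercury"], ".asm" => ["Assembly"] }.freeze

before = ->(f) { ext = File.extname(f); ([PRIMARY[ext]] + BY_EXT.fetch(ext, [])).compact.uniq }
after  = ->(f) { BY_EXT.fetch(File.extname(f), []) }

timings = compare_implementations(%w[main.m boot.asm notes.txt],
                                  iterations: 1_000, before: before, after: after)
timings.each { |name, secs| puts format("%-6s %.4f s", name, secs) }
```

Refusing to report timings when the results differ encodes the "no regressions" point: a faster version only counts if it returns the same answers.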

@ghost

ghost commented Mar 17, 2014

Jesus!

Could you please just SHUT THE FUCK UP ABOUT MAKE?!

The issue here is how Linguist sucks so strongly that it can extract matter from galactic black holes. This is an issue which has been outstanding for over a year ON THIS SINGLE FUCKING LANGUAGE ALONE. I'm sure if we scanned other PRs we'd find more that are even older. (Like .pl and Prolog which I highlighted above.) The code, as it is now, IS NOT CORRECT. The code, as it has always been, HAS NEVER BEEN CORRECT. The code, IS DEFECTIVE BY ITS VERY DESIGN.

If you really do believe that correctness is paramount ALL of your diarrhetic verbiage about performance is disingenuous, meaningless drivel. (It's meaningless drivel if you don't believe that correctness is paramount, but at least it's honest diarrhetic verbiage.) So which is it? You can't have it both ways.

@nox
Contributor

nox commented Mar 17, 2014

I will refrain from replying until a Githubber finally says something useful.

@ghost

ghost commented Mar 17, 2014

I won't.

@semiessessi

@ttmrichter

Fine. It is not relevant and I noted it as such - but it still pollutes the conversation. Point taken. (I think).

Thanks for your rage. It is appreciated - if unwelcome. I'm obviously not contributing anything of worth here...

Really what we need is someone to test and decide to accept the suggested fix or reject it with reason.

@ghost

ghost commented Mar 17, 2014

Really what we need is someone to actually understand at Github that broken code is broken and won't go away by ignoring it. Of course given their approach to dealing with personnel issues I doubt that solving problems is actually a part of their corporate culture.

@nox
Contributor

nox commented Mar 17, 2014

Really what we need is someone to test and decide to accept the suggested fix or reject it with reason.

Maybe people wouldn't be angry if not consistently ignored. Though ignoring stuff at GitHub seems like their default modus operandi.

@sebgod
Contributor

sebgod commented Apr 6, 2014

@PaulBone , Hello Paul, dear maintainers, I've forked Paul's pull request, #1049 and changed the file extension to a non conflicting one, to see if the sampling recognizer will actually work. It seems that based on the provided samples Mercury is properly handled by the statistical analyser.

@PaulBone
Contributor Author

PaulBone commented Apr 7, 2014

Thanks @sebgod. This worked (even with .m) 15 months ago when I initially
wrote this patch. This was January 2013.

I've since re-submitted the patch a couple of times which is why it looks
like I've only been waiting 5 months.

@nox
Contributor

nox commented Apr 7, 2014

@sebgod While I applaud your pragmatism, the very fact that you had to work around an easily-solved problem makes me shake my fist in anger.

@sebgod
Contributor

sebgod commented Apr 7, 2014

@nox, I bet that was just a misunderstanding; actually the fall-back detection mechanism based on the statistical analyser is rather powerful. IMHO, it would be much more useful to first have the language in the github/linguist repository, and then we can improve the lexer.
@PaulBone, yes, I see from the history that the maintainer just ignored the pull request for quite a while; maybe it is because the Travis build was failing. Not sure.

@nox
Contributor

nox commented Apr 7, 2014

@sebgod Read the whole PR, this is not the only ignored pull request, I could document a lot of things getting ignored and not being addressed in that project. It would be much more useful to have proper maintainers instead of people caring only about JS and Ruby.

@sebgod
Contributor

sebgod commented Apr 7, 2014

@nox, Yes they seem to have their own "language hierarchy", preferring some over the others. To be honest, GitHub sometimes does have the feeling of trying to please the "hip" languages and projects more than others. Clearly this can be seen with the Linux Kernel which is only read-only on GitHub, as the inventor of git - Linus - says that the way GitHub approaches the pull request and branching is not suitable for large-scale development. It would be much better if one could just specify a proper lexer at the repository level, with fine grained control.

@arfon arfon merged commit f0ad498 into github-linguist:master Apr 21, 2014
@arfon
Contributor

arfon commented Apr 21, 2014

Closing in favour of #1098

primary_extension issue to be addressed in a separate PR.

nox added a commit to nox/linguist that referenced this pull request May 1, 2014
The language attribute is still maintained as the first extension found.

This allows Mercury to be properly detected by Linguist, as per github-linguist#748.
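The commit message above says the primary extension is still available as the first extension listed. A minimal sketch of that idea, using a hypothetical `LanguageEntry` struct rather than Linguist's actual `Language` class: once the primary extension is derived from `extensions.first`, two languages can share `.m` without tripping a uniqueness check.

```ruby
# Hypothetical language record: no separate primary_extension field.
LanguageEntry = Struct.new(:name, :extensions) do
  # The "primary" extension is simply the first one listed.
  def primary_extension
    extensions.first
  end
end

# Hypothetical validation: every language needs at least one extension,
# but extensions (primary or not) no longer have to be globally unique.
def validate_entries!(entries)
  entries.each do |entry|
    raise ArgumentError, "#{entry.name} has no extensions" if entry.extensions.empty?
  end
  entries
end

entries = validate_entries!([
  LanguageEntry.new("Objective-C", [".m"]),
  LanguageEntry.new("Mercury", [".m"])
])
entries.each { |e| puts "#{e.name}: primary #{e.primary_extension}" }
```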
@github-linguist github-linguist locked and limited conversation to collaborators Oct 31, 2014