Added detection for the M (aka MUMPS) programming language. #148
+1 This is great. Here are other projects in M as well: https://github.com/OSEHR/M-Tools and probably the most important is VistA (the EHR of the Department of Veterans Affairs): VistA has about 40 forks now, and the number will increase soon.
+1 A free / open source M/MUMPS implementation for Linux on x86 is GT.M (http://fis-gtm.com and http://sf/net/projects/fis-gtm ) |
+1 Excellent. This is great news :) Thanks @lparenteau |
+1 Sounds great! |
+1 Highly desirable.. |
+1 Highly useful addition! |
+1 This would be great. |
+1 |
+1, M will be increasingly popular as VistA rolls out |
+1, will assist in development of VistA. |
+1 cool! |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
@@ -471,6 +474,10 @@ def guess_m_language
      elsif lines.grep(/^%/).any?
        Language['Matlab']

      # M comment
      elsif lines.grep(/^[ \t]*;/).any?
        Language['M']
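To make the order of the checks in this hunk easier to follow, here is a minimal standalone sketch of the two branches shown. The function name `guess_from_comments` and the plain string return values are hypothetical; the real method returns `Language[...]` objects and has other branches that are not visible in this hunk.

```ruby
# Hypothetical standalone version of the two branches visible in the hunk.
# Returns a language name as a string, or nil when neither comment style is found.
def guess_from_comments(lines)
  if lines.grep(/^%/).any?
    # A line starting with "%" is treated as a Matlab comment.
    'Matlab'
  elsif lines.grep(/^[ \t]*;/).any?
    # A line whose first non-blank character is ";" is treated as an M comment.
    'M'
  end
end
```

Because the Matlab check runs first, a file containing a line that begins with `%` is tagged as Matlab in this sketch before the M branch is ever reached, which matters for M labels that start with `%`.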
Only checking for comments is a rather crude method.
Agreed, a better regex would be these two:
^[ \t%A-Z][A-Za-z0-9]+[ \t]+;*
^\d+[ \t]+;*
If any non-blank line matches neither of these two regexes, the program isn't valid MUMPS code.
Edit: I consulted the standard and had to revise.
Source: http://71.174.62.16/Demo/AnnoStd?Frame=Main&Page=a101004
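As a rough illustration of how the two regexes above could be applied to a whole file, here is a minimal sketch. The constant and function names are hypothetical, the regexes are copied as written in this comment, and nothing here is taken from Linguist itself.

```ruby
# Regexes copied from the review comment above; names are hypothetical.
NAME_OR_INDENT_LINE = /^[ \t%A-Z][A-Za-z0-9]+[ \t]+;*/
NUMERIC_LABEL_LINE  = /^\d+[ \t]+;*/

# True when every non-blank line matches at least one of the two regexes.
def plausible_m?(source)
  lines = source.lines.map(&:chomp).reject { |line| line.strip.empty? }
  return false if lines.empty?

  lines.all? do |line|
    line.match?(NAME_OR_INDENT_LINE) || line.match?(NUMERIC_LABEL_LINE)
  end
end
```

Note that as written, the first regex requires at least one alphanumeric character right after the first character, so a line consisting only of indentation and a `;` comment would not match; a real classifier would probably need to relax that.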
+1 |
I don't have a strong preference between M and MUMPS, but for what it's worth, the official name is M. Ref: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=29268 |
If it wasn't clear, my +1 was for the original pull request by lparenteau, not a comment on the M vs MUMPS discussion. fwiw: |
+1 (for the original pull request by lparenteau) |
+1 |
+1 |
+1 |
+1 |
Sorry, but with the name controversy, there being no lexer, and it clashing with another very popular extension (obj-c), this isn't going to work. Thanks for that patch. |
The name controversy exists only in your mind..... |
Josh, I'm wondering how GitHub is dealing with MATLAB
It would seem that a file name extension clash
Also,
I agree... that is a poor excuse (there are many conflicts with the .m
Perhaps there is no lexer, but I would imagine that a regular expression
For example (not exactly this, but similar): ^[%A-Za-z][A-Za-z0-9]*[\t ]+;
I agree with you, Larry, seanwoods earlier suggested:
I don't know what \d is supposed to signify,
Technically, the first line could have MUMPS code on it, but it is such a rare occurrence,
By the way, some of the code of the patch appears to be at this URL. The relevant portion is:
@luisibanez The lexer is only used for syntax highlighting when viewing a source file directly on GitHub. Many other languages don't define a lexer either. This was something I wanted to look at later, but if you are interested, GitHub uses Pygments (http://pygments.org/) for this, so we would need to add a lexer for M to that project, which GitHub will eventually inherit.

As shown by @whitten, a .m file is currently considered to be either an Objective-C file or a Matlab file. My patch adds M to that list. The regex (or other method) used to detect M source code doesn't need to be exact.

@josh I have tested my patch on all the projects found on the Objective-C main page (https://github.com/languages/Objective-C), and of the 2317 .m files present, only 1 was wrongly tagged as M. I have fixed the issue and I think I could add that commit to this pull request if you re-open it. Or should I start a new pull request?

As for the other regexes suggested, I did try them on various M projects and the results weren't as good as looking for M comments. But if GitHub wants to go this way instead, I'm sure we can come up with a better regex.
@whitten According to the standard (linked in my comment to the patch above), an M tag can be an integer as well. I just tested in GT.M. The heuristics expressed in this Ruby code aren't very rigorous. Just look at how it detects Matlab. M is pretty picky about how code needs to be laid out, but it boils down to those regexes. You could also check for strings like M is a pretty simple language. It should be easy to find the elements of M that don't intersect with Objective-C or Matlab. As for the name issue - if it's between "M" and "MUMPS," use "M." This is how the standard is written. If it's against the Microsoft language linked to by Josh, I'd suggest using the syntax to detect the proper format.
Sean, I agree that you are allowed to put a string of numeric digits as a tag. I didn't suggest that it needed to be an integer, because a string of 00050 is a valid tag in M, but the canonical integer is 50. %000 is a valid tag as well, by the way. I assume \d means "decimal integer"? David
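The tag forms discussed in the last two comments (a name that may start with `%`, or any string of digits such as 00050) can be sketched as a single pattern. This is an assumption based on how the thread reads the standard, not code from the patch, and `M_LABEL` and `m_label?` are hypothetical names. In Ruby regexes `\d` matches one decimal digit, so `\d+` accepts any digit string whether or not it is a canonical integer.

```ruby
# Hypothetical pattern for an M tag as described in this thread:
# either a run of digits, or "%" or a letter followed by letters and digits.
M_LABEL = /\A(?:\d+|[%A-Za-z][A-Za-z0-9]*)\z/

# True when the whole string looks like a tag under the assumption above.
def m_label?(text)
  text.match?(M_LABEL)
end
```

Under this sketch `m_label?('00050')` and `m_label?('%000')` both hold, while `'00050'.to_i` gives the canonical integer 50 that David mentions.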
@@ -0,0 +1,4 @@
fox
I can't tell if this line has an ls (label separator) or not.
The M language requires one after a label.
The word "fox" is clearly the label, but it isn't clear whether a space or tab character is following it.
This is documented for the current standard at this URL:
http://71.174.62.16/Demo/AnnoStd?Frame=Main&Page=a106007&Edition=1995
i.e.:
6.2.4 Label separator ls
A label separator (ls) precedes the linebody of each line. A ls consists of one or more spaces. The flexible number of spaces allows programmers to enhance the readability of their programs.
ls ::= SP ...
this is referenced from the URL:
http://71.174.62.16/Demo/AnnoStd?Frame=Main&Edition=1995&Page=a106003#Def_0002
6.2 Routine body routinebody
The routinebody is a sequence of lines terminated by an eor. Each line starts with one ls which may be preceded by an optional label and formallist. The ls is followed by zero or more li (level-indicator) which are followed by zero or more commands and a terminating eol. If there is a comment it is separated from the last command of a line by one or more spaces.
routinebody ::= line ... eor
line ::= │ levelline | formalline │
eor ::= CR FF
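To make the quoted grammar concrete, here is a minimal sketch that splits one line into an optional label, the ls, and the rest of the line. It follows only the excerpt quoted above (ls is one or more spaces), treats the formallist as an opaque parenthesized group, and does not parse level indicators or commands. `M_LINE` and `split_m_line` are hypothetical names.

```ruby
# Hypothetical pattern: optional label, optional formallist, ls (spaces), linebody.
M_LINE = /\A(?<label>[%A-Za-z][A-Za-z0-9]*|\d+)?(?<formals>\([^)]*\))?(?<ls> +)(?<body>.*)\z/

# Returns a hash with the pieces of the line, or nil when no ls is present.
def split_m_line(line)
  match = M_LINE.match(line.chomp)
  return nil unless match

  { label: match[:label], ls: match[:ls], body: match[:body] }
end
```

With this sketch a bare `fox` with nothing after it returns nil, because there is no ls, which is the case the comment above is asking about; `fox write 1` splits into the label `fox` and the body `write 1`.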
IMO you do not get to be that specific in the classification filter (mapping suffix and file content to language). That is why my pattern stopped at the semicolon. Sure, stuff can follow, but it may not be useful in classifying the language.
+1 |
+1 |
+1 |
+1 |
+1 |
This pull request was closed 5 years ago. Since then Linguist has evolved a lot and it now has support for M. If this is not working for you, please open a new issue.
This adds detection for the M (aka MUMPS) programming language (see https://en.wikipedia.org/wiki/MUMPS).
I have successfully tested this using
bundle exec rake test
I have also called
bundle exec linguist
on the following projects, which I know have M files in them: