Matlab extension .m #15

Closed
alcides opened this issue Jun 28, 2011 · 22 comments

Comments

@alcides

alcides commented Jun 28, 2011

I see that you treat .matlab as Matlab's extension; however, .m is the extension people actually use (it is one of the standard extensions).

I know this conflicts with Objective-C's .m files, but it would be interesting to have an option that inspects the syntax to guess the language in ambiguous cases.

This is confusing to me, as I have both Objective-C and Matlab repositories.

@bilderbuchi
Contributor

+1, for what it's worth.
There were a couple threads about this in the support forum, but these seem to be gone now, only replaced by a contact form. As far as I can recall, the "conclusion" was that since you can't disambiguate by extension, it's too complicated and won't be done.

This is why the README of linguist gives me new hope. It says that they already use "deep content inspection" to correctly identify (extensionless) script files, and use it to correctly assign .h files to their respective languages, so I think it can be concluded that correctly assigning .m files is possible now.

The thing is, I don't know what or how to contribute to Linguist here: there's already a lexer entry for Matlab/origin (even with the .m file extension), and it's listed in languages.yml (although only with the .matlab extension). I think in one of the threads in the support forum, a couple of unique syntax characteristics for Matlab and Objective-C had already been identified, but I have no way to check/dig that out (see above).

edit: An example repository is this one (mostly Matlab/Octave, and a bit of C++, no Obj-C): https://github.com/bilderbuchi/OpenTLD
Also, for what it's worth, I have never seen anyone use the .matlab extension. Even official code by The Mathworks (e.g. built-in functions) has a .m extension!

@josh
Contributor

josh commented Jun 28, 2011

👍 We'd need to come up with a good heuristic to detect Matlab. Ideas?

@bilderbuchi
Contributor

I'm working on something right now. Will post a gist later.
Do you have the permissions to look in the old support forum? ("Matlab language" should be a sufficient search term) Or are those totally nuked?

@josh
Contributor

josh commented Jun 28, 2011

@bilderbuchi I did a quick check. Only found reports that .m wasn't working for Matlab. Nothing useful.

@bilderbuchi
Contributor

OK, thanks.

So, I adapted from the .h recognition in https://github.com/github/linguist/blob/master/lib/linguist/blob_helper.rb#L276-289
Only in pseudocode unfortunately, I don't speak ruby (or Obj-C for that matter):
https://gist.github.com/1051201

Major points:

If Obj-C can be identified confidently, Matlab could be a fall-through/else option without needing Matlab heuristics.

Maybe, there's an already finished and working heuristic at http://cloc.sourceforge.net/
Update: The relevant code is in function matlab_or_objective_C in http://cloc.svn.sourceforge.net/viewvc/cloc/trunk/cloc?revision=234&view=markup L6352 ff.

Btw, it will be impossible to distinguish between Octave and Matlab code, so I think they should consistently be lumped together ("Matlab/Octave")...
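
The first point above (identify Obj-C confidently, otherwise fall through to Matlab) could be sketched roughly as follows. This is only an illustration in Python, not the code Linguist or cloc uses; the marker list and the name `classify_m_file` are assumptions made up for this example.

```python
import re

# Hypothetical set of Objective-C markers: compiler directives and #import.
# This list is an assumption for illustration, not taken from Linguist or cloc.
OBJC_MARKERS = re.compile(
    r"^\s*(?:@(?:interface|implementation|protocol|end)\b|#import\b)",
    re.MULTILINE,
)


def classify_m_file(source):
    """Guess the language of a .m file from its contents."""
    if OBJC_MARKERS.search(source):
        return "Objective-C"
    # No Objective-C marker found: fall through to Matlab/Octave.
    return "Matlab"
```

One consequence of the fall-through design is that anything without an Obj-C marker, including an empty file, ends up labelled as Matlab, so the Obj-C markers need to be reliable.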

@jovo

jovo commented Jun 28, 2011

matlab code files are all either "functions" or "scripts".
functions must (i think) start like this:

function varout = fname(varin1,varin2)
or
function [varout1 varout2 ....] = fname(varin1,varin2,...)

they are permitted to have blank space or comments before this line,
so this won't work for all matlab code files, but it should work for a chunk of them.

moreover, the last word of a function should be end but that is not enforced.
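
A minimal sketch of the check described above, written in Python for illustration: skip blank lines and `%` comments, then test whether the first remaining line is a function declaration. The regular expression and the helper names are assumptions, not an existing Linguist heuristic, and script files (which have no `function` line) will not match.

```python
import re

# Assumed shape of a declaration: "function", optional outputs
# ("varout =" or "[a b] ="), then the function name.
FUNCTION_DECL = re.compile(r"function\b\s*(?:(?:\[[^\]]*\]|\w+)\s*=\s*)?\w+")


def first_code_line(source):
    """Return the first line that is neither blank nor a % comment."""
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("%"):
            return stripped
    return ""


def starts_with_function(source):
    """True if the first code line looks like a Matlab function declaration."""
    return FUNCTION_DECL.match(first_code_line(source)) is not None
```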

@bilderbuchi
Contributor

yeah, I know, and that's what I had already written in the gist. Or am I missing something here?

@josh
Contributor

josh commented Jun 29, 2011

Unfortunately, the gist isn't something I can apply as a patch with test cases.

@bilderbuchi
Contributor

Yeah, I know. :-(
That's why we'd need someone who speaks Ruby, Obj-C and Matlab. Ideally Perl as well, to correctly translate the cloc heuristic I linked to.

@josh
Contributor

josh commented Jul 3, 2011

If you guys could put together some solid test fixtures for both matlab and obj-c I could work on putting the implementation together.

@bilderbuchi
Contributor

Done. See #30

@josh josh closed this as completed in 5ecc442 Jul 5, 2011
@josh
Contributor

josh commented Jul 5, 2011

Basic support is in.

If you find any files that don't match, please send a pull request with a failing test case.

@bilderbuchi
Contributor

nice, thank you!
when will this go live/affect existing repository statistics?

@audioplastic

Hi all. This works well for one of my repositories, although it still says that there is a little Objective-C in there. There is no Objective-C in either example.

https://github.com/audioplastic/MAP/graphs/languages

... but it falls flat on its face for object-oriented Matlab code.

https://github.com/audioplastic/soma/graphs/languages

@bilderbuchi
Contributor

Hm, I just checked in my repo (https://github.com/bilderbuchi/OpenTLD/graphs/languages), and Matlab doesn't get recognized at all. I thought that maybe the graphs take some time to refresh, but that should have already happened by now I guess?

Anyway, possibly the Matlab comment recognition (5ecc442#L0R337) should be moved above the Obj-C to collect most Matlab files by the comments?

Also, I find it curious how it puts Obj-C vs Matlab at 91 vs 9% in your failing repo - this seems to imply one of eleven .m files is Matlab, but there are only 6 .m files in the whole repo (and only 10 non-image files), so I wonder where those numbers come from.

@earl
Contributor

earl commented Jul 12, 2011

@bilderbuchi

In my OpenTLD repo [..] Matlab doesn't get recognized at all. I thought that maybe the graphs take some time to refresh [..]

Those statistics indeed take time to refresh. They'll get updated if you push some changes, though. So you'll either have to wait for it to be reindexed, or push some commits.

For reference, here's what the current master of linguist thinks about your OpenTLD repo:

71%  Matlab
27%  C++
2%   C
1%   Objective-C

I wonder where those numbers come from.

They are based on lines of code, not on the number of files.

@audioplastic

We could probably solve the object-oriented Matlab problem by just adding a filter that looks for "classdef" on line 1. Right-clicking in the current folder window in Matlab and adding a new file of type class gives you a file like the following:

    classdef dsfssdf
        %DSFSSDF Summary of this class goes here
        %   Detailed explanation goes here

        properties
        end

        methods
        end

    end

@bilderbuchi
With regard to your comment about my repo, I'm guessing that the percentages are calculated based on the number of lines or characters rather than the number of files?

@earl
I have just done a push to test my failing repo with the latest linguist and I get the same numbers.

@bilderbuchi
Contributor

yes, lines of code would be obvious. sorry, i'm stupid. i should get more coffee :-P

@earl: these figures for my repo look right, thanks for checking. the 1% obj-C is a misclassification afaik, but 1% error is more than OK!

@audioplastic: sounds good, could be a one-line change here if we lump class recognition together with function recognition.
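
As a rough illustration of lumping class recognition together with function recognition, the sketch below (Python, for illustration only) accepts either keyword on the first line that is not blank and not a `%` comment. The pattern and the name `has_matlab_opener` are assumptions, not the change that was eventually made in Linguist.

```python
import re

# Either keyword may open a Matlab file: a function or a class definition.
MATLAB_OPENER = re.compile(r"(?:function|classdef)\b")


def has_matlab_opener(source):
    """True if the first code line starts with 'function' or 'classdef'."""
    for line in source.splitlines():
        stripped = line.strip()
        # Skip blank lines and % comments before the opener.
        if stripped and not stripped.startswith("%"):
            return MATLAB_OPENER.match(stripped) is not None
    return False
```

Applied to the class template quoted earlier in the thread, the first code line is `classdef dsfssdf`, so the check succeeds without any Obj-C test being involved.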

@audioplastic

@bilderbuchi
I'll have a shot at making the necessary modifications tonight and will open a pull request unless anyone beats me to it.

@josh
Contributor

josh commented Jul 12, 2011

More matlab improvements are welcome. Just be sure to improve the tests to match whatever new heuristics you are adding.

The graphs are weighted by file size instead of loc. It's more convenient since that data is already cached. And sorry, no, we can't switch it because I'd need to reindex all the code on GitHub :)

https://github.com/github/linguist/blob/master/lib/linguist/repository.rb#L75
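
To illustrate why size weighting can produce a split like 91% vs 9% from only a handful of files, here is a small Python sketch. It is not the code at the link above; the function name and the byte counts are invented for the example.

```python
from collections import defaultdict


def language_percentages(files):
    """Compute per-language shares from (language, size_in_bytes) pairs."""
    totals = defaultdict(int)
    for language, size in files:
        totals[language] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100.0 * size / grand_total for lang, size in totals.items()}


# Made-up sizes: one large misclassified file outweighs three small ones.
example = [
    ("Objective-C", 9100),
    ("Matlab", 300),
    ("Matlab", 300),
    ("Matlab", 300),
]
print(language_percentages(example))  # {'Objective-C': 91.0, 'Matlab': 9.0}
```

With this weighting, a single large file that is misclassified dominates the graph regardless of how many small files are classified correctly.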

@Air-Craft

I've just tried to create an Objective-C gist and when it saves it reverts to Matlab. Editing has no effect.

@whitten
Contributor

whitten commented Apr 11, 2014

If your code is Objective-C, then you need to find out how to get the linguist
program to recognize that you did NOT write Matlab code. Make an example of it
and put it into the
https://github.com/github/linguist/tree/master/samples/Objective-C directory.
There are currently only 9 examples of Objective-C to contrast
with Matlab. That is not much to tell them apart.


aroben added a commit that referenced this issue Feb 19, 2015
gamogamze15 pushed a commit to gamogamze15/linguist that referenced this issue Mar 18, 2024
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jun 17, 2024