Misdetected Octave/Matlab code snippet #1747

MayeulC · 2018-05-15T10:40:55Z

Hi,

This code doesn't appear to get syntax highlighting (at least as it appears on my gitea install with v9.11.0), yet it is a valid Octave script:

%% function cellarray = removeregexp(cellarray, expression_array)
% Removes matching expressions from cell array strings.
% If a string in the cell array is or becomes empty, it is removed from the cell
% array.
function cellarray = removeregexp(cellarray, expression_array)
  cellarray = regexprep(cellarray, expression_array, '');
  cellarray(cellfun(@isempty, cellarray)) = [];

The leading %% should give it away (though TeX might be a contender). The end/endfunction is optional, and omitted here. It also appears that functions that do not have leading comments are more easily misdetected.


joshgoebel · 2019-10-06T08:12:36Z

We don't tend to score comments higher than other things, so assuming they are counted (they seem to be) then %% just gets equal points to %. The problem is the rest of this code doesn't look that different than many other languages...

> It also appears that functions that do not have leading comments are more easily misdetected.

Because % is unusual for commenting (at least in my experience)... so the more comments the more relevancy points you get for "looking like" Matlab code vs some other code.

Not sure there is anything to be done here unless we wanted to boost the relevancy of %% a bit, but that isn't going to help in larger samples of code where the ratio of %% to code is small and the code still looks pretty generic.

joshgoebel · 2019-10-06T08:20:44Z

@MayeulC Ruby's ERB also uses %%. What are the rules for %% in Matlab? Does it always have to start on a newline?

MayeulC · 2019-10-08T22:40:10Z

@yyyc514 There aren't that many. Basically, in Matlab, those serve as section delimiters.
Sections can be executed individually, and the section you are currently editing is highlighted. Comments that follow %% are emboldened, so this is often used cosmetically to delimit code blocks, and thus often appears at the top of scripts, where it serves as a "title". It does need to be at the start of a new line.

I would like to take back a bit of what I said. Although it is encountered quite often in Matlab code, it is not necessarily a giveaway. And I do get your point, as the ratio of %% lines to other lines is generally somewhere between 1 in 10 and 1 in 40, or even zero, depending on the code base.
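To make the "start of a new line" rule and the ratio concrete, here is a minimal Python sketch (not highlight.js code). The names `SECTION_RE` and `section_ratio` are hypothetical, and requiring whitespace or end of line after `%%` is an assumption about how strictly a delimiter should be matched.

```python
import re

# Assumption: a section delimiter is "%%" at the very start of a line,
# followed by whitespace or the end of the line.
SECTION_RE = re.compile(r"^%%(?=\s|$)", re.MULTILINE)


def section_ratio(source: str) -> float:
    """Return the fraction of non-blank lines that start with a %% delimiter."""
    lines = [line for line in source.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return len(SECTION_RE.findall(source)) / len(lines)


sample = (
    "%% Load data\n"
    "x = 1;\n"
    "y = 2; %% trailing, not a delimiter\n"
    "%% Plot\n"
    "plot(x, y);\n"
)
print(section_ratio(sample))
```

A low or zero ratio on real code bases is exactly why this signal alone would not be enough.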

Matlab has a lot of builtins, since everything is imported in the scope by default, and you can shadow functions. Here, regexprep, isempty, cellfun are builtins.

@ is an operator to get a function handle (can be used for lambdas as well). It is in my experience quite commonly used to prefix builtins.
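As a rough illustration of how builtin names and `@` handles could serve as hints, here is a small Python sketch. It is not how highlight.js scores anything; `KNOWN_BUILTINS` holds only the three builtins named above, and `builtin_hints` is a hypothetical helper.

```python
import re

# Only the three builtins mentioned in this comment; a real list is far longer.
KNOWN_BUILTINS = {"regexprep", "isempty", "cellfun"}

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
# "@name" function handles; anonymous functions such as "@(x) x + 1" are skipped.
HANDLE_RE = re.compile(r"@\s*([A-Za-z_]\w*)")


def builtin_hints(source: str) -> dict:
    """Count mentions of known builtins and @-handles that point at them."""
    names = IDENT_RE.findall(source)
    handles = HANDLE_RE.findall(source)
    return {
        "builtin_mentions": sum(1 for n in names if n in KNOWN_BUILTINS),
        "builtin_handles": sum(1 for h in handles if h in KNOWN_BUILTINS),
    }


snippet = (
    "cellarray = regexprep(cellarray, expression_array, '');\n"
    "cellarray(cellfun(@isempty, cellarray)) = [];\n"
)
print(builtin_hints(snippet))
```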

I had a further look at the matching script. Would it help to know the generic function syntax for matlab?

Here it is (< > denotes optional parts). I also saw that the script doesn't try to find =.

function < < [ list, of,> returnvalue < ,matching ] > = > func_name <(argument, list)>

This is what the xdg MIME database uses for detection (of course, the rules there are very simple).
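The generic function syntax above can be sketched as a regular expression. This is a Python approximation under stated assumptions, not the rule used by highlight.js or by the xdg database: `FUNCTION_RE` and `parse_signature` are hypothetical names, and identifiers are assumed to be a letter or underscore followed by word characters.

```python
import re

FUNCTION_RE = re.compile(
    r"""^\s*function\s+
        (?:                                  # optional return part
            (?:\[(?P<multi>[^\]]*)\]         #   [list, of, values]
              |(?P<single>[A-Za-z_]\w*))     #   or a single return value
            \s*=\s*
        )?
        (?P<name>[A-Za-z_]\w*)               # function name
        \s*(?:\((?P<args>[^)]*)\))?          # optional (argument, list)
    """,
    re.VERBOSE,
)


def parse_signature(line: str):
    """Split a function declaration line into name, return values and arguments."""
    m = FUNCTION_RE.match(line)
    if m is None:
        return None
    if m.group("multi") is not None:
        returns = [r for r in re.split(r"[,\s]+", m.group("multi").strip()) if r]
    elif m.group("single") is not None:
        returns = [m.group("single")]
    else:
        returns = []
    args = m.group("args")
    arg_list = [a.strip() for a in args.split(",") if a.strip()] if args else []
    return {"name": m.group("name"), "returns": returns, "args": arg_list}


print(parse_signature("function cellarray = removeregexp(cellarray, expression_array)"))
```

The presence of `=` between the return part and the function name is the piece that distinguishes this form from a `function` keyword in many other languages.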

GNU Octave can be considered a dialect, with a few modifications, among which:

%% no longer serves a special purpose.
# can be used for comments (and is thus quite often seen in a #!/usr/bin/env octave shebang).
endif, endfunction, endwhile, endfor, do ... until are keywords.
A number of toolboxes and domain-specific functions are split into dedicated packages, and pkg is introduced as a builtin to load them.
There are special %! unit tests: %!assert, %!error, %!demo.
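The Octave-specific markers in this list could be counted with a few line-anchored patterns. The following Python sketch is only an illustration; `OCTAVE_MARKERS` and `octave_markers` are hypothetical names, and the sample script is made up for the example.

```python
import re

# One pattern per Octave-only feature listed above, each anchored to line start.
OCTAVE_MARKERS = {
    "end_keyword": re.compile(r"^[ \t]*(?:endif|endfunction|endwhile|endfor)\b", re.MULTILINE),
    "do_until": re.compile(r"^[ \t]*until\b", re.MULTILINE),
    "hash_comment": re.compile(r"^[ \t]*#", re.MULTILINE),  # includes a shebang line
    "unit_test": re.compile(r"^%!(?:assert|error|demo)\b", re.MULTILINE),
    "pkg_load": re.compile(r"^[ \t]*pkg\s+load\b", re.MULTILINE),
}


def octave_markers(source: str) -> dict:
    """Count how often each Octave-only marker appears in the source."""
    return {name: len(p.findall(source)) for name, p in OCTAVE_MARKERS.items()}


sample = (
    "#!/usr/bin/env octave\n"
    "pkg load signal\n"
    "function y = twice(x)\n"
    "  if x > 0\n"
    "    y = 2 * x;\n"
    "  endif\n"
    "endfunction\n"
    "%!assert (twice(2), 4)\n"
)
print(octave_markers(sample))
```

Any non-zero count points towards Octave rather than plain Matlab, although none of these markers is required in a valid Octave script.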

List of Octave builtins

joshgoebel · 2019-10-09T06:42:56Z

> Does it always have to start on a newline?

Not sure you answered this. This would be one easy way to perhaps increase the relevancy of Matlab a bit over other things as this would be a rather unusual thing I think.

Auto-detection is something I'm very curious about and a tough problem. Especially given that many of our parsers are intended to be "simple" rather than complete. Allowing us to detect/color a LOT of languages without having a huge size.

> Would it help to know the generic function syntax for matlab?

Not sure this helps that much... From 10 miles high it looks a lot like expressions in other languages with < ( identifiers, etc...

joshgoebel · 2019-10-09T06:43:49Z

Long-term I'd like to see us get to a place (with tools, metrics, tests) where someone interested in this (improving detection for Matlab say) could get involved and play around and see if anything "sticks" and if they can improve the detection in "meaningful ways"...

Right now it's kind of hard/tricky to do that...

joshgoebel · 2019-10-16T14:56:53Z

@MayeulC If you'd like to submit a PR that bumps the relevancy of %% by just a little when it starts a new line, and that doesn't break the brittle balance of auto-detect, that might be useful here.

But first you might want to start by seeing how far apart you currently even are between matching Matlab and whatever else it's matching. You have loaded the matlab grammar JS file - or bundled it, yes? I realize your initial message says "code doesn't appear to have syntax highlighting" rather than that it was highlighted incorrectly.

Otherwise not sure how we can improve this much. Your code sample is just hard to identify without more context (as small code samples often are).

joshgoebel · 2019-10-17T00:37:26Z

Closing as resolved/answered.

joshgoebel added the auto-detect label (Issue with auto detection of language type) Oct 7, 2019

joshgoebel closed this as completed Oct 17, 2019
