Improvements to Matlab lexer #1271
- Detect `.m` files starting with a function definition as MATLAB, not ObjC (pygments#1149).
- Require word boundaries in regexes matching numbers and floats, to avoid mishighlighting `load 123file` as starting with a number.
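A minimal sketch of the word-boundary point, using only the standard `re` module. The two patterns are illustrative assumptions, not the rules from the lexer itself:

```python
import re

# Illustrative patterns only; the lexer's actual number rules are more involved.
loose_integer = re.compile(r'\d+')
bounded_integer = re.compile(r'\d+\b')

# Without a boundary, the digits at the start of a file name look like a number.
loose = loose_integer.match('123file')

# With a trailing \b there is no match, because '3' is followed by a word character.
bounded = bounded_integer.match('123file')

# A genuine number followed by a space or an operator still matches.
plain = bounded_integer.match('123 + x')
```

With the boundary in place, the `123file` argument of `load 123file` is no longer split into a number followed by a name.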
Some small nits...
pygments/lexers/matlab.py
@@ -72,6 +72,8 @@ class MatlabLexer(RegexLexer):
"hilb", "invhilb", "magic", "pascal", "rosser", "toeplitz", "vander",
"wilkinson")

_operators = r'-|==|~=|<|>|<=|>=|&&|&|~|\|\|?|\.\*|\*|\+|\.\^|\.\\|\.\/|\/|\\'
This will find `<` before `<=`, so the order has to be different (I know you just moved the definition).
Or, better, use `words()`, which will automatically escape and order the regex to avoid this problem.
I just reordered the elements, will pass on the bigger refactoring.
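The ordering concern in this thread can be reproduced with the standard `re` module alone. The sketch below uses a small subset of the operators and sorts them longest-first by hand; it is an illustration of the problem, not the code that was committed, and it does not call `words()`:

```python
import re

# A subset of the operators from the quoted hunk, in the original order.
ops_original = ['<', '>', '<=', '>=']
naive = re.compile('|'.join(re.escape(op) for op in ops_original))

# Longest alternatives first, so '<=' is tried before '<'.
ops_sorted = sorted(ops_original, key=len, reverse=True)
ordered = re.compile('|'.join(re.escape(op) for op in ops_sorted))

# Alternation picks the first alternative that matches, not the longest one.
naive_match = naive.match('<= 3').group()      # '<'
ordered_match = ordered.match('<= 3').group()  # '<='
```

Because a regex lexer consumes whatever the rule matched, the naive ordering would emit `<` as one operator and leave `=` for the next rule.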
pygments/lexers/matlab.py
if re.match(r'^\s*%', text, re.M): # comment
# function declaration.
if next(line for line in text.splitlines()
if not re.match(r'^\s*%', text)).strip().startswith('function'):
I think you can just `re.search(r'^function\b')` with `re.MULTILINE`.
Although this is a bit problematic for safe identification, since e.g. JavaScript functions are also defined that way. (`analyse_text` can also be used when no filename is known.)
Maybe you can also match the equals sign, which I gather is always part of the function header?
I don't think the single regex is equivalent: the point is that the first line that isn't fully commented should start with `function`.
No, there isn't always an equal sign (if the function doesn't return anything).
Ah, I missed that, sorry. Still very possible in JavaScript though... maybe exclude if `{` comes later?
sure, done
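A rough sketch of the heuristic this thread converges on: take the first line that is not a `%` comment, require it to start with `function`, and reject it if a `{` follows. The helper name `looks_like_matlab_function_file` is hypothetical, and two details are assumptions rather than something the thread settles: the comment test is applied to each `line` (the quoted hunk passes `text`), and `{` is only looked for on the same line, since MATLAB code can legitimately use braces elsewhere.

```python
import re


def looks_like_matlab_function_file(text):
    """Hypothetical helper, not the lexer's actual analyse_text."""
    # First line that is neither blank nor a full-line '%' comment.
    # Skipping blank lines is an assumption made for this sketch.
    first_code_line = next(
        (line for line in text.splitlines()
         if line.strip() and not re.match(r'\s*%', line)),
        '',
    ).strip()
    # A MATLAB function header starts with the keyword and carries no '{';
    # a JavaScript-style "function foo() {" is rejected.
    return (re.match(r'function\b', first_code_line) is not None
            and '{' not in first_code_line)
```

Usage: `looks_like_matlab_function_file("% help\nfunction y = f(x)\n")` is true, while a file that opens with `function foo() {` is rejected.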
pygments/lexers/matlab.py
# is recognized if it is either surrounded by spaces or by no
# spaces on both sides; only the former case matters for us. (This
# allows distinguishing `cd ./foo` from `cd ./ foo`.)
(r'(?:(?<=^)|(?<=;))\s*\w+\s+(?!=|\(|(%s)\s+)' % _operators, Name, |
No lookbehind is needed for `^`.
fair point.
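The point about `^` can be checked directly: an anchor is already zero-width, so wrapping it in a lookbehind changes nothing. The sketch below compares a shortened version of the quoted prefix (only the anchor part plus `\s*\w+`, which is a simplification) against the same pattern with a plain `^`:

```python
import re

sample = "cd foo; ls bar\nhold on"

# Prefix as written in the quoted hunk, shortened for illustration.
with_lookbehind = re.compile(r'(?:(?<=^)|(?<=;))\s*\w+', re.MULTILINE)

# Same thing with a plain anchor; only the ';' case needs a lookbehind.
with_anchor = re.compile(r'(?:^|(?<=;))\s*\w+', re.MULTILINE)

lookbehind_hits = with_lookbehind.findall(sample)
anchor_hits = with_anchor.findall(sample)
```

Both patterns find the same statement starts in `sample`: the first word of each line and the word following the `;`.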
Thanks, all comments handled.
MATLAB treats a bare word followed by arguments such as `foo bar baz` as the function call `foo('bar', 'baz')`. As such, treat everything that follows the bare word as a string.
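As a rough illustration of the command-syntax rule, the sketch below reuses the shape of the regex from the quoted hunk (with the operators reordered longest-first and the statement-start prefix dropped) to decide whether a line is a command, and then splits it into a name and string arguments. The helper `split_command` is hypothetical and ignores quoting, comments, and multiple statements per line:

```python
import re

# Operators from the quoted hunk, reordered so longer ones come first.
_operators = r'==|~=|<=|>=|<|>|&&|&|~|\|\|?|\.\*|\*|\+|-|\.\^|\.\\|\.\/|\/|\\'

# Bare word, whitespace, then anything that is not '=', '(' or an
# operator followed by whitespace.
command_start = re.compile(r'\s*\w+\s+(?!=|\(|(?:%s)\s+)' % _operators)


def split_command(line):
    """Hypothetical helper: return (name, args) for command syntax, else None."""
    if command_start.match(line) is None:
        return None
    name, *args = line.split()
    return name, args
```

Under these assumptions, `split_command('foo bar baz')` returns `('foo', ['bar', 'baz'])`, mirroring `foo('bar', 'baz')`, while `x = 1`, `a + b` and `cd ./ foo` are left to the ordinary expression rules.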
Thanks!
Just a revival of https://bitbucket.org/birkenfeld/pygments-main/pull-requests/676/matlab-detection-float-boundaries/diff.