Improvements to Matlab lexer #1271

Merged on Nov 25, 2019 (2 commits)

anntzer (Contributor) commented Nov 24, 2019

  • Detect .m files starting with a function definition as MATLAB, not
    ObjC.
  • Require word boundaries in regexes matching numbers and floats, to
    avoid mishighlighting load 123file as starting with a number.
  • MATLAB treats a bare word followed by arguments such as foo bar baz as
    the function call foo('bar', 'baz'). As such, treat everything that
    follows the bare word as a string.

Just a revival of https://bitbucket.org/birkenfeld/pygments-main/pull-requests/676/matlab-detection-float-boundaries/diff.

- Detect `.m` files starting with a function definition as MATLAB, not
  ObjC (pygments#1149).
- Require word boundaries in regexes matching numbers and floats, to
  avoid mishighlighting `load 123file` as starting with a number.
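
The word-boundary point can be illustrated with a minimal sketch that uses only Python's `re` module. The two patterns below are hypothetical stand-ins chosen for illustration, not the rules actually used in the lexer.

```python
import re

# Hypothetical patterns for illustration; the lexer's real number rules differ.
LOOSE_INT = re.compile(r'\d+')
BOUNDED_INT = re.compile(r'\b\d+\b')

line = 'load 123file'

# Without boundaries, the leading digits of the file name match as a number.
loose = LOOSE_INT.search(line)

# With \b on both sides, "123" is rejected because it runs straight into "file".
bounded = BOUNDED_INT.search(line)

# A standalone number is still matched by the bounded pattern.
standalone = BOUNDED_INT.search('x = 123;')

print(loose.group(), bounded, standalone.group())
```

The same idea carries over to float patterns: a trailing `\b` keeps digits that are glued to letters from being tokenized as a number.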

birkenfeld (Member) left a comment

Some small nits...

@@ -72,6 +72,8 @@ class MatlabLexer(RegexLexer):
"hilb", "invhilb", "magic", "pascal", "rosser", "toeplitz", "vander",
"wilkinson")

_operators = r'-|==|~=|<|>|<=|>=|&&|&|~|\|\|?|\.\*|\*|\+|\.\^|\.\\|\.\/|\/|\\'

birkenfeld (Member):

This will match < before <=, so the order has to change (I know you just moved the definition).

Or, better, use words(), which will automatically escape and order the regex to avoid this problem.
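
The ordering problem described in this comment can be reproduced with plain `re`. This is a minimal sketch under assumptions: the operator list is a shortened, hypothetical subset, and sorting by length is used here as a stand-in for whatever `words()` does internally.

```python
import re

# A shortened, hypothetical operator list; the real lexer has many more.
ops = ['<', '>', '<=', '>=', '==', '~=']

# Naive join: regex alternation is ordered, so '<' wins before '<=' is tried.
naive = re.compile('|'.join(re.escape(op) for op in ops))

# Longest-first ordering lets the two-character operators match in full.
ordered = re.compile('|'.join(
    re.escape(op) for op in sorted(ops, key=len, reverse=True)))

naive_match = naive.match('<= b').group()
ordered_match = ordered.match('<= b').group()

print(repr(naive_match), repr(ordered_match))
```

Escaping each operator with `re.escape` also removes the need to hand-write backslashes for characters such as `*`, `+`, and `|`.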

anntzer (Contributor Author):

I just reordered the elements; I'll pass on the bigger refactoring.

if re.match(r'^\s*%', text, re.M): # comment
# function declaration.
if next(line for line in text.splitlines()
if not re.match(r'^\s*%', text)).strip().startswith('function'):

birkenfeld (Member):

I think you can just re.search(r'^function\b') with re.MULTILINE.

Although this is a bit problematic for safe identification, since e.g. JavaScript functions are also defined that way. (analyse_text can also be used when no filename is known.)

Maybe you can also match the equals sign which I gather is always part of the function header?

anntzer (Contributor Author):

I don't think the single regex is equivalent: the point is that the first line that isn't fully commented should start with function.
No, there isn't always an equals sign (there is none if the function doesn't return anything).
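
A small sketch of why the two checks differ, assuming a hypothetical helper `first_code_line` (not a function from the PR) and two made-up inputs. A multiline search finds `function` anywhere in the file, while the first-non-comment-line check only accepts it at the top.

```python
import re

def first_code_line(text):
    """Return the first line that is not a full-line % comment, stripped."""
    for line in text.splitlines():
        if not re.match(r'\s*%', line):
            return line.strip()
    return ''

# A script, not a function file: "function" only appears further down.
script = "% setup\nx = 1;\nfunction y = helper(x)\ny = x;\nend\n"

# A function file with a leading comment block and no return value.
func_file = "% PLOTIT  Draw the figure.\nfunction plotit(data)\nplot(data)\nend\n"

multiline_hit_script = re.search(r'^function\b', script, re.MULTILINE) is not None
first_line_hit_script = first_code_line(script).startswith('function')
first_line_hit_func = first_code_line(func_file).startswith('function')

# The header of a function without outputs contains no '='.
header_has_equals = '=' in first_code_line(func_file)
```

Here the multiline search accepts the script while the first-line check rejects it, and the second input shows a valid header with no equals sign.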

birkenfeld (Member):

Ah, I missed that, sorry. Still very possible in JavaScript though... maybe exclude if { comes later?

anntzer (Contributor Author):

sure, done
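
One way the agreed heuristic might look, as a sketch only: the helper name `looks_like_matlab_function` is hypothetical, and checking for `{` on the header line alone (rather than anywhere later in the file) is an assumption, since the thread does not show the final code.

```python
import re

def looks_like_matlab_function(text):
    """Hypothetical helper: True if the first non-comment line starts with
    'function' and contains no '{' (assumed scope of the brace check)."""
    first = next(
        (line for line in text.splitlines() if not re.match(r'\s*%', line)),
        '',
    ).strip()
    return first.startswith('function') and '{' not in first

matlab_src = "% doc\nfunction out = twice(x)\nout = 2 * x;\nend\n"
js_src = "function twice(x) {\n  return 2 * x;\n}\n"

matlab_result = looks_like_matlab_function(matlab_src)
js_result = looks_like_matlab_function(js_src)
```

A JavaScript file that puts its opening brace on the following line would still pass this check, so the heuristic reduces false positives rather than eliminating them.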

# is recognized if it is either surrounded by spaces or by no
# spaces on both sides; only the former case matters for us. (This
# allows distinguishing `cd ./foo` from `cd ./ foo`.)
(r'(?:(?<=^)|(?<=;))\s*\w+\s+(?!=|\(|(%s)\s+)' % _operators, Name,

birkenfeld (Member):

No lookbehind is needed for ^.
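
A quick check of this point, using simplified, hypothetical stand-ins for the rule's prefix and assuming the pattern is compiled with `re.MULTILINE`. Because `^` is already zero-width, wrapping it in a lookbehind changes nothing.

```python
import re

# Simplified stand-ins for the rule's prefix (assumption: MULTILINE is on).
with_lookbehind = re.compile(r'(?:(?<=^)|(?<=;))\s*\w+', re.MULTILINE)
plain_anchor = re.compile(r'(?:^|(?<=;))\s*\w+', re.MULTILINE)

sample = "hold on; grid off\ncd ./foo"

# Collect the bare words found at a line start or right after a semicolon.
a = [m.group().strip() for m in with_lookbehind.finditer(sample)]
b = [m.group().strip() for m in plain_anchor.finditer(sample)]

print(a, b)
```

The lookbehind is still needed for `;`, since a plain `;` in the pattern would consume the semicolon instead of merely requiring it.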

anntzer (Contributor Author):

fair point.

anntzer (Contributor Author) commented Nov 25, 2019

thanks, all comments handled.

MATLAB treats a bare word followed by arguments such as `foo bar baz` as
the function call `foo('bar', 'baz')`.  As such, treat everything that
follows the bare word as a string.
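
The command-syntax rule can be sketched as a small rewrite function. Everything here is illustrative: `to_function_call`, `_OPS`, and `_COMMAND` are hypothetical names, the operator set is shortened, and MATLAB details such as quoted arguments or trailing comments are ignored. The extra `\s` in the lookahead is an addition of this sketch so that the match cannot backtrack into a run of spaces.

```python
import re

# Hypothetical, shortened operator set for illustration only.
_OPS = r'==|~=|<=|>=|<|>|\+|-|\*|\.\/|\/'

# Bare word, whitespace, then something that is not '=', '(', more
# whitespace, or an operator followed by whitespace.
_COMMAND = re.compile(r'^\s*(\w+)\s+(?!=|\(|\s|(?:%s)\s+)(.*)$' % _OPS)

def to_function_call(line):
    """Rewrite command syntax as the equivalent call, or return None."""
    m = _COMMAND.match(line)
    if m is None:
        return None
    # Each whitespace-separated argument becomes a string literal.
    args = ', '.join("'%s'" % a for a in m.group(2).split())
    return '%s(%s)' % (m.group(1), args)

for src in ['foo bar baz', 'cd ./foo', 'cd ./ foo', 'x = 1', 'a + b']:
    print(repr(src), '->', to_function_call(src))
```

An operator followed by whitespace is read as a binary operation, which is how `cd ./ foo` is told apart from `cd ./foo`.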

birkenfeld merged commit 612fb2b into pygments:master on Nov 25, 2019

birkenfeld (Member):

Thanks!
