Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Zeek lexer based on the Bro lexer #1269

Merged
merged 4 commits into from
Nov 25, 2019
Merged

Conversation

jsiwek
Copy link
Contributor

@jsiwek jsiwek commented Nov 24, 2019

Bro has been renamed to Zeek, but the language is essentially the
same without any different treatment of .zeek files from .bro files.

This change also adds general improvements to the lexer.

Not sure if .. versionadded:: 2.5 for ZeekLexer was the right thing for me to add.

Original PR: https://bitbucket.org/birkenfeld/pygments-main/pull-requests/824

Bro has been renamed to Zeek, but the language is essentially the
same without any different treatment of .zeek files from .bro files.

This change also adds general improvements to the lexer.
@Anteru Anteru added this to the 2.5 milestone Nov 24, 2019
@Anteru Anteru added the T-feature type: a new feature label Nov 24, 2019
Copy link
Collaborator

@Anteru Anteru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good except for the one typo.

pygments/lexers/dsls.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Anteru Anteru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from my side. I'm not super happy with the IPv4/IPv6 patterns, but assuming it works ...

@jsiwek
Copy link
Contributor Author

jsiwek commented Nov 24, 2019

I'm not super happy with the IPv4/IPv6 patterns, but assuming it works ...

They are pretty gnarly, but the same patterns for Zeek highlighting are at least being used in other various projects without complaint so far: SublimeText (same rules also used in linguist / GitHub highlighting), Vim, Ace editor

The "not 100% correct" comment related to IPv6 I recall is related to it being more lenient than it technically should be in some cases, like in terms of allowing more than a single :: zero-compressed sequence.

pygments/lexers/dsls.py Outdated Show resolved Hide resolved
pygments/lexers/dsls.py Outdated Show resolved Hide resolved
pygments/lexers/dsls.py Outdated Show resolved Hide resolved
pygments/lexers/dsls.py Outdated Show resolved Hide resolved
pygments/lexers/dsls.py Outdated Show resolved Hide resolved
(r'\??\$', Operator),
# Technically, colons are often used for punctuation/separation.
# E.g. field name/type separation.
(r'[?:]', Operator),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be Punctuation then? (At least colon is also in that group, below.) Operator is usually highlighted a bit more prominent than Punctuation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but either ways seems to end up deserving of a code-comment: the problem was that : is used either in "ternary if" like foo ? expr1 : expr2 or in variable declarations like local foo: int;. The later is most common and disambiguating requires more full-blown parsing treatment, so I do like the idea of just using the less-prominent Punctuation for both ? and : for sake of consistency and now changed it to that.

class BroLexer(ZeekLexer):
"""
For `Zeek <https://www.zeek.org/>`_ scripts.
Note, this is equivalent to ZeekLexer (Bro is the old name for Zeek).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be done by moving the alias/filename pattern to ZeekLexer, and just having a BroLexer = ZeekLexer for compatibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've hit a point where I may need to modify a unit test to get that way to work, so could you help take a look at what you'd like done here or else point me at another example where this has been done?

Here's the patch I tried:

diff --git a/pygments/lexers/_mapping.py b/pygments/lexers/_mapping.py
index 92aaca39..5405785f 100644
--- a/pygments/lexers/_mapping.py
+++ b/pygments/lexers/_mapping.py
@@ -65,7 +65,7 @@ LEXERS = {
     'BooLexer': ('pygments.lexers.dotnet', 'Boo', ('boo',), ('*.boo',), ('text/x-boo',)),
     'BoogieLexer': ('pygments.lexers.verification', 'Boogie', ('boogie',), ('*.bpl',), ()),
     'BrainfuckLexer': ('pygments.lexers.esoteric', 'Brainfuck', ('brainfuck', 'bf'), ('*.bf', '*.b'), ('application/x-brainfuck',)),
-    'BroLexer': ('pygments.lexers.dsls', 'Bro', ('bro',), ('*.bro',), ()),
+    'BroLexer': ('pygments.lexers.dsls', 'Zeek', ('zeek', 'bro'), ('*.zeek', '*.bro'), ()),
     'BugsLexer': ('pygments.lexers.modeling', 'BUGS', ('bugs', 'winbugs', 'openbugs'), ('*.bug',), ()),
     'CAmkESLexer': ('pygments.lexers.esoteric', 'CAmkES', ('camkes', 'idl4'), ('*.camkes', '*.idl4'), ()),
     'CLexer': ('pygments.lexers.c_cpp', 'C', ('c',), ('*.c', '*.h', '*.idc'), ('text/x-chdr', 'text/x-csrc')),
@@ -474,7 +474,7 @@ LEXERS = {
     'XtlangLexer': ('pygments.lexers.lisp', 'xtlang', ('extempore',), ('*.xtm',), ()),
     'YamlJinjaLexer': ('pygments.lexers.templates', 'YAML+Jinja', ('yaml+jinja', 'salt', 'sls'), ('*.sls',), ('text/x-yaml+jinja', 'text/x-sls')),
     'YamlLexer': ('pygments.lexers.data', 'YAML', ('yaml',), ('*.yaml', '*.yml'), ('text/x-yaml',)),
-    'ZeekLexer': ('pygments.lexers.dsls', 'Zeek', ('zeek',), ('*.zeek',), ()),
+    'ZeekLexer': ('pygments.lexers.dsls', 'Zeek', ('zeek', 'bro'), ('*.zeek', '*.bro'), ()),
     'ZephirLexer': ('pygments.lexers.php', 'Zephir', ('zephir',), ('*.zep',), ()),
     'ZigLexer': ('pygments.lexers.zig', 'Zig', ('zig',), ('*.zig',), ('text/zig',)),
 }
diff --git a/pygments/lexers/dsls.py b/pygments/lexers/dsls.py
index d88bac99..5894606f 100644
--- a/pygments/lexers/dsls.py
+++ b/pygments/lexers/dsls.py
@@ -195,8 +195,8 @@ class ZeekLexer(RegexLexer):
     .. versionadded:: 2.5
     """
     name = 'Zeek'
-    aliases = ['zeek']
-    filenames = ['*.zeek']
+    aliases = ['zeek', 'bro']
+    filenames = ['*.zeek', '*.bro']
 
     _hex = r'[0-9a-fA-F]'
     _float = r'((\d*\.?\d+)|(\d+\.?\d*))([eE][-+]?\d+)?'
@@ -345,16 +345,7 @@ class ZeekLexer(RegexLexer):
     }
 
 
-class BroLexer(ZeekLexer):
-    """
-    For `Zeek <https://www.zeek.org/>`_ scripts.
-    Note, this is equivalent to ZeekLexer (Bro is the old name for Zeek).
-
-    .. versionadded:: 1.5
-    """
-    name = 'Bro'
-    aliases = ['bro']
-    filenames = ['*.bro']
+BroLexer = ZeekLexer
 
 
 class PuppetLexer(RegexLexer):

Here's the failing unit test:

    def test_get_lexers():
        # test that the lexers functions work
        for func, args in [(lexers.get_lexer_by_name, ("python",)),
                           (lexers.get_lexer_for_filename, ("test.py",)),
                           (lexers.get_lexer_for_mimetype, ("text/x-python",)),
                           (lexers.guess_lexer, ("#!/usr/bin/python -O\nprint",)),
                           (lexers.guess_lexer_for_filename, ("a.py", "<%= @foo %>"))
                           ]:
            x = func(opt='val', *args)
            assert isinstance(x, lexers.PythonLexer)
            assert x.options["opt"] == "val"
    
        for cls, (_, lname, aliases, _, mimetypes) in lexers.LEXERS.items():
>           assert cls == lexers.find_lexer_class(lname).__name__
E           AssertionError: assert 'BroLexer' == 'ZeekLexer'
E             - BroLexer
E             + ZeekLexer

tests/test_basic_api.py:124: AssertionError

Except that test does sound legit the way it is: if BroLexer remains in LEXERS at all (think it needs to for compat), then it's going to need a unique name attribute ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the old one needs to stay in LEXERS, as long as the old aliases and filenames work with the new one...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the _mapping.py file containing the LEXERS definition auto-generates itself, so either need to change the generation code to ignore BroLexer somehow or else move it to a different place where that logic won't find it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just needs to be removed from __all__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sounds good, new commit does that. Let me know if there's anything further.

pygments/lexers/other.py Outdated Show resolved Hide resolved
Addresses feedback from PR review:

* Use \w instead of [A-Za-z0-9_]
* Simplify IPv4/IPv6 addresses
* Remove superfluous leading \b's
* Change characters used in "ternary if" from Operator to Punctuation
* Remove ZeekLexer from "other.py" compatibility file
@jsiwek
Copy link
Contributor Author

jsiwek commented Nov 25, 2019

@birkenfeld thanks for the feedback, I pushed a new commit to address all of it except the BroLexer = ZeekLexer issues.

This also removes BroLexer from LEXERS.
Copy link
Member

@birkenfeld birkenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@birkenfeld birkenfeld merged commit dff6629 into pygments:master Nov 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-feature type: a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants