new lexer request: dot (graphviz) #731

Closed
Anteru opened this issue Aug 31, 2019 · 16 comments · Fixed by #1657
Labels
S-major severity: major T-feature type: a new feature X-imported imported from Bitbucket

Comments

@Anteru
Collaborator

Anteru commented Aug 31, 2019

(Original issue 1024 created by psuter on 2014-08-12T08:29:53.471324+00:00)

The DOT language is used by the open source graph visualization software Graphviz to represent structural information as diagrams of abstract graphs and networks.

The language grammar is described at http://www.graphviz.org/doc/info/lang.html

An example:

#!dot
    digraph G {
       Hello->Pygments
    }

Would be nice to have a lexer for this in Pygments.

@Anteru Anteru added T-feature type: a new feature X-imported imported from Bitbucket S-major severity: major labels Aug 31, 2019
@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original issue was assigned to tshatch)

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by tshatch on 2015-10-17T02:58:14.574333+00:00)

Issue #1135 was marked as a duplicate of this issue.

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by auge on 2016-03-10T13:55:34.707881+00:00)

In the meantime, I use the "C" lexer.
It produces reasonable colors for comments, strings and numbers.
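
For anyone wanting to try this workaround, here is a minimal sketch. It assumes Pygments is installed; the DOT snippet and the variable names are illustrative only. It runs DOT source through the C lexer and collects what gets classified as comments, strings and numbers. DOT keywords and edge operators get no special treatment this way.

```python
from pygments import lex
from pygments.lexers import CLexer
from pygments.token import Comment, Number, String

# Illustrative DOT source; the C lexer knows nothing about DOT itself.
dot_source = '''digraph G {
    // an edge with a label
    Hello -> Pygments [label="hi", penwidth=2];
}
'''

# lex() yields (token_type, text) pairs.
tokens = list(lex(dot_source, CLexer()))

# Collect the pieces the C lexer classifies as comments, strings and numbers.
# "ttype in Comment" is true for Comment and all of its subtypes.
comments = [text for ttype, text in tokens if ttype in Comment]
strings = "".join(text for ttype, text in tokens if ttype in String)
numbers = [text for ttype, text in tokens if ttype in Number]

print(comments, strings, numbers)
```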

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by nikeee on 2018-04-09T12:32:11.459759+00:00)

Any news on this one?

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by psuter on 2018-04-09T21:30:52.797972+00:00)

A basic attempt:

#!python

from pygments.lexer import RegexLexer, bygroups
from pygments.token import (Comment, Keyword, Operator, Name, String,
    Number, Punctuation, Whitespace)

__all__ = ['GraphvizLexer']


class GraphvizLexer(RegexLexer):
    """
    For graphviz DOT graph description language.
    
    .. versionadded:: 2.3.0
    """
    name = 'Graphviz'
    aliases = ['graphviz']
    filenames = ['*.gv', '*.dot']
    mimetypes = ['text/x-graphviz']
    tokens = {
        'root': [
            (r'\s+', Whitespace),
            (r'(#|//).*?$', Comment.Single),
            (r'/(\\\n)?[*](.|\n)*?[*](\\\n)?/', Comment.Multiline),
            (r'(?i)(node|edge|graph|digraph|subgraph|strict)\b', Keyword),
            (r'--|->', Operator),
            (r'[{}[\]:;,]', Punctuation),
            (r'(\b\D\w*)(\s*)(=)(\s*)',
             bygroups(Name.Attribute, Whitespace, Punctuation, Whitespace),
             'attr_id'),
            (r'\b(n|ne|e|se|s|sw|w|nw|c|_)\b', Name.Builtin),  # compass points
            (r'\b\D\w*', Name.Tag),  # node
            (r'[-]?((\.[0-9]+)|([0-9]+(\.[0-9]*)?))', Number),
            (r'"(\\"|[^"])*?"', Name.Tag),  # quoted node
            (r'<', Punctuation, 'xml'),
        ],
        'attr_id': [
            (r'\b\D\w*', String, '#pop'),
            (r'[-]?((\.[0-9]+)|([0-9]+(\.[0-9]*)?))', Number, '#pop'),
            (r'"(\\"|[^"])*?"', String.Double, '#pop'),
            (r'<', Punctuation, ('#pop', 'xml')),
        ],
        'xml': [
            (r'<', Punctuation, '#push'),
            (r'>', Punctuation, '#pop'),
            (r'\s+', Whitespace),
            (r'[^<>\s]', Name.Tag),
        ]
    }

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by nikeee on 2018-04-21T01:02:13.099452+00:00)

Is there anything that's missing in this lexer?

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by psuter on 2018-04-21T06:08:53.029824+00:00)

Like what?

It explicitly lists all the following things mentioned in the language grammar:

  • All six case-independent keywords (node, edge, graph, digraph, subgraph, strict).

  • All ten compass point values (n, ne, e, se, s, sw, w, nw, c, _).

  • All seven "punctuation" characters ({, }, [, ], :, ;, ,).

  • Both edge operators (--, ->).

  • All comment styles (/* */, //, #).

  • Whitespace.

  • All four ID identifiers:

    • Strings. (But not using the exact character ranges.)
    • Numerals.
    • Double quoted strings ("). (Missing: Multi-line escaping and + operator to concatenate strings.)
    • HTML strings (<, >). (But not using real XML parsing. No & escape sequences etc.)

I don't see any other tokens mentioned in the grammar. The missing things (XML, multi-line escaping etc.) all seem quite exotic and unimportant to me, but feel free to add them.
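
The HTML-string handling amounts to counting angle brackets. As a rough illustration in plain Python (the helper name `html_string_end` and the sample label are made up for this sketch and are not part of the lexer), every `<` goes one level deeper and every `>` comes back up, with no real XML parsing:

```python
def html_string_end(text: str, start: int) -> int:
    """Return the index just past the '>' that closes the '<' at text[start].

    Mirrors the idea of the lexer's 'xml' state: each '<' pushes one level,
    each '>' pops one, and no real XML/HTML parsing takes place.
    """
    if text[start] != "<":
        raise ValueError("expected '<' at start")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "<":
            depth += 1
        elif text[i] == ">":
            depth -= 1
            if depth == 0:
                return i + 1
    raise ValueError("unbalanced angle brackets")


# Hypothetical attribute with an HTML label that contains nested tags.
label = 'a [label=<<b>bold</b> text>];'
begin = label.index("<")
end = html_string_end(label, begin)
print(label[begin:end])
```

Because only brackets are counted, a `>` inside an attribute value or an escape such as `&gt;` is not treated specially, which matches the limitation noted above.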

On a higher level, the strings can represent different things like attribute names, node names etc.
The ~200 attribute names etc. are not explicitly listed.
The node names are user defined, so can't be explicitly listed.
But attribute and node names are distinguished implicitly by position.
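
To make "distinguished by position" concrete, here is a small sketch using only the standard `re` module. The two patterns are copied from the lexer above; the `classify` helper and the sample line are assumptions added for illustration.

```python
import re

# Same shapes as the lexer's rules: an identifier followed by '=' is an
# attribute name; any other identifier is treated as a node name.
ATTRIBUTE = re.compile(r'(\b\D\w*)(\s*)(=)(\s*)')
NODE = re.compile(r'\b\D\w*')


def classify(text: str, pos: int) -> str:
    """Classify the identifier starting at text[pos] by what follows it."""
    if ATTRIBUTE.match(text, pos):
        return "attribute"
    if NODE.match(text, pos):
        return "node"
    return "other"


line = 'Hello [color = red]'
print(classify(line, 0))                    # 'Hello' is not followed by '='
print(classify(line, line.index("color")))  # 'color' is followed by '='
```

The order matters in the same way as in the lexer: the attribute pattern has to be tried first, since the node pattern would also match an attribute name.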

Whether this all works 100% correctly in all cases, I don't know.

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by nikeee on 2018-04-21T11:46:03.043356+00:00)

I meant missing for it to be merged into Pygments. I think it's a pretty decent solution.

@Anteru
Collaborator Author

Anteru commented Aug 31, 2019

(Original comment by nikeee on 2018-05-26T13:56:08.111416+00:00)

As I need this lexer in my current paper using minted, I put this lexer (with some changes) on GitHub and added a setup.py.

https://github.com/nikeee/pygments-lexer-graphviz
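
For context, Pygments discovers third-party lexers through the `pygments.lexers` entry point group. A minimal sketch of what such packaging metadata could look like is below. The package name, module name and version are hypothetical placeholders and are not taken from the linked repository.

```python
# Hypothetical packaging metadata; every name here is a placeholder.
SETUP_KWARGS = {
    "name": "example-graphviz-lexer",
    "version": "0.1.0",
    "py_modules": ["graphviz_lexer"],
    "install_requires": ["Pygments"],
    # Pygments scans this entry point group to find plugin lexers.
    "entry_points": {
        "pygments.lexers": [
            "graphviz = graphviz_lexer:GraphvizLexer",
        ],
    },
}


def run_setup() -> None:
    """Call this at the bottom of a real setup.py."""
    # Imported lazily so the sketch can be loaded without setuptools.
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

Once a package like this is installed, the lexer should be selectable by the aliases declared on the lexer class, which is what tools such as minted rely on.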

@prodigion

As a lexer is ready for this, can it please be integrated?

@Anteru
Collaborator Author

Anteru commented Dec 6, 2020

If someone can turn that into a PR, then yes, it should be fairly quick to integrate. Would you mind preparing one? The advantage is that a PR runs the full test suite, so we can immediately see whether this merges cleanly or requires a lot of work.

@prodigion

@nikeee As you created the repo at https://github.com/nikeee/pygments-lexer-graphviz, could you please create the pull request?

@nikeee
Contributor

nikeee commented Dec 13, 2020

I'd do it.
Keep in mind that I'm not the author of the lexer. It was the work of psuter.

@prodigion

> I'd do it.
> Keep in mind that I'm not the author of the lexer. It was the work of psuter.

Is that @petsuter?

@petsuter

Right, please feel free to create a PR and integrate it.

@nikeee
Contributor

nikeee commented Dec 30, 2020

I created a PR for this one (#1657).
