Exhale forces code highlighting language to be cpp despite Doxygen supporting multiple languages #28

Closed
mithro opened this issue Feb 28, 2018 · 6 comments

@mithro

mithro commented Feb 28, 2018

I'm using Doxygen to generate documentation from Verilog with Exhale. This seems to mostly work pretty well except here -> https://github.com/svenevs/exhale/blob/master/exhale/graph.py#L2534

            if len(f.program_listing) > 0:
                include_program_listing = True
                full_program_listing = '.. code-block:: cpp\n\n'

Exhale is forcing the program listing code highlighting to be C++. This means that Sphinx gets very grumpy when it tries to highlight this code block, because Verilog is not valid C++ :-).

It would be good if we could use one of the following to provide the code block language parameter:

  • A config setting (maybe allow a function which returns the value?)
  • Looking up the language via file extension?
  • Read the language data from Doxygen's XML somehow?

Thoughts?

@svenevs
Owner

svenevs commented Mar 1, 2018

This means that Sphinx gets very grumpy when it tries to highlight this code block, because Verilog is not valid C++ :-).

Hehe. I was wondering when this was going to surface. What follows is a bypass, and notes to myself for fixing it. Generally, the doxygen XML can be used. However, I'd love to hear your thoughts on the last part (adding a config variable). As you can see, I've thought about this problem for a good while, just never gotten around to fixing it.

Bypassing the Problem

Currently, you can bypass all program listing issues by setting XML_PROGRAMLISTING = NO (it is on by default):

import textwrap

exhale_args = {
    # ...
    "exhaleDoxygenStdin": textwrap.dedent('''
        INPUT = ../path
        XML_PROGRAMLISTING = NO
    ''')
}

I don't know how relevant this is to Verilog, but the XML program listing is used to infer some missing relationships for these types:

exhale/exhale/graph.py

Lines 1747 to 1749 in d5a6adb

if child.kind == "enum" or child.kind == "variable" or \
child.kind == "function" or child.kind == "typedef" or \
child.kind == "union":

TBH I've always felt that me doing that means that I'm ultimately missing something during parsing. The reason for mentioning it is simply to point out that (unfortunately) disabling XML_PROGRAMLISTING may have undesirable consequences.

Fixing the Problem

Looking up the language via file extension?

The primary issue with this is the C++ community does not agree on conventions here. The way I've always operated: .h means pure C, .hpp means C++. But many projects will use .h for both C and C++. Evidently, doxygen has (somehow) solved this problem. *.h files do end up as C++ in the XML (see next section).

Read the language data from Doxygen's XML somehow?

This is definitely possible and preferred. We'll get something like

<compounddef id="common_8h" kind="file" language="C++">
    <compoundname>common.h</compoundname>

This comes from Doxygen's default EXTENSION_MAPPING, which can also be overridden. The defaults should be sufficient for most projects.

I'll work on this soon, it's just a matter of mapping the doxygen language="X" to the sphinx lexers.
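A minimal sketch of reading that attribute, assuming the Doxygen XML for a file looks like the snippet above. The helper name doxygen_language, the trimmed mapping, and the "text" fallback are assumptions for illustration, not Exhale's actual code:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed mapping from Doxygen's language="X" to a lexer name.
LANG_TO_LEX = {"C": "c", "C++": "cpp", "Python": "py"}


def doxygen_language(xml_text):
    """Return the language attribute of the first <compounddef>, or None."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself, so a bare <compounddef> works too.
    for compound in root.iter("compounddef"):
        return compound.get("language")
    return None


sample = """<doxygen>
  <compounddef id="common_8h" kind="file" language="C++">
    <compoundname>common.h</compoundname>
  </compounddef>
</doxygen>"""

language = doxygen_language(sample)
# Assumption: fall back to plain "text" when the language is not in the mapping.
lexer = LANG_TO_LEX.get(language, "text")
print(language, lexer)
```

A fallback avoids a hard failure for languages that Doxygen reports but the mapping does not know about.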

A config setting (maybe allow a function which returns the value?)

However, this is actually required as well. CUDA can generally be parsed well by Doxygen, but it's not official. In one of my projects, I do

FILE_PATTERNS          = *.hpp *.cuh
EXTENSION_MAPPING     += cuh=c++

The implication is the following will be created:

<compounddef id="bilateral__filter_8cuh" kind="file" language="C++">
    <compoundname>bilateral_filter.cuh</compoundname>

but Sphinx actually has a CUDA lexer! Ideally, I would just let users specify a function object so that they can do whatever they want and return a lexer to me. It would be something like filename -> lexer. Unfortunately, function objects cannot be "pickled" (I ran into this problem with Customizing Breathe Output, and the solution was far from elegant). I think what would be the most straightforward would be something like

exhale_args = {
    # ...
    "lexerMapping": {
        r".*\.cuh": "cuda",
        r"path/to/exact_filename\.ext": "c"
    }
}

This would cover the final case of languages not officially supported by Doxygen. "lexerMapping" would specify keys as patterns, which are matched as regular expressions. One unfortunate catch with Python's regular expressions is that a glob-style pattern such as *.cuh will not work: a leading * has nothing to repeat, so re.compile rejects it. The regex equivalent is .*\.cuh. Rough plan:

  1. Before parsing, re.compile everything in "lexerMapping"'s keys.
  2. If it starts with "*", add a . before it automatically?
  3. Make it very clear that re.match will be used (rather than re.search).
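The three steps above could be sketched roughly as follows. The helper names compile_lexer_mapping and lexer_for, and the default argument, are hypothetical; this is an illustration of the idea, not the implementation that ended up in Exhale:

```python
import re

# Hypothetical user configuration: regex pattern -> lexer name.
lexer_mapping = {
    r".*\.cuh": "cuda",
    r"path/to/exact_filename\.ext": "c",
}


def compile_lexer_mapping(mapping):
    """Step 1: compile every key once, before parsing begins."""
    compiled = []
    for pattern, lexer in mapping.items():
        # Step 2 (optional convenience): turn a glob-style leading "*" into ".*".
        if pattern.startswith("*"):
            pattern = "." + pattern
        compiled.append((re.compile(pattern), lexer))
    return compiled


def lexer_for(path, compiled, default):
    """Step 3: re.match anchors at the start of the path, unlike re.search."""
    for regex, lexer in compiled:
        if regex.match(path):
            return lexer
    return default


compiled = compile_lexer_mapping(lexer_mapping)
print(lexer_for("include/bilateral_filter.cuh", compiled, "cpp"))
print(lexer_for("include/common.h", compiled, "cpp"))
```

Note that re.match only anchors the start of the string, so a pattern meant to match an exact filename would still need a trailing $ (or re.fullmatch) to reject longer paths.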

@svenevs
Owner

svenevs commented Mar 3, 2018

@mithro what does the Doxygen XML produce for you for Verilog files? As in, within the .xml for a given file, what shows up under language="???"?

<compounddef id="common_8h" kind="file" language="C++">

Of the hardware description languages, they officially only support VHDL. The following mapping is what I have:

LANG_TO_LEX = {
    "IDL":          "idl",
    "Java":         "java",
    "Javascript":   "js",
    "C#":           "csharp",
    "C":            "c",
    "C++":          "cpp",
    "D":            "d",
    "PHP":          "php",
    "Objective-C":  "objective-c",
    "Python":       "py",
    "Fortran":      "fortran",
    "FortranFree":  "fortran",
    "FortranFixed": "fortranfixed",
    "VHDL":         "vhdl"
}

@mithro
Author

mithro commented Mar 4, 2018

FYI, the Doxygen with Verilog support is here -> https://github.com/avelure/doxygen-verilog

Doing a grep -R "language=" _doxygen | grep -v Python | grep -v "C++" | grep -v VHDL I get the following:

_doxygen/gateware/xml/enumwb__async__reg.xml:  <compounddef id="enumwb__async__reg" kind="enum" language="Verilog" prot="public">
_doxygen/gateware/xml/wb__async__reg_8v.xml:  <compounddef id="wb__async__reg_8v" kind="file" language="Verilog">
_doxygen/gateware/xml/enumHeaderRam.xml:  <compounddef id="enumHeaderRam" kind="enum" language="Verilog" prot="public">
_doxygen/gateware/xml/HeaderRAM_8v.xml:  <compounddef id="HeaderRAM_8v" kind="file" language="Verilog">
_doxygen/libuip/xml/ip64_2README_8md.xml:  <compounddef id="ip64_2README_8md" kind="file" language="Markdown">
_doxygen/libuip/xml/ipv6_2multicast_2README_8md.xml:  <compounddef id="ipv6_2multicast_2README_8md" kind="file" language="Markdown">
_doxygen/libuip/xml/ip64-addr_2README_8md.xml:  <compounddef id="ip64-addr_2README_8md" kind="file" language="Markdown">

Looks like you should add:

<compounddef id="ip64-addr_2README_8md" kind="file" language="Markdown">
<compounddef id="HeaderRAM_8v" kind="file" language="Verilog">
LANG_TO_LEX = {
    "IDL":          "idl",
    "Java":         "java",
    "Javascript":   "js",
    "C#":           "csharp",
    "C":            "c",
    "C++":          "cpp",
    "D":            "d",
    "PHP":          "php",
    "Objective-C":  "objective-c",
    "Python":       "py",
    "Fortran":      "fortran",
    "FortranFree":  "fortran",
    "FortranFixed": "fortranfixed",
    "VHDL":         "vhdl",
    "Verilog":      "verilog",
    "Markdown":     "markdown",
}
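With a mapping like this, the hard-coded directive from the original report could be replaced by a lookup. A minimal sketch, assuming a hypothetical helper program_listing_header and a "text" fallback for unknown languages (neither is taken from Exhale itself):

```python
# Trimmed copy of the mapping above, for illustration only.
LANG_TO_LEX = {
    "C++": "cpp",
    "VHDL": "vhdl",
    "Verilog": "verilog",
    "Markdown": "markdown",
}


def program_listing_header(language, default="text"):
    """Build the reStructuredText directive line for a program listing."""
    # Assumption: unknown languages fall back to an unhighlighted lexer.
    lexer = LANG_TO_LEX.get(language, default)
    return ".. code-block:: {}\n\n".format(lexer)


print(repr(program_listing_header("Verilog")))
print(repr(program_listing_header("SomethingElse")))
```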

@svenevs
Owner

svenevs commented Mar 14, 2018

@mithro Excellent, I believe this is fixed now on the fix-language-lexers branch. Can you try installing it locally and seeing if it works? It adds a new config variable lexerMapping, but you shouldn't need to set that since Verilog is in LANG_TO_LEX.

You should be able to clone exhale, check out the branch fix-language-lexers, and pip3 install . -- if things work on your end without setting lexerMapping (kudos if you test it too though xD) I'll merge to master and add a new release on PyPI.

@svenevs
Owner

svenevs commented May 25, 2018

Update: when writing the test case I ran into issues where configs._compiled_lexer_mapping tests will conflict with each other, necessitating once and for all a redo of configs to instantiate an object per execution.

@mithro the plan here is to do that and include some other updates in a 0.2.0 release, which will break your monkey patching.

@mithro
Author

mithro commented May 25, 2018

Don't wait on me. If you have a solution that works for my use case I'll update my configs when I move to the new release.
