swig 4.x can not parse large c enum #2876

sunnyqeen · 2024-04-16T16:37:51Z

I try to wrap a large c enum that has more than 10000 items to csharp/java
swig will fail and nothing is generated.
I attached the example here.
test1.h.txt

By the way, swig 3.x version has no problem.

ojwb · 2024-04-16T20:47:57Z

Curiously, swig exits with status 0 but doesn't generate anything. Happens with -python too, so doesn't seem target-language specific.

git bisect points to 74adaa5 (by @vadz) - it looks to me like this introduce a right-recursive grammar rule as enumlist can now be ``enumlist_item COMMA enumlistorenumlist_item COMMA DOXYGENPOSTSTRING enumlist`.

This is something you're meant to avoid doing because "Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once." So I'm guessing we run out of parser stack space for a large enum.

It doesn't seem entirely trivial to rewrite this to use left recursion though as the comment needs to be attached to the thing before.

I have wondered before if it would work better for doxygen comments to not be part of the grammar, and instead we could have the last seen doxygen "pre" comment noted by the scanner and get picked up by the grammar when it parser a doxygen-able entity, and for "post" comments the grammar could note the last doxygen-able entity seen and the scanner could use that to attach any post comment it sees.

A couple of significant advantages of this approach are that doxygen comment issues could no longer result in parse errors, and the grammar would be simpler.

vadz · 2024-04-16T23:06:19Z

Sorry, I hadn't realized this change could result in such deep recursion, even if it seems pretty obvious in retrospect. But I do remember that trying to do it using left recursion was difficult, which is why I ended up doing it like this.

In principle, handling Doxygen comments outside of the grammar could indeed be better, but this would require a lot of changes and I'm not sure if such a change would be accepted. If @wsfulton thinks it would be worth it, I could at least look at it.

wsfulton · 2024-04-18T09:06:34Z

Sounds like a good idea to me. @ojwb is doing a great job of fundamental parser changes, so would love it if he continues to drive changes in this area, see #2612. I would just suggest that implementing the named references in there would be prudent before tackling a doxygen rewrite. Supporting attributes in #2840 seems like it is a similar problem to solve as attributes can appear in so many different places and similar places to doxygen comments.

wsfulton · 2024-04-18T09:15:08Z

Just so my last comment is not misinterpreted, @vadz, I'd love you to spend time on this, it just needs to be carefully coordinated with these other fundamental parser changes and as long as @ojwb is happy to continue to manage the parser changes, I'd like to just watch from the sidelines.

ojwb · 2024-04-18T21:52:57Z

Supporting attributes in #2840 seems like it is a similar problem to solve as attributes can appear in so many different places and similar places to doxygen comments.

The same thought had struck me.

I would just suggest that implementing the named references in there would be prudent before tackling a doxygen rewrite.

I can take a look at moving to named references. Although it's a very widespread change to the parser source, I think we can have confidence that it doesn't break anything because the generated code should be unchanged based on my limited poking at this so far (because internally bison resolves the named reference to the same thing as the current number reference).

ojwb · 2024-04-19T02:47:47Z

I can take a look at moving to named references.

Done in 393de5b.

Previously SWIG would quietly exit with status 0 in this situation. See swig#2876

ojwb · 2024-05-15T00:08:40Z

Curiously, swig exits with status 0 but doesn't generate anything.

I've pushed a change to test the return value of yyparse() and exit with an error if it indicates the maximum parser stack depth was reached, which at least means we now report these situations clearly.

ojwb added the parser label Apr 16, 2024

ojwb mentioned this issue Apr 18, 2024

Review right recursion rules in our parser #2884

Open

ojwb added a commit to ojwb/swig that referenced this issue May 14, 2024

Report error if parser stack depth exceeded

6fed8d9

Previously SWIG would quietly exit with status 0 in this situation. See swig#2876

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

swig 4.x can not parse large c enum #2876

swig 4.x can not parse large c enum #2876

sunnyqeen commented Apr 16, 2024

ojwb commented Apr 16, 2024

vadz commented Apr 16, 2024

wsfulton commented Apr 18, 2024

wsfulton commented Apr 18, 2024

ojwb commented Apr 18, 2024

ojwb commented Apr 19, 2024

ojwb commented May 15, 2024

swig 4.x can not parse large c enum #2876

swig 4.x can not parse large c enum #2876

Comments

sunnyqeen commented Apr 16, 2024

ojwb commented Apr 16, 2024

vadz commented Apr 16, 2024

wsfulton commented Apr 18, 2024

wsfulton commented Apr 18, 2024

ojwb commented Apr 18, 2024

ojwb commented Apr 19, 2024

ojwb commented May 15, 2024