Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

swig 4.x can not parse large c enum #2876

Open
sunnyqeen opened this issue Apr 16, 2024 · 7 comments
Open

swig 4.x can not parse large c enum #2876

sunnyqeen opened this issue Apr 16, 2024 · 7 comments
Labels

Comments

@sunnyqeen
Copy link

I try to wrap a large c enum that has more than 10000 items to csharp/java
swig will fail and nothing is generated.
I attached the example here.
test1.h.txt

By the way, swig 3.x version has no problem.

@ojwb
Copy link
Member

ojwb commented Apr 16, 2024

Curiously, swig exits with status 0 but doesn't generate anything. Happens with -python too, so doesn't seem target-language specific.

git bisect points to 74adaa5 (by @vadz) - it looks to me like this introduce a right-recursive grammar rule as enumlist can now be ``enumlist_item COMMA enumlistorenumlist_item COMMA DOXYGENPOSTSTRING enumlist`.

This is something you're meant to avoid doing because "Right recursion uses up space on the Bison stack in proportion to the number of elements in the sequence, because all the elements must be shifted onto the stack before the rule can be applied even once." So I'm guessing we run out of parser stack space for a large enum.

It doesn't seem entirely trivial to rewrite this to use left recursion though as the comment needs to be attached to the thing before.

I have wondered before if it would work better for doxygen comments to not be part of the grammar, and instead we could have the last seen doxygen "pre" comment noted by the scanner and get picked up by the grammar when it parser a doxygen-able entity, and for "post" comments the grammar could note the last doxygen-able entity seen and the scanner could use that to attach any post comment it sees.

A couple of significant advantages of this approach are that doxygen comment issues could no longer result in parse errors, and the grammar would be simpler.

@ojwb ojwb added the parser label Apr 16, 2024
@vadz
Copy link
Member

vadz commented Apr 16, 2024

Sorry, I hadn't realized this change could result in such deep recursion, even if it seems pretty obvious in retrospect. But I do remember that trying to do it using left recursion was difficult, which is why I ended up doing it like this.

In principle, handling Doxygen comments outside of the grammar could indeed be better, but this would require a lot of changes and I'm not sure if such a change would be accepted. If @wsfulton thinks it would be worth it, I could at least look at it.

@wsfulton
Copy link
Member

Sounds like a good idea to me. @ojwb is doing a great job of fundamental parser changes, so would love it if he continues to drive changes in this area, see #2612. I would just suggest that implementing the named references in there would be prudent before tackling a doxygen rewrite. Supporting attributes in #2840 seems like it is a similar problem to solve as attributes can appear in so many different places and similar places to doxygen comments.

@wsfulton
Copy link
Member

Just so my last comment is not misinterpreted, @vadz, I'd love you to spend time on this, it just needs to be carefully coordinated with these other fundamental parser changes and as long as @ojwb is happy to continue to manage the parser changes, I'd like to just watch from the sidelines.

@ojwb
Copy link
Member

ojwb commented Apr 18, 2024

Supporting attributes in #2840 seems like it is a similar problem to solve as attributes can appear in so many different places and similar places to doxygen comments.

The same thought had struck me.

I would just suggest that implementing the named references in there would be prudent before tackling a doxygen rewrite.

I can take a look at moving to named references. Although it's a very widespread change to the parser source, I think we can have confidence that it doesn't break anything because the generated code should be unchanged based on my limited poking at this so far (because internally bison resolves the named reference to the same thing as the current number reference).

@ojwb
Copy link
Member

ojwb commented Apr 19, 2024

I can take a look at moving to named references.

Done in 393de5b.

ojwb added a commit to ojwb/swig that referenced this issue May 14, 2024
Previously SWIG would quietly exit with status 0 in this situation.

See swig#2876
@ojwb
Copy link
Member

ojwb commented May 15, 2024

Curiously, swig exits with status 0 but doesn't generate anything.

I've pushed a change to test the return value of yyparse() and exit with an error if it indicates the maximum parser stack depth was reached, which at least means we now report these situations clearly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants