Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Help wanted] Turn lexer back into flex definition #485

Closed
ISSOtm opened this issue Feb 17, 2020 · 8 comments · Fixed by #557
Closed

[Help wanted] Turn lexer back into flex definition #485

ISSOtm opened this issue Feb 17, 2020 · 8 comments · Fixed by #557
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM
Milestone

Comments

@ISSOtm
Copy link
Member

ISSOtm commented Feb 17, 2020

The lexer is currently a 1 kloc (+ 800 kloc if you count globlex) monstrosity of a modified auto-generated file.
This should really be changed back to a flex definition somehow. The modifications made to it might require a custom program skeleton (probably, I think, but eh), but that would be certainly much better than what we currently have.

@ISSOtm ISSOtm added enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM labels Feb 17, 2020
@ISSOtm ISSOtm added this to the v1.0.0 milestone Feb 17, 2020
@ISSOtm ISSOtm pinned this issue Apr 9, 2020
@ISSOtm ISSOtm mentioned this issue Apr 9, 2020
@ISSOtm
Copy link
Member Author

ISSOtm commented Apr 11, 2020

This is actually progressing—who would've guessed.

I have currently the following left to implement (at least, that I'm aware of):

  • Symbol interpolation in strings (in progress)
  • Block scanning (macros, if, rept)
  • "Naked" symbol interpolation

We do have a problem, though. Some features need to be axed in order for this to be possible..!

  • OPT b and g, and relevant CLI flags. Those relied on modifying the lexer's algorithm at run-time, which I don't know how to do with flex, if at all possible.
  • Naked symbol interpolation. This is significantly complex to handle, and cannot be handled in the lexer proper due to spanning tokens. I would much rather prefer introducing an alternative, such as insta-expanded EQUS (EVAL?).
  • The EQUS stack (and recursion limit). It's hard to know at which position we're exiting a symbol now. Might be doable, I need to check YY_USER_ACTION.

Side note: if the above two OPTs are removed, this only leaves OPT p, which I believe is made redundant by ds cnt, val. So then OPT could probably be removed entirely.

@JL2210
Copy link
Member

JL2210 commented Apr 12, 2020

How will this affect existing code? Can you give examples of what won't work afterwards?

@ISSOtm
Copy link
Member Author

ISSOtm commented Apr 13, 2020

How this affects existing code is the question I always ask myself about each feature change. We don't have any usage statistics beyond pret's disassemblies, and this doesn't even include any hacks based on them.

  • I do believe OPT and the flags see very little use. According to @Rangi42, they use OPT b in a single place to "draw" some graphic using bits. It has been suggested to shim around this by supporting two binary / "gfx" digit sets, the current default and the one presented in the documentation.
  • Symbol interpolation outside of strings is actually pretty useless. People expect it to work like macro args ({symbol} should work in label names #269) but in reality the docs do state it: {sym} is strictly equivalent to "{sym}", except in macro args (because). I discuss my stance on implementing this below.
  • The EQUS stack missing comes from the fact that while it's pretty clear when an expansion begins, when it ends is much less so, since it requires char counting. This means that EQUS expansion status can no longer be given in error reports, and infinite recursion can no longer be trapped. That said, there's a facility to execute code when any rule is matched, so that might be usable for implementing EQUS tracking again.

The reason why I do not want to make "naked" interpolations work like macro args is that parsing such interpolations is very complex, especially as they can be nested. Thus, it's OK to handle inside a string, since we're already working with a blob of data, but outside of it, it's expected to span multiple tokens (like macro args).

Macro args are not handled in the lexer rules proper, since they happen beyond the concept of tokens; instead, the code responsible for filling the lexer's text buffer (YY_INPUT) is overridden to look for macro args, and "paste" their contents in. The complication is that flex puts a cap on the amount of characters that it expects from the function, so the entire contents can't be plainly dumped.
To avoid unnecessary copies/allocations, and thus keep the lexer—one of the two most performance-critical pieces of RGBASM—efficient, additional logic is present to "spread" the pasting across buffer refills.

The problem that "naked" symbol interpolations introduce is that they are really complicated to process. The hardest part is that they can be nested, which opens the can of worms of where to write the expanded results to, and so on. (It worked in the old lexer because the target buffer was large)

@JL2210
Copy link
Member

JL2210 commented Apr 27, 2020

The only issue I see is that some auto-generation tools (particularly sdcc) generate an .optsdcc -mgbz80.

@ISSOtm
Copy link
Member Author

ISSOtm commented Apr 27, 2020

This is irrelevant, because SDCC uses a custom weird assembler as its back-end instead of RGBDS.

@JL2210
Copy link
Member

JL2210 commented Apr 27, 2020

@ISSOtm And it also has an option to generate RGBDS assembly, which is where I found that example.

@ISSOtm
Copy link
Member Author

ISSOtm commented Apr 27, 2020

But that wouldn't be valid RGBDS syntax; where is the code generating that?

@JL2210
Copy link
Member

JL2210 commented Apr 27, 2020

My best guess would be in sdcc-4.0.0/sdas/asgb/gbpst.c, and then maybe sdcc-4.0.0/src/z80/mappings.i.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Typically new features; lesser priority than bugs rgbasm This affects RGBASM
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants