Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIRRTL][FIRParser] Tokenize Commas #7206

Merged
merged 5 commits into from
Jul 18, 2024

Conversation

mmaloney-sf
Copy link
Contributor

Due to historical circumstances, the SFC parser for FIRRTL treated commas (,) as whitespace. This behavior was carried over into the CIRCT implementation of firtool.

This behavior is utterly bizarre.

Moreover, the FIRRTL spec has indicated comma tokens (,) for at least every version since last year.

This PR removes this quirk and properly tokenizes commas, requiring them in the places dictated by the FIRRTL spec.

The impact of this PR should be low, but it may break code (in CIRCT, in Chisel, and elsewhere) in places where FIRRTL was being emitted with a loose interpretation, or where the spec was ambiguous.

Changes:

  • Update several tests which were leaning into the "commas-are-whitespace" behavior.
  • Removed ',' from the list of "horizontal whitespace" and added an explicit FIRToken::comma token.
  • Made changes to the various parse*() methods to consume the tokens.
  • Note the use of a pattern I found useful in several places where I use a boolean first to skip parsing comma tokens on the first iteration. Please check me here, since I learned C++ before lambdas were a thing.
  • Made a few judgment calls around unspeced constructs (smem).
    • I changed the FIRRTL emitter in one place where the syntax was unspeced, and there seemed to be conflicting examples.
  • @seldridge For some reason, it seemed ambiguous whether layer decls need commas. I opted for commas. I can swap if I got this backwards.
  • I added parseRUW() as a non-optional variant of parseOptionalRUW(), since the comma token can be used to determine optionality in one case.

@uenoku uenoku changed the title Tokenize Commas [FIRRTL][FIRParser] Tokenize Commas Jun 19, 2024
@uenoku uenoku added the FIRRTL Involving the `firrtl` dialect label Jun 19, 2024
Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation generally looks good to me! I agree that comma as whitespace is weird and should be fixed but practically I guess the hack improved parser performance quite a bit :) I'm curious how parser performance changes due to increased toke size.

@dtzSiFive dtzSiFive linked an issue Jun 19, 2024 that may be closed by this pull request
@dtzSiFive
Copy link
Contributor

dtzSiFive commented Jun 19, 2024

Technically this is only mandatory as of 4.0.0, chipsalliance/firrtl-spec#191 . However I think just requiring this regardless is fine re:compatibility.

I too am interested in impact of this.

cc chipsalliance/firrtl-spec#188 .

@mmaloney-sf
Copy link
Contributor Author

@dtzSiFive Thanks for pointing that out. I feel like I removed that verbiage and then forgot it.

And thank you for pointing out that I didn't feature-gate this. I'm not sure how I should proceed on that point. I had plans to add a lot of small tweaks to the syntax, but it could easily make it a bit of spaghetti to keep strict backwards compat. But then again, I don't know much people care about that compatibility.

@seldridge
Copy link
Member

Note: no version of Chisel that I have ever seen has ever not emitted commas unconditionally. There should be no risk in changing this and it should be backwards compatible.

@mmaloney-sf mmaloney-sf force-pushed the mmaloney/tokenize_commas branch 2 times, most recently from c89b7ff to 406b95c Compare June 26, 2024 17:54
Comment on lines 4177 to 4172
if (consumeIf(FIRToken::comma)) {
if (parseRUW(ruw) || parseOptionalInfo())
return failure();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think parseOptionalInfo needs to be called regardless of RUW.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, this both allows a dangling comma and doesn't properly parse optional parts. I think most consumeIf calls I saw are probably wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the catch, @uenoku.

@darthscsi I'm not sure what you mean. Feel free to add more information. If you have an example that should be added to the parse-errors file, that would be great.

Copy link
Contributor

@dtzSiFive dtzSiFive left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing on this! Left some feedback, still going through but wanted to submit that much now.

lib/Dialect/FIRRTL/Import/FIRParser.cpp Outdated Show resolved Hide resolved
lib/Dialect/FIRRTL/Import/FIRParser.cpp Outdated Show resolved Hide resolved
lib/Dialect/FIRRTL/Import/FIRParser.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@darthscsi darthscsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add explicit tests that dangling commas in lists and before optional components are rejected.

@mmaloney-sf
Copy link
Contributor Author

Please add explicit tests that dangling commas in lists and before optional components are rejected.

Done.

@mmaloney-sf
Copy link
Contributor Author

If there are no other comments, could someone merge?

Copy link
Member

@uenoku uenoku left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but can we merge after firtool 1.77 release? I don't believe this PR will cause an issue but just in case.

@mmaloney-sf
Copy link
Contributor Author

Sounds good to me, @uenoku

@uenoku uenoku merged commit f15c4a7 into llvm:main Jul 18, 2024
4 checks passed
@mmaloney-sf mmaloney-sf deleted the mmaloney/tokenize_commas branch July 18, 2024 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
FIRRTL Involving the `firrtl` dialect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FIRRTL] Change FIRRTL Parser to Require Commas
5 participants