Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify ID sanitize logic #62

Merged
merged 2 commits into from
Aug 31, 2020
Merged

Simplify ID sanitize logic #62

merged 2 commits into from
Aug 31, 2020

Conversation

leonardt
Copy link
Owner

Instead of performing the map lookup and regex every time we call
toString, instead just do it once when creating the ID node. This
should improve performance for code that uses toString multiple times
for identifiers (e.g. in AssignInliner we do this to populate and lookup
entries in a map)

We also may consider making this optional (via a flag to the
constructor, or as a pass that traverses the tree and sanitizes
identifiers) to further improve performance (IIRC there's some valid
CoreIR names that are invalid verilog, so we'd want to make sure we're
enforcing valid verilog names downstream before making this change).

Instead of performing the map lookup and regex every time we call
toString, instead just do it once when creating the ID node.  This
should improve performance for code that uses toString multiple times
for identifiers (e.g. in AssignInliner we do this to populate and lookup
entries in a map)

We also may consider making this optional (via a flag to the
constructor, or as a pass that traverses the tree and sanitizes
identifiers) to further improve performance (IIRC there's some valid
CoreIR names that are invalid verilog, so we'd want to make sure we're
enforcing valid verilog names downstream before making this change).
@rsetaluri
Copy link
Collaborator

Is there any reason we would want to keep both sanitized and unsanitized names (e.g. referring to the original name externally or upstream)? Is there any case where sanitization causes names previously different to be the same (i.e. collisions)?

If this solves the common case it seems reasonable, but just wondering if it might be better posed as a Transformer pass right before code generation? (Also seems pretty low implementation cost since the actual sanitization logic is already there).

@leonardt
Copy link
Owner Author

I don't think there's any reason to keep both. I think there's a set of "valid" verilog symbols, and this logic prevents the user from introducing invalid symbols, since this is a verilog AST, I don't see a compelling reason to let the user construct and manipulate an "invalid" AST (containing invalid symbols). I think this is mainly just to be a convenience for ensuring validity of symbols (alternatively the user can be responsible for sanitization, either by pre-processing the string when constructing the node, or running a pass to sanitize IDs after construction.

There is a case for collision, where a user uses a valid sanitized ID (introduced the escape), and then used a non-escaped ID that collides with the escaped ID when it's sanitized. I don't think this is a case we need to worry about, since it most likely is either a user error or the intention of the user (to have the ID sanitized). I think it's probably simplest to enforce an invariant that the constructor of the this node will sanitize IDs so the tree is valid, but there may be a case for optimization where we use a flag to guard sanitization logic (to avoid the runtime when we know the string is valid) or run a pass to do it at a later stage (but this technically breaks the invariant which isn't ideal).

@rsetaluri
Copy link
Collaborator

That makes sense to me.

@leonardt leonardt merged commit b23b953 into master Aug 31, 2020
@leonardt leonardt deleted the simplify-regex branch August 31, 2020 22:15
leonardt added a commit to rdaly525/coreir that referenced this pull request Aug 31, 2020
For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization
leonardt added a commit to rdaly525/coreir that referenced this pull request Sep 1, 2020
For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants