Simplify ID sanitize logic #62

leonardt · 2020-08-28T21:35:14Z

Instead of performing the map lookup and regex match every time we call
toString, do it once when creating the ID node. This
should improve performance for code that calls toString multiple times
on identifiers (e.g. in AssignInliner we do this to populate and look up
entries in a map).

We may also consider making this optional (via a flag to the
constructor, or as a pass that traverses the tree and sanitizes
identifiers) to further improve performance (IIRC there are some valid
CoreIR names that are invalid Verilog, so we'd want to make sure we're
enforcing valid Verilog names downstream before making this change).
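
To illustrate the idea of sanitizing once at construction, here is a minimal sketch. It is not the verilogAST-cpp implementation: the `Identifier` class, the `sanitize` helper, the keyword subset, and the exact regex are all assumptions made for the example. The only Verilog fact relied on is that an escaped identifier starts with a backslash and ends with whitespace.

```cpp
#include <regex>
#include <string>
#include <unordered_set>

// Hypothetical stand-ins for illustration; not the verilogAST-cpp API.
// A small, incomplete subset of Verilog keywords.
static const std::unordered_set<std::string> kKeywords = {
    "module", "wire", "reg", "input", "output"};

// Returns the name unchanged if it is a simple identifier that is not a
// keyword; otherwise wraps it as an escaped identifier.
std::string sanitize(const std::string& name) {
  static const std::regex simple_id("[a-zA-Z_][a-zA-Z0-9_$]*");
  if (std::regex_match(name, simple_id) && kKeywords.count(name) == 0) {
    return name;
  }
  // Escaped identifier: leading backslash, terminated by a space.
  return "\\" + name + " ";
}

class Identifier {
 public:
  // The lookup and regex match run once, here.
  explicit Identifier(const std::string& name) : value_(sanitize(name)) {}

  // Repeated calls only return the cached string.
  const std::string& toString() const { return value_; }

 private:
  std::string value_;
};
```

With this shape, code that calls `toString` repeatedly (for example to build map keys) pays for the sanitization only when the node is created.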

rsetaluri · 2020-08-31T21:10:58Z

Is there any reason we would want to keep both sanitized and unsanitized names (e.g. referring to the original name externally or upstream)? Is there any case where sanitization causes names previously different to be the same (i.e. collisions)?

If this solves the common case it seems reasonable, but just wondering if it might be better posed as a Transformer pass right before code generation? (Also seems pretty low implementation cost since the actual sanitization logic is already there).

leonardt · 2020-08-31T21:35:36Z

I don't think there's any reason to keep both. There's a set of "valid" Verilog symbols, and this logic prevents the user from introducing invalid ones. Since this is a Verilog AST, I don't see a compelling reason to let the user construct and manipulate an "invalid" AST (one containing invalid symbols). This is mainly a convenience for ensuring the validity of symbols; alternatively, the user can be responsible for sanitization, either by pre-processing the string when constructing the node or by running a pass to sanitize IDs after construction.

There is a case for collision: a user uses a valid, already-sanitized ID (one that already includes the escape) and then uses a non-escaped ID that collides with the escaped ID once it's sanitized. I don't think this is a case we need to worry about, since it is most likely either a user error or the user's intention (to have the ID sanitized). I think it's probably simplest to enforce an invariant that the constructor of this node sanitizes IDs so the tree is always valid. There may be a case for optimization where we use a flag to guard the sanitization logic (to avoid the runtime cost when we know the string is valid) or run a pass to do it at a later stage, but that technically breaks the invariant, which isn't ideal.
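
The collision case can be made concrete with a small sketch. It assumes a hypothetical sanitizer (`sanitize_id`, not the library's function) that leaves both simple identifiers and already-escaped identifiers unchanged and escapes everything else; whether the real logic treats pre-escaped names this way is an assumption.

```cpp
#include <regex>
#include <string>

// Hypothetical sanitizer for illustration; not the verilogAST-cpp API.
// Assumes simple and already-escaped identifiers pass through unchanged.
std::string sanitize_id(const std::string& name) {
  static const std::regex simple_id("[a-zA-Z_][a-zA-Z0-9_$]*");
  // Backslash, one or more non-whitespace characters, then a single space.
  static const std::regex escaped_id(R"(\\\S+ )");
  if (std::regex_match(name, simple_id) ||
      std::regex_match(name, escaped_id)) {
    return name;
  }
  return "\\" + name + " ";
}

// True when two distinct input names map to the same sanitized ID.
bool collide(const std::string& a, const std::string& b) {
  return a != b && sanitize_id(a) == sanitize_id(b);
}
```

Under these assumptions, `collide("a.b", "\\a.b ")` is true: the raw name `a.b` is escaped to the same string the user supplied pre-escaped, which matches the "user error or user intention" scenario described above.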

rsetaluri · 2020-08-31T21:59:30Z

That makes sense to me.

leonardt requested a review from rsetaluri August 28, 2020 21:35

Merge branch 'master' into simplify-regex

f8793ca

rsetaluri approved these changes Aug 31, 2020

leonardt merged commit b23b953 into master Aug 31, 2020

leonardt deleted the simplify-regex branch August 31, 2020 22:15

leonardt added a commit to rdaly525/coreir that referenced this pull request Aug 31, 2020

Update to latest verilogAST-cpp

3a76bb9

For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization

leonardt mentioned this pull request Aug 31, 2020

Update to latest verilogAST-cpp rdaly525/coreir#958

Merged

leonardt added a commit to rdaly525/coreir that referenced this pull request Sep 1, 2020

Update to latest verilogAST-cpp (#958)

5be0954

For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization
