-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify ID sanitize logic #62
Conversation
Instead of performing the map lookup and regex every time we call toString, instead just do it once when creating the ID node. This should improve performance for code that uses toString multiple times for identifiers (e.g. in AssignInliner we do this to populate and lookup entries in a map) We also may consider making this optional (via a flag to the constructor, or as a pass that traverses the tree and sanitizes identifiers) to further improve performance (IIRC there's some valid CoreIR names that are invalid verilog, so we'd want to make sure we're enforcing valid verilog names downstream before making this change).
Is there any reason we would want to keep both sanitized and unsanitized names (e.g. referring to the original name externally or upstream)? Is there any case where sanitization causes names previously different to be the same (i.e. collisions)? If this solves the common case it seems reasonable, but just wondering if it might be better posed as a Transformer pass right before code generation? (Also seems pretty low implementation cost since the actual sanitization logic is already there). |
I don't think there's any reason to keep both. I think there's a set of "valid" verilog symbols, and this logic prevents the user from introducing invalid symbols, since this is a verilog AST, I don't see a compelling reason to let the user construct and manipulate an "invalid" AST (containing invalid symbols). I think this is mainly just to be a convenience for ensuring validity of symbols (alternatively the user can be responsible for sanitization, either by pre-processing the string when constructing the node, or running a pass to sanitize IDs after construction. There is a case for collision, where a user uses a valid sanitized ID (introduced the escape), and then used a non-escaped ID that collides with the escaped ID when it's sanitized. I don't think this is a case we need to worry about, since it most likely is either a user error or the intention of the user (to have the ID sanitized). I think it's probably simplest to enforce an invariant that the constructor of the this node will sanitize IDs so the tree is valid, but there may be a case for optimization where we use a flag to guard sanitization logic (to avoid the runtime when we know the string is valid) or run a pass to do it at a later stage (but this technically breaks the invariant which isn't ideal). |
That makes sense to me. |
For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization
For leonardt/verilogAST-cpp#62 which optimizes the regex logic for ID sanitization
Instead of performing the map lookup and regex every time we call
toString, instead just do it once when creating the ID node. This
should improve performance for code that uses toString multiple times
for identifiers (e.g. in AssignInliner we do this to populate and lookup
entries in a map)
We also may consider making this optional (via a flag to the
constructor, or as a pass that traverses the tree and sanitizes
identifiers) to further improve performance (IIRC there's some valid
CoreIR names that are invalid verilog, so we'd want to make sure we're
enforcing valid verilog names downstream before making this change).