Add word boundary support #167
I have a rule in my grammar to match numbers containing commas with an optional decimal component:
Given an input of
If word boundaries were supported, I could change my pattern to:
Then the rule would theoretically know that the end of the string is met and return the token prior to reaching the end of the string.
I'm not sure how much work would be involved to support this, so if you have a recommendation for a workaround please let me know. My only idea to address this is to specifically include null bytes in my rules.
A word boundary wouldn't help in this case: it would need to read one character past the valid input.
Works like that:
However, if you cannot guarantee the existence of a terminating character (your input is valid up to the last character and cannot be temporarily modified to end with a terminating character), then you cannot use the sentinel method to stop the lexer. There are a number of other options.
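For reference, the sentinel idea can be sketched by hand in a few lines. This is only an illustration under assumptions, not the generated lexer or the rule from the original post: the function name `lex_number_sentinel`, the NUL sentinel, and the number pattern (digit groups separated by commas, with an optional decimal part) are all hypothetical stand-ins.

```python
SENTINEL = 0  # assumed terminator: a NUL byte appended after the real input


def is_digit(b: int) -> bool:
    return 0x30 <= b <= 0x39  # ASCII '0'..'9'


def lex_number_sentinel(buf: bytes, pos: int = 0):
    """Match digits(,digits)*(.digits)? starting at pos.

    Assumes buf ends with SENTINEL. No loop checks the buffer length:
    the sentinel is neither a digit, ',' nor '.', so it stops every loop.
    Returns (token, next_pos), or None if no number starts at pos.
    """
    if not is_digit(buf[pos]):
        return None
    start = pos
    while is_digit(buf[pos]):
        pos += 1
    # Further groups: a ',' counts only if a digit follows it.
    # buf[pos + 1] is safe because buf[pos] == ',' means pos is not the sentinel.
    while buf[pos] == ord(",") and is_digit(buf[pos + 1]):
        pos += 1
        while is_digit(buf[pos]):
            pos += 1
    # Optional decimal part, again only if a digit follows the '.'.
    if buf[pos] == ord(".") and is_digit(buf[pos + 1]):
        pos += 1
        while is_digit(buf[pos]):
            pos += 1
    return buf[start:pos], pos


print(lex_number_sentinel(b"1,234.56\0"))
```

Note that the scanner still reads one byte past the token (the sentinel) to decide that the token has ended, which is why the terminator has to exist.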
Your definition of
If you don't use buffering and you just want to stop at exactly
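When no terminating character can be guaranteed, one generic alternative is to check an explicit limit before every read. The sketch below is a hand-written illustration under assumptions, not the lexer generator's API: `lex_number_bounded`, the `limit` parameter, and the `-1` end-of-input marker are hypothetical names, and the number pattern is the same stand-in (digit groups separated by commas, optional decimal part).

```python
END = -1  # hypothetical marker returned when reading at or past the limit


def is_digit(b: int) -> bool:
    return 0x30 <= b <= 0x39  # ASCII '0'..'9'; END (-1) is never a digit


def lex_number_bounded(buf: bytes, pos: int = 0, limit=None):
    """Match digits(,digits)*(.digits)? without requiring a sentinel.

    Every read goes through peek(), which compares the index against
    limit first, so the scanner never touches bytes at or beyond limit.
    Returns (token, next_pos), or None if no number starts at pos.
    """
    if limit is None:
        limit = len(buf)

    def peek(i: int) -> int:
        return buf[i] if i < limit else END

    if not is_digit(peek(pos)):
        return None
    start = pos
    while is_digit(peek(pos)):
        pos += 1
    # Further groups: a ',' counts only if a digit follows it.
    while peek(pos) == ord(",") and is_digit(peek(pos + 1)):
        pos += 1
        while is_digit(peek(pos)):
            pos += 1
    # Optional decimal part.
    if peek(pos) == ord(".") and is_digit(peek(pos + 1)):
        pos += 1
        while is_digit(peek(pos)):
            pos += 1
    return buf[start:pos], pos


print(lex_number_bounded(b"1,234.56"))
```

The trade-off is one extra comparison per character read, in exchange for working on input that is valid right up to its last byte.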
Yesterday evening, after posting, I tried adding null byte checks to my lexer, and it worked with surprisingly few modifications. However, I believe your last paragraph is exactly what I need. Thank you for your detailed and timely response! I'm going to try your suggestions next.