regexp: RuneReader functions are hard to use correctly. #5988
The functions taking a RuneReader will read 2 runes past the end of the match. This means that unless your RuneReader supports seeking, you will lose some of your input. bufio.Reader is the most common RuneReader and it does not support seeking. We can defer reading the lookahead rune until after we've checked for a match. That gets us back one of the runes. However, we must still read one rune past the final match, because that rune might extend the match. We could consider using RuneScanner and putting the rune back after the final match. I'm not sure it's possible to guarantee that only the match is read in all cases. I think it might break when the last op is an alternation or a conditional. Still, I think it's worth doing.
The docs are clear about this: "Note that regular expression matches may need to examine text beyond the text returned by a match, so the methods that match text from a RuneReader may read arbitrarily far into the input before returning." It's not 2, it's arbitrarily far. If you are matching /x(.*y)?/ you have to read the entire input just in case there is a y that would extend the match. This is fundamental to regexp search. Perhaps it is true that they read 2 runes beyond the match no matter what; I don't know. If so, that strikes me as a good thing, because then people will be less surprised by longer "readahead".
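The `x(.*y)?` case can be checked with a small counting wrapper. This is a sketch; `countingReader` and `runesConsumed` are illustrative names, not part of the regexp package. The wrapper counts every rune the matcher pulls from the input, so the count can be compared with the end of the match.

```go
package main

import (
	"fmt"
	"io"
	"regexp"
	"strings"
)

// countingReader (hypothetical) wraps an io.RuneReader and counts the
// runes that were successfully handed to the caller.
type countingReader struct {
	r io.RuneReader
	n int
}

func (c *countingReader) ReadRune() (rune, int, error) {
	r, size, err := c.r.ReadRune()
	if err == nil {
		c.n++
	}
	return r, size, err
}

// runesConsumed returns the match location and how many runes the
// matcher read from input while finding it.
func runesConsumed(pattern, input string) (loc []int, consumed int) {
	cr := &countingReader{r: strings.NewReader(input)}
	loc = regexp.MustCompile(pattern).FindReaderIndex(cr)
	return loc, cr.n
}

func main() {
	// The input contains no 'y', so the match is just "x", but the
	// matcher has to scan to the end to rule out a longer match.
	loc, consumed := runesConsumed(`x(.*y)?`, "xabcdef")
	fmt.Printf("match=%v consumed=%d runes\n", loc, consumed)
}
```

Here the match covers only the first rune, yet all seven runes of the input are read, because the optional group could still succeed until the input runs out.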
Status changed to Unfortunate.