Improve document about the absence operator #87
After the release of Ruby 2.4.1, several blog posts about the absent operator were written, and they prompted some discussion.
Unfortunately, some people might not understand the advantage of the absent operator. One reason may be that the original paper by Tanaka Akira is written in Japanese. Another may be that Onigmo's documentation of the operator is insufficient.
First, I'd like to translate the important points of his paper, very roughly.
Complement sets are useful to express C-style comments, CR LF terminated lines,
Regular expressions consist of:
A regular expression engine can be implemented with a DFA or with backtracking.
3.1. Abstract Syntax Tree of regular expressions
3.2. A basic regular expression engine
A basic implementation in Ruby.
# re: AST of a regex
# str: An array of characters
# pos: Start position
# block: A block executed when it is matched
#        (Callback)
def try(re, str, pos, &block)
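To make the callback style concrete, here is a minimal sketch of such a backtracking engine, including one possible way to handle an absent node. The AST node shapes, the helper names `lit_seq` and `match_end`, and the handling of `:absent` are assumptions made for this illustration; they are not taken from the paper or from Onigmo's source.

```ruby
# Hypothetical AST node shapes, chosen for this sketch only:
#   [:lit, "a"]      one literal character
#   [:cat, r1, r2]   r1 followed by r2
#   [:alt, r1, r2]   r1 or r2
#   [:rep, r]        zero or more repetitions of r (greedy)
#   [:absent, r]     any string that contains no match of r

# Calls the block once for every end position at which `re` can match
# when started at `pos`; the caller decides whether to continue or stop.
def try(re, str, pos, &block)
  case re[0]
  when :lit
    block.call(pos + 1) if str[pos] == re[1]
  when :cat
    try(re[1], str, pos) { |mid| try(re[2], str, mid, &block) }
  when :alt
    try(re[1], str, pos, &block)
    try(re[2], str, pos, &block)
  when :rep
    # Longer matches first; `mid > pos` avoids looping on empty matches.
    try(re[1], str, pos) { |mid| try(re, str, mid, &block) if mid > pos }
    block.call(pos)
  when :absent
    # Find the earliest end of any match of r starting at or after pos.
    # The absent match has to stop before that end position.
    limit = str.length
    (pos..str.length).each do |s|
      try(re[1], str, s) { |e| limit = [limit, e - 1].min }
    end
    limit.downto(pos) { |e| block.call(e) }
  end
end

# Builds a concatenation of literals from a non-empty string.
def lit_seq(s)
  s.chars.map { |c| [:lit, c] }.reduce { |a, b| [:cat, a, b] }
end

# Returns the end position of the first match found from position 0, or nil.
def match_end(re, string)
  try(re, string.chars, 0) { |e| return e }
  nil
end

# "ab", then anything not containing "ba", then "ba".
comment = [:cat, lit_seq("ab"), [:cat, [:absent, lit_seq("ba")], lit_seq("ba")]]
p match_end(comment, "abxbaba")  # => 5
p match_end(comment, "abx")      # => nil
```

The `:absent` branch rescans the rest of the input on every call, so this sketch favors clarity over speed.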
If a regex engine uses a DFA, complement sets can be handled easily.
(Note: Onigmo actually uses (?~r) instead of !r, because !r is not compatible
This is also described on pp. 26-27 of his slides:
7.1. Easy to write
C-style comments can be expressed with the following:
("a" and "b" are used here instead of "/" and "*" to reduce the complexity of
If the absent operator is used, it can be:
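For illustration only, the same idea can be tried with the real `/*` and `*/` delimiters instead of the "a"/"b" stand-ins. This is a small sketch that assumes Ruby 2.4.1 or later, where Onigmo's `(?~...)` syntax is available; the variable names and the sample text are made up.

```ruby
# (?~\*/) matches any string that does not contain "*/".
c_comment = %r{/\*(?~\*/)\*/}

# Made-up sample input for illustration.
text = "x = 1; /* first */ y = 2; /* second */"
comments = text.scan(c_comment)
p comments  # => ["/* first */", "/* second */"]
```

Each match stops at the first `*/`, so the two comments are found separately rather than as one long match.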
7.2. Fit with regular language theory
7.2.1. Repetition of lazy match
This doesn't work well when concatenated with other regexes. E.g.
This wrongly matches
This correctly matches
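The concatenation problem can be reproduced with a small experiment. The patterns and the input string below are assumptions chosen for illustration (they may differ from the paper's examples), and the absent-operator lines assume Ruby 2.4.1 or later.

```ruby
s = "ab ba bac"  # made-up input: the comment "ab ba" followed by " bac"

# On their own, both patterns find the same comment.
lazy_alone   = /ab.*?ba/.match(s)[0]
absent_alone = /ab(?~ba)ba/.match(s)[0]

# Concatenated with "c", the lazy version backtracks past the first "ba"
# and swallows it, although no comment is directly followed by "c".
lazy_m   = /ab.*?bac/.match(s)
absent_m = /ab(?~ba)bac/.match(s)

p lazy_alone          # => "ab ba"
p absent_alone        # => "ab ba"
p lazy_m && lazy_m[0] # => "ab ba bac"
p absent_m            # => nil
```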
7.2.2. No backtracking
This works well when concatenated with
However, this depends heavily on the backtracking strategy, and it doesn't
7.2.3. Negative lookahead
This works well.
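As a sketch of the lookahead approach, the pattern below uses the common `(?:(?!ba).)*` idiom with the "a"/"b" stand-ins. The exact pattern and the input strings are assumptions for illustration and may not be the ones from the paper.

```ruby
lookahead   = /ab(?:(?!ba).)*ba/
lookahead_c = /ab(?:(?!ba).)*bac/

alone     = lookahead.match("ab ba bac")[0]
no_match  = lookahead_c.match("ab ba bac")
with_c    = lookahead_c.match("ab bac")[0]

p alone     # => "ab ba"
p no_match  # => nil
p with_c    # => "ab bac"
```

Each character of the body is consumed only if "ba" does not start there, so the body can never run past the first terminator, even after concatenation.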
Note: I'm not sure whether negative lookahead is regular.
7.3. Inefficiency of complement sets
It is possible to implement a negation operator which matches the complement
Ragel has the following operators:
And a partial translation of his slides.
I think so. "absent operator" is a confusing name, because it could just as well refer to an operator that is not there. "absence operator" does not have this problem.
On the other hand, would you say that it is also a group? In the docs you have put it under "7. Extended groups". If so, it might be more consistent to use the adjective form, as with "passive" or "atomic". I'd still choose clarity over consistency, though.
Thank you for the explanation.
In Akira's paper,