[DOC] Enhanced RDoc for Regexp #5812

Merged
merged 2 commits into from Apr 16, 2022
10 changes: 9 additions & 1 deletion doc/regexp.rdoc
@@ -30,6 +30,13 @@ _s_ followed by the letter _t_, so it matches _haystack_, also.
Note that any Regexp matching will raise a RuntimeError if timeout is set and
exceeded. See the "Timeout" section for details.

== \Regexp Interpolation

A regexp may contain interpolated strings; trivially:

foo = 'bar'
/#{foo}/ # => /bar/
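
As a slightly less trivial illustration of the interpolation described above, the sketch below shows that metacharacters in an interpolated string stay active unless they are escaped. The variable names (`dotted`, `loose`, `strict`) are made up for this example; `Regexp.escape` is the standard way to neutralize them.

```ruby
# Interpolating a plain string: its characters become regexp source.
foo = 'bar'
re = /#{foo}/
re.source # the pattern text is "bar"

# Metacharacters in the interpolated string remain active unless escaped.
dotted = 'a.c'
loose  = /#{dotted}/                  # "." matches any character
strict = /#{Regexp.escape(dotted)}/   # "." must match a literal dot

loose_hit   = loose.match?('abc')     # matches, because "." is a wildcard
strict_miss = strict.match?('abc')    # no match, a literal dot is required
strict_hit  = strict.match?('a.c')    # matches the literal text
```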

== <tt>=~</tt> and Regexp#match

Pattern matching may be achieved by using the <tt>=~</tt> operator or Regexp#match
@@ -672,9 +679,10 @@ regexp's encoding can be explicitly fixed by supplying
# raises Encoding::CompatibilityError: incompatible encoding regexp match
# (ISO-8859-1 regexp with UTF-8 string)

== Special Global Variables
== \Regexp Global Variables

Pattern matching sets some global variables:

* <tt>$~</tt> is equivalent to Regexp.last_match;
* <tt>$&</tt> contains the complete matched text;
* <tt>$`</tt> contains the string before the match;
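
A minimal sketch of the three variables listed so far, read immediately after a successful match. The sample string and the local variable names are illustrative only.

```ruby
# Each special variable is copied into a local right after the match,
# because a later match would overwrite them.
if /\d+-\d+/ =~ 'range 10-20 inclusive'
  tilde  = $~                  # the MatchData for this match
  last   = Regexp.last_match   # same information as $~
  whole  = $&                  # complete matched text
  before = $`                  # text preceding the match
end
```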
196 changes: 105 additions & 91 deletions re.c
@@ -1488,31 +1488,31 @@ rb_backref_set_string(VALUE string, long pos, long len)

/*
* call-seq:
* rxp.fixed_encoding? -> true or false
*
* Returns false if rxp is applicable to
* a string with any ASCII compatible encoding.
* Returns true otherwise.
*
* r = /a/
* r.fixed_encoding? #=> false
* r =~ "\u{6666} a" #=> 2
* r =~ "\xa1\xa2 a".force_encoding("euc-jp") #=> 2
* r =~ "abc".force_encoding("euc-jp") #=> 0
*
* r = /a/u
* r.fixed_encoding? #=> true
* r.encoding #=> #<Encoding:UTF-8>
* r =~ "\u{6666} a" #=> 2
* r =~ "\xa1\xa2".force_encoding("euc-jp") #=> Encoding::CompatibilityError
* r =~ "abc".force_encoding("euc-jp") #=> 0
*
* r = /\u{6666}/
* r.fixed_encoding? #=> true
* r.encoding #=> #<Encoding:UTF-8>
* r =~ "\u{6666} a" #=> 0
* r =~ "\xa1\xa2".force_encoding("euc-jp") #=> Encoding::CompatibilityError
* r =~ "abc".force_encoding("euc-jp") #=> nil
* fixed_encoding? -> true or false
*
* Returns +false+ if +self+ is applicable to
* a string with any ASCII-compatible encoding;
* otherwise returns +true+:
*
* r = /a/ # => /a/
* r.fixed_encoding? # => false
* r.match?("\u{6666} a") # => true
* r.match?("\xa1\xa2 a".force_encoding("euc-jp")) # => true
* r.match?("abc".force_encoding("euc-jp")) # => true
*
* r = /a/u # => /a/
* r.fixed_encoding? # => true
* r.match?("\u{6666} a") # => true
* r.match?("\xa1\xa2".force_encoding("euc-jp")) # Raises exception.
* r.match?("abc".force_encoding("euc-jp")) # => true
*
* r = /\u{6666}/ # => /\u{6666}/
* r.fixed_encoding? # => true
* r.encoding # => #<Encoding:UTF-8>
* r.match?("\u{6666} a") # => true
* r.match?("\xa1\xa2".force_encoding("euc-jp")) # Raises exception.
* r.match?("abc".force_encoding("euc-jp")) # => false
*
*/
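
The "Raises exception." cases above can be made concrete with a short sketch. It assumes the exception is Encoding::CompatibilityError, as the earlier version of this comment states; the variable names are illustrative.

```ruby
plain  = /a/    # encoding not fixed
utf8   = /a/u   # encoding fixed to UTF-8
euc_jp = "\xa1\xa2 a".dup.force_encoding('euc-jp')

# A regexp without a fixed encoding can be applied to the EUC-JP string.
plain_result = plain.match?(euc_jp)

# A fixed-encoding regexp raises when the string has an incompatible,
# non-ASCII-only encoding.
utf8_result =
  begin
    utf8.match?(euc_jp)
  rescue Encoding::CompatibilityError => e
    e.class
  end
```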

static VALUE
@@ -3116,12 +3116,13 @@ rb_reg_regcomp(VALUE str)

static st_index_t reg_hash(VALUE re);
/*
* call-seq:
* rxp.hash -> integer
* call-seq:
* hash -> integer
*
* Produce a hash based on the text and options of this regular expression.
* Returns the integer hash value for +self+.
*
* Related: Object#hash.
*
* See also Object#hash.
*/

VALUE
@@ -3145,17 +3146,18 @@ reg_hash(VALUE re)

/*
* call-seq:
* rxp == other_rxp -> true or false
* rxp.eql?(other_rxp) -> true or false
* regexp == object -> true or false
*
* Returns +true+ if +object+ is another \Regexp whose pattern,
* flags, and encoding are the same as those of +self+; +false+ otherwise:
*
* Equality---Two regexps are equal if their patterns are identical, they have
* the same character set code, and their <code>casefold?</code> values are the
* same.
* /foo/ == Regexp.new('foo') # => true
* /foo/ == /foo/i # => false
* /foo/ == Regexp.new('food') # => false
* /foo/ == Regexp.new("abc".force_encoding("euc-jp")) # => false
*
* Regexp#eql? is an alias for Regexp#==.
*
* /abc/ == /abc/x #=> false
* /abc/ == /abc/i #=> false
* /abc/ == /abc/u #=> false
* /abc/u == /abc/n #=> false
*/
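
One practical consequence of Regexp#==, Regexp#eql?, and Regexp#hash agreeing is that equal regexps act as the same Hash key. The sketch below is illustrative; the `counts` hash is a made-up example, not part of the patch.

```ruby
a = /foo/
b = Regexp.new('foo')

same_value = (a == b)            # equal pattern, flags, and encoding
same_eql   = a.eql?(b)           # eql? behaves like ==
same_hash  = (a.hash == b.hash)  # equal regexps hash alike
differs    = (a == /foo/i)       # differing flags break equality

# Equal regexps collapse into a single Hash key.
counts = Hash.new(0)
counts[a] += 1
counts[b] += 1
```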

VALUE
@@ -3264,49 +3266,57 @@ reg_match_pos(VALUE re, VALUE *strp, long pos, VALUE* set_match)

/*
* call-seq:
* rxp =~ str -> integer or nil
* regexp =~ string -> integer or nil
*
* Returns the integer index (in characters) of the first match
* for +self+ and +string+, or +nil+ if none;
* also sets the
* {rdoc-ref:Regexp Global Variables}[rdoc-ref:Regexp@Regexp+Global+Variables]:
*
* Match---Matches <i>rxp</i> against <i>str</i>.
* /at/ =~ 'input data' # => 7
* $~ # => #<MatchData "at">
* /ax/ =~ 'input data' # => nil
* $~ # => nil
*
* /at/ =~ "input data" #=> 7
* /ax/ =~ "input data" #=> nil
* Assigns named captures to local variables of the same names
* if and only if +self+:
*
* If <code>=~</code> is used with a regexp literal with named captures,
* captured strings (or nil) is assigned to local variables named by
* the capture names.
* - Is a regexp literal;
* see {Regexp Literals}[rdoc-ref:literals.rdoc@Regexp+Literals].
* - Does not contain interpolations;
* see {Regexp Interpolation}[rdoc-ref:Regexp@Regexp+Interpolation].
* - Is at the left of the expression.
*
* /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = y "
* p lhs #=> "x"
* p rhs #=> "y"
* Example:
*
* If it is not matched, nil is assigned for the variables.
* /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ ' x = y '
* p lhs # => "x"
* p rhs # => "y"
*
* /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ " x = "
* p lhs #=> nil
* p rhs #=> nil
* Assigns +nil+ if not matched:
*
* This assignment is implemented in the Ruby parser.
* The parser detects 'regexp-literal =~ expression' for the assignment.
* The regexp must be a literal without interpolation and placed at left hand side.
* /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/ =~ ' x = '
* p lhs # => nil
* p rhs # => nil
*
* The assignment does not occur if the regexp is not a literal.
* Does not make local variable assignments if +self+ is not a regexp literal:
*
* re = /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/
* re =~ " x = y "
* p lhs # undefined local variable
* p rhs # undefined local variable
* r = /(?<foo>\w+)\s*=\s*(?<bar>\w+)/
* r =~ ' x = y '
* p foo # Undefined local variable
* p bar # Undefined local variable
*
* A regexp interpolation, <code>#{}</code>, also disables
* the assignment.
* The assignment does not occur if the regexp is not at the left:
*
* rhs_pat = /(?<rhs>\w+)/
* /(?<lhs>\w+)\s*=\s*#{rhs_pat}/ =~ "x = y"
* p lhs # undefined local variable
* ' x = y ' =~ /(?<foo>\w+)\s*=\s*(?<bar>\w+)/
* p foo, bar # Undefined local variables
*
* The assignment does not occur if the regexp is placed at the right hand side.
* A regexp interpolation, <tt>#{}</tt>, also disables
* the assignment:
*
* " x = y " =~ /(?<lhs>\w+)\s*=\s*(?<rhs>\w+)/
* p lhs, rhs # undefined local variable
* r = /(?<foo>\w+)/
* /(?<foo>\w+)\s*=\s*#{r}/ =~ 'x = y'
* p foo # Undefined local variable
*
*/
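
The literal versus non-literal rule above can be checked without triggering a NameError, as in the following sketch. The capture names (`key`, `value`, `name`) are illustrative, and Binding#local_variable_defined? is used only to observe whether the assignment happened.

```ruby
# Regexp literal on the left: named captures become local variables.
if /(?<key>\w+)\s*=\s*(?<value>\w+)/ =~ ' x = y '
  literal_pair = [key, value]
end

# Regexp held in a variable: no local variables are assigned.
re = /(?<name>\w+)/
re =~ 'hello'
assigned = binding.local_variable_defined?(:name)

# The capture is still reachable through the MatchData in $~.
from_matchdata = $~[:name]
```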

@@ -3388,34 +3398,38 @@ rb_reg_match2(VALUE re)

/*
* call-seq:
* rxp.match(str, pos=0) -> matchdata or nil
* rxp.match(str, pos=0) {|match| block } -> obj
* match(string, offset = 0) -> matchdata or nil
* match(string, offset = 0) {|matchdata| ... } -> object
*
* Returns a MatchData object describing the match, or
* <code>nil</code> if there was no match. This is equivalent to
* retrieving the value of the special variable <code>$~</code>
* following a normal match. If the second parameter is present, it
* specifies the position in the string to begin the search.
* With no block given, returns the MatchData object
* that describes the match, if any, or +nil+ if none;
* the search begins at the given character +offset+ in +string+:
*
* /(.)(.)(.)/.match("abc")[2] #=> "b"
* /(.)(.)/.match("abc", 1)[2] #=> "c"
* /abra/.match('abracadabra') # => #<MatchData "abra">
* /abra/.match('abracadabra', 4) # => #<MatchData "abra">
* /abra/.match('abracadabra', 8) # => nil
* /abra/.match('abracadabra', 800) # => nil
*
* If a block is given, invoke the block with MatchData if match succeed, so
* that you can write
* With a block given, calls the block if and only if a match is found;
* returns the block's value:
*
* /M(.*)/.match("Matz") do |m|
* puts m[0]
* puts m[1]
* end
* /abra/.match('abracadabra') {|matchdata| p matchdata }
* # => #<MatchData "abra">
* /abra/.match('abracadabra', 4) {|matchdata| p matchdata }
* # => #<MatchData "abra">
* /abra/.match('abracadabra', 8) {|matchdata| p matchdata }
* # => nil
* /abra/.match('abracadabra', 8) {|matchdata| fail 'Cannot happen' }
* # => nil
*
* instead of
* Output (from the first two blocks above):
*
* if m = /M(.*)/.match("Matz")
* puts m[0]
* puts m[1]
* end
* #<MatchData "abra">
* #<MatchData "abra">
*
* /(.)(.)(.)/.match("abc")[2] #=> "b"
* /(.)(.)/.match("abc", 1)[2] #=> "c"
*
* The return value is a value from block execution in this case.
*/
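
A short sketch of how the offset and block forms behave, using the same sample string as the examples above. The variable names are illustrative.

```ruby
text = 'abracadabra'

first  = /abra/.match(text)      # match starting at index 0
second = /abra/.match(text, 4)   # search starts at offset 4, match found at 7
none   = /abra/.match(text, 8)   # nothing left to match

# With a block, the block's value is returned when a match is found.
upcased = /abra/.match(text) { |matchdata| matchdata[0].upcase }

# Without a match, the block is never called and nil is returned.
skipped = /abra/.match(text, 8) { |matchdata| raise 'not reached' }
```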

static VALUE
@@ -3445,8 +3459,8 @@ rb_reg_match_m(int argc, VALUE *argv, VALUE re)

/*
* call-seq:
* rxp.match?(str) -> true or false
* rxp.match?(str, pos=0) -> true or false
* match?(string) -> true or false
* match?(string, offset = 0) -> true or false
*
* Returns <code>true</code> or <code>false</code> to indicate whether the
* regexp is matched or not without updating $~ and other related variables.