Commit 2ac97dd

Add 11: "Regular Extremism"
1 parent bb02ae4 commit 2ac97dd

1 file changed

+106
-0
lines changed

@@ -0,0 +1,106 @@
1+
---
2+
title: Regular Extremism
3+
date: 2015-05-11
4+
tags: strings, regex
5+
---
6+
7+
You are here for a collection of 10 advanced features of regular expressions in Ruby!
8+
9+
ARTICLE
10+
11+
## Regex Conditionals
12+
13+
Regular expressions can contain embedded conditionals (*if-then-else*) with `(?(ref)then|else)`. "ref" stands for a group reference (number or name of a capture group); the *then* branch is used if that group took part in the match, the *else* branch otherwise:
14+
15+
# will match everything if string contains "ä", or only match first two chars
16+
regex = /(?=(.*ä))?(?(1).*|..)/
17+
18+
"Ruby"[regex] #=> "Ru"
19+
"Idiosyncrätic"[regex] #=> "Idiosyncrätic"
20+
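A more everyday use of the conditional, as a minimal sketch: the pattern below (the name `number` and the sample strings are made up for illustration) requires a closing parenthesis only if an opening one was captured. The *else* branch is left out here, so nothing extra has to match when group 1 did not take part.

```ruby
# Hypothetical example: a number that may be wrapped in parentheses.
# Group 1 captures the opening parenthesis, if there is one.
# (?(1)\)) demands a closing parenthesis only when group 1 matched.
number = /\A(\()?\d+(?(1)\))\z/

number.match?("42")    #=> true
number.match?("(42)")  #=> true
number.match?("(42")   #=> false
number.match?("42)")   #=> false
```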
21+
## Keep Expressions
22+
23+
The possible ways to [look around](http://www.regular-expressions.info/lookaround.html) within a regex are:
24+
25+
Syntax | Description | Example
26+
---------|---------------------|-------------------------------
27+
`(?=X)` | Positive lookahead | `"Ruby"[/.(?=b)/] #=> "u"`
28+
`(?!X)` | Negative lookahead | `"Ruby"[/.(?!u)/] #=> "u"`
29+
`(?<=X)` | Positive lookbehind | `"Ruby"[/(?<=u)./] #=> "b"`
30+
`(?<!X)` | Negative lookbehind | `"Ruby"[/(?<!R|^)./] #=> "b"`
31+
32+
But Ruby also has `\K`, a shortcut that works like a *positive lookbehind*: everything matched before `\K` must be present, but it is dropped from the returned match:
33+
34+
"Ruby"[/Ru\Kby/] #=> "by"
35+
"Ruby"[/ru\Kby/] #=> nil
36+
37+
## Character Class Intersections
38+
39+
You can nest character classes and AND-connect them with `&&`. The following matches runs of lowercase letters that are not vowels:
40+
41+
"Idiosyncratic".scan /[[a-z]&&[^aeiou]]+/
42+
# => ["d", "syncr", "t", "c"]
43+
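The same idea works for any pair of classes. A small sketch with made-up input: digits, minus the two digits 4 and 7.

```ruby
# [0-9] AND NOT [47]: every digit except 4 and 7
allowed = /[[0-9]&&[^47]]+/

runs = "1234567890".scan(allowed)
#=> ["123", "56", "890"]
```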
44+
## Regex Sub-Expressions
45+
46+
You can execute the pattern of a group again (even recursively) with `\g<ref>`. "ref" stands for a group reference (number or name of a capture group). This is different from back-references (`\1` .. `\9`), which re-match the string that was already captured instead of executing the regex again:
47+
48+
# match any number of sequences of 3 identical chars
49+
regex = /((.)\2{2})\g<1>*/
50+
"aaa"[regex] #=> "aaa"
51+
"abc"[regex] #=> nil
52+
"aaab"[regex] #=> "aaa"
53+
"aaabbb"[regex] #=> "aaabbb"
54+
"aaabbbc"[regex] #=> "aaabbb"
55+
"aaabbbccc"[regex] #=> "aaabbbccc"
56+
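To make the difference to back-references explicit, and to show the recursive case, here is a short sketch. The variable names, the group name `paren`, and the sample strings are only illustrative.

```ruby
# Back-reference: the second digit must be identical to the first one
same  = /(\d)-\1/
# Subexpression call: the pattern \d is executed again, so any digit will do
again = /(\d)-\g<1>/

same_hit  = "1-1"[same]   #=> "1-1"
same_miss = "1-2"[same]   #=> nil
again_hit = "1-2"[again]  #=> "1-2"

# Recursion: a group that calls itself can match balanced parentheses
balanced = /(?<paren>\((?:[^()]|\g<paren>)*\))/
nested = "f(a(b)c) + 1"[balanced]  #=> "(a(b)c)"
```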
57+
## Match Characters that Belong Together
58+
59+
`\X` treats combined characters as a single character. See [grapheme clusters](http://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries) for more information.
60+
61+
string = "R\u{030A}uby"
62+
string[/./] #=> "R"
63+
string[/.../] #=> "R̊u"
64+
string[/\X\X/] #=> "R̊u"
65+
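One practical consequence, sketched with the same string: scanning with `\X` counts what a reader would perceive as characters, while `.` and `String#length` count codepoints.

```ruby
string = "R\u{030A}uby"  # "R" + COMBINING RING ABOVE + "uby"

codepoints = string.length            #=> 5
dots       = string.scan(/./).length  #=> 5
clusters   = string.scan(/\X/)        # four grapheme clusters

clusters.length  #=> 4
clusters.first   #=> "R̊" (two codepoints)
```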
66+
## Relative Back-References
67+
68+
Back-references can also be relative: `\k<-n>` refers to the n-th capture group counting backwards from the current position, so `\k<-1>` is the most recently opened group:
69+
70+
"Ruby by"[/(R)(u)(by) \k<-1>/] #=> "Ruby by"
71+
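A small sketch that uses two relative references at once; the pattern name `mirror` and the inputs are invented. `\k<-1>` points at the second group and `\k<-2>` at the first, so the pattern matches four-character palindromes.

```ruby
# (\w)(\w) captures two characters,
# \k<-1> repeats the second one, \k<-2> repeats the first one
mirror = /(\w)(\w)\k<-1>\k<-2>/

hit  = "abba"[mirror]  #=> "abba"
miss = "abab"[mirror]  #=> nil
```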
72+
73+
## Deactivate Backtracking
74+
75+
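[Atomic groups](http://www.regular-expressions.info/atomic.html), defined via `(?>X)`, disable backtracking into the group: once `X` has matched, the regex engine commits to that match and will not go back to try another alternative or a different repetition count, even if the rest of the pattern fails. Below, `u*` matches the empty string, `by` then fails, and the `ü` alternative is never tried: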
76+
77+
"Rüby"[/R(u*|ü)by/] #=> "Rüby"
78+
"Rüby"[/R(?>u*|ü)by/] #=> nil
79+
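The effect is just as visible with a greedy quantifier instead of alternatives. A minimal sketch with a made-up input string:

```ruby
# Normal group: a+ first grabs "aaa", then gives one "a" back so "ab" can match
backtracking = "aaab"[/a+ab/]      #=> "aaab"

# Atomic group: a+ grabs all the "a"s it can and never gives one back,
# so the following "a" cannot match at any start position
atomic       = "aaab"[/(?>a+)ab/]  #=> nil
```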
80+
## Turn On Unicode-Matching for `\w`, `\d`, `\s`, and `\b`
81+
82+
"Rüby"[/\w*/] #=> "R"
83+
"Rüby"[/(?u)\w*/] #=> "Rüby"
84+
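For comparison, a sketch that puts `(?u)` next to two alternatives that are Unicode-aware without any flag, the property class `\p{L}` and the POSIX bracket `[[:alpha:]]`. The string is written with an escape so the snippet does not depend on the file's encoding.

```ruby
word = "R\u00FCby"  # "Rüby"

ascii_only = word[/\w+/]           #=> "R"
unicode    = word[/(?u)\w+/]       #=> "Rüby"
property   = word[/\p{L}+/]        #=> "Rüby"
posix      = word[/[[:alpha:]]+/]  #=> "Rüby"
```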
85+
## Continue Matching at Last Match Position
86+
87+
When a method matches a regex multiple times against a string (like `String#gsub` or `String#scan`), `\G` anchors each attempt at the position where the previous match ended:
88+
89+
"923823723".scan(/\G(.)23/) #=> [["9"], ["8"], ["7"]]
90+
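The anchor is easiest to see when the chain of adjacent matches breaks. In this sketch (input invented for illustration), the plain pattern finds every "a", while the `\G` version stops at the first character that interrupts the run:

```ruby
# Without \G, scan skips over the "X" and keeps finding matches
everywhere = "aaXaa".scan(/a/)    #=> ["a", "a", "a", "a"]

# With \G, each match must start exactly where the previous one ended,
# so scanning stops as soon as "X" is reached
contiguous = "aaXaa".scan(/\Ga/)  #=> ["a", "a"]
```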
91+
## `String#split` with Capture Groups
92+
93+
The normal way of using `String#split` is this:
94+
95+
"0-0".split(/-/) #=> ["0", "0"]
96+
97+
But if you want to make your code as hard to read as possible, remember that captured groups will be added to the resulting array:
98+
99+
"0-0".split(/(-)/) #=> ["0", "-", "0"]
100+
"0-0".split(/-(?=(.))/) #=> ["0", "0", "0"]
101+
"0-0".split(/(((-)))/) #=> ["0", "-", "-", "-", "0"]
102+
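There is at least one case where this behavior is useful rather than obscure: keeping the delimiters while tokenizing. A small sketch with a made-up expression string:

```ruby
# Without a capture group, the operators are lost
operands = "1+2-3".split(/[+-]/)    #=> ["1", "2", "3"]

# With a capture group, the operators stay in the result
tokens   = "1+2-3".split(/([+-])/)  #=> ["1", "+", "2", "-", "3"]
```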
103+
## Resources
104+
105+
- [RDoc: Regexp](http://ruby-doc.org/core-2.2.2/Regexp.html)
106+
- [Onigmo Documentation](https://github.com/k-takata/Onigmo/blob/master/doc/RE)
