YSH has Egg Expressions, a composable and readable syntax for regular expressions. You can use Eggex with both:
- A convenient Perl-like operator: 'mystr' ~ / [a-z]+ /
  - Access submatches with the global functions _group(), _start(), and _end()
- A powerful Python-like API: 'mystr' => search(/ [a-z]+ /) and leftMatch()
  - Access submatches with Match object methods: m => group(), m => start(), and m => end()
You can also use plain POSIX extended regular expressions (ERE) instead of Eggex.
The ~ operator tests if a string matches a pattern. The captured groups are available through "global register" functions whose names start with an underscore (_).
var s = 'days 04-01 and 10-31'
var eggex = /<capture d+ as month> '-' <capture d+ as day>/
if (s ~ eggex) {
  = _group(1)  # => '04', the first capture
  = _group(2)  # => '01', the second capture
  = _start(1)  # => 5, start index of the first capture
  = _end(1)    # => 7, end index of the first capture
}
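For comparison, here is a rough Python equivalent using the standard re module. It is not YSH code. It assumes \d stands in for Eggex's d, and the captures are numbered rather than named:

```python
import re

# Python counterpart of the YSH example above (comparison sketch only).
s = 'days 04-01 and 10-31'
pat = re.compile(r'(\d+)-(\d+)')  # two numbered capture groups

m = pat.search(s)  # first match anywhere in s
assert m is not None
print(m.group(1))  # prints 04 (compare _group(1))
print(m.group(2))  # prints 01 (compare _group(2))
print(m.start(1))  # prints 5 (compare _start(1))
print(m.end(1))    # prints 7 (compare _end(1))
```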
The eggex pattern has named captures (as month and as day), so it's more typical to write:
if (s ~ eggex) {
  = _group('month')  # => '04'
  = _group('day')    # => '01'
  = _start('month')  # => 5
  = _end('month')    # => 7
}
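Python writes named captures as (?P<name>...) rather than Eggex's "as name". Here is a comparable sketch on the same data, again assuming \d approximates Eggex's d:

```python
import re

# Named groups in Python: (?P<month>...) plays the role of "as month".
s = 'days 04-01 and 10-31'
pat = re.compile(r'(?P<month>\d+)-(?P<day>\d+)')

m = pat.search(s)
if m:
    print(m.group('month'))  # prints 04
    print(m.group('day'))    # prints 01
    print(m.start('month'))  # prints 5
    print(m.end('month'))    # prints 7
```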
You can test if a string does not match a pattern with !~:
if (s !~ / space /) {
  echo 'no whitespace'
}
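Python has no negated-match operator, so the usual idiom is to check that search() returns None. The sketch below assumes \s approximates Eggex's space class:

```python
import re

s = 'days 04-01 and 10-31'
# No !~ in Python: a failed search() returns None.
if re.search(r'\s', s) is None:
    print('no whitespace')
else:
    print('contains whitespace')  # taken here, since s contains spaces
```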
The pattern can also be a string, in plain ERE syntax:
if (s ~ '([[:digit:]]+)') {
  = _group(1)  # => '04'
}
The search() method is like the ~ operator, but it returns either null or a Match object. Match objects have group(), start(), and end() methods.
var m = s => search(eggex)
if (m) {  # test if it matched
  = m => group('month')  # => '04'
  = m => group('day')    # => '01'
}
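Python's re.search() follows the same null-or-Match contract and returns None when nothing matches. Here is a small comparison sketch:

```python
import re

pat = re.compile(r'(?P<month>\d+)-(?P<day>\d+)')

# Like YSH search(), Python returns a Match object or None (YSH: null).
found = pat.search('days 04-01 and 10-31')
missing = pat.search('no dates here')

print(found is not None)  # prints True
print(missing is None)    # prints True
```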
You can search from a given starting position:
var m = s => search(eggex, pos=12)
if (m) {
  = m => group('month')  # => '10', first month after pos 12
  = m => group('day')    # => '31', first day after pos 12
}
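In Python, a compiled pattern takes the start position as the second argument of its search() method. This is a method on the compiled pattern, because the module-level re.search() function has no pos parameter. Here is a comparison sketch:

```python
import re

s = 'days 04-01 and 10-31'
pat = re.compile(r'(?P<month>\d+)-(?P<day>\d+)')

# Start scanning at index 12, similar to search(eggex, pos=12).
m = pat.search(s, 12)
if m:
    print(m.group('month'), m.group('day'))  # prints 10 31
    print(m.start())                          # prints 15
```

One Python detail to be aware of: with a pos argument, ^ still matches only at the real beginning of the string, not at pos.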
The search() method is a bit like Str => find(), which searches for a substring rather than a pattern.
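Python makes the same split between substring search and pattern search, with str.find() and re.search(). Here is a comparison sketch:

```python
import re

s = 'days 04-01 and 10-31'

# str.find() looks for a fixed substring and returns an index, or -1.
print(s.find('10-31'))  # prints 15

# re.search() looks for a pattern and returns a Match, or None.
m = re.search(r'\d+-\d+', s)
print(m.start(), m.group())  # prints 5 04-01
```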
The leftMatch() method is like search(), but the pattern must match starting exactly at the start position (the beginning of the string, or pos if given), not anywhere later in the string. This makes it useful for writing iterative lexers.
var s = 'hi 123'
var Name  = / <capture [a-z]+ as name> /
var Num   = / <capture d+ as num> /
var Space = / <capture s+ as space> /
# Three kinds of tokens. CapWords variables like Name can be spliced
# without the @ sigil, so Name here means the same as @Name.
var lexer = / Name | Num | Space /
var pos = 0  # start at position 0
while (true) {
  var m = s => leftMatch(lexer, pos=pos)
  if (not m) {
    break
  }
  # Test which subgroup matched
  var id = null
  if (m => group('name') !== null) {
    setvar id = 'name'
  } elif (m => group('num') !== null) {
    setvar id = 'num'
  } elif (m => group('space') !== null) {
    setvar id = 'space'
  }
  # Calculate the token value
  var end_pos = m => end(0)
  var val = s[pos:end_pos]
  echo "Token $id $val"
  setvar pos = end_pos  # Advance position
}
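Here is the same lexer sketched in Python for comparison. Pattern.match(s, pos) plays the role of leftMatch(lexer, pos=pos), and Match.lastgroup gives the name of the group that matched, which replaces the if/elif chain. The helper name tokenize is hypothetical:

```python
import re

# One alternation with a named group per token kind.
lexer = re.compile(r'(?P<name>[a-z]+)|(?P<num>\d+)|(?P<space>\s+)')

def tokenize(s):
    pos = 0
    tokens = []
    while True:
        m = lexer.match(s, pos)  # anchored at pos, like leftMatch()
        if not m:
            break
        # Each alternative has exactly one group, so lastgroup names the token kind.
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()  # advance past this token
    return tokens

for kind, val in tokenize('hi 123'):
    print('Token', kind, val)
```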
(YSH leftMatch() vs. search() is like Python's re.match() vs. re.search().)
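Here is a two-line Python sketch of that distinction:

```python
import re

# re.match() only tries the start of the string; re.search() scans forward.
print(re.match(r'\d+', 'hi 123'))           # prints None, since 'h' is not a digit
print(re.search(r'\d+', 'hi 123').group())  # prints 123
```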
- Help topic: leftMatch()
As noted above, you can name the capture groups with as month, and access them with m => group('month').
TODO (not implemented): You can also add : funcName after a capture's name to convert the captured string to a different value.
var pat = / <capture d+ as month: int> /
if ('10-31' ~ pat) {
  = _group('month')  # the integer 10, not the string '10'
}
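Until this is implemented, the conversion has to happen after matching. Python's re has no conversion syntax either, so here is a Python sketch of the manual approach for comparison:

```python
import re

# Captures are always strings in Python, so convert explicitly.
m = re.search(r'(?P<month>\d+)-(?P<day>\d+)', '10-31')
if m:
    month = int(m.group('month'))
    print(month, type(month).__name__)  # prints 10 int
```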
We plan to have unevaluated string literals like ^"hello $1", instead of Python's custom replacement language 'hello \g<1>'.
# var new = s => replace(/<capture d+ as month>/, ^"month is $month")
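For reference, here is the Python replacement syntax mentioned above, as used with re.sub(), in a short sketch:

```python
import re

s = 'days 04-01 and 10-31'

# \g<1> refers to capture group 1 inside Python's replacement string.
by_number = re.sub(r'(\d+)-\d+', r'month \g<1>', s)
print(by_number)  # prints: days month 04 and month 10

# Named groups are referenced as \g<name>.
by_name = re.sub(r'(?P<month>\d+)-\d+', r'month is \g<month>', s)
print(by_name)    # prints: days month is 04 and month is 10
```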
- Help topic: replace()
YSH is designed to have the convenience of Perl and Awk, and the power of Python and JavaScript.
Eggexes can be composed by splicing. Splicing works on expressions, not strings.
Replacement will use shell's string literal syntax, rather than a new printf-like mini-language.
Python's re.findall() function can be emulated by calling search() in a loop, similar to the lexer example above:
func findAll(s, pat) {
  var pos = 0
  var result = []
  while (true) {
    var m = s => search(pat, pos=pos)
    if (not m) {
      break
    }
    var left = m => start(0)
    var right = m => end(0)
    call result->append(s[left:right])
    # Assumes pat can't match the empty string; otherwise pos would not advance.
    setvar pos = right
  }
  return (result)
}
var matches = findAll('days 04-01 and 10-31', / d+ '-' d+ /)
json write (matches)  # prints the JSON array ["04-01", "10-31"]
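Here is a Python transcription of the same loop, next to the built-in Pattern.findall() it emulates. The helper name find_all is hypothetical. Like the YSH version, it assumes the pattern never matches the empty string, because an empty match would leave pos unchanged and the loop would never end:

```python
import re

def find_all(s, pat):
    """Collect the text of each non-overlapping match of pat in s."""
    pos = 0
    result = []
    while True:
        m = pat.search(s, pos)
        if not m:
            break
        result.append(m.group(0))  # whole match, like s[left:right]
        pos = m.end()              # resume after this match
    return result

pat = re.compile(r'\d+-\d+')
print(find_all('days 04-01 and 10-31', pat))  # prints ['04-01', '10-31']
print(pat.findall('days 04-01 and 10-31'))    # prints ['04-01', '10-31']
```

One difference applies when the pattern contains capture groups. Python's built-in findall() then returns the groups rather than the whole match, while this loop always collects group 0.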
Python's re.split() can also be emulated by calling search() in a loop.
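Here is a Python sketch of that idea, where split_by is a hypothetical helper. The text between consecutive matches becomes the pieces, and the remainder after the last match is appended at the end. It makes the same assumption as the loop above: the separator pattern cannot match the empty string.

```python
import re

def split_by(s, pat):
    """Split s on every match of pat, keeping the text between matches."""
    pos = 0
    pieces = []
    while True:
        m = pat.search(s, pos)
        if not m:
            break
        pieces.append(s[pos:m.start()])  # text before this separator
        pos = m.end()                    # skip over the separator
    pieces.append(s[pos:])               # text after the last separator
    return pieces

sep = re.compile(r'\s*,\s*')
print(split_by('a, b ,c', sep))  # prints ['a', 'b', 'c']
print(sep.split('a, b ,c'))      # prints ['a', 'b', 'c']
```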