[eggex] Rename _match() -> _group()
_match() is still kept for backward compatibility.
Andy C committed Dec 16, 2023
1 parent 36cd2e0 commit 3473f7c
Showing 12 changed files with 88 additions and 35 deletions.
5 changes: 3 additions & 2 deletions core/shell.py
@@ -814,8 +814,9 @@ def Main(

_SetGlobalFunc(mem, 'len', func_misc.Len())

# TODO: rename to group
_SetGlobalFunc(mem, '_match', func_eggex.MatchFunc(mem, func_eggex.G))
g = func_eggex.MatchFunc(mem, func_eggex.G)
_SetGlobalFunc(mem, '_group', g)
_SetGlobalFunc(mem, '_match', g) # TODO: remove this backward compat alias
_SetGlobalFunc(mem, '_start', func_eggex.MatchFunc(mem, func_eggex.S))
_SetGlobalFunc(mem, '_end', func_eggex.MatchFunc(mem, func_eggex.E))

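The core of this change binds a single function object under two global names, so old code calling `_match()` and new code calling `_group()` share one implementation. Below is a minimal Python sketch of that aliasing pattern. `funcs`, `set_global_func` and `group_func` are hypothetical stand-ins, not Oils' actual `_SetGlobalFunc` or `MatchFunc`.

```python
from typing import Callable, Dict

# Hypothetical registry standing in for the interpreter's global namespace.
GlobalFuncs = Dict[str, Callable[[int], str]]


def set_global_func(funcs: GlobalFuncs, name: str, f: Callable[[int], str]) -> None:
    funcs[name] = f


def group_func(i: int) -> str:
    # Placeholder body; a real implementation would read the last match.
    return 'group %d' % i


funcs: GlobalFuncs = {}
g = group_func
set_global_func(funcs, '_group', g)
set_global_func(funcs, '_match', g)  # backward-compatible alias: same object

print(funcs['_match'] is funcs['_group'])  # True
```

Because both names refer to one object, the two can never drift apart, and dropping the alias later is a one-line deletion.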
40 changes: 30 additions & 10 deletions doc/eggex.md
@@ -231,11 +231,11 @@ group. POSIX ERE has no non-capturing groups.

Capture with `<capture pat>`:

<capture d+> # Becomes _match(1)
<capture d+> # Becomes _group(1)

Add a variable after `as` for named capture:

<capture d+ as myvar> # Becomes _match('myvar')
<capture d+ as myvar> # Becomes _group('myvar')

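For readers who know Python, the two capture forms map roughly onto numbered and named groups in Python's `re` module. This is only an analogy for the semantics. Eggexes are translated to POSIX ERE, not to Python regexes, and Python's `\d` is slightly broader than eggex `d` because it also matches non-ASCII digits.

```python
import re

# <capture d+ as month> '-' <capture d+ as day>  ~  (?P<month>\d+)-(?P<day>\d+)
pat = re.compile(r'(?P<month>\d+)-(?P<day>\d+)')
m = pat.search('days 04-01 and 10-31')

print(m.group(1))      # like _group(1): '04'
print(m.group('day'))  # like _group('day'): '01'

# A group that does not take part in the match yields None,
# the analog of _group() returning null.
alt = re.search(r'(a)|(b)', 'b')
print(alt.group(1), alt.group(2))  # None b
```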
### Character Class Literals Use `[]`

@@ -323,25 +323,45 @@ You can spread regexes over multiple lines and add comments:

### The YSH API

(Still to be implemented.)

Testing and extracting matches:

if (mystr ~ pat) {
echo $_match(1) # or _group(1) ?
var s = 'days 04-01 and 10-31'
var pat = /<capture d+ as month> '-' <capture d+ as day>/

if (s ~ pat) {
echo $[_group(1)]
}

Iterative matching:
More explicit API with search():

var pat = /<capture d+ as month> '-' <capture d+ as day>/
var m = s => search(pat)
if (m) {
echo $[m => group(1)]
}

Iterative matching with leftMatch():

var s = 'hi 123'
var lexer = / <capture [a-z]+> | <capture d+> | <capture s+> /
var pos = 0
while (true) {
var m = pat => findNext('04-01 10-31')
var m = s => leftMatch(lexer, pos=pos)
if (not m) {
break
}
echo $[m => group(0)]
if (m => group(1) !== null) {
echo 'letter'
} elif (m => group(2) !== null) {
echo 'digit'
} elif (m => group(3) !== null) {
echo 'space'
}

setvar pos = m => end(0)
}
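The `leftMatch()` loop is an anchored scan: each iteration must match exactly at `pos`, and `pos` then advances to the end of the match. Python's `Pattern.match(string, pos)` anchors the same way, so a rough Python analog of the loop looks like this. `LEXER` and `tokenize` are hypothetical names for illustration, and the code says nothing about how YSH itself will implement `leftMatch()`.

```python
import re
from typing import List, Tuple

# Rough analog of the eggex / <capture [a-z]+> | <capture d+> | <capture s+> /
LEXER = re.compile(r'([a-z]+)|(\d+)|(\s+)')


def tokenize(s: str) -> List[Tuple[str, str]]:
    tokens = []
    pos = 0
    while True:
        # Pattern.match(s, pos) only succeeds if the match starts exactly
        # at pos, like leftMatch(lexer, pos=pos).
        m = LEXER.match(s, pos)
        if m is None:
            break  # end of input, or a character no alternative accepts
        if m.group(1) is not None:
            kind = 'letter'
        elif m.group(2) is not None:
            kind = 'digit'
        else:
            kind = 'space'
        tokens.append((m.group(0), kind))
        pos = m.end(0)  # advance past the token
    return tokens


print(tokenize('hi 123'))  # [('hi', 'letter'), (' ', 'space'), ('123', 'digit')]
```

Every alternative consumes at least one character, which is what guarantees the loop terminates. A lexer pattern that can match the empty string would need an explicit check that `pos` advanced.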

(Still to be implemented.)

Substitution:

var new = s => replace(/<capture d+ as month>/, ^"month is $month")
4 changes: 2 additions & 2 deletions doc/idioms.md
@@ -894,8 +894,8 @@ No:

Yes:

if (x ~ / 'foo-' <d+> /) { # <> is capture
echo $_match(1) # first submatch
if (x ~ / 'foo-' <capture d+> /) { # <capture> creates a group
echo $[_group(1)] # first submatch
}

## Glob Matching
2 changes: 1 addition & 1 deletion doc/language-influences.md
@@ -267,7 +267,7 @@ Hay blocks in YSH allow this to be expressed very similarly:

PHP has global variables like `_REQUEST` and `_POST`.

YSH will have `_argv`, `_match()`, `_start()`, etc. These are global variables
YSH has `_status`, `_group()`, `_start()`, etc. These are global variables
that are "silently" mutated by the interpreter (and functions to access such
global data).

6 changes: 3 additions & 3 deletions doc/ref/chap-builtin-cmd.md
@@ -145,10 +145,10 @@ from interfering with user code. Example:

Current list of registers:

BASH_REMATCH aka _match()
Regex data underlying BASH_REMATCH, _group(), _start(), _end()
$?
_status set by the try builtin
PIPESTATUS aka _pipeline_status
_status # set by the try builtin
PIPESTATUS # aka _pipeline_status
_process_sub_status


20 changes: 19 additions & 1 deletion doc/ref/chap-builtin-func.md
@@ -231,12 +231,30 @@ Note, you will need to `source --builtin list.ysh` to use this function.

## Pattern

### `_match()`
### `_group()`

Like `Match => group()`, but accesses the global match created by `~`:

if ('foo42' ~ / d+ /) {
echo $[_group(0)] # => 42
}

### `_start()`

Like `Match => start()`, but accesses the global match created by `~`:

if ('foo42' ~ / d+ /) {
echo $[_start(0)] # => 3
}

### `_end()`

Like `Match => end()`, but accesses the global match created by `~`:

if ('foo42' ~ / d+ /) {
echo $[_end(0)] # => 5
}
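These three accessors correspond to `group()`, `start()` and `end()` on a Python `re` match object. The sketch below uses Python's `re` purely as an analogy and reproduces the values in the comments above for the same `'foo42'` input. Note that `end` is exclusive, so slicing from `start` to `end` recovers the matched text.

```python
import re

m = re.search(r'\d+', 'foo42')
print(m.group(0))  # like _group(0): '42'
print(m.start(0))  # like _start(0): 3
print(m.end(0))    # like _end(0): 5
print('foo42'[m.start(0):m.end(0)])  # '42'
```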

## Str

### countRunes()
2 changes: 1 addition & 1 deletion doc/ref/toc-ysh.md
@@ -277,7 +277,7 @@ X [Codecs] quoteUrl() quoteHtml() quoteSh() quoteC()
quoteMake() quoteNinja()
X [Serialize] toJ8() fromJ8()
toJson() fromJson()
[Pattern] _match() X _start() X _end()
[Pattern] _group() X _start() X _end()
[Introspection] shvarGet() evalExpr()
[Hay Config] parseHay() evalHay()
X [Wok] _field()
2 changes: 1 addition & 1 deletion doc/style-guide.md
@@ -71,7 +71,7 @@ Global variables that are **silently mutated** by the interpreter start with

As do functions to access such mutable vars:

_match() _start() _end()
_group() _start() _end()

Example:

2 changes: 1 addition & 1 deletion doc/ysh-vs-shell.md
@@ -115,7 +115,7 @@ A `shvar` is similar to a `shopt`, but it has a string value, like `$IFS` and
**Registers** are special variables set by the interpreter, beginning with `_`:

- `try` sets `_status` (preferred over `$?`)
- `_pipeline_status`, `_match()`, etc.
- `_pipeline_status`, `_group()`, etc.

<!--
## TODO
14 changes: 14 additions & 0 deletions spec/TODO-deprecate.test.sh
@@ -27,3 +27,17 @@ echo @x
## STDOUT:
one two
## END

#### _match() instead of _group()

shopt --set ysh:upgrade

if ('foo42' ~ / <capture d+> /) {
echo $[_match(0)]
echo $[_group(0)]
}

## STDOUT:
42
42
## END
4 changes: 2 additions & 2 deletions spec/ysh-case.test.sh
@@ -117,8 +117,8 @@ string
#### eggex capture
for name in foo/foo.py bar/bar.cc zz {
case (name) {
/ '/f' <capture dot*> '.' / { echo "g0=$[_match(0)] g1=$[_match(1)]" }
/ '/b' <capture dot*> '.' / { echo "g0=$[_match(1)] g1=$[_match(1)]" }
/ '/f' <capture dot*> '.' / { echo "g0=$[_group(0)] g1=$[_group(1)]" }
/ '/b' <capture dot*> '.' / { echo "g0=$[_group(1)] g1=$[_group(1)]" }
(else) { echo 'no match' }
}
}
22 changes: 11 additions & 11 deletions spec/ysh-regex-api.test.sh
@@ -6,11 +6,11 @@ shopt -s ysh:upgrade
var s = 'foo'
if (s ~ '.([[:alpha:]]+)') { # ERE syntax
echo matches
argv.py $[_match(0)] $[_match(1)]
argv.py $[_group(0)] $[_group(1)]
}
if (s !~ '[[:digit:]]+') {
echo "does not match"
argv.py $[_match(0)] $[_match(1)]
argv.py $[_group(0)] $[_group(1)]
}

if (s ~ '[[:digit:]]+') {
@@ -19,14 +19,14 @@ if (s ~ '[[:digit:]]+') {
# Should be cleared now
# should this be Undef rather than ''?
try {
var x = _match(0)
var x = _group(0)
}
if (_status === 2) {
echo 'got expected status 2'
}

try {
var y = _match(1)
var y = _group(1)
}
if (_status === 2) {
echo 'got expected status 2'
@@ -84,7 +84,7 @@ yes
yes
## END
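The spec case in this file that calls `_group(0)` after a failed `~` checks that a failed match clears the register, so reading it fails with status 2 instead of returning stale data. Below is a minimal Python sketch of that register behavior. `MatchRegister` and `NoMatchError` are hypothetical, the exception stands in for the YSH error, and `[a-z]` replaces `[[:alpha:]]` because Python's `re` has no POSIX character classes.

```python
import re
from typing import Optional, Union


class NoMatchError(Exception):
    """Stands in for the YSH runtime error (status 2) in this sketch."""


class MatchRegister:
    """Hypothetical model of the global match register behind _group()."""

    def __init__(self) -> None:
        self._last: Optional[re.Match] = None

    def tilde(self, s: str, pattern: str) -> bool:
        # Like `s ~ pattern`: every attempt overwrites the register,
        # so a failed match clears any previous result.
        self._last = re.search(pattern, s)
        return self._last is not None

    def group(self, key: Union[int, str] = 0) -> Optional[str]:
        # key defaults to 0, mirroring _group() as a synonym for _group(0).
        if self._last is None:
            raise NoMatchError('no match to read groups from')
        return self._last.group(key)


reg = MatchRegister()
if reg.tilde('foo', r'.([a-z]+)'):
    print(reg.group(0), reg.group(1))  # foo oo

reg.tilde('foo', r'[0-9]+')  # fails and clears the register
try:
    reg.group(0)
except NoMatchError:
    print('got expected error')
```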

#### Positional captures with _match
#### Positional captures with _group
shopt -s ysh:all

var x = 'zz 2020-08-20'
@@ -99,9 +99,9 @@ setvar BASH_REMATCH = :| reset |

if (x ~ /<capture d+> '-' <capture d+>/) {
argv.py "${BASH_REMATCH[@]}"
argv.py $[_match(0)] $[_match(1)] $[_match(2)]
argv.py $[_group(0)] $[_group(1)] $[_group(2)]

argv.py $[_match()] # synonym for _match(0)
argv.py $[_group()] # synonym for _group(0)

# TODO: Also test _start() and _end()
}
@@ -112,12 +112,12 @@ if (x ~ /<capture d+> '-' <capture d+>/) {
['2020-08']
## END

#### _match() returns null when group doesn't match
#### _group() returns null when group doesn't match
shopt -s ysh:upgrade

var pat = / <capture 'a'> | <capture 'b'> /
if ('b' ~ pat) {
echo "$[_match(1)] $[_match(2)]"
echo "$[_group(1)] $[_group(2)]"
}
## STDOUT:
null b
@@ -265,13 +265,13 @@ null/ab/null/
pos=2
## END

#### Named captures with _match
#### Named captures with _group
shopt -s ysh:all

var x = 'zz 2020-08-20'

if (x ~ /<capture d+ as year> '-' <capture d+ as month>/) {
argv.py $[_match('year')] $[_match('month')]
argv.py $[_group('year')] $[_group('month')]
}
## STDOUT:
['2020', '08']
