[eggex] Implement Str => leftMatch()
Document the methods.
Andy C committed Dec 16, 2023
1 parent 7987230 commit 36cd2e0
Showing 5 changed files with 89 additions and 11 deletions.
22 changes: 17 additions & 5 deletions builtin/method_str.py
@@ -68,11 +68,14 @@ def Call(self, rd):
         return value.Str(res)


-class Search(vm._Callable):
+SEARCH = 0
+LEFT_MATCH = 1

-    def __init__(self):
-        # type: () -> None
-        pass
+class SearchMatch(vm._Callable):
+
+    def __init__(self, which_method):
+        # type: (int) -> None
+        self.which_method = which_method

     def Call(self, rd):
         # type: (typed_args.Reader) -> value_t
@@ -88,8 +91,17 @@ def Call(self, rd):

         ere = regex_translate.AsPosixEre(eggex_val)  # lazily converts to ERE

+        # Make it anchored
+        if self.which_method == LEFT_MATCH and not ere.startswith('^'):
+            ere = '^' + ere
+
         cflags = regex_translate.LibcFlags(eggex_val.canonical_flags)
-        eflags = 0 if pos == 0 else REG_NOTBOL  # ^ only matches when pos=0
+
+        if self.which_method == LEFT_MATCH:
+            eflags = 0  # ^ matches beginning even if pos=5
+        else:
+            eflags = 0 if pos == 0 else REG_NOTBOL  # ^ only matches when pos=0
+
         indices = libc.regex_search(ere, cflags, string, eflags, pos)

         if indices is None:
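
One caveat about the "Make it anchored" step in this hunk, sketched here with Python's `re` module as a stand-in for libc's ERE engine (an assumption; the two engines differ in other respects). Alternation binds more loosely than `^`, so prepending `^` to a pattern that contains a top-level `|` anchors only the first branch. The helper names `anchor_naive` and `anchor_grouped` are hypothetical and are not part of the commit.

```python
import re


def anchor_naive(pattern):
    # Mirrors the string-prefix approach: '^' + pattern.
    return pattern if pattern.startswith('^') else '^' + pattern


def anchor_grouped(pattern):
    # Wrap the whole pattern in a non-capturing group first,
    # so that '^' applies to every branch of the alternation.
    return '^(?:' + pattern + ')'


lexer = r'([0-9]+)|([ ]+)'

# '^([0-9]+)|([ ]+)' parses as '^([0-9]+)' OR '([ ]+)':
# the second branch is left unanchored.
naive = re.compile(anchor_naive(lexer)).search('ab 12')
grouped = re.compile(anchor_grouped(lexer)).search('ab 12')

print(naive.span() if naive else None)      # (2, 3)
print(grouped.span() if grouped else None)  # None
```

POSIX ERE has no `(?:...)` syntax. A plain `(...)` wrapper would have the same anchoring effect there, but it would shift capture group numbers by one.
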
4 changes: 2 additions & 2 deletions core/shell.py
@@ -732,10 +732,10 @@ def Main(
         # Like Python's re.search, except we put it on the string object
         # It's more consistent with Str->find(substring, pos=0)
         # It returns value.Match() rather than an integer
-        'search': method_str.Search(),
+        'search': method_str.SearchMatch(method_str.SEARCH),

         # like Python's re.match()
-        'leftMatch': None,
+        'leftMatch': method_str.SearchMatch(method_str.LEFT_MATCH),

         # like Python's re.fullmatch(), not sure if we really need it
         'fullMatch': None,
65 changes: 65 additions & 0 deletions doc/ref/chap-type-method.md
@@ -55,6 +55,71 @@ Respects unicode.

Respects unicode.

### search()

Search for the first occurrence of a regex in the string.

    var m = 'hi world' => search(/[aeiou]/)  # search for vowels
    # matches at position 1 for 'i'

Returns a `value.Match()` if it matches, otherwise `null`.

You can start searching in the middle of the string:

    var m = 'hi world' => search(/dot 'orld'/, pos=3)
    # also matches, at position 3: dot consumes 'w', then 'orld' follows

The `%start` or `^` metacharacter will only match when `pos` is zero.

(Similar to Python's `re.search()`.)
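
For comparison, the Python analogue mentioned above can be sketched as follows. This is Python's `re` module, not YSH, and the patterns are hand-written equivalents of the eggex examples (an assumption about how they translate).

```python
import re

# Unanchored search: finds the first vowel anywhere in the string.
m = re.search(r'[aeiou]', 'hi world')
print(m.start())  # 1

# A compiled pattern accepts a start position, like pos= above.
m2 = re.compile(r'.orld').search('hi world', 3)
print(m2.start(), m2.group(0))  # 3 world

# '^' refers to the real start of the string, not to the start position,
# so an anchored pattern does not match when searching from offset 3.
m3 = re.compile(r'^world').search('hi world', 3)
print(m3)  # None
```
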

### leftMatch()

`leftMatch()` is like `search()`, but the match must begin exactly at `pos`
(the start of the string by default), rather than anywhere after it.

    var m = 'hi world' => leftMatch(/[aeiou]/)  # search for vowels
    # doesn't match because h is not a vowel

    var m = 'aye' => leftMatch(/[aeiou]/)
    # matches 'a'

`leftMatch()` can be used to implement lexers that consume every byte of input.

    var lexer = / <capture digit+> | <capture space+> /

(Similar to Python's `re.match()`.)
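
A minimal sketch of such a lexer loop, written with Python's `Pattern.match()` as the analogue of `leftMatch()`. The pattern, the group names, and the `tokenize` helper are hypothetical and only loosely mirror the eggex above.

```python
import re

# Hypothetical token pattern: a run of digits or a run of spaces.
LEXER = re.compile(r'(?P<digits>[0-9]+)|(?P<space>[ ]+)')


def tokenize(s):
    """Return (tokens, pos), where pos is the first offset that did not match."""
    tokens = []
    pos = 0
    while pos < len(s):
        # Pattern.match() only succeeds if the match begins exactly at pos.
        m = LEXER.match(s, pos)
        if m is None:
            break
        tokens.append((m.lastgroup, m.group(0)))
        pos = m.end(0)  # resume right after the token just consumed
    return tokens, pos


print(tokenize('12 345'))
# ([('digits', '12'), ('space', ' '), ('digits', '345')], 6)
print(tokenize('12 ab'))
# ([('digits', '12'), ('space', ' ')], 3)
```

Because every match starts where the previous one ended, no input byte is skipped. With an unanchored search, the loop could silently jump over text that no token pattern accepts.
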

## Match

### group()

Returns the string that matched a regex capture group. Group 0 is the entire
match.

    var m = 'foo9bar' => search(/ [a-z] <capture d+> [a-z] /)
    echo $[m => group(0)]  # => o9b
    echo $[m => group(1)]  # => 9

<!-- TODO: document named capture. group 0 can be omitted -->

### start()

Like `group()`, but returns the **start** position of a regex capture group,
rather than its value.

    var m = 'foo9bar' => search(/ [a-z] <capture d+> [a-z] /)
    echo $[m => start(0)]  # => 2 for 'o9b'
    echo $[m => start(1)]  # => 3 for '9'

### end()

Like `group()`, but returns the **end** position of a regex capture group,
rather than its value.

    var m = 'foo9bar' => search(/ [a-z] <capture d+> [a-z] /)
    echo $[m => end(0)]  # => 5 for 'o9b'
    echo $[m => end(1)]  # => 4 for '9'
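
The same three accessors exist on Python's match objects, which makes the positions above easy to cross-check. This sketch uses a hand-translated pattern (an assumption) and Python's `re`, not YSH.

```python
import re

m = re.search(r'[a-z]([0-9]+)[a-z]', 'foo9bar')

# Group 0 is the whole match; group 1 is the captured digits.
print(m.group(0), m.start(0), m.end(0))  # o9b 2 5
print(m.group(1), m.start(1), m.end(1))  # 9 3 4
```
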

## List

### append()
3 changes: 3 additions & 0 deletions doc/ref/toc-ysh.md
@@ -239,6 +239,9 @@ X [Builtin Sub] _buffer
X trim() X trimLeft() X trimRight()
X trimPrefix() X trimSuffix()
upper() lower() # ascii or unicode
search() leftMatch()
[Match] group() start() end()
X groups() X groupDict()
[List] append() pop() extend() X find()
X insert() X remove() reverse()
[Dict] keys() values() X get() X erase()
6 changes: 2 additions & 4 deletions spec/ysh-regex-api.test.sh
@@ -1,4 +1,4 @@
-## oils_failures_allowed: 2
+## oils_failures_allowed: 1

 #### s ~ regex and s !~ regex
 shopt -s ysh:upgrade
@@ -225,8 +225,7 @@ proc show-tokens (s) {
   while (true) {
     echo "pos=$pos"

-    # TODO: use leftMatch()
-    var m = s->search(lexer, pos=pos)
+    var m = s->leftMatch(lexer, pos=pos)
     if (not m) {
       break
     }
@@ -266,7 +265,6 @@ null/ab/null/
 pos=2
 ## END

-
 #### Named captures with _match
 shopt -s ysh:all

