Implement Regular Expression replace and update Text.replace to the new API #5959

Merged: 176 commits, Mar 28, 2023
Changes from 164 commits
Commits
a1f269a
calling TRegexObject via matches
GregoryTravis Feb 23, 2023
c4c29e1
internal_pattern : Any
GregoryTravis Feb 23, 2023
49633f7
Merge branch 'develop' into wip/gmt/compile-regexp
GregoryTravis Feb 27, 2023
2cc6627
move new code to new modules
GregoryTravis Feb 27, 2023
28f2651
Pattern_2.matches
GregoryTravis Feb 27, 2023
2be426c
iterator
GregoryTravis Feb 28, 2023
3f6b5dd
to_text_debug
GregoryTravis Feb 28, 2023
14efa04
dead
GregoryTravis Feb 28, 2023
b0cacc8
change options to string and include them in src construction
GregoryTravis Mar 1, 2023
08d0cf0
Any, not Object
GregoryTravis Mar 1, 2023
ff8577d
Illegal_Argument
GregoryTravis Mar 1, 2023
c96511a
no UnicodeRegex
GregoryTravis Mar 1, 2023
7581dd5
Merge branch 'develop' into wip/gmt/compile-regexp
GregoryTravis Mar 1, 2023
4423481
match
GregoryTravis Mar 1, 2023
f2a3a68
groups work
GregoryTravis Mar 1, 2023
86a8955
case
GregoryTravis Mar 1, 2023
c59dd6e
install in match, find, find_all
GregoryTravis Mar 1, 2023
b67aab0
clean up
GregoryTravis Mar 1, 2023
d05c0e2
fix check_span
GregoryTravis Mar 1, 2023
fc94c95
clean up
GregoryTravis Mar 1, 2023
16f53d3
unicode woes
GregoryTravis Mar 2, 2023
8bbb758
mark normalization test pending
GregoryTravis Mar 2, 2023
5a34030
cleanup
GregoryTravis Mar 2, 2023
f6af863
convert to grapheme spans
GregoryTravis Mar 2, 2023
6a0ab81
Update test/Tests/src/Data/Text_Spec.enso
GregoryTravis Mar 2, 2023
3a131d0
cleanup, remove _2, escape
GregoryTravis Mar 3, 2023
0e5a7a4
rename to group
GregoryTravis Mar 3, 2023
4e51cbe
trying to read polyglot map
GregoryTravis Mar 3, 2023
be06de9
merge
GregoryTravis Mar 6, 2023
f3393d8
wip
GregoryTravis Mar 6, 2023
a197a74
Coerce polyglot values to supported Enso types
JaroslavTulach Mar 6, 2023
718efa7
undo catch
GregoryTravis Mar 6, 2023
799fbeb
Merge branch 'wip/gmt/compile-regexp' of github.com:enso-org/enso int…
GregoryTravis Mar 6, 2023
e8f7f67
disable syntax error test
GregoryTravis Mar 6, 2023
e2bf6f7
Coerce values obtained from readMember
JaroslavTulach Mar 7, 2023
e93cff1
jaroslav fix
GregoryTravis Mar 7, 2023
7d5f9e6
convert regex exception
GregoryTravis Mar 7, 2023
e6127ed
cleanup, correct error declarations
GregoryTravis Mar 7, 2023
69b710e
nonparticpating matches, docs
GregoryTravis Mar 7, 2023
3303cc4
docs
GregoryTravis Mar 7, 2023
3f72aa5
idiomatic
GregoryTravis Mar 7, 2023
8ce1ce6
docs
GregoryTravis Mar 7, 2023
f668bbd
docs
GregoryTravis Mar 7, 2023
6c8a62d
merge
GregoryTravis Mar 7, 2023
7e439f2
changelog
GregoryTravis Mar 7, 2023
354b1bf
fmt
GregoryTravis Mar 7, 2023
1056d88
groups
GregoryTravis Mar 7, 2023
fe43b71
review
GregoryTravis Mar 8, 2023
1c12ed9
idiomatic
GregoryTravis Mar 8, 2023
3b2da3f
Update CHANGELOG.md
GregoryTravis Mar 8, 2023
e0a9e8b
Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Regex_2…
GregoryTravis Mar 8, 2023
ee1431f
review
GregoryTravis Mar 8, 2023
81280ba
remove self from node
GregoryTravis Mar 8, 2023
bf4d119
review
GregoryTravis Mar 8, 2023
2181fea
Merge branch 'wip/gmt/compile-regexp' of github.com:enso-org/enso int…
GregoryTravis Mar 8, 2023
fb06590
matches must match at both ends
GregoryTravis Mar 8, 2023
d8ae85f
fmt
GregoryTravis Mar 8, 2023
92782fa
Merge branch 'develop' into wip/gmt/compile-regexp
GregoryTravis Mar 8, 2023
6e0ad33
dead
GregoryTravis Mar 8, 2023
2fe3572
Merge branch 'wip/gmt/compile-regexp' into wip/gmt/5122-replace
GregoryTravis Mar 8, 2023
41f835f
groups test
GregoryTravis Mar 8, 2023
f77e31e
wip
GregoryTravis Mar 8, 2023
723d15a
convert polyglot map to map
GregoryTravis Mar 9, 2023
294d121
named_groups
GregoryTravis Mar 9, 2023
a4eafe2
better example
GregoryTravis Mar 9, 2023
cb833d9
update docs
GregoryTravis Mar 9, 2023
e5d7714
wip
GregoryTravis Mar 9, 2023
2b48ad9
wip
GregoryTravis Mar 9, 2023
9e362fe
tests pass, check_span etc removed
GregoryTravis Mar 9, 2023
2dc4fab
tests for both span types
GregoryTravis Mar 9, 2023
1c5b94e
grapheme info in docs
GregoryTravis Mar 9, 2023
d7803df
default span and grapheme_span id to 0
GregoryTravis Mar 10, 2023
c5848b9
.text
GregoryTravis Mar 10, 2023
a971c5b
.get .at
GregoryTravis Mar 10, 2023
acb252a
Pattern find, find_all, regex tests
GregoryTravis Mar 10, 2023
3da1a6c
Merge branch 'develop' into wip/gmt/5122-replace
GregoryTravis Mar 13, 2023
153816e
missing imports
GregoryTravis Mar 13, 2023
8779daa
fix tests
GregoryTravis Mar 13, 2023
ed061b2
update docs
GregoryTravis Mar 13, 2023
57afa2c
removed duplicate tests
GregoryTravis Mar 13, 2023
69ac1f7
update regex_2 tests
GregoryTravis Mar 13, 2023
42f0316
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 14, 2023
85f2f6b
writ
GregoryTravis Mar 14, 2023
6d41332
only_first
GregoryTravis Mar 14, 2023
082c861
builds
GregoryTravis Mar 14, 2023
3c01ab1
specialization problem
GregoryTravis Mar 14, 2023
e18d951
1 test
GregoryTravis Mar 14, 2023
d1b8748
named
GregoryTravis Mar 14, 2023
a528a93
more
GregoryTravis Mar 14, 2023
406490a
Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Regex/M…
GregoryTravis Mar 14, 2023
875b19e
Update distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Regex/M…
GregoryTravis Mar 14, 2023
ac80e87
utf16_start
GregoryTravis Mar 14, 2023
9c9cc8e
Merge branch 'develop' into wip/gmt/5122-replace
GregoryTravis Mar 14, 2023
2f03d72
start/end tests
GregoryTravis Mar 14, 2023
3c89508
internal_start/end
GregoryTravis Mar 14, 2023
be3802e
at calls get
GregoryTravis Mar 14, 2023
d705a5f
pr
GregoryTravis Mar 14, 2023
296a22d
parser exemption
GregoryTravis Mar 14, 2023
5a328a5
Merge branch 'develop' into wip/gmt/5122-replace
GregoryTravis Mar 15, 2023
e9b7453
Refactor some stuff...
jdunkerley Mar 15, 2023
8c94374
Fix PR change errors.
jdunkerley Mar 15, 2023
f71a52f
Final PR tweaks.
jdunkerley Mar 15, 2023
f523424
Merge branch 'develop' into wip/gmt/5122-replace
GregoryTravis Mar 15, 2023
ded1746
merge
GregoryTravis Mar 15, 2023
4145f90
update tests
GregoryTravis Mar 15, 2023
2a98349
update tests
GregoryTravis Mar 15, 2023
22efa35
replace tests
GregoryTravis Mar 15, 2023
21ff3f3
wip
GregoryTravis Mar 15, 2023
d3463a5
merge
GregoryTravis Mar 15, 2023
2677794
Match.span test
GregoryTravis Mar 15, 2023
f027d11
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 16, 2023
ffe7f08
fix docs
GregoryTravis Mar 16, 2023
519e164
fix docs
GregoryTravis Mar 16, 2023
d93953d
tests updated, unicode failing
GregoryTravis Mar 16, 2023
a25b623
all but one
GregoryTravis Mar 16, 2023
1f8de3f
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 17, 2023
9dc744c
dead
GregoryTravis Mar 17, 2023
86a2f37
do not expand to graphemes during replace
GregoryTravis Mar 17, 2023
1483541
cleanup
GregoryTravis Mar 17, 2023
7333054
Too_Many_Groups
GregoryTravis Mar 17, 2023
8258d03
docs
GregoryTravis Mar 17, 2023
f8317d3
docs, $0, unicode capture group names
GregoryTravis Mar 17, 2023
e084410
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 20, 2023
d7840d1
map cache
GregoryTravis Mar 20, 2023
c6215b4
lru, test
GregoryTravis Mar 20, 2023
967b3f4
cleanup, comments
GregoryTravis Mar 20, 2023
b11d5c3
Tail_Call
GregoryTravis Mar 20, 2023
160e576
changelog
GregoryTravis Mar 20, 2023
b0097fc
changelog
GregoryTravis Mar 20, 2023
a06c1c9
cleanup, map test
GregoryTravis Mar 20, 2023
1ee30de
rename span, grapheme_span to utf_16_span, span respectively
GregoryTravis Mar 20, 2023
b2281a8
java fmt
GregoryTravis Mar 20, 2023
009ee40
Adjuste replace iteration.
jdunkerley Mar 21, 2023
190c40c
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 21, 2023
4685a56
A few more tweaks
jdunkerley Mar 21, 2023
19055e9
linear search instead of map
GregoryTravis Mar 21, 2023
edefbff
Value
GregoryTravis Mar 21, 2023
73c1c9e
get_or_set
GregoryTravis Mar 21, 2023
e9bc022
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 21, 2023
7df6c11
utf_16 everywhere
GregoryTravis Mar 21, 2023
d75c4d4
use_regex flag
GregoryTravis Mar 21, 2023
c1239ba
example
GregoryTravis Mar 21, 2023
4dec215
private replacer helpers
GregoryTravis Mar 21, 2023
2948bc3
review
GregoryTravis Mar 21, 2023
fe9732f
Merge branch 'develop' into wip/gmt/5122-replace-2
GregoryTravis Mar 22, 2023
13fda5f
review
GregoryTravis Mar 22, 2023
5a6d5df
fmt
GregoryTravis Mar 22, 2023
e9f38ea
comments, examples
GregoryTravis Mar 22, 2023
ba67e9a
review
GregoryTravis Mar 22, 2023
09c1c30
locale must be default
GregoryTravis Mar 22, 2023
81f4c92
if_not_error
GregoryTravis Mar 22, 2023
926bc35
review
GregoryTravis Mar 22, 2023
7726f93
empty pattern
GregoryTravis Mar 22, 2023
6044db7
review
GregoryTravis Mar 23, 2023
dd93242
wip
GregoryTravis Mar 23, 2023
c1e20cc
empty patterns
GregoryTravis Mar 23, 2023
0d57aba
wip
GregoryTravis Mar 23, 2023
27e5130
remove 100 capture group limit
GregoryTravis Mar 23, 2023
0f3e533
update comments
GregoryTravis Mar 23, 2023
53851dc
wip
GregoryTravis Mar 23, 2023
ec52bfd
merge
GregoryTravis Mar 24, 2023
d813416
empty flags
GregoryTravis Mar 24, 2023
8b07d54
Table_Spec
GregoryTravis Mar 24, 2023
700745f
Table_Tests
GregoryTravis Mar 24, 2023
59e0ed3
Tidy a few imports and a type signature.
jdunkerley Mar 25, 2023
cec0417
Merge branch 'develop' into wip/gmt/5122-replace-2
jdunkerley Mar 25, 2023
2fd1a14
Merge branch 'develop' into wip/gmt/5122-replace-2
jdunkerley Mar 27, 2023
1c35f42
PR comments.
jdunkerley Mar 27, 2023
a084ef8
added example test
GregoryTravis Mar 27, 2023
92d572e
non-default locale docs
GregoryTravis Mar 27, 2023
6d5d69a
added example test
GregoryTravis Mar 27, 2023
d9b7d3e
Merge branch 'develop' into wip/gmt/5122-replace-2
mergify[bot] Mar 27, 2023
9f0f05c
Merge branch 'develop' into wip/gmt/5122-replace-2
mergify[bot] Mar 27, 2023
31f508b
Merge branch 'develop' into wip/gmt/5122-replace-2
mergify[bot] Mar 27, 2023
c101f5b
Merge branch 'develop' into wip/gmt/5122-replace-2
mergify[bot] Mar 28, 2023
b4d7760
Merge branch 'develop' into wip/gmt/5122-replace-2
mergify[bot] Mar 28, 2023
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -359,6 +359,8 @@
- [Aligned names of columns created by column operations.][5850]
- [Improved `cross_tab`. Renamed `fill_missing` and `is_missing` to
`fill_nothing` and `is_nothing`. Added `fill_empty`.][5863]
- [Removed many regex compile flags from `replace`; added `only_first`
flag.][5959]

[debug-shortcuts]:
https://github.com/enso-org/enso/blob/develop/app/gui/docs/product/shortcuts.md#debug
@@ -546,6 +548,7 @@
[5863]: https://github.com/enso-org/enso/pull/5863
[5917]: https://github.com/enso-org/enso/pull/5917
[5705]: https://github.com/enso-org/enso/pull/5705
[5959]: https://github.com/enso-org/enso/pull/5959

#### Enso Compiler

@@ -10,6 +10,7 @@ import project.Data.Range.Range
import project.Data.Text.Case.Case
import project.Data.Text.Case_Sensitivity.Case_Sensitivity
import project.Data.Text.Encoding.Encoding
import project.Data.Text.Helpers
import project.Data.Text.Location.Location
import project.Data.Text.Matching_Mode.Matching_Mode
import project.Data.Text.Regex.Match.Match
@@ -31,6 +32,8 @@ import project.Errors.Illegal_Argument.Illegal_Argument
import project.Errors.Problem_Behavior.Problem_Behavior
import project.Meta
import project.Nothing.Nothing
import project.IO
import project.IO

from project.Data.Boolean import Boolean, True, False
from project.Data.Text.Text_Sub_Range import Codepoint_Ranges, Text_Sub_Range
@@ -218,6 +221,8 @@ Text.characters self =
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

If an empty regex is used, `find` throws an Illegal_Argument error.

> Example
Find the first substring matching the regex.

@@ -227,10 +232,12 @@ Text.characters self =
example_find_insensitive =
## This matches `aBc` @ character 11
"aabbbbccccaaBcaaaa".find "a[ab]c" Case_Sensitivity.Insensitive
Text.find : Text -> Case_Sensitivity -> Match | Nothing ! Regex_Syntax_Error
Text.find : Text -> Case_Sensitivity -> Match | Nothing ! Regex_Syntax_Error | Illegal_Argument
Text.find self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
Regex_2.compile pattern case_insensitive=case_insensitive . match self
Helpers.regex_assume_default_locale case_sensitivity <|
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
compiled_pattern = Regex_2.compile pattern case_insensitive=case_insensitive
compiled_pattern.if_not_error <| compiled_pattern.match self

## Finds all the matches of the regular expression `pattern` in `self`,
returning a Vector. If not found, will be an empty Vector.
@@ -240,6 +247,8 @@ Text.find self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

If an empty regex is used, `find_all` throws an Illegal_Argument error.

> Example
Find the substring matching the regex.

@@ -249,10 +258,12 @@ Text.find self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
example_find_all_insensitive =
## This matches `aABbbbc` @ character 0 and `aaBC` @ character 10
"aABbbbccccaaBCaaaa".find_all "a[ab]+c" Case_Sensitivity.Insensitive
Text.find_all : Text -> Case_Sensitivity -> Vector Match ! Regex_Syntax_Error
Text.find_all : Text -> Case_Sensitivity -> Vector Match ! Regex_Syntax_Error ! Illegal_Argument
Text.find_all self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
Regex_2.compile pattern case_insensitive=case_insensitive . match_all self
Helpers.regex_assume_default_locale case_sensitivity <|
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
compiled_pattern = Regex_2.compile pattern case_insensitive=case_insensitive
compiled_pattern.if_not_error <| compiled_pattern.match_all self

## ALIAS Check Matches

@@ -263,6 +274,8 @@ Text.find_all self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

If an empty regex is used, `match` throws an Illegal_Argument error.

> Example
Checks if whole text matches a basic email regex.

@@ -274,11 +287,12 @@ Text.find_all self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
regex = ".+ct@.+"
# Evaluates to true
"CONTACT@enso.org".match regex Case_Sensitivity.Insensitive
Text.match : Text -> Case_Sensitivity -> Boolean ! Regex_Syntax_Error
Text.match : Text -> Case_Sensitivity -> Boolean ! Regex_Syntax_Error | Illegal_Argument
Text.match self pattern=".*" case_sensitivity=Case_Sensitivity.Sensitive =
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
compiled_pattern = Regex_2.compile pattern case_insensitive=case_insensitive
compiled_pattern.matches self
Helpers.regex_assume_default_locale case_sensitivity <|
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
compiled_pattern = Regex_2.compile pattern case_insensitive=case_insensitive
compiled_pattern.if_not_error <| compiled_pattern.matches self
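
The behaviour documented for `find`, `find_all` and `match` above can be sketched with Python's `re` module. This is only an analogue under stated assumptions, not the Enso implementation: the helper names `_compile`, `find`, `find_all` and `matches` are hypothetical, `ValueError` stands in for `Illegal_Argument`, and Python counts code points where Enso counts grapheme clusters, so offsets agree only for simple text such as the ASCII examples used here.

```python
import re


def _compile(pattern, case_insensitive=False):
    # Assumption: an empty regex is rejected up front, mirroring the
    # documented Illegal_Argument error (ValueError is a stand-in).
    if pattern == "":
        raise ValueError("The regex pattern must not be empty.")
    return re.compile(pattern, re.IGNORECASE if case_insensitive else 0)


def find(text, pattern=".*", case_insensitive=False):
    # First match anywhere in the text, or None when nothing matches.
    return _compile(pattern, case_insensitive).search(text)


def find_all(text, pattern=".*", case_insensitive=False):
    # All non-overlapping matches, scanning left to right.
    return list(_compile(pattern, case_insensitive).finditer(text))


def matches(text, pattern=".*", case_insensitive=False):
    # True only when the whole text matches (anchored at both ends).
    return _compile(pattern, case_insensitive).fullmatch(text) is not None


first = find("aabbbbccccaaBcaaaa", "a[ab]c", case_insensitive=True)
print(first.group(0), first.start())  # aBc 11
print(matches("CONTACT@enso.org", ".+ct@.+", case_insensitive=True))  # True
```

Compiling first and matching second keeps the error path in one place: an invalid or empty pattern fails before any text is scanned.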

## ALIAS Split Text

@@ -327,21 +341,28 @@ Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter
compiled_pattern.split self mode=Regex_Mode.All

## ALIAS Replace Text
Replaces the first, last, or all occurrences of term with new_text in the
input. If `term` is empty, the function returns the input unchanged.
Perform a text or regex replace.

Returns the text with all matched elements replaced by the provided
replacement. If `input` is empty, the function returns the input unchanged.

The replacement string can contain references to groups matched by the
regex. The following syntaxes are supported:
$0: the entire match string
$&: the entire match string
$n: the nth group
$<foo>: Named group `foo`

Arguments:
- term: The term to find.
- new_text: The new text to replace occurrences of `term` with.
If `matcher` is a `Regex_Matcher`, `new_text` can include replacement
patterns (such as `$<n>`) for a marked group.
- mode: Specifies which occurences of term the engine tries to find. When the
mode is `First` or `Last`, this method replaces the first or last occurence
of term in the input. If set to `All`, it replaces all occurences of term in
the input.
- matcher: If a `Text_Matcher`, the text is compared using case-sensitivity
rules specified in the matcher. If a `Regex_Matcher`, the term is used as a
regular expression and matched using the associated options.
- term: The string or regex to find.
- replacement: The text to replace matches with.
- case_sensitivity: Specifies whether matching is case-sensitive. Case
insensitive matching behaves as if it normalises the case of all input
text before matching on it.
- only_first: If True, only replace the first match.
- use_regex: If true, the term is used as a regular expression.

If an empty regex is used, `replace` throws an Illegal_Argument error.

> Example
Replace letters in the text "aaa".
@@ -351,17 +372,17 @@ Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter
> Example
Replace all occurrences of letters 'l' and 'o' with '#'.

"Hello World!".replace "[lo]" "#" matcher=Regex_Matcher == "He### W#r#d!"
"Hello World!".replace "[lo]" "#" use_regex=True == "He### W#r#d!"

> Example
Replace the first occurrence of letter 'l' with '#'.

"Hello World!".replace "l" "#" mode=Matching_Mode.First == "He#lo World!"
"Hello World!".replace "l" "#" only_first=True == "He#lo World!"

> Example
Replace texts in quotes with parentheses.

'"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' matcher=Regex_Matcher == '(abc) foo (bar) baz'
'"abc" foo "bar" baz'.replace '"(.*?)"' '($1)' use_regex=True == '(abc) foo (bar) baz'

! Matching Grapheme Clusters
In case-insensitive mode, a single character can match multiple characters,
@@ -378,62 +399,40 @@ Text.split self delimiter="," matcher=Text_Matcher.Case_Sensitive = if delimiter
> Example
Extended partial matches in case-insensitive mode.

# The ß symbol matches the letter `S` twice in case-insensitive mode, because it folds to `ss`.
'ß'.replace 'S' 'A' matcher=(Text_Matcher Case_Insensitive) . should_equal 'AA'
# The ß symbol matches the letter `S` twice in case-insensitive mode, because it folds to `ss`.
'ß'.replace 'ß' 'A' case_sensitivity=Case_Sensitivity.Insensitive . should_equal 'A'
# The 'ffi' ligature is a single grapheme cluster, so even if just a part of it is matched, the whole grapheme is replaced.
'affib'.replace 'i' 'X' matcher=(Text_Matcher Case_Insensitive) . should_equal 'aXb'

! Last Match in Regex Mode
Regex always performs the search from the front and matching the last
occurrence means selecting the last of the matches while still generating
matches from the beginning. Regex does not return overlapping matches - it
will return a match at some position and then continue the search after that
match. This will lead to slightly different behavior for overlapping
occurrences of a pattern in Regex mode than in exact text matching mode
where the matches are searched for from the back.
'affib'.replace 'ffi' 'X' case_sensitivity=Case_Sensitivity.Insensitive . should_equal 'aXb'

> Example
Comparing Matching in Last Mode in Regex and Text mode

"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "ac"
"aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "ca"

"aaa aaa".replace "aa" "c" matcher=Text_Matcher . should_equal "ca ca"
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Text_Matcher . should_equal "ca aaa"
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Text_Matcher . should_equal "aaa ac"
"aaa aaa".replace "aa" "c" matcher=Regex_Matcher . should_equal "ca ca"
"aaa aaa".replace "aa" "c" mode=Matching_Mode.First matcher=Regex_Matcher . should_equal "ca aaa"
"aaa aaa".replace "aa" "c" mode=Matching_Mode.Last matcher=Regex_Matcher . should_equal "aaa ca"
Text.replace : Text -> Text -> Matching_Mode | Regex_Mode -> (Text_Matcher | Regex_Matcher) -> Text
Text.replace self term="" new_text="" mode=Regex_Mode.All matcher=Text_Matcher.Case_Sensitive = if term.is_empty then self else
case matcher of
_ : Text_Matcher ->
Regexp replace.

'<a href="url">content</a>'.replace '<a href="(.*?)">(.*?)</a>' '$2 is at $1' use_regex=True == 'content is at url'

Text.replace : Text -> Text -> Case_Sensitivity -> Boolean -> Boolean -> Text | Illegal_Argument
Text.replace self term replacement case_sensitivity=Case_Sensitivity.Sensitive only_first=False use_regex=False =
case use_regex of
False -> if term.is_empty then self else
array_from_single_result result = case result of
Nothing -> Array.empty
_ -> Array.new_1 result
spans_array = case matcher of
Text_Matcher.Case_Sensitive -> case mode of
Regex_Mode.All ->
Text_Utils.span_of_all self term
Matching_Mode.First ->
array_from_single_result <| Text_Utils.span_of self term
Matching_Mode.Last ->
array_from_single_result <| Text_Utils.last_span_of self term
_ -> Error.throw (Illegal_Argument.Error "Invalid mode.")
Text_Matcher.Case_Insensitive locale -> case mode of
Regex_Mode.All ->
spans_array = case case_sensitivity of
Case_Sensitivity.Sensitive -> case only_first of
False -> Text_Utils.span_of_all self term
True -> array_from_single_result <| Text_Utils.span_of self term
Case_Sensitivity.Insensitive locale -> case only_first of
False ->
Text_Utils.span_of_all_case_insensitive self term locale.java_locale
Matching_Mode.First ->
True ->
array_from_single_result <|
Text_Utils.span_of_case_insensitive self term locale.java_locale False
Matching_Mode.Last ->
array_from_single_result <|
Text_Utils.span_of_case_insensitive self term locale.java_locale True
_ -> Error.throw (Illegal_Argument.Error "Invalid mode.")
Text_Utils.replace_spans self spans_array new_text
_ : Regex_Matcher ->
compiled_pattern = matcher.compile term
compiled_pattern.replace self new_text mode=mode
Text_Utils.replace_spans self spans_array replacement
True ->
Helpers.regex_assume_default_locale case_sensitivity <|
case_insensitive = case_sensitivity.is_case_insensitive_in_memory
compiled_pattern = Regex_2.compile term case_insensitive=case_insensitive
compiled_pattern.if_not_error <|
compiled_pattern.replace self replacement only_first
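
The new `replace` contract (literal or regex term, `only_first`, and the `$0`, `$&`, `$n`, `$<foo>` group references) can be approximated in Python as a minimal sketch. Several things here are assumptions rather than facts about the Enso code: the function names are hypothetical, `ValueError` stands in for `Illegal_Argument`, multi-digit references such as `$10` are read greedily, named groups use Python's `(?P<name>...)` pattern syntax, and the locale-aware, grapheme-based case folding of the Enso literal path is not reproduced.

```python
import re

# Recognises $&, $<digits> and $<name> references in a replacement template.
_GROUP_REF = re.compile(r"\$(?:(?P<whole>&)|(?P<num>\d+)|<(?P<name>\w+)>)")


def _expand(template, match):
    # Substitute each group reference with the text captured by `match`.
    def substitute(ref):
        if ref.group("whole"):
            return match.group(0)
        if ref.group("num") is not None:
            # Assumption: a non-participating group expands to "".
            return match.group(int(ref.group("num"))) or ""
        return match.group(ref.group("name")) or ""

    return _GROUP_REF.sub(substitute, template)


def replace(text, term, replacement, case_insensitive=False,
            only_first=False, use_regex=False):
    flags = re.IGNORECASE if case_insensitive else 0
    count = 1 if only_first else 0  # 0 means "replace every match"
    if not use_regex:
        # Literal mode: an empty term leaves the text unchanged and the
        # replacement is inserted verbatim (no group references).
        if term == "":
            return text
        return re.sub(re.escape(term), lambda m: replacement, text,
                      count=count, flags=flags)
    if term == "":
        raise ValueError("The regex pattern must not be empty.")
    compiled = re.compile(term, flags)
    return compiled.sub(lambda m: _expand(replacement, m), text, count=count)


print(replace("Hello World!", "[lo]", "#", use_regex=True))  # He### W#r#d!
print(replace("Hello World!", "l", "#", only_first=True))    # He#lo World!
```

Expanding the template per match, rather than translating it once into the engine's own replacement syntax, keeps literal mode and regex mode on the same code path and makes the handling of `$0` and `$&` explicit.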

## ALIAS Get Words

@@ -1115,9 +1114,9 @@ Text.trim self where=Location.Both what=_.is_whitespace =

term = "straße"
text = "MONUMENTENSTRASSE 42"
match = text . locate term matcher=(Text_Matcher Case_Insensitive)
term.length == 6
match.length == 7
match = text . locate term case_sensitivity=Case_Sensitivity.Insensitive
term.length . should_equal 6
match.length . should_equal 7

! Matching Grapheme Clusters
In case-insensitive mode, a single character can match multiple characters,
@@ -1265,11 +1264,8 @@ Text.locate_all self term="" case_sensitivity=Case_Sensitivity.Sensitive = if te
- term: The term to find.
- start: The index to start searching from. If the index is negative, it
is counted from the end of the vector.
- matcher: Specifies how the term is matched against the input:
- If a `Text_Matcher`, the text is compared using case-sensitively rules
specified in the matcher.
- If a `Regex_Matcher`, the `term` is used as a regular expression and
matched using the associated options.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
@@ -1301,11 +1297,8 @@ Text.index_of self term="" start=0 case_sensitivity=Case_Sensitivity.Sensitive =
- term: The term to find.
- start: The index to start searching backwards from. If the index is
negative, it is counted from the end of the vector.
- matcher: Specifies how the term is matched against the input:
- If a `Text_Matcher`, the text is compared using case-sensitively rules
specified in the matcher.
- If a `Regex_Matcher`, the `term` is used as a regular expression and
matched using the associated options.
- case_sensitivity: Specifies if the text values should be compared case
sensitively.

! What is a Character?
A character is defined as an Extended Grapheme Cluster, see Unicode
@@ -0,0 +1,16 @@
from Standard.Base import all

import project.Any.Any
import project.Data.Locale.Locale
import project.Data.Text.Case_Sensitivity.Case_Sensitivity
import project.Errors.Illegal_Argument.Illegal_Argument

## PRIVATE
regex_assume_default_locale : Case_Sensitivity -> Any -> Any ! Illegal_Argument
regex_assume_default_locale case_sensitivity ~action = case case_sensitivity of
Case_Sensitivity.Sensitive -> action
Case_Sensitivity.Insensitive locale -> case locale == Locale.default of
True -> action
False ->
msg = "Custom locales are not supported for regexes."
Error.throw (Illegal_Argument.Error msg)
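
The guard in `regex_assume_default_locale` takes its action lazily (`~action`), so nothing regex-related runs when a custom locale is rejected. A rough Python analogue, assuming a hypothetical `Insensitive` dataclass and `DEFAULT_LOCALE` constant in place of the Enso types, `None` in place of `Case_Sensitivity.Sensitive`, and `ValueError` in place of `Illegal_Argument`:

```python
from dataclasses import dataclass

DEFAULT_LOCALE = "default"  # hypothetical stand-in for Locale.default


@dataclass(frozen=True)
class Insensitive:
    # Hypothetical stand-in for Case_Sensitivity.Insensitive locale.
    locale: str = DEFAULT_LOCALE


def regex_assume_default_locale(case_sensitivity, action):
    # `action` is a zero-argument callable, so it is evaluated only
    # after the locale check has passed.
    if case_sensitivity is None or case_sensitivity.locale == DEFAULT_LOCALE:
        return action()
    raise ValueError("Custom locales are not supported for regexes.")


print(regex_assume_default_locale(None, lambda: "ran"))           # ran
print(regex_assume_default_locale(Insensitive(), lambda: "ran"))  # ran
```

Passing a callable rather than a precomputed value is what keeps the check cheap: with a custom locale the pattern is never compiled.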