In [2]:
using Test

In [3]:
occursin("cat","cat")

true

In [4]:
"cat"=="cat"

true

In [5]:
occursin("cat","scatter")

true

In [6]:
occursin("cat", "cottage")

false

In [1]:
map(s -> occursin("cat", s), ["scatter", "cottage"])

2-element Vector{Bool}:
 1
 0

In [2]:
map(s->startswith(s, "cat"), ["catastrophe", "scatter", "tigercat"])

3-element Vector{Bool}:
 1
 0
 0

In [3]:
map(s->endswith(s, "cat"), ["catastrophe", "scatter", "tigercat"])

3-element Vector{Bool}:
 0
 0
 1

In [7]:
r"cat"

r"cat"

In [4]:
map(s-> occursin(r"cat", s), ["catastrophe", "scatter", "tigercat"])

3-element Vector{Bool}:
 1
 1
 1

In [5]:
map(s-> occursin(r"^cat", s), ["catastrophe", "scatter", "tigercat"])

3-element Vector{Bool}:
 1
 0
 0

In [7]:
map(s -> occursin(r"c[auo]t", s), ["catalog", "scotch", "cutlery", "settle"])

4-element Vector{Bool}:
 1
 1
 1
 0

In [8]:
map(s -> occursin(r"^[a-f]",s), ["apple", "checkmate", "frosted flakes", "zebra"])

4-element Vector{Bool}:
 1
 1
 1
 0

In [10]:
map(s -> occursin(r"^[^a-f]",s), ["apple", "frosted flakes", "poutine", "zebra"])

4-element Vector{Bool}:
 0
 0
 1
 1

In [11]:
map(s -> occursin(r"c.t",s), ["catalog", "tactile", "yacht"])

3-element Vector{Bool}:
 1
 0
 1

In [None]:
map(s -> occursin(r"dog|cat",s), ["dogma", "catalog", "fish"])

3-element Vector{Bool}:
 1
 1
 0

In [14]:
map(s -> occursin(r"(dog|cat)fish",s), ["dogfish", "catfish", "clownfish"])

3-element Vector{Bool}:
 1
 1
 0

In [14]:
occursin(r"^[0-9]","1234")

true

In [15]:
occursin(r"^\d","1234")

true

In [16]:
map(s-> occursin(r"^[a-z]", s), ["apple", "zebra", "1234"])

3-element Vector{Bool}:
 1
 1
 0

In [17]:
map(s -> occursin(r"^[[:alpha:]]",s) , ["apple", "ωγ", "1234"])

3-element Vector{Bool}:
 1
 1
 0

In [18]:
map(s -> occursin(r"\w\w\w", s), ["r2c", "ww_c", "i a"])

3-element Vector{Bool}:
 1
 1
 0

In [1]:
split( "The dog	jumps over the log", r"\s")

6-element Vector{SubString{String}}:
 "The"
 "dog"
 "jumps"
 "over"
 "the"
 "log"

In [2]:
split("1.0,2.0,3.0;4.5,7.9;-10", r"[,;]")

6-element Vector{SubString{String}}:
 "1.0"
 "2.0"
 "3.0"
 "4.5"
 "7.9"
 "-10"

### Quantifiers

We often want to match a character (or character class) a specific number of times. One way is simply to repeat the character in the pattern:

In [21]:
occursin(r"ccc","cc")

false

Alternatively, we can put `{n}` after a character to match it exactly `n` times.

In [22]:
occursin(r"c{3}","cc")

false

The following matches a string made up of between 2 and 4 copies of `c`. The `^` and `$` anchors force the pattern to cover the entire string.

In [23]:
map(s -> occursin(r"^c{2,4}$",s),["c","cc","ccc","cccc","ccccc"])

5-element Vector{Bool}:
 0
 1
 1
 1
 0
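
The anchors matter in the pattern above. As a small illustration (the test string is chosen here only for demonstration), dropping `^` and `$` lets `c{2,4}` succeed on any string that contains at least two consecutive `c`s:

```julia
# Without anchors, the pattern only has to match somewhere inside the string.
unanchored = occursin(r"c{2,4}", "ccccc")    # true: the first four c's satisfy c{2,4}
# With anchors, the whole string must consist of 2 to 4 c's.
anchored   = occursin(r"^c{2,4}$", "ccccc")  # false: five c's is too many
```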

In [24]:
map(s -> occursin(r"^c{3,}$",s),["c","cc","ccc","cccc","ccccc"])

5-element Vector{Bool}:
 0
 0
 1
 1
 1

In [17]:
map(s -> occursin(r"^c{0,3}$",s),["c","cc","ccc","cccc","ccccc"])

5-element Vector{Bool}:
 1
 1
 1
 0
 0

We can match a character 0 or more times with a `*`.

In [26]:
map(s -> occursin(r"^c*a",s),["a","ca","cca","ccca","cccca"])

5-element Vector{Bool}:
 1
 1
 1
 1
 1

We can match a character 1 or more times with a `+`.

In [27]:
map(s -> occursin(r"^c+a",s),["a","ca","cca","ccca","cccca"])

5-element Vector{Bool}:
 0
 1
 1
 1
 1
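
One more quantifier, `?`, matches the preceding item zero or one time. It appears in the patterns of the later sections (for example `[+-]?` for an optional sign). A minimal sketch, with word examples picked only for illustration:

```julia
# `u?` means the letter u may appear once or not at all.
words = ["color", "colour", "colouur"]
results = map(s -> occursin(r"^colou?r$", s), words)
```

Here `results` is `[true, true, false]`: the third word has two `u`s, which `u?` does not allow.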

### Parsing strings and extracting substrings

A major use of regular expressions is extracting substrings and parsing them. We start with the `match` function, which extracts information from a string.

If we want to match a string with 3 words separated by whitespace, we can use the regular expression `r"\w+\s+\w+\s+\w+"`. For example:

In [36]:
occursin(r"\w+\s+\w+\s+\w+", "Three big pigs")

true

This only tells us whether the string matches. Let's say, however, that we want to extract the three words. We can do that by first surrounding each `\w+` with `()`, which creates a group.

In [37]:
occursin(r"(\w+)\s+(\w+)\s+(\w+)", "Three big pigs")

true

And now we will use `match` instead of `occursin`:

In [39]:
m = match(r"(\w+)\s+(\w+)\s+(\w+)", "Three big pigs")

RegexMatch("Three big pigs", 1="Three", 2="big", 3="pigs")

This returns a `RegexMatch` object, which holds the entire matched string and the three groups. We can then get the groups with `m[1]`, `m[2]` and `m[3]`:

In [40]:
m[1], m[2], m[3]

("Three", "big", "pigs")
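
Besides integer indexing, a `RegexMatch` exposes the fields `match` and `captures`, and groups can be given names with `(?<name>...)`. The sketch below reuses the three-word example; the group names `first`, `second` and `third` are made up for illustration.

```julia
m = match(r"(?<first>\w+)\s+(?<second>\w+)\s+(?<third>\w+)", "Three big pigs")

whole  = m.match      # the entire matched substring
groups = m.captures   # all captured groups, in order
second = m[:second]   # a named group, looked up by name

# `match` returns `nothing` when the pattern does not match,
# so check the result before indexing into it.
no_match = match(r"(\w+)\s+(\w+)\s+(\w+)", "Two pigs")
```

Since `no_match` is `nothing`, an expression such as `no_match[1]` would raise an error, which is why the tests later in this notebook compare against `nothing`.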

### Exercise

Let's say there are sports scores like `78-75` or `5-3`, where the first number is the home team's score and the second is the visiting team's. Extract the scores. Test with a few options.

In [4]:
map(s -> occursin(r"^\d+-\d+$", s), ["78-75","5-3", "123-97"])

3-element Vector{Bool}:
 1
 1
 1

In [5]:
match(r"^(\d+)-(\d+)$", "78-75")

RegexMatch("78-75", 1="78", 2="75")
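
The captured groups are still strings. One possible way to finish the exercise is to convert them with `parse`; `parse_score` below is a hypothetical helper name, not something defined elsewhere in this notebook.

```julia
# Hypothetical helper: returns a named tuple of Ints, or `nothing` for bad input.
function parse_score(s::AbstractString)
  m = match(r"^(\d+)-(\d+)$", s)
  m === nothing && return nothing   # not a valid score string
  (home = parse(Int, m[1]), visitor = parse(Int, m[2]))
end

scores = map(parse_score, ["78-75", "5-3", "123-97", "7 - 3"])
```

The last string contains spaces around the hyphen, so it does not match and yields `nothing`.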

### Decimals and Integers

In [28]:
int_re = r"[+-]?\d+"

r"[+-]?\d+"

In [29]:
map(s-> occursin(int_re, s),["1234", "+1234", "-1234"])

3-element Vector{Bool}:
 1
 1
 1

In [20]:
dec_re = r"^[+-]?\d+\.\d*$"

r"^[+-]?\d+\.\d*$"

In [21]:
map(s -> occursin(dec_re, s),["-1.3", "-1.", "14.0343", "14", "-15"])

5-element Vector{Bool}:
 1
 1
 1
 0
 0

In [22]:
int_or_dec_re = r"^[+-]?\d+(\.\d*)?$"

r"^[+-]?\d+(\.\d*)?$"

In [23]:
map(s -> occursin(int_or_dec_re, s),["-1.3", "-1.", "14.0343", "14", "-15"])

5-element Vector{Bool}:
 1
 1
 1
 1
 1
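
Because the fractional part sits in an optional group, the same pattern can also tell us which kind of number was matched. A minimal sketch, where `to_number` is a hypothetical helper introduced only for this example:

```julia
# Hypothetical helper: parse a string as Int or Float64 depending on the match.
function to_number(s::AbstractString)
  m = match(r"^[+-]?\d+(\.\d*)?$", s)
  m === nothing && return nothing
  # Group 1 is the optional fractional part; it is `nothing` for integers.
  m[1] === nothing ? parse(Int, s) : parse(Float64, s)
end

values = map(to_number, ["14", "14.0343", "-15", "abc"])
```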

In [24]:
pt_re = r"\(([+-]?\d+),([+-]?\d+)\)"

r"\(([+-]?\d+),([+-]?\d+)\)"

In [25]:
occursin(pt_re,"(14,-17)")

true

In [36]:
m = match(pt_re, "(14,-17)")

RegexMatch("(14,-17)", 1="14", 2="-17")

In [37]:
m[1]

"14"

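The groups are substrings, so a point can be turned into numbers by parsing each one. A short sketch using the same `pt_re` pattern (redefined here so the block stands alone):

```julia
pt_re = r"\(([+-]?\d+),([+-]?\d+)\)"
m = match(pt_re, "(14,-17)")

# Convert each captured substring to an Int; `pt` is the tuple (x, y).
pt = (parse(Int, m[1]), parse(Int, m[2]))
```
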
In [38]:
replace("Alice like cookies.  Also, Alice doesn't like carrots.", "Alice" => "Ben")

"Ben like cookies.  Also, Ben doesn't like carrots."

In [39]:
replace("Is there a doctor in the house?  There is.", r"[Ii]s" => "are")

"are there a doctor in the house?  There are."

In [40]:
replace("Are the kids still in the pond?  Are the adults sitting on the beach?", "Are" => "Is", r"\s(\w+)s\s" => s" \1 ")

"Is the kid still in the pond?  Is the adult sitting on the beach?"

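In the last call, `s" \1 "` is a substitution string in which `\1` refers to the first captured group. As a further sketch (the sample strings are invented for illustration), groups can be reordered, and a function can be used as the replacement instead of a string:

```julia
# Swap "Last, First" into "First Last" using the captured groups \1 and \2.
swapped = replace("Doe, Jane", r"(\w+), (\w+)" => s"\2 \1")

# A function replacement is applied to each matched substring;
# here the first letter of every word is uppercased.
titled = replace("the cat sat", r"\b\w" => uppercase)
```
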
In [41]:
lin = r"([+-]?\d+)x([+-]?\d+)"

r"([+-]?\d+)x([+-]?\d+)"

In [42]:
match(lin, "5x-10")

RegexMatch("5x-10", 1="5", 2="-10")

In [43]:
@testset "Linear Functions" begin
  @test match(lin, "5x-10") !== nothing
  @test match(lin, "-5x-10") !== nothing
  @test match(lin, "5x+10") !== nothing
  @test match(lin, "-5x+10") !== nothing
  @test match(lin, "5x") !== nothing
  @test match(lin, "5*x-10") !== nothing
end

Linear Functions: Test Failed at /Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X56sZmlsZQ==.jl:6
  Expression: match(lin, "5x") !== nothing
   Evaluated: nothing !== nothing

Stacktrace:
 [1] macro expansion
   @ ~/.julia/juliaup/julia-1.11.0-rc2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:679 [inlined]
 [2] macro expansion
   @ ~/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X56sZmlsZQ==.jl:6 [inlined]
 [3] macro expansion
   @ ~/.julia/juliaup/julia-1.11.0-rc2+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Test/src/Test.jl:1700 [inlined]
 [4] top-level scope
   @ ~/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a

TestSetException: Some tests did not pass: 4 passed, 2 failed, 0 errored, 0 broken.

In [44]:
lin2 = r"([-+]?\d+)\*?x([-+]?\d+)?"

r"([-+]?\d+)\*?x([-+]?\d+)?"

In [45]:
@testset "Linear Functions" begin
  @test match(lin2, "5x-10") !== nothing
  @test match(lin2, "-5x-10") !== nothing
  @test match(lin2, "5x+10") !== nothing
  @test match(lin2, "-5x+10") !== nothing
  @test match(lin2, "5x") !== nothing
  @test match(lin2, "5*x-10") !== nothing
end

Test Summary:    | Pass  Total  Time
Linear Functions |    6      6  0.0s


Test.DefaultTestSet("Linear Functions", Any[], 6, false, false, true, 1.722298900105663e9, 1.722298900105727e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X61sZmlsZQ==.jl")
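
The original `lin` pattern requires digits after `x` and has no provision for `*`, which is why `"5x"` and `"5*x-10"` failed. In `lin2` the constant term is an optional group, so it comes back as `nothing` when it is absent. A sketch of handling that case (treating a missing constant as 0 is an assumption made for this example):

```julia
lin2 = r"([-+]?\d+)\*?x([-+]?\d+)?"
m = match(lin2, "5x")

slope = parse(Int, m[1])
# The optional second group did not take part in the match, so it is `nothing`.
intercept = m[2] === nothing ? 0 : parse(Int, m[2])
```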

In [46]:
dec = r"^[-+]?\d+(\.\d*)?$"

r"^[-+]?\d+(\.\d*)?$"

In [47]:
@testset "Decimals" begin
   @test match(dec, "10") !== nothing
   @test match(dec, "10.") !== nothing
   @test match(dec, "10.123") !== nothing
   @test match(dec, "-10") !== nothing
   @test match(dec, "-10.") !== nothing
   @test match(dec, "-10.123") !== nothing
end

Test Summary: | Pass  Total  Time
Decimals      |    6      6  0.0s


Test.DefaultTestSet("Decimals", Any[], 6, false, false, true, 1.722298900122997e9, 1.722298900123055e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X63sZmlsZQ==.jl")

In [48]:
lin_dec = r"([-+]?\d+(\.\d*)?)\*?x([-+]?\d+(\.\d*)?)?"

r"([-+]?\d+(\.\d*)?)\*?x([-+]?\d+(\.\d*)?)?"

In [49]:
@testset "Linear Functions with integer coefficients" begin
  @test match(lin_dec, "5x-10") !== nothing
  @test match(lin_dec, "-5x-10") !== nothing
  @test match(lin_dec, "5x+10") !== nothing
  @test match(lin_dec, "-5x+10") !== nothing
  @test match(lin_dec, "5x") !== nothing
  @test match(lin_dec, "5*x-10") !== nothing
end

Test Summary:                              | Pass  Total  Time
Linear Functions with integer coefficients |    6      6  0.0s


Test.DefaultTestSet("Linear Functions with integer coefficients", Any[], 6, false, false, true, 1.722298900139907e9, 1.722298900139966e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X65sZmlsZQ==.jl")

In [50]:
@testset "Linear Functions with decimal coefficients" begin
  @test match(lin_dec, "5.0x-10.5") !== nothing
  @test match(lin_dec, "-5.9x-10.2") !== nothing
  @test match(lin_dec, "5.3x+10.4") !== nothing
  @test match(lin_dec, "-5.25x+10.8") !== nothing
  @test match(lin_dec, "5.x") !== nothing
  @test match(lin_dec, "5.3*x-10.55") !== nothing
end

Test Summary:                              | Pass  Total  Time
Linear Functions with decimal coefficients |    6      6  0.0s


Test.DefaultTestSet("Linear Functions with decimal coefficients", Any[], 6, false, false, true, 1.722298900152981e9, 1.722298900153047e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X66sZmlsZQ==.jl")

In [51]:
eachsplit("18x^2-17x-10",r"[+-]")

Base.SplitIterator{String, Regex}("18x^2-17x-10", r"[+-]", 0, true)
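
`eachsplit` returns a lazy iterator, so the pieces only appear once it is collected. Collecting it also shows a limitation for polynomial parsing: the `+`/`-` separators are dropped, so the signs of the terms are lost.

```julia
# Materialize the lazy iterator into a vector of substrings.
parts = collect(eachsplit("18x^2-17x-10", r"[+-]"))
```

The result is `["18x^2", "17x", "10"]`, with no trace of the two minus signs, which is the motivation for the custom `splitPoly` function further down.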

In [52]:
match(r"(([+-]?\d+)\*?x(\^\d)?)", "18x^2-17x-10")

RegexMatch("18x^2", 1="18x^2", 2="18", 3="^2")

In [53]:
match(r"(.)?18x\^2", "18x^2-17x-10")

RegexMatch("18x^2", 1=nothing)

In [54]:
findnext(r"[+-]", "-18x^2-17x-10",1)

1:1

In [55]:
function splitPoly(p::String)
  terms = String[]
  # If the first character is a +/-, start searching at index 2 so that
  # the leading sign is not treated as a separator.
  ind1 = occursin(r"^[+-]", p) ? 2 : 1

  while true
    ind2 = findnext(r"[+-]", p, ind1)
    # Each term starts at its sign, one character before ind1
    # (or at index 1 for an unsigned first term).
    start = max(ind1 - 1, 1)
    if ind2 === nothing
      # No more signs: push the last term onto the list.
      push!(terms, p[start:end])
      break
    end
    push!(terms, p[start:first(ind2)-1])
    ind1 = first(ind2) + 1
  end
  terms
end

splitPoly (generic function with 1 method)

In [56]:
splitPoly("4x^3-2x+6")

3-element Vector{String}:
 "4x^3"
 "-2x"
 "+6"

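An alternative to the index bookkeeping above is to let a regular expression find each signed term directly with `eachmatch`. This is only a sketch; `split_terms` is a hypothetical name, and it assumes the polynomial contains no spaces and no negative exponents.

```julia
# Hypothetical alternative to splitPoly: each match is an optional sign
# followed by a run of characters that are not + or -.
split_terms(p::AbstractString) =
  [String(m.match) for m in eachmatch(r"[+-]?[^+-]+", p)]

terms = split_terms("4x^3-2x+6")
```
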
In [57]:
poly_re = r"^([+-]?\d+)(x(\^(\d+))?)?$"

r"^([+-]?\d+)(x(\^(\d+))?)?$"

In [58]:
match(poly_re, "-x^2")

In [59]:
poly_re = r"^([+-]?)(\d+)?(x(\^(\d+))?)?$"
# poly_re = r"([+-]?\d+)x\^\d+"

r"^([+-]?)(\d+)?(x(\^(\d+))?)?$"

In [60]:
match(poly_re, "-x^2")

RegexMatch("-x^2", 1="-", 2=nothing, 3="x^2", 4="^2", 5="2")

In [61]:
map(s -> match(poly_re, s), splitPoly("4x^3-2x+6"))

3-element Vector{RegexMatch{String}}:
 RegexMatch("4x^3", 1="", 2="4", 3="x^3", 4="^3", 5="3")
 RegexMatch("-2x", 1="-", 2="2", 3="x", 4=nothing, 5=nothing)
 RegexMatch("+6", 1="+", 2="6", 3=nothing, 4=nothing, 5=nothing)

In [62]:
match(poly_re, "-17x")

RegexMatch("-17x", 1="-", 2="17", 3="x", 4=nothing, 5=nothing)

In [63]:
match(poly_re, "+10")

RegexMatch("+10", 1="+", 2="10", 3=nothing, 4=nothing, 5=nothing)

In [64]:
using Revise
includet("Polynomial.jl")
using .Poly

In [65]:
p = Polynomial([1,2,3])

  1 x^0 + 2 x^1 + 3 x^2

In [66]:
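function polyTerm(str::String)
  poly_re = r"^([+-]?)(\d+)?(x(\^(\d+))?)?$"
  m = match(poly_re, str)
  m === nothing && throw(ArgumentError("not a valid polynomial term: $str"))
  # Sign (group 1) followed by the digits (group 2); a missing number means a coefficient of 1.
  c = "$(m[1])$(m[2] === nothing ? 1 : m[2])"
  # No x (group 3) means power 0; x without an exponent (group 5) means power 1.
  pow = m[3] === nothing ? 0 : m[5] === nothing ? 1 : parse(Int, m[5])
  (coeff = parse(Int, c), pow = pow)
end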

polyTerm (generic function with 1 method)

In [67]:
@testset begin
  @test polyTerm("10x^3") == (coeff = 10, pow = 3)
  @test polyTerm("-10x^3") == (coeff = -10, pow = 3)
  @test polyTerm("+x^3") == (coeff = 1, pow = 3)
  @test polyTerm("-x^3") == (coeff = -1, pow = 3)
  @test polyTerm("5x") == (coeff = 5, pow = 1)
  @test polyTerm("-10") == (coeff = -10, pow = 0)
  @test polyTerm("5") == (coeff = 5, pow = 0)
  @test polyTerm("+5") == (coeff = 5, pow = 0)
  @test polyTerm("-x") == (coeff = -1, pow = 1)
  @test polyTerm("x") == (coeff = 1, pow = 1)
end

Test Summary: | Pass  Total  Time
test set      |   10     10  0.1s


Test.DefaultTestSet("test set", Any[], 10, false, false, true, 1.722298904473303e9, 1.722298904554002e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_Y122sZmlsZQ==.jl")

In [68]:
Polynomial("4x^3-2x+6")

  6 x^0 + -2 x^1 + 0 x^2 + 4 x^3
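
The source of the `Poly` module is not shown in this notebook. As a rough sketch of how a string could be turned into the coefficient vector displayed above, the function below combines term splitting with the term pattern. The name `poly_coeffs` and its details are assumptions made for illustration, not the module's actual code.

```julia
# Hypothetical sketch, not the Poly module's implementation.
function poly_coeffs(p::AbstractString)
  term_re = r"^([+-]?)(\d+)?(x(\^(\d+))?)?$"
  terms = Tuple{Int,Int}[]   # (coefficient, power) pairs
  for t in eachmatch(r"[+-]?[^+-]+", p)
    m = match(term_re, t.match)
    m === nothing && throw(ArgumentError("cannot parse term: $(t.match)"))
    # A missing number means a coefficient of 1 (with the sign attached).
    coeff = parse(Int, string(m[1], m[2] === nothing ? "1" : m[2]))
    # No x means power 0; x without an exponent means power 1.
    pow = m[3] === nothing ? 0 : (m[5] === nothing ? 1 : parse(Int, m[5]))
    push!(terms, (coeff, pow))
  end
  coeffs = zeros(Int, maximum(last, terms) + 1)
  for (c, n) in terms
    coeffs[n + 1] += c       # index n + 1 holds the coefficient of x^n
  end
  coeffs
end

coeffs = poly_coeffs("4x^3-2x+6")
```

For this input `coeffs` is `[6, -2, 0, 4]`, which lines up with the `6 x^0 + -2 x^1 + 0 x^2 + 4 x^3` shown above.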

In [88]:
include("test-polynomial.jl")

Test Summary:                                     | Pass  Total  Time
Creating a Polynomial as a Vector of Coefficients |   12     12  0.0s
"here" = "here"
"here" = "here"
"here" = "here"
Test Summary:                                                 | Pass  Total  Time
Creating a Polynomial from a string with integer coefficients |    9      9  0.0s


Test.DefaultTestSet("Creating a Polynomial from a string with integer coefficients", Any[], 9, false, false, true, 1.722299620867498e9, 1.722299620874879e9, false, "/Users/pstaab/code/sci-comp-book/julia-output/test-polynomial.jl")

In [78]:
Polynomial([1,1.5,2//3])

  1.0 x^0 + 1.5 x^1 + 0.6666666666666666 x^2
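
The coefficients print as floating-point numbers because Julia promotes the mixed array literal to a common element type before `Polynomial` ever sees it (assuming the constructor stores the vector as given, which is not shown here). A quick check:

```julia
# An Int, a Float64 and a Rational{Int} promote to Float64.
coeffs = [1, 1.5, 2//3]
T = eltype(coeffs)
```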