## Regex 

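##### Some Regex modifiers (flags) that come right after the closing quote of the regex literal...
- `r"stuff"i` makes the regex case-insensitive
- `r"stuff"m` treats the string as multiple lines, so `^` and `$` match at the start and end of each line
- `r"stuff"s` treats the string as a single line, so `.` also matches a newline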
- `r"stuff"x` tells the regex parser to ignore most whitespace that is neither backslashed nor within a character class. This can be used to break up regular expressions into slightly more readable parts. `#` is also treated as a metacharacter introducing a comment...
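
The flags above are easier to trust after seeing them flip a result. A minimal sketch; the sample strings are made up for illustration:

```julia
# i: ignore case
@show occursin(r"banana"i, "BANANA")

# m: ^ and $ match at the start and end of every line, not just the whole string
text = "banana\nkiwi\nmango"
@show match(r"^kiwi$", text)      # no match without the flag
@show match(r"^kiwi$"m, text)

# s: . also matches a newline
@show match(r"a.b", "a\nb")       # no match without the flag
@show match(r"a.b"s, "a\nb")

# x: whitespace inside the pattern is ignored, so it can be spaced out
@show match(r"(\d{4}) - (\d{2})"x, "1970-01")
```

Note that `x` changes how the pattern is read, not how the subject string is searched.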

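Once we have a match object from `m = match(r"nan"i, "BANANA")`, we can access:
- `m.match` contains the entire substring that matched
- `m.captures` contains the captured substrings as a vector (empty when the pattern has no capture groups)
- `m.offset` is the index at which the whole match begins
- `m.offsets` contains the starting index of each captured substring as a vector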
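
A short sketch of those fields, first on the match above and then on a pattern with two capture groups. The second input string is a made-up example:

```julia
m = match(r"nan"i, "BANANA")
@show m.match      # the matched substring
@show m.offset     # index where the match starts
@show m.captures   # empty here: the pattern has no capture groups

# A pattern with two capture groups
m2 = match(r"(\d+)-(\d+)", "range 10-20 end")
@show m2.match
@show m2.captures
@show m2.offset
@show m2.offsets
@show m2[1]        # captures can also be indexed directly on the match
```

Keep in mind that `match` returns `nothing` when there is no match, so check for that before touching the fields.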

See https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions for more

In [1]:
# r"string" actually means regex string, not raw string
typeof(r"foo")

Regex
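
Since `r"..."` builds a `Regex`, raw strings use the `raw"..."` prefix instead. A small sketch of how a raw string differs from a regular one (the variable names are just for illustration):

```julia
plain  = "a\tb"      # \t is an escape sequence: a single tab character
rawstr = raw"a\tb"   # backslash and t stay as two separate characters
@show length(plain)
@show length(rawstr)

name = "Julia"
@show "Hello, $name"      # $name is interpolated
@show raw"Hello, $name"   # $name is left as-is
```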

In [2]:
# Other literal options
println(typeof(v"1.2.3"))
println(typeof(b"byte array literals"))
# println(typeof(pkg"package")) fails here: the pkg"..." string macro comes from the Pkg standard library, so it is not defined until `using Pkg` is run

VersionNumber
Base.CodeUnits{UInt8, String}


In [3]:
reg = r"[0-9]+"  # find one or more numerals
match(reg, "It was 1970.. it was UNIX Epoch Time!")

RegexMatch("1970")

In [4]:
match(r"it was"i, "It was 1970")

RegexMatch("It was")

In [5]:
occursin(r"hello"i, "HELLO, there!")

true

In [6]:
occursin(r"kiwi", "banana")

false

In [7]:
match(r"nana", "banana")

RegexMatch("nana")

In [8]:
# match returns only the first match: here, the run of letters at the start of the string
match(r"[a-zA-Z]*", "banana rama 123 kiwi")

RegexMatch("banana")

In [9]:
# To get every match of 1 or more letters, iterate with eachmatch and collect the results
collect(eachmatch(r"[a-zA-Z]+", "a 23ft tall kiwi fruit"))

5-element Vector{RegexMatch}:
 RegexMatch("a")
 RegexMatch("ft")
 RegexMatch("tall")
 RegexMatch("kiwi")
 RegexMatch("fruit")

In [10]:
match(r"nana"x, "a \t banana nana")

RegexMatch("nana")

In [11]:
collect(eachmatch(r"nana", "a \t banana nana23"))

2-element Vector{RegexMatch}:
 RegexMatch("nana")
 RegexMatch("nana")

In [12]:
collect(eachmatch(r"[0-9]+", "abc123def456ghi9jkl10"))

4-element Vector{RegexMatch}:
 RegexMatch("123")
 RegexMatch("456")
 RegexMatch("9")
 RegexMatch("10")

In [13]:
for m in eachmatch(r"[0-9]+", "abc123def456ghi9jkl10")
    println(m)
end

RegexMatch("123")
RegexMatch("456")
RegexMatch("9")
RegexMatch("10")


In [14]:
raw"This is a raw string"

"This is a raw string"

## Numbers
- Number types: `Complex` and `Real`
- Real number subtypes: `AbstractFloat`, `AbstractIrrational`, `Integer`, and `Rational`
- Integer subtypes: `Bool`, `Signed`, and `Unsigned` (`BigInt` is a subtype of `Signed`)

See https://docs.julialang.org/en/v1/manual/mathematical-operations/

In [15]:
typeof(42)

Int64

In [16]:
@show Int

Int = Int64


Int64

In [17]:
typemax(Int)

9223372036854775807

In [18]:
typemax(Int) + typemax(Int)

-2
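
That `-2` is integer overflow: fixed-width `Int` arithmetic wraps around silently. A sketch of a few ways to deal with it when it matters, assuming a 64-bit machine where `Int` is `Int64`:

```julia
x = typemax(Int)

# Default arithmetic wraps around on overflow
@show x + 1 == typemin(Int)

# Base.checked_add throws an OverflowError instead of wrapping
overflowed = try
    Base.checked_add(x, 1)
    false
catch err
    err isa OverflowError
end
@show overflowed

# BigInt has arbitrary precision, so the sum is exact
@show big(x) + big(x)

# widen moves to the next larger type (Int64 -> Int128 on a 64-bit machine)
@show typeof(widen(x))
```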

In [19]:
typemin(Int)

-9223372036854775808

In [20]:
# Rational numbers
2//3

2//3
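
Rationals stay exact where floats round. A quick sketch of the arithmetic:

```julia
@show 1//3 + 1//6                          # exact result: 1//2
@show 4//6                                 # stored in lowest terms: 2//3
@show numerator(2//3), denominator(2//3)
@show float(2//3)                          # convert to floating point when needed

# Floating point rounding vs. exact rational arithmetic
@show 0.1 + 0.2 == 0.3
@show 1//10 + 2//10 == 3//10
```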

In [21]:
# Some LaTeX symbols evaluate to their appropriate constant value or anticipated function
π

π = 3.1415926535897...

In [22]:
# Vectorized dot operations
[1, 2, 3] .^ 2

3-element Vector{Int64}:
 1
 4
 9

In [23]:
# we can vectorize any function, too
f(x) = x ^ 2
f.([1, 2, 3]) # vectorize any function with function_name.(vector)

3-element Vector{Int64}:
 1
 4
 9
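
The dot syntax also works with several arguments, and the `@.` macro adds the dots for you. A small sketch; `g` is a hypothetical function made up for this example:

```julia
g(x, y) = x + 10y

# Elementwise over two vectors of the same length
@show g.([1, 2, 3], [1, 2, 3])

# A scalar argument is broadcast against every element
@show g.([1, 2, 3], 1)

# @. turns every call and operator in the expression into its dotted form
h = @. sqrt([1, 4, 9]) + 1
@show h
```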

In [24]:
# Tuples
(1, 2)

(1, 2)

In [25]:
+(1, 2)

3

In [26]:
# Vectorized dot syntax works on tuples, yay!
(1, 2) .+ (3, 4)

(4, 6)

In [27]:
# Named tuples are a thing, yo! and we access keys w/ dot syntax
skills = (logic=["Julia", "Python"], database="MySQL")
skills[1] == skills.logic

true
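
A few more things named tuples can do, sketched with the same `skills` tuple. The `cloud="AWS"` entry is a made-up value for illustration:

```julia
skills = (logic=["Julia", "Python"], database="MySQL")

@show keys(skills)
@show values(skills)

# Destructuring works by position, like a regular tuple
logic, database = skills
@show database

# Named tuples are immutable, so "adding" a field means building a new one
updated = merge(skills, (cloud="AWS",))
@show updated.cloud
@show haskey(skills, :cloud)   # the original is unchanged
```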

In [28]:
skills.database

"MySQL"

### Ranges


In [29]:
r = 1:20
typeof(r)

UnitRange{Int64}

In [30]:
r[10:end] .+ 2

12:22

In [31]:
# char:char ranges
x = 'a':'z'

'a':1:'z'

In [32]:
typeof('r'), typeof("r")

(Char, String)

In [33]:
x[end]

'z': ASCII/Unicode U+007A (category Ll: Letter, lowercase)

In [34]:
# Expanding a range with the ... "splat" operator
(1:5...,)

(1, 2, 3, 4, 5)

In [35]:
# not quite what I expected, but reasonable
tuple(1:4)

(1:4,)

In [36]:
# We can also "splat" into an array
[1:5...]

5-element Vector{Int64}:
 1
 2
 3
 4
 5

In [37]:
# Ranges also have start:step:stop syntax
[0:5:20...]

5-element Vector{Int64}:
  0
  5
 10
 15
 20

In [38]:
[20:-3:1...]

7-element Vector{Int64}:
 20
 17
 14
 11
  8
  5
  2
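
Splatting is one way to realize a range; `collect` does the same job, and ranges can be queried without realizing them at all. A minimal sketch:

```julia
r = 0:5:20

# A range knows its own shape without storing any elements
@show length(r), first(r), step(r), last(r)

# collect builds the same vector as splatting into brackets
@show collect(r) == [r...]

# Membership tests work directly on the range
@show 7 in r, 10 in r

# range() builds a range from a length instead of a step
@show collect(range(0, stop=1, length=5))
```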

In [39]:
typeof([1, 2])

Vector{Int64} (alias for Array{Int64, 1})

## Arrays
`Array{Type, Dimensions}`

Functions:
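- `zeros`, `ones`: arrays filled with 0 or 1 (`Float64` by default)
- `trues`, `falses`: `BitArray`s filled with `true` or `false`
- `similar`: an uninitialized array with the same element type and shape as an existing one
- `rand`: an array of random values
- `fill`: an array filled with a given value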
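
A quick sketch of the constructors from the list that don't get their own cell in these notes. The `template` array is made up for illustration:

```julia
@show trues(2, 3)
@show falses(3)

template = Float32[1, 2, 3]

# similar keeps the element type and shape, but the contents are uninitialized
buf = similar(template)
@show typeof(buf), size(buf)

# The element type and dimensions can be overridden
@show size(similar(template, Int, 2, 2))

# fill works with any value, not just numbers
@show fill("na", 2)
```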

In [40]:
[1, 2, 3]

3-element Vector{Int64}:
 1
 2
 3

In [41]:
Float32[1, 2, 3]

3-element Vector{Float32}:
 1.0
 2.0
 3.0

In [42]:
# Lose the commas and we have 2d Array syntax
[1 2 3]

1×3 Matrix{Int64}:
 1  2  3

In [43]:
[1 2 3; 4 5 6; 7 8 9 ]

3×3 Matrix{Int64}:
 1  2  3
 4  5  6
 7  8  9

In [44]:
zeros(5)

5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [45]:
ones(5, 2)

5×2 Matrix{Float64}:
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  1.0

In [46]:
fill(23, 2, 3)

2×3 Matrix{Int64}:
 23  23  23
 23  23  23

In [47]:
rand(Char, 2, 2)

2×2 Matrix{Char}:
 '\U4c7df'  '\U5fc8c'
 '\U76430'  '\Ud71ba'

In [48]:
rand(Bool, 5)

5-element Vector{Bool}:
 1
 0
 0
 0
 1

In [49]:
# Colon syntax
# [start_row:end_row, start_column:end_column], with both ends inclusive
a = rand(5, 5)
a[1:1, 2:4]

1×3 Matrix{Float64}:
 0.764912  0.719547  0.119906

In [50]:
a[:, 4:5] # solo colon means all

5×2 Matrix{Float64}:
 0.119906  0.686459
 0.290215  0.768265
 0.938135  0.451664
 0.928718  0.833005
 0.166658  0.367595

In [51]:
x = rand(5)
mask = rand(Bool, 5)
println(x); println(mask)
x[mask]

[0.39636625390712554, 0.13014395037173876, 0.9349062746921843, 0.7449852303364874, 0.26794636497315105]
Bool[0, 0, 1, 1, 1]


3-element Vector{Float64}:
 0.9349062746921843
 0.7449852303364874
 0.26794636497315105
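
In practice the mask usually comes from a broadcast comparison rather than `rand`. A sketch with fixed values so the result is reproducible:

```julia
x = [0.4, 0.1, 0.9, 0.7, 0.3]

# A dotted comparison produces a boolean mask
mask = x .> 0.5
@show mask

# Index with the mask, or inline the comparison
@show x[mask]
@show x[x .> 0.5]

# filter gives the same values; findall gives the matching indices
@show filter(>(0.5), x)
@show findall(mask)
```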

In [52]:
# We can also index into an array to perform reassignment
x = [1 2 3; 4 5 6; 7 8 9]

x[1, 1] = 99
x

3×3 Matrix{Int64}:
 99  2  3
  4  5  6
  7  8  9

In [53]:
x[:, end:end] .= 23
x

3×3 Matrix{Int64}:
 99  2  23
  4  5  23
  7  8  23

In [54]:
# We can use boolean arrays as selectors, too
x[[false, true, false], :] = ones(3, 1)
x

3×3 Matrix{Int64}:
 99  2  23
  1  1   1
  7  8  23

In [55]:
x

3×3 Matrix{Int64}:
 99  2  23
  1  1   1
  7  8  23

In [56]:
# Iteration
for beatle in ["John", "Paul", "George", "Ringo"]
    println("Hello, $beatle")
end

Hello, John
Hello, Paul
Hello, George
Hello, Ringo


In [57]:
# Iteration w/ the index
beatles = ["John", "Paul", "George", "Ringo"]
for i in eachindex(beatles)
    println("$i. $(beatles[i])")
end

1. John
2. Paul
3. George
4. Ringo
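
When both the index and the value are needed, `enumerate` saves the lookup. A minimal sketch of the same loop:

```julia
beatles = ["John", "Paul", "George", "Ringo"]

# enumerate yields (counter, value) pairs, with the counter starting at 1
for (i, beatle) in enumerate(beatles)
    println("$i. $beatle")
end
```

One design note: `enumerate` always counts from 1, while `eachindex` returns the array's actual indices, so `eachindex` is the safer choice when the number is used to index back into the array.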


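#### Mutating Arrays
- Arrays are PASS BY REFERENCE, yo! (The Julia manual calls it "pass-by-sharing".) Assigning an array to a new name or passing it to a function shares the same data instead of copying it
- A trailing `!` in a function name is a convention signaling that the function mutates its argument
- For example, `push!(arr, 4)` appends `4` to `arr` in place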

In [58]:
# Mutating arrays
arr = [1, 2, 3]
push!(arr, 4)

4-element Vector{Int64}:
 1
 2
 3
 4

In [59]:
pop!(arr)

4

In [60]:
arr

3-element Vector{Int64}:
 1
 2
 3

In [61]:
# Pass by reference
a = [1, 2, 3]
b = a
pop!(b)
a

2-element Vector{Int64}:
 1
 2
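
The same sharing applies when an array is passed to a function. A sketch with two hypothetical helpers, `double!` and `doubled`, named to follow the `!` convention:

```julia
# Mutates its argument in place, so the caller's array changes
function double!(v)
    v .*= 2
    return v
end

# Allocates a new array; the caller's array is untouched
function doubled(v)
    return v .* 2
end

a = [1, 2, 3]
b = doubled(a)
@show a, b

double!(a)
@show a

# Equal contents, but still two distinct arrays
@show a == b, a === b
```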

In [62]:
# deleteat! removes the element at the given index
deleteat!(arr, 1)
arr

2-element Vector{Int64}:
 2
 3

In [63]:
# Creating copies: copy gives b its own storage, so mutating a leaves b unchanged
a = [1, 2, 3]
b = copy(a)
push!(a, 4)
b

3-element Vector{Int64}:
 1
 2
 3
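
`copy` is shallow: it copies the outer array but still shares any arrays stored inside it. A minimal sketch of the difference from `deepcopy`, using a made-up nested array:

```julia
nested  = [[1, 2], [3, 4]]
shallow = copy(nested)
deep    = deepcopy(nested)

# Mutate an inner array through the original
nested[1][1] = 99
@show shallow[1]   # shares the inner array, so it sees the change
@show deep[1]      # fully independent

# The outer arrays are still separate
push!(nested, [5, 6])
@show length(nested), length(shallow)
```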

## Comprehend some Comprehension Syntax!

In [64]:
[x + 1 for x = 1:5]

5-element Vector{Int64}:
 2
 3
 4
 5
 6

In [65]:
[x for x = 1:5 if x % 2 == 0]

2-element Vector{Int64}:
 2
 4
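
Comprehensions also handle more than one loop variable. A sketch of the two forms, which look similar but produce different shapes:

```julia
# Comma-separated ranges build a multidimensional array
table = [i * j for i in 1:3, j in 1:3]
@show size(table)
@show table[2, 3]

# Chained `for`s build a flat vector; the rightmost loop varies fastest
pairs_flat = [(i, j) for i in 1:2 for j in 1:2]
@show pairs_flat

# A type prefix fixes the element type of the result
typed = Float64[x^2 for x in 1:3]
@show typed
```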

## Generators!
- The superpower of comprehensions is activated when they're used to create generators
- Generators yield values on demand, rather than allocating memory for an array and storing every value in advance
- Create a generator with `()` instead of `[]`
- For example: `(x+1 for x = 1:10)`
- Generators allow us to work on potentially infinite collections!
- Creating a generator is "practically constant time", because no values are computed until it is iterated
- Square brackets realize the values in memory, parens create a generator
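
A sketch of the points above: a generator passed straight to a reducing function, and one built over an infinite source using `Iterators.countfrom` from Base:

```julia
# No intermediate array is allocated; inside a call the extra parens can be dropped
@show sum(x^2 for x in 1:10)

# An infinite generator: squares of 1, 2, 3, ...
squares = (x^2 for x in Iterators.countfrom(1))

# Only the values that are actually requested get computed
@show collect(Iterators.take(squares, 5))
@show first(Iterators.filter(>(1000), squares))
```

Calling `collect(squares)` directly would never finish, so infinite generators always need something like `take`, `first`, or a `break` to bound them.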

In [66]:
@time [x+1 for x = 1:10]

  0.014355 seconds (42.54 k allocations: 2.505 MiB, 108.32% compilation time)


10-element Vector{Int64}:
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11

In [67]:
@time (x+1 for x = 1:10)

  0.003779 seconds (2.32 k allocations: 162.373 KiB, 96.09% compilation time)


Base.Generator{UnitRange{Int64}, var"#9#10"}(var"#9#10"(), 1:10)

In [68]:
@time for i in [x^3 for x=1:1_000_000_000]
    i >= 1_000 && break
    println(i)
end

1
8
27
64
125
216
343
512
729
  3.308428 seconds (59.84 k allocations: 7.454 GiB, 0.27% gc time, 1.17% compilation time)


In [69]:
# Compare the ~7.5 GiB of allocations and ~3.3 seconds above to the following:
@time for i in (x^3 for x=1:1_000_000_000)
    i >= 1000 && break
end

  0.010957 seconds (13.93 k allocations: 910.314 KiB, 98.11% compilation time)


In [74]:
# In a function signature, ... collects any number of arguments into a tuple, like *args in Python
"""Julia Docstrings go before the function definition"""
function add(a...)

    total = 0
    for n in a
        total += n
    end
    return total
end

add(1, 2, 3)

6
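
The `...` syntax works in both directions: in a signature it collects arguments, and at a call site it spreads a collection out into separate arguments. A sketch using hypothetical functions `total` and `describe` as stand-ins:

```julia
# Collect: xs is a tuple of every argument passed
total(xs...) = isempty(xs) ? 0 : sum(xs)

# Spread: the same as calling total(1, 2, 3)
nums = [1, 2, 3]
@show total(nums...)
@show total()

# Required positional arguments can come before the collected ones
describe(head, tail...) = "head=$head, tail=$tail"
@show describe(1, 2, 3)
```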

In [75]:
?add

search: add addenv readdir muladd rad2deg



Julia Docstrings go before the function definition
