## Regex 

##### Some regex flags that follow the closing quote of a regex literal...
- `r"stuff"i` makes the regex case insensitive
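- `r"stuff"m` treats the string as multiple lines: `^` and `$` match at the start and end of each line, not just of the whole string
- `r"stuff"s` treats the string as a single line: `.` also matches a newline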
- `r"stuff"x` tells the regex parser to ignore most whitespace that is neither backslashed nor within a character class. This can be used to break up regular expressions into slightly more readable parts. `#` is also treated as a metacharacter introducing a comment...
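The four flags above can be seen side by side in a minimal sketch. The sample strings and the `phone` pattern are made up for illustration.

```julia
# i: case-insensitive matching
case_hit = occursin(r"hello"i, "HELLO")

# m: ^ and $ match at the start and end of every line
text = "first\nsecond"
no_m   = occursin(r"^second", text)    # false: ^ only matches at the very start
with_m = occursin(r"^second"m, text)   # true

# s: `.` also matches a newline
no_s   = occursin(r"first.second", text)    # false
with_s = occursin(r"first.second"s, text)   # true

# x: whitespace and # comments inside the pattern are ignored
phone = r"""
    \d{3}   # first three digits
    -       # separator
    \d{4}   # last four digits
    """x
found = match(phone, "call 555-1234").match
```

Flags can be combined, for example `r"stuff"im`.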

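Once we have a match object from `m = match(r"nan"i, "BANANA")`, we can access:
- `m.match` contains the entire substring that matched
- `m.captures` contains the captured substrings, one per capture group
- `m.offset` is the index at which the whole match begins
- `m.offsets` contains the starting index of each captured substring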

See https://docs.julialang.org/en/v1/manual/strings/#Regular-Expressions for more
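As a small sketch of the fields listed above, here is the `BANANA` match next to a second pattern that has capture groups. The second pattern and its input string are invented for illustration.

```julia
m = match(r"nan"i, "BANANA")
m.match       # the matched text, "NAN"
m.offset      # 3, the index where the match starts
m.captures    # empty, because the pattern has no capture groups

# A pattern with two capture groups (illustrative input)
d = match(r"(\d+)-(\d+)", "range 10-25 here")
d.match       # "10-25"
d.captures    # the two captured substrings, "10" and "25"
d.offset      # 7
d.offsets     # [7, 10], where each capture starts

# match returns `nothing` when the pattern is not found
miss = match(r"kiwi", "banana")
```

Because a failed match returns `nothing`, it is worth checking the result before reading its fields.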

In [1]:
# r"string" actually means regex string, not raw string
typeof(r"foo")

Regex

In [2]:
# Other literal options
println(typeof(v"1.2.3"))
println(typeof(b"byte array literals"))
# pkg"..." is not a plain literal: it is a string macro from the Pkg standard library (needs `using Pkg`) that runs a package-manager command

VersionNumber
Base.CodeUnits{UInt8,String}


In [3]:
reg = r"[0-9]+"  # find one or more numerals
match(reg, "It was 1970.. it was UNIX Epoch Time!")

RegexMatch("1970")

In [4]:
match(r"it was"i, "It was 1970")

RegexMatch("It was")

In [5]:
occursin(r"hello"i, "HELLO, there!")

true

In [6]:
occursin(r"kiwi", "banana")

false

In [7]:
match(r"nana", "banana")

RegexMatch("nana")

In [8]:
# match returns only the first match; here that is the leading run of letters
match(r"[a-zA-Z]*", "banana rama 123 kiwi")

RegexMatch("banana")

In [9]:
# eachmatch returns every match; here, each run of one or more letters
collect(eachmatch(r"[a-zA-Z]+", "a 23ft tall kiwi fruit"))

5-element Array{RegexMatch,1}:
 RegexMatch("a")
 RegexMatch("ft")
 RegexMatch("tall")
 RegexMatch("kiwi")
 RegexMatch("fruit")

In [10]:
match(r"nana"x, "a \t banana nana")

RegexMatch("nana")

In [11]:
collect(eachmatch(r"nana", "a \t banana nana23"))

2-element Array{RegexMatch,1}:
 RegexMatch("nana")
 RegexMatch("nana")

In [20]:
collect(eachmatch(r"[0-9]+", "abc123def456ghi9jkl10"))

4-element Array{RegexMatch,1}:
 RegexMatch("123")
 RegexMatch("456")
 RegexMatch("9")
 RegexMatch("10")

In [21]:
for m in eachmatch(r"[0-9]+", "abc123def456ghi9jkl10")
    println(m)
end

RegexMatch("123")
RegexMatch("456")
RegexMatch("9")
RegexMatch("10")
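Each `RegexMatch` still holds text, so a common next step is to convert `m.match` to a number. A minimal sketch using the same input string (the name `nums` is just for illustration):

```julia
# Pull every run of digits out of the string and parse it as an Int
nums = [parse(Int, m.match) for m in eachmatch(r"[0-9]+", "abc123def456ghi9jkl10")]
total = sum(nums)
```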


In [22]:
raw"This is a raw string"

"This is a raw string"

## Numbers
- Number subtypes: Complex and Real
- Real number subtypes: AbstractFloat, AbstractIrrational, Integer, and Rational
- Integer subtypes: Bool, Signed, Unsigned (BigInt is a subtype of Signed)

See https://docs.julialang.org/en/v1/manual/mathematical-operations/

In [23]:
typeof(42)

Int64

In [24]:
@show Int

Int = Int64


Int64

In [25]:
typemax(Int)

9223372036854775807

In [29]:
typemax(Int) + typemax(Int)

-2
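The `-2` above is silent integer overflow: fixed-width integers wrap around instead of raising an error. A short sketch of the wraparound and of two ways to avoid it, assuming `BigInt` or checked arithmetic is acceptable for the use case:

```julia
# Fixed-width integers wrap around silently
wraps = typemax(Int) + 1 == typemin(Int)

# Promote to BigInt (arbitrary precision) to get the mathematically correct sum
exact = big(typemax(Int)) + typemax(Int)

# Or ask for an error instead of a wrapped result
overflowed = try
    Base.checked_add(typemax(Int), 1)
    false
catch err
    err isa OverflowError
end
```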

In [26]:
typemin(Int)

-9223372036854775808

In [40]:
# Rational numbers
2//3

2//3
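Rationals stay exact under arithmetic, which floating point numbers do not. A few illustrative checks:

```julia
reduced = 6//9 == 2//3            # rationals are stored in lowest terms
total   = 2//3 + 1//6             # exact arithmetic, gives 5//6
parts   = (numerator(total), denominator(total))

float_ok    = 0.1 + 0.2 == 0.3            # false: binary floating point rounding
rational_ok = 1//10 + 2//10 == 3//10      # true: no rounding involved

approx = float(2//3)              # convert to Float64 when an approximation is fine
```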

In [45]:
# Some Unicode symbols (typed with LaTeX-style tab completion, e.g. \pi<TAB>) are predefined constants or functions
π

π = 3.1415926535897...

In [53]:
# Vectorized dot operations
[1, 2, 3] .^ 2

3-element Array{Int64,1}:
 1
 4
 9

In [52]:
# we can vectorize any function, too
f(x) = x ^ 2
f.([1, 2, 3]) # vectorize any function with function_name.(vector)

3-element Array{Int64,1}:
 1
 4
 9
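Dot syntax also works with several arguments, and the `@.` macro adds the dots to every call in an expression. A minimal sketch; `g` and `v` are hypothetical names used only here:

```julia
g(x, y) = x^2 + y

# A scalar argument is reused for every element
with_scalar = g.([1, 2, 3], 10)

# Two arrays of the same length are combined element by element
with_array = g.([1, 2, 3], [10, 20, 30])

# @. turns every call and operator in the expression into its dotted form
v = [1.0, 4.0, 9.0]
shifted = @. sqrt(v) + 1
```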

In [54]:
# Tuples
(1, 2)

(1, 2)

In [55]:
+(1, 2)

3

In [58]:
# Vectorized dot syntax works on tuples, yay!
(1, 2) .+ (3, 4)

(4, 6)

In [61]:
# Named tuples are a thing, yo! and we access keys w/ dot syntax
skills = (logic=["Julia", "Python"], database="MySQL")
skills[1] == skills.logic

true
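A few more things named tuples can do, as a rough sketch. The `cloud` field and its value are made up for illustration.

```julia
skills = (logic=["Julia", "Python"], database="MySQL")

names = keys(skills)       # field names as a tuple of Symbols
vals  = values(skills)     # field values as a tuple

# Named tuples are immutable; merge builds a new one with extra or replaced fields
more = merge(skills, (cloud="AWS",))   # `cloud` is a hypothetical extra field

# Iterate over name => value pairs
for (name, value) in pairs(skills)
    println(name, " => ", value)
end
```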

In [62]:
skills.database

"MySQL"

### Ranges


In [65]:
r = 1:20
typeof(r)

UnitRange{Int64}

In [67]:
r[10:end] .+ 2

12:22

In [79]:
# char:char ranges
x = 'a':'z'

'a':1:'z'

In [74]:
typeof('r'), typeof("r")

(Char, String)

In [80]:
x[end]

'z': ASCII/Unicode U+007A (category Ll: Letter, lowercase)

In [83]:
# Expanding a range with the ... "splat" operator
(1:5...,)

(1, 2, 3, 4, 5)

In [84]:
# not quite what I expected, but reasonable
tuple(1:4)

(1:4,)

In [87]:
# We can also "splat" into an array
[1:5...]

5-element Array{Int64,1}:
 1
 2
 3
 4
 5

In [90]:
# Ranges also have start:step:stop syntax
[0:5:20...]

5-element Array{Int64,1}:
  0
  5
 10
 15
 20

In [93]:
[20:-3:1...]

7-element Array{Int64,1}:
 20
 17
 14
 11
  8
  5
  2

In [94]:
typeof([1, 2])

Array{Int64,1}

## Arrays
`Array{Type, Dimensions}`

Functions:
- `zeros`, `ones`
- `trues`, `falses`
- `similar`
- `rand`
- `fill`

In [95]:
[1, 2, 3]

3-element Array{Int64,1}:
 1
 2
 3

In [96]:
Float32[1, 2, 3]

3-element Array{Float32,1}:
 1.0
 2.0
 3.0

In [98]:
# Lose the commas and we have 2d Array syntax
[1 2 3]

1×3 Array{Int64,2}:
 1  2  3

In [99]:
[1 2 3; 4 5 6; 7 8 9 ]

3×3 Array{Int64,2}:
 1  2  3
 4  5  6
 7  8  9

In [101]:
zeros(5)

5-element Array{Float64,1}:
 0.0
 0.0
 0.0
 0.0
 0.0

In [102]:
ones(5, 2)

5×2 Array{Float64,2}:
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  1.0
 1.0  1.0

In [104]:
fill(23, 2, 3)

2×3 Array{Int64,2}:
 23  23  23
 23  23  23

In [105]:
rand(Char, 2, 2)

2×2 Array{Char,2}:
 '\U6b477'  '𧠂'
 'ꚳ'        '\U548c4'

In [112]:
rand(Bool, 5)

5-element Array{Bool,1}:
 0
 1
 1
 1
 1

In [108]:
# Colon syntax
# [start_row:end_row, start_column:end_column], both ends inclusive
a = rand(5, 5)
a[1:1, 2:4]

1×3 Array{Float64,2}:
 0.103969  0.208695  0.00129328

In [109]:
a[:, 4:5] # solo colon means all

5×2 Array{Float64,2}:
 0.00129328  0.922168
 0.713344    0.428368
 0.585893    0.69649
 0.856391    0.411199
 0.57185     0.520093

In [114]:
x = rand(5)
mask = rand(Bool, 5)
println(x); println(mask)
x[mask]

[0.4645212224405397, 0.5995442049710262, 0.6667045572766359, 0.37877872906958276, 0.4472063573071492]
Bool[1, 1, 0, 0, 1]


3-element Array{Float64,1}:
 0.4645212224405397
 0.5995442049710262
 0.4472063573071492
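In practice the mask usually comes from a condition rather than from `rand`. A sketch with fixed, made-up values so the result is reproducible:

```julia
scores = [0.46, 0.60, 0.67, 0.38, 0.45]

mask = scores .> 0.5        # elementwise comparison produces a Bool mask
high = scores[mask]         # keep only the elements where the mask is true
where_high = findall(mask)  # indices of the true entries
how_many = count(mask)      # number of true entries

low = scores[.!mask]        # .! negates the mask elementwise
```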

In [132]:
# We can also index into an array to reassign elements
x = [1 2 3; 4 5 6; 7 8 9]

x[1, 1] = 99
x

3×3 Array{Int64,2}:
 99  2  3
  4  5  6
  7  8  9

In [133]:
x[:, end:end] .= 23
x

3×3 Array{Int64,2}:
 99  2  23
  4  5  23
  7  8  23

In [134]:
# We can use boolean arrays as selectors, too
x[[false, true, false], :] = ones(3, 1)
x

3×3 Array{Int64,2}:
 99  2  23
  1  1   1
  7  8  23

In [135]:
x

3×3 Array{Int64,2}:
 99  2  23
  1  1   1
  7  8  23

In [139]:
# Iteration
for beatle in ["John", "Paul", "George", "Ringo"]
    println("Hello, $beatle")
end

Hello, John
Hello, Paul
Hello, George
Hello, Ringo


In [144]:
# Iteration w/ the index
beatles = ["John", "Paul", "George", "Ringo"]
for i in eachindex(beatles)
    println("$i. $(beatles[i])")
end

1. John
2. Paul
3. George
4. Ringo
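`enumerate` is an alternative that hands back a counter and the element together. A minimal sketch (`lines` is a hypothetical name):

```julia
beatles = ["John", "Paul", "George", "Ringo"]

# enumerate yields (counter, element) pairs, counting from 1
for (i, name) in enumerate(beatles)
    println("$i. $name")
end

# The same pairs work inside a comprehension
lines = ["$i. $name" for (i, name) in enumerate(beatles)]
```

Note that `enumerate` always counts from 1, while `eachindex` returns the array's actual indices; for an ordinary `Array` the two agree.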


#### Mutating Array
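- Arrays are PASS BY REFERENCE, yo! (Strictly, "pass by sharing": assignment and function calls never copy the array)
- A trailing `!` is a naming convention hinting that the function mutates its argument(s)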
- `push!(arr, 4)`

In [145]:
# Mutating arrays
arr = [1, 2, 3]
push!(arr, 4)

4-element Array{Int64,1}:
 1
 2
 3
 4

In [147]:
pop!(arr)

4

In [148]:
arr

3-element Array{Int64,1}:
 1
 2
 3

In [150]:
# Pass by reference
a = [1, 2, 3]
b = a
pop!(b)
a

2-element Array{Int64,1}:
 1
 2

In [151]:
# dele-teat 
deleteat!(arr, 1)
arr

2-element Array{Int64,1}:
 2
 3

In [152]:
# Creating copies
a = [1, 2, 3]
b = copy(a)
a[1] = 99  # mutate a in place; the copy b is unaffected
b

3-element Array{Int64,1}:
 1
 2
 3
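`copy` is shallow: it copies the outer array but shares any arrays stored inside it. A sketch contrasting it with `deepcopy` on an illustrative nested array:

```julia
nested  = [[1, 2], [3, 4]]
shallow = copy(nested)       # new outer array, same inner arrays
deep    = deepcopy(nested)   # inner arrays are copied as well

nested[1][1] = 99            # mutate an inner array
push!(nested, [5, 6])        # grow the outer array

shallow_first = shallow[1][1]   # 99, because the inner array is shared
deep_first    = deep[1][1]      # 1, because the deep copy is fully independent
shallow_len   = length(shallow) # 2, because the outer array was copied
```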

## Comprehend some Comprehension Syntax!

In [153]:
[x + 1 for x = 1:5]

5-element Array{Int64,1}:
 2
 3
 4
 5
 6

In [156]:
[x for x = 1:5 if x % 2 == 0]

2-element Array{Int64,1}:
 2
 4

## Generators!
- The superpower of comprehensions is activated when they're used to create generators
- Generators yield values on demand, rather than allocating an array and storing every value in advance
- Create a generator with `()` instead of `[]`
- For example: `(x+1 for x = 1:10)`
- They allow us to work on potentially infinite collections!
- Creating a generator is "practically constant time": nothing is computed until we iterate, so the cost depends only on how many values we actually consume
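The points above can be sketched as follows. `Iterators.countfrom` stands in for an infinite collection, and the variable names are only for illustration.

```julia
# A generator can be passed straight to a reducing function; no array is built
total = sum(x^2 for x in 1:10)

# An infinite source: 1, 2, 3, ... and a lazy generator of cubes on top of it
cubes = (x^3 for x in Iterators.countfrom(1))

# Only the values we ask for are ever computed
first_four = collect(Iterators.take(cubes, 4))

# A filtered generator is lazy too: iteration stops at the first cube >= 1000
first_big = first(c for c in cubes if c >= 1_000)
```

Calling `collect(cubes)` directly would never finish, so an infinite generator always needs something like `Iterators.take`, `first`, or a `break` to bound it.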

In [157]:
(x+1 for x = 1:10)

Base.Generator{UnitRange{Int64},var"#13#14"}(var"#13#14"(), 1:10)

In [158]:
@time for i in [x^3 for x=1:1_000_000_000]
    i >= 1_000 && break
    println(i)
end

1
8
27
64
125
216
343
512
729
  3.884488 seconds (68.93 k allocations: 7.454 GiB, 0.20% gc time)


In [159]:
# Compare the ~7.5 GiB of memory and ~4 seconds above with the following:
@time for i in (x^3 for x=1:1_000_000_000)
    i >= 1000 && break
end

  0.017824 seconds (13.53 k allocations: 781.443 KiB)
