kmp

Based on **Lauwens & Downey "Think Julia: How to Think Like a Computer Scientist"**

https://benlauwens.github.io/ThinkJulia.jl/latest/book.html

Resources:

Julia webpage https://julialang.org/ 

Julia documentation https://docs.julialang.org/en/v1/


In [1]:
pwd()

"e:\\aaa-Julia-course-2023\\lectures-1.9"

In [42]:
words = ["act" "ace"; "bait" "bail"]

2×2 Matrix{String}:
 "act"   "ace"
 "bait"  "bail"

In [43]:
sort(words, dims=2)

2×2 Matrix{String}:
 "ace"   "act"
 "bail"  "bait"

In [3]:
ps = pairs(sort(words, dims=2))

pairs(::Matrix{String})(...):
  CartesianIndex(1, 1) => "ace"
  CartesianIndex(2, 1) => "bail"
  CartesianIndex(1, 2) => "act"
  CartesianIndex(2, 2) => "bait"

In [47]:
d = Dict(pairs(sort(words, dims=2)))

Dict{CartesianIndex{2}, String} with 4 entries:
  CartesianIndex(2, 1) => "bail"
  CartesianIndex(1, 1) => "ace"
  CartesianIndex(2, 2) => "bait"
  CartesianIndex(1, 2) => "act"

In [48]:
key = CartesianIndex(1,2)
d[key]

"act"

In [7]:
enumerate(d)

enumerate(Dict{CartesianIndex{2}, String}(CartesianIndex(2, 1) => "bail", CartesianIndex(1, 1) => "ace", CartesianIndex(2, 2) => "bait", CartesianIndex(1, 2) => "act"))

In [8]:
typeof(enumerate(d))

Base.Iterators.Enumerate{Dict{CartesianIndex{2}, String}}
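To see what an `enumerate` object actually yields, you can loop over it or collect it. The sketch below rebuilds the same `d` as above. Each item is a tuple `(count, key => value)`: the count runs from 1, and the second element is a `Pair` from the dictionary. Dictionary iteration order is not guaranteed, so only the counts are predictable.

```julia
words = ["act" "ace"; "bait" "bail"]
d = Dict(pairs(sort(words, dims=2)))

# each item of enumerate(d) is a tuple (count, key => value);
# the inner pair can be destructured in the loop header as well
for (i, (key, value)) in enumerate(d)
    println(i, ": ", key, " => ", value)
end

items = collect(enumerate(d))   # a Vector of (Int, Pair) tuples
```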

In [9]:
typeof(3 => "bail")

Pair{Int64, String}

## Chapter 12 -- Tuples

https://benlauwens.github.io/ThinkJulia.jl/latest/book.html#chap12

This chapter presents one more built-in type, the **`tuple`**, and then shows how arrays, dictionaries, and tuples work together. It also introduces a useful feature for functions that take a variable number of arguments: the **`gather`** and **`scatter`** operators.

### Tuples are immutable

**A tuple is a sequence of values.** The values can be of any type, each element can have its own type, and they are indexed by integers, so in that respect tuples are similar to arrays. The important difference is that tuples are **`immutable`**.

Syntactically, a tuple is a **comma-separated list of values**:

```Julia
	julia> t = 'a', 'b', 'c', 'd', 'e'
		('a', 'b', 'c', 'd', 'e')

```
**Although it is not necessary, it is common to enclose tuples in parentheses**:

```Julia
	julia> t = ('a', 'b', 'c', 'd', 'e')
		('a', 'b', 'c', 'd', 'e')
```

To create a **tuple with a single element**, you have to include a **final comma**:

```Julia
	julia> t1 = ('a',)
		('a',)

	julia> typeof(t1)
		Tuple{Char}
```

A value in parentheses without a comma is not a tuple:

```Julia
	julia> t2 = ('a')
		'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

	julia> typeof(t2)
		Char
```

Another way to create a tuple is the built-in function **`tuple`**. With no argument, it creates an empty tuple:

```Julia
	julia> tuple()
		()
```

If multiple arguments are provided, the result is a tuple with the given arguments:

```Julia
	julia> t3 = tuple(1, 'a', pi)
		(1, 'a', π)
```

Because `tuple` is the name of a built-in function, you should avoid using it as a variable name. Most array operators also work on tuples. The **`bracket operator`** indexes an element:

```Julia
	julia> t = ('a', 'b', 'c', 'd', 'e');

	julia> t[1]
		'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
```

And the **`slice operator`** selects a range of elements:

```Julia
	julia> t[2:4]
		('b', 'c', 'd')
```
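A few other array operations that also work on tuples are shown below. Because a tuple cannot grow, "adding" an element means building a new tuple. The `(t..., 'f')` form uses the splat operator `...`, which is introduced later in this chapter.

```julia
t = ('a', 'b', 'c', 'd', 'e')

println(length(t))       # number of elements
println('c' in t)        # membership test
println(t[end])          # last element
for x in t               # tuples are iterable
    print(x, " ")
end
println()

# "adding" an element builds a new tuple; t itself is unchanged
t2 = (t..., 'f')
```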

In [10]:
tpl = ('a',)

('a',)

In [11]:
t = ('a', 'b', 'c', 'd', 'e')

t[begin:2:end]

('a', 'c', 'e')

If you try to modify one of the elements of a tuple, you get an error, because **tuples are immutable**:

In [12]:
t[1] = 'A'

MethodError: no method matching setindex!(::NTuple{5, Char}, ::Char, ::Int64)

An important subtlety is that although the tuple **`tpl`** is immutable itself, **the comma-separated list might contain objects or components that are mutable**:

In [13]:
tpl = "a", 'b', [9, 8, 7]
tpl[3][2] = 'c'     # same as (tpl[3])[2] = 'c': modifies the array stored inside the tuple
tpl                 # note the conversion: the array holds Int64, so 'c' is stored as Int('c') == 99

("a", 'b', [9, 99, 7])

In [14]:
typeof([9, 8, 7])

Vector{Int64} (alias for Array{Int64, 1})

In [15]:
tpl[2] = 'c'

MethodError: no method matching setindex!(::Tuple{String, Char, Vector{Int64}}, ::Char, ::Int64)

The **`relational operators`** work with tuples as well as other sequences. Julia starts by comparing the first element from each sequence. If they are equal, it goes on to the next elements, and so on, until it finds elements that differ. Subsequent elements are not considered, no matter how large they are. This is **lexicographic order**, the same rule used to order words in a dictionary. If one tuple is a prefix of the other, the shorter one comes first.

```Julia
	julia> (0, 1, 2) < (0, 3, 4)
		true
```

In [16]:
(0, 1) < (0, 1, 200)

true

In [17]:
(0, 1, 2000000) < (0, 3, 4), (0, 4) < (0, 3, 4), (0, 2, 4) < (0, 4)

(true, false, true)
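Lexicographic comparison is also what `sort` uses for tuples, which makes tuples convenient as sort keys. Here is a small sketch with made-up records: they are ordered by the first element, and ties are broken by the second.

```julia
# sort compares tuples lexicographically: first element first,
# then the second element only when the first elements are equal
records = [(2, "b"), (1, "z"), (2, "a"), (1, "c")]
sorted = sort(records)
println(sorted)
```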

### Tuple assignment

It is often useful to **swap values** of two variables. With conventional assignments, you have to use a **`temporary variable`**. For example, to swap a and b:

```Julia
	tmp = a
	a = b
	b = tmp
```

A **`tuple assignment`** is simpler:

```Julia
	a, b = b, a
```

The left side is a **`tuple of variables`**; the right side is a **`tuple of expressions`**. Each value is assigned to its respective variable. All the expressions on the right side are evaluated before any of the assignments are made. **The number of variables on the left must be less than or equal to the number of values on the right**: extra values are ignored, and too few values raise a `BoundsError`.
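The rule that the whole right side is evaluated first is what makes one-line updates such as `a, b = b, a + b` safe. The sketch below computes Fibonacci numbers this way. `fib` is a hypothetical helper name used only for illustration.

```julia
# a, b = b, a + b evaluates both right-hand expressions first,
# so the old value of a is still used when computing a + b
function fib(n)
    a, b = 0, 1
    for _ in 1:n
        a, b = b, a + b
    end
    return a
end

println([fib(n) for n in 0:10])
```

With separate statements (`a = b` followed by `b = a + b`), the second line would see the already-updated `a`, and the sequence would come out wrong.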


In [18]:
(a, b) = (1, 2, 3)
a, b

(1, 2)

In [19]:
a, b, c = 1, 2

BoundsError: attempt to access Tuple{Int64, Int64} at index [3]

More generally, the right side can be any kind of sequence (string, array or tuple).
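For instance, the right side can be an array, a string, or a range. Unpacking a string gives one `Char` per variable, and as above, any values left over are ignored.

```julia
x, y = [10, 20]          # from an array
c1, c2, c3 = "abc"       # from a string: each variable gets a Char
n1, n2 = 1:5             # from a range: only the first two values are used
println((x, y), " ", (c1, c2, c3), " ", (n1, n2))
```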

In [20]:
a, b, c = [1, 2, 3], "string", ('t', 'u', 'p', 'l', 'e')

([1, 2, 3], "string", ('t', 'u', 'p', 'l', 'e'))

In [21]:
c

('t', 'u', 'p', 'l', 'e')

In [22]:
d = a, b, c

([1, 2, 3], "string", ('t', 'u', 'p', 'l', 'e'))

In [23]:
d[1]

3-element Vector{Int64}:
 1
 2
 3

Another example is to use **`split`** in a tuple assignment to split, for instance, an email address into a user name and a domain:

In [24]:
addr = "julius.caesar@rome"
uname, domain = split(addr, '@')

2-element Vector{SubString{String}}:
 "julius.caesar"
 "rome"

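Because extra values are silently dropped, `uname, domain = split(addr, '@')` would accept an address such as `"a@b@c"` and quietly lose `"c"`. If that matters, you can check the number of parts first. The sketch below does this. `splitaddress` is a hypothetical helper, not part of Base.

```julia
# hypothetical helper: split an address and fail clearly on bad input
function splitaddress(addr::AbstractString)
    parts = split(addr, '@')
    length(parts) == 2 || throw(ArgumentError("expected exactly one '@' in $addr"))
    uname, domain = parts          # tuple assignment from a 2-element array
    return uname, domain
end

uname, domain = splitaddress("julius.caesar@rome")
println(uname, " | ", domain)
```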
### Tuples as return values

Strictly speaking, a function can only return one value, but if the value is a tuple, the effect is the same as returning multiple values. 

For example, if you want to divide two integers and compute the quotient and remainder, it is inefficient to compute x ÷ y and then x % y. It is better to compute them both at the same time. The built-in function **divrem** takes two arguments and returns a tuple of two values, the quotient and remainder. You can store the result as a tuple:

```Julia
	julia> t = divrem(7, 3)
		(2, 1)
```

Or use tuple assignment to store the elements separately:

```Julia
	julia> q, r = divrem(7, 3);
```

Here is a function that returns a tuple:

```Julia
	function minmax(t)
		return minimum(t), maximum(t)
	end
```

**`maximum`** and **`minimum`** are built-in functions that find the largest and smallest elements of a sequence; `minmax` in the example computes both and returns a tuple of two values. It traverses the data twice. The built-in function **`extrema`** returns the same tuple in a single traversal, so it is more efficient.
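The idea behind a single-pass version can be sketched as follows. This illustrates the technique and is not Base's actual `extrema` implementation. `onepass_minmax` is a hypothetical name, and the sketch assumes a non-empty collection, since `first` fails on an empty one.

```julia
# one traversal, tracking the smallest and largest values seen so far
function onepass_minmax(t)
    lo = hi = first(t)
    for x in t
        if x < lo
            lo = x
        elseif x > hi
            hi = x
        end
    end
    return lo, hi          # returning a tuple
end

data = [31, 4, 159, 26, 5, 35]
println(onepass_minmax(data), " ", extrema(data))
```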

In [25]:
t = rand(1:365, 100)
mi, ma = extrema(t)

(2, 364)

### Variable-length argument tuples

Functions can take a variable number of arguments. **A parameter name that ends with `...` gathers (or "slurps") the arguments into a tuple.** For example, **`printall`** takes any number of arguments and prints them:

```Julia
	function printall(args...)
		println(args)
	end
```

The **gather parameter** can have any name you like, but args is conventional. Here is how the function works:

In [27]:
function printall(args...)  # slurp 
   args	    # note args is a tuple after slurp has acted
end

printall(1, 2.0, '3', "banana", (1, 2))

(1, 2.0, '3', "banana", (1, 2))

In [28]:
function printall(args...)
    args, args[4]
end

printall(1, 2.0, '3', "banana")

((1, 2.0, '3', "banana"), "banana")

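A gather parameter can also follow ordinary parameters. In that case it collects only the remaining arguments, and it is an empty tuple when there are none. `greet` below is a hypothetical function used to show this.

```julia
# one required argument, then any number of extra ones gathered into `names`
function greet(greeting, names...)
    # names is a tuple; it is empty when only the greeting is passed
    if isempty(names)
        return greeting * "!"
    end
    return greeting * ", " * join(names, " and ") * "!"
end

println(greet("Hello"))
println(greet("Hello", "Ann", "Bob"))
```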
The complement of gather is **`scatter`**, also called **`splatter`**. If you have a sequence of values, such as a tuple, and you want to pass it to a function as multiple arguments, you can use the **`...` operator**.

For example, divrem takes exactly two arguments; it does not work with a tuple:

In [29]:
t = (7, 3)
divrem(t)

MethodError: no method matching divrem(::Tuple{Int64, Int64})

Closest candidates are:
  divrem(::T, !Matched::Base.MultiplicativeInverses.MultiplicativeInverse{T}) where T
   @ Base multinverses.jl:152
  divrem(::Any, !Matched::Any)
   @ Base div.jl:179
  divrem(::Any, !Matched::Any, !Matched::RoundingMode{:FromZero})
   @ Base div.jl:262
  ...


But if you **`scatter`** the tuple, it works:

In [30]:
divrem(t...)

(2, 1)


Many of the built-in functions use variable-length argument tuples. For example, `max` and `min` can take any number of arguments:

```Julia
	julia> max(1, 2, 3)
		3
```

But sum does not:

```Julia
	julia> sum(1, 2, 3)
	ERROR: MethodError: no method matching sum(::Int64, ::Int64, ::Int64)
```

In [31]:
t = 1, 2, 3
sum( (1, 2, 3) ), sum(t)

(6, 6)
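The difference comes down to how a function receives its inputs. Some functions take separate arguments, and for those you splat the tuple. Others take one collection, and for those you pass the tuple as it is. The sketch below contrasts the two styles. The operator `+` also accepts more than two arguments.

```julia
t = (4, 9, 2)

# max takes separate arguments, so the tuple is splatted into it
println(max(t...))

# maximum and sum take a single collection instead
println(maximum(t), " ", sum(t))

# + also accepts several arguments: +(t...) is +(4, 9, 2)
println(+(t...))
```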

In the Julia world, `gather` is often called **slurp** and `scatter` **splatter**.

### Arrays and Tuples

The built-in function **`zip`** takes two or more sequences and **returns a collection of tuples** where each tuple contains one element from each sequence. This example zips a string and an array:

```Julia
	julia> s = "abc"; t = [1, 2, 3];

	julia> zip(s, t)
	Base.Iterators.Zip{Tuple{String,Array{Int64,1}}}(("abc", [1, 2, 3]))
```
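Since `zip` accepts more than two sequences, each tuple simply gets one more element per extra sequence. A short sketch with three sequences of different kinds:

```julia
s = "abc"
t = [1, 2, 3]
u = (true, false, true)

# each item is a tuple with one element from each of the three sequences
for (ch, n, flag) in zip(s, t, u)
    println(ch, " ", n, " ", flag)
end

triples = collect(zip(s, t, u))
```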

The result is a **`zip object`**, an **iterator**, that knows how to **iterate through the pairs**. The most common use of zip is in a for loop:

In [32]:
s = "abc"; t = [1, 2, 3];
zip(s, t)
for (chr, num) in zip(s, t)
    println(chr)
    println(num)
end

a
1
b
2
c
3


In [33]:
s = "abc"; t = [1, 2, 3];
zip(s, t)
for pair in zip(s, t)
    println(pair[1])
    println(pair[2])
end

a
1
b
2
c
3


A `zip object` is a type of **`iterator`**, which is any object that can be used to iterate through a sequence. Iterators are similar to arrays in some ways, but unlike arrays, you cannot use an index to select an element from a plain iterator. If you want to use array operators and functions, you can use a zip object to make an array by using **`collect`**:

```Julia
	julia> collect(zip(s, t))
		3-element Array{Tuple{Char,Int64},1}:
		('a', 1)
		('b', 2)
		('c', 3)
```

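The result is a **vector of tuples**; in this example, each tuple contains a character from the string and the corresponding element from the array.

Note that the next three cells (In [51], In [52], In [50]) were executed after In [39] further below, which rebinds `t` to the array of tuples `[('a', 1), ('b', 2), ('c', 3)]`. That is why `t` displays as a vector of tuples and why zipping `s` with it produces nested tuples such as `('a', ('a', 1))`. Cells In [36] and In [37] were run earlier and still saw `t = [1, 2, 3]`.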

In [51]:
s

"abc"

In [52]:
t

3-element Vector{Tuple{Char, Int64}}:
 ('a', 1)
 ('b', 2)
 ('c', 3)

In [50]:
collect(zip(s, t))

3-element Vector{Tuple{Char, Tuple{Char, Int64}}}:
 ('a', ('a', 1))
 ('b', ('b', 2))
 ('c', ('c', 3))

In [36]:
collect(zip(s, t))[2][1]

'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

In [37]:
dump(collect(zip(s, t)))  # to get a view of the actual data structure

Array{Tuple{Char, Int64}}((3,))
  1: Tuple{Char, Int64}
    1: Char 'a'
    2: Int64 1
  2: Tuple{Char, Int64}
    1: Char 'b'
    2: Int64 2
  3: Tuple{Char, Int64}
    1: Char 'c'
    2: Int64 3


**If the sequences are not of the same length, the result has the length of the shorter one.**

In [38]:
collect(zip("Anne", "Elk"))

3-element Vector{Tuple{Char, Char}}:
 ('A', 'E')
 ('n', 'l')
 ('n', 'k')

And you can use **tuple assignment** in a for loop to traverse an array of tuples:

In [39]:
t = [('a', 1), ('b', 2), ('c', 3)]

for (letter, number) in t 		# splitting by letter and number
    println(number, " ", letter)
end

1 a
2 b
3 c


Each time through the loop, Julia selects the next tuple in the array and assigns the elements to letter and number. **The parentheses around (letter, number) are obligatory**.

If you combine **zip, for and tuple assignment**, you get a useful idiom for traversing two (or more) sequences at the same time. For example, `hasmatch` takes two sequences, t1 and t2, and returns true if there is an index i such that t1[i] == t2[i]:

```Julia
	function hasmatch(t1, t2)
		for (x, y) in zip(t1, t2)
			if x == y
				return true
			end
		end
		return false
	end
```
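Here is `hasmatch` again with a few calls. The two arguments do not need to be the same kind of sequence. The same test can also be written in one line with `any` and a generator. `hasmatch2` is a hypothetical name for that variant.

```julia
function hasmatch(t1, t2)
    for (x, y) in zip(t1, t2)
        if x == y
            return true
        end
    end
    return false
end

# same idea as a one-liner: any stops at the first matching position
hasmatch2(t1, t2) = any(x == y for (x, y) in zip(t1, t2))

println(hasmatch("abc", "xbz"))          # 'b' matches at position 2
println(hasmatch([1, 2, 3], (3, 2, 1)))  # 2 matches at position 2
println(hasmatch("abc", "bca"))          # no position matches
```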

If you need to traverse the elements of a sequence and their indices, you can use the built-in function **`enumerate`**:

In [40]:
for (index, element) in enumerate("abc")
    println(index, " ", element)
end

1 a
2 b
3 c


The result from `enumerate` is an **`enumerate object`**, which iterates a sequence of tuples; each tuple contains a count (starting from 1) and an element from the given sequence.
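Collecting an enumerate object shows the tuples directly. The count is not always a valid index, though. For a string containing a multi-byte character such as `é`, the counts run 1, 2, 3, but the valid string indices are byte positions, which `eachindex` returns.

```julia
println(collect(enumerate("abc")))   # tuples (count, element)

s = "hé!"
counted = collect(enumerate(s))      # counts are 1, 2, 3
indexed = collect(eachindex(s))      # valid indices skip the second byte of 'é'
println(counted)
println(indexed)
```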

In [53]:
d = Dict('a'=>3, 'b'=>2, 'c'=>1)

ks = sort(collect(keys(d)))

vs = sort(collect(values(d)))

for key in ks
    println(d[key], " ", key)
end

println()

# note: keys and values were sorted separately, so ks[i] and vs[i] no longer belong together
for k in ks  
    println(k)
end

println()

for v in vs  
    println(v)
end

3 a
2 b
1 c

a
b
c

1
2
3
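To sort a dictionary without separating keys from their values, you can sort its key-value pairs instead. The sketch below uses `collect(d)` to get a vector of `Pair`s and sorts it by value with the `by` keyword. Adding `rev=true` would give descending order.

```julia
d = Dict('a' => 3, 'b' => 2, 'c' => 1)

# sorting whole pairs keeps each key attached to its value
byvalue = sort(collect(d), by = last)

for (key, value) in byvalue
    println(value, " ", key)
end
```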


Combining Dict with zip yields a concise way to create a dictionary:

```Julia
	julia> d = Dict(zip("abc", 1:3))
		Dict{Char,Int64} with 3 entries:
			'a' => 1
			'c' => 3
			'b' => 2
```

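It is common to use **tuples as keys** in dictionaries. For example, a telephone directory might map from (last-name, first-name) pairs to telephone numbers. Assuming that we have defined `last`, `first` and `number`, we could write the following (in practice, names like `lastname` and `firstname` avoid clashing with the Base functions `first` and `last`):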

```Julia
	directory[last, first] = number
```

**The expression in brackets is a tuple.** We could use tuple assignment to traverse this dictionary.

```Julia
	for ((last, first), number) in directory
		println(first, " ", last, " ", number)
	end
```

This loop traverses the key-value pairs in `directory`. Each key is a tuple, and both levels are destructured at once: the elements of the key are assigned to `last` and `first`, and the value to `number`. The loop then prints the name and corresponding telephone number.
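A runnable version of the directory example might look like this. The names and numbers are made up, and `lastname`/`firstname` replace `last`/`first` so that the Base functions are not shadowed. `directory[a, b] = v` is shorthand for `directory[(a, b)] = v`.

```julia
# hypothetical entries; the key is a (last name, first name) tuple
directory = Dict{Tuple{String, String}, String}()
directory["Cleese", "John"] = "555-0101"
directory["Chapman", "Graham"] = "555-0102"

# each entry is a Pair whose key is a tuple; destructure both levels at once
for ((lastname, firstname), number) in directory
    println(firstname, " ", lastname, " ", number)
end
```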

### Sequences of sequences

So far the focus has been on arrays of tuples, but almost all of the examples in this chapter also work with arrays of arrays, tuples of tuples, and tuples of arrays. To avoid enumerating all the possible combinations, it is sometimes easier to talk about sequences of sequences. In many contexts, the different kinds of sequences (strings, arrays and tuples) can be used interchangeably. So how should you choose one over the others?

To start with the obvious, strings are more limited than other sequences because the elements have to be characters. They are also immutable. If you need the ability to change the characters in a string (as opposed to creating a new string), you might want to use an array of characters instead.

Arrays are more common than tuples, mostly because they are mutable. But there are a few cases where you might prefer tuples:
- In some contexts, like a return statement, it is syntactically simpler to create a tuple than an array.
- If you are passing a sequence as an argument to a function, using tuples reduces the potential for unexpected behavior due to aliasing.
- For performance reasons: the length and the element types of a tuple are part of its type, so the compiler can specialize code on them.
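The last two points can be illustrated briefly. A tuple's type records every element type, and therefore its length, while an array's type records only a common element type. An array passed to a function can also be modified by that function, and the caller sees the change. `zerofirst!` is a hypothetical function used to show this aliasing.

```julia
# a tuple's type lists each element type (and hence the length)
println(typeof((1, 2.0, 'c')))     # Tuple{Int64, Float64, Char}
# an array's type records only the element type
println(typeof([1, 2, 3]))

# aliasing: the function receives the caller's array, not a copy
function zerofirst!(v)
    v[1] = 0
    return v
end

a = [1, 2, 3]
zerofirst!(a)
println(a)        # the caller's array has changed
# zerofirst!((1, 2, 3)) would throw a MethodError: tuples cannot be modified
```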

Because tuples are immutable, they do not provide functions like `sort!` and `reverse!`, which modify existing arrays. But Julia provides the built-in function **`sort`**, which takes an array and returns a new array with the same elements in sorted order, and **`reverse`**, which takes any sequence and returns a sequence of the same type in reverse order.
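A small sketch of the non-mutating versions. `reverse` returns a new tuple. One way to sort a tuple is to go through an array with `collect` and convert back with `Tuple`. For arrays, `sort` returns a new array, while `sort!` changes the array in place.

```julia
t = (3, 1, 2)
println(reverse(t))                # a new tuple; t is unchanged
println(Tuple(sort(collect(t))))   # sort via an array, then back to a tuple

v = [3, 1, 2]
w = sort(v)            # new sorted array
println(v, " ", w)     # v is still [3, 1, 2]
sort!(v)               # sorts v in place; possible only because arrays are mutable
println(v)
```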

### Debugging

Arrays, dictionaries and tuples are examples of **`data structures`**; in this notebook we are beginning to see **compound data structures**, like arrays of tuples, or dictionaries that contain tuples as keys and arrays as values.

Compound data structures are useful, but they are prone to **"shape errors"**; that is, errors caused when a data structure has the wrong **type, size**, or **structure**. For example, if you are expecting an array with one integer and I give you a plain old integer (not in an array), it will not work.

Julia allows you to **attach a type to elements of a sequence**, for example in the type of a function argument. How this is done is detailed in the chapter on **Multiple Dispatch**. Specifying types eliminates many shape errors.
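Here is a small illustration of the "integer instead of an array holding one integer" shape error. `firstsquared` is a hypothetical function. Without the annotation, `firstsquared(7)` would quietly return 49, because Julia numbers can be indexed with `[1]`, and the mistake would go unnoticed. With `v::Vector{Int}`, the call is rejected immediately with a `MethodError`.

```julia
# hypothetical function that only accepts a vector of integers
function firstsquared(v::Vector{Int})
    return v[1]^2
end

println(firstsquared([7]))

# passing a plain integer is a shape error, caught at the call site
try
    firstsquared(7)
catch e
    println(typeof(e))
end
```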


## Exercises

### Exercise 12-1
Write a function called `sumall` that takes any number of arguments and returns their sum.

### Exercise 12-2

Write a function called `mostfrequent` that takes a string and prints the letters in decreasing order of frequency. Find text samples from several different languages and see how letter frequency varies between languages. Compare your results with the tables at https://en.wikipedia.org/wiki/Letter_frequencies.

### Exercise 12-3

More anagrams! Write a program that reads a word list from a file (see Reading Word Lists) and prints all the sets of words that are anagrams. Here is an example of what the output might look like:

    ["deltas", "desalt", "lasted", "salted", "slated", "staled"]
    ["retainers", "ternaries"]
    ["generating", "greatening"]
    ["resmelts", "smelters", "termless"]

    	

You might want to build a dictionary that maps from a collection of letters to an array of words that can be spelled with those letters. The question is, how can you represent the collection of letters in a way that can be used as a key?

Modify the previous program so that it prints the longest array of anagrams first, followed by the second longest, and so on.

In Scrabble a “bingo” is when you play all seven tiles in your rack, along with a letter on the board, to form an eight-letter word. What collection of 8 letters forms the most possible bingos?

### Exercise 12-4

Two words form a “metathesis pair” if you can transform one into the other by swapping two letters; for example, “converse” and “conserve”. Write a program that finds all of the metathesis pairs in the dictionary. Do not test all pairs of words, and do not test all possible swaps.

### Exercise 12-5

Here’s another Car Talk Puzzler (https://www.cartalk.com/puzzler/browse):

What is the longest English word, that remains a valid English word, as you remove its letters one at a time? Now, letters can be removed from either end, or the middle, but you cannot rearrange any of the letters. Every time you drop a letter, you wind up with another English word. If you do that, you are eventually going to wind up with one letter and that too is going to be an English word—one that’s found in the dictionary. I want to know what is the longest word and how many letters does it have?

An example: Sprite. Ok? You start off with sprite, you take a letter off, one from the interior of the word, take the r away, and we’re left with the word spite, then we take the e off the end, we are left with spit, we take the s off, we are left with pit, it, and I.

Write a program to find all words that can be reduced in this way, and then find the longest one.

This exercise is a little more challenging than most, so here are some suggestions:

(1) You might want to write a function that takes a word and computes an array of all the words that can be formed by removing one letter. These are the “children” of the word.

(2) Recursively, a word is reducible if any of its children are reducible. As a base case, you can consider the empty string reducible.

(3) The word list I provided, `words.txt`, does not contain single letter words. So you might want to add “I”, “a”, and the empty string.

(4) To improve the performance of your program, you might want to memoize the words that are known to be reducible.