# Computing in the Small
## Week 2
This week we introduce another model for computing.  Last week we considered a function that manipulates numbers, the Collatz function; this week we look at a system that operates on strings.

Post tag systems were introduced by Emil Post, who received his doctorate at Columbia and was then at Princeton on a fellowship.  At the time there was a movement, started by the mathematician David Hilbert and later expanded upon by Bertrand Russell and Alfred North Whitehead, to reduce all of mathematics to the manipulation of strings without explicit reference to what they mean.  This is not dissimilar to the basic idea underlying ChatGPT and other Large Language Models (LLMs).

### Python Strings
Last week we worked with Python's representations of numbers.  This week we concentrate on strings.  The first thing to note is that Python does not care whether you use single quotes ' or double quotes ", provided you open and close a given string with the same kind.  Going forward we will use double quotes to avoid confusion, but you may see people use the other (some languages use the two symbols for different purposes, so do not assume they are always interchangeable).

For our purposes we will need to perform the following operations on a string:

1. Append one string to another.
1. Extract the first character of a string.
1. Determine the length of a string.
1. Return the string after removing the first N characters.

The following cells illustrate these operations!

### Appending Strings
To append strings we use the `+` operator.  This means that `"Hello " + "World"` yields `"Hello World"`.

Also `"1" + "1"` gives `"11"`.

Use the cell below to try for yourself.
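As a starting point for experimenting, here is a minimal sketch of appending. The variable names (`left`, `right`, `joined`, `tape`) are made up for illustration and are not used elsewhere in these notes.

```python
# Appending never changes the originals; it builds a new string.
left = "Hello "
right = "World"
joined = left + right
print(joined)

# To "grow" a string held in a variable, assign the result back to the same name.
tape = "101"
tape = tape + "1101"
print(tape)

# Mixing a string with a number is an error rather than a silent conversion.
try:
    "1" + 1
except TypeError as err:
    print("TypeError:", err)
```

The last case is worth remembering: `"1" + "1"` is string appending, while `"1" + 1` is not allowed at all.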

### Extracting Characters from Strings
In some ways Python treats strings as a list of characters.  As it does with lists, Python starts counting with 0 instead of 1 (this is popular among many computer languages, although there are a small number that do count from 1).  Therefore if we assign a string to a variable s:

`s = "My string"`

The expression `s[0]` gives `"M"` and `s[1]` gives `"y"`.  Try to guess which index `n` makes `s[n]` give `"g"`, and verify your guess below.  What do you think `s[-1]` would return?  Try it.
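Here is a small sketch of indexing on a different string, so that the questions above stay open. The name `word` is hypothetical, and the last part shows what Python does when an index is too large.

```python
word = "tag"

print(word[0])  # first character
print(word[2])  # third character, which is also the last one here

# Asking for a position that does not exist raises an IndexError.
try:
    word[3]
except IndexError as err:
    print("IndexError:", err)
```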


### Length of Strings
The length of a string is given by the built-in function `len` (actually the same function is used by Python on other structures such as lists).

If `s = "My string"` then `len(s)` returns 9.  Because indexes start at `0` they end at `len(s) - 1`.  Verify in the box below that `s[len(s) - 1]` does, in fact, give the last character in the string.
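The same idea on another string, as a rough sketch (the name `tape` is made up for illustration):

```python
tape = "10100"
n = len(tape)

print(n)                 # number of characters in tape
print(tape[n - 1])       # the character at the last valid index
print(len(""))           # the empty string has length 0
print(len(tape + "00"))  # appending two characters adds 2 to the length
```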


### Extracting Part of a String
Extracting part of a string is a generalization of getting a specific character.  You can get a range of characters by giving two indices separated by a colon.  For example:

If `s = "My string"` then `s[3:6]` is `"str"`

You can also leave out one of the two indices.  If the index before the `:` is omitted it is assumed to be `0` (the beginning), and if the index after the `:` is omitted it is assumed to be `len(s)`, so the slice runs to the end of the string.  The end index itself is always excluded, which is why `s[3:6]` covers indexes 3, 4, and 5.  So, for example:

1. `s[:2]` gives `"My"` (the first `2` characters)
1. `s[3:]` gives `"string"` (the result of dropping the first `3` characters)
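The following sketch shows how the two open-ended forms fit together. The names `tape`, `head`, `rest`, and `short` are illustrative only.

```python
tape = "011101"

head = tape[:3]  # the first three characters
rest = tape[3:]  # everything after dropping the first three
print(head, rest)

# Splitting at the same index and appending gives back the original string.
print(head + rest == tape)

# Slicing past the end does not raise an error; it yields an empty string.
short = "01"
print(repr(short[3:]))
```

Unlike indexing a single character, a slice that reaches beyond the end of the string quietly returns whatever is there, possibly nothing.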

### Sets
When we analyze the behavior of strings under our rule, it is helpful to detect when we start seeing strings we have already seen, without having to work this out by hand.  Happily, Python has a data structure designed to do just this: the set.  For our purposes we need the following operations:

1. Create a new set.
1. Add an element to a set.
1. Determine if an element belongs to a set.

#### Creation
To create a new (empty) set we simply say:

`x = set()`

To add an element to a set we say:

`x.add("new element")`

Note sets can hold strings or numbers (among other data types).

To check for membership we say:

```
s = set()
s.add("x")
if "x" in s:
    print("Found it")
```
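Putting the three operations together, here is one possible sketch of spotting the first repeated item in a sequence. The function name `first_repeat` is hypothetical and is not used elsewhere in these notes.

```python
def first_repeat(items):
    """Return the first item that appears a second time, or None if none does."""
    seen = set()
    for item in items:
        if item in seen:   # membership test
            return item
        seen.add(item)     # remember this item for later
    return None


print(first_repeat(["101", "1101", "11101", "1101", "101"]))
print(first_repeat(["a", "b", "c"]))
```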

### `while` Loops
A `while` loop is similar to a `for` loop, except that instead of running a fixed number of iterations it keeps going as long as its test condition is true and stops once the condition becomes false.  For example, last week we noted that any even number (greater than 0) eventually becomes odd if you divide it by two enough times.  Here's an example `while` loop illustrating this.  Feel free to modify it to use a different start value.  What happens if we start with an odd number?

In [None]:
n = 128
while n % 2 == 0:
    n = n // 2
    print(n)
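A variation on the same loop, sketched as a function that also counts how many halvings were needed. The name `halve_until_odd` is made up for illustration, and the guard against non-positive input is an addition, since `0` would otherwise never become odd.

```python
def halve_until_odd(n):
    """Return (odd_part, halvings) for a positive integer n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    halvings = 0
    while n % 2 == 0:   # keep going as long as n is even
        n = n // 2
        halvings += 1
    return n, halvings


print(halve_until_odd(128))
print(halve_until_odd(96))
```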

## Post Tag System
We now have all the working pieces to describe a Post tag system.  We begin with the system that first gave Post trouble when he tried to characterize its behavior.  We start with a string consisting of `0`s and `1`s and update it according to the following rules.

1. If the length of the string is less than 3, stop.
1. If the first character is `0` append `00` and remove the first three characters.
1. If the first character is `1` append `1101` and remove the first three characters.

Note that the second rule makes the string one character shorter (two appended, three removed) and the third rule makes it one character longer (four appended, three removed).  Try to work out what happens to these initial strings using pencil and paper:

1. `"101"`
1. `"1111"`

In both cases you should notice that we eventually get into a loop.  How many steps does it take to reach the loop, and how big is the loop?

Here is code to perform our rule (notice that we take a shortcut by appending the new string and lopping off the first three characters in the same line):

In [None]:
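def post(s):
    # Rule 1: strings shorter than 3 stop; we signal this with the empty string.
    if len(s) < 3:
        return ""
    # Rule 2: leading "0" -> append "00", then drop the first three characters.
    if s[0] == "0":
        return (s + "00")[3:]
    # Rule 3: leading "1" -> append "1101", then drop the first three characters.
    else:
        return (s + "1101")[3:]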
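As a quick sanity check on how each rule changes the length of the string, here is a small sketch. It repeats the definition of `post` so that it runs on its own; the start strings and the helper name `length_change` are arbitrary choices for illustration.

```python
def post(s):
    if len(s) < 3:
        return ""
    if s[0] == "0":
        return (s + "00")[3:]
    else:
        return (s + "1101")[3:]


def length_change(s):
    """How many characters one application of the rule adds (or removes)."""
    return len(post(s)) - len(s)


for start in ["0110", "1010"]:
    print(start, "->", post(start), "length change:", length_change(start))
```

A string starting with `0` shrinks by one character and a string starting with `1` grows by one, so whether a string dies out, loops, or grows depends on the mix of leading characters it encounters along the way.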

Just as we did for the Collatz problem, it is helpful to write a helper function that will run our rule over and over again:

In [None]:
def post_iterated(s, n):
    for i in range(n):
        print(s, " -> ", end="")
        s = post(s)
    print(s)

post_iterated("101", 6)

Looking at the output from this command, we see that after four steps we start repeating `-> 10100 -> 001101 -> 10100 -> ...`, which is a loop of period 2.  Perform the same analysis for `post_iterated("1111", 6)`, noting that you may need to increase the second argument to find the loop.  If we try the starting string `"11111"` we will eventually hit a loop, but it takes more time to find it and it is tedious to figure out by hand.  Tedious problems are why we have computers!

Let's modify our program: instead of giving it a fixed number of steps, we will run it until we reach a string we have seen before.  Note that we are guessing that a duplicate will eventually appear; under some rules a string may keep changing forever, in which case this loop would never stop.

In [None]:
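def post_iterated_set(s):
    visited = set()  # every string seen so far
    # Keep stepping until the current string is one we have already seen.
    while s not in visited:
        visited.add(s)
        print(s, " -> ", end="")
        s = post(s)
    # Print the repeated string that closes the loop.
    print(s)

post_iterated_set("11111")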
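To answer "how many steps to reach the loop, and how big is it?" without counting arrows by hand, one option is to remember the step at which each string first appeared. The sketch below does this with a dictionary instead of a set. The name `find_loop` and the `max_steps` safety limit are assumptions added here, and `post` is repeated so the cell runs on its own.

```python
def post(s):
    if len(s) < 3:
        return ""
    if s[0] == "0":
        return (s + "00")[3:]
    else:
        return (s + "1101")[3:]


def find_loop(s, max_steps=10_000):
    """Return (steps_before_loop, loop_length).

    Returns None if no repeat shows up within max_steps applications of the
    rule.  A string that stops shows up as a loop of length 1 on "", because
    post("") is again "".
    """
    first_seen = {}  # string -> step number at which it first appeared
    step = 0
    while s not in first_seen:
        if step >= max_steps:
            return None
        first_seen[s] = step
        s = post(s)
        step += 1
    return first_seen[s], step - first_seen[s]


print(find_loop("101"))  # (4, 2): four steps in, then a loop of period 2
```

The `max_steps` limit is there because we are only guessing that a repeat exists; without it, a string that never repeats would keep the loop running forever.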

### A "Collatz" Post Tag System
As a second example we consider a different system with the following rules.  Strings in this system are built from the characters `"a"`, `"b"`, and `"c"`:

1. If the length of the string is less than 2, stop.
1. If the first character is `a` append `bc` and remove the first two characters.
1. If the first character is `b` append `a` and remove the first two characters.
1. If the first character is `c` append `aaa` and remove the first two characters.

In [None]:
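def post_collatz(s):
    # Strings shorter than 2 stop; we signal this with the empty string.
    if len(s) < 2:
        return ""
    # Leading "a" -> append "bc", then drop the first two characters.
    if s[0] == "a":
        return (s + "bc")[2:]
    # Leading "b" -> append "a", then drop the first two characters.
    elif s[0] == "b":
        return (s + "a")[2:]
    # Leading "c" -> append "aaa", then drop the first two characters.
    else:
        return (s + "aaa")[2:]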

def post_collatz_iterated(s, n):
    for i in range(n):
        print(s, " -> ", end="")
        s = post_collatz(s)
    print(s)

Okay, but why are we calling this "Collatz" and what does it have to do with the `3x + 1` problem?

First, we are interested in what happens to strings consisting entirely of `a`s.  Let us begin with strings that have an even number of `a`s.  We will iterate exactly as many times as the length of the string we start with.  E.g.,

1. `post_collatz_iterated("aa", 2)`
1. `post_collatz_iterated("aaaa", 4)`
1. `post_collatz_iterated("aaaaaa", 6)`

In [39]:
post_collatz_iterated("aaaaaa", 6)

aaaaaa  -> aaaabc  -> aabcbc  -> bcbcbc  -> bcbca  -> bcaa  -> aaa


If you play around with even numbers of `a`s you should come to the conclusion that in exactly `len(s)` steps we get another string of `a`s of half the length, and that no intermediate step consists of all `a`s.  The first half of the steps replaces the initial string with a string of equal length of the form `bcbc...bc`, and the second half replaces each pair `bc` with a single `a`.
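This observation can be checked mechanically for a range of even lengths. The sketch below is one way to do it; the helper names `trajectory`, `is_all_a`, and `halves_cleanly`, and the upper limit of 40, are choices made here for illustration. `post_collatz` is repeated so the cell runs on its own.

```python
def post_collatz(s):
    if len(s) < 2:
        return ""
    if s[0] == "a":
        return (s + "bc")[2:]
    elif s[0] == "b":
        return (s + "a")[2:]
    else:
        return (s + "aaa")[2:]


def trajectory(s, steps):
    """Return the list of strings produced by `steps` applications of the rule."""
    out = []
    for _ in range(steps):
        s = post_collatz(s)
        out.append(s)
    return out


def is_all_a(s):
    """True for a non-empty string made up only of the character a."""
    return s != "" and set(s) == {"a"}


def halves_cleanly(n):
    """Check the claim for a start string of n a's (n even, n >= 2)."""
    steps = trajectory("a" * n, n)
    no_early_a = not any(is_all_a(t) for t in steps[:-1])
    return steps[-1] == "a" * (n // 2) and no_early_a


# True if the claim holds for every even length from 2 to 40.
print(all(halves_cleanly(n) for n in range(2, 41, 2)))
```

A check like this only covers the lengths it tries; it supports the pattern but is not a proof for all even `n`.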

This matches half of the Collatz rule, which states that if `n` is even the next number is `n/2`.

What about odd numbers?

Here our new system has a slight tweak from our original Collatz rule.  In the original form of the Collatz rule we mapped every odd number `n` to `3n + 1`.  One thing we could have noted is that this always takes an odd number to an even number, and therefore the next step in the sequence will always be to divide this new number by `2`.  Therefore we sometimes shorten the Collatz rule by replacing the last rule with:

- If `n` is odd return `(3 * n + 1)//2`

and this turns out to be what our new system computes.  For example this new function maps:

- `3 -> (3 * 3 + 1) // 2 = 5`
- `5 -> (3 * 5 + 1) // 2 = 8`

In [43]:
post_collatz_iterated("aaaaa", 8)

aaaaa  -> aaabc  -> abcbc  -> cbcbc  -> cbcaaa  -> caaaaaa  -> aaaaaaaa  -> aaaaaabc  -> aaaabcbc
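In the output above, the string of five `a`s becomes a string of eight `a`s after six steps, matching `5 -> 8`. To compare the tag system with the shortened Collatz rule over a range of starting lengths, here is one possible sketch. The names `next_all_a` and `collatz_short`, the `max_steps` limit, and the range `2..29` are assumptions made for illustration; `post_collatz` is repeated so the cell runs on its own.

```python
def post_collatz(s):
    if len(s) < 2:
        return ""
    if s[0] == "a":
        return (s + "bc")[2:]
    elif s[0] == "b":
        return (s + "a")[2:]
    else:
        return (s + "aaa")[2:]


def next_all_a(s, max_steps=10_000):
    """Apply the rule until the string is again all a's; None if that never happens."""
    for _ in range(max_steps):
        s = post_collatz(s)
        if s != "" and set(s) == {"a"}:
            return s
    return None


def collatz_short(n):
    """Shortened Collatz rule: n/2 for even n, (3n + 1)/2 for odd n."""
    if n % 2 == 0:
        return n // 2
    return (3 * n + 1) // 2


# Lengths where the tag system and the shortened rule disagree.
mismatches = [n for n in range(2, 30)
              if len(next_all_a("a" * n)) != collatz_short(n)]
print(mismatches)  # [] means the two rules agree on every length tried
```

The comparison starts at `2` because a single `a` is shorter than two characters, so the tag system stops there, just as the Collatz sequence is usually considered finished when it reaches `1`.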
