### Doing Things Again (and Again)

The theme for today's class is repeating things. Or, rather, it's getting started with making python repeat things. At its most basic level, most of the revolution in computation over the last 100 years, from adding machines and mechanical calculators onwards, has been about making computers do simple things over and over again, faster and faster. Making use of this power consists mostly of figuring out how to reduce complicated tasks down to simple ones. Modern computers provide many levels of **abstraction** to help with this, by hiding the details of how things work. Abstraction is a bottom-up view of the same hierarchy of complexity: computers, for example, deal in binary, but in python, we get to use base 10. Or, more relevant to this course, we get to deal with texts, and rely on Unicode to convert letters and symbols into numbers, which in turn get converted into 1s and 0s and so on.

Python is a high level language, which means we only have to break things down so far (no 1s and 0s needed), and we get a lot of tools to help with the job. One caveat, though: if we want to make python do things over and over (and we do—have you ever counted the words in a novel by hand?\*) we're going to have to keep track of them, which means we have to learn to count like python.

\* Terrifyingly, people used to do this for Shakespeare plays to try to figure out who wrote them. It's actually (arguably) the root of all of DH.

## 0-based Indexing

This might be one of the things you already knew about python, or about computers in general (it's actually not true of all programming languages, though): counting starts with 0. It's weird and it's unintuitive, but that's the way it is.

Consider the following string:

`H E R E   A R E   S O M E   L E T T E R S`
`0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0`

So letter one is 'E', which isn't too bad, except that while the last letter in the string is letter 20, try running this next cell:


In [15]:
str = "HERE ARE SOME LETTERS"
len(str)

21

If this annoys you as much as it should, it may or may not help to know that the alternative would be worse, since there are, in fact, 21 characters and it would suck to lose one of them.

Incidentally, there's a word for what happens when you get this mixed-up: **off-by-one** error. It'll happen *all the time* in this class, and you should be okay with that. Sometimes, it's actually easiest to wing it and fix the results, rather than figure out where you need to be subtracting or adding one.
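Here's a small sketch of the off-by-one trap in action. The variable name `text` is my own choice for illustration; the point is only that the last valid index is always `len(text) - 1`, and asking for index `len(text)` fails.

```python
text = "HERE ARE SOME LETTERS"

length = len(text)              # 21 characters in total
last_letter = text[length - 1]  # the last valid index is 20, not 21
print(length, last_letter)

# Asking for index 21 is the classic off-by-one mistake.
try:
    text[length]
    went_wrong = False
except IndexError:
    went_wrong = True
    print("There is no letter 21: the indexes run from 0 to 20")
```

If you see an `IndexError` like this one, checking whether you're off by one is usually a good first guess.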

Now that we know how to count letters, though, we can take advantage of this to pull apart our string. This is called **string slicing**, though it also works on a few other things (notably lists, which we'll get to in a bit).

In [19]:
print(str)
str_prefix = str[0:6]
print(str_prefix)
letter_7 = str[7]
print(letter_7)
str_mid = str[5:15]
print(str_mid)

bad_str = "evil_word"
print(bad_str[0:9])

HERE ARE SOME LETTERS
HERE A
E
ARE SOME L
evil_word


In [7]:
# Incidentally, you don't need to store the results in a variable:
print(str[10:21])
print(str)

OME LETTERS
HERE ARE SOME LETTERS


As you can see, slicing up the string doesn't actually change it. If you want to keep your slice, you need to save it somewhere. This is because strings in python are what's called *immutable*: once you've said what `str` is, you can't change the thing. Imagine it like this: your string is some words. When you assign them to `str`, you put them in a box and label it `str`. Anytime you want, you can look in the box, count the letters, etc. And you can copy them to another box, or only a set of them or whatever. *BUT*, you can't change what's in the box. The only thing you can do is take the `str` label and stick it on another box. 
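To see the "you can't change what's in the box" rule for yourself, here is a minimal sketch (the variable name `text` is just for illustration). It tries to overwrite a single letter in place and catches the error python raises when it refuses.

```python
text = "HERE ARE SOME LETTERS"

# Trying to change a single letter in place is not allowed.
try:
    text[0] = "T"
    changed = True
except TypeError as error:
    changed = False
    print("Python refused:", error)

print(text)  # the original string is untouched
```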

You might think this is sort of restrictive, but it's actually not, because we can fake a change to `str` really easily:

In [12]:
str = "new string"
print(str)
str = str[0:3]
print(str)

new string
new


If this seems weird and out of bounds to you, it may help to know that it works because everything on the right of the `=` is *evaluated* before anything is assigned, so by the time the assignment happens, there's no reference to `str` left (the box has been opened up, the letters looked at and the slice calculated) and there's nothing circular about it.

## Lists

Ultimately we're aiming at the idea of the loop here, where we can use python to do something to each of a number of elements in turn. But while it's possible to loop over strings (and we will in this course!), it's helpful to introduce another python construction first: the list.

The list is the first of Python's two major data structures, which basically means it's a way of organizing other things, which could be integers, strings, or anything else we can think of (even other lists).

In the way we interact with it, the list is a lot like a string: it puts each of its elements in order, and lets us access them by their index number (which, like for strings, starts from 0). Python's lists are enclosed in square brackets, and the items are separated by commas, so we have `word_list = ["these", "are", "words", "in", "a", "list"]`, and `word_list[1]` is `"are"`, `word_list[0:2]` is `["these", "are"]`, etc.
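A quick sketch of list indexing and slicing, using the same `word_list`. The other variable names are mine, for illustration only. Note that a slice stops just before its second number, exactly as it does for strings.

```python
word_list = ["these", "are", "words", "in", "a", "list"]

second_word = word_list[1]    # indexes start at 0, so 1 is the second item
first_two = word_list[0:2]    # slices stop just before the second number
last_word = word_list[len(word_list) - 1]

print(second_word)  # are
print(first_two)    # ['these', 'are']
print(last_word)    # list
```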


In [3]:
word_list = ["these", "are", "words", "in", "a", "list"]
print(f"The word list contains {len(word_list)} elements")

The word list contains 6 elements


In [4]:
# use a slice to create a new list that contains the first half of the list (from zero to halfway). 
# you should use integer division: // (this list has 6 elements, but you don't want to write something where you're asking for the 8.5th element in the list)
new_list = word_list[]

SyntaxError: invalid syntax (<ipython-input-4-ec1c4e164e31>, line 3)
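If you get stuck, here is one possible way to fill in the slice. Treat it as a sketch rather than the only answer; `halfway` and `odd_list` are names I've made up for the example.

```python
word_list = ["these", "are", "words", "in", "a", "list"]

# // is integer division, so the result is always a whole number
halfway = len(word_list) // 2
new_list = word_list[0:halfway]
print(new_list)  # ['these', 'are', 'words']

# with an odd number of elements, // rounds down instead of giving 2.5
odd_list = ["one", "two", "three", "four", "five"]
odd_half = odd_list[0:len(odd_list) // 2]
print(odd_half)  # ['one', 'two']
```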

### List notes
- If you have experience coding in a non-python environment, the list works much more like a dynamic array or vector than a linked-list. 
- For... reasons, you can store elements of multiple types in lists. You may occasionally use this as a way to bundle together information, e.g., `["string", word_count]`, but it's not a common use-case.
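As a rough illustration of the second note, here is a list that bundles a word with a count (the count is invented for the example), plus a list of lists. All the names here are hypothetical.

```python
# a word bundled together with a count (the number is made up for illustration)
entry = ["remaindered", 2]
word = entry[0]
count = entry[1]
print(type(word), type(count))  # one string, one integer, same list

# lists can also hold other lists
nested = [["these", "are"], ["words", "in"], ["a", "list"]]
print(nested[1])     # ['words', 'in']
print(nested[1][0])  # words
```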

## `for` loops

Part of the value of lists is that we can use them as structures to organize repetition, which we accomplish with **loops**. A loop is a way to say "do x over and over until y". In python, the simplest loops are loops over structures like lists or strings, where we can write a loop that does something to each of the elements in turn. 

For example: 

In [6]:
for word in word_list:
    print(word)

these
are
words
in
a
list


Okay, let's take that apart a bit. 

First, the syntax: we write for \<thing> in \<structure> followed by a colon, and then indent the contents of the loop underneath (like an if-statement). 

Second, the \<thing>: this is actually a way to assign a variable, so we can call it anything we want, not just `word`. Python doesn't give us words because it sees what we've asked for; instead it just gives us whatever the constituent unit of the thing we're looping over is. For a list, that's each element in the list, like this monstrosity here:

In [7]:
nums = [1, 2, 3, 4, 5]
for word in nums:
    print(word)

1
2
3
4
5


Incidentally, what do you think will happen if we do this:

In [8]:
ex_string = "This is a short-ish string"
for elem in ex_string:
    print(elem)

T
h
i
s
 
i
s
 
a
 
s
h
o
r
t
-
i
s
h
 
s
t
r
i
n
g


Was that what you expected? Maybe, maybe not. Incidentally, you might be offended that if we store, say, a line of poetry in a list, it's really ugly when we print it: `['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered']` ([Excellent poem, by the way](https://web.cs.dal.ca/~johnston/poetry/bookofmyenemy.html)). We can fix this by using our new loop structure to turn the list into a string. We'll start with an empty string, and then for each word in our list, we'll add it to the string (remember how we learned last time that when we "change" a string, we actually make a new string, though!).



In [16]:
ls = ['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered']

result = ""
for word in ls:
    result = result + word

Try adding a line to print the result. Does it match your dreams of beautifully formatted poetry (in monospace type)? Yeah, mine neither. Not all of our code in the loop has to refer to anything to do with the loop, though. So, each time we run through the loop to add another word to `result`, we can also add other things to tidy it up, too. Try changing the loop to clean up the output. You can either add another line to add extra characters to `result`, or you can take advantage of the way that strings get added together and do it all with just one line: `result = result + word + [other stuff]`

That's fine and good, but unless you get quite creative (we'll get creative later in this quarter) it's hard to get all the spacing right (you usually end up with extra spaces at the beginning or the end of the string).
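Here is a sketch of that spacing problem, assuming the simplest fix of adding a space after every word. It uses `repr()` so that the quotes around the string show up and the stray space at the end becomes visible.

```python
ls = ['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered']

result = ""
for word in ls:
    result = result + word + " "   # add a space after every word

# repr() shows the quotes, which makes the stray space at the end visible
print(repr(result))
```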

Instead, we can make python do most of our work for us, using two string-related functions that give us ways to convert from strings to lists and back. Both of these functions are what are called "methods", which as far as we're concerned just means that they look a bit different when we use them. Instead of writing `function(object)` like we do with `len()` or `print()`, with methods, we write `object.method()`. In general, methods are more specific to the kind of thing they're acting on: you can take the length of lots of things, but turning a string to lowercase is a method in part because you can really only do that to strings. Let's start with our two string<->list methods:

In [25]:
ls = ['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered'] # by the way, "list" is python's own name for the list type, so it's better not to call your list "list" (same with "str")
s = "And I am glad"
print(ls)
print(s)

str_from_ls = ''.join(ls)
ls_from_str = s.split()
print(str_from_ls)
print(ls_from_str)

['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered']
And I am glad
Thebookofmyenemyhasbeenremaindered
['And', 'I', 'am', 'glad']


Okay, so several things here. First of all, `string.split()` probably worked the way we wanted. It breaks the string on all occurrences of whitespace (including `\n`, by the way) and adds each piece to a list. If you want, you can pass a string inside the parentheses, so that the string gets broken on that separator instead, say `'\n'`, like in the example below. (The separator is matched as a whole: `'.!?'` would split only where those three characters appear together, not on each punctuation mark.)
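A small sketch of how `split()` behaves with and without an argument. The sample strings are made up for illustration; the thing to notice is that a separator you pass in is matched as one whole chunk, not character by character.

```python
s = "And I am glad\nIn vast   quantities"

# no argument: break on any run of whitespace (spaces, tabs, newlines)
on_whitespace = s.split()
print(on_whitespace)   # ['And', 'I', 'am', 'glad', 'In', 'vast', 'quantities']

# one argument: break only on that exact separator
on_newlines = s.split('\n')
print(on_newlines)     # ['And I am glad', 'In vast   quantities']

# the separator is matched as a whole, not character by character
t = "Really?! Yes.!?No"
on_exact = t.split('.!?')
print(on_exact)        # ['Really?! Yes', 'No']
```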

Now, `str.join()`. This one... makes less sense. Sorry about that. `join` is just kind of awkward. Would it make more sense for it to be `<thing we join>.join('separator')`? Yes and no. Yes because that's how a normal person would expect it to work; no because the thing we want to join is probably not a string, and this is a string method. You win some, you lose some. Now, in terms of actually fixing our new string, the empty string I passed in as a separator is probably not what we wanted. If we change it to `' '` or something else useful, we should have proper results.
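For example, here is a sketch of `join` with a couple of different separators (the names `spaced` and `dashed` are mine). The separator only ever goes between items, so there are no stray spaces at the ends to clean up.

```python
ls = ['The', 'book', 'of', 'my', 'enemy', 'has', 'been', 'remaindered']

spaced = ' '.join(ls)     # the separator goes between items, never at the ends
print(spaced)             # The book of my enemy has been remaindered

dashed = '-'.join(ls)     # any string can be the separator
print(dashed)             # The-book-of-my-enemy-has-been-remaindered

# split and join undo each other here
print(spaced.split() == ls)   # True
```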

In [30]:
long_str = ("The book of my enemy has been remaindered\n"\
            "And I am pleased.\n"\
            "In vast quantities it has been remaindered\n"\
            "Like a van-load of counterfeit that has been seized\n"\
            "And sits in piles in a police warehouse,\n"\
            "My enemy's much-prized effort sits in piles\n"\
            "In the kind of bookshop where remaindering occurs.\n"\
            "Great, square stacks of rejected books and, between them, aisles\n"\
            "One passes down reflecting on life's vanities,\n"\
            "Pausing to remember all those thoughtful reviews\n"\
            "Lavished to no avail upon one's enemy's book --\n"\
            "For behold, here is that book\n"\
            "Among these ranks and banks of duds,\n"\
            "These ponderous and seemingly irreducible cairns\n"\
            "Of complete stiffs.")
print(long_str)
long_ls = long_str.split('\n')
print(long_ls)

The book of my enemy has been remaindered
And I am pleased.
In vast quantities it has been remaindered
Like a van-load of counterfeit that has been seized
And sits in piles in a police warehouse,
My enemy's much-prized effort sits in piles
In the kind of bookshop where remaindering occurs.
Great, square stacks of rejected books and, between them, aisles
One passes down reflecting on life's vanities,
Pausing to remember all those thoughtful reviews
Lavished to no avail upon one's enemy's book --
For behold, here is that book
Among these ranks and banks of duds,
These ponderous and seemingly irreducible cairns
Of complete stiffs.
['The book of my enemy has been remaindered', 'And I am pleased.', 'In vast quantities it has been remaindered', 'Like a van-load of counterfeit that has been seized', 'And sits in piles in a police warehouse,', "My enemy's much-prized effort sits in piles", 'In the kind of bookshop where remaindering occurs.', 'Great, square stacks of rejected books and, between

(In general you should avoid directly writing long strings into your code, but if you have to, take note: if you open parentheses, python treats everything up to the closing parenthesis as one line, so the backslashes at the ends of the lines above are optional; outside parentheses, a backslash at the end of a line does the same job. And recall that `'str' 'ing'` is `'string'` to python.)
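A shorter sketch of the same trick, this time relying on the parentheses alone. The names `short_str` and `glued` are just for illustration.

```python
# inside parentheses, the line breaks are ignored, so no backslashes are needed
short_str = ("And I am pleased.\n"
             "In vast quantities it has been remaindered")
print(short_str)

# neighbouring string literals are glued together into one string
glued = 'str' 'ing'
print(glued == 'string')  # True
```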

## Other string methods



In [34]:
string = ""

string.upper() # returns an ALL-CAPS version of the string
string.lower() # more useful than the former; check out str.casefold() if working with non-English Unicode

string.isupper() # these return True if all the cased chars in the string are upper/lower case (and there is at least one) - most useful for checking single chars
string.islower()

string.isalpha() # these methods return True if all of the characters in the string are alphabetic, alphanumeric, etc. All work logically with Unicode
string.isalnum()
string.isdigit()
string.isdecimal() # "anything that can be used to form numbers in base 10", according to the documentation
string.isspace() # checks for all whitespace, not just ' '; string must not be length 0

string = 'Great, square stacks of rejected books'
result = ''
for char in string:
    pass # write code here to check if the letter is uppercase. if it is, add the lowercase version to result; otherwise, add something else (whatever you like)
print(result)
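Before you fill in the loop, it may help to see what a few of these methods return on some small strings. This is only a sketch: the sample strings are my own, and it deliberately stops short of solving the exercise.

```python
sample = "Great, square stacks"

print(sample.upper())        # GREAT, SQUARE STACKS
print(sample.lower())        # great, square stacks

print("G".isupper())         # True
print("g".isupper())         # False
print(",".isupper())         # False: punctuation has no case at all
print("abc123".isalpha())    # False: the digits are not letters
print("abc123".isalnum())    # True
print(" \n\t".isspace())     # True
print("".isspace())          # False: the empty string never passes

print("Straße".lower())      # straße
print("Straße".casefold())   # strasse
```

Note that none of these calls change `sample` itself: like slicing, each one hands back a new value.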


