# Conceptual review
* Functions that are simple and abstract can be used in a wide variety of contexts.
* Abstract functions avoid hard-coding specifics wherever possible.
* We can test whether statements are `True` or `False` with Boolean expressions.
* We can use Boolean values and conditionals like `if` to perform specific actions given specific conditions.
* We can use looping constructs like `while` to perform actions repeatedly for as long as a Boolean condition holds.
* We can combine all of these elements to write useful functions.

# Code review

## Boolean expressions
A Boolean expression is any expression that evaluates to `True` or `False`.

In [1]:
100 > 1

True

In [3]:
100 == 1 # remember == means "is equal to"

False

In [4]:
100 != 1 # != means "is not equal to"

True

Another important Boolean operator for the work we do in this class is `in`:

In [5]:
beatles = 'we all live in a yellow submarine'

In [6]:
'yellow' in beatles

True

`in` checks whether one object is contained in another. For strings, the characters on the left must all appear in the string on the right, together and in the same order. So:

In [7]:
'yello' in beatles # because 'yello' appears in 'yellow'

True

In [8]:
'yellowy' in beatles # because not all of those characters appear in sequence

False

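To test more than one condition, write out each `in` check in full and join them with `and` or `or`. A shortcut like `('yellow' and 'submarine') in beatles` only tests one of the two words:

In [9]:
'yellow' in beatles and 'submarine' in beatles

True

In [11]:
'yellow' in beatles or 'submersible' in beatles

True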

We will discuss `in` further later today. For now just know that it is another Boolean operator.
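
A short sketch of why each `in` test needs to be written out in full. When `and` or `or` joins two non-empty strings, Python evaluates to one of the strings rather than to `True` or `False`, so a parenthesized shortcut ends up testing only one word. The variable `lyric` and the word `'purple'` are made up for illustration.

```python
lyric = 'we all live in a yellow submarine'

# `and` between two non-empty strings evaluates to the second string
combined = ('purple' and 'submarine')
print(combined)            # submarine

# so this only asks whether 'submarine' is in lyric; 'purple' is never checked
shortcut = ('purple' and 'submarine') in lyric
print(shortcut)            # True, even though 'purple' is absent

# writing out each test checks both words
explicit = 'purple' in lyric and 'submarine' in lyric
print(explicit)            # False
```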

## Conditional expressions
A conditional executes a code block depending on whether a given expression is `True` or `False`.

In [12]:
if 100 > 1:
    print('numbers have meaning')

numbers have meaning


In [14]:
if 'apple' != 'apple': # conditionals are always evaluated
    print('words have no meaning') # but the conditional code block is not always executed
else:
    print('words have meaning')

words have meaning


In [15]:
anything = ''

if not anything: # this form checks whether a variable holds an empty value, such as '' or None
    print('there isn\'t anything in anything')

there isn't anything in anything


In [18]:
anything = None

In [19]:
type(anything)

NoneType
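
As a rough illustration of what `not` is testing: `if not anything` succeeds whenever the value counts as empty or false, which includes `''`, `None`, and `0`. The sketch below uses the built-in `bool` to show how Python treats a few values; the sample values are arbitrary.

```python
# bool() shows whether Python treats a value as True or False in a condition
empty_string = bool('')       # False
none_value = bool(None)       # False
zero = bool(0)                # False
some_text = bool('anything')  # True

print(empty_string, none_value, zero, some_text)

# to test specifically for None, compare with `is`
anything = None
if anything is None:
    print('anything is None')
```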

# `while` loops
`while` allows us to repeatedly check `if` a condition is `True`. If `True`, it re-executes a code block. Once the condition is `False`, the `while` loop ends.

In [20]:
x = 0

while x < 10:
    print(x)
    x += 1

0
1
2
3
4
5
6
7
8
9


In [1]:
s = 'I\'m meltinggggggggggggg'

while len(s) > 0:
    s = s[:-1] # slice off the last character of the string
    print(s)

I'm meltingggggggggggg
I'm meltinggggggggggg
I'm meltingggggggggg
I'm meltinggggggggg
I'm meltingggggggg
I'm meltinggggggg
I'm meltingggggg
I'm meltinggggg
I'm meltingggg
I'm meltinggg
I'm meltingg
I'm melting
I'm meltin
I'm melti
I'm melt
I'm mel
I'm me
I'm m
I'm 
I'm
I'
I



In [245]:
dots = 1

while dots < 10:
    print('.' * dots)
    dots += 1
    
while dots > 0:
    print('.' * dots)
    dots -= 1

.
..
...
....
.....
......
.......
........
.........
..........
.........
........
.......
......
.....
....
...
..
.


# Combining Booleans, Conditionals, and `while` loops to improve our `KWIC` function
We can bring all of these elements together to write a better keywords-in-context function.

# Describing our function with pseudocode

Writing "pseudocode" is a useful practice that describes what you *intend* your code to do in natural language. It helps us think about the structure and logic of our program without worrying about the details of programming it.

Think about it like an outline for an essay.

## `kwic` pseudocode

Goal: Given a text already open as a `string`, find every instance of any word. For every instance, `print` the word to the screen along with an arbitrary number of surrounding characters.

Steps:
1. Find the location of the first instance of a word in a text.
2. Print that instance to the screen.
3. Use that instance to search for the *next* instance of the same word.
4. Repeat until `find` returns `-1`, indicating that there are no other instances of the word in the text.
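
The steps above can be sketched on a short string before we try them on a whole novel. This is only an illustration: the name `find_all`, the sample sentence, and the choice to collect positions in a list (a structure covered later in these notes) are assumptions, not the function we build below.

```python
def find_all(word, text):
    """Return the character position of every instance of word in text."""
    positions = []
    loc = text.find(word)               # step 1: find the first instance
    while loc != -1:                    # step 4: stop once find returns -1
        positions.append(loc)           # step 2: record (or print) this instance
        loc = text.find(word, loc + 1)  # step 3: search again, starting just past it
    return positions

sample = 'the cat sat on the mat near the door'
print(find_all('the', sample))  # [0, 15, 28]
```

Checking for `-1` at the top of the loop means the loop body never runs on a failed search.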

In [107]:
text = '/Users/e/Downloads/huck.txt'
text = open(text).read()
text[:100]

"YOU don't know about me without you have read a book by the name of The\nAdventures of Tom Sawyer; bu"

In [108]:
def find_next(word, text, loc = 0):
    return text.lower().find(word, loc) # this returns the character position of the next instance

In [114]:
def kwic(loc, text, window = 300):
    mn = 0
    mx = len(text)
    start = loc - window
    stop = loc + window
    
    if start < mn:
        start = mn
        stop = window
    if stop > mx:
        start = mx - window
        stop = mx
        
    return text[start:stop]
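
To see how the boundary checks behave, here is a hedged sketch of the same windowing idea on a tiny string. It clamps the window with the built-ins `max` and `min` instead of `if` statements, which is a substitute technique rather than the function above; `kwic_window` and the sample text are hypothetical names.

```python
def kwic_window(loc, text, window=5):
    """Return up to `window` characters on either side of position loc."""
    start = max(0, loc - window)          # never start before the first character
    stop = min(len(text), loc + window)   # never run past the last character
    return text[start:stop]

sample = 'abcdefghijklmnopqrstuvwxyz'

print(kwic_window(12, sample))  # window fits entirely: hijklmnopq
print(kwic_window(1, sample))   # clipped at the start: abcdef
print(kwic_window(24, sample))  # clipped at the end: tuvwxyz
```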

In [115]:
def get_kwics(word, text):
    loc = find_next(word, text) # note that here I call my custom function find_next()
    print('word:', word)
    print('loc:', loc)
    print('kwic:', kwic(loc, text))

In [None]:
get_kwics('huck', text)

As written above, the function `get_kwics` calls our other two functions, `find_next` and `kwic`, to automatically get the *first* instance of any word.

This is often a good programming problem-solving strategy: figure out how to do it once, then figure out how to do it many times.

Now, we need to repeatedly execute this set of steps. We can do so with a `while` loop, which we can run inside our function `get_kwics`. Each pass searches from just after the previous location, so the loop moves forward through the text:

In [116]:
def get_kwics(word, text, loc = 0):
    while loc != -1:
        loc = find_next(word, text, loc + 1)
        print('word:', word)
        print('loc:', loc)
        print('kwic:')
        print(kwic(loc, text))
        print('-'*50)

In [117]:
get_kwics('hunch', text)

word: hunch
loc: 318015
kwic:
up like glory, she was so glad her uncles
was come. The king he spread his arms, and Mary Jane she jumped for
them, and the hare-lip jumped for the duke, and there they had it!
 Everybody most, leastways women, cried for joy to see them meet again
at last and have such good times.

Then the king he hunched the duke private--I see him do it--and then he
looked around and see the coffin, over in the corner on two chairs; so
then him and the duke, with a hand across each other's shoulder, and
t'other hand to their eyes, walked slow and solemn over there, everybody
dropping back to give them room,
--------------------------------------------------
word: hunch
loc: 457081
kwic:
'd been a-going to do.
 So Tom says:

“What's the vittles for?  Going to feed the dogs?”

The nigger kind of smiled around gradually over his face, like when you
heave a brickbat in a mud-puddle, and he says:

“Yes, Mars Sid, A dog.  Cur'us dog, too.  Does you want to go en look at
'im?”

There are a lot of ways we could improve this function. (We're going to do one for homework!)

# Big take-aways
1. Functions that are simple and abstract can be used in many contexts. 
2. Never do the same thing twice. The computer can do that for you.
3. Pseudocode helps you plan out your code.
4. Functions can call each other.
5. `while` loops and `if` / `else` pairs help you take specific actions given specific conditions in your functions.

# Lists

Now we'll learn:

* What **lists** are
* How to create lists
* How to find things in lists
* How to edit lists
* How to add/remove things from lists

Why do lists matter for us? Fundamentally, they allow us to store any *ordered collection of objects* in Python. This will be everything from lists of numbers, to lists of words, to lists of texts.

This is the fundamental structure of the list:

```python
['item 0', 'item 1', 'item 2']
```

Lists are surrounded by square brackets `[]`, with individual items in the list separated by `,` characters. You can put pretty much anything into a list.

We can assign lists to any variable:

In [38]:
l = ['item 0', 'item 1', 'item 2']

And we can access list items using the same bracket notation that we use for slicing:

In [39]:
l[0]

'item 0'

Slicing a list for multiple items gives you a list:

In [40]:
l[1:3]

['item 1', 'item 2']

In [41]:
len('the last letter is y')

20

Negative indexes count from the end, so `-1` gives the last item of a list (or the last character of a string):

In [42]:
l[-1]

'item 2'

Lists make it easy for us to count by word rather than by character:

In [51]:
len('This sentence has five words')

28

In [52]:
len(['This', 'sentence', 'has', 'five', 'words'])

5

Lists can include any kind of data:

In [53]:
['this is still a list', 32, 'another number']

['this is still a list', 32, 'another number']

### Lists in lists
Lists can even include other lists! We can use this to store tabular data, like matrices.

In [260]:
l = ['this is the 0th element of a list', ['this', 'is', 'another', 'list', 'in', 'a', 'list']]

In [261]:
l[0]

'this is the 0th element of a list'

In [262]:
l[1]

['this', 'is', 'another', 'list', 'in', 'a', 'list']

In [263]:
l

['this is the 0th element of a list',
 ['this', 'is', 'another', 'list', 'in', 'a', 'list']]
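
A brief sketch of how indexing extends to nested lists: each pair of brackets goes one level deeper, which is what makes a list of lists usable as a table. The `table` data here is invented for illustration.

```python
# each inner list is one row: [word, count]
table = [
    ['yellow', 3],
    ['submarine', 1],
]

first_row = table[0]       # the whole first row, itself a list
first_word = table[0][0]   # row 0, then column 0
second_count = table[1][1] # row 1, then column 1

print(first_row)     # ['yellow', 3]
print(first_word)    # yellow
print(second_count)  # 1
```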

# Finding out where items are in a list

In [46]:
l = ['a', 'b', 'kjhc' , 'b', 'uyfkyc', 'c']

In [47]:
l.index('c')

5

`index` works much the same as `find`, except that it matches whole list items rather than substrings (so `'c'` does not match `'kjhc'`). The second argument is the index to begin searching from.

In [62]:
l.index('b', 2)

3
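
One difference from `find` is worth a sketch: when the item is missing, `index` raises a `ValueError` rather than returning `-1`, so checking with `in` first is a common precaution. The list `letters` is a made-up example.

```python
letters = ['a', 'b', 'c', 'b']

first_b = letters.index('b')      # 1
next_b = letters.index('b', 2)    # search starts at index 2, so this finds 3

# 'z' is not in the list, so test membership before calling index
if 'z' in letters:
    z_position = letters.index('z')
else:
    z_position = -1   # our own stand-in, mimicking what find returns for strings

print(first_b, next_b, z_position)  # 1 3 -1
```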

# Editing lists

Lists are *mutable*, meaning that they can be changed in place. This makes them unlike strings and integers, which are *immutable*.

In [48]:
l

['a', 'b', 'kjhc', 'b', 'uyfkyc', 'c']

In [49]:
l[0]

'a'

In [50]:
l[0] = 'something completely different'

Read the above notation as: `l` at index `0` takes the value of `'something completely different'`.

In [51]:
l

['something completely different', 'b', 'kjhc', 'b', 'uyfkyc', 'c']
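
To make the contrast with strings concrete, here is a small sketch. Assigning to an index works on a list but raises a `TypeError` on a string; the `try`/`except` statement used to catch that error has not been introduced in these notes and appears only so the example runs to completion.

```python
words = ['spam', 'eggs']
words[0] = 'ham'          # lists can be changed in place
print(words)              # ['ham', 'eggs']

greeting = 'hello'
try:
    greeting[0] = 'j'     # strings cannot be changed in place
    string_changed = True
except TypeError:
    string_changed = False

# to "change" a string, build a new one and reassign the name
greeting = 'j' + greeting[1:]
print(string_changed, greeting)   # False jello
```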

### Adding new things to lists
We can also add new things to lists using `append`:

In [57]:
l.append('the last thing')

In [58]:
l

['something completely different',
 'b',
 'kjhc',
 'b',
 'uyfkyc',
 'c',
 'the last thing']

Sometimes you'll want to add *multiple* items to a list. We can do this using `extend`:

In [59]:
m = [1, 2, 3]

In [60]:
l.extend(m)

In [61]:
l

['something completely different',
 'b',
 'kjhc',
 'b',
 'uyfkyc',
 'c',
 'the last thing',
 1,
 2,
 3]

Look how `append` behaves by contrast:

In [62]:
l = [1, 2, 3]
m = ['a', 'b', 'c']
l.append(m)
l

[1, 2, 3, ['a', 'b', 'c']]

We get a list *inside* of a list because `append` adds *the whole object* to the end of the list.

By contrast, `extend` adds each of the object's elements to the end of the list in order:

In [264]:
piggy = ['this','little','piggy']
market = ['went','to','market']
piggy.extend(market)
piggy

['this', 'little', 'piggy', 'went', 'to', 'market']

You can get the same combined list with `+`, which returns a new list instead of changing either original:

In [87]:
aretha = ['R', 'E', 'S', 'P']
franklin = ['E','C','T']
aretha + franklin

['R', 'E', 'S', 'P', 'E', 'C', 'T']
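
A quick sketch of the difference between the two: `extend` changes the existing list, while `+` builds a new list and leaves both originals alone. The variable names are arbitrary.

```python
first = ['R', 'E', 'S', 'P']
second = ['E', 'C', 'T']

combined = first + second   # a brand-new list
print(first)                # unchanged: ['R', 'E', 'S', 'P']

first.extend(second)        # modifies first in place and returns None
print(first)                # ['R', 'E', 'S', 'P', 'E', 'C', 'T']
```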

# Removing things from lists
You can remove list elements by value:

In [63]:
l = ['one', 'does', 'not', 'donuts', 'belong']

In [64]:
l.remove('donuts')

In [65]:
l

['one', 'does', 'not', 'belong']

But `remove` only targets the first instance it sees:

In [66]:
l = ['good', 'bad', 'bad']
l.remove('bad')
print(l)
l.remove('bad')
print(l)

['good', 'bad']
['good']


Of course we could rewrite this with a `while` loop:

In [68]:
l = ['good', 'bad', 'bad', 'good']

In [69]:
'bad' in l

True

In [70]:
while 'bad' in l:
    l.remove('bad')

In [71]:
l

['good', 'good']

### Removing list elements by index
You can also remove list elements by index with `del`:

In [72]:
planets = ['earth', 'venus', 'pluto']
planets.index('pluto')

2

In [270]:
del planets[2] # this works because lists are mutable
planets

['earth', 'venus']

The same approach works in a single step. Here we look up the index inside the brackets, this time removing `'venus'`:

In [186]:
planets = ['earth', 'venus', 'pluto']
del planets[ planets.index('venus')] # perform the calculation here instead
planets

['earth', 'pluto']

# Generating lists of words from strings
Lists help us measure a fundamental unit for text analysis, the word. We can `split` strings into words like so:

In [81]:
biggie = 'it was all a dream'

In [82]:
biggie2 = biggie.split(' ')

In [83]:
biggie

'it was all a dream'

In [85]:
biggie2.remove('dream')

In [86]:
biggie2

['it', 'was', 'all', 'a']

In the `split` method, the first argument is the *string* you want Python to split on.

You can split by *any* character or character sequence:

In [74]:
biggie = 'it was all a dream / i used to read word up magazine'

In [274]:
biggie_lines = biggie.split(' / ')
biggie_lines

['it was all a dream', 'i used to read word up magazine']

The string you `split` on is called `sep`, short for separator. Splitting by space is a good first approximation for words.
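
A small sketch of how the `sep` argument changes the result. Calling `split()` with no argument is standard `str.split` behavior: it splits on any run of whitespace (spaces, tabs, newlines) and drops the empty strings. The sample string is invented.

```python
line = 'it  was all\na dream'   # two spaces after 'it', and a newline

by_space = line.split(' ')      # splits at every single space
by_whitespace = line.split()    # splits on any run of whitespace

print(by_space)        # ['it', '', 'was', 'all\na', 'dream']
print(by_whitespace)   # ['it', 'was', 'all', 'a', 'dream']
```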

## Counting words
One obvious benefit of `split` is that it allows us to easily count *words* rather than characters:

In [87]:
huck = open('/Users/e/Downloads/huck.txt').read()

In [88]:
huck_list = huck.split(' ')

In [89]:
len(huck_list)

101012

I'm sure you can imagine how `split` would be useful for things like our `kwic` function: we could request whole words rather than just specific character counts!

# Dealing with `split`
Let's look at what we get:

In [90]:
huck_list[:20]

['YOU',
 "don't",
 'know',
 'about',
 'me',
 'without',
 'you',
 'have',
 'read',
 'a',
 'book',
 'by',
 'the',
 'name',
 'of',
 'The\nAdventures',
 'of',
 'Tom',
 'Sawyer;',
 'but']

Not bad! Most of these things are words, but we should clean a couple of them up.

We have at least two problems here: our old friend `\n` and a new problem. Humans would count `Sawyer;` as an instance of `Sawyer`, but computers wouldn't. We need to figure out how to separate our punctuation from the words that it is right next to so that they will be counted correctly.

Let's try writing a quick text cleaning function to see how much better we can do:

In [279]:
def text_list(text):
    text = text.lower()
    text = text.replace('\n', ' ') # replacing \n with a space since we are splitting by space
    # this puts a space on either side of common punctuation:
    text = text.replace(',', ' , ').replace('.', ' . ').replace('?', ' ? ').replace('!', ' ! ').replace(';', ' ; ')
    return text.split(' ')

In [280]:
test = text_list(huck)

In [92]:
l = ['yes', 'no', 'maybe']
l[1:]

['no', 'maybe']

In [281]:
test[:20]

['you',
 "don't",
 'know',
 'about',
 'me',
 'without',
 'you',
 'have',
 'read',
 'a',
 'book',
 'by',
 'the',
 'name',
 'of',
 'the',
 'adventures',
 'of',
 'tom',
 'sawyer']

Looks better!

There are still a few things that could trip our function up. One of the big ones has to do with extra spacing:

In [282]:
s = 'this  string   has      many   extra    spaces'

In [283]:
text_list(s)

['this',
 '',
 'string',
 '',
 '',
 'has',
 '',
 '',
 '',
 '',
 '',
 'many',
 '',
 '',
 'extra',
 '',
 '',
 '',
 'spaces']

There are a lot of ways we could address this problem.

One way to do it using only what we already know might be the following:

In [284]:
# checks to see if there are two consecutive space characters anywhere in the string.
# while True, it replaces all of them with a single space
while '  ' in s:
    s = s.replace('  ', ' ')

In [285]:
text_list(s)

['this', 'string', 'has', 'many', 'extra', 'spaces']

We could also do this using what we just learned about indexing, with a `while` loop and an `in` condition:

In [286]:
s = 'this  string   has      many   extra    spaces'
l = text_list(s)
print(l)

['this', '', 'string', '', '', 'has', '', '', '', '', '', 'many', '', '', 'extra', '', '', '', 'spaces']


In [287]:
while '' in l:
    del l[l.index('')]

l

['this', 'string', 'has', 'many', 'extra', 'spaces']

But it would be more common to approach this problem using another common Python structure: the `for` loop.

# `for` loops
`for` loops are very similar to `while` loops.

Whereas a `while` loop executes until its condition changes from `True` to `False`, a `for` loop executes once for *every element in a series*. We might read them like this:

```python
for each_element in my_object:
    # do something
```

Like `while` loops, `for` loops execute all of their code, then return to the top to check their condition. If there is another element in the series, the `for` loop begins again. Once all of the elements have been exhausted, the `for` loop ends.
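
To make the comparison concrete, here is a sketch of the same loop written both ways. The `while` version has to manage its own index; the `for` version hands us each element directly. The list `words` is a made-up example.

```python
words = ['after', 'great', 'pain']

# while version: track the position ourselves
while_result = []
i = 0
while i < len(words):
    while_result.append(words[i].upper())
    i += 1

# for version: Python supplies each element in turn
for_result = []
for word in words:
    for_result.append(word.upper())

print(while_result == for_result)  # True
```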


Lists are one object type where `for` loops are especially useful. They look like this in combination:

In [100]:
dickinson = 'After great pain, a formal feeling comes –'
dickinson_list = dickinson.split(' ')
dickinson_list

['After', 'great', 'pain,', 'a', 'formal', 'feeling', 'comes', '–']

In [101]:
for word in dickinson_list:
    print(word)

After
great
pain,
a
formal
feeling
comes
–


Each of those lines represents one full execution of the code inside the loop (`print(word)`), as you can see here:

In [102]:
counter = 0

for word in dickinson_list:
    print(counter, word)
    counter += 1

0 After
1 great
2 pain,
3 a
4 formal
5 feeling
6 comes
7 –
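
Python's built-in `enumerate` can do this bookkeeping for us. It is not used elsewhere in these notes, so treat the following as an optional sketch; `sample_words` and `numbered` are stand-in names.

```python
sample_words = ['After', 'great', 'pain,']

numbered = []
# enumerate supplies a running count alongside each element
for counter, word in enumerate(sample_words):
    print(counter, word)
    numbered.append(str(counter) + ' ' + word)  # keep each line so we can inspect it
```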


One tricky thing about `for` loops is that the name you give to the loop variable (`word` above) is arbitrary. Above, I chose a name that makes semantic sense, but there's no reason you can't use the same name in every `for` loop if it helps you remember the structure.

It's common to use `x` in `for` loops and other short functions. As you can see, it does not change the output:

In [103]:
for x in dickinson_list:
    print(x)

After
great
pain,
a
formal
feeling
comes
–


Note that `for` loops use the same indentation logic as `while` loops and functions. You indent under the `for` loop to choose code to execute.

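`for` loops can be run on any object that holds a series of elements (any *iterable*, in Python's terms). Strings have a length and can be stepped through one character at a time, so they work, too: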

In [104]:
len(dickinson)

42

In [105]:
for character in dickinson:
    print(character)

A
f
t
e
r
 
g
r
e
a
t
 
p
a
i
n
,
 
a
 
f
o
r
m
a
l
 
f
e
e
l
i
n
g
 
c
o
m
e
s
 
–


This shows that `for` loops work on *each element* in the given series. Since strings measure by character, the `for` loop executes one character at a time.

That's why they're so useful for us when we're working with lists: we can perform actions one *word* at a time.

In [302]:
dickinson_list

['After', 'great', 'pain,', 'a', 'formal', 'feeling', 'comes', '–']

In [106]:
counter = 0

for word in dickinson_list:
    if counter < 3:
        print(word.upper())
    if counter == 3:
        print(word * 20)
    if counter > 3:
        print(word[::-1]) # this is a little slicing trick to reverse a string
    counter += 1

AFTER
GREAT
PAIN,
aaaaaaaaaaaaaaaaaaaa
lamrof
gnileef
semoc
–


Of course, `for` loops are very useful when we want to perform the same math operation on a list of numbers:

In [233]:
numbers = [23, 23401, 2, 7564, 394]

for number in numbers:
    print(number * 100)

2300
2340100
200
756400
39400


Usually we will not want to `print` our results to the terminal like this. Instead, we will want to add our results to another list to perform further calculations on.

To do this in Python, we need to initialize an empty list, and `append` values to it as our operations are performed:

In [305]:
input_data = [1, 0, 0, 1, 1, 1, 1, 0]
output_list = [] # initialize empty list outside of the for loop

for number in input_data:
    output_list.append(number * 3)
    
print(output_list)

[3, 0, 0, 3, 3, 3, 3, 0]


Here's another example of the same principle:

In [306]:
my_input = 'This is going to turn into a list of characters'
my_output = [] # initialize empty list to append to

for character in my_input:
    my_output.append(character)

print(my_output)

['T', 'h', 'i', 's', ' ', 'i', 's', ' ', 'g', 'o', 'i', 'n', 'g', ' ', 't', 'o', ' ', 't', 'u', 'r', 'n', ' ', 'i', 'n', 't', 'o', ' ', 'a', ' ', 'l', 'i', 's', 't', ' ', 'o', 'f', ' ', 'c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's']
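
Returning to the extra-spaces problem from earlier, the same initialize-and-`append` pattern gives one possible `for` loop solution: keep only the items that are not empty strings. This is a sketch; `raw_words` and `clean_words` are hypothetical names.

```python
s = 'this  string   has      many   extra    spaces'
raw_words = s.split(' ')   # contains '' wherever spaces were doubled up

clean_words = []           # initialize empty list outside of the for loop
for word in raw_words:
    if word != '':         # skip the empty strings
        clean_words.append(word)

print(clean_words)  # ['this', 'string', 'has', 'many', 'extra', 'spaces']
```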
