In [7]:
from __future__ import print_function
from functools import reduce
from operator import add

# Python Exercises

The purpose of this notebook is to practice the python concepts that are especially useful for writing Spark applications. Here we work in the controlled environment of a plain python shell, which makes debugging simpler -- later on we will use these same constructs in the Spark framework. 

### How to use this notebook
You can (and should) execute all the cells in this notebook. Where your input is required, you will see `<FILL>` in the source code with some instructions. Replace those with working code, execute, debug, rinse, repeat. 

## 0. Data types
This is a quick primer on various high-level python data types that we will be using. If you are not familiar with these at least superficially already, we recommend you first find a python tutorial of some sort. 

### Lists

A list is just that -- an ordered collection of objects. You can put just about any python object into a list. Most likely, we will be dealing with lists of arrays, lists of tuples, lists of dictionaries, etc. A list is really an essential building block!

In [3]:
# a list of integers
my_list = [1,2,3,4]

The methods of a `list` object are not many, but they are quite useful. 

Add an empty cell below this one and type 

```
my_list.
```

followed by tapping "tab" to see the list of methods. 
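
As a rough illustration of what a few of those methods do, here is a minimal sketch on a separate, made-up list called `demo` (it does not touch `my_list`, so the exercise below is unaffected):

```python
# Hypothetical list used only for illustration; my_list is left alone
demo = [3, 1, 2]

demo.extend([5, 4])   # add several items at once -> [3, 1, 2, 5, 4]
demo.insert(0, 0)     # insert the value 0 at index 0 -> [0, 3, 1, 2, 5, 4]
last = demo.pop()     # remove and return the last item
demo.sort()           # sort the list in place
print(demo, last)
```

Note that `extend`, `insert` and `sort` modify the list in place and return `None`, while `pop` returns the removed item.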

In [4]:
# TODO: append the value 1 to my_list, making it [1,2,3,4,1]
<FILL>
assert(len(my_list) == 5 and sum(my_list) == 11)


#### Indexing 
List indexing tricks will be essential to your experience with any sort of application in Spark. Here are a few of the most common ones: 

In [None]:
# slices --> getting consecutive items from a list
my_list[2:5]

In [None]:
# all elements from the first to the third
my_list[:3] # zero is implied

In [None]:
# reversing 
my_list[::-1]

In [None]:
# skipping elements --> here, getting every other one
my_list[::2]

In [None]:
# getting the second to last element
my_list[-2]

In [None]:
# TODO: make a new_list composed of all elements from my_list except for the first and last one
new_list = <FILL>
assert(new_list == [2,3,4])

In [None]:
# combining lists 
my_list + new_list

### Dictionaries
Dictionaries are certainly one of the most useful built-in data structures in python. A dictionary provides a mapping between "keys" and "values". It is best to look at some examples. 

In [None]:
# creating a dictionary
d = {} # makes an empty dictionary
type(d) 

In [None]:
# add an element
d['first'] = 1
d['second'] = 2
d

In [None]:
d['first'], d['second']

In [None]:
# TODO: iterate through all the keys of the dictionary "d" and print out their values
for <FILL> : 
    print(<FILL>)

Dictionaries have some very useful methods: 

In [None]:
# to see all the currently stored keys or values
print(d.keys())
print(d.values())

In [None]:
# to iterate through all the keys and values
for key,value in d.items() :   # items() works in both python 2 and 3; iteritems() is python 2 only
    print(key, value)

In [None]:
# alternative way of initializing a dictionary: 
d = {'first': 1, 'second': 2}
print(d)
d['third'] = 3
print(d)

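**Note:** before python 3.7, you cannot trust that the keys from a dictionary are returned in the order they were entered! From python 3.7 on, insertion order is preserved, but code that must also run on older versions should not depend on it.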
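
When the order matters, one option is to sort explicitly instead of relying on the dictionary's own ordering. A minimal sketch, using a made-up dictionary called `ages`:

```python
# Hypothetical dictionary used only for illustration
ages = {'carol': 41, 'alice': 30, 'bob': 25}

# sorted() on a dictionary iterates over its keys
ordered_keys = sorted(ages)
print(ordered_keys)

# items() yields (key, value) tuples, which can be sorted by value
by_age = sorted(ages.items(), key=lambda kv: kv[1])
print(by_age)
```

Sorting produces a new list each time, so the dictionary itself is unchanged.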

### Strings
Strings are not really a complex data type, but like everything else in python they are objects, and they have some nice properties. 

In [None]:
# they can be indexed like any other collection
string = 'what is going on here'
print('third to seventh characters: "%s"'%string[2:7])
print('last character: "%s"'%string[-1])

... you get the idea... 

In [None]:
# getting words from a string
string2 = 'one,two,three,four'
print(string.split())    # default splits on whitespace
print(string2.split(',')) # but you can specify any delimiter you want
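
The inverse of `split` is the string method `join`, which is called on the separator and glues a list of strings back together. A small sketch with a made-up string `csv_line`:

```python
# Hypothetical input used only for illustration
csv_line = 'red,green,blue'

colors = csv_line.split(',')     # string -> list of strings
rejoined = ' | '.join(colors)    # list of strings -> string, with a new separator
print(colors)
print(rejoined)
```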

## 1. `map`

Use the python `map` function to convert the first and last letters of each word in the string `test_string` to uppercase.

*hint*: use the standard string method `split` to create a list of words; then use a `map` to convert the appropriate letters of each word

*hint \#2:* Use `Edit -> Split Cell` to create easily-executable code chunks that you can debug. When they all run individually, you can merge them back together.
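
If `map` is new to you, here is a minimal sketch of how it behaves on made-up numeric data (the names `numbers`, `square`, `squares` and `doubled` are only for illustration and are not part of the exercise):

```python
# Hypothetical input, unrelated to the exercise string
numbers = [1, 2, 3, 4]

def square(x):
    return x * x

# map applies `square` to every element of `numbers`;
# in python 3 the result is a lazy iterator, so wrap it in list() to see the values
squares = list(map(square, numbers))
print(squares)

# a lambda works well for short one-off functions
doubled = list(map(lambda x: 2 * x, numbers))
print(doubled)
```

Functions that accept any iterable, such as the string method `join`, can consume the result of `map` directly without the `list()` call.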

In [None]:
# From Sun Tzu's Art of War
test_string = 'The supreme art of war is to subdue the enemy without fighting.'

words = <FILL>

def first_last_capitalize(word) : 
    # first convert the string `word` to a list of characters
    l = <FILL>
    
    # now change the first and last character to uppercase (use the upper() method of a string)
    l[0] = <FILL>
    l[-1] = <FILL>
    
    # convert back to a string
    return str("".join(l))

upper_lower = <FILL>

result = " ".join(upper_lower)
print(result)
assert(result == 'ThE SupremE ArT OF WaR IS TO SubduE ThE EnemY WithouT Fighting.')

## 2. List comprehension and tuples

Use a list comprehension to convert the list of words into a list of tuples, where the first element of the tuple is the word and the second element is the word length.


*hint:* use the python built-in len() function to get the string length
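
As a reminder of the syntax, here is a sketch of a list comprehension that builds tuples from made-up numeric data (not the exercise data):

```python
# Hypothetical input used only for illustration
numbers = [1, 2, 3, 4]

# each element of the new list is a (value, value squared) tuple
pairs = [(n, n * n) for n in numbers]
print(pairs)

# tuples can be unpacked into separate names while looping
for value, squared in pairs:
    print(value, squared)
```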

In [None]:
word_length = <FILL>

In [None]:
print(word_length)
assert(word_length == [('The', 3),
 ('supreme', 7),
 ('art', 3),
 ('of', 2),
 ('war', 3),
 ('is', 2),
 ('to', 2),
 ('subdue', 6),
 ('the', 3),
 ('enemy', 5),
 ('without', 7),
 ('fighting.', 9)])

## 3. `reduce`

Compute the average word length in the sentence by: 

1. mapping the `word_length` list from above to contain just the word lengths
2. using `reduce` to sum up the lengths
3. dividing by the total number of words
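
For reference, `reduce` repeatedly combines the running result with the next element of a sequence. A minimal sketch on a made-up list `values`:

```python
from functools import reduce
from operator import add, mul

# Hypothetical input used only for illustration
values = [1, 2, 3, 4]

total = reduce(add, values)     # computed as ((1 + 2) + 3) + 4
product = reduce(mul, values)   # computed as ((1 * 2) * 3) * 4
print(total, product)

# an optional third argument gives the starting value,
# which also makes reduce safe to call on an empty sequence
empty_total = reduce(add, [], 0)
print(empty_total)
```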

In [None]:
word_counts = <FILL>

In [None]:
total_chars = <FILL>

In [None]:
import numpy as np
print(float(total_chars)/len(words))
assert(np.allclose(float(total_chars)/len(words),4.33333333333))

## 4. Generators

Write a generator that yields the words with an even number of characters and assign it to `result`. There are at least two possible solutions! 
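
The two usual ways of building a generator are a function that uses `yield` and a generator expression. The sketch below shows both on made-up numeric data; the name `odd_squares` is hypothetical and chosen only for this illustration:

```python
# 1. a generator function: calling it returns a generator object
def odd_squares(numbers):
    for n in numbers:
        if n % 2 == 1:
            yield n * n

gen_func = odd_squares(range(6))

# 2. a generator expression: like a list comprehension, but with parentheses
gen_expr = (n * n for n in range(6) if n % 2 == 1)

# values are produced lazily, one at a time
first = next(gen_func)
rest = list(gen_func)       # whatever has not been consumed yet
all_expr = list(gen_expr)
print(first, rest, all_expr)
```

A generator can only be consumed once: after `list(gen_func)` has run, the generator is exhausted and a second `list(gen_func)` returns an empty list.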

In [None]:
result = <FILL>

In [None]:
assert(list(result) == ['of', 'is', 'to', 'subdue'])