# CPS600 - Python Programming for Finance 
###  
<img src="https://www.syracuse.edu/wp-content/themes/g6-carbon/img/syracuse-university-seal.svg?ver=6.3.9" style="width: 200px;"/>

# More Python

###  September 4, 2018

## Strings

### More string methods.

In [1]:
t = 'this is a string object'
t.capitalize()

'This is a string object'

In [2]:
t.split()

['this', 'is', 'a', 'string', 'object']

In [3]:
t.find('string')

10

In [4]:
t.find('Python')

-1

In [6]:
t.replace(' ','|')

'this|is|a|string|object'

In [7]:
'http://www.python.org'.strip('htp:/')

'www.python.org'
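One caveat: the argument to `strip` is treated as a *set of characters*, not as a prefix, so it can remove more than you expect. Here is a small sketch; the URL `http://photos.org` is just an illustrative string.

```python
url = 'http://photos.org'

# strip() removes any of the characters h, t, p, :, / from both ends,
# so it also eats the leading "ph" of "photos"
print(url.strip('htp:/'))   # otos.org

# To remove an exact prefix, check for it and slice it off
prefix = 'http://'
if url.startswith(prefix):
    url = url[len(prefix):]
print(url)                  # photos.org
```

Python 3.9 and later also provide `str.removeprefix` for exactly this job.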

### It is often convenient to have template strings into which different values can be substituted (e.g. when prompting or responding to user activity in a program). Here is the old, `%`-style way of doing that:

In [8]:
brand = 'Apple'
exchange_rate = 1.31123245
message = 'The price of this %s laptop is %d USD and the exchange rate is %4.2f' % \
    (brand, 1399, exchange_rate)
print(message)

The price of this Apple laptop is 1399 USD and the exchange rate is 1.31


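### Note the digits in `'%4.2f'`. The `f` formats a `float`, the `.2` asks for 2 digits after the decimal point, and the `4` is the *minimum* total width of the field (including the decimal point), padded with spaces if needed.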
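To see what the width and precision numbers do, here is a quick experiment on the same exchange rate. The square brackets are only there to make the padding visible.

```python
rate = 1.31123245

print('[%4.2f]' % rate)    # [1.31]      width 4 is already filled by '1.31'
print('[%8.2f]' % rate)    # [    1.31]  padded on the left to 8 characters
print('[%08.2f]' % rate)   # [00001.31]  zero-padded instead of space-padded
print('[%.4f]' % rate)     # [1.3112]    4 digits after the decimal point
```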

### And here is the newer way of doing it, with the `str.format` method:

In [83]:
brand = 'Apple'
price = 1399
exchange_rate = 1.31123245
message = ("The price of this {0:s} laptop "
            "is {1:d} USD and the exchange "
            "rate is {2:4.2f}").format(brand, price, exchange_rate)
print(message)

The price of this Apple laptop is 1399 USD and the exchange rate is 1.31


### Note that we defined a long string across multiple lines. Adjacent string literals inside parentheses are joined into a single string. That means we get no line breaks, but also no extra spaces: any space between the pieces has to be written out explicitly.

In [84]:
a_long_string = ("I am a very "
                "very very "
                "very long string "
                "and now I have stopped.")

In [None]:
a_long_string
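Here is a side-by-side comparison of the two ways to spread a string over several lines. The variable names are just for illustration.

```python
# Adjacent string literals inside parentheses are joined at compile time,
# with nothing inserted between them, so spaces must be written explicitly
joined = ("I am a very "
          "long string.")
print(joined)        # I am a very long string.

# A triple-quoted string, by contrast, keeps the line break
kept = """I am a very
long string."""
print(repr(kept))    # 'I am a very\nlong string.'
```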

## Input

In [None]:
firstName = input("What is your first name? ")
lastName = input("What is your last name? ")
salary = input("What is your desired salary? ")

### What if we want that last input to be a numeric? Then we'll have to *cast* it as such.
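`input` always returns a `str`, so a numeric answer has to be converted, and the conversion can fail. Below is a minimal sketch built around a hypothetical helper, `parse_salary`, which is not part of any library. It accepts text such as `"85,000"` and returns `None` when the text is not a number. We pass strings in directly rather than calling `input`, so the cell runs without waiting for a user.

```python
def parse_salary(text):
    """Hypothetical helper: convert user text like '85,000' to a float.

    Returns None if the text cannot be interpreted as a number.
    """
    cleaned = text.strip().replace(',', '')  # drop surrounding spaces and thousands separators
    try:
        return float(cleaned)
    except ValueError:
        return None

print(parse_salary("85,000"))   # 85000.0
print(parse_salary("lots"))     # None
```

In the notebook you would call it as `parse_salary(input("What is your desired salary? "))`.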

## Type Casting

### In Python, every object has a *type*. Values can often be converted from one type to another, and sometimes you want to be explicit about how you are treating a given object.

In [None]:
int(3.14)

In [None]:
int('1234')

In [None]:
float(7)

In [None]:
float("21.742")

In [None]:
str(21.762)
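Converting to `int` is easy to misread: it *truncates* toward zero rather than rounding. The sketch below contrasts it with `math.floor`, `math.ceil` and `round`. Note that `round` sends exact ties to the even neighbour, which matters when rounding money.

```python
import math

print(int(3.99), int(-3.99))                # 3 -3    int() truncates toward zero
print(math.floor(-3.99), math.ceil(3.01))   # -4 4    floor rounds down, ceil rounds up
print(round(2.5), round(3.5))               # 2 4     ties go to the even neighbour
print(float('1e3'))                         # 1000.0  scientific notation is accepted

try:
    int('3.14')    # a string with a decimal point is not a valid int literal
except ValueError as err:
    print('ValueError:', err)
```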

### Note that even these numerics are objects and come with their own methods. For example:

In [None]:
c = 0.5324242
c.as_integer_ratio()

## The `decimal` module

### In financial applications we will sometimes need more precision in floating point values than is used by default in Python.
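Precision is only half the story. Binary floats also cannot represent most decimal fractions (like `0.1`) exactly, which is a problem for currency amounts. Here is a short comparison. It assumes Decimals are built from *strings*: building one from a float copies the float's error.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 exactly, so small errors appear
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Decimals built from strings hold exact decimal values
print(Decimal('0.1') + Decimal('0.2'))                    # 0.3
print(Decimal('0.1') + Decimal('0.2') == Decimal('0.3'))  # True

# Building a Decimal from a float copies the float's binary error
print(Decimal(0.1) == Decimal('0.1'))                     # False
```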

In [9]:
import decimal
from decimal import Decimal

In [10]:
decimal.getcontext()

Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[InvalidOperation, DivisionByZero, Overflow])

In [11]:
d = Decimal(1) / Decimal(11)
d

Decimal('0.09090909090909090909090909091')

### We can change the precision as follows

In [12]:
decimal.getcontext().prec = 4 # Lowering precision

In [13]:
e = Decimal(1) / Decimal(11)
e

Decimal('0.09091')

In [14]:
decimal.getcontext().prec = 50 # Raising precision

In [15]:
f = Decimal(1) / Decimal(11)
f

Decimal('0.090909090909090909090909090909090909090909090909091')

In [16]:
g = d + e + f
g

Decimal('0.27272818181818181818181818181909090909090909090909')
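Notice that `g` is only as trustworthy as its least precise input, `e`: raising the precision later does not bring back digits that were already rounded away. Two related tools are often useful:

- `decimal.localcontext()` changes the precision only inside a `with` block, leaving the global setting alone.
- `Decimal.quantize` rounds to a fixed number of places, such as cents, with an explicit rounding rule.

A small sketch:

```python
import decimal
from decimal import Decimal, ROUND_HALF_UP

# Temporarily use 6 significant digits; the previous context is restored afterwards
with decimal.localcontext() as ctx:
    ctx.prec = 6
    seventh = Decimal(1) / Decimal(7)
print(seventh)   # 0.142857

cents = Decimal('0.01')
print(Decimal('2.675').quantize(cents, rounding=ROUND_HALF_UP))  # 2.68
print(Decimal('2.665').quantize(cents))  # 2.66, the context's default ROUND_HALF_EVEN

# For comparison, the float 2.675 is stored slightly below 2.675
print(round(2.675, 2))   # 2.67
```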

## Datetimes & Regular Expressions

In [17]:
import re
series = """
    '01/18/2014 13:00:00', 100, '1st';
    '01/18/2014 13:30:00', 110, '2nd';
    '01/18/2014 14:00:00', 120, '3rd'
    """

### Below we create a regular expression object.

In [18]:
dt = re.compile(r"'[0-9/:\s]+'") # raw string, so \s is passed to the regex engine unchanged

### Next, we use the `findall` method of that object to search the `series` string we defined previously.

In [19]:
result = dt.findall(series)
result

["'01/18/2014 13:00:00'", "'01/18/2014 13:30:00'", "'01/18/2014 14:00:00'"]

### You'll really want to refer to the [documentation](https://docs.python.org/3/library/re.html) when using regular expressions for anything much harder.

### The results are just strings, but we can convert them to `datetime` objects.

In [None]:
from datetime import datetime
pydt = datetime.strptime(result[0].replace("'",""),
                        '%m/%d/%Y %H:%M:%S')
pydt

In [None]:
print(pydt.second,pydt.minute,pydt.hour)

In [None]:
print(pydt)

In [None]:
print(type(pydt))
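Going one step further, we can parse *every* match and then do arithmetic on the results: subtracting two `datetime` objects gives a `timedelta`. This self-contained sketch repeats the `series` string from above.

```python
import re
from datetime import datetime

series = """
    '01/18/2014 13:00:00', 100, '1st';
    '01/18/2014 13:30:00', 110, '2nd';
    '01/18/2014 14:00:00', 120, '3rd'
    """

dt = re.compile(r"'[0-9/:\s]+'")

# Parse every match, stripping the surrounding quote characters
stamps = [datetime.strptime(m.strip("'"), '%m/%d/%Y %H:%M:%S')
          for m in dt.findall(series)]

# Subtracting consecutive datetimes gives timedelta objects
gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
print([g.total_seconds() / 60 for g in gaps])    # [30.0, 30.0]
```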

## Lists

### Lists are really, really important. They are a remarkably general data structure: ordered, mutable, and able to hold objects of any type, including other lists and even functions, as the example below shows.

In [21]:
import numpy as np
A=2


In [26]:
martinsList = ['apple',A, 3.14, np.pi, 42, lambda y: lambda x: x*y]
martinsList

['apple', 2, 3.14, 3.141592653589793, 42, <function __main__.<lambda>(y)>]

### Where to begin? Some basic properties of lists can be obtained from *builtin* functions.

In [None]:
len(martinsList) # The length of my list

### We can get individual elements of a list by specifying the *index*.

In [27]:
martinsList[-1]

<function __main__.<lambda>(y)>

### In Python, indexing starts at $0$, and negative indices count backwards from the end, so index $-1$ in the above example gives the last element. Beware, though: asking for an index outside the range raises an `IndexError`.

In [None]:
martinsList[100]
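The last element of `martinsList` is a function that returns another function, so it can be called twice in a row: `martinsList[-1](3)(4)` gives `12`. Here is the same idea in isolation; the names are just for illustration.

```python
# A function that returns a function: the outer lambda fixes y,
# the inner lambda multiplies its argument x by that y
make_multiplier = lambda y: lambda x: x * y

triple = make_multiplier(3)   # triple is now the function x -> x * 3
print(triple(4))              # 12
print(make_multiplier(2)(5))  # 10, calling both layers in one expression
```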

### As with strings, slicing is possible:

In [30]:
martinsList[1:4] #Starts at 1, ends at 3

[2, 3.14, 3.141592653589793]

In [31]:
martinsList[1:6:2] #Starts at 1, stops before 6, 2 steps at a time

[2, 3.141592653589793, <function __main__.<lambda>(y)>]
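The slicing rules are easier to see on a plain list of integers, where each item equals its own index:

```python
nums = [0, 1, 2, 3, 4, 5, 6, 7]

print(nums[1:4])    # [1, 2, 3]     start is included, stop is excluded
print(nums[1:6:2])  # [1, 3, 5]     every second item from index 1 up to (not including) 6
print(nums[:3])     # [0, 1, 2]     an omitted start means "from the beginning"
print(nums[-3:])    # [5, 6, 7]     negative indices count from the end
print(nums[::-1])   # [7, 6, 5, 4, 3, 2, 1, 0]  a negative step walks backwards
```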

### Lists are *mutable*. This is another concept that might sound a bit obscure, but it matters: you can change the items, append or insert new ones, and extend the list with the items of another list.

In [32]:
martinsList[4] += 5 #A little twist; incrementing the value at index 4

In [33]:
martinsList.append(6) #Tacking on the value 6 to the end of the list

In [34]:
martinsList

['apple', 2, 3.14, 3.141592653589793, 47, <function __main__.<lambda>(y)>, 6]

In [35]:
martinsList.extend(['a','b','c']) #Tacking on everything from ['a','b','c']

In [38]:
martinsList.insert(0,9) #insert(i, x) puts x just before index i; here, at the front

In [39]:
martinsList.insert(3,17) #Inserting at index 3, in the middle of the list

In [40]:
martinsList.pop() #Removes and returns last item

'c'

In [41]:
martinsList.pop(0) #Removes and returns item at specified index

9

In [42]:
martinsList

['apple',
 2,
 17,
 3.14,
 3.141592653589793,
 47,
 <function __main__.<lambda>(y)>,
 6,
 'a',
 'b']

In [44]:
del martinsList[4]
martinsList

['apple', 2, 17, 3.14, 47, <function __main__.<lambda>(y)>, 6, 'a', 'b']
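Mutability has a consequence worth seeing early: assigning a list to a second name does *not* copy it. Both names then refer to the same object. A short sketch with made-up prices:

```python
prices = [100, 101, 102]
alias = prices            # a second name for the *same* list object
snapshot = prices.copy()  # a new, independent (shallow) copy

alias.append(103)         # mutating through one name...
print(prices)             # [100, 101, 102, 103]  ...is visible through the other
print(snapshot)           # [100, 101, 102]       the copy is unaffected
print(alias is prices)    # True
```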

### Other builtin functions on lists are `sorted` and `sum`

In [45]:
my_num_list = [-10,23,-7,8,9,11,3,4]

In [None]:
sorted(my_num_list) #Returns another list, with the same items in order

In [46]:
my_num_list.sort() #Note that nothing is returned; it is *in place*

In [47]:
my_num_list

[-10, -7, 3, 4, 8, 9, 11, 23]
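`sum` was mentioned but not shown. `sorted` also accepts `reverse` and a `key` function that decides what to compare. Here both are applied to the same numbers:

```python
values = [-10, 23, -7, 8, 9, 11, 3, 4]

print(sum(values))                   # 41
print(sorted(values, reverse=True))  # [23, 11, 9, 8, 4, 3, -7, -10]
print(sorted(values, key=abs))       # [3, 4, -7, 8, 9, -10, 11, 23]  ordered by absolute value
```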

## Tuples

### Tuples are pretty much like lists that you cannot change, though you can compute their features and you can operate on them to make new tuples.

In [49]:
our_tuple = ('a','b','c','d','e')

### Check if an item is in a tuple

In [50]:
'd' in our_tuple

True

In [51]:
len(our_tuple)

5

### Getting new tuples from old:

In [54]:
print(our_tuple*2)
print(our_tuple[4:7])
print(our_tuple + tuple(reversed(our_tuple)))

('a', 'b', 'c', 'd', 'e', 'a', 'b', 'c', 'd', 'e')
('e',)
('a', 'b', 'c', 'd', 'e', 'e', 'd', 'c', 'b', 'a')

### Finally, let's get rid of the thing

In [None]:
del our_tuple #Removes the name; the tuple itself is freed once nothing refers to it
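Here is what "cannot change" means in practice, together with *tuple unpacking*, which is one of the most common ways tuples are used. The variable names are illustrative.

```python
point = (3, 4)

try:
    point[0] = 10            # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

x, y = point                 # tuple unpacking assigns each item to a name
print(x, y)                  # 3 4

moved = point + (5,)         # "changing" a tuple really means building a new one
print(moved)                 # (3, 4, 5)
```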

## Dictionaries

### You can think about dictionaries in various ways. They are like lists where you choose the indices. 

### They are also like *functions* (in the mathematical sense) where you specify all the inputs and outputs. 

### An example:

In [60]:
ourDict = {'Isabelle':28,'Anna':30,'Kevin':45,'Carmen':'Unknown'}

### The data of a dictionary is stored in `key:value` pairs. For instance `'Carmen'` is a key of the dictionary `ourDict` and `'Unknown'` is its associated value. 

### We retrieve that value using the same notation as for lists:

In [61]:
ourDict['Anna']

30
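Here are a few more everyday dictionary operations. They run on a separate copy of the same data, named `ages` so that `ourDict` is left untouched. The key `'Bob'` is made up for illustration.

```python
ages = {'Isabelle': 28, 'Anna': 30, 'Kevin': 45, 'Carmen': 'Unknown'}

print('Kevin' in ages)        # True: membership tests look at the keys
print(ages.get('Bob'))        # None: get() returns None for a missing key...
print(ages.get('Bob', 0))     # 0: ...or a default that you supply
# ages['Bob'] would raise a KeyError instead

ages['Bob'] = 52              # assigning to a new key adds a pair
ages['Anna'] += 1             # assigning to an existing key updates its value
print(ages['Anna'], len(ages))  # 31 5
```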

### Another way to generate the same dictionary is with the `zip` builtin

In [55]:
ourNames = ['Isabelle','Anna','Kevin','Carmen']
ourAges = [28,30,45,'Unknown']
alsoDict = dict(zip(ourNames,ourAges))

In [56]:
alsoDict

{'Isabelle': 28, 'Anna': 30, 'Kevin': 45, 'Carmen': 'Unknown'}

### It will be necessary at times to grab the keys or the values from the dictionary.

In [62]:
print(ourDict.values(),ourDict.keys())

dict_values([28, 30, 45, 'Unknown']) dict_keys(['Isabelle', 'Anna', 'Kevin', 'Carmen'])


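### That looks odd. What are those? They are *view* objects: they reflect later changes to the dictionary, and they can be turned into ordinary lists when needed.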

In [63]:
list(ourDict.values()) #This is always comforting.

[28, 30, 45, 'Unknown']

### We can also get the two together.

In [64]:
ourDict.items()

dict_items([('Isabelle', 28), ('Anna', 30), ('Kevin', 45), ('Carmen', 'Unknown')])
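Two consequences are worth checking. A view stays in sync with its dictionary, and `items()` is the usual way to loop over keys and values together. The sketch uses a copy named `ages` and a made-up key `'Bob'`.

```python
ages = {'Isabelle': 28, 'Anna': 30, 'Kevin': 45, 'Carmen': 'Unknown'}

keys = ages.keys()          # a view, not a snapshot
ages['Bob'] = 52
print('Bob' in keys)        # True: the view reflects the new entry

# items() pairs each key with its value, convenient in a for loop
for name, age in ages.items():
    print(name, age)
```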

## Sets

In [65]:
s = set(['u','d','ud','du','d','du'])
s

{'d', 'du', 'u', 'ud'}

In [66]:
t = set(['d','dd','uu','u'])

In [67]:
s.union(t)

{'d', 'dd', 'du', 'u', 'ud', 'uu'}

In [68]:
s.intersection(t)

{'d', 'u'}

In [69]:
s.difference(t)

{'du', 'ud'}

In [70]:
t.difference(s)

{'dd', 'uu'}

In [73]:
help(s.symmetric_difference)

Help on built-in function symmetric_difference:

symmetric_difference(...) method of builtins.set instance
    Return the symmetric difference of two sets as a new set.
    
    (i.e. all elements that are in exactly one of the sets.)



In [72]:
s.symmetric_difference(t) #Things in exactly one of t or s

{'dd', 'du', 'ud', 'uu'}
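The same four set operations also have operator forms: `|`, `&`, `-` and `^`. Here they are checked against the method calls on the same two sets:

```python
s = {'u', 'd', 'ud', 'du'}
t = {'d', 'dd', 'uu', 'u'}

print((s | t) == s.union(t))                 # True
print((s & t) == s.intersection(t))          # True
print((s - t) == s.difference(t))            # True
print((s ^ t) == s.symmetric_difference(t))  # True
```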

### Sets give us a quick way to get the unique members of a list, i.e., remove duplicate members.

In [74]:
from random import randint
l = [randint(0,10) for i in range(1000)]
len(l) # The length of l

1000

In [75]:
l[:20]

[5, 7, 4, 0, 5, 3, 10, 1, 0, 9, 6, 3, 4, 2, 6, 7, 7, 0, 10, 7]

In [76]:
s = set(l)
s

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
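One limitation: a set does not remember the order in which items first appeared. If order matters, a common idiom is `dict.fromkeys`, which relies on dictionaries preserving insertion order (guaranteed from Python 3.7). A sketch with made-up numbers:

```python
prices = [5, 7, 4, 5, 3, 7, 1]

unique_unordered = set(prices)                 # no guaranteed order
unique_in_order = list(dict.fromkeys(prices))  # keeps first occurrences, in order
print(unique_in_order)                         # [5, 7, 4, 3, 1]
print(len(unique_unordered) == len(unique_in_order))  # True
```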

## Numpy & Vectorization

# `numpy`

### This package provides fast n-dimensional arrays and tools for working with them, including linear algebra. Here we are defining an array:

In [78]:
import numpy as np

In [79]:
M = np.array(
            [[ 1., 0., 0.],
            [ 0., 1., 2.]]
            )
print(M)

[[1. 0. 0.]
 [0. 1. 2.]]


### We can ask for some *attributes* or features of that array.

In [80]:
M.shape

(2, 3)

### We can take slices or submatrices of it.

In [81]:
M2 = M[:,:2]
M3 = M2.copy()
M2.shape

(2, 2)

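### Slicing returns a *view* of the original data, not a copy. So scaling `M2` in place below also changes `M`, while the explicit copy `M3` is unaffected. We can do all sorts of operations, including matrix multiplication.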

In [82]:
M2 *= 2
print(M,'\n\n',M2, '\n\n ',M3)

[[2. 0. 0.]
 [0. 2. 2.]] 

 [[2. 0.]
 [0. 2.]] 

  [[1. 0.]
 [0. 1.]]
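To confirm which arrays share memory, `numpy` provides `np.shares_memory`. This self-contained sketch uses fresh names (`A`, `view`, `A_copy`) so that `M` is left alone.

```python
import numpy as np

A = np.array([[1., 0., 0.],
              [0., 1., 2.]])
view = A[:, :2]          # basic slicing returns a view onto A's data
A_copy = view.copy()     # copy() allocates new memory

print(np.shares_memory(A, view))     # True
print(np.shares_memory(A, A_copy))   # False

view *= 2                # in-place change through the view
print(A[1, 1], A_copy[1, 1])         # 2.0 1.0
```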


In [None]:
M2.dot(M) #The matrix product M2.M

In [None]:
M.dot(M2) #Undefined: shapes (2,3) and (2,2) do not align, so this raises a ValueError

In [None]:
np.matmul(M2,M3) #Another way
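Since Python 3.5, the `@` operator performs matrix multiplication as well. A product is defined only when the inner dimensions match. This sketch uses its own arrays `A` (shape `(2, 3)`) and `B` (shape `(2, 2)`), which mirror `M` and `M2`.

```python
import numpy as np

A = np.array([[2., 0., 0.],
              [0., 2., 2.]])      # shape (2, 3)
B = A[:, :2]                      # shape (2, 2)

# (2, 2) @ (2, 3) -> (2, 3): inner dimensions 2 and 2 match
print((B @ A).shape)                          # (2, 3)
print(np.array_equal(B @ A, B.dot(A)))        # True

# (2, 3) @ (2, 2): inner dimensions 3 and 2 differ
try:
    A @ B
except ValueError as err:
    print('ValueError:', err)
```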

### We can call functions from `np.linalg` that compute properties of our matrices, such as the singular value decomposition.

In [None]:
np.linalg.svd(M)

In [None]:
help(np.linalg.svd)

### Also, `numpy` is great for random samples and some standard probability distributions.

In [None]:
sU = np.random.rand(3,2) #Sample from the uniform distribution on [0, 1)
sU

In [None]:
sN = np.random.randn(3,4) #Sample from the standard normal
sN

In [None]:
sB = np.random.binomial(3,.4,10) #Sample 10 values from the binomial n=3,p=.4
sB
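For results you can reproduce, NumPy 1.17 and later recommend creating a seeded `Generator` with `np.random.default_rng` and drawing from it. Here is a sketch; the seed `42` is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)      # a seeded Generator
sample = rng.standard_normal((3, 4))      # like np.random.randn(3, 4)

rng_again = np.random.default_rng(seed=42)
print(np.array_equal(sample, rng_again.standard_normal((3, 4))))  # True: same seed, same draws
print(rng.binomial(3, 0.4, size=10).shape)                         # (10,)
```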

### Here are some more numpy array methods.

In [None]:
M.std()

In [None]:
M.cumsum()

In [14]:
sumRows, sumCols = M.sum(axis=1), M.sum(axis=0) # axis=1 sums along each row, axis=0 down each column

In [None]:
print(sumRows,sumCols)

In [None]:
sumRows.sum() == sumCols.sum()
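The `axis` argument names the axis that gets *collapsed*. This is easiest to check on a small array whose sums you can do in your head:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

print(A.sum(axis=0))   # [5 7 9]   collapses axis 0 (the rows): one sum per column
print(A.sum(axis=1))   # [ 6 15]   collapses axis 1 (the columns): one sum per row
print(A.sum())         # 21        all entries
```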

## Understanding Numpy's Purpose

### Play around with lists a little, and you'll see the need for arrays.

### Suppose we want a $5000 \times 5000$ matrix of samples from a standard normal. Here is how to do that with a *nested list comprehension*, the natural way to do it with lists:

In [None]:
import random
I = 5000
%time mat = [[random.gauss(0,1) for j in range(I)] for i in range(I)]

### Now with `numpy`'s random equipment:

In [None]:
%time mat = np.random.standard_normal((I,I))

### Huge difference.

## Vectorization

### Very often, you can make code cleaner and faster using *vectorization*. From your text:

>The fundamental idea is to conduct an operation on or to apply a function to a complex object "at once" and not by iterating over the single elements of the object.

### Let's start with two $4 \times 3$ arrays.

In [19]:
r,s = np.random.standard_normal((4,3)), np.random.standard_normal((4,3))

In [None]:
r + s

### We can also combine with *scalars*

In [None]:
2 * r + 3

### Or with arrays of compatible shape: `numpy` *broadcasts* a vector of shape `(3,)` across each of the 4 rows of `r`.

In [22]:
s = np.random.standard_normal(3)

In [None]:
r + s

### Here is an 'inappropriate' shape

In [24]:
s = np.random.standard_normal(4)

In [None]:
r + s

### But we can *transpose* the array and the addition will work just fine.

In [28]:
r.T + s

array([[-0.66625026,  2.12641465,  2.60830469, -2.69575089],
       [-0.62619826,  0.07695654,  0.74886909, -0.79337633],
       [-1.53429168,  1.63211883,  1.64970089, -0.19776479]])

### Alternatively, we can reshape and then transpose the vector `s` and put it first.

In [None]:
s.reshape((1,4)).T + r

### But we can't just add the transpose of `s` to `r`:

In [None]:
s.T + r

### Because the transpose of a vector of shape `(4,)` is that same vector.

In [None]:
s.T == s
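Broadcasting compares shapes from the *trailing* dimension backwards. Two dimensions are compatible when they are equal or one of them is 1. The sketch below checks the cases from this section on arrays of ones and zeros. It uses its own names, `grid` and `vec`, and `vec[:, np.newaxis]` is another way to turn a `(4,)` vector into a `(4, 1)` column.

```python
import numpy as np

grid = np.zeros((4, 3))

print((grid + np.ones(3)).shape)        # (4, 3): a (3,) vector is stretched across the 4 rows
print((grid + np.ones((4, 1))).shape)   # (4, 3): a (4, 1) column is stretched across the 3 columns

vec = np.arange(4.0)                    # shape (4,)
print((grid + vec[:, np.newaxis]).shape)  # (4, 3): vec[:, np.newaxis] has shape (4, 1)

try:
    grid + vec                          # trailing dimensions 3 and 4 do not match
except ValueError as err:
    print('ValueError:', err)
```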

### One last point about `numpy`, for now at least: you can control memory layout of an array, and that layout choice matters. The choices of *C-like* and *Fortran-like* memory layout have tradeoffs illustrated below with some simple examples.

In [48]:
x = np.random.standard_normal((5,10000000))
y = 2 * x + 3 # linear equation y = a * x + b
C = np.array((x,y), order='C')
F = np.array((x,y), order='F')
x,y = 0,0 # Clearing some memory

In [None]:
C[:2].round(2)

In [None]:
%timeit C.sum()

In [None]:
%timeit F.sum()

### Hardly any difference.

### Now let's look at the columnwise (`axis=0`) and rowwise (`axis=1`) sums for C...

In [None]:
%timeit C[0].sum(axis=0)

In [None]:
%timeit C[0].sum(axis=1)

### and then for F

In [None]:
%timeit F[0].sum(axis=0)

In [None]:
%timeit F[0].sum(axis=1)

### It appears that C-like layout is the winner.

## Logic

In [7]:
condition1 = True
condition2 = False
condition3 = None

### Let's look at combining *Boolean* values. Note that `and` and `or` return one of their operands, which need not be `True` or `False`.

In [None]:
condition1 and condition2

In [None]:
condition1 or condition2

In [10]:
condition1 and condition3

In [None]:
print(condition1 and 0, condition1 and 1)

In [None]:
print(condition1 or 1, condition2 or 1)
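Here are the exact rules behind those outputs, plus the values that count as "false" in an `if` test:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy
print(True and 0, True and 'yes')     # 0 yes
# `or` returns the first truthy operand, or the last operand if none are
print(False or 1, None or 'default')  # 1 default

# Values that count as False in a boolean context
print([bool(v) for v in (None, 0, 0.0, '', [], {})])  # [False, False, False, False, False, False]
```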

### If we store them in a `numpy` array, then we can apply aggregate methods

In [None]:
import numpy as np
bls = np.array([condition1,condition2,condition3])
print(bls.any(),bls.all())

### We can write code whose execution depends on the values of these variables.

In [None]:
if condition1:
    print("yes")
    
else:
    print("no")

In [18]:
condition1 = not condition1

In [None]:
if condition1:
    print("yes")
    
else:
    print("no")

### We can produce `bool` values using the comparison operators.

In [33]:
a, b = 3, 4.5 # This is multiple assignment

In [40]:
newcond1 = a == b
newcond2 = a is b
newcond3 = a >= b
newcond4 = a != b

In [None]:
newcond4
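`==` and `is` ask different questions. `==` compares *values*, while `is` asks whether two names refer to the *same object*. Use `==` for comparing numbers and contents, and reserve `is` mainly for `None`. A short sketch with made-up lists:

```python
x = [1, 2, 3]
y = [1, 2, 3]
z = x

print(x == y)   # True:  same contents
print(x is y)   # False: two different list objects
print(x is z)   # True:  two names for one object

value = None
print(value is None)   # True: `is` is the idiomatic test for None
```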

### Here is a more delicate `if-else` block.

In [None]:
if newcond1:
    if newcond2:
        print("one thing")
    elif newcond4:
        print("another")
    
else:
    print(newcond3 and newcond4)

In [44]:
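a, b = 3, 3 # New values
# The conditions were computed from the old values, so recompute them
newcond1 = a == b
newcond2 = a is b
newcond3 = a >= b
newcond4 = a != b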

In [None]:
if newcond1:
    if not newcond2:
        print("one thing")
    elif newcond4:
        print("another")
    else:
        print("the last")
    
else:
    print(newcond3 and newcond4)

### In simple cases, one way to deal with *nested* `if-else` blocks is simply to combine the conditions.

In [49]:
a,b,c = 1,2,1

In [None]:
if a == b:
    if b == c:
        print("yes")
    else: print("no")
else: print("No.")

In [54]:
b = 1 # Changing 1 value

In [None]:
if a == b:
    if b == c:
        print("yes")
    else: print("no")
else: print("No.")

In [56]:
c = 2 # A different difference

In [None]:
if a == b:
    if b == c:
        print("yes")
    else: print("no")
else: print("No.")

### In this simple case, it would be better to write:

In [None]:
if a==b and b==c:
    print("yes")
else:
    print("no")
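Python also *chains* comparisons, so the combined condition can be written even more compactly as `a == b == c`. The values below are illustrative.

```python
a, b, c = 1, 1, 2

# a == b == c means (a == b) and (b == c)
print(a == b == c)             # False
print((a == b) and (b == c))   # False, the same test spelled out
print(0 < a <= b < c)          # True: works for any comparison operators
```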

## Looping

### This is great, but we don't want to change and check values manually all the time. Enter `for` and `while` loops.

### Here is *FizzBuzz* implemented with a `for` loop and then with a `while` loop. In *FizzBuzz*, we run through a list of integers and print "Fizz" for every multiple of $3$, "Buzz" for every multiple of $5$, and the integer itself otherwise. The exception is integers *simultaneously divisible by $3$ and $5$*, for which we print "FizzBuzz".

### To keep things simple, we'll do this for integers up to and including $100$.

In [62]:
ourInts = list(range(1,101)) # We want a list, so we must ask for it explicitly

### Our `for` loop

In [68]:
if 16/3: print("True") # Aside: any nonzero number counts as True

True


In [None]:
for x in ourInts:
    threeQ = x % 3 == 0
    fiveQ = x % 5 == 0
    if threeQ and fiveQ:
        print("FizzBuzz\n")
    elif threeQ:
        print("Fizz\n")
    elif fiveQ:
        print("Buzz\n")
    else:
        print(str(x) + '\n')

### Our `while` loop:

In [None]:
counter = 1
while counter <= len(ourInts): # <= so that the last element (100) is included
    x = ourInts[counter-1]
    counter += 1
    threeQ = x % 3 == 0
    fiveQ = x % 5 == 0
    if threeQ and fiveQ:
        print("FizzBuzz\n")
    elif threeQ:
        print("Fizz\n")
    elif fiveQ:
        print("Buzz\n")
    else:
        print(str(x) + '\n')    

## List Comprehensions

### That's great, but what if we wanted this output recorded in a list object rather than printed to the console?

In [1]:
fizzList = ["Fizz"*(not i%3) + "Buzz"*(not i%5) or i for i in range(1,101)]

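### There is more than list comprehension going on there. Multiplying a string by `True` or `False` (which behave like `1` and `0`) keeps it or empties it, and `or` returns its right operand when the left one is the empty string. For example: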

In [None]:
"Buzz"*(1) or 4
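Here the pieces of the one-line *FizzBuzz* are checked one at a time, followed by the start of the resulting list:

```python
# Multiplying a string by a bool keeps it (True == 1) or empties it (False == 0)
print(repr("Fizz" * (not 9 % 3)))   # 'Fizz'   9 is divisible by 3
print(repr("Fizz" * (not 7 % 3)))   # ''       7 is not

# `or` falls back to its right operand when the left one is the empty string
print("" or 7)                      # 7

fizzList = ["Fizz"*(not i%3) + "Buzz"*(not i%5) or i for i in range(1, 101)]
print(fizzList[:15])
```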

### But list comprehensions are a good way to give compact or *inline* descriptions of things. You can also nest them:

In [93]:
pairsList = [[x,y] for x in range(7) for y in ['a','b','c']]

In [None]:
print(pairsList)

In [104]:
upperDiag = [[x,y] for y in range(10) for x in range(y)]

In [None]:
upperDiag

### Aside: why that name?

In [85]:
import matplotlib.pyplot as plt

In [None]:
xs, ys = [x[0] for x in upperDiag], [x[1] for x in upperDiag] # Preparing the data for our plotting library
plt.scatter(xs,ys)
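If the order of the `for` clauses in a nested comprehension is confusing, it helps to write out the equivalent loops. The clauses read left to right, from the outer loop to the inner one.

```python
upperDiag = [[x, y] for y in range(10) for x in range(y)]

loops = []
for y in range(10):          # outer loop: first `for` clause
    for x in range(y):       # inner loop: second `for` clause
        loops.append([x, y])

print(upperDiag == loops)    # True
print(len(upperDiag))        # 45, i.e. 0 + 1 + ... + 9
```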

## Functions

### We have already been using functions implicitly. In fact, `print` is a function. The string methods are functions. The builtin `sorted` is a function.

### Here is a user-defined function. Let's combine what we did above to write a function that takes an `int` and returns the *FizzBuzz* list for all values up to that integer.

In [108]:
isinstance(6,int)

True

In [26]:
def fizzFunc(n):
    """Here is a docstring
    we're using it to record
    information about the 
    purpose of this function.
    This function does *FizzBuzz*
    so that we don't have to 
    think about it anymore."""
    if isinstance(n,int):
        return ["Fizz"*(not i%3) + "Buzz"*(not i%5) or i for i in range(1,n+1)]
    else:
        raise ValueError("That's not an int")

### Here it is working as intended:

In [None]:
print(fizzFunc(19))

### Now, we'll make it raise an error.

In [None]:
fizzFunc('a')

### In fact, it is *still* working as intended. Handling errors properly is an important part of writing robust code. We'll see more of this later, so it will pay to look into it a little now.

### Suppose we are calling `fizzFunc` inside another function `fizzCount` defined thus:

In [34]:
def fizzCount(n):
    try:
        L = fizzFunc(n)
        c = L.count('Fizz')
        return c
    except ValueError as e:
        return e.args[0]

### Now, instead of barfing when it gets the wrong input, our function can simply report that it got the wrong input.

In [None]:
fizzCount('a')

### Is it totally safe now? Not by a long shot. See if you can improve it.

## File IO & Text Processing

### Now we're going to see some basic IO operations and learn how to process text by following an example from *Think Python, $2^{nd}$ Edition*.

## *Example - Processing Text from a Book*
>Write a program that reads a file, breaks each line into words, strips whitespace and punctuation from the words and converts them to lowercase.

### Here is one quick way to download a text from inside our Python session.

In [None]:
from urllib import request
request.urlretrieve("https://www.gutenberg.org/files/158/158-0.txt", "Emma.txt")

### While we're at it, let's download the `words.txt` from *Think Python*.

In [None]:
fileURL = "https://raw.githubusercontent.com/AllenDowney/ThinkPython2/master/code/words.txt"
request.urlretrieve(fileURL, "words.txt")

### Let's import `time` so that we can make a comparison.

In [42]:
import time

### Here are two different ways to write the desired function.

In [43]:
def make_word_list1():
    """Reads lines from a file and builds a list using append."""
    t = []
    with open('words.txt') as fin:  # the file is closed automatically
        for line in fin:
            word = line.strip()
            t.append(word)
    return t


def make_word_list2():
    """Reads lines from a file and builds a list using list +."""
    t = []
    with open('words.txt') as fin:
        for line in fin:
            word = line.strip()
            t = t + [word]
    return t


### Below, we compare these two functions. Which one is the faster of the two?

In [None]:
start_time = time.time()
t = make_word_list1()
elapsed_time = time.time() - start_time

print(len(t))
print(t[:10])
print(elapsed_time, 'seconds')

start_time = time.time()
t = make_word_list2()
elapsed_time = time.time() - start_time

print(len(t))
print(t[:10])
print(elapsed_time, 'seconds')

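### Whoa! `make_word_list2` is far slower. On every iteration, `t = t + [word]` builds a brand-new list and copies every earlier word into it, which is quadratic time overall. By contrast, `append` adds to the existing list in amortized constant time.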
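The same effect can be reproduced without any file. This sketch uses a hypothetical in-memory list of 10,000 made-up words in place of `words.txt`. Exact timings depend on the machine, so only the equality of the results is guaranteed.

```python
import time

words = ['word%d' % i for i in range(10000)]   # hypothetical stand-in for words.txt

start = time.perf_counter()
t1 = []
for w in words:
    t1.append(w)           # grows the existing list in place
append_secs = time.perf_counter() - start

start = time.perf_counter()
t2 = []
for w in words:
    t2 = t2 + [w]          # builds a brand-new list, copying every earlier item
concat_secs = time.perf_counter() - start

print(t1 == t2)                   # True: same result...
print(append_secs, concat_secs)   # ...but the second loop is usually far slower
```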

### Next, we want to clean up the book and compute frequency statistics - what are the words in the book, and how many times is each one used?

In [45]:
import string # Used to get punctuation

In [89]:
def process_file(filename, skip_header):
    """Makes a histogram that contains the words from a file.

    filename: string
    skip_header: boolean, whether to skip the Gutenberg header
   
    returns: map from each word to the number of times it appears.
    """
    hist = {} # This is an empty dictionary
    with open(filename, encoding='utf-8') as fp:  # closes the file when done
        if skip_header:
            skip_gutenberg_header(fp)

        for line in fp:
            process_line(line, hist)

    return hist


def skip_gutenberg_header(fp):
    """Reads from fp until it finds the line that ends the header.
    
    RMK: You just have to look at the Gutenberg format. That is
    how you would know how to write such a function. This had to
    be changed.

    fp: open file object
    """
    for line in fp:
        if line.startswith('*** START OF THIS PROJECT GUTENBERG EBOOK EMMA ***'):
            break


def process_line(line, hist):
    """Adds the words in the line to the histogram.

    Modifies hist.
    
    RMK: This is not a *pure* function. It modifies
    one of its arguments. This is frowned upon
    in many circles, but it is one way to do things.

    line: string
    hist: histogram (map from word to frequency)
    """
    # replace hyphens with spaces before splitting
    line = line.replace('-', ' ')
    strippables = string.punctuation + string.whitespace

    for word in line.split():
        # remove punctuation and convert to lowercase
        word = word.strip(strippables)
        word = word.lower()

        # update the histogram
        hist[word] = hist.get(word, 0) + 1



### We want to compute word statistics for a document.

### The `hist` dictionary object contains all the information about our word stats. It is easy to write functions that compute word count and *unique* word count.

In [55]:
def total_words(hist):
    """Returns the total of the frequencies in a histogram."""
    return sum(hist.values())


def different_words(hist):
    """Returns the number of different words in a histogram."""
    return len(hist)

### Finally, our functions for the most commonly occurring words.

In [56]:
def most_common(hist):
    """Makes a list of word-freq pairs in descending order of frequency.

    hist: map from word to frequency

    returns: list of (frequency, word) pairs
    """
    t = []
    for key, value in hist.items():
        t.append((value, key))

    t.sort()
    t.reverse()
    return t


def print_most_common(hist, num=10):
    """Prints the most common words in a histogram and their frequencies.
    
    hist: histogram (map from word to frequency)
    num: number of words to print
    """
    t = most_common(hist)
    print('The most common words are:')
    for freq, word in t[:num]:
        print(word, '\t', freq)
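For comparison, the standard library's `collections.Counter` bundles much of this. It is a `dict` subclass that counts items, and its `most_common` method returns `(word, count)` pairs, the reverse order of our `(frequency, word)` pairs. This is an alternative technique, not what *Think Python* uses here. The sample sentence is made up.

```python
from collections import Counter

# Hypothetical sample; in the notebook the words would come from Emma.txt
words = "the cat and the hat and the bat".split()

counts = Counter(words)        # maps each word to the number of times it appears
print(counts['the'])           # 3
print(counts.most_common(2))   # [('the', 3), ('and', 2)]
print(sum(counts.values()), len(counts))  # 8 5  (total words, different words)
```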


### OK, now it is time to use all of our functions. Let's try it on *Emma* first.

In [None]:
hist = process_file('Emma.txt', skip_header=True)
print('Total number of words:', total_words(hist))
print('Number of different words:', different_words(hist))

t = most_common(hist)
print('The most common words are:')
for freq, word in t[0:20]:
    print(word, '\t', freq)