## Lab 4 - Probability and Information Theory

In [52]:
# The essential module, but you may need others. 
import pandas as pd
import math

### Question 1: Load Data

#### a. Upload the file *pnp-train.txt* into a `DataFrame` called `data`.  
* You can use the `read_csv` function to read in the data using a tab (`'\t'`) as a delimiter. 
* You'll also need to use the `Latin-1` encoding to read the data in correctly. 
* Name your columns: `type` and `name`. 

In [53]:
data = pd.read_csv('pnp-train.txt', delimiter="\t",  encoding='Latin-1', names=['type','name'])

data

| | type | name |
|---|---|---|
| 0 | drug | Dilotab |
| 1 | movie | Beastie Boys: Live in Glasgow |
| 2 | person | Michelle Ford-Eriksson |
| 3 | place | Ramsbury |
| 4 | place | Market Bosworth |
| ... | ... | ... |
| 20996 | movie | Old Pals |
| 20997 | place | Mailly-le-Château |
| 20998 | place | Sudbury |
| 20999 | place | West Wickham |


#### b. Add a column called `first_word` to `data`. 
* To fill this column, split off the first word of the string in the `name` column and store it in lowercase letters. 

In [54]:
data['first_word'] = data['name'].map(lambda x: x.lower().split()[0])
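
The lambda above raises an `IndexError` if a name is empty or contains only whitespace, because `split()` then returns an empty list. A minimal sketch of a more defensive helper follows; the name `get_first_word` is hypothetical and not part of the lab, and whether the data actually contains blank names is not known from the text.

```python
def get_first_word(name: str) -> str:
    """Return the lowercased first whitespace-delimited token, or '' if none."""
    tokens = name.lower().split()
    return tokens[0] if tokens else ""

# Sample names taken from the table in this lab, plus a blank edge case.
examples = ["Beastie Boys: Live in Glasgow", "Pézenas", "   "]
print([get_first_word(n) for n in examples])
```

Such a helper could be passed to `map` in place of the lambda if blank names turn out to be a concern.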

#### c. Display the first 10 rows of the table.

In [55]:
data.head(10)

| | type | name | first_word |
|---|---|---|---|
| 0 | drug | Dilotab | dilotab |
| 1 | movie | Beastie Boys: Live in Glasgow | beastie |
| 2 | person | Michelle Ford-Eriksson | michelle |
| 3 | place | Ramsbury | ramsbury |
| 4 | place | Market Bosworth | market |
| 5 | drug | Cyanide Antidote Package | cyanide |
| 6 | person | Bill Johnson | bill |
| 7 | place | Ettalong | ettalong |
| 8 | movie | The Suicide Club | the |
| 9 | place | Pézenas | pézenas |


### Question 2: Write Probability Function

#### a. Write probability function for the `type` column.
* Define a function with this signature `def P(T)` that returns the relative frequency of a given *type*. 
* The `Counter` class in the `collections` module may be helpful.

In [56]:
from collections import Counter as ctr
import numpy as np

prob_ty = ctr(data.type)
prob_tot = len(data)

def P(T=""):
    return prob_ty[T] / prob_tot
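
To see the relative-frequency idea in isolation, here is a small sketch on made-up labels (not the lab data). The names `labels` and `rel_freq` are illustrative only. It relies on the fact that a `Counter` returns 0 for keys it has never seen, so unseen types get probability 0 rather than raising an error.

```python
from collections import Counter

# Toy labels (made up for illustration; not the lab data).
labels = ["movie", "place", "movie", "drug"]

counts = Counter(labels)
total = sum(counts.values())

def rel_freq(label):
    """Relative frequency of `label`; a Counter returns 0 for unseen keys."""
    return counts[label] / total

print(rel_freq("movie"))    # 0.5
print(rel_freq("person"))   # 0.0
```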

#### b. What is the probability of the *type* `movie`? 

In [57]:
P('movie')

0.29817627732012764

#### c. Show the probabilities of all of the *type*s sum to one. 

In [58]:
import numpy as np
round(np.sum([P(T=x) for x in set(data['type'])]))

1.0

### Question 3: Write a Joint Distribution Function

#### a. Write a joint probability function for `type` and `first_word`.
* Define a function with this signature `P2(T, W1)` that returns the joint probability of the entries in the `type` and `first_word` columns.  
* You may want to use the `zip` function to combine the data before counting. 

In [59]:
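# Per-type counts of first words, and the number of rows of each type.
probb_ty = {}
probb_tot = {}

for T in set(data.type):
    tar = data[data.type == T]
    probb_ty[T] = ctr(tar.first_word)
    probb_tot[T] = len(tar)

# Total number of rows, computed once rather than on every call to P2.
probb_n = sum(probb_tot.values())

def P2(T="", W1=""):
    # Joint relative frequency of the pair (type, first_word).
    return probb_ty[T][W1] / probb_n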
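
The hint about `zip` suggests an alternative to the per-type loop: count `(type, first_word)` pairs directly. The following is a sketch on made-up columns, with illustrative names (`types`, `first_words`, `joint`); it is not the solution used in this notebook.

```python
from collections import Counter

# Toy columns (made up for illustration; not the lab data).
types = ["movie", "movie", "person", "place"]
first_words = ["the", "the", "bill", "the"]

# zip pairs the two columns row by row; Counter tallies each distinct pair.
pair_counts = Counter(zip(types, first_words))
n_rows = len(types)

def joint(t, w):
    """Joint relative frequency of the pair (t, w)."""
    return pair_counts[(t, w)] / n_rows

print(joint("movie", "the"))   # 0.5
print(joint("person", "the"))  # 0.0
```

With the lab's `DataFrame`, the same idea would presumably apply to `zip(data.type, data.first_word)`.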
   

#### b. What is the joint probability of the *type* `'person'` and the *first word* `'bill'`? 

In [60]:
P2(T='person', W1='bill')

0.00047616780153326033

#### c. What is the joint probability of the *type* `'movie'` and the *first word* `'the'`? 

In [61]:
P2(T='movie', W1='the')

0.02747488214846912

#### d. Show that the probability distribution sums to one. 

In [62]:
sum([P2(T, W1) for T in set(data.type) for W1 in set(data.first_word)])

0.9999999999997853

### Question 4: Write Another Probability Function

#### a. Write another probability function for the `type` column.
* Define a function with this signature `def Q(T)` that returns the relative frequency of a given *type*. 
* This function should marginalize the `type` distribution from the joint distribution function `P2`. 

In [63]:
def Q(T):
    return sum([P2(T, W1) for W1 in set(data.first_word)])
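
Marginalizing means summing the joint distribution over the variable being removed. A sketch on made-up pairs follows; `pairs` and `marginal_type` are hypothetical names. It sums integer counts first and divides once, which is one way to limit the small rounding drift that can appear when many floating-point terms are added.

```python
from collections import Counter

# Toy (type, first_word) pairs, made up for illustration.
pairs = [("movie", "the"), ("movie", "old"), ("place", "the"), ("person", "bill")]
pair_counts = Counter(pairs)
n_rows = len(pairs)

def marginal_type(t):
    """Sum the joint counts over every observed first word, then normalize."""
    return sum(c for (tt, _), c in pair_counts.items() if tt == t) / n_rows

print(marginal_type("movie"))  # 0.5
```

Iterating only over observed pairs gives the same result as iterating over every first word, since unobserved pairs contribute zero.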

#### b. What is the probability of the *type* `'movie'`?  (The answer should be the same as Question 2b.)  

In [64]:
Q('movie')

0.29817627732012764

#### c. Show that the probability distribution sums to one. 

In [65]:
sum([Q(T) for T in set(data.type)])

1.000000000000029

### Question 5: Compute KL-Divergence

#### a. Compute the KL-Divergence for the probability functions `P` and `Q`. 

In [66]:
sum([P(T) * math.log(P(T)/Q(T)) for T in set(data.type)])

-2.9158386629021504e-14

#### b. Why did you get this answer? 

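The divergence is zero up to floating-point rounding error (about -2.9e-14) because `P` and `Q` are the same distribution: `Q` is obtained by marginalizing `P2` over first words, which recovers the relative frequency of each type. Every ratio `P(T)/Q(T)` is therefore 1, each log term is 0, and the sum is 0. The tiny negative value comes from rounding in the many additions inside `Q`; the true KL-divergence is never negative.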
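
As a sanity check on the behavior of KL-divergence, here is a minimal sketch for two discrete distributions stored as dictionaries. The function name `kl_divergence` and the toy distributions are assumptions for illustration; the natural log is used, matching `math.log` in the cell above.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two dicts mapping outcomes to probabilities.

    Terms with p[x] == 0 contribute 0 by convention; assumes q[x] > 0
    wherever p[x] > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

# Toy distributions, made up for illustration.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

print(kl_divergence(p, p))  # 0.0, identical distributions
print(kl_divergence(p, q))  # positive, the distributions differ
```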

### Question 6: Calculating Conditional Probabilities

#### a. How can you find the conditional probability of a *first word* given a *type* using the joint probability of these variables? 
* What's the formula for this calculation? Don't do the calculation. 
* It may require a modification of the *Multiplication Rule*. 

P(first_word|type) = P(type and first_word) / P(type)

#### b. Write a conditional probability function for a `first_word` and `type`.
* Define a function with this signature `def Pwt(W1, T)` that returns the conditional probability of a `first_word` given a `type`. Mathematically: `P(W1 | T)`. 
* This function should use the probability functions above to do the calculation. 

In [77]:
def Pwt(W1="", T=""):
    return P2(T, W1) / P(T)

#### c. What is the conditional probability of `'the'`, given `'movie'`?

In [78]:
Pwt(W1='the',T='movie')

0.09214308527626956
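
This value can be cross-checked by hand from the outputs of Questions 3c and 2b. The sketch below only redoes the division with the numbers printed earlier in this notebook; the variable names are illustrative.

```python
# Values copied from the outputs of Questions 3c and 2b above.
p_joint_movie_the = 0.02747488214846912   # P2(T='movie', W1='the')
p_movie = 0.29817627732012764             # P('movie')

# Conditional probability via P(W1 | T) = P(T, W1) / P(T).
p_the_given_movie = p_joint_movie_the / p_movie
print(p_the_given_movie)
```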

### Question 7: Use Bayes' Theorem to Convert Conditional Probabilities

#### a. Find the conditional probability of a *type* given a *first word* using the conditional probability of a *first word* given a *type*. 
* Define a function with this signature `def Ptw(T, W1)` that returns the conditional probability of a `type` given a `first_word`. Mathematically: `P(T | W1)`.
* Use Bayes' Theorem and the probability function for *type*s above to do the calculation. You may also need to write a new probability function of *first word*s. 

In [79]:
words_ctr = ctr(data.first_word)

def Pw1(W1):
    return words_ctr[W1] / sum(words_ctr.values())

def Ptw(T="", W1=""):
    return Pwt(W1, T) * P(T) / Pw1(W1)
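
To illustrate Bayes' Theorem on something small enough to check by hand, here is a sketch on made-up pairs. All names (`pairs`, `p_w_given_t`, `p_t_given_w`) are hypothetical. In this toy data, two of the three names starting with "the" are movies, so the posterior should come out to 2/3.

```python
from collections import Counter

# Toy (type, first_word) pairs, made up for illustration.
pairs = [("movie", "the"), ("movie", "the"), ("movie", "old"), ("place", "the")]
n = len(pairs)
pair_counts = Counter(pairs)
type_counts = Counter(t for t, _ in pairs)
word_counts = Counter(w for _, w in pairs)

def p_w_given_t(w, t):
    """P(W | T): share of rows of type t whose first word is w."""
    return pair_counts[(t, w)] / type_counts[t]

def p_t_given_w(t, w):
    """Bayes' theorem: P(T | W) = P(W | T) * P(T) / P(W)."""
    return p_w_given_t(w, t) * (type_counts[t] / n) / (word_counts[w] / n)

print(p_t_given_w("movie", "the"))
```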

#### b. What is the conditional probability of the *type* `'movie'` given the *first word* `'the'`?

In [80]:
Ptw(T='movie',W1='the')

0.9086614173228347

#### c. What is the conditional probability of the *type* `'person'` given the *first word* `'the'`?

In [81]:
Ptw(T='person',W1='the')

0.0

#### d. What is the conditional probability of the *type* `'drug'` given the *first word* `'the'`?

In [82]:
Ptw(T='drug',W1='the')

0.0

#### e. What is the conditional probability of the *type* `'place'` given the *first word* `'the'`?

In [83]:
Ptw(T='place',W1='the')

0.0015748031496062992

#### f. What is the conditional probability of the *type* `'company'` given the *first word* `'the'`?

In [84]:
Ptw(T='company',W1='the')

0.08976377952755905

#### g. Given this, if 'the' is the *first word*, what is the most likely *type*?

movie
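
The choice can be made mechanically by taking the type with the largest posterior. The sketch below copies the five values printed in Questions 7b to 7f; it assumes these five types are the only ones in the data, which is consistent with the values summing to about 1.

```python
# Posterior values copied from the outputs of Questions 7b-7f above.
posterior_given_the = {
    "movie": 0.9086614173228347,
    "person": 0.0,
    "drug": 0.0,
    "place": 0.0015748031496062992,
    "company": 0.08976377952755905,
}

# The most likely type is the key with the largest posterior probability.
most_likely = max(posterior_given_the, key=posterior_given_the.get)
print(most_likely)                         # movie
print(sum(posterior_given_the.values()))   # close to 1
```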

### Question 8: Comparing Conditional Probabilities  

#### a. What is the conditional probability of the *first word* `'the'` given the *type* `'movie'`?

In [85]:
Pwt(W1='the', T='movie')

0.09214308527626956

#### b. What is the conditional probability of the *type* `'movie'` given the *first word* `'the'`?

In [86]:
Ptw(T='movie',W1='the')

0.9086614173228347

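#### c. Are the two conditional probabilities the same? Why, or why not? 

No. `P(W1='the' | T='movie')` is about 0.092, while `P(T='movie' | W1='the')` is about 0.909. They condition on different events: the first asks what fraction of movies start with 'the', the second asks what fraction of names starting with 'the' are movies. Bayes' Theorem relates them, `P(T | W1) = P(W1 | T) * P(T) / P(W1)`, so they are equal only when `P(T) = P(W1)`.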

### Question 9: Fitting Probability Distributions to the Data

#### In our calculations, we assumed the data has a discrete probability distribution. Should we have used a continuous probability distribution, like a Gaussian or an exponential distribution? Why, or why not? 

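No. Both variables are categorical: `type` takes one of a handful of labels and `first_word` takes values from a finite vocabulary, so relative frequencies over discrete outcomes are the appropriate model. A continuous distribution such as a Gaussian or an exponential describes real-valued quantities with a meaningful ordering and distance, which these labels and words do not have.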