
# Connect Intensive - Machine Learning Nanodegree
# Lesson 04: Natural Language Processing (NLP) Mini-Projects
# Part 01: Bayes NLP Mini-Project

## Objectives
  - Understand how [Bayes Rule](https://en.wikipedia.org/wiki/Bayes%27_theorem) derives from [conditional probability](https://en.wikipedia.org/wiki/Conditional_probability)
  - Write methods, utilizing Python dictionary objects and string methods such as `str.split()`.
  - Apply Bayes Rule to simple NLP: missing word prediction problems
  
## Prerequisites
  - Basic knowledge of Python strings and dictionaries is helpful.

## Bayes Rule
Here is a brief description of Bayes rule -- if you're already familiar, feel free to skip ahead to the next section, **Bayes Rule in NLP**.

Bayesian learning starts from an application of conditional probability. Suppose we have some **hypothesis** $h$ that occurs with probability $P(h)$. For example, in the field of oncology, we might be concerned about cancer rates. One hypothesis could be $h = $ "The patient has cancer". We may call the set of all possible hypotheses $H$, or the **hypothesis space**. If we have no other data about the patient, $P(h)$ is known as the **prior probability**, that is, prior to learning any data about the patient.

Now suppose there's some diagnostic screening we can conduct for the specific type of cancer. The screen can come back positive or negative. We can represent the result as the **training data** $D$ for the instance. For example, one possible observation could be $D = $ "The diagnostic test for the patient is negative". We can then write a **conditional probability** $P(h|D)$ (read as "probability of hypothesis $h$ given training data $D$"). For our example, this represents the probability that the patient has cancer, given that we know the diagnostic test for the patient is negative. Because we evaluate this probability *after* learning the training data, this quantity is also known as the **posterior probability**.

The probability of the **conjunction** $\land$ of two events can be written in terms of conditional probabilities in two equivalent ways:
$$P(D \land h) = P(D|h)\cdot P(h) = P(h|D)\cdot P(D)$$
Here, the quantity $P(D \land h)$ represents the probability of the training data $D$ and the hypothesis $h$ **both** being true for our patient.

The middle expression $P(D|h)\cdot P(h)$ is the product of (1) the conditional probability of training data $D$ given the hypothesis $h$ is true, and (2) the prior probability of hypothesis $h$ being true. Here, we've conditioned on $h$ being true.

The last expression $P(h|D)\cdot P(D)$ is the product of (1) the conditional (posterior) probability of hypothesis $h$ given the training data $D$ is true, and (2) the prior probability of the training data $D$ being true. Here, we've conditioned on $D$ being true.

Bayes rule solves the above equation for the posterior probability $P(h|D)$ by dividing both sides by $P(D)$ (which requires $P(D) > 0$):

$$\boxed{P(h|D) = \dfrac{P(D|h)\cdot P(h)}{P(D)}}$$
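
As a quick numeric check of the boxed formula, here is a minimal sketch for the negative-test example. The prevalence and test error rates are made-up numbers chosen only for illustration (they are not real medical statistics), and the helper name `bayes_posterior` is hypothetical. The denominator $P(D)$ is obtained by summing over both cases, $h$ true and $h$ false.

```python
def bayes_posterior(prior, likelihood, likelihood_if_false):
    """Return (P(h|D), P(D)) given P(h), P(D|h) and P(D|not h)."""
    # P(D) = P(D|h) P(h) + P(D|not h) P(not h)
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    return likelihood * prior / evidence, evidence

# Hypothetical values, for illustration only.
p_h = 0.01              # prior P(h): patient has cancer
p_d_given_h = 0.10      # P(D|h): test is negative although the patient has cancer
p_d_given_not_h = 0.95  # P(D|not h): test is negative and the patient is healthy

posterior, p_d = bayes_posterior(p_h, p_d_given_h, p_d_given_not_h)
print(posterior)

# Both factorizations of the joint probability P(D and h) agree.
joint_via_h = p_d_given_h * p_h
joint_via_d = posterior * p_d
```

With these assumed numbers the posterior is roughly 0.001, about ten times smaller than the prior of 0.01: a negative test makes the hypothesis less likely, as expected.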

## Bayes Rule in NLP
We are not going to be looking at a problem so grim as cancer diagnosis. Instead, we will apply Bayes rule to making predictions about words.

Suppose we have the following quote (from the movie Office Space):
> "So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?"

Also suppose this text is the entire population of text we have to work from. We can ask a few questions based on this sentence:
  1. What is the probability of finding the word "you" after the word "if"?
  2. What is the probability that a randomly selected word from the sentence is "you"?
  3. What is the probability that a randomly selected word from the sentence is "if"?
  
Enter your answers in the Bayes NLP Mini-Project "Quiz: Calculations".
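
To make the three kinds of questions concrete without giving away the quiz, here is a small sketch that estimates the same quantities from counts on a different toy sentence. The sentence and the helper names `word_probability` and `following_probability` are assumptions made for illustration. Tokens are produced by whitespace splitting only.

```python
from collections import Counter

def word_probability(text, word):
    """Fraction of whitespace-separated tokens in text that equal word."""
    tokens = text.split()
    return tokens.count(word) / float(len(tokens))

def following_probability(text, first, second):
    """Estimate P(next token is second | current token is first) from pair counts."""
    tokens = text.split()
    pairs = Counter(zip(tokens, tokens[1:]))  # counts of adjacent token pairs
    total = sum(count for (a, _), count in pairs.items() if a == first)
    return pairs[(first, second)] / float(total) if total else 0.0

# Toy sentence, unrelated to the quiz text.
toy = "the cat sat on the mat and the dog sat on the cat"

p_the = word_probability(toy, "the")                        # 4 of 13 tokens
p_cat_after_the = following_probability(toy, "the", "cat")  # 2 of 4 followers
```

Here "the" appears 4 times among 13 tokens, and 2 of the 4 words that follow "the" are "cat", so the two estimates are 4/13 and 0.5.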

## Exercise: Maximum likelihood

In this exercise, you will write a method, `NextWordProbability(sampletext,word)`, that creates a Python dictionary from a string `sampletext` and a target word `word`. The keys of the dictionary are the words that immediately follow the target word `word`, and each value is the number of times that key follows the target word `word`. For example, the output of the following code:
```
memo = "If you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?"
word = "and"
print(NextWordProbability(memo,word))
```
should be the dictionary:
```
{'move': 1, 'pack': 1}
```
Don't worry about removing punctuation or changing upper or lower case letters.

**Complete** the method `NextWordProbability` in the cell below and then **run** the cell. You may want to use [the string method `split`](https://docs.python.org/2/library/stdtypes.html#string-methods), and refer to [the Python documentation on dictionaries](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict). Then, you can test your method by running the test cases in the cell after it. When you feel confident your `NextWordProbability` method works, you can copy and paste the method into the Bayes NLP Mini-Project "Quiz: Maximum Likelihood".
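
As a warm-up for the two building blocks mentioned above, the sketch below shows how `str.split()` tokenizes a string and how a dictionary can tally counts with `dict.get`. It counts every token rather than the words following a target, so it is not a solution to the exercise. The function name `count_tokens` and the sample string are assumptions made for illustration.

```python
def count_tokens(text):
    """Tally how often each whitespace-separated token occurs in text."""
    counts = {}
    for token in text.split():
        # dict.get returns 0 the first time a token is seen
        counts[token] = counts.get(token, 0) + 1
    return counts

sample = "Oh, oh, and I almost forgot. Oh, and remember"
tokens = sample.split()
counts = count_tokens(sample)
print(counts)
```

Note that `split()` leaves punctuation attached and does not change case, so `'Oh,'` and `'oh,'` are separate keys and `'forgot.'` keeps its period. This matches the test cases in this notebook, where a key such as `'tomorrow.'` appears with its punctuation.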

In [None]:
# When you are happy with your NextWordProbability method,
# you may copy and paste it into the Bayes NLP Mini-Project
# "Quiz: Maximum Likelihood"

def NextWordProbability(sampletext,word):
    
    
    return {}

In [None]:
# Test cases: see how well your NextWordProbability method works.

memo1 = "If you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?"
word1 = "and"
print(NextWordProbability(memo1,word1))
# Output should be:
# {'move': 1, 'pack': 1}

memo2 = "Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?"
word2 = "need"
print(NextWordProbability(memo2,word2))
# Output should be:
# {'to': 1, 'all': 1}

memo3 = "Hello Peter, what's happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up."
word3 = "in"
print(NextWordProbability(memo3,word3))
# Output should be:
# {'tomorrow.': 1, 'on': 1}

## Conditioning multiple times
Suppose we have used our `NextWordProbability` method to compute probabilities for the next word based on sample text, and now we are faced with a situation where we have two missing words in a row: "for --- ---", and we want to know the most likely candidate for the *second* missing word based on the following probabilities:

$$\begin{array}{rcl}
P(\text{ "for this" }|\text{"for ---"})&=&0.4\\
P(\text{ "for that" }|\text{"for ---"})&=&0.3\\
P(\text{ "for those" }|\text{"for ---"})&=&0.3\end{array}$$

$$\begin{array}{rclrcl}
P(\text{ "this time" }|\text{"this ---"})&=&0.6\quad&P(\text{ "this job" }|\text{"this ---"})&=&0.4\\
P(\text{ "that job" }|\text{"that ---"})&=&0.8\quad&P(\text{ "that time" }|\text{"that ---"})&=&0.2\\
P(\text{ "those items" }|\text{"those ---"})&=&1.0\end{array}$$

Which word is the most likely candidate for the *second* missing word after "for", and with what probability?

Enter your answers in the Bayes NLP Mini-Project "Quiz: Optimal Classifier Example".
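
The calculation behind this kind of question sums over every candidate for the first blank. Writing $w_1$ and $w_2$ for the first and second missing words (notation introduced here, not used elsewhere in the notebook):

$$P(w_2) = \sum_{w_1} P(w_2|w_1)\cdot P(w_1)$$

The sketch below applies that sum to a different, made-up pattern "on --- ---" so that the quiz answer is left to you. All probabilities and the function name `second_word_distribution` are hypothetical.

```python
def second_word_distribution(first_word, second_word):
    """Combine P(w1) and P(w2|w1) into P(w2) by summing over w1."""
    totals = {}
    for w1, p1 in first_word.items():
        for w2, p2 in second_word[w1].items():
            totals[w2] = totals.get(w2, 0.0) + p1 * p2
    return totals

# Hypothetical probabilities for "on --- ---" (not the quiz values).
first_word = {"the": 0.5, "a": 0.5}
second_word = {
    "the": {"table": 0.7, "floor": 0.3},
    "a": {"table": 0.2, "chair": 0.8},
}

dist = second_word_distribution(first_word, second_word)
best = max(dist, key=dist.get)  # word with the largest summed probability
print(best)
```

With these assumed numbers the single most probable two-word path is "a chair" (0.5 × 0.8 = 0.40), yet the most probable second word is "table" (0.35 + 0.10 = 0.45), because it collects probability from both first words. This is why the sum over all first words matters.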

## Exercise: Bayes Optimal Classifier
In this exercise, you will write a method `LaterWords(sample,word,distance)` that determines the most likely word to appear `distance` words after the target word `word` based on the text in the string `sample`. For example, a call to the method:
```
LaterWords(memo,"and",2)
```
would return a string: the most likely word to appear 2 words after `"and"` in the string `memo`, *i.e.* "and --- **---**"

**Complete** the method `LaterWords` in the cell below and then **run** the cell. You may want to call your method `NextWordProbability()`, and you may refer to [the Python documentation on dictionaries](https://docs.python.org/2/library/stdtypes.html#mapping-types-dict). Then, you can test your method by running the test cases in the cell after it. When you feel confident your `LaterWords` method works, you can copy and paste the method into the Bayes NLP Mini-Project "Quiz: Optimal Classifier Exercise".

**Note:** If you choose to call `NextWordProbability()` within `LaterWords()`, you will also need to copy and paste your implementation of the `NextWordProbability()` method into the Bayes NLP Mini-Project "Quiz: Optimal Classifier Exercise".

In [None]:
#------------------------------------------------------------------

#
#   Bayes Optimal Classifier
#
#   In this quiz we will compute the optimal label for a second missing word in a row
#   based on the possible words that could be in the first blank
#
#   Finish the procedure, LaterWords(), below
#
#   You may want to use NextWordProbability(), depending on how you choose to approach this problem
#

def LaterWords(sample,word,distance):
    '''@param sample: a sample of text to draw from
    @param word: a word occurring before a corrupted sequence
    @param distance: how many words later to estimate (i.e. 1 for the next word, 2 for the word after that)
    @returns: a single word which is the most likely possibility
    '''
    
    # TODO: Given a word, collect the relative probabilities of possible following words
    # from @sample. You may want to import your code from the maximum likelihood exercise.
    
    # TODO: Repeat the above process--for each distance beyond 1, evaluate the words that
    # might come after each word, and combine them weighting by relative probability
    # into an estimate of what might appear next.
    
    return {}


In [None]:
# Test cases: see how well your LaterWords procedure works.

sample_memo = '''
Milt, we're gonna need to go ahead and move you downstairs into storage B. We have some new people coming in, and we need all the space we can get. So if you could just go ahead and pack up your stuff and move it down there, that would be terrific, OK?
Oh, and remember: next Friday... is Hawaiian shirt day. So, you know, if you want to, go ahead and wear a Hawaiian shirt and jeans.
Oh, oh, and I almost forgot. Ahh, I'm also gonna need you to go ahead and come in on Sunday, too...
Hello Peter, whats happening? Ummm, I'm gonna need you to go ahead and come in tomorrow. So if you could be here around 9 that would be great, mmmk... oh oh! and I almost forgot ahh, I'm also gonna need you to go ahead and come in on Sunday too, kay. We ahh lost some people this week and ah, we sorta need to play catch up.
'''

corrupted_memo = '''
Yeah, I'm gonna --- you to go ahead --- --- complain about this. Oh, and if you could --- --- and sit at the kids' table, that'd be --- 
'''

print(LaterWords(sample_memo,"ahead",2))
# Output: come
print(LaterWords(sample_memo,"and",3))
# Output: on
print(LaterWords(sample_memo,"you",1))
# Output: to

## Bayes NLP Reflection
The last few quizzes in the Bayes NLP Mini-Project allow you to reflect on how Bayes Rule was applied here for simple word prediction tasks. You may enter your answers in the last four quizzes of the Bayes NLP Mini-Project in the Classroom:
  1. What set of words in a memo do you think could help predict what a missing word might be? What are some advantages and disadvantages of using more or fewer possible influences in prediction?
  2. If you wanted to measure the joint probability distribution of a missing word, given its position relative to every other word in the document, how many probabilities would you need to measure? Say the document is $N$ words long.
  3. Given the corpus of text we have from our boss, we might like to identify some things he often says, and use that knowledge to make better predictions. What are some statements you see arising multiple times?
  4. Suppose we've identified the following patterns in our boss' speech:
        - "Gonna need [you] to go ahead and"
        - "So if you could ... that would be [great, terrific], [ok, okay, mmmk]"
        - "Oh, and I almost forgot"

     Trying to search all [regular expressions](https://docs.python.org/2/library/re.html) of length up to 9 with multiple optional parts is computationally infeasible. But if we have these hypotheses to begin with, we can make extremely accurate guesses. For example, fill in the blanks in the following sentence:
       > "Yeah, I'm gonna --- you to go --- --- not complain about this. Oh, and if you could --- ahead and sit at the kids' table, that'd be ---."