# Mystery Friend

## Introduction
The goal of this project is to identify who sent the anonymous postcard `mystery_postcard`. Based on the handwriting, the list of candidates has already been narrowed down to:
- Emma Goldman
- Matthew Henson
- TingFang Wu

We will use each candidate's earlier writing, a **Bag-of-Words model**, and a **Naive Bayes classifier** to find the person who sent the mystery postcard.

## Import Required Libraries

In [3]:
# Counter converts text into a Bag-of-Words Dictionary
from collections import Counter

# CountVectorizer is used for text Vectorization
from sklearn.feature_extraction.text import CountVectorizer

# Naive Bayes classifiers are a family of classification algorithms based on Bayes' Theorem,
# with the "naive" assumption that features are conditionally independent given the class.
# MultinomialNB is the variant suited to word counts.
from sklearn.naive_bayes import MultinomialNB
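
`Counter` is imported above, but the later cells rely on `CountVectorizer` instead. As a minimal sketch of what a Bag-of-Words dictionary looks like, the hypothetical helper `text_to_bow` below (not part of the original notebook) counts lowercase word tokens in a made-up sentence:

```python
import re
from collections import Counter


def text_to_bow(text):
    """Lowercase the text, pull out word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Made-up sentence, used only for illustration
bow = text_to_bow("The ice broke, and the ship was free of the ice.")
print(bow.most_common(2))  # [('the', 3), ('ice', 2)]
```

Word order is discarded; only the counts remain, which is the information the classifier later works with.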

## Load Postcards

#### Emma Goldman

In [4]:
with open("goldman_emma.txt", "r") as f:
    goldman = f.read()

goldman_docs = goldman.split(". ")
#print(goldman_docs)

#### Matthew Henson

In [5]:
with open("henson_matthew.txt", "r") as f:
    henson = f.read()

henson_docs = henson.split(". ")
#print(henson_docs)

#### TingFang Wu

In [6]:
with open("wu_tingfang.txt", "r") as f:
    wu = f.read()

wu_docs = wu.split(". ")
#print(wu_docs)
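
The three loading cells split each text on the literal string `". "`, so a sentence that ends right before a line break stays attached to the next one. One possible alternative, shown here only as a sketch, is a regular expression that splits on any whitespace following `.`, `!` or `?`. The helper name `split_sentences` and the sample text are assumptions for illustration, and abbreviations such as "Mr." would still be split incorrectly.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows ., ! or ? and drop empty pieces."""
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    return [piece for piece in pieces if piece]


# Made-up sample with a line break directly after the first period
sample = "The storm raged.\nThe ship held. Hope to see you soon!"
print(sample.split(". "))       # two pieces: the first two sentences stay joined
print(split_sentences(sample))  # three pieces, one per sentence
```

Changing the splitter changes the number of training documents, so the resulting probabilities would likely differ from the ones reported in this notebook.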

#### Mystery Postcard
The anonymous postcard that was received:

In [7]:
mystery_postcard = """
My friend,
From the 10th of July to the 13th, a fierce storm raged, clouds of
freeing spray broke over the ship, incasing her in a coat of icy mail,
and the tempest forced all of the ice out of the lower end of the
channel and beyond as far as the eye could see, but the _Roosevelt_
still remained surrounded by ice.
Hope to see you soon.
"""

### Combine the Friends' Writing Samples

In [8]:
friends_docs = goldman_docs + henson_docs + wu_docs

### Labels for Friends

In [9]:
friends_labels = ["Goldman's friend"] * len(goldman_docs) + ["Henson's friend"] * len(henson_docs) + ["WU's friend"] * len(wu_docs)
#print(friends_labels)

## Explore Postcards

In [10]:
print("Goldman's friend:")
print(goldman_docs[0])

Goldman's friend:
The history of human growth and development is at the same time the
history of the terrible struggle of every new idea heralding the
approach of a brighter dawn


In [11]:
print("\nHenson's friend:")
print(henson_docs[0])


Henson's friend:
When the news of the discovery of the North Pole, by Commander Peary,
was first sent to the world, a distinguished citizen of New York City,
well versed in the affairs of the Peary Arctic Club, made the statement,
that he was sure that Matt Henson had been with Commander Peary on the
day of the discovery


In [12]:
print("\nWU's friend:")
print(wu_docs[0])


WU's friend:
The Importance of Names

  "What's in a name?  That which we call a rose
  By any other name would smell as sweet."


Notwithstanding these lines, I maintain that the selection of names is
important


## BoW Vectorizer

In [13]:
# Create bow_vectorizer:
bow_vectorizer = CountVectorizer()
# Define friends_vectors:
friends_vectors = bow_vectorizer.fit_transform(friends_docs)
# Define mystery_vector: 
mystery_vector = bow_vectorizer.transform([mystery_postcard])
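
To make the difference between `fit_transform` and `transform` concrete, here is a small sketch on made-up sentences (the `toy_` names are not part of the notebook). The vocabulary is learned only during fitting, so a word that appears only in the text passed to `transform` is ignored.

```python
from sklearn.feature_extraction.text import CountVectorizer

toy_docs = ["the ice broke", "the ship sailed"]  # made-up training sentences
toy_vectorizer = CountVectorizer()
toy_vectorizer.fit(toy_docs)

# vocabulary_ maps each learned word to its column index
print(sorted(toy_vectorizer.vocabulary_.items(), key=lambda item: item[1]))

# "surrounded" was not seen during fit, so it contributes nothing
toy_vector = toy_vectorizer.transform(["the ice surrounded the ship"])
print(toy_vector.toarray()[0].tolist())  # [0, 1, 0, 1, 2]
```

The same applies to `mystery_vector`: only words that also occur in `friends_docs` can influence the prediction.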

## Naive Bayes Classifier

In [14]:
# Define friends_classifier:
friends_classifier = MultinomialNB()
# Train the classifier:
friends_classifier.fit(friends_vectors, friends_labels)

MultinomialNB()
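
For intuition about what the classifier computes, the following is a simplified pure-Python sketch of multinomial Naive Bayes with add-one (Laplace) smoothing. It is not scikit-learn's implementation; the function names `train_nb` and `predict_nb`, the whitespace tokenization, and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-label word counts and per-label document counts."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict_nb(text, word_counts, doc_counts, vocab, alpha=1.0):
    """Return the label with the highest log posterior, plus all scores."""
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Log prior: share of training documents carrying this label
        score = math.log(doc_counts[label] / total_docs)
        denominator = sum(counts.values()) + alpha * len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words outside the vocabulary are skipped
                score += math.log((counts[word] + alpha) / denominator)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Made-up training data, not taken from the postcards
toy_docs = ["ice ship storm", "ship ice north", "names rose sweet"]
toy_labels = ["henson", "henson", "wu"]
model = train_nb(toy_docs, toy_labels)
label, scores = predict_nb("storm ice ship", *model)
print(label)  # henson
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term `alpha` keeps a word that a class never used from forcing that class's probability to zero.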

## Final Predictions

In [15]:
predictions = friends_classifier.predict(mystery_vector)
probability = friends_classifier.predict_proba(mystery_vector)
mystery_friend = predictions[0] if predictions[0] else "someone else"
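
The columns returned by `predict_proba` follow the order of the classifier's `classes_` attribute, which scikit-learn stores in sorted order. Rather than indexing the columns by position, the labels and probabilities can be paired explicitly. The sketch below uses made-up training data and `toy_` names that are not part of the notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training data standing in for friends_docs and friends_labels
toy_docs = ["ice ship storm", "names rose sweet", "growth struggle idea"]
toy_labels = ["Henson", "Wu", "Goldman"]

toy_vectorizer = CountVectorizer()
toy_classifier = MultinomialNB()
toy_classifier.fit(toy_vectorizer.fit_transform(toy_docs), toy_labels)

toy_probabilities = toy_classifier.predict_proba(
    toy_vectorizer.transform(["storm over the ship"])
)[0]

# classes_ holds the labels in the same order as the predict_proba columns
by_label = dict(zip(toy_classifier.classes_, toy_probabilities))
for name, value in by_label.items():
    print("{}: {}".format(name, round(float(value), 5)))
```

Pairing by label keeps the printed names correct even if the label strings change and sort into a different order.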

In [16]:
# Print the predicted sender and the per-class probabilities.
# Columns follow friends_classifier.classes_, which is sorted alphabetically.
print("\nThe postcard was from {}!".format(mystery_friend))
print("\nProbability that the postcard belongs to each friend:")
print("Goldman: {}".format(round(probability[0][0], 5)))
print("Henson: {}".format(round(probability[0][1], 5)))
print("Wu: {}".format(round(probability[0][2], 5)))


The postcard was from Henson's friend!

Probability that the postcard belongs to each friend:
Goldman: 0.01102
Henson: 0.98898
Wu: 0.0


## Conclusion

Based on the predicted probabilities, our model attributes the postcard to **Matthew Henson**.
The prediction is not 100% certain, however. The probability that the postcard belongs to each candidate is:
- Emma Goldman: 1.102%
- Matthew Henson: 98.898%
- TingFang Wu: 0%