# Decrypting cryptid

## Tools, technologies, & techniques featured in this notebook
- List TBD

In [13]:
import numpy as np
import pandas as pd
from numpy.linalg import svd
# import string

import matplotlib.pyplot as plt
%matplotlib inline

import re
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem.wordnet import WordNetLemmatizer
# from nltk.stem.porter import PorterStemmer
# from nltk.stem.snowball import SnowballStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans 
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

In [14]:
from urllib.request import urlopen
import plotly.express as px

## Text ingestion
**Source selection**
- Observation data was imported from the file you provided. We looked for other sources, but found that the publishers of this information took the most care to be most credible. A team of researchers would follow up on each sighting with an interview and collection of 'evidence' and attempt to consistently classify the report. They only publish the top three tiers of credibility--A through C in order of most to least evidence.

**Data import and data wrangling**

- 'Beautiful Soup' module to get the html data into a usable format
- Data was pretty messy--think of looking through a filing cabinet for a document where the person who was in charge of filing didn't reliably put files in the right folders
- Straightened out the filing cabinet and then obtained year and month, state and county, 


## Text preprocessing functions and methods
- Machine learning models need to have text converted into a format that they can use. 
- The steps we took to turn the text into machine-readable 'data' included stripping punctuation, tokenizing, lemmatization, and removing stopwords.
- We then 'vectorized' each observation so that the models could compare them.


In [15]:
wordnet = WordNetLemmatizer()
# porter = PorterStemmer()
# snowball = SnowballStemmer('english')
nltk.download('stopwords')

[nltk_data] Downloading package stopwords to
[nltk_data]     /Users/ryankirkland/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


True

In [16]:

def remove_punc(string:str) -> str:
    '''Given a string, removes all punctuation and returned punctuation-less string'''
    return re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)|^rt|http.+?", "", string)

In [17]:
def tokenize(str):
    '''
    Tokenize a str and return a tokenized list.
    '''
    return [word for word in word_tokenize(str)]

In [18]:
def lemmatize(doc):
    '''Takes in a doc and lemmatizes tokens in doc
    Parameters
    ----------
    doc: list of tokens
    
    Returns
    -------
    lemmatized tokens
    '''
    return [wordnet.lemmatize(tkn) for tkn in doc]

In [19]:
def rm_stop_words(doc, stops=set(stopwords.words('english'))):
    '''Takes in a doc and removes stop words
    Parameters
    ----------
    doc: list of tokens
    
    Returns
    -------
    Tokens with stop words removed
    '''
    return([w for w in doc if w not in stops])

In [20]:
def preprocess_corpus(content):
    '''
    Add docstring. Make flexible to allow for doing, or not doing, preprocessing functions. 
    Parameters
    ----------
    content (str): a collection of strings
    Returns
    -------
    A list of lists: each list contains a tokenized version of the original string
    '''
    preprocessed = []
    for i in range(len(content)):
        step_1 = remove_punc(content[i].lower())
        step_2 = tokenize(step_1)
        step_3 = lemmatize(step_2)
        step_4 = rm_stop_words(step_3)
        preprocessed.append(step_4)
    return preprocessed

In [21]:
# loading bigfoot data
sightings_df = pd.read_csv('data/bigfoot_with_county.csv', index_col=0)

In [22]:
sightings_df['observations'] = sightings_df['observations'].astype(str)

### Preprocessing--data load and function calls

In [23]:
cleaned_tokenized = preprocess_corpus(sightings_df['observations']) # cleaned and tokenized

str_cleaned_tokenized = [" ".join(x) for x in cleaned_tokenized] # string version of cleaned and tokenized 

In [24]:
len(cleaned_tokenized)

4411

## Processing

In [25]:
# 'Bag of words function'
vect = CountVectorizer(max_features=500)
word_counts = vect.fit_transform(str_cleaned_tokenized)

In [26]:
len(vect.get_feature_names())

500

In [27]:
tfidfvect = TfidfVectorizer(max_features=500)
tfidf_vectorized = tfidfvect.fit_transform(str_cleaned_tokenized)
tfidf_vectorized.toarray()

array([[0.12824608, 0.0759244 , 0.07095224, ..., 0.        , 0.0518886 ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.03951046, 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.06966028,
        0.        ],
       ...,
       [0.        , 0.        , 0.04487746, ..., 0.02769821, 0.        ,
        0.03032734],
       [0.        , 0.        , 0.        , ..., 0.20376512, 0.03449167,
        0.03187238],
       [0.        , 0.        , 0.        , ..., 0.15761544, 0.        ,
        0.        ]])

## Clustering with K Means

In [123]:
clusters = 5
kmeans = KMeans(n_clusters=clusters, 
                random_state=0).fit(tfidf_vectorized)

- Investigate the clusters  

> - Investigate the 'centroids' to find out what "topics" Kmeans has discovered by mapping these vectors back into the 'word space'.  Think of each feature/dimension of the centroid vector as representing the "average" article or the average occurrences of words for that cluster.
   
> - Find the features/dimensions with the greatest representation in the centroid.  Print out the top ten words for each centroid.


In [124]:
def Sort(sub_li): 
    return sorted(sub_li, key = lambda x: x[0], reverse=True)

def get_word(centroid):
    return [x[1] for x in centroid]

for k in range(5):
    matched = zip(kmeans.cluster_centers_[k], tfidfvect.get_feature_names())
    match = Sort(list(matched))
    print(' '.join(get_word(match[:24])), '\n')

track print wa inch snow footprint foot found toe area trail picture one long creek large road size human went like would could made 

wa creature saw tree large back foot area tall river seen looked dog house said wood one like heard ran around walking see two 

wa heard sound like scream night loud sounded noise time tent wood area howl dog back camp could one animal hear around went never 

wa road saw car driving side creature foot tall see looked hair back like dark around front seen large right highway home area could 

wa back saw like tree see could foot time wood would one around looked something area heard got friend thing went didnt never house 



For heirarchical clustering methods, see 819 am clustering assignment

## Cosine similarity
- Unsupervised learning

- Use the cosine similarity to compare similarity between documents.

- sklearn's [linear_kernel](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.linear_kernel.html) (computes dot product) can be used on tfidf to compute the cosine similarity since rows are normalized.*

- Here's a page on cosine similarity from [sklearn documentation](http://scikit-learn.org/stable/modules/metrics.html#cosine-similarity) and a relevant [stack overflow post](http://stackoverflow.com/questions/12118720/python-tf-idf-cosine-to-find-document-similarity).

- *The stack overflow post is helpful. It provides instruction over how to slice the tfidf and then how to apply cosine similarity between one doc and all of the rest.*

In [128]:
cosine_similarities = linear_kernel(tfidf_vectorized[1:2], tfidf_vectorized[1:500]).flatten() # This is comparing cs for article #2 and the next 500.

In [126]:
related_docs_indices = cosine_similarities.argsort()[:-6:-1] # This identifies the index of the top 5 most similar.
print(related_docs_indices)

most_similar = cosine_similarities[related_docs_indices] # and their related cs
most_similar

[  0 134 299  51  35]


array([1.        , 0.41648761, 0.40048277, 0.39099911, 0.38874647])

In [127]:
for i in related_docs_indices:
    print(sightings_df['observations'].iloc[i]) # Going step by step pulling up the most similar reports by index

I and two of my friends were bored one night so we decided to do a little snowmachining. Though it was illegal to snowmachine in Anchorage, there were some good trails to ride on a little north of my house.  We took off at probably 11 pm, rode up the road about a quarter mile, and cut off on the trails. It had snowed about 10 inches a few days before so there was fresh snow, with no tracks.  I was leading the way for about a half hour, then we stopped and talked for a little bit.  We took off again and kept cruising on some sort of game trail that led to an opening in the woods.  I rode off into the opening with my friends following about fifty yards behind me.  I came over this little mound and saw strange tracks leading to this spot in the snow where it looked like something had pushed aside some snow and layed down.  I figured it was just a moose or something.  But I followed the tracks over the next small hill and as I came down the far side my headlight pointed right on the back o

## Decompositions NMF (and SVD)
- Unsupervised learning
- Good for situations when there's some potentially valid grouping to both rows and columns, such as putting Joe and Sam in the same group because they like similar movies (as opposed to traditional supervised models where there are features and targets)
- See 820pm solution to NMF for good soft classification and test of classification


In [8]:
from sklearn.decomposition import NMF

In [28]:
nmf = NMF(n_components=5, random_state=1)

In [29]:
W = nmf.fit_transform(tfidf_vectorized)
H = nmf.components_

In [30]:
W

array([[0.07913446, 0.        , 0.0943246 , 0.        , 0.04061488],
       [0.08150579, 0.02538353, 0.00700339, 0.        , 0.05642254],
       [0.03330436, 0.02974379, 0.02128057, 0.01979833, 0.        ],
       ...,
       [0.0867708 , 0.00722763, 0.00217752, 0.04459817, 0.05050635],
       [0.07094695, 0.03922835, 0.02593946, 0.00778871, 0.0382518 ],
       [0.04129261, 0.        , 0.        , 0.08748525, 0.        ]])

In [31]:
H

array([[0.10931832, 0.0841657 , 0.09510468, ..., 0.46675824, 0.23753402,
        0.29082613],
       [0.11385493, 0.07636192, 0.02571999, ..., 0.14082082, 0.16146908,
        0.14633041],
       [0.07536508, 0.00862649, 0.15892977, ..., 0.14377551, 0.07075344,
        0.08158853],
       [0.08055423, 0.08431972, 0.05443353, ..., 0.01840445, 0.14454432,
        0.02289925],
       [0.04083377, 0.        , 0.04191497, ..., 0.21305049, 0.10421994,
        0.22408491]])

In [37]:
topics = ['latent_topic_{}'.format(i) for i in range(5)]
W = pd.DataFrame(W, index=sightings_df.observations, columns=topics)

In [39]:
W.sort_values('latent_topic_0', axis=0, ascending=False)

Unnamed: 0_level_0,latent_topic_0,latent_topic_1,latent_topic_2,latent_topic_3,latent_topic_4
observations,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
"I am writing this because i think that i might have had a recent experience with a bigfoot. I live in Springdale, Ar. Springdale is not very rugged as far as forest goes, but it can get pretty secluded in the right places. On october 10,2002 me and a friend set out on a trip to follow a creek to see how far it goes, because i followed this creek when i was young. This creek goes through springdale, and winds in through Elm springs,wheres theres nothing but dense trees on one side the left, and open fields, to the right-private properity. Well Elm springs is not a big city, and you would never know the surrundings where i was not unless you owend the properity.It was about 5:00 pm when i decided that it was time to try to get out of the creek and up on the bank,we had been walking about 2 hours in and out of the creek,and i decided that we could cover more ground on land before the sun went down. Now im about 5""9 and i guess the creek bed must have been 7 foot high,so i had my friend C hoist my foot up while i got my knee up high enough to pull myself up.Now the original plan was to yank him up once on top,but looking back on it now i realize how foolish this was because i would had to be extremely strong,because i probably could have just have barely have had touched his fingertips from way up there,but it didnt matter because what i saw when i got on my feet scared the crap out of me. The best way to explain it is just to tell you that my heart sunk,and I was just extremely stunned. When i got to my feet i was facing nothing but dense forests or woods,somthing caught the corner of my eye about lets say 75 to 100 yards off. Just before getting a good enough look at it i could tell it was big and brown,now ive never seen a bear before out in the wild, so imediately i started to panic. I took one quick look down and figured i would break every bone in my legs if i were to jump,for the water was only maybe a foot deep, with all this happening in what i would say just about 2 seconds,and keep in mind i havent even been able to warn my friend yet of what i thought to be a charging bear, i look back to see this thing or whatever it was walking extremely fast! It was then i noticed that this couldnt have been a bear. I got a clear look at it for just a second.Ill never forget what i saw that day.I remember its head, real long head,i guess you could say it was shaped like an egg, i just know it wasnt like a humans, the eyes must have been dark because i couldnt make them out,i didnt reall notice a nose ether, but there was so much hair, it was hard to make anything out except for the head.The biggest thing ill never forget is the height and weight of this thing.It was so big,legs coverd with hair, so massive,that had to have been the biggest part of this thing,so wide,had to have been as wide as a refrigerator.In comparing the legs to my legs i would have to say each leg made two sets of both of my legs.The shouldars were massive,could have been three feet or over from shoulder to shoulder.I also remember how it walked, besides taking extremely wide steps, it kind of walked with its front body slightly hunched over,i dont know, like some one with bad posture.The height of this thing was astronomical.The tip of its head was in the tree tops!!! now dont get me wrong, the trees werent that tall, but they were at least 8 to 9 feet where the limbs started to curl out and i remember the head was up there with them, there is no way i could have jumped and touched the limbs, they were just to high up.So i would put whatever it was that i seen to be 8 to 9 foot tall. And looking back on it now, the frame could have easily held 500- 600 pounds,the arms on it looked long, and they swayed back in forth heavily,i do not recall a odor,or anything like that,for this thing was out of my view in about id say 5 to 7 seconds. Now i realize when i was getting up the creek bed, i had spooked the thing,because it sure didnt waste no time getting gone into the woods.And i was not about to follow it.So with all this said,i asked my friend if he had heard anything,which he said no because of the running water,and i guess because he was down there,and i said man,i just seen a monster go into the woods! ofcourse my facial feachure was very solemn and serious,but he still didnt believe me.I repeated this several times to him,and i guess he could tell by looking at my face i was serious.He later said my face looked white. With his help i got down back into the creek,but it did take a few minutes,and even then i wound up falling back down in the creek,luckly i wasnt hurt,just wet and scared,the sun was setting,and we jogged and trampled our way back,never once daring to go back up on land. We talked and thought better of muterring such foolishness around for fear of people thinking were nuts. Truthfuly,i dont even think he beleives me.But putting myself in his shoes i probably wouldnt either.But it doesnt matter, i know what i saw, so in closing i have ruled out the possbility of a hoax,1. why would somebody be out in the middle of nowhere in this kind of costume? 2. how would they know we were coming?3.how do you account for the height and size, it was easily a foot taller than any human ive ever seen.It was so tall,you cant duplicate that kind of height.Ihope someone reads this and takes it seriously because i dont know if ill ever go into the woods again.",0.134143,0.000000,0.019502,0.000000,0.019846
"The following personal interview was conducted by Stan Courtney. The first name of the witness has been changed.Jennie -We lived out in the country on a farm. Some of it was hunting ground and some of it was left not hunting, like you weren't allowed to hunt on it. I would always walk through the non-hunting area. I would ride my four-wheeler part way and then walk across this area to go to a pond where I feed my fish. I drove my four-wheeler, parked it along the little dirt path, that was all in woods. I then walked out of the woods and onto the field that was like a corn field that had been cut down.Then, something screamed at me, like it was trying to scare me. I got so scared that I ducked down for a half hour and I thought, ""who is messing with me?"" I always wore camo, it wasn't hunting season at that point so I wasn't afraid to go on hunting ground.And the trees started shaking, like in a straight path. The tops of the trees were barely shaking but the middle of the trees were moving like something was slapping them or pushing against them. It just didn't make any sense that an animal would walk through and slap things. Stan - And how old were you?Jennie - I was nine years old and I was freaking out. I stood back up and walked to the fish ponds. Stan - And what time of day was this?Jennie - That was probably right before it got dark out. Because I went twice a day, right before it got light out in the morning and just before dark at night. And I carried a coffee can with me full of fish food. And I would shake it every time. It was like a rattle, and it was like someone knew I did that, but we didn't have neighbors. And nobody went out there unless they told us.I just remember getting yelled at, it was very high pitched, like a bunch of girls or a bunch of guys playing with me, it was like screaming from in the woods at me. And the tops of the trees were barely shaking but I was far enough out into the field to see the path that whatever or whoever it was I could see the trees moving as it passed through the woods. It was like ""I scared her now I am going to move on.""Stan - Did you have the impression it was hitting the trees?Jennie - I knew something was going through the trees, I assumed walking because the sound was coming from lower like it was screaming out and I couldn't explain it. It was so fast moving through the trees. All I know is that there were several trees in our woods and they were all moving one right after another. The leaves were moving on the very top of each tree and there was no wind and you could see it was a path. If it were a large animal it had to have its arms wrapped around the tree and shaking it, that is what I imagined.Stan - Anything else unusual happen at that same time?Jennie - Yes, I was at the pond, and I was looking at some fox holes along the embankment above the pond. I was climbing up the bank and was covered in mud. A big rock, larger than a bowling ball, just tumbled on the ground right next to me. And now that I think about it we've had other rocks thrown at us while at that embankment. Both of my sisters were with me, we were always getting rocks thrown at us at that embankment.Stan - And which direction did it come from?Jennie - Beyond the embankment, it landed right next to me, higher ground to lower ground. Stan - How much of the time did you think something else was out there?Jennie - Not every time I was out there but sometimes I thought something was following me or watching me.",0.131671,0.013964,0.000000,0.000000,0.000000
"It's taken me a long time to share this story (about 7 years). My family, friends and I use to ride atv's very often, usually on the weekends. I had the biggest 4 wheeler so I usually stayed in the very back in case someone got into trouble or hung up, there was about 15 atv's. One night we were going through a classified forest called Kramers woods, which is an old growth forest with a tight winding trail and massive trees. As we went around a wet, muddy turn I seen the atv in front of mine, lights reflecting off the water hit something behind a tree that I was approaching. It looked hairy but I wasn't sure. Well, as I approached the tree while turning the atv left the thing (which was on my right) and I locked eyes. I'm 44 now so I would have been 37 at the time. I'm 6' tall 250lbs and rigid, I'm an old country boy and fear no man and could match or best most in strength, but this thing was no man, it was massive and was barely hiding behind this massive tree. It was tight against the tree with both hands flat on it and its chest tight against its hands. what happens next is very strange to me, but I know they are more than just an animal now, when he and I looked at each other we both knew he goofed and got caught in among that chain of atv's, I was instantly terrified but then instantly knew he was as scared as I was and wanted this situation to end. It was like we were in each others head and came to an agreement to just go on. My wife was riding with me so I never reacted because she would have screamed bloody murder if she would have seen it, it was 3 feet from us... I knew it was a male, just knew. He was at least 8 1/2' tall, I was the same height on my 4 wheeler as I was standing, and I looked way up at it. It's funny how small you feel, this thing could have easily have grabbed me, my wife or entire atv and threw us, that's how powerful it looked. I cant put to words how powerful it looked. I have gone online several times to try and see what i saw. Seen the white one on youtube but this one looked more human, people report red triangulated eyes, smelly & long matted coat, this one was none of that. Its coat was brownish orange with 2-3"" straight hair. I could see greyish skin and muscle mass. there is no body builder on the planet with the size muscle mass this thing had. Not much of a neck but its head was huge, its eyes were 2-3 times bigger, but just like ours with whites and pupil. The whites of his eyes I'll never forget because his eyes were what I was stuck on the most, the rest of him was at a glance. His mouth was very wide (4-5"") with its lips pulled tight from fear (just like us). As I pulled away he took one step and vanished. I was amazed at how he was just gone, that quick.",0.131008,0.000000,0.000000,0.004552,0.013304
"I went to a favorite spot for the weekend to get some peace and quiet and hopefully a monster brown trout. Been going there for years fishing and camping. Not extremely remote but wild and beautiful none the less.Its actually an old home place that has long disappeared, nothing remains except the concrete foundation and a narrow overgrown road that was once the driveway.Was very cold the night before, mid 20s,but the sun was shining bright and it was starting to warm up. I made the relatively short hike down the old road bed to my spot by the river.I dropped my pack against a tree and went down the step embankment to the rivers edge to check out the water and soak up some sunshine to warm up.After a few minutes I heard what I thought was another fisherman or someone up above me where the old foundation was.I thought ""great,so much for peace and quiet, must be hardcore trout fisherman like me to be here in cold weather"". I walked back up the embankment expecting to see people,, what I saw in broad day light, probably 30-40yds directly in front of me was what I can only describe as what appeared to be a fully erect gorilla, except his face didn't protrude out,it was flatter,and the hair was the color of a squirrel, browns, black, silver & a slight tint of red when the sun reflected off him.He was standing there, head slightly tilted back, squinting his eyes,soaking up the sunshine, just as I was.I couldn't comprehend what I was looking at at first. My first thought was ""what am I looking at?"".. Second thought was"" God,look how big he is"". It was incredibly muscular, at least 4ft wide at shoulders, 7-8ft in height,massive animal. The skin was a dark grayish color around the face, head was extremely pointed, like a dunce hat almost, no neck,large pectorals visible under the hair. Eye color was like a shade of amber. It was squinting in the bright sun, soaking up some warmth. He didn't really see me at first I don't think.I did kinda pop up over the embankment, I think he knew I was there but didn't really know where .He could have been trying to smell me. I watched this massive beautiful animal standing there in the bright sun for probably 15-20 seconds.I had a feeling of amazement, wonder & actually privileged .I knew that what I was looking at was truly one of Gods immaculate creations. It was beautiful, big and scary as hell, but beautiful. Hair was clean, looked groomed almost. Very healthy animal. After 15-20 seconds of seeing this I actually stepped up over the embankment all the way in view and I said out loud something like ""hey big boy,ain't you a pretty thing!"" That's when it finally saw me. It dropped its head down & froze. I could tell I surprised it. It wouldn't make eye contact. I could see its whole posture change. It became very shy& acted like it was hoping that as long as it didn't look directly at me I wouldn't see him. It was like a bashful child.I took a few steps toward it and said ""hey big boy"" ,or something to that effect, and that was enough for him. He turned to his left and he was out of there. I could here the limbs & leaves rustling as he made his getaway. He moved like a deer going up the mountain, fast fast. Didn't see him but I could hear him. After probably 5 seconds the rustling stopped and it was totally silent. Then I clearly hear what sounded like someone doing a drum roll with wooden sticks on a wooden table. Three"" brat-tat-tat-tat's,very quick, very clear, definitely not a woodpecker. No doubt in my mind it was him. He had traversed the mountain side and was on the ridgeline above me in no more than 5-10 seconds. Unbelievable. I never felt threatened, never felt afraid. Was a feeling of privilege & respect. I set up camp and stayed at that spot for the night and all the next day but never saw it again. 'spent most of my life in those mountains walking feeder creeks,backpacking, fishing for brook trout.Thought I knew all there was to know.That day changed my life. Made me not only question my sanity & eye sight (I'm not crazy & have 20/20 vision), but my understanding of the natural world .Ive been back several times but haven't seen him.",0.128392,0.018751,0.000000,0.015389,0.000000
"I'm reporting this approximately 10 years after the fact. My wife had asked me to re-tell my story to her this morning and that is what prompted us to stumble across this web site. Late Fall 1997 (or possibly 1996). I was coming out of my apartment's laundry facility, which is a seperate building not connected to the actual apartment buildings. I must have heard something because I looked to my right and there it was, about 20 feet away. The creture was about 5.5 to 6 feet tall. I'd guess it was aroung 200 lbs. There was no odor and it made no sound. I saw it as it was rooting through a garbage dumpster. In a matter of seconds my brain went through a kind of checklist to try to rationalize what I was seeing. But it was not a bear, it was too slim in relation to its height. It also moved in a way that was very human. It was nighttime but there was a parking lot light on overhead. I could see hair that was brown and about 4-7 inches in length. Unfortunately, primative survival instict and adreneline took over. I bolted toward my apartment, running so fast that I lost my footing as I was going through a dip in the terrain. I landed arms first and slid across a gravel covered road. As I got to my feet I looked over my shoulder to see that it too had run and was about 200 feet from where we had met.I'm certain it was not an animal that is commonly known to people. It was humanoid in appearence and movement. The way it stopped what it was doing, turned its head and looked at me was freaky. Because of shadows I couldn't get a look at its face. The incident didn't change my life or anything. I hardly ever think about it...unless I look at my elbow. I scraped it really bad when I fell running, and have a scar to this day. When I first told this story to my wife she was very taken aback because she knows that I am a very rational and analytic person.",0.127687,0.003503,0.000000,0.028534,0.000000
...,...,...,...,...,...
"We were coming home from our grandson's baseball game on State Route 39 west of Nashville, Ohio. After passing State Route 179 and driving a short distance, something came out of a wooded area into a bean field and started to cross the road. The three of us saw it on the south side of the road just as it finished crossing the road. It turned and looked at us as we were passing and we could see it proceed up the bank into a wooded area. There was another car in front of us that slowed down obviously since they saw the same thing.",0.000000,0.010333,0.000000,0.117843,0.000000
"It was a loud piercing sound, that lasted approximately 22 minutes. These sounds were at 5 to 6 second intervals which lasted about 4 to 5 seconds each. At first there was one distinct sound, (screaming) but then there was an additional ""voice"" that was louder and screamed longer.Each time the sounds would be heard, neighboring dogs would bark; then when the sounds stopped so did the barking. At the end of the 22 minutes (this occurred from 10:04 p.m. to 10:26 p.m.) it became extremely quiet and the barking also stopped. I have heard other sounds that weren't near as ""scary"" as these sounds were. These were almost like a Halloween movie sound!Talked to the owners of the dogs, and they said their dogs were petrified after this incident. The dogs refused to come outside after hearing these sounds.",0.000000,0.128469,0.000000,0.000000,0.037481
"This is not a sighting, but I believe I may have heard some of these creatures near my home last night. My husband and I recently moved to a remote area of Fairfield County, SC near the Blair community. The house we are renting has been vacant for quite some time and is near a thickly wooded area; our house backs up to the woods. We finished moving in last Sunday. There are no other homes in sight, though there are a few down the road from us. We drive 23 miles to the nearest town for shopping.Last night (Monday, May 8) my husband and I were half watching TV, half dozing when it occurred to me the sounds I was hearing outside the back window were getting strange. At first in the back of my mind I thought it was an owl; then I thought it must be the noisiest owl on record. We are both hard of hearing and had the TV turned up pretty loud and the windows closed. Then I realized it was more than one, and they were calling to one another, and seemed to be getting closer.I became alarmed around 11PM when one of our two dogs started barking in response to the noise. They are on our enclosed back porch and we keep the door to the outside open so they can go out to a small enclosed yard area when they need to. I went out to the porch and turned on the light, and they were very nervous and happy to see me. After I calmed them down I stayed on the porch and listened in earnest. What I heard sounded like a rhythmic hoo hoo hoo HOOOOOO, but unlike owl or dove sounds, it was very deep and loud, not at all birdlike. My husband and I could not identify it, we kept saying ""is that coyote? wolf?"" and as soon as we would say it we would just shake our heads. We both grew up and camped, hiked and traveled extensively in the West and know what coyote and wolf sound like. We discerned there were at least three distinct voices fanned out - to the south of our house, directly behind, and to the north, all to the west (wooded) side. Upon listening at length and discerning a pattern, which seemed to relate to our presence there somehow, I was concerned enough to lock the dogs up on the porch and turn on all the lights to the outside. As the calling continued to move closer, the dogs cowered in the far corner of the porch. We then realized our indoor cats had been missing all evening. We found them up in our attic underneath boards we leaned against the wall in preparation for adding shelves there. When we brought them down they ran right back up there and we didn't see them until morning.Also, the usual night sounds were absent. Nothing else made a sound while this went on, and it lasted for at least two hours. Towards the end, another call was so close I nearly panicked, and this one was making rapid hoo HOO noises interspersed with almost talking? sounds, while moving very quickly from the south to the north. The others were answering rapid-fire. I can't describe it any better than that, other than to say it wasn't yips or croaking or any of the normal animal sounds we hear out here, but very loud, deep and rapid and most certainly they were communicating with each other. It was scary and intimidating, and if I had been in a camping situation I would have run for my life.My husband is a rural mail carrier out here and knows everyone. He talked to several people today and they all said they have heard the sounds before but they don't know what they are either, most think they must be wild dogs. They can't be serious, if they have really listened. But it doesn't surprise me. I figure it must be this way with most people when confronted with something unknown and unexplainable, to come up with just about any explanation at hand and dismiss it. I decided to get on the Internet and see if I could find recordings of wild dogs and birds and find something similar. After a day of searching I found the recordings of wolf and coyote here and the whooping and calls we heard most closely resembled the suspected bigfoot whoops on this site.",0.000000,0.106527,0.000183,0.009274,0.110413
"(#1) late July - early August 1968 (#2) 11 August 1968, 8:30 P.M. (#3) 12 - 18 August 1968(#1) Residents in the area reported dogs barking louding and ""carrying on"" at night. (#2) Tim Bullock, 22, of West Frankfort, and his girlfriend, Barbara Smith, 17, of Carterville, sighted a ""creature"" while riding northeast of Chittyville. Bullock reported the creature threw dirt at them through the window. Smith saw it and started screaming. She described it as ""huge"" - about 10 feet tall, with a head as large as a steering wheel. It appeared to be black, with a round ""hairy"" face. They left the area and reported the incident to police. Bullock returned the next day and found a depression in the grass, as if a large animal had rested or slept there. (#3) Bullock said a carhop at a Herrin drive-in restaurant told him that someone came into the drive-in ""hysterical"" and said they had seen the creature. One other person reported they saw ""it"" near Rt. 148.",0.000000,0.001616,0.031945,0.060009,0.071783


In [40]:
def sort_latent_and_collect_articles(n, articles):
    headlines = {}
    for i in range(n):
        sorted_df = W.sort_values(f'latent_topic_{i}', ascending=False)
        headlines[f'latent_topic_{i}'] = list(sorted_df.index[:articles])
    return headlines

In [52]:
from collections import Counter
from nltk.corpus import stopwords

def get_popular_words(topics):
    latent_topics = {}
    for key, value in topics.items():
        words = ' '.join(topics[key]).lower().split()
        stop_words = set(stopwords.words('english'))
        stop_words.add('like')
        filtered_words = []
        for word in words:
            if word not in stop_words:
                filtered_words.append(word)
        word_count = Counter(filtered_words)
        first, second, third, fourth, fifth = word_count.most_common()[0][0], word_count.most_common()[1][0], word_count.most_common()[2][0], word_count.most_common()[3][0], word_count.most_common()[4][0]
        latent_topics[key] = [first, second, third, fourth, fifth]
    return latent_topics

In [53]:
top_obs = sort_latent_and_collect_articles(5, 30)

In [54]:
top_words = get_popular_words(top_obs)

In [55]:
top_words

{'latent_topic_0': ['would', 'see', 'could', 'back', 'one'],
 'latent_topic_1': ['heard', 'sound', 'sounds', 'sounded', 'could'],
 'latent_topic_2': ['tracks', 'area', 'found', 'track', 'inches'],
 'latent_topic_3': ['road', 'creature', 'side', 'saw', 'road.'],
 'latent_topic_4': ['back', 'house', 'said', 'one', 'went']}

## Naive Bayes
- Supervised learning method to assign class probabilities to a document
- See 818PM NLP-pipeline-programming-net-example for using sklearn Naive Bayes classifier. See also 818PM lecture on text classification. Solutions to assignment contain a number of useful naive Bayes python functions

In [102]:
counties = pd.read_csv('data/US_FIPS_Codes.csv', header=1)
counties

In [32]:
sightings_df.observations

0       I and two of my friends were bored one night s...
1       Me and a couple of friends had been bored, whe...
2       My hunting buddy and I were sitting on a ridge...
3       To whom it may concern, I am a commercial fish...
4       This is an excerpt from a short story I wrote ...
                              ...                        
4406    Dear BFRO Persons,     My family and I were st...
4407        On 6-94 we visited Yellowstone National Pa...
4408    I was delivering newspapers on a lightly snowi...
4409    I was 19 years old at that time. I had just go...
4410    I am not sure how relevant this report will be...
Name: observations, Length: 4411, dtype: object