#### NLP | Model

# Coronavirus Tweets: April 2020<a id='top'></a> 

### Natural Language Processing Stepwise Analysis

1. [Research Design](#1)<br/>
2. [DataFrames](#2) <br/>
3. [Exploratory Data Analysis](#3)<br/>
   [Data Summary](#31)<br/>
4. [Preprocessing](#4)<br/>
   [Clean Text](#41)<br/>
   [Stop Words](#42)<br/>
   [Stemming](#43)<br/>
5. [Vectorizer](#5)<br/>
6. [Topic Modeling/Dimensionality Reduction](#6)<br/>
7. [Sentiment Analysis](#7)<br/>
8. [Classification](#8) <br/>
    1 [Naive Bayes: Gaussian](#81)<br/>
    2 [Naive Bayes: Multinomial](#82)<br/>

In [1]:
import glob 
import nltk
import numpy as np
import pandas as pd
import pickle
import re
import spacy
import en_core_web_sm
import string
pd.set_option('display.max_colwidth', None)

from nltk.stem import PorterStemmer, SnowballStemmer, LancasterStemmer
from nltk.stem import WordNetLemmatizer
from nltk.tag import pos_tag #?
from nltk.tokenize import word_tokenize #?
from nltk.util import ngrams #?
from sklearn.decomposition import PCA, NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, GaussianNB 
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 


# 1 | Research Design<a id='1'></a> 

* **Research Question:** What were Americans tweeting about coronavirus and COVID-19 in April 2020? 
* **Impact Hypothesis:** The findings can inform the CDC's communication strategy for future pandemics. 
* **Data source:** Kaggle's Coronavirus (COVID-19) Tweets datasets for [early](https://www.kaggle.com/datasets/smid80/coronavirus-covid19-tweets-early-april) and [late](https://www.kaggle.com/datasets/smid80/coronavirus-covid19-tweets-late-april) April 2020, n = 138,796 tweets


[back to top](#top)

# 2 | [DataFrames](https://github.com/slp22/nlp-project/blob/main/nlp-coronavirus-tweets-mvp.ipynb)<a id='2'></a> 

In [2]:
# load tweets selected for mvp 
tweets_df = pd.read_csv('./raw_data/tweet_df.csv', low_memory=False)


In [3]:
tweets_df.head(2)

Unnamed: 0,created_at,screen_name,text,country_code,account_lang,verified,lang
0,2020-04-06T00:00:05Z,WFMGINC,....#SUNDAYFUNDAY #coronavirus style #vino cheers 🍷 https://t.co/SrymChBkq2,US,,False,en
1,2020-04-06T00:00:14Z,jpomietlasz,"This pandemic has confirmed my worst fears, most people don’t know how to make entertaining videos. #Covid_19 #SinceIveBeenQuarantined #AmericasUnfunniestVideos #WrestleMania #tonyaharding",US,,False,en


[back to top](#top)

# 3 | Exploratory Data Analysis<a id='3'></a> 

##### Note: The full EDA is part of the [MVP notebook](https://github.com/slp22/nlp-project/blob/main/nlp-coronavirus-tweets-mvp.ipynb).

### Data Summary<a id='31'></a> 

In [4]:
tweets_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 138796 entries, 0 to 138795
Data columns (total 7 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   created_at    138796 non-null  object
 1   screen_name   138790 non-null  object
 2   text          138789 non-null  object
 3   country_code  138789 non-null  object
 4   account_lang  2 non-null       object
 5   verified      138787 non-null  object
 6   lang          138787 non-null  object
dtypes: object(7)
memory usage: 7.4+ MB


#### All columns have the `object` dtype. The `account_lang` column is almost entirely null (2 non-null values), so it is dropped in the next step. 

[back to top](#top)

## 4 | Preprocessing<a id='4'></a>  

In [5]:
# isolate text, drop other columns
# save as text_df

text_df = tweets_df.drop(columns=['created_at', 
                                  'screen_name', 
                                  'country_code',
                                  'account_lang', 
                                  'verified', 
                                  'lang'])
print(type(text_df))
text_df.head(2)


<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,text
0,....#SUNDAYFUNDAY #coronavirus style #vino cheers 🍷 https://t.co/SrymChBkq2
1,"This pandemic has confirmed my worst fears, most people don’t know how to make entertaining videos. #Covid_19 #SinceIveBeenQuarantined #AmericasUnfunniestVideos #WrestleMania #tonyaharding"


### 4.1 Text Cleaning  <a id='41'></a>  

In [6]:
# drop tokens that contain digits, replace punctuation with spaces, and lowercase
alphanumeric = lambda x: re.sub(r'\w*\d\w*', ' ', str(x))
punc_lower = lambda x: re.sub('[%s]' % re.escape(string.punctuation), ' ', x.lower())

text_df['text'] = text_df.text.map(alphanumeric).map(punc_lower)
text_df.head(2)


Unnamed: 0,text
0,sundayfunday coronavirus style vino cheers 🍷 https t co
1,this pandemic has confirmed my worst fears most people don’t know how to make entertaining videos sinceivebeenquarantined americasunfunniestvideos wrestlemania tonyaharding
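
To make the two regular expressions concrete, the sketch below applies the same substitutions to single strings. `clean_text` is a hypothetical helper name, not part of the notebook; it mirrors the `alphanumeric` and `punc_lower` lambdas above.

```python
import re
import string

def clean_text(text):
    """Mirror the notebook's two cleaning steps on one string."""
    # Step 1: replace any token that contains a digit with a space.
    # \w includes the underscore, so "Covid_19" is dropped as a whole.
    no_digits = re.sub(r'\w*\d\w*', ' ', str(text))
    # Step 2: lowercase, then replace ASCII punctuation with spaces.
    return re.sub('[%s]' % re.escape(string.punctuation), ' ', no_digits.lower())

# split() is used only to make the surviving tokens easy to read
print(clean_text('#Covid_19 is HERE, stay home!').split())
print(clean_text('https://t.co/SrymChBkq2').split())
```

Because the hashtag `#Covid_19` contains a digit, the whole token disappears, which is why it is missing from row 1 of the cleaned output. The same rule removes the random suffix of shortened links while leaving `https`, `t`, and `co` behind.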


In [7]:
# remove emojis and every other non-ASCII character (e.g. curly apostrophes)
text_df = text_df.astype(str).apply(lambda x: x.str.encode('ascii', 'ignore').str.decode('ascii'))
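
The encode/decode round trip drops every character outside the ASCII range, not only emojis. A minimal sketch on plain strings, where `strip_non_ascii` is a hypothetical helper name used only for illustration:

```python
def strip_non_ascii(text):
    """Drop every character that cannot be encoded as ASCII."""
    return text.encode('ascii', 'ignore').decode('ascii')

# The curly apostrophe and the emoji are both removed.
print(repr(strip_non_ascii('don’t panic 🍷')))
# Accented letters are removed too, which can truncate non-English words.
print(repr(strip_non_ascii('café')))
```

This is why `don’t` in the raw tweets becomes `dont` in the cleaned corpus.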


In [8]:
text_df.head(10)


Unnamed: 0,text
0,sundayfunday coronavirus style vino cheers https t co
1,this pandemic has confirmed my worst fears most people dont know how to make entertaining videos sinceivebeenquarantined americasunfunniestvideos wrestlemania tonyaharding
2,is this true \nhttps t co \n ecuadorenemergencia coronaviruspandemic
3,many us thought it was wuhan province but it could never be us then it was italy but it could never be us now it is here one newyorker died every minutes from over this weekend absolutely devastating \n\nhttps t co
4,ah coronavirus humor https t co
5,miami and south florida in general are also staying home \n\nnorth florida thinks their immune to https t co oipfvqymvc
6,how can president trump be flip about a question about continuity of power and contracting the coronavirus when i am around him i dont breathe and we wonder why other americans question and defy the quarantine coronavirustaskforce
7,this is our most desperate hour help us obi wan kenobi you re our only hope starwars weneedaleader
8,face your fears \n business love coronavirus nyc ny la california zoom happy prayer https t co
9,first defense nasal screen coming up on abcsharktank during the outbreak and all of a sudden the guy doesnt seem crazy \n\nsharks redeemed themselves w mega offers \n\npost corona airing mcuban thesharkdaymond kevinolearytv is gonna make a killing replaces mask need


In [9]:
text_df.tail(10)


Unnamed: 0,text
138786,the nfl draft is here this is something everyone needs right now to get everyones mind off of the coronoavirus
138787,so has now shifted from untested medication to telling us that maybe injecting disinfectant or somehow getting uv rays inside our bodies will cure the virus so tide pods are next coronavirus trumppressconference trumpisanidiot trumppressbriefing https t co wgavormglv
138788,discussing with my dad told him what i learned about typhoidmary on myfavmurder he said he was proud my obsession is helping me deal with my anxiety in regards to our newnormal karenkilgariff ghardstark ssdgm
138789,talkingtaiwan guest emily chen talked about how she manages having her kids at home during this coronavirus pandemic listen until the end for a special offering https t co coronavirus coronavirusoutbreak homeschool coronavirus interview emily chen
138790,report state dept confirms china iran and russia are working together to blame us for https t co
138791,we are thankful triadcleanhome thankful thankfulthursday americaworkstogether smallbusiness community communitylove shoplocal shopsmall highpointnc piedmonttriad triadnc thankyou pandemic humble commercialcleaning smallbusinessowner https t co fjimozmxzo
138792,covidart mixedmedia sheep cartoon diary sketchbookjournal sketchbook coronavirus uncertainty hermit isolation flattenthecurve stopthespread sunshine practice sparetire san diego california https t co
138793,sitting in the office after another hour day and feeling thankful for all those who are sharing life friendship and leadership with me during this crazy time in history friends workfamily leadership
138794,realdonaldtrump ondinebio has been using light to kill viruses and mrsa for years testing happening now in canada solution ready now
138795,starting in just a moment live on facebook or on wxxi tv am follow us here for live updates forum https t co


[back to top](#top)

In [10]:
# save preprocessed corpus as corpus_df
# copy so that columns added to text_df later do not also appear in corpus_df
corpus_df = text_df.copy()
corpus_df.to_pickle('./raw_data/corpus_df.pkl')
corpus_df.to_csv('./raw_data/corpus_df.csv', index=False)


### 4.2 Stop Words  <a id='42'></a>  

In [11]:
# custom stop words 
stopwords = nltk.corpus.stopwords.words('english')
new_words = ['also',
             'amp',
             'corona',
             'coronavirus',
             'https',
             't',
             'co',
             'people',
             'us',
             'pandemic',
             'covid',
             'day',
             'coronaviruspandemic',
             'get',
             'like',
             'im',
             'go']
stopwords.extend(new_words)
print(stopwords)


['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', '

[back to top](#top)

### 4.3 Stemming  <a id='43'></a>  

In [33]:
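# stemmer
stemmer = SnowballStemmer("english")

# NOTE: TfidfVectorizer passes each whole tweet (not a single word) to its
# `preprocessor`, so the stop-word test below only matches a tweet that is
# exactly one stop word, and the stemmer is applied to the full string
# rather than to individual tokens.
def prep(word):
    # print('word in prep is :', word)
    if word in stopwords:
        return None
    elif stemmer is None:
        return word
    else:
        return stemmer.stem(word)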
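
Since scikit-learn hands a vectorizer's `preprocessor` the whole document, per-word logic has to split the text first. The sketch below shows one way that could look. It is an illustration, not the notebook's method: `prep_document` and `toy_stem` are hypothetical names, and `toy_stem` is a deliberately crude stand-in for `SnowballStemmer("english").stem` so that the example runs without NLTK.

```python
def toy_stem(token):
    """Crude suffix stripper; a stand-in for a real stemmer (assumption)."""
    for suffix in ('ing', 's'):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token

def prep_document(doc, stop_words, stem=toy_stem):
    """Drop stop words and stem the remaining tokens of one document."""
    kept = [stem(token) for token in doc.split() if token not in stop_words]
    # Return a string so a vectorizer can still tokenize the result.
    return ' '.join(kept)

print(prep_document('people are staying home for weeks', {'people', 'are', 'for'}))
```

In the notebook, the real stemmer's `stem` method could be passed as the `stem` argument and the custom `stopwords` list as `stop_words`.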


[back to top](#top)

## 5 | Vectorizer<a id='5'></a>  

In [35]:
# load preprocessed corpus from step 4 
df = pd.read_pickle("./raw_data/corpus_df.pkl")  
df.head(2)

Unnamed: 0,text
0,sundayfunday coronavirus style vino cheers https t co
1,this pandemic has confirmed my worst fears most people dont know how to make entertaining videos sinceivebeenquarantined americasunfunniestvideos wrestlemania tonyaharding


In [36]:
corpus = df.text
print('corpus type:', type(corpus))
print(corpus.head(2))

corpus type: <class 'pandas.core.series.Series'>
0                                                                                                                            sundayfunday  coronavirus style  vino cheers  https   t co  
1    this pandemic has confirmed my worst fears  most people dont know how to make entertaining videos      sinceivebeenquarantined  americasunfunniestvideos  wrestlemania  tonyaharding
Name: text, dtype: object


### Term Frequency Inverse Document Frequency (TF-IDF)

In [37]:
tf_vectorizer = TfidfVectorizer(stop_words=stopwords, 
                                min_df=0.01, 
                                max_df=.95, 
                                preprocessor=prep)
tf_vectorizer

TfidfVectorizer(max_df=0.95, min_df=0.01,
                preprocessor=<function prep at 0x7fa6fd2421f0>,
                stop_words=['i', 'me', 'my', 'myself', 'we', 'our', 'ours',
                            'ourselves', 'you', "you're", "you've", "you'll",
                            "you'd", 'your', 'yours', 'yourself', 'yourselves',
                            'he', 'him', 'his', 'himself', 'she', "she's",
                            'her', 'hers', 'herself', 'it', "it's", 'its',
                            'itself', ...])
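
To show what `min_df`, `max_df`, and the TF-IDF weights do, here is a small pure-Python sketch. It assumes the smoothed IDF `ln((1 + n) / (1 + df)) + 1` and L2 row normalization, which are scikit-learn's documented defaults; whitespace tokenization is a simplification, and `tfidf_sketch` is a hypothetical name.

```python
import math
from collections import Counter

def tfidf_sketch(docs, min_df=0.0, max_df=1.0):
    """Return (vocabulary, rows) where each row maps term -> TF-IDF weight."""
    n_docs = len(docs)
    tokenized = [doc.split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Keep terms whose document share lies between min_df and max_df.
    vocab = sorted(t for t, c in doc_freq.items()
                   if min_df * n_docs <= c <= max_df * n_docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(t for t in tokens if t in idf)
        raw = {t: counts[t] * idf[t] for t in counts}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        # L2-normalize each row; rows with no surviving term stay empty.
        rows.append({t: v / norm for t, v in raw.items()} if norm else {})
    return vocab, rows

docs = ['stay home', 'stay safe', 'work']
print(tfidf_sketch(docs)[0])              # full vocabulary
print(tfidf_sketch(docs, min_df=0.5)[0])  # only terms in at least half the docs
```

With `min_df=0.01`, a term must appear in at least 1% of the tweets to be kept, which is why the vocabulary is so small relative to the corpus. A tweet with only one surviving term gets a weight of 1.0 for that term after L2 normalization.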

In [38]:
# document-term matrix with TF-IDF
tf_doc_term_mtx = tf_vectorizer.fit_transform(corpus)
type(tf_doc_term_mtx)

scipy.sparse._csr.csr_matrix

In [18]:
tf_doc_term_df = pd.DataFrame(tf_doc_term_mtx.toarray(), 
                              columns=tf_vectorizer.get_feature_names_out())
tf_doc_term_df.head(2)

Unnamed: 0,america,americans,another,anyone,april,around,away,back,best,better,...,week,weeks,well,work,workers,working,world,would,year,york
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [39]:
# double check that domain-specific stop words were omitted 
print('https' in tf_vectorizer.get_feature_names_out())
print('corona' in tf_vectorizer.get_feature_names_out())
print('covid' in tf_vectorizer.get_feature_names_out())
print('t' in tf_vectorizer.get_feature_names_out())
print('amp' in tf_vectorizer.get_feature_names_out())


False
False
False
False
False


[back to top](#top)

## 6 | Topic Modeling/Dimensionality Reduction <a id='6'></a>  

### Non-Negative Matrix Factorization (NMF)

In [20]:
# V     visible variables     doc_term             input (corpus matrix)
# W     weights               doc_topic            feature set
# H     hidden variables      topic_term           coefficients

In [21]:
V = tf_doc_term_mtx
V.shape

(138796, 167)

In [22]:
# W matrix = feature set & weights

nmf = NMF(n_components=3, init=None)
W = nmf.fit_transform(V).round(3)
print(type(W))
W.shape

<class 'numpy.ndarray'>


(138796, 3)
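
In matrix terms, the factorization can be written as follows. The shapes follow from the notebook output and `n_components=3`. The objective shown is scikit-learn's default Frobenius loss without regularization, stated here as an assumption about the default settings rather than something the notebook sets explicitly.

$$
V_{138796 \times 167} \;\approx\; W_{138796 \times 3}\, H_{3 \times 167},
\qquad
\min_{W \ge 0,\; H \ge 0} \; \tfrac{1}{2}\, \lVert V - W H \rVert_F^2
$$

Each row of $W$ holds one tweet's weights on the three topics, and each row of $H$ holds one topic's weights on the 167 terms.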

In [23]:
# H matrix = hidden variables & coefficients 

H = pd.DataFrame(nmf.components_.round(2),
                 index=['c1', 'c2', 'c3'],
                 columns=tf_vectorizer.get_feature_names_out())
print('H.shape:',  H.shape)
H.T.style.background_gradient(cmap='Blues')


H.shape: (3, 167)


Unnamed: 0,c1,c2,c3
america,0.32,0.0,0.01
americans,0.42,0.0,0.0
another,0.38,0.05,0.02
anyone,0.39,0.02,0.0
april,0.26,0.03,0.15
around,0.34,0.02,0.02
away,0.3,0.02,0.0
back,0.9,0.04,0.0
best,0.36,0.05,0.01
better,0.39,0.02,0.01


[back to top](#top)

In [24]:
# function to display topics
def display_topics(model, feature_names, no_top_words, topic_names=None):
    for ix, topic in enumerate(model.components_):
        if not topic_names or not topic_names[ix]:
            print("\nTopic ", ix)
        else:
            print("\nTopic: '",topic_names[ix],"'")
        print(", ".join([feature_names[i]
                        for i in topic.argsort()[:-no_top_words - 1:-1]]))


#### Top terms by topic:

In [40]:
display_topics(nmf, tf_vectorizer.get_feature_names_out(), 10)


Topic  0
time, trump, one, today, realdonaldtrump, home, need, help, thank, stay

Topic  1
quarantine, stayhome, quarantinelife, socialdistancing, stayathome, staysafe, lockdown, california, stayhomestaysafe, love

Topic  2
new, york, cases, deaths, city, nyc, state, county, positive, today
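
The slice `topic.argsort()[:-no_top_words - 1:-1]` in `display_topics` sorts term indices by ascending weight and then reads the last `no_top_words` of them in reverse order. A pure-Python sketch of the same ranking idea follows. `top_terms` is a hypothetical helper name and the weights are made up for illustration. The order of tied weights may differ from NumPy's.

```python
def top_terms(weights, feature_names, n_top):
    """Return the n_top feature names with the largest weights, largest first."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [feature_names[i] for i in order[:n_top]]

# Hypothetical weights for one topic row of H; not taken from the notebook.
names = ['cases', 'home', 'new', 'stay', 'york']
row = [0.7, 0.1, 0.9, 0.05, 0.8]
print(top_terms(row, names, 3))
```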


[back to top](#top)

## 7 | Sentiment Analysis<a id='7'></a>  

### Vader Sentiment

In [26]:
analyzer = SentimentIntensityAnalyzer() 
# score a single tweet; polarity_scores expects a string, not a DataFrame
sentiment = analyzer.polarity_scores(text_df.text[0]).get('compound')
print('compound_score', sentiment)

compound_score 0.4767


In [27]:
text_df['compound_score'] = text_df.text.map(analyzer.polarity_scores).map(lambda x: x.get('compound'))
text_df.head(2)


Unnamed: 0,text,compound_score
0,sundayfunday coronavirus style vino cheers https t co,0.4767
1,this pandemic has confirmed my worst fears most people dont know how to make entertaining videos sinceivebeenquarantined americasunfunniestvideos wrestlemania tonyaharding,-0.6124


In [28]:
# map compound score to a binary label: scores >= 0 (including neutral 0.0) are positive, scores < 0 are negative
text_df['sentiment'] = text_df['compound_score'].apply(lambda x: 'positive' if x >= 0 else 'negative')
text_df.head(2)

Unnamed: 0,text,compound_score,sentiment
0,sundayfunday coronavirus style vino cheers https t co,0.4767,positive
1,this pandemic has confirmed my worst fears most people dont know how to make entertaining videos sinceivebeenquarantined americasunfunniestvideos wrestlemania tonyaharding,-0.6124,negative
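
The binary mapping above counts a compound score of exactly 0.0 as positive. The VADER README describes a common convention with a neutral band between -0.05 and 0.05. The sketch below shows that alternative for comparison. It is not what the notebook uses, and `label_sentiment` is a hypothetical helper name.

```python
def label_sentiment(compound, threshold=0.05):
    """Three-way label using the conventional VADER compound thresholds."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Compound scores of the first two tweets, plus a score of exactly zero.
for score in (0.4767, -0.6124, 0.0):
    print(score, label_sentiment(score))
```

With a neutral class, tweets that VADER cannot score would no longer inflate the positive class.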


[back to top](#top)

## 8 | Classification<a id='8'></a>  

### Can a Naive Bayes model predict a tweet's sentiment label?

In [29]:
# split feature and target
X = text_df.text
y = text_df.sentiment

# split train/test
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.3, 
                                                    random_state=42)

# fit the TF-IDF vectorizer from step 5 on the training set only, then transform the test set
X_train_tf = tf_vectorizer.fit_transform(X_train)
X_test_tf  = tf_vectorizer.transform(X_test)


In [30]:
tfidf = pd.DataFrame(X_train_tf.toarray(), columns=tf_vectorizer.get_feature_names_out())
tfidf.head(2)

Unnamed: 0,america,americans,another,anyone,april,around,back,best,better,business,...,week,weeks,well,work,workers,working,world,would,year,york
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0


### 8.1 Naive Bayes: Gaussian<a id='81'></a> 

In [31]:
gaus = GaussianNB()
gaus.fit(X_train_tf.toarray(), y_train)
sentiment_score = gaus.score(X_test_tf.toarray(), y_test)
sentiment_score

0.6029443550517544

[back to top](#top)

### 8.2 Naive Bayes: Multinomial<a id='82'></a> 

In [32]:
multi = MultinomialNB()
multi.fit(X_train_tf.toarray(), y_train)
sentiment_score = multi.score(X_test_tf.toarray(), y_test)

sentiment_score

0.7346237901966906
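
An accuracy score is easier to interpret next to a majority-class baseline, especially because the `>= 0` rule may make the positive class larger. The sketch below computes that baseline. `majority_baseline_accuracy` is a hypothetical helper name and the toy labels are invented; the notebook's real class balance is not shown.

```python
from collections import Counter

def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority_label, _ = Counter(y_train).most_common(1)[0]
    hits = sum(1 for label in y_test if label == majority_label)
    return majority_label, hits / len(y_test)

# Hypothetical toy labels for illustration only.
toy_train = ['positive'] * 7 + ['negative'] * 3
toy_test = ['positive', 'positive', 'negative', 'positive']
print(majority_baseline_accuracy(toy_train, toy_test))
```

In the notebook, the same function could be called as `majority_baseline_accuracy(y_train, y_test)` and compared with the two Naive Bayes scores.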

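#### Multinomial Naive Bayes reaches a higher test accuracy (0.73) than Gaussian Naive Bayes (0.60). The labels come from VADER, so both scores measure agreement with VADER rather than with human-annotated sentiment.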

[back to top](#top)
