### Sample text analysis using spaCy

spaCy is a Python library for natural language processing. It handles tasks such as tokenization, part-of-speech tagging, and named-entity recognition.

To install spaCy and its small English-language model, run these commands in your virtual environment:

`pip3 install spacy`

`python3 -m spacy download en_core_web_sm`

We will be importing the `Tweets_Trump_Dec_2019.csv` file in our `data` folder. It contains tweets from the @realDonaldTrump account.

In [1]:
!pip3 install spacy 
import spacy
import pandas as pd
import numpy as np

You are using pip version 18.1, however version 19.3.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
# read_csv already returns a DataFrame; pd.DataFrame(...) just wraps it again
Trump_text=pd.read_csv('../data/Tweets_Trump_Dec_2019.csv')
df = pd.DataFrame(Trump_text)
Trump_text.head()

Unnamed: 0,screen_name,created_at,text,retweet_count,user_description,source,lang,id
0,realDonaldTrump,2019-12-18T13:27:27Z,Sean is a great patriot and will do a fantasti...,141,45th President of the United States of America🇺🇸,Twitter for iPhone,en,1207291285683367936
1,realDonaldTrump,2019-12-18T13:24:55Z,RT @dbongino: Liberals Embarrassingly Upset Ab...,1050,45th President of the United States of America🇺🇸,Twitter for iPhone,en,1207290646236499970
2,realDonaldTrump,2019-12-18T13:23:29Z,RT @seanhannity: BREAKING: McConnell shoots do...,3597,45th President of the United States of America🇺🇸,Twitter for iPhone,en,1207290288030400512
3,realDonaldTrump,2019-12-18T13:18:34Z,"....out, and he’s been amazing at it. The Demo...",1313,45th President of the United States of America🇺🇸,Twitter for iPhone,en,1207289048240312322
4,realDonaldTrump,2019-12-18T13:18:33Z,"....had, while the Democrats are just looking ...",1223,45th President of the United States of America🇺🇸,Twitter for iPhone,en,1207289046482870272


In [3]:
Trump_tweet_text=df['text']

In [4]:
Trump_tweet_text.head(1)

0    Sean is a great patriot and will do a fantasti...
Name: text, dtype: object

In [5]:
Trump_string=Trump_tweet_text.to_string()
print(Trump_string)

0       Sean is a great patriot and will do a fantasti...
1       RT @dbongino: Liberals Embarrassingly Upset Ab...
2       RT @seanhannity: BREAKING: McConnell shoots do...
3       ....out, and he’s been amazing at it. The Demo...
4       ....had, while the Democrats are just looking ...
5       ....said, “I’m going to clean up Washington, I...
6       “It’s sad. Here’s a gentleman who came to the ...
7       “They just wanted to get at the President. The...
8       Can you believe that I will be impeached today...
9       .@foxandfriends  “My hope is that impeachment ...
10           RT @realDonaldTrump: https://t.co/WzLB5s41m3
11      .@marcthiessen  “Voters say the Democrats are ...
12      ....They want to Impeach me (I’m not worried!)...
13      So, if Comey &amp; the top people in the FBI w...
14      Good marks and reviews on the letter I sent to...
15      Democrat “leadership,” despite their denials, ...
16      Wow! “In a stunning rebuke of the FBI, the FIS...
17            

In [6]:
np.savetxt('../output/Trump_Tweets.txt', Trump_tweet_text.values,fmt='%s', delimiter="\n") 

In [7]:
nlp=spacy.load('en_core_web_sm')
# 'r' opens the file read-only; the with-block closes it when done
with open('../output/Trump_Tweets.txt','r') as f:
    text=f.read()
print(text)

Sean is a great patriot and will do a fantastic job. Has my total and complete endorsement! https://t.co/GxLuWGjKYX
RT @dbongino: Liberals Embarrassingly Upset About Neil Gorsuch Saying “Merry Christmas” 👇🏻👇🏻🤦🏼‍♂️ https://t.co/8enEzXlqwA
RT @seanhannity: BREAKING: McConnell shoots down Schumer impeachment demands https://t.co/y0c1TxnPpH
....out, and he’s been amazing at it. The Democrsts have no message, they have no hope for 2020.” @RepDougCollins @foxandfriends  Thank you Doug!
....had, while the Democrats are just looking out for elections. This President should just continue to fight like he’s always fought, for himself &amp; for this Country. Continue to put forth policies like prescription drugs &amp; trade policies. That’s what makes this President stand..
....said, “I’m going to clean up Washington, I’m going to help people.” He gave big tax cuts, he’s made our military strong. They’re mad at him because he actually did what he said he was going to do. History will record we’re

Now let's pass the string to the spaCy pipeline, which returns a `Doc` object.

In [8]:
doc=nlp(text)
len(doc)
#gives the number of tokens in the file (words, punctuation, and whitespace such as newlines)

171818
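
The tweets contain HTML entities such as `&amp;`, which the tokenizer splits into three separate tokens (`&`, `amp`, `;`). One option, not part of the original notebook, is to unescape the text with the standard library's `html.unescape` before passing it to `nlp`. A minimal sketch, with `raw_text` as a made-up stand-in for the file contents:

```python
import html

# Hypothetical stand-in for the text read from Trump_Tweets.txt
raw_text = "for himself &amp; for this Country"

# Convert HTML entities (&amp;, &lt;, ...) back into plain characters
clean_text = html.unescape(raw_text)
print(clean_text)
```

`clean_text` could then be passed to `nlp` in place of `text`. The token count would change as a result.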

The `Doc` behaves like a sequence of tokens. Iterating over it yields `Token` objects, and each token's string is available through its `.text` attribute.

In [9]:
for token in doc:
    print(token)

Sean
is
a
great
patriot
and
will
do
a
fantastic
job
.
Has
my
total
and
complete
endorsement
!
https://t.co/GxLuWGjKYX


RT
@dbongino
:
Liberals
Embarrassingly
Upset
About
Neil
Gorsuch
Saying
“
Merry
Christmas
”
👇
🏻
👇
🏻
🤦
🏼‍
♂
️
https://t.co/8enEzXlqwA


RT
@seanhannity
:
BREAKING
:
McConnell
shoots
down
Schumer
impeachment
demands
https://t.co/y0c1TxnPpH


....
out
,
and
he
’s
been
amazing
at
it
.
The
Democrsts
have
no
message
,
they
have
no
hope
for
2020
.
”
@RepDougCollins
@foxandfriends
 
Thank
you
Doug
!


....
had
,
while
the
Democrats
are
just
looking
out
for
elections
.
This
President
should
just
continue
to
fight
like
he
’s
always
fought
,
for
himself
&
amp
;
for
this
Country
.
Continue
to
put
forth
policies
like
prescription
drugs
&
amp
;
trade
policies
.
That
’s
what
makes
this
President
stand
..


....
said
,
“
I
’m
going
to
clean
up
Washington
,
I
’m
going
to
help
people
.
”
He
gave
big
tax
cuts
,
he
’s
made
our
military
strong
.
They
’re
mad
at
him
because
he
actually
did
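
Besides `.text`, each token carries flags such as `is_punct` and `is_space`, which can be used to skip punctuation and newline tokens like the ones in the listing above. A minimal sketch, assuming spaCy is installed; it uses `spacy.blank("en")` (a tokenizer-only pipeline) and a made-up sample string so that it runs without the downloaded model:

```python
import spacy

# Tokenizer-only English pipeline; no model download needed for this sketch
nlp_blank = spacy.blank("en")

# Made-up sample standing in for the tweet text
sample = "Sean is a great patriot.\nThank you Doug!"
sample_doc = nlp_blank(sample)

# Keep only tokens that are neither punctuation nor whitespace
words = [token.text for token in sample_doc
         if not token.is_punct and not token.is_space]
print(words)
```

The same filter could be applied to `doc` before counting, so that commas, periods, and newlines do not dominate the results.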


Now we can count some words by:
- turning the words into a list
- turning that list into a pandas data frame
- counting the values
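
The same steps can also be sketched with the standard library's `collections.Counter`, skipping the data frame. This is an alternative to the pandas approach, not the notebook's method; `tokens` is a made-up list standing in for the token strings. Lowercasing first merges variants such as `The` and `the`, which are otherwise counted separately.

```python
from collections import Counter

# Made-up token list standing in for the strings collected from the Doc
tokens = ["The", "Democrats", "have", "no", "message", ",",
          "they", "have", "no", "hope", "for", "the", "Country", "."]

# Lowercase each token so that "The" and "the" are counted together
counts = Counter(token.lower() for token in tokens)

# most_common(n) returns the n highest counts, largest first
print(counts.most_common(3))
```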

In [10]:
rows=[]
for token in doc:
    rows.append(token.text)

In [11]:
print(rows)



In [12]:
word_dataframe = pd.DataFrame(rows)
word_dataframe.columns=['word']
word_dataframe.head()

Unnamed: 0,word
0,Sean
1,is
2,a
3,great
4,patriot


In [13]:
word_dataframe['word'].value_counts()

,                           6513
the                         5966
.                           5600
\n                          4777
and                         3523
to                          3507
!                           3299
of                          2850
a                           2241
is                          2072
in                          1924
for                         1589
:                           1317
I                           1225
on                          1220
that                        1176
are                         1109
will                        1019
our                         1006
be                          1003
with                         991
RT                           973
-                            932
The                          817
“                            812
have                         808
”                            793
it                           766
;                            756
&                            745
          

In [14]:
word_count= word_dataframe.groupby('word').agg({'word':'count'}).rename(columns={'word':'count'})
word_count=word_count.sort_values(by='count', ascending=False)
word_count.head()

Unnamed: 0_level_0,count
word,Unnamed: 1_level_1
",",6513
the,5966
.,5600
\n,4777
and,3523


In [15]:
word_count.to_csv ('../output/export_dataframe.csv', header=True) 

You could also try `Millions=word_count.loc['millions']` to check how many times the word "millions" was used.
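
A lookup with `.loc` raises a `KeyError` when the word never occurs. A minimal sketch of a more forgiving variant, using a small made-up table in place of `word_count` and `Series.get` with a default of 0:

```python
import pandas as pd

# Made-up counts standing in for the word_count data frame (indexed by word)
demo_counts = pd.DataFrame(
    {"count": [3, 1]},
    index=pd.Index(["millions", "billions"], name="word"),
)

# .loc returns the count for a word that is present
print(demo_counts.loc["millions", "count"])

# Series.get returns the default instead of raising KeyError for a missing word
missing = demo_counts["count"].get("trillions", 0)
print(missing)
```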

In [16]:
Read_tweets=pd.read_csv('../output/export_dataframe.csv')

In [17]:
df = pd.DataFrame(Read_tweets)
sum=df.sum(axis=0)  # column totals; note that this name shadows Python's built-in sum()
sum

word     ,the.\nandto!ofaisinfor:Ionthatarewillourbewit...
count                                               171818
dtype: object

In [18]:
Impeachment=df[df.word.str.contains('impeachment',  case=False)]

In [20]:
China=df[df.word.str.contains('China',  case=False)]

In [19]:
Impeachment

Unnamed: 0,word,count
1620,impeachment,11
3730,Impeachment,4


In [21]:
China

Unnamed: 0,word,count
115,China,195
7335,machinations,1
11172,U.S.–China,1


In [23]:
sum_column1 = Impeachment.sum(axis=0)
print (sum_column1) 

word     impeachmentImpeachment
count                        15
dtype: object


In [31]:
sum_column2 = China.sum(axis=0)
print (sum_column2) 

word     ChinamachinationsU.S.–China
count                            197
dtype: object


In [32]:
# subtract 1 to drop 'machinations', which matched only because it contains the substring 'china'
Trump_being_Trump= sum_column1['count']+(sum_column2['count']-1)
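
The `-1` compensates for `machinations`, which `str.contains('China', case=False)` matches because it contains the substring `china`. A possible alternative, sketched here on a small table copied from the `China` output, is a regular expression with word boundaries (`\b`), so that no manual correction is needed:

```python
import pandas as pd

# Rows copied from the China output above, used as a small demo table
demo = pd.DataFrame({
    "word": ["China", "machinations", "U.S.–China"],
    "count": [195, 1, 1],
})

# \b matches only at word boundaries, so 'machinations' is excluded
mask = demo["word"].str.contains(r"\bchina\b", case=False, regex=True)
china_only = demo[mask]
print(china_only)
print(china_only["count"].sum())
```

With this filter the total is 196, the same value the cell above reaches by subtracting 1 from 197.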

In [33]:
sum['count']

171818

In [34]:
(Trump_being_Trump/sum['count'])*100

0.12280436275593942
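
Written out, the final cell computes the share of all tokens that matched `impeachment` or `China`. The sketch below repeats the arithmetic with the totals printed above; the variable names are illustrative only.

```python
# Totals taken from the outputs above
impeachment_total = 15   # 'impeachment' (11) + 'Impeachment' (4)
china_total = 197 - 1    # drop the single 'machinations' match
all_tokens = 171818      # sum of the count column, i.e. all tokens in the Doc

# Share of tokens, as a percentage
share = (impeachment_total + china_total) / all_tokens * 100
print(round(share, 4))
```

Note that the denominator includes punctuation and newline tokens, so the share among actual words would be somewhat higher.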