# Twitter Data Analysis
This notebook runs an initial analysis of previously downloaded tweet data.

In [1]:
# import necessary libraries
#%matplotlib inline
import json
import pandas as pd
import pprint
import re
from textblob import TextBlob
from datetime import datetime



## Step 1: Read Downloaded Twitter Files
Read tweets from downloaded files into a list called **tweets_data**

In [16]:
# a list of files
file_paths = ['./data/search_data_11.txt','./data/search_data_10.txt','./data/search_data_09.txt','./data/search_data_08.txt','./data/search_data_07.txt','./data/search_data_06.txt','./data/search_data_05.txt','./data/search_data_04.txt','./data/search_data_03.txt',
              './data/search_data_02.txt', './data/search_data_01.txt']
# initialize data set
tweets_data = []
# loop on each file
for fp in file_paths:
    # open a file to read
    with open(fp,"r") as tweet_file:
        # read tweets into tweets_data
        for line in tweet_file:
            if len(line) > 10:
                try:
                    tweet = json.loads(line)
                    tweets_data.append(tweet)
                except ValueError as err:
                    print(err)
                    break

#print the number of tweets read
print("{} tweets have been read successfully.".format(len(tweets_data)))

62696 tweets have been read successfully.
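
The loop above stops reading a file at the first line that fails to parse. As a minimal sketch of an alternative, the helper below skips malformed lines and keeps going. The name `read_json_lines` and the sample input are hypothetical; the length check mirrors the `len(line) > 10` filter used above.

```python
import io
import json

def read_json_lines(handle, min_length=10):
    """Parse one JSON object per line, skipping short or malformed lines.

    Returns (records, skipped), where skipped counts lines that failed to parse.
    """
    records, skipped = [], 0
    for line in handle:
        if len(line) <= min_length:
            continue  # blank or near-empty separator line
        try:
            records.append(json.loads(line))
        except ValueError:
            skipped += 1  # keep reading instead of abandoning the file
    return records, skipped

# Hypothetical sample: two valid tweets around one truncated line.
sample = io.StringIO(
    '{"id": 1, "text": "hello @UPS"}\n'
    '\n'
    '{"id": 2, "text": "broken\n'
    '{"id": 3, "text": "hello @FedEx"}\n'
)
records, skipped = read_json_lines(sample)
print(len(records), skipped)
```

Counting the skipped lines makes it easy to notice when a download was cut short, without losing the valid tweets that follow a bad line.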


The following command prints one example of the loaded tweets, the first tweet in the list, to show the underlying structure of a tweet object.

In [17]:
firstTweet = tweets_data[0]
pprint.pprint(firstTweet,depth = 10, indent=2)

{ 'contributors': None,
  'coordinates': None,
  'created_at': 'Wed Aug 23 17:24:09 +0000 2017',
  'entities': { 'hashtags': [],
                'media': [ { 'display_url': 'pic.twitter.com/DTKO6446FR',
                             'expanded_url': 'https://twitter.com/librarianatrix/status/900408331797491712/photo/1',
                             'id': 900408307206152192,
                             'id_str': '900408307206152192',
                             'indices': [53, 76],
                             'media_url': 'http://pbs.twimg.com/media/DH7klbfVYAAe_J0.jpg',
                             'media_url_https': 'https://pbs.twimg.com/media/DH7klbfVYAAe_J0.jpg',
                             'sizes': { 'large': { 'h': 1798,
                                                   'resize': 'fit',
                                                   'w': 2048},
                                        'medium': { 'h': 1054,
                                                    'resize': 'fit'

In [18]:
firstTweetId = firstTweet['id']
firstTweetUserScreenName = firstTweet['user']['screen_name']
srcStr = "http://twitframe.com/show?url=https://twitter.com/{0}/status/{1}".format(firstTweetUserScreenName, firstTweetId)
srcStr

'http://twitframe.com/show?url=https://twitter.com/librarianatrix/status/900408331797491712'

In [19]:

%%html 
<iframe id="myFrame" border=0 frameborder=0 height=250 width=550 src='http://twitframe.com/show?url=https://twitter.com/librarianatrix/status/900408331797491712'> 
</iframe>

## Step 2: Retrieve Features

Because a single tweet carries a lot of information, we picked only a few useful features for the initial analysis. These features were retrieved and loaded into a pandas dataframe.
* **id** - unique identification of a tweet
* **created_at** - time stamp when a tweet was posted
* **text** - content of a tweet
* **country** - country from which the tweet was posted (`'None'` when the tweet has no place information)
* **retweet_count** - number of times a tweet was retweeted
* **favorite_count** - number of times a tweet was liked
* **userId** - id of the user who posted the tweet
* **retweeted_status** - id of the original tweet if the tweet is a retweet, otherwise 0; the main indicator of whether a tweet is a retweet
* **userName** - screen name of the user who posted the tweet
* **followers_count** - number of followers of the user who posted the tweet

Note: the `created_at` time stamp looks like `'Fri Aug 04 20:37:50 +0000 2017'`. Only the date was extracted from it, and the field was renamed to `created_date`.
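
The timestamp format can be checked in isolation before applying it to the whole list. The sketch below uses the same format string as the notebook; the helper name `to_created_date` and the constant name are hypothetical, and the weekday and month names assume an English locale.

```python
from datetime import date, datetime

# Format string for Twitter's created_at field, e.g. 'Fri Aug 04 20:37:50 +0000 2017'
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'

def to_created_date(created_at):
    """Return the calendar date of a tweet, in the timestamp's own UTC offset."""
    return datetime.strptime(created_at, TWITTER_TIME_FORMAT).date()

created = to_created_date('Fri Aug 04 20:37:50 +0000 2017')
print(created)  # 2017-08-04
```

Because the date is taken in UTC (`+0000`), a tweet posted late in the evening in a US time zone is counted on the following day.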

In [20]:
df = pd.DataFrame()
df['id'] = list(map(lambda t : t['id'], tweets_data))
df['created_date'] = list(map(lambda t : datetime.strptime(t['created_at'], '%a %b %d %H:%M:%S %z %Y').date(), tweets_data))
# df['created_at'] = list(map(lambda t : t['created_at'][0:10]+', '+t['created_at'][-4:], tweets_data))
df['text'] = list(map(lambda t : t['text'], tweets_data))
df['country'] = list(map(lambda t : t['place']['country'] if t['place'] != None else 'None', tweets_data))
df['retweet_count'] = list(map(lambda t : t['retweet_count'], tweets_data))
df['favorite_count'] = list(map(lambda t : t['favorite_count'], tweets_data))
df['retweeted_status'] = list(map(lambda t : t['retweeted_status']['id'] if t.get('retweeted_status') != None else 0, tweets_data))
df['userId'] = list(map(lambda t : t['user']['id'], tweets_data))
df['userName'] = list(map(lambda t : t['user']['screen_name'], tweets_data))
df['followers_count'] = list(map(lambda t : t['user']['followers_count'], tweets_data))

#show table sample
df.head()

| | id | created_date | text | country | retweet_count | favorite_count | retweeted_status | userId | userName | followers_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 900408331797491712 | 2017-08-23 | Great job, @UPS. Hope nothing broke inside thi... | | 0 | 0 | 0 | 1740991 | librarianatrix | 511 |
| 1 | 900408215610982403 | 2017-08-23 | @FedEx blows. @UPS at least gets my stuff here... | | 0 | 0 | 0 | 338787873 | _nidabear | 260 |
| 2 | 900408196522704899 | 2017-08-23 | PRC approves USPS' Move Update request https:/... | | 0 | 0 | 0 | 543781548 | PostalKathy | 170 |
| 3 | 900407349269155840 | 2017-08-23 | RT @UPS: It may have been the #F1 summer break... | | 59 | 0 | 900281643587457024 | 19899422 | iJLaing | 425 |
| 4 | 900407241253109760 | 2017-08-23 | RT @RoadSafetyNGOs: Today we tested @iRAPSavin... | | 6 | 0 | 900380567887335424 | 325046475 | PrerrnaSingh | 132 |
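
Two of the fields need care because they are optional: `place` can be `None`, and `retweeted_status` is only present on retweets. The sketch below shows the same defaulting logic on a single tweet; the function name `extract_features` and the two trimmed tweet objects are hypothetical, and only a subset of the columns is included.

```python
def extract_features(tweet):
    """Flatten one tweet dict into a few of the fields used in this notebook."""
    place = tweet.get('place')
    retweeted = tweet.get('retweeted_status')
    return {
        'id': tweet['id'],
        'text': tweet['text'],
        # 'None' marks tweets without place information, as in the cell above
        'country': place['country'] if place is not None else 'None',
        # id of the original tweet for retweets, 0 for original tweets
        'retweeted_status': retweeted['id'] if retweeted is not None else 0,
        'userName': tweet['user']['screen_name'],
    }

# Hypothetical, heavily trimmed tweet objects
original = {'id': 10, 'text': 'hello', 'place': None, 'user': {'screen_name': 'a'}}
retweet = {'id': 11, 'text': 'RT hello', 'place': {'country': 'Canada'},
           'retweeted_status': {'id': 10}, 'user': {'screen_name': 'b'}}
print(extract_features(original))
print(extract_features(retweet))
```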


## Step 3: Do Analysis

We did some analysis with the data sets and tried to answer some questions:

### Q1: How many tweets mentioned each of FedEx, UPS, DHL, and USPS every day?

To answer this question, we need to determine which company (or companies) each tweet relates to.

We define a function named `word_in_text(word, text)` that checks whether `word` can be found in `text`. The `word` argument may hold several comma-separated keywords, and the match is case-insensitive.

In [21]:
def word_in_text(word, text):
    '''Return 1 if any comma-separated keyword in word is found in text (case-insensitive), else 0.'''
    word = word.lower()
    text = text.lower()
    if any(re.search(w.strip(), text) for w in word.split(',')):
        return 1
    return 0
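
Note that `re.search` matches substrings, so the keyword `ups` is also found inside words such as "cups" or "upset". A stricter variant could anchor each keyword at word boundaries, as sketched below. The name `word_in_text_strict` is hypothetical, and switching to it would change the counts reported in this notebook.

```python
import re

def word_in_text_strict(word, text):
    """Return 1 if any comma-separated keyword occurs in text as a whole word."""
    for w in word.split(','):
        # \b anchors the keyword at word boundaries; re.escape treats it literally
        pattern = r'\b' + re.escape(w.strip()) + r'\b'
        if re.search(pattern, text, flags=re.IGNORECASE):
            return 1
    return 0

print(word_in_text_strict('UPS', 'Great job, @UPS.'))      # 1
print(word_in_text_strict('UPS', 'Two cups of coffee'))    # 0
print(word_in_text_strict('FedEx, UPS', 'thanks #fedex'))  # 1
```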

Add four new columns, **`FedEx`**, **`UPS`**, **`DHL`**, and **`USPS`**, to the dataframe to flag whether each tweet is related to each of the companies.

In [22]:
df['FedEx'] = df['text'].apply(lambda t: word_in_text('FedEx', t))
df['UPS'] = df['text'].apply(lambda t: word_in_text('UPS', t))
df['DHL'] = df['text'].apply(lambda t: word_in_text('DHL', t))
df['USPS'] = df['text'].apply(lambda t: word_in_text('USPS', t))
#show the new dataframe
df.head()

| | id | created_date | text | country | retweet_count | favorite_count | retweeted_status | userId | userName | followers_count | FedEx | UPS | DHL | USPS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 900408331797491712 | 2017-08-23 | Great job, @UPS. Hope nothing broke inside thi... | | 0 | 0 | 0 | 1740991 | librarianatrix | 511 | 0 | 1 | 0 | 0 |
| 1 | 900408215610982403 | 2017-08-23 | @FedEx blows. @UPS at least gets my stuff here... | | 0 | 0 | 0 | 338787873 | _nidabear | 260 | 1 | 1 | 0 | 0 |
| 2 | 900408196522704899 | 2017-08-23 | PRC approves USPS' Move Update request https:/... | | 0 | 0 | 0 | 543781548 | PostalKathy | 170 | 0 | 0 | 0 | 1 |
| 3 | 900407349269155840 | 2017-08-23 | RT @UPS: It may have been the #F1 summer break... | | 59 | 0 | 900281643587457024 | 19899422 | iJLaing | 425 | 0 | 1 | 0 | 0 |
| 4 | 900407241253109760 | 2017-08-23 | RT @RoadSafetyNGOs: Today we tested @iRAPSavin... | | 6 | 0 | 900380567887335424 | 325046475 | PrerrnaSingh | 132 | 1 | 0 | 0 | 0 |


With the steps below, we get the number of tweets for each company and the total number of tweets on each day. 
First, a new dataframe is built by selecting the fields **`'created_date', 'FedEx', 'UPS', 'DHL', 'USPS'`** and adding a new field **`Count`**, set to 1 for every row, to count all tweets. Then a **`groupby - sum`** operation on **`created_date`** gives the daily totals.

In [23]:
# build a new dataframe
df_q01 = df.loc[:,['created_date', 'FedEx', 'UPS', 'DHL', 'USPS']]
df_q01['Count'] = 1
# get the result
twt_by_date = df_q01.groupby('created_date').sum()
twt_by_date

| created_date | FedEx | UPS | DHL | USPS | Count |
|---|---|---|---|---|---|
| 2017-07-21 | 664 | 944 | 38 | 551 | 2131 |
| 2017-07-22 | 496 | 962 | 46 | 582 | 1988 |
| 2017-07-23 | 452 | 349 | 16 | 335 | 1088 |
| 2017-07-24 | 567 | 762 | 33 | 508 | 1820 |
| 2017-07-25 | 562 | 1005 | 67 | 557 | 2081 |
| 2017-07-26 | 634 | 985 | 80 | 747 | 2342 |
| 2017-07-27 | 596 | 771 | 59 | 685 | 2062 |
| 2017-07-28 | 606 | 801 | 61 | 600 | 1967 |
| 2017-07-29 | 347 | 928 | 20 | 472 | 1736 |
| 2017-07-30 | 439 | 374 | 23 | 389 | 1216 |
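
To make the `groupby`/`sum` step concrete, here is a sketch of the same aggregation in plain Python. The three rows are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict

COMPANIES = ['FedEx', 'UPS', 'DHL', 'USPS']

# Hypothetical flagged rows: one dict per tweet, flags are 0 or 1.
rows = [
    {'created_date': '2017-07-21', 'FedEx': 1, 'UPS': 1, 'DHL': 0, 'USPS': 0},
    {'created_date': '2017-07-21', 'FedEx': 0, 'UPS': 1, 'DHL': 0, 'USPS': 0},
    {'created_date': '2017-07-22', 'FedEx': 0, 'UPS': 0, 'DHL': 0, 'USPS': 1},
]

# Sum every flag per day and count the rows, like groupby('created_date').sum()
totals = defaultdict(lambda: dict.fromkeys(COMPANIES + ['Count'], 0))
for row in rows:
    day = totals[row['created_date']]
    for company in COMPANIES:
        day[company] += row[company]
    day['Count'] += 1

for created_date in sorted(totals):
    print(created_date, totals[created_date])
```

Because one tweet can mention several companies, the four company columns need not add up to `Count`; on 2017-07-21 in the table above they sum to 2197 against a total of 2131 tweets.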


With the table above, it is easy to get the **overall tweet distribution among the four companies**. The figures below are plotted with **Bokeh**, an interactive Python visualization library that targets modern web browsers. The figures can be embedded in a web page for dynamic visualization.

In [25]:
twt_by_company = twt_by_date[['FedEx', 'UPS', 'DHL', 'USPS']].sum().rename_axis('Company').reset_index(name='Count')
twt_by_company= twt_by_company.sort_values(by='Count', ascending= False)
print(twt_by_company)

# import bokeh library
from bokeh.charts import Bar, output_notebook, show
from bokeh.charts.attributes import CatAttr
# output to notebook
output_notebook()

# plot grouped barcharts
p = Bar(data=twt_by_company, label=CatAttr(columns=['Company'], sort=False),  values='Count',  legend=None,
        plot_width=600, plot_height=300, title='Company Tweet Counts')
show(p)

  Company  Count
1     UPS  22960
0   FedEx  21321
3    USPS  18831
2     DHL   1576


The code block below plots the **daily total and company tweet counts**: the daily total is shown as a dashed line, and grouped bars show the tweet counts of the four companies on each day.

In [26]:
# import bokeh library
from bokeh.charts import Bar, Line, output_notebook, show
from bokeh.models.ranges import Range1d
from bokeh.models import ColumnDataSource
from bokeh.models.glyphs import Line as Line_glyph
from bokeh.models.markers import Square

# output to notebook
output_notebook()

# prepare plotting data
tot_twt_by_date = twt_by_date['Count'].reset_index()
tot_twt_by_date['created_date']=tot_twt_by_date['created_date'].apply(lambda x: str(x))

cmp_twt_by_date = twt_by_date[['FedEx', 'UPS', 'DHL', 'USPS']].sort_values(by=twt_by_date.index[0], axis=1, ascending = False)
cmp_twt_by_date =cmp_twt_by_date.stack().rename_axis(['created_date','Companies']).reset_index(name='Count')
cmp_twt_by_date['created_date']=cmp_twt_by_date['created_date'].apply(lambda x: str(x))

# plot grouped barcharts
p = Bar(data=cmp_twt_by_date, label = 'created_date',  values='Count', group = 'Companies', legend='top_right',
        color=['#ffcd00', '#4B1388', '#644117','#004A87' ],
        plot_width=900, plot_height=450, title='Daily Total and Company Tweet Counts')

# build a columndatasource for line
source = ColumnDataSource(tot_twt_by_date)
# create a line glyph object which references columns from source data
line = Line_glyph(x='created_date', y='Count', line_dash='dashed',line_color="#F46D43",line_alpha=0.6, line_width=2)
# add the glyph to the chart
p.add_glyph(source, line)

square = Square(x='created_date', y='Count', size=5, line_color="#F46D43", fill_color="white")
p.add_glyph(source, square)

# reset y range
p.y_range = Range1d(0, 2500)

# show plot
show(p)

Next, take a closer look at the daily volume of FedEx tweets.

In [27]:
# import bokeh library
from bokeh.plotting import figure, show, output_notebook
# Range1d is used below to set the y range, so import it in this cell as well
from bokeh.models import BoxAnnotation, DatetimeTickFormatter, Range1d
from math import pi
# output to notebook
output_notebook()

# prepare plotting data
fedex_daily_volume = twt_by_date['FedEx'].reset_index().rename(columns={'FedEx':'Count'})
x = fedex_daily_volume['created_date']
y = fedex_daily_volume['Count']


des = y.describe()
mean = des['mean']
std = des['std']
UL = mean +  1.9*std
LL = mean - 1.9*std

# build the figure
p = figure(plot_width=600, plot_height=400, 
           x_axis_type='datetime', 
           title='FedEx Tweet Daily Volume',
           x_axis_label ='Date',
          y_axis_label='Tweet Counts')

# add tweet count line 
p.line(x=x, y=y,line_color='#4B1388',line_alpha=0.6, line_width=2)
# add circle markers
p.circle(x=x, y=y,line_color='#4B1388',size=10, color="white", line_width=2)

# build control lines
low_box = BoxAnnotation(bottom= LL, top=LL )
mid_box = BoxAnnotation(bottom=LL, top=UL, fill_alpha=0.1, fill_color='green')
mean_line = BoxAnnotation(bottom=mean, top=mean, line_dash='dashed',line_color="black",line_width=2 )
high_box = BoxAnnotation(bottom=UL, top=UL )
# add control lines
p.add_layout(low_box)
p.add_layout(mid_box)
p.add_layout(high_box)
p.add_layout(mean_line)

# format datetime ticks
p.xaxis.formatter=DatetimeTickFormatter(
        days=["%Y-%m-%d"]
    )
# set orientation
p.xaxis.major_label_orientation = pi/4
p.y_range = Range1d(0, des['max']+400)

# set grid 
p.xgrid[0].grid_line_alpha=0.3
p.ygrid[0].grid_line_alpha=0.3

# show results
show(p)
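
The shaded band in the figure is bounded by control limits placed 1.9 sample standard deviations above and below the mean daily count. As a sketch, the same limits can be computed with the standard library. The counts below are only the ten FedEx values shown in the table above (2017-07-21 to 2017-07-30), so the resulting limits will differ from those computed by the cell on the full date range.

```python
from statistics import mean, stdev

# FedEx daily counts for 2017-07-21 to 2017-07-30, copied from the table above
fedex_counts = [664, 496, 452, 567, 562, 634, 596, 606, 347, 439]

K = 1.9  # band half-width in standard deviations, as used in the plotting cell

center = mean(fedex_counts)
spread = stdev(fedex_counts)  # sample standard deviation, like pandas' describe()
upper_limit = center + K * spread
lower_limit = center - K * spread

# Days whose volume falls outside the band are candidates for a closer look
outliers = [c for c in fedex_counts if c > upper_limit or c < lower_limit]
print(round(center, 1), round(lower_limit, 1), round(upper_limit, 1), outliers)
```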


Some potential observations can be made:
1. With more than one thousand tweets per day in total, reading them all manually is impractical
2. The total tweet volume rises and falls in a weekly cycle (lower on weekends, higher midweek)
3. Each individual company's tweet volume follows a similar weekly cycle

However, on August 06 (Sunday) and August 07 (Monday), the volume of FedEx-related tweets almost doubled. Let's take a look at the tweets from those two days.

**NOTE: The analysis below is based only on FedEx tweets from those two days, but the same process and approach are applicable to other tweet datasets.**

### Q2: What happened on Aug 06, 2017 and Aug 07, 2017? What topics did people discuss about FedEx on those two days?
To answer this question, we can look at the most popular original tweets (those with the highest retweet counts) from the two days.

In [28]:
# select FedEx tweets posted on Aug 06 and Aug 07, 2017
is_aug0607 = df['created_date'].map(lambda x: (str(x)=='2017-08-06') | (str(x)=='2017-08-07'))
aug0607_FedEx_Tweets = df[is_aug0607 & (df['FedEx'] ==1)]
# keep original tweets only (retweeted_status is 0 for tweets that are not retweets)
aug0607_FedEx_Original_Tweets =  aug0607_FedEx_Tweets[aug0607_FedEx_Tweets['retweeted_status'] == 0] 
# sort by retweet_count 
sorted_by_retweet_count = aug0607_FedEx_Original_Tweets.sort_values(by='retweet_count',ascending=False)
# print out the top 10 tweets
for idx, row in sorted_by_retweet_count.head(10).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Tweet(894073747824574464) with author 'ShujaRabbani' (has 299505 followers), date '2017-08-06' and had been retweeted 611 times, has text: 
 	Hi @FedEx Dubai, your delivery is an absolute nightmare! Your drivers are lost &amp; confused! One delivery has been bouncing around for 5 days!

Tweet(894002161759141889) with author 'h0t_p0ppy' (has 14126 followers), date '2017-08-06' and had been retweeted 16 times, has text: 
 	"im leaving on a jet plane" and i am NEVER going home @FedEx You are complicit in dolphin slavery 
#OpSeaWorld https://t.co/ZeunaDfiMT

Tweet(894213516764135425) with author 'h0t_p0ppy' (has 14126 followers), date '2017-08-06' and had been retweeted 15 times, has text: 
 	#OpFunKill
Its Time for companies to be accountable for their actions. 
#FedEx profit from #extinction #FDX #NYSE https://t.co/Zp3n2SeamA

Tweet(894311662009819136) with author 'h0t_p0ppy' (has 14126 followers), date '2017-08-06' and had been retweeted 12 times, has text: 
 	.@FedEx could save sharks 

We can also calculate the most frequent keywords in these tweets using the code block below.

In [29]:
import string
from collections import Counter 
from nltk.tokenize import TweetTokenizer 
from nltk.corpus import stopwords

# define a pre-processing function
def process(text, tokenizer=TweetTokenizer(), stopwords=[]):
    """Process the text of a tweet:
    - Lowercase
    - Tokenize
    - Stopword removal
    - Digits removal
    
    Return: list of strings
    """ 
    text = text.lower()
    tokens = tokenizer.tokenize(text)
    return [tok for tok in tokens if tok not in stopwords and not tok.isdigit()] 

# define the tokenizer
tweet_tokenizer = TweetTokenizer() 
punct = list(string.punctuation)
# define stop words list
stopword_list = stopwords.words('english') + punct + ['rt','via', '...','…','hi','#fedex','#fdx','@fedex', '@fedexhelp',
                                                      'fedex','one','going','it\'s', '@angelciraq214','@timkaine','@repdonbeyer',
                                                     '@hillaryclinton','@barackobama','@va8thcddems','@lowkell','@scooterocket']

# initialize the counter dict
tf = Counter()
# loop through all FedEx tweets on Aug 06 and Aug 07, 2017
for _, row in aug0607_FedEx_Tweets.iterrows():
    tokens = process(row['text'], tokenizer=tweet_tokenizer, stopwords=stopword_list)
    tf.update(tokens) 

# print result
print("Keyword : Count")
print("----------------")
for tag, count in tf.most_common(30): 
    print("{} : {}".format(tag, count))

Keyword : Count
----------------
delivery : 1250
lost : 611
drivers : 611
absolute : 603
confused : 602
nightmare : 601
bouncing : 601
@shujarabbani : 600
dubai : 600
tracking : 164
#opseaworld : 129
package : 105
stop : 91
#opfunkill : 86
time : 84
today : 80
system : 73
get : 59
@dennyhamlin : 58
#boycottfedex : 53
dolphin : 53
@h0t_p0ppy : 52
#nyse : 51
can't : 48
yet : 48
working : 47
shark : 46
never : 45
like : 44
@wgi : 43
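
The counting logic can be tried without NLTK on a handful of strings. In the sketch below, a simple regular expression stands in for `TweetTokenizer` (an assumption made to keep the example dependency-free, and it tokenizes less carefully than NLTK does), and the stopword list and sample tweets are made up.

```python
import re
from collections import Counter

# Assumption: a simple regex stands in for NLTK's TweetTokenizer. It keeps
# @mentions, #hashtags and words (with inner apostrophes) as single tokens.
TOKEN_PATTERN = re.compile(r"[@#]?\w+(?:'\w+)?")

# Hypothetical, deliberately tiny stopword list for the demo
STOPWORDS = {'rt', 'the', 'is', 'an', 'your', 'my', '@fedex'}

def keyword_counts(texts, stopwords=STOPWORDS):
    """Count lowercase tokens across texts, dropping stopwords and pure digits."""
    counts = Counter()
    for text in texts:
        tokens = TOKEN_PATTERN.findall(text.lower())
        counts.update(t for t in tokens if t not in stopwords and not t.isdigit())
    return counts

# Made-up sample tweets
sample = [
    "Hi @FedEx your delivery is an absolute nightmare!",
    "RT @FedEx delivery lost, tracking says 5 days",
    "my delivery can't be tracked #fedexfail",
]
counts = keyword_counts(sample)
print(counts.most_common(3))
```

As in the results above, a single heavily retweeted tweet dominates the counts, because every retweet repeats the same words.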


In [30]:
from bokeh.plotting import figure, show, output_notebook

output_notebook()

# prepare data
top_key_words = tf.most_common(30)
keyword_list = [kw[0] for kw in top_key_words[::-1]]
cout_list = [kw[1] for kw in top_key_words[::-1]]


p = figure(plot_width=450, plot_height=500, y_range = keyword_list, y_axis_label="Key Word", x_axis_label='Frequency')
p.hbar(y=keyword_list, height=0.7, left=0, right=cout_list, color="#4B1388", fill_alpha=0.5)

show(p)



#### Print out some example tweets posted in the two days that contain selected key words.

In [14]:
# grab tweets that contain the specified keyword
keyword = 'tracking'
# work on an explicit copy so that adding a column does not raise SettingWithCopyWarning
aug0607_FedEx_Tweets = aug0607_FedEx_Tweets.copy()
aug0607_FedEx_Tweets['contain_keyword'] = aug0607_FedEx_Tweets['text'].apply(lambda t: word_in_text(keyword, t))
contains_key_words = aug0607_FedEx_Tweets[aug0607_FedEx_Tweets['contain_keyword']==1]
sorted_contains_key_words=contains_key_words.sort_values(by='retweet_count',ascending=False)
sorted_contains_key_words.tail(200)


| | id | created_date | text | country | retweet_count | favorite_count | retweeted_status | userId | userName | followers_count | FedEx | UPS | DHL | USPS | contain_keyword |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6551 | 894645698838310913 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 9 | 0 | 894569707918983168 | 144651856 | jruizjimenez | 56 | 1 | 0 | 0 | 0 | 1 |
| 6861 | 894623181058781185 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 28215663 | Lori_McClain11 | 1263 | 1 | 0 | 0 | 0 | 1 |
| 7189 | 894593129403305985 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 14978950 | otaibi | 911 | 1 | 0 | 0 | 0 | 1 |
| 7074 | 894605528923615233 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 18953266 | tinastullracing | 335827 | 1 | 0 | 0 | 0 | 1 |
| 7367 | 894573211756748801 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 1623824257 | Eddie46604134 | 15 | 1 | 0 | 0 | 0 | 1 |
| 7039 | 894608346283692032 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 40084098 | Goodgoth | 7512 | 1 | 0 | 0 | 0 | 1 |
| 7237 | 894588577581027329 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 2288999436 | javiflame | 936 | 1 | 0 | 0 | 0 | 1 |
| 6739 | 894631048792817664 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 24303 | giovanni | 48850 | 1 | 0 | 0 | 0 | 1 |
| 7392 | 894569707918983168 | 2017-08-07 | Wow, @FedEx is not even issuing a statement ab... | | 8 | 49 | 0 | 2984871018 | langdaleca | 1394 | 1 | 0 | 0 | 0 | 1 |
| 7219 | 894589224279617537 | 2017-08-07 | RT @langdaleca: Wow, @FedEx is not even issuin... | | 8 | 0 | 894569707918983168 | 2910012074 | wawasense | 226 | 1 | 0 | 0 | 0 | 1 |


### Q3: Among FedEx tweets, which are positive or negative comments?
The following steps apply `textblob`, a sentiment analysis library built on top of NLTK, to classify the sentiment of the tweet text.

In [16]:
# Step 1: prepare dataset
company_Tweets = df[ df['FedEx'] ==1].reset_index()

# Step 2: clean up text for sentiment analysis
def clean_text(text):
        '''
        Utility function to clean tweet text by removing links, special characters
        using simple regex statements.
        '''
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text).split())
    
company_Tweets['Cleaned_Text'] = company_Tweets['text'].apply(lambda t: clean_text(t))

# Step 3: define and apply sentiment analysis
def get_tweet_sentiment(tweet):
        '''
        Utility function to classify sentiment of passed tweet text
        using textblob's sentiment method
        '''
        # create TextBlob object of passed tweet text
        analysis = TextBlob(clean_text(tweet))
        # set sentiment
        if analysis.sentiment.polarity > 0.4:
            return 'strong positive'
        elif analysis.sentiment.polarity > 0.1 and analysis.sentiment.polarity <=0.4:
            return 'weak positive'
        elif analysis.sentiment.polarity > -0.1 and analysis.sentiment.polarity <=0.1:
            return 'neutral'
        elif analysis.sentiment.polarity >-0.4 and analysis.sentiment.polarity <=-0.1:
            return 'weak negative'
        else:
            return 'strong negative'
company_Tweets['sentiment'] = company_Tweets['Cleaned_Text'].apply(lambda t: get_tweet_sentiment(t))
company_Tweets.head(5)

| | index | id | created_date | text | country | retweet_count | favorite_count | retweeted_status | userId | userName | followers_count | FedEx | UPS | DHL | USPS | Cleaned_Text | sentiment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 895738266602717184 | 2017-08-10 | .@FedEx fly dolphins around the world and ensu... | | 0 | 0 | 0 | 2874094245 | OpKiIIingBay | 1441 | 1 | 0 | 0 | 0 | fly dolphins around the world and ensure a lif... | strong positive |
| 1 | 2 | 895738254322008065 | 2017-08-10 | RT @ClayMillican: Teen Safe Driving School is ... | | 17 | 0 | 895072540791640067 | 3333481893 | NewClyde | 740 | 1 | 0 | 0 | 0 | RT Teen Safe Driving School is coming To Memph... | weak positive |
| 2 | 5 | 895737630087942146 | 2017-08-10 | RT @FedExNews: Congratulations to the 173 driv... | | 8 | 0 | 895692826977193984 | 2773614424 | iamalbaaa | 70 | 1 | 0 | 0 | 0 | RT Congratulations to the 173 drivers represen... | strong positive |
| 3 | 8 | 895737280752742402 | 2017-08-10 | RT @OrcaOnIceSkates: #FedEx why do you do this... | | 5 | 0 | 895731044996890625 | 517950064 | sunnycarol54 | 97 | 1 | 0 | 0 | 0 | RT FedEx why do you do this to me Stop Dolphin... | neutral |
| 4 | 12 | 895736747937497088 | 2017-08-10 | RT @BBBPNW: RT @consumerist: #FedEx won't be i... | | 1 | 0 | 895736699136999424 | 245440990 | Ryan_Hawes | 1086 | 1 | 1 | 0 | 0 | RT RT FedEx won t be imposing holiday surcharg... | neutral |
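
TextBlob's `sentiment.polarity` is a float between -1.0 and 1.0, and the cell above cuts that range into five classes. The sketch below isolates the bucketing so that the boundaries can be checked without TextBlob; the function name `polarity_label` is hypothetical, and the thresholds are the ones used above.

```python
def polarity_label(polarity):
    """Map a polarity score in [-1, 1] to one of five sentiment labels."""
    # Thresholds are checked from high to low, so each branch only needs
    # its lower bound; the cut-offs match the cell above.
    if polarity > 0.4:
        return 'strong positive'
    if polarity > 0.1:
        return 'weak positive'
    if polarity > -0.1:
        return 'neutral'
    if polarity > -0.4:
        return 'weak negative'
    return 'strong negative'

for score in (0.8, 0.4, 0.0, -0.1, -0.75):
    print(score, polarity_label(score))
```

Each upper boundary is inclusive: a score of exactly 0.4 is still "weak positive", and exactly -0.1 falls into "weak negative".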


In [17]:
# split FedEx tweets by sentiment label, each subset sorted by retweet count
def select_by_sentiment(label):
    return company_Tweets[company_Tweets['sentiment'] == label].sort_values(by='retweet_count',ascending=False)

strng_pos_tweets = select_by_sentiment('strong positive')
weak_pos_tweets = select_by_sentiment('weak positive')
neu_tweets = select_by_sentiment('neutral')
weak_neg_tweets = select_by_sentiment('weak negative')
strng_neg_tweets = select_by_sentiment('strong negative')

# print the count and percentage of each sentiment class
total = company_Tweets.shape[0]
for name, subset in [('Strong Positive', strng_pos_tweets), ('Weak Positive', weak_pos_tweets),
                     ('Neutral', neu_tweets), ('Weak Negative', weak_neg_tweets),
                     ('Strong Negative', strng_neg_tweets)]:
    print("{0} {1} tweets counts for {2:.2f}% of total {3} tweets".format(subset.shape[0], name, 100*subset.shape[0]/total, total))


994 Strong Positive tweets counts for 7.61% of total 13069 tweets
1787 Weak Positive tweets counts for 13.67% of total 13069 tweets
7705 Neutral tweets counts for 58.96% of total 13069 tweets
1755 Weak Negative tweets counts for 13.43% of total 13069 tweets
828 Strong Negative tweets counts for 6.34% of total 13069 tweets


In [18]:
# printing the 100 least-retweeted strong positive tweets
print("Some Strong Positive tweet examples:\n")
for idx, row in strng_pos_tweets.tail(100).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Some Strong Positive tweet examples:

Tweet(891728574696357888) with author 'NickLaBrecque' (has 4111 followers), date '2017-07-30' and had been retweeted 0 times, has text: 
 	Okay @poconoraceway let's watch @dennyhamlin get Win #5 there today in the #Overtons400 for @JoeGibbsRacing @FedEx… https://t.co/Cy3qWwgiFV

Tweet(891727659788619784) with author 'lukbon' (has 44276 followers), date '2017-07-30' and had been retweeted 0 times, has text: 
 	@rogerfederer Its spectacular.....and the trophy isn't too shabby either! #Fedex

Tweet(891722346586918912) with author 'ShaneSimp111' (has 823 followers), date '2017-07-30' and had been retweeted 0 times, has text: 
 	@FedEx @dennyhamlin @poconoraceway 5 sounds great! Go gettem DH! #FedEx11

Tweet(891693159914319872) with author 'Bunnygolf' (has 297 followers), date '2017-07-30' and had been retweeted 0 times, has text: 
 	Here's to a great Sunday @RBCCanadianOpen for @Power4Seamus $$$$ #FedEx points #COYBIG

Tweet(891657429355831297) with au

In [57]:
# printing the 100 most-retweeted weak positive tweets
print("Some Weak Positive tweet examples:\n")
for idx, row in weak_pos_tweets.head(100).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Some Weak Positive tweet examples:

Tweet(894880103603122177) with author 'Z12xusEhWenamAe' (has 2 followers), date '2017-08-08' and had been retweeted 1910 times, has text: 
 	RT @UPS: It’s #Seb5’s birthday and his @ScuderiaFerrari teammate #Kimi7 has made a last minute gift. Lucky for him, UPS is faster than ever…

Tweet(893862688769671169) with author 'EricaClassyladi' (has 10 followers), date '2017-08-05' and had been retweeted 1909 times, has text: 
 	RT @UPS: It’s #Seb5’s birthday and his @ScuderiaFerrari teammate #Kimi7 has made a last minute gift. Lucky for him, UPS is faster than ever…

Tweet(893084902404624384) with author 'JvG6V04jp4nOPcB' (has 4 followers), date '2017-08-03' and had been retweeted 1908 times, has text: 
 	RT @UPS: It’s #Seb5’s birthday and his @ScuderiaFerrari teammate #Kimi7 has made a last minute gift. Lucky for him, UPS is faster than ever…

Tweet(893563511577686019) with author 'judehaste_write' (has 12123 followers), date '2017-08-04' and had been retw

In [58]:
# printing the 5 most-retweeted neutral tweets
print("Some Neutral tweet examples:\n")
for idx, row in neu_tweets.head(5).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Some Neutral tweet examples:

Tweet(893191890098630656) with author 'kquezadadavila1' (has 185 followers), date '2017-08-03' and had been retweeted 234 times, has text: 
 	RT @OnOffSexy: #ups what can brown do 4U? #OnOff @BabePicsHQ @RZual @TETASPERFECTAS @brndn1116 @the1stMe420 @OnlyTheSexiest_ @AZwtf http://…

Tweet(892497398769094657) with author 'd_wolt' (has 631 followers), date '2017-08-01' and had been retweeted 219 times, has text: 
 	RT @UPS: You can't spell layups without #UPS... https://t.co/FX7kqyguP8

Tweet(892238333727969281) with author 'colorfvlsovnd' (has 72 followers), date '2017-08-01' and had been retweeted 218 times, has text: 
 	RT @UPS: You can't spell layups without #UPS... https://t.co/FX7kqyguP8

Tweet(892052995428560896) with author 'TamarianPo' (has 8848 followers), date '2017-07-31' and had been retweeted 217 times, has text: 
 	RT @UPS: You can't spell layups without #UPS... https://t.co/FX7kqyguP8

Tweet(892047922212294656) with author '_serenxde_' (has 4

In [59]:
# printing the 5 most-retweeted weak negative tweets
print("Some Weak Negative tweet examples:\n")
for idx, row in weak_neg_tweets.head(5).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Some Weak Negative tweet examples:

Tweet(888351181583638528) with author 'Adii21' (has 259 followers), date '2017-07-21' and had been retweeted 198 times, has text: 
 	RT @peta: @UPS Animals are not trophies. Tell @UPS to STOP shipping dead animal body parts! https://t.co/7i3330eOFR #CecilTheLion https://t…

Tweet(889537744917213185) with author 'reginersl' (has 216 followers), date '2017-07-24' and had been retweeted 198 times, has text: 
 	RT @peta: @UPS Animals are not trophies. Tell @UPS to STOP shipping dead animal body parts! https://t.co/7i3330eOFR #CecilTheLion https://t…

Tweet(888475953139200000) with author 'gaia_on' (has 136 followers), date '2017-07-21' and had been retweeted 198 times, has text: 
 	RT @peta: @UPS Animals are not trophies. Tell @UPS to STOP shipping dead animal body parts! https://t.co/7i3330eOFR #CecilTheLion https://t…

Tweet(888752636245233666) with author 'JasminJ0107' (has 13 followers), date '2017-07-22' and had been retweeted 198 times, has text: 
 

In [60]:
# printing the 100 least-retweeted strong negative tweets
print("Some Strong Negative tweet examples:\n")
for idx, row in strng_neg_tweets.tail(100).iterrows():
    print("Tweet({0}) with author '{1}' (has {2} followers), date '{3}' and had been retweeted {4} times, has text: \n \t{5}".format(row['id'], row['userName'], row['followers_count'], row['created_date'], row['retweet_count'], row['text']))
    print()

Some Strong Negative tweet examples:

Tweet(891024833160650753) with author 'XBadBadDaddyX' (has 54 followers), date '2017-07-28' and had been retweeted 0 times, has text: 
 	@UPS @OriginalFunko @PopPriceGuide  not a happy collector rn. From ppg marketplace https://t.co/xMbXEB6B9h

Tweet(892107545610989568) with author 'ScooterDrones' (has 32 followers), date '2017-07-31' and had been retweeted 0 times, has text: 
 	@UPS just lost my package that had my broken mavic in it. @DJISupport @DJIGlobal

Tweet(892112305378529282) with author 'DWhipp21' (has 59 followers), date '2017-07-31' and had been retweeted 0 times, has text: 
 	Longer than 30 seconds for me to answer the door. Sorry I had surgery on my ankle and can't get there in 2 seconds!! @UPS

Tweet(892845598440923137) with author 'Dreysander1' (has 1115 followers), date '2017-08-02' and had been retweeted 0 times, has text: 
 	@Punk_Bat @BossSergal @FedEx @UPS Unfortunately when you order from @amazon , you have no control over thei