First, import the packages we need. If any of them is missing, install it with pip install <package name> (for example, pip install textblob).

In [147]:
import pandas as pd
import re
from textblob import TextBlob as tb
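
If you are not sure which packages are already installed, a quick check like the one below may help. This is a minimal sketch using only the standard library; the helper name missing_packages is made up for illustration.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Top-level import names of the third-party packages used in this notebook
required = ["pandas", "textblob"]
print(missing_packages(required))
```

Anything printed in the list still needs to be installed with pip.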

Next, load the file. Change the path to wherever you saved the CSV when you first collected the data.

In [148]:
df = pd.read_csv('~/PycharmProjects/untitled3/giants4.csv', header = 0)

First, let's see what the data looks like.

In [149]:
print(df.head())
print(type(df))

             TimeStamp                                              Tweet  \
0  2017-11-16 23:57:49  b'RT @BackAftaThis: Mike Francesa goes bonkers...   
1  2017-11-16 23:56:16  b'Fluker should be part of this O-Line next se...   
2  2017-11-16 23:55:56  b'Are the #Giants worth a look against the #Ch...   
3  2017-11-16 23:54:56  b'RT @TikiBarber: Talking shop with the NFL\xe...   
4  2017-11-16 23:54:11  b'RT @JordanRaanan: 1. Sterling Shepard, Evan ...   

                   Location  
0                    towson  
1  07444💙🏈➡️13210🍊🎓➡️90254🌴  
2           Fort Lauderdale  
3                      NYC   
4  07444💙🏈➡️13210🍊🎓➡️90254🌴  
<class 'pandas.core.frame.DataFrame'>
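
Notice that the Tweet column holds each tweet as the text of a Python bytes literal (b'...'), with non-ASCII characters shown as escapes such as \xe2. If you would rather work with ordinary strings, something like the following may help. It is a sketch that assumes the tweets were UTF-8 encoded before they were written to the CSV; decode_tweet is a hypothetical helper, not part of pandas or TextBlob, and the sample string is made up.

```python
import ast

def decode_tweet(raw):
    """Turn the text "b'...'" back into a normal string, if possible."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw  # not a valid Python literal; leave it unchanged
    if isinstance(value, bytes):
        # Assumption: the original bytes were UTF-8
        return value.decode("utf-8", errors="replace")
    return raw  # some other literal (e.g. a number); leave it unchanged

# A made-up string in the same format as the Tweet column
print(decode_tweet("b'Go #Giants \\xf0\\x9f\\x8f\\x88'"))
```

On the dataframe this could be applied with df['Tweet'] = df['Tweet'].apply(decode_tweet).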


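The next cell sets up a tokenizer. It defines regular expressions for text emoticons such as :), HTML tags, @-mentions, hashtags, URLs, numbers, and words, plus two helper functions, tokenize and preprocess, that split a tweet into tokens. Nothing is removed from the text here; preprocess only lowercases the non-emoticon tokens when lowercase=True.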

In [150]:
emoticons_str = r"""
    (?:
        [:=;] # Eyes
        [oO\-]? # Nose (optional)
        [D\)\]\(\]/\\OpP] # Mouth
    )"""

regex_str = [
    emoticons_str,
    r'<[^>]+>', # HTML tags
    r'(?:@[\w_]+)', # @-mentions
    r"(?:\#+[\w_]+[\w\'_\-]*[\w_]+)", # hash-tags
    r'http[s]?://(?:[a-z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-f][0-9a-f]))+', # URLs

    r'(?:(?:\d+,?)+(?:\.?\d+)?)', # numbers
    r"(?:[a-z][a-z'\-_]+[a-z])", # words with - and '
    r'(?:[\w_]+)', # other words
    r'(?:\S)' # anything else
]

tokens_re = re.compile(r'('+'|'.join(regex_str)+')', re.VERBOSE | re.IGNORECASE)
emoticon_re = re.compile(r'^'+emoticons_str+'$', re.VERBOSE | re.IGNORECASE)

def tokenize(s):
    return tokens_re.findall(s)

def preprocess(s, lowercase=False):
    tokens = tokenize(s)
    if lowercase:
        tokens = [token if emoticon_re.search(token) else token.lower() for token in tokens]
    return tokens
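
To get a feel for what this kind of tokenizer produces, here is a small self-contained sketch. It uses a simplified subset of the patterns above (an assumption made to keep the example short), so its output on unusual input can differ from the full tokenize function; the name demo_tokenize and the sample tweet are made up.

```python
import re

# Simplified subset of the patterns above; order matters, because the
# first alternative that matches at a position wins
patterns = [
    r'(?:@[\w_]+)',      # @-mentions
    r'(?:\#+[\w_]+)',    # hashtags
    r'http[s]?://\S+',   # URLs (much looser than the pattern above)
    r'(?:[\w_]+)',       # words
    r'(?:\S)',           # anything else that is not whitespace
]
demo_re = re.compile('(' + '|'.join(patterns) + ')', re.IGNORECASE)

def demo_tokenize(s):
    # findall returns the text captured by the single outer group
    return demo_re.findall(s)

print(demo_tokenize("RT @TikiBarber: Are the #Giants worth a look? http://example.com/x"))
```

Mentions, hashtags, and URLs come out as single tokens, while punctuation such as the colon becomes a token of its own.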

Next we want to get the sentiment scores for all of the tweets and put them into a list. 

In [151]:
sentiment_scores = []

# TextBlob polarity ranges from -1.0 (most negative) to 1.0 (most positive)
for index, entry in df.iterrows():
    tweet = tb(entry['Tweet'])
    sentiment_scores.append(tweet.sentiment.polarity)

Now add the sentiment scores to the dataframe. I decided to remove tweets with a polarity of exactly 0, which TextBlob typically assigns when it finds no sentiment-bearing words, to get a clearer sense of how people feel about the team.

In [152]:
df["sentiment"] = sentiment_scores
df = df[df.sentiment != 0]
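
To see what dropping the zero-polarity rows does to an average, here is a small sketch on made-up scores (the toy dataframe is invented for illustration and is not taken from the Giants data). Neutral rows add nothing to the sum but still count in the denominator, so removing them pushes the mean away from 0.

```python
import pandas as pd

# Made-up polarity scores, for illustration only
toy = pd.DataFrame({"sentiment": [0.5, 0.0, -0.2, 0.0, 0.3]})

mean_all = toy.sentiment.mean()

# Same filter as above: keep only rows with a nonzero polarity
non_neutral = toy[toy.sentiment != 0]
mean_non_neutral = non_neutral.sentiment.mean()

print(len(toy), mean_all)
print(len(non_neutral), mean_non_neutral)
```

Keep this in mind when comparing averages: a mean computed after the filter describes only the tweets that carried some sentiment.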

Now we are ready to do some investigation. There are all kinds of questions we can ask. One easy one is to get the average sentiment. 

In [153]:
print(sum(df.sentiment) / len(df.sentiment))

0.0657596699182


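We could also do some more subsetting using the location. First, I drop the rows with an empty location, and then I keep the rows whose location contains the string 'NY', for New York. The match is case-sensitive, so 'NYC' counts but 'New York' and 'new york' do not.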

In [154]:
# Drop rows with a missing Location, then keep locations containing 'NY'
dfNYC = df.dropna(subset=['Location'])
dfNYC = dfNYC[dfNYC.Location.str.contains('NY')]

print(sum(dfNYC.sentiment) / len(dfNYC.sentiment))

0.00562560211736
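
If you want to catch more spellings of New York, one option is a case-insensitive pattern. The sketch below runs on made-up locations, and the pattern is only one possible choice; it can still miss some New York users or include a few others, so check a sample of the matches before trusting the result.

```python
import pandas as pd

# Made-up locations, for illustration only
toy = pd.DataFrame({
    "Location": ["towson", "NYC ", "Brooklyn, NY", "new york city",
                 "Sunnyvale, CA", None],
})

# Case-insensitive match on the whole words NY/NYC or the phrase "new york";
# na=False treats missing locations as non-matches, so no dropna is needed
pattern = r"\b(?:NY|NYC)\b|new york"
is_ny = toy.Location.str.contains(pattern, case=False, na=False)

print(toy[is_ny])
```

The word boundaries (\b) keep the 'ny' inside words such as 'Sunnyvale' from counting as a match.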


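As we can see, tweets from accounts with 'NY' in their location average about 0.006, compared with about 0.066 for all of the non-neutral tweets. Both averages are positive, so New York fans are not negative on balance, but they tweet noticeably less positively about the Giants than the sample as a whole. This is just one example of something you can look at. Try to find something interesting about your data or your query.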