# Final Project Submission

* **Student name**: Sara Robinson
* **Student pace**: *self paced*/part time/full time
* **Scheduled project review date/time**: 9/7/2021 11:00
* **Instructor name**: Jeff Herman
* **Blog post URL**: https://medium.com/@sara.robinson27/classifying-a-tweets-sentiment-based-on-its-content-9835069aa2b3
* **Notebook**: 1/4

# Introduction

The purpose of this project is to build a model that can rate the sentiment of a tweet based on its content. The data is from CrowdFlower and contains over 9,000 tweets about Apple and Google products, each rated by humans as positive, negative, or neither.

This notebook contains the data cleaning code, where I take in the data, clean and organize it, and make sure it's ready to be explored.

# Data Preparation

## Import Libraries

In the following cells I import the necessary libraries for this notebook.

In [1]:
import pandas as pd
import nltk
import re                                  
import string
from nltk.corpus import stopwords 
from nltk.tokenize import TweetTokenizer, word_tokenize

## Load and Inspect Data

In the following cells I load the dataset, review it to see what it looks like, and begin to clean it up a bit.

In [2]:
df = pd.read_csv('tweet_data.csv', encoding = 'unicode_escape') #Read in dataset
df.head() #Print first 5 rows

Unnamed: 0,tweet_text,emotion_in_tweet_is_directed_at,is_there_an_emotion_directed_at_a_brand_or_product
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion


The first thing I notice is that the column names are very bulky/wordy, so I want to change those to simply "Tweet", "Product", and "Emotion". This will make it much simpler as I go through this process.

In [3]:
df = df.rename(columns = {'tweet_text': 'Tweet', 
                         'emotion_in_tweet_is_directed_at': 'Product', 
                         'is_there_an_emotion_directed_at_a_brand_or_product': 'Emotion'}) #Rename columns
df.head() #Check to see if columns were renamed

Unnamed: 0,Tweet,Product,Emotion
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative emotion
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive emotion
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive emotion
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative emotion
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive emotion


In [4]:
print(df.info()) #Seeing information about our data

print(df.duplicated().sum()) #Checking to see if any rows are duplicated

print(df.isna().sum()) #Checking for Null entries

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9093 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Tweet    9092 non-null   object
 1   Product  3291 non-null   object
 2   Emotion  9093 non-null   object
dtypes: object(3)
memory usage: 213.2+ KB
None
22
Tweet         1
Product    5802
Emotion       0
dtype: int64


There are 22 duplicate rows and 1 null entry in Tweet, so let's remove all of those. There are also 5,802 tweets where the product is unidentified. Rather than drop them, I will replace those null values with "Undetermined," which will be our identifying moniker for unknown products.

In [5]:
df.drop_duplicates(inplace = True) #Removing duplicates

df.dropna(subset = ['Tweet'], inplace = True) #Dropping one null entry

df['Product'] = df['Product'].fillna('Undetermined') #Replacing null values with "Undetermined"

print(df.info()) #Checking info

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9070 entries, 0 to 9092
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Tweet    9070 non-null   object
 1   Product  9070 non-null   object
 2   Emotion  9070 non-null   object
dtypes: object(3)
memory usage: 283.4+ KB
None


I'd like to take a look at the value counts within Emotion. What is the distribution between classes of positive, negative, or neither tweets?

In [6]:
df['Emotion'].value_counts() #Reviewing value counts of each class within emotion

No emotion toward brand or product    5375
Positive emotion                      2970
Negative emotion                       569
I can't tell                           156
Name: Emotion, dtype: int64

These classes are quite imbalanced. About 59 percent of the tweets were rated as having no emotion, and about a third (33 percent) were rated as positive. Fewer than 600 (about 6 percent) were rated as negative, and fewer than 200 (about 2 percent) were classified as "I can't tell," meaning the rater was unable to determine the sentiment of the tweet. Altogether, only about 39 percent of the tweets were actually rated as positive or negative.
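The class shares can be computed rather than estimated by eye. The sketch below rebuilds the counts from the output above into a small stand-in Series so that it runs on its own; `emotion_counts` and `emotion_share` are hypothetical names, not part of this notebook. In the notebook itself, `df['Emotion'].value_counts(normalize=True)` should give the same shares as fractions.

```python
import pandas as pd

# Counts copied from the value_counts() output above
emotion_counts = pd.Series({
    "No emotion toward brand or product": 5375,
    "Positive emotion": 2970,
    "Negative emotion": 569,
    "I can't tell": 156,
})

# Convert raw counts to percentage shares of all tweets
emotion_share = (emotion_counts / emotion_counts.sum() * 100).round(1)
print(emotion_share)
```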

While it is clear that these classes are very uneven, for now I am going to group together "No emotion" and "I can't tell" and turn them into "Neutral". At this point, I am also going to change the names within Emotion from "Positive emotion" to just "Positive," and "Negative emotion" to just "Negative". This will be much cleaner and clearer as I move forward in this analysis.

In [7]:
def clean_emotions(df, column): #Building function to change emotions
    emotion_list = [] #Making list for new names of emotions
    for i in df[column]:
        if i == "No emotion toward brand or product": #Renaming no emotions
            emotion_list.append('Neutral') #Renaming as Neutral
        elif i == "I can't tell": #Renaming I can't tell
            emotion_list.append('Neutral') #Renaming as Neutral
        elif i == "Positive emotion": #Renaming positive emotion
            emotion_list.append('Positive') #Renaming as Positive
        elif i == "Negative emotion": #Renaming negative emotion
            emotion_list.append('Negative') #Renaming as Negative
    df[column] = emotion_list #Setting the given column to new names
    return df

df = clean_emotions(df, 'Emotion') #Set df to clean emotions function
df['Emotion'].value_counts() #Checking value counts to see if they were changed

Neutral     5531
Positive    2970
Negative     569
Name: Emotion, dtype: int64
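The same relabeling can also be written with a dictionary and `Series.map`, which avoids the explicit loop. This is an alternative sketch rather than the approach used in this notebook; `emotion_map` and the small `sample` frame are made up for illustration.

```python
import pandas as pd

# Hypothetical lookup table from the original labels to the shorter ones
emotion_map = {
    "No emotion toward brand or product": "Neutral",
    "I can't tell": "Neutral",
    "Positive emotion": "Positive",
    "Negative emotion": "Negative",
}

# Tiny stand-in for the real dataframe
sample = pd.DataFrame({"Emotion": ["Positive emotion",
                                   "I can't tell",
                                   "Negative emotion",
                                   "No emotion toward brand or product"]})

# Labels missing from the dictionary would become NaN,
# which makes unexpected values easy to spot
sample["Emotion"] = sample["Emotion"].map(emotion_map)
print(sample["Emotion"].tolist())
```

One design difference worth noting: the loop version silently skips labels it does not recognize, while `map` leaves a visible NaN in their place.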

Now I'd like to take a look at the value counts within Product. What are the classes within this feature and what is the distribution of tweets between them?

In [8]:
df['Product'].value_counts() #Reviewing value counts of each class within product

Undetermined                       5788
iPad                                945
Apple                               659
iPad or iPhone App                  469
Google                              428
iPhone                              296
Other Google product or service     293
Android App                          80
Android                              77
Other Apple product or service       35
Name: Product, dtype: int64

The spread within products is extremely imbalanced: about 64 percent of the tweets were not assigned a product. Because of this, I'd like to add another column called "Brand" that simply records which brand the tweet is about, based on the information in the "Product" column. Since these tweets are about Apple or Google products, I figure this information might be important later on, so I'll make it easier for myself and add the column now.

Having reviewed all of the entries in "Product" above, we'll create a function that checks the product and returns the matching brand for our new column. If the product is undetermined, the function then searches the lowercased text of the tweet for the brand keywords ("google" or "android" for Google; "apple" or "ip" for Apple, the latter to catch iPad and iPhone). If none of them were used, the brand stays "Undetermined"; if keywords for both brands are used, the brand will be "Both". Hopefully this will help create more balanced classes within this new feature.

In [9]:
def find_brand(Product, Tweet): #Building function to determine Brand
    brand = 'Undetermined' #Labeling brand as Undetermined by default
    lower_product = Product.lower() #Making product lowercase
    if ('google' in lower_product) or ('android' in lower_product): #Product names Google or Android
        brand = 'Google' #Labeling brand as Google
    elif ('apple' in lower_product) or ('ip' in lower_product): #Product names Apple, iPad, or iPhone
        brand = 'Apple' #Labeling brand as Apple

    if brand == 'Undetermined': #Falling back to the text of the tweet
        lower_tweet = Tweet.lower() #Making tweet lowercase
        is_google = ('google' in lower_tweet) or ('android' in lower_tweet) #Tweet mentions Google
        is_apple = ('apple' in lower_tweet) or ('ip' in lower_tweet) #Tweet mentions Apple ('ip' also matches words like "trip")

        if is_google and is_apple: #If it has both identifiers in the tweet
            brand = 'Both' #Labeling brand as both
        elif is_google:
            brand = 'Google' #Labeling brand as Google
        elif is_apple:
            brand = 'Apple' #Labeling brand as Apple

    return brand

df['Brand'] = df.apply(lambda row: find_brand(row['Product'], row['Tweet']), axis = 1) #Applying function to column
df['Brand'].value_counts() #Reviewing value counts of each class within brand

Apple           5361
Google          2757
Undetermined     739
Both             213
Name: Brand, dtype: int64

The spread has changed significantly! Our classes are still imbalanced, but this time in a different way. About 59 percent of the tweets are about the brand Apple, while about 30 percent are about the brand Google. Tweets where the brand is undetermined make up about eight percent of all tweets, and tweets about both brands make up only around two percent. While this is better than having over half of them unknown (as with the undetermined product tweets), the classes are still far from balanced.
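One caveat about the text fallback: a bare substring test for "ip" is also true for unrelated words such as "trip" or "tip", so some tweets may be labeled Apple without mentioning an Apple product. The sketch below shows a stricter, whole-word alternative. It is not the method that produced the counts above, and the keyword lists, pattern names, and `brand_from_text` helper are assumptions chosen for illustration; swapping it in would change the Brand counts.

```python
import re

# Hypothetical keyword patterns: match at the start of a word instead of anywhere
GOOGLE_PATTERN = re.compile(r"\b(google|android)\w*", re.IGNORECASE)
APPLE_PATTERN = re.compile(r"\b(apple|ipad|iphone|itunes)\w*", re.IGNORECASE)

def brand_from_text(tweet):
    """Guess the brand from the tweet text using word-start keyword matches."""
    is_google = bool(GOOGLE_PATTERN.search(tweet))
    is_apple = bool(APPLE_PATTERN.search(tweet))
    if is_google and is_apple:
        return "Both"
    if is_google:
        return "Google"
    if is_apple:
        return "Apple"
    return "Undetermined"

example = "Quick tip for the road trip to #sxsw"
print("ip" in example.lower())   # the substring test fires on "tip" and "trip"
print(brand_from_text(example))  # the word-start test does not
```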

Before I move on to preprocessing, I'd like to add a few more columns that will provide me with more information. First I'd like to add a column that contains the character count for the tweets before they are processed. I'd also like to create a column that will contain only the hashtags used within each tweet, as well as a column containing the count of hashtags within each tweet. While I am not using those columns right now I feel they might be useful during the EDA process. I will also add a column called "Clean" which will contain the cleaned and processed tweets. That way I still have the original tweets if needed but can easily access the preprocessed tweets as well. For now this column will just be a copy of the "Tweet" column.

In [10]:
def tweet_character_count(text_of_tweet): #Function to count characters in tweet
    return len(text_of_tweet.strip()) #Returns length of tweet

df['Tweet Character Count'] = df.apply(lambda row: tweet_character_count(row['Tweet']), axis = 1) #Making new column

df['Hashtag'] = df['Tweet'].apply(lambda x: re.findall(r'\B#\w*[a-zA-Z]+\w*', x)) #Making hashtag column

df['Hashtag Count'] = df['Hashtag'].str.len() #Creating column with number of Hashtags used

df['Clean'] = df['Tweet'] #New column for cleaned tweets

df.head() #Checking to see if columns were created accurately

Unnamed: 0,Tweet,Product,Emotion,Brand,Tweet Character Count,Hashtag,Hashtag Count,Clean
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative,Apple,127,"[#RISE_Austin, #SXSW]",2,.@wesley83 I have a 3G iPhone. After 3 hrs twe...
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive,Apple,139,[#SXSW],1,@jessedee Know about @fludapp ? Awesome iPad/i...
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive,Apple,79,"[#iPad, #SXSW]",2,@swonderlin Can not wait for #iPad 2 also. The...
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative,Apple,82,[#sxsw],1,@sxsw I hope this year's festival isn't as cra...
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive,Google,131,[#SXSW],1,@sxtxstate great stuff on Fri #SXSW: Marissa M...
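As a quick check of the hashtag pattern used above, the sketch below runs the same regular expression on a made-up sentence (the `sample` string is hypothetical, not from the dataset). The `\B` before `#` rejects a `#` glued to the end of a word, and the `[a-zA-Z]+` requirement rejects purely numeric tags.

```python
import re

# Same pattern as in the Hashtag column above
HASHTAG_PATTERN = r'\B#\w*[a-zA-Z]+\w*'

# Made-up example with two real hashtags, one numeric tag, and one '#' inside a word
sample = "#SXSW day 2: can not wait for #iPad 2, see #2011 recap or email#me"

hashtags = re.findall(HASHTAG_PATTERN, sample)
print(hashtags)       # ['#SXSW', '#iPad']
print(len(hashtags))  # 2
```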


Everything looks good. We will still need to deal with the class imbalance issues, specifically regarding our Emotion feature, prior to modeling. For now let's move on to text preprocessing.

# Text Preprocessing

Before we can start exploring our data, we need to finish cleaning and preparing it. For NLP, this process is different because we have to prepare our text data in a specific way so that it can be modeled properly. This includes removing URLs and punctuation, tokenizing, and removing stop words.

It's good to take a quick look at what the tweets look like first so we can walk through what we need to do to clean it up. In order to turn a tweet into modelable text data we need to do the following:

* Make everything lowercase
* Remove all URLs or URL placeholder values (such as {link} or [video])
* Remove all HTML reference characters
* Remove all twitter handles
* Remove all characters that aren't letters
* Remove all punctuation
* Remove extra spaces

Let's pull up a tweet that we can use to check whether or not these changes were properly made. Let's find one with a URL and a placeholder value such as {link} or [video]. We don't need to find a tweet for each individual bullet point because most tweets include a twitter handle and punctuation, among other non-letter characters.

In [11]:
df[df['Tweet'].str.contains(r'https?://\S+')] #Looking for a reference tweet with a URL

Unnamed: 0,Tweet,Product,Emotion,Brand,Tweet Character Count,Hashtag,Hashtag Count,Clean
5,@teachntech00 New iPad Apps For #SpeechTherapy...,Undetermined,Neutral,Apple,140,"[#SpeechTherapy, #SXSW, #iear, #edchat, #asd]",5,@teachntech00 New iPad Apps For #SpeechTherapy...
8,Beautifully smart and simple idea RT @madebyma...,iPad or iPhone App,Positive,Apple,129,"[#hollergram, #sxsw]",2,Beautifully smart and simple idea RT @madebyma...
11,Find &amp; Start Impromptu Parties at #SXSW Wi...,Android App,Positive,Google,129,[#SXSW],1,Find &amp; Start Impromptu Parties at #SXSW Wi...
12,"Foursquare ups the game, just in time for #SXS...",Android App,Positive,Google,133,[#SXSW],1,"Foursquare ups the game, just in time for #SXS..."
13,Gotta love this #SXSW Google Calendar featurin...,Other Google product or service,Positive,Google,142,[#SXSW],1,Gotta love this #SXSW Google Calendar featurin...
14,Great #sxsw ipad app from @madebymany: http://...,iPad or iPhone App,Positive,Apple,65,[#sxsw],1,Great #sxsw ipad app from @madebymany: http://...
15,"haha, awesomely rad iPad app by @madebymany ht...",iPad or iPhone App,Positive,Apple,82,"[#hollergram, #sxsw]",2,"haha, awesomely rad iPad app by @madebymany ht..."
16,Holler Gram for iPad on the iTunes App Store -...,Undetermined,Neutral,Apple,92,[#sxsw],1,Holler Gram for iPad on the iTunes App Store -...
19,Must have #SXSW app! RT @malbonster: Lovely re...,iPad or iPhone App,Positive,Apple,118,[#SXSW],1,Must have #SXSW app! RT @malbonster: Lovely re...
23,"Photo: Just installed the #SXSW iPhone app, wh...",iPad or iPhone App,Positive,Apple,94,[#SXSW],1,"Photo: Just installed the #SXSW iPhone app, wh..."


Let's use Tweet 1133. I'll double-check it in the cell below, but from briefly browsing the matching tweets it appears to have both a {link} placeholder and a URL.

In [12]:
df['Tweet'][1133][0:200] #Checking original reference tweet

'Check out the @mention Route {link} ; RSVP here -&gt; https://www.facebook.com/event.php?eid=141164002609303 #sxswi #sxsw'

In the following cell, I will be cleaning up all of the tweets. When that is completed I will check our reference tweet to see how it was changed and if there was anything we missed.

In [13]:
df.Clean = df.Clean.str.lower() #Making everything lowercase

df.Clean = df.Clean.apply(lambda x: re.sub(r'https?:\/\/\S+', '', x)) #Removing URLs with http/s

df.Clean = df.Clean.apply(lambda x: re.sub(r"www\.[a-z]?\.?(com)+|[a-z]+\.(com)", '', x)) #Removing URLs with www

df.Clean = df.Clean.apply(lambda x: re.sub(r'{link}', '', x)) #Removing {link} from tweets

df.Clean = df.Clean.apply(lambda x: re.sub(r"\[video\]", '', x)) #Removing [video] from tweets

df.Clean = df.Clean.apply(lambda x: re.sub(r'&[a-z]+;', '', x)) #Removing HTML reference characters

df.Clean = df.Clean.apply(lambda x: re.sub(r"@[A-Za-z0-9]+", '', x)) #Removing all twitter handles from tweets

df.Clean = df.Clean.apply(lambda x: re.sub(r"[^a-z\s\(\-:\)\\\/\];='#]", '', x)) #Removing other characters

def remove_punctuation(text): #Function to remove punctuation from tweet
    punctuationfree = "".join([i for i in text if i not in string.punctuation]) #Removing punctuation from tweet
    return punctuationfree #Returning punctuation free tweet

df.Clean = df.Clean.apply(lambda x: remove_punctuation(x)) #Applying function to tweets

df.Clean = df.Clean.apply(lambda x: re.sub(r"[ ]{2,}", ' ', x)) #Removing extra spaces

In [14]:
print(df['Tweet'][1133][0:200]) #Checking original reference tweet

print(df['Clean'][1133][0:200]) #Checking cleaned reference tweet

Check out the @mention Route {link} ; RSVP here -&gt; https://www.facebook.com/event.php?eid=141164002609303 #sxswi #sxsw
check out the route rsvp here sxswi sxsw
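The cleaning steps above can also be collected into a single helper, which makes it easy to test them on one string at a time. This is a simplified sketch and not the exact code used in this notebook: `clean_tweet` is a hypothetical name, the separate `www`/`.com` pattern is left out, handles are matched with `\w+` (which also covers underscores), and leading and trailing spaces are stripped. On the reference tweet it should give the same result as the cell above.

```python
import re

def clean_tweet(text):
    """Simplified, single-function version of the cleaning steps."""
    text = text.lower()                             # lowercase everything
    text = re.sub(r"https?://\S+", "", text)        # URLs starting with http/https
    text = re.sub(r"\{link\}|\[video\]", "", text)  # placeholder values
    text = re.sub(r"&[a-z]+;", "", text)            # HTML character references
    text = re.sub(r"@\w+", "", text)                # twitter handles
    text = re.sub(r"[^a-z\s]", "", text)            # anything that is not a letter or whitespace
    text = re.sub(r"\s+", " ", text)                # collapse repeated whitespace
    return text.strip()

reference = ('Check out the @mention Route {link} ; RSVP here -&gt; '
             'https://www.facebook.com/event.php?eid=141164002609303 #sxswi #sxsw')
print(clean_tweet(reference))
```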


Our tweet is completely cleaned and ready for the next step. Real quick, I am going to add a column for the character count of the clean tweet column. Although preprocessing has not yet been completed (i.e. stemming/lemmatization has not been applied and stopwords are still included in these tweets), I think it could be interesting later on to compare the character counts before and after cleaning.

In [15]:
df['Clean Character Count'] = df.apply(lambda row: tweet_character_count(row['Clean']), axis = 1) #Making new column
df.head() #Checking to see if column was created accurately

Unnamed: 0,Tweet,Product,Emotion,Brand,Tweet Character Count,Hashtag,Hashtag Count,Clean,Clean Character Count
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative,Apple,127,"[#RISE_Austin, #SXSW]",2,i have a g iphone after hrs tweeting at risea...,104
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive,Apple,139,[#SXSW],1,know about awesome ipadiphone app that youll ...,112
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive,Apple,79,"[#iPad, #SXSW]",2,can not wait for ipad also they should sale t...,61
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative,Apple,82,[#sxsw],1,i hope this years festival isnt as crashy as ...,71
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive,Google,131,[#SXSW],1,great stuff on fri sxsw marissa mayer google ...,103


Before we begin tokenization (splitting each tweet into a list of individual strings), let's take a look at NLTK's library of stopwords. Stopwords are common words (such as "before" or "herself") that are often removed when modeling text data: they help convey information in everyday communication, but in a model they mostly add bulk without adding signal. We'll look at the library first because we will eventually be modeling this dataset for sentiment analysis, and we want to make sure not to remove any words that might be an indication of sentiment.

In [16]:
nltk_stopwords = stopwords.words('english') #Pulling up NLTK stopwords
nltk_stopwords.sort() #Sorting stopwords alphabetically
print(nltk_stopwords) #Printing list of sorted NLTK stopwords

['a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an', 'and', 'any', 'are', 'aren', "aren't", 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below', 'between', 'both', 'but', 'by', 'can', 'couldn', "couldn't", 'd', 'did', 'didn', "didn't", 'do', 'does', 'doesn', "doesn't", 'doing', 'don', "don't", 'down', 'during', 'each', 'few', 'for', 'from', 'further', 'had', 'hadn', "hadn't", 'has', 'hasn', "hasn't", 'have', 'haven', "haven't", 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in', 'into', 'is', 'isn', "isn't", 'it', "it's", 'its', 'itself', 'just', 'll', 'm', 'ma', 'me', 'mightn', "mightn't", 'more', 'most', 'mustn', "mustn't", 'my', 'myself', 'needn', "needn't", 'no', 'nor', 'not', 'now', 'o', 'of', 'off', 'on', 'once', 'only', 'or', 'other', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 're', 's', 'same', 'shan', "shan't", 'she', "she's", 'should', "should've", 'shouldn', "shouldn't", 'so', 'some',

A lot of these words are good stopwords for this dataset; however, many of them can carry sentiment (are vs. aren't, does vs. doesn't, etc.). I will parse through this list and make a new, shorter list of stopwords. The biggest thing is not removing words that can be used to compare, e.g. "as" can be used in "not as much," so I want to keep that.

In [17]:
new_stopwords = ['a', 'am', 'an', 'and', 'at', 'be', 'for', 'from', 'if', 
                 'in', 'it', "it's", 'its', 'itself', 'my', 'of', 'on', 'or', 'rt', 
                 'that', 'the', 'their', 'theirs', 'these', 'this', 'those', 'to'] #New list
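To illustrate why the shorter list matters for sentiment, the sketch below filters one made-up sentence twice: once with a handful of words that NLTK's English list contains (including "not" and "as"), and once with a few words from the new list. The two small sets and the sentence are stand-ins chosen for illustration, and a plain `split()` is used instead of a tokenizer so the example runs without NLTK data.

```python
# A few words that NLTK's English stopword list contains
default_subset = {"the", "is", "not", "as", "no", "isn't"}

# A few words from the new, shorter list defined above
custom_subset = {"a", "an", "the", "it", "this", "to"}

sentence = "the new app is not as good as the old one"
tokens = sentence.split()

with_default = [word for word in tokens if word not in default_subset]
with_custom = [word for word in tokens if word not in custom_subset]

print(with_default)  # negation and comparison are gone
print(with_custom)   # "not" and "as" survive
```

With the default-style list the sentence reduces to something that reads as positive, while the shorter list keeps the words that flip its meaning.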

Now I will add a column that will have the cleaned tweet tokens with stopwords removed, along with a column that will have a count of those cleaned tokens. I am also going to tokenize the unclean Tweet column using NLTK's Tweet Tokenizer and grab a count of these tokens as well. I'd like to note that it is not necessary to tokenize the uncleaned data; however, for the purposes of my analysis I'd like to tokenize it for use further on in my exploration.

In [18]:
def remove_stopwords(text): #Function to remove stopwords
    return [word for word in word_tokenize(text) if word not in new_stopwords] #Returns tweet without stopwords

df['Clean Tokens'] = df.Clean.apply(lambda x: remove_stopwords(x)) #Applying function to build new column
df['Clean Token Count'] = df['Clean Tokens'].str.len() #Creating column with number of clean tokens

tweet_tokenizer = TweetTokenizer() #Instantiating Tweet Tokenizer
df['Tweet Tokens'] = df['Tweet'].apply(tweet_tokenizer.tokenize) #Create new column with uncleaned tweet tokens
df['Tweet Token Count'] = df['Tweet Tokens'].str.len() #Creating column with number of tweet tokens

df.head() #Reviewing dataframe to see if columns were added

Unnamed: 0,Tweet,Product,Emotion,Brand,Tweet Character Count,Hashtag,Hashtag Count,Clean,Clean Character Count,Clean Tokens,Clean Token Count,Tweet Tokens,Tweet Token Count
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative,Apple,127,"[#RISE_Austin, #SXSW]",2,i have a g iphone after hrs tweeting at risea...,104,"[i, have, g, iphone, after, hrs, tweeting, ris...",16,"[., @wesley83, I, have, a, 3G, iPhone, ., Afte...",29
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive,Apple,139,[#SXSW],1,know about awesome ipadiphone app that youll ...,112,"[know, about, awesome, ipadiphone, app, youll,...",15,"[@jessedee, Know, about, @fludapp, ?, Awesome,...",26
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive,Apple,79,"[#iPad, #SXSW]",2,can not wait for ipad also they should sale t...,61,"[can, not, wait, ipad, also, they, should, sal...",11,"[@swonderlin, Can, not, wait, for, #iPad, 2, a...",17
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative,Apple,82,[#sxsw],1,i hope this years festival isnt as crashy as ...,71,"[i, hope, years, festival, isnt, as, crashy, a...",12,"[@sxsw, I, hope, this, year's, festival, isn't...",16
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive,Google,131,[#SXSW],1,great stuff on fri sxsw marissa mayer google ...,103,"[great, stuff, fri, sxsw, marissa, mayer, goog...",14,"[@sxtxstate, great, stuff, on, Fri, #SXSW, :, ...",27


At this point we have completed cleaning and preprocessing the tweet data. Next I will save the updated dataframe as a new csv to use in the next notebook, where I will move on to exploring.

In [19]:
df.to_csv('CleanedDF.csv', index = False) #Saving updated dataframe for next notebook

CDF = pd.read_csv('CleanedDF.csv') #Reading updated dataframe
CDF.head() #Checking updated dataframe

Unnamed: 0,Tweet,Product,Emotion,Brand,Tweet Character Count,Hashtag,Hashtag Count,Clean,Clean Character Count,Clean Tokens,Clean Token Count,Tweet Tokens,Tweet Token Count
0,.@wesley83 I have a 3G iPhone. After 3 hrs twe...,iPhone,Negative,Apple,127,"['#RISE_Austin', '#SXSW']",2,i have a g iphone after hrs tweeting at risea...,104,"['i', 'have', 'g', 'iphone', 'after', 'hrs', '...",16,"['.', '@wesley83', 'I', 'have', 'a', '3G', 'iP...",29
1,@jessedee Know about @fludapp ? Awesome iPad/i...,iPad or iPhone App,Positive,Apple,139,['#SXSW'],1,know about awesome ipadiphone app that youll ...,112,"['know', 'about', 'awesome', 'ipadiphone', 'ap...",15,"['@jessedee', 'Know', 'about', '@fludapp', '?'...",26
2,@swonderlin Can not wait for #iPad 2 also. The...,iPad,Positive,Apple,79,"['#iPad', '#SXSW']",2,can not wait for ipad also they should sale t...,61,"['can', 'not', 'wait', 'ipad', 'also', 'they',...",11,"['@swonderlin', 'Can', 'not', 'wait', 'for', '...",17
3,@sxsw I hope this year's festival isn't as cra...,iPad or iPhone App,Negative,Apple,82,['#sxsw'],1,i hope this years festival isnt as crashy as ...,71,"['i', 'hope', 'years', 'festival', 'isnt', 'as...",12,"['@sxsw', 'I', 'hope', 'this', ""year's"", 'fest...",16
4,@sxtxstate great stuff on Fri #SXSW: Marissa M...,Google,Positive,Google,131,['#SXSW'],1,great stuff on fri sxsw marissa mayer google ...,103,"['great', 'stuff', 'fri', 'sxsw', 'marissa', '...",14,"['@sxtxstate', 'great', 'stuff', 'on', 'Fri', ...",27
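Note the quotation marks in the list columns above: after a round trip through CSV, columns such as Hashtag and Clean Tokens come back as strings that look like lists, not as actual Python lists. A minimal sketch of one way to convert them back is shown below, using `ast.literal_eval` on a tiny stand-in frame (`demo`, `buffer`, and `reloaded` are hypothetical names, and an in-memory buffer stands in for the CSV file).

```python
import ast
import io

import pandas as pd

# Tiny stand-in for the saved dataframe, with one list-valued column
demo = pd.DataFrame({"Tweet": ["Great #sxsw ipad app"],
                     "Hashtag": [["#sxsw"]]})

# Write to an in-memory CSV and read it back, mimicking to_csv / read_csv
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(type(reloaded["Hashtag"][0]))  # str: the list was stored as text

# Safely parse the text back into real Python lists
reloaded["Hashtag"] = reloaded["Hashtag"].apply(ast.literal_eval)
print(type(reloaded["Hashtag"][0]))  # list
```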
