# Challenge 2: Sentiment Analysis

In this challenge we will learn about sentiment analysis and practice it on real tweets.

## Introduction

Sentiment analysis aims to *systematically identify, extract, quantify, and study affective states and subjective information* in text ([reference](https://en.wikipedia.org/wiki/Sentiment_analysis)). In simple terms, it tries to determine whether the author of a piece of text was happy or unhappy when writing it. Why do we (or rather, companies) care about sentiment in text? Because understanding it tells us whether our customers are happy or unhappy with our products and services. If they are unhappy, the next step is to figure out what caused the unhappiness and make improvements.

Basic sentiment analysis only determines the polarity of a text: *positive* or *negative* (and sometimes *neutral*). More advanced sentiment analysis also considers dimensions such as agreement, subjectivity, confidence, irony, and so on. In this challenge we will conduct basic positive vs. negative sentiment analysis on real tweets.

NLTK comes with a [sentiment analysis package](https://www.nltk.org/api/nltk.sentiment.html). This package is a convenient starting point because it needs only the text itself to make predictions, with no labeled training data. For example:

```python
>>> from nltk.sentiment.vader import SentimentIntensityAnalyzer
>>> txt = "Ironhack is a Global Tech School ranked num 2 worldwide.   Our mission is to help people transform their careers and join a thriving community of tech professionals that love what they do."
>>> analyzer = SentimentIntensityAnalyzer()
>>> analyzer.polarity_scores(txt)
{'neg': 0.0, 'neu': 0.741, 'pos': 0.259, 'compound': 0.8442}
```

In this challenge, however, you will not use NLTK's sentiment analysis package, because the Machine Learning training of the past two weeks should let you make more accurate predictions. The [tweets data](https://www.kaggle.com/kazanova/sentiment140) we will be using are already labeled with positive/negative sentiment. You will use the Naïve Bayes classifier you learned in the lesson to predict the sentiment of tweets from those labels.

## Conducting Sentiment Analysis

### Loading and Exploring Data

The dataset we'll be using is in the lab directory, in a file named `Sentiment140.csv.zip`. Unzip it to get the `.csv` file, then load and explore the data in the cell below.

*Notes:* 

* The dataset was downloaded from [Kaggle](https://www.kaggle.com/kazanova/sentiment140). We made a slight change to the original data so that each column has a label.

* The dataset is large (1.6 million tweets). While you develop your analysis code, you can work on a sample of the data (e.g. 20k records) to save a lot of time when testing.
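As a rough sketch of the sampling idea in the note above: when a file is sorted by label, its first rows can all belong to one class, so a random draw per class is a safer way to build a small development set. The toy DataFrame below is a stand-in for the real data (an assumption; only the `target` and `text` column names come from the dataset), and the `0`/`4` label coding is assumed from the original Sentiment140 release, so the lab's modified file may differ.

```python
import pandas as pd

# Toy stand-in for the real file: sorted by label, so the first rows are
# all one class (0) and the last rows all the other class (4).
toy = pd.DataFrame({
    "target": [0] * 50 + [4] * 50,
    "text": [f"tweet {i}" for i in range(100)],
})

# head() only sees the top of the file, so it can miss a whole class.
head_sample = toy.head(20)

# Draw the same number of rows from each class; random_state makes the
# draw reproducible between runs.
balanced_sample = (
    toy.groupby("target")
       .sample(n=10, random_state=42)
       .reset_index(drop=True)
)

print(head_sample["target"].value_counts().to_dict())
print(balanced_sample["target"].value_counts().to_dict())
```

On the real data, the same pattern would use a larger `n` per class (for example 10,000 each to reach 20k records).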

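```python
# your code here
import pandas as pd
sent = pd.read_csv('../data/Sentiment140.csv')
```

```python
# head() keeps only the first rows (all target 0 in the output below);
# .copy() avoids pandas' SettingWithCopyWarning when adding columns later.
sent_sample = sent.head(20000).copy()
```

```python
sent_sample
```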

| Unnamed: 0 | target | ids | date | flag | user | text |
|---|---|---|---|---|---|---|
| 0 | 0 | 1467810369 | Mon Apr 06 22:19:45 PDT 2009 | NO_QUERY | \_TheSpecialOne\_ | @switchfoot http://twitpic.com/2y1zl - Awww, t... |
| 1 | 0 | 1467810672 | Mon Apr 06 22:19:49 PDT 2009 | NO_QUERY | scotthamilton | is upset that he can't update his Facebook by ... |
| 2 | 0 | 1467810917 | Mon Apr 06 22:19:53 PDT 2009 | NO_QUERY | mattycus | @Kenichan I dived many times for the ball. Man... |
| 3 | 0 | 1467811184 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | ElleCTF | my whole body feels itchy and like its on fire |
| 4 | 0 | 1467811193 | Mon Apr 06 22:19:57 PDT 2009 | NO_QUERY | Karoli | @nationwideclass no, it's not behaving at all.... |
| ... | ... | ... | ... | ... | ... | ... |
| 19995 | 0 | 1556975331 | Sun Apr 19 01:19:14 PDT 2009 | NO_QUERY | TOMurdockPapers | Not much time off this weekend, work trip to M... |
| 19996 | 0 | 1556976068 | Sun Apr 19 01:19:30 PDT 2009 | NO_QUERY | nikibennn | One more day of holidays |
| 19997 | 0 | 1556976167 | Sun Apr 19 01:19:32 PDT 2009 | NO_QUERY | eifflesummer | feeling so down right now .. i hate you DAMN H... |
| 19998 | 0 | 1556976222 | Sun Apr 19 01:19:34 PDT 2009 | NO_QUERY | lomobabes | geez,i hv to READ the whole book of personalit... |


### Prepare Textual Data for Sentiment Analysis

Now apply the functions you wrote in Challenge 1 to your whole dataset. These functions include:

* `clean_up()`

* `tokenize()`

* `stem_and_lemmatize()`

* `remove_stopwords()`

Create a new column called `text_processed` in the dataframe to hold the processed data. At the end, your `text_processed` column should contain lists of cleaned-up word tokens. Your data should look like the example below:

![Processed Data](data-cleaning-results.png)
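One way to picture this step: each Challenge 1 function takes the previous function's output, so the four can be chained in a fixed order. The sketch below uses simplified stand-ins written only for illustration. They are not the Challenge 1 implementations: `stem_and_lemmatize` is an identity placeholder, and `STOPWORDS` is a hypothetical three-word list. The aim is only to show the shape of the data at each stage.

```python
import re
from functools import reduce

# Simplified stand-ins for the Challenge 1 functions (assumptions for
# illustration only; the real versions rely on NLTK and do more work).
def clean_up(text):
    """Drop URLs, then keep only letters, lowercased and space-separated."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"[^A-Za-z]+", " ", text).lower().strip()

def tokenize(text):
    """Split a cleaned string into word tokens."""
    return text.split()

def stem_and_lemmatize(tokens):
    """Placeholder: returns the tokens unchanged."""
    return list(tokens)

STOPWORDS = {"a", "is", "the"}  # hypothetical, far shorter than a real list

def remove_stopwords(tokens):
    """Drop tokens that appear in STOPWORDS."""
    return [t for t in tokens if t not in STOPWORDS]

# The order matters: string -> string -> list -> list -> list.
PIPELINE = (clean_up, tokenize, stem_and_lemmatize, remove_stopwords)

def preprocess(text):
    """Feed the text through every step in order."""
    return reduce(lambda value, step: step(value), PIPELINE, text)

tokens = preprocess("The weather is GREAT today!! http://example.com")
print(tokens)
```

With pandas, a function like `preprocess` can be applied to a whole column with `Series.apply`, which yields one token list per row.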

```python
# your code here
# Uncomment to install the ipynb package in the current Jupyter kernel
#import sys
#!{sys.executable} -m pip install ipynb
```

```python
# Reuse the preprocessing functions defined in the notebook challenge.ipynb
from ipynb.fs.full.challenge import clean_up, tokenize, stem_and_lemmatize, remove_stopwords
```

### Creating Bag of Words

The purpose of this step is to create a [bag of words](https://en.wikipedia.org/wiki/Bag-of-words_model) from the processed data. The bag of words contains all the unique words in your whole text body (a.k.a. *corpus*) along with the number of occurrences of each word. It lets you see which words are the most frequent, and therefore the most useful as features, across the whole corpus.

As you can imagine, this gives you a massive set of words. The less important words (i.e. those that occur very rarely) do not contribute much to the sentiment. Therefore, you only need the most important words to build your feature set in the next step. In our case, we will use the 5,000 most frequent words to build the features.

In the cell below, combine all the words in `text_processed` and calculate the frequency distribution of all words. A convenient tool for this is NLTK's `FreqDist` class ([documentation](https://www.nltk.org/api/nltk.html#module-nltk.probability)). Then select the top 5,000 words from the frequency distribution.
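A minimal sketch of the counting step, using the standard library's `collections.Counter` on toy token lists. NLTK's `FreqDist` is a subclass of `Counter`, so the same `most_common()` call should apply to it. The toy data and the `top_n = 3` cutoff (in place of 5,000) are assumptions for illustration.

```python
from collections import Counter
from itertools import chain

# Toy stand-in for the `text_processed` column: one token list per tweet.
text_processed = [
    ["love", "sunny", "day"],
    ["hate", "rain", "day"],
    ["love", "love", "coffee"],
]

# Flatten all token lists into one stream and count each word.
word_counts = Counter(chain.from_iterable(text_processed))

# Keep only the most frequent words; the challenge uses 5,000 here.
top_n = 3
top_words = [word for word, _ in word_counts.most_common(top_n)]

print(word_counts.most_common(top_n))
print(top_words)
```

On the real dataframe, the iterable passed to `chain.from_iterable` would be the `text_processed` column itself.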



```python
# Run the four Challenge 1 steps in order. Each step consumes the previous
# step's output, so Series.apply is enough (no row-wise lambda needed).
sent_sample['text_processed'] = sent_sample['text'].apply(clean_up)
sent_sample['text_processed'] = sent_sample['text_processed'].apply(tokenize)
sent_sample['text_processed'] = sent_sample['text_processed'].apply(stem_and_lemmatize)
sent_sample['text_processed'] = sent_sample['text_processed'].apply(remove_stopwords)
```





accidentaly overslept   had to drive this morningstare at the sun    page paper to start and finish in an hour oh joy
why i cant find dear friend  i need her to tweet with me  help me people 
how is it pm already not enough done 
exams suck 
javastix as of now it still aint up yet 
is halfway through quota thousand splendid sunsquot and already thinks its brilliant and so very sad 
its snowingno more needs to be said 
urgh  look outside  its winter wonderland all over again 
trip to castle turned into a  mile round trip with the kids being sick in the back of the car 
is wishing hed brought some of the chocolaty treats to work 
snow really makes me sad 
i spent all of last night waking up every  seconds because i was nervous about registration 
damn i just broke that wooden writing pad which i usually uses when sketching and writing while on bed or relaxing on chair 
drink exploded in my bag 

 pm still at musica sigh 
still cant find any of my friends 
ϛ k  pouty face shitty day out in boston again ugh no wonder im sick
in math klass bored as hell  urghhhhh i hate math
uughhhh back at work help please 
hello everyone im sorry about yesterday all my updates kept disappearing 
my hand hurts from playing so much last night  
oh scratch that i just lost two followers boo 
working from homesons allergies are killing him   drop by for a visit wwwideastormcom
i can already teel that its going to b a rough day and its not even  yet 
need to get ready for work 
mcm why no hi to me 
so having to face the outcome of my last tweetlate nightsearly monrnings being a bad stuwart of my college opportunity 
is really ill 
i hate snowwhy did it have to come back   seeing monsters vs aliens tonight with ashleebaby
so we got the images back for the magazine spread woop woop would love to share details but we cant  until the editorial comes out
the_rooster im with you terrible injuries and deva

figuring out whats up for today whys it cold 
mam i need me some sleep stuck in class and wish i was in bed 
i hate picking out desks   i should just build themit would be  times easier
destragarcia come back to the states for orlando carnival since i cant make it to miami 
ninfreak i called a few times yesterday   now i cant find my cell it might be out in the car
ugh i am sick for the second time in a week this one is worse  
i still dont know who jadakiss is 
he called me a shorty 
waiting for my wifey to get out of work im home alone 
apriloj oh no is there any way of getting them after they sell out were only gonna be able yo afford regular tickets 
dissertation and it is hatefull 
well i did take a personal day today hurts really bad when i cough and i cough a lot less when i am not moving 
im so torn between the bold my  amp the  
alisonl theres really a group like that they should blend the two words together to form  knitter oh wait i guess thats normal 
i love the sun i hate 

getting ready  work 
still wondering what happened to gerard butlers page  
the reason i left you tonight was that i thought you left me first all alone even carlos took off 
erardo yeah  more hours maybe sooner please pray  of my health is all i need to make money but ban im developing a minor cough 
only two days of vacation left 
at the wally till am please come rescue me 
ywatching a ovie and horny 
watching tron and then trying to get some sleep before this horrible weekend begins 
parakeetluvr im sorry hun  i wish i could help
sammarinucci i saw the trailer for my sisters keeper at  again it looks really good but im not really a fan of the cast 
well time to pack up now 
you didnt come outside to do meet and greet tonight 
kaykay i guess everyone went out nite 
emilyk the sec on the left opposite side of jonathanrknight  i was totally bummed but i talked to him at the mampg and he looked for me
destinyjoyful deessss i texted you and now i am about to go to bed  check your texts i

well this is bullshit and i dont know how to fix it  sorry you guys virusy 
aiannucci i was going to go and see your film but then michael portillo said they he didnt like it so now im not sorry armando 
katie we are so srry forgive us 
okay  its ridiculous to print all of my photos kodak were charging me  wtf i only ended up printing a few 
seems tweetdeck wont show tweets in chinese 
mangerial accounting  
elliotminor  no tour to little barrow this time the canteen will miss ya  good luck guys xx
i hate ppl i hate ppl who tear down trees i hate ppl who tear down trees for unnecessary construction decades of growth gone ughh 
dopemaneazyecpt  what type of work u do darliiiing  hangover  
no gigs tonight  but sleep seems like a good idea
nobody tells the difference with my hair 
pervetastic aw  he just won an aerial o
njoku sounds good im still working thru easter chocs 
looks like i cant find anyone to host tonights broadcast therefore its cancelled sorry guys 
its a new day and i cou

sangitashres me too   but ive accepted that my breathing and stomach will never be 
ballenegger ok ill stop retweeting so much forgive me 
 uh oh i bought the wrong deo waaaah i dont want my kilikili to smell like luya 
tiffheartzero yeah apparently so i noticed a ring on his finger at the show 
downloading latest yahoo messenger for mac now couldnt get tweetdeck to install 
i think the east coast is stealin all the good weather 
yum yum eating chocolate visited ol dschool was very sad 
theellenshow im so excited wish i had a pet 
panique aww  well im new anyway so im not much fun to hang out with on there since my level is so low lol
heartbreakable boo boo bee boo 
taking a shower and getting ready for the orange and blue game my last one in college 
wants to go downtown and take photosbut has no one to hang with 
thehza i had a classic tr convertible roadster totaled one night by a drunk woman in a huge buick parked on the street at gfs sob 
amazingphoebe i cant see you tomorrow morn

gbfiremelon ah well neverending story is a worthy winner was looking forward to reading your analysis of the arkwright books though 
tinkugallery  call for help if you need it tweet when you get home
has nothing to do todaay  uggh
in two days the fun shall cease 
good morning well after noon i feel all stuffy and congested  seeing my uncle who i havent seen in about  years today
im hungry and lonely 
gonna be up till  hubby gotta wake up  go  bombay amp i dont want him  miss it but mom dun get it im nt awake coz i like it ma 
sebby_peek okay goodnight i love you so much xxxx im sorry too 
diegosecond  
tryna dnwload the beatksking app for my g but there is no wifi right now sonofva amp i have a sick beat in my head no fair 
stuck at work  perfect weather outside
just got my pupils dilated  wtf  my eyes are quottwippinquot 
bad time management 
going to kyles ceremony for his deployment to iraq sad day 
nprscottsimon really wish i could download the radio bits and not have to stream int

had a great time with sarah going to moes in ormond beach  i wish i could have spent more time with her but finals are almost here 
im lonely  keep me company  female cincinnati  ohio
on my way home from terra haute p im so sore 
still a headache at  
jamesp lol thats nastyyy nothing wrong with being a fan of someone 
saragarth you laughed on that poor trees facebranch omg how rude of you hahahaha poor tree must be a boring stationary life 
i get to wear a skirt again tonight yay my wonderful pants are dirty 
myapplestuff bad head in the morning 
nikkaypandarr hey youre not the only one ive been sulking because my websites are down 
asot anyone got videos i aint got 
monkeyclap lol i do but its like even when i sleep loads i stil get them its weird and annoyin 
glad we fit the game in but fighting a headache now 
im home 
hope his show doesnt move to fridays tho whats gonna be on telly on saturdays then 
im getting this unsettling feeling im the antidote to tom_miller 
again back to ub

im lonely  keep me company  female jacksonville  florida
thamanagement oh no whats wrong pumpkin 
cant get this stupid table to stand up right it keeps falling over 
spencerpratt i show you so much love homie but you never respond 
guitar i know 
pepperfire trying to send dm but wont work 
sad that im missing nacho fest right meowwwzzz 
just watched the notebook its so sad 
the_real_shaq im very upset coz my brother just got kicked out of work n i really dont know what to say to him 
loves spending saturday nights writing an english paper 
so ill just be waiting here for  hours for my music to copy on to my other computer 
waking up i am so tired 
melodyanderson melody  i want to help u
been told i look like phil collins not for the first time 
dozens ideas for bfiction average amount of writing skills lack of free time 
whit a lot of negative energy in my mind fear for the worst 
still at  disappointed with how badly kbps shoutcast works on my phone over g 
in bed listening to ipod no

barrybarrybarry going to vegas for the st time and learned barry is not playing june awe that is so disappointing  awe
doing a little cleaning before bed my head hurts 
 minute chilisehh pretty pathetic 
deasaurr iya dong xd haha  omfg aku nangis baca itu  sad and absolutely beautiful lt
mikaylamiles where do you find your motivation  i used to run everyday  now i have to practically force myself to exercise 
chanelwestcoast hay im cc also aka walsh talk to me during work it must be a very very very very hard job 
kursed how mean 
nick_carter nick why dont say hi to me 
 more day of spring break  wish it could have lasted evr 
phantomv too far away  nearest is in dubuque 
ohhhhi spent too much 
therealtiffany  im sorry so how are you
a walk to remember 
to ravengeordie amp whizaway only made it  miles  it was way too hot and i drank too much water at mile  and had to battle cramps 
so not happy with direct messaging and tweetdesk  they just wont work for me maybe im just twitter stupid

bored outta my brains 
at lurelost my chanel ring 
so the gig on the st got cancelled apparently someone doesnt want be in the same line up as us 
my nvidia card was giving me fits so i pulled it now my pevm is having a hard time believing it can do x 
had a blast in austin with katie six and ashley back in dallas for one more night of texas tomorrow its back to pa 
im cold   its  degrees outside and the dang air conditioner is onwtf
sunburntdoe most likely on tuesday or wednesday  i have a check up on tuesday at  amp i dont know how long thats going to take
chelsea_playboy i dont know  but its fixed now 
joe and jon did the butt dance together n i thought i was recording it but i didnt push the button all the way so i didnt get it 
is unnecessarily intoxicated and sruff 
thoughts on the new noisettes im missing the crashing drums and screeching vocals 
had a great day overall missing my baby to cuddle with until morning 
uhmmm no video   thanks anyway  asot
sarahdownsouth well if you 

moondio your hip  por que
studyin for my intro to aampp test   so glad semester is almost over
twistedrufus   im sorry  yea im totally free whenever you wanna talk  just please no shooting  id be sad
globalisgroup i didnt 
i wanna go home 
hitting the weights hard now benching  grams 
zackalltimelow im sorry for whatever people did to u 
 i know its bad and will be painful but i really want that jaw surgery sucks that braces cant even fix my underbite 
ehh im totally frustrated with life at the moment 
have no idea what to dotired n lazy 
i wish i had verizoni want that new blackberry click phone 
mewilkes okay i just read the yw part i respectfully withdraw 
roobies holy shit i thought that was the  but the entries looked so irrelevant i thought it was the wrong tag  what have they done
nicolerichie yeah i cried though 
psam  weve got too many resources these days but we aint doing anything  lets do somthing
found my red pen but not before it got all over my new white tank top  copyed

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sent_sample['text_processed'] = sent_sample.apply(lambda x: tokenize(x['text_processed']),axis=1)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  sent_sample['text_processed'] = sent_sample.apply(lambda x: stem_and_lemmatize(x['text_processed']),axis=1)

Unnamed: 0,target,ids,date,flag,user,text,text_processed
0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, t...","[switchfoot, switchfoot, awww, awww, thats, bu..."
1,0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by ...,"[upset, upset, cant, cant, updat, update, hi, ..."
2,0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Man...,"[kenichan, kenichan, dive, dived, mani, many, ..."
3,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire,"[whole, whole, bodi, body, feel, feel, itchi, ..."
4,0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all....","[nationwideclass, nationwideclass, behav, beha..."
...,...,...,...,...,...,...,...
19995,0,1556975331,Sun Apr 19 01:19:14 PDT 2009,NO_QUERY,TOMurdockPapers,"Not much time off this weekend, work trip to M...","[much, much, time, time, thi, weekend, weekend..."
19996,0,1556976068,Sun Apr 19 01:19:30 PDT 2009,NO_QUERY,nikibennn,One more day of holidays,"[one, one, day, day, holiday, holiday]"
19997,0,1556976167,Sun Apr 19 01:19:32 PDT 2009,NO_QUERY,eifflesummer,feeling so down right now .. i hate you DAMN H...,"[feel, feeling, right, right, hate, hate, damn..."
19998,0,1556976222,Sun Apr 19 01:19:34 PDT 2009,NO_QUERY,lomobabes,"geez,i hv to READ the whole book of personalit...","[geezi, geezi, hv, hv, read, read, whole, whol..."


In [93]:
sent_sample.head()

Unnamed: 0,target,ids,date,flag,user,text,text_processed
0,0,1467810369,Mon Apr 06 22:19:45 PDT 2009,NO_QUERY,_TheSpecialOne_,"@switchfoot http://twitpic.com/2y1zl - Awww, t...","[switchfoot, switchfoot, awww, awww, thats, bu..."
1,0,1467810672,Mon Apr 06 22:19:49 PDT 2009,NO_QUERY,scotthamilton,is upset that he can't update his Facebook by ...,"[upset, upset, cant, cant, updat, update, hi, ..."
2,0,1467810917,Mon Apr 06 22:19:53 PDT 2009,NO_QUERY,mattycus,@Kenichan I dived many times for the ball. Man...,"[kenichan, kenichan, dive, dived, mani, many, ..."
3,0,1467811184,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,ElleCTF,my whole body feels itchy and like its on fire,"[whole, whole, bodi, body, feel, feel, itchi, ..."
4,0,1467811193,Mon Apr 06 22:19:57 PDT 2009,NO_QUERY,Karoli,"@nationwideclass no, it's not behaving at all....","[nationwideclass, nationwideclass, behav, beha..."


In [103]:
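# Count how often each processed token appears across all tweets
from nltk.probability import FreqDist

# Flatten the per-tweet token lists into a single list of words
words = [w for lst in sent_sample['text_processed'] for w in lst]

fdist = FreqDist(words)

# Keep the 5,000 most frequent (word, count) pairs
top5000 = fdist.most_common(5000)

top5000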

[('im', 4912),
 ('go', 3272),
 ('wa', 2973),
 ('work', 2815),
 ('get', 2727),
 ('day', 2552),
 ('dont', 2166),
 ('cant', 2150),
 ('like', 1945),
 ('today', 1886),
 ('want', 1828),
 ('got', 1705),
 ('miss', 1630),
 ('back', 1604),
 ('time', 1574),
 ('still', 1494),
 ('feel', 1483),
 ('good', 1389),
 ('one', 1380),
 ('thi', 1344),
 ('wish', 1331),
 ('need', 1300),
 ('know', 1283),
 ('sad', 1240),
 ('ha', 1240),
 ('think', 1218),
 ('night', 1206),
 ('home', 1172),
 ('last', 1152),
 ('sleep', 1070),
 ('didnt', 1030),
 ('bad', 1029),
 ('much', 1018),
 ('u', 1002),
 ('oh', 990),
 ('well', 980),
 ('hope', 978),
 ('love', 974),
 ('lol', 968),
 ('see', 955),
 ('hate', 927),
 ('make', 913),
 ('going', 886),
 ('sick', 869),
 ('tomorrow', 864),
 ('twitter', 835),
 ('come', 824),
 ('new', 775),
 ('bed', 774),
 ('amp', 774),
 ('though', 764),
 ('realli', 728),
 ('really', 726),
 ('would', 726),
 ('week', 714),
 ('right', 694),
 ('tonight', 694),
 ('look', 684),
 ('could', 684),
 ('hour', 674),
 ('wa

In [105]:
# Keep only the words, dropping the counts
topwords = [word for word, _count in top5000]

topwords[:10]

['im', 'go', 'wa', 'work', 'get', 'day', 'dont', 'cant', 'like', 'today']

### Building Features

Now let's build the features. Using the top 5,000 words, create a 2-dimensional matrix that records whether each of those words appears in each document (tweet). Add an output column that indicates whether the sentiment of each tweet is positive. For example, if your bag of words has 5 items (`['one', 'two', 'three', 'four', 'five']`) and you have 4 documents (`['A', 'B', 'C', 'D']`), your feature set is essentially:

| Doc | one | two | three | four | five | is_positive |
|---|---|---|---|---|---|---|
| A | True | False | False | True | False | True |
| B | False | False | False | True | True | False |
| C | False | True | False | False | False | True |
| D | True | False | False | False | True | False|

However, the `nltk.NaiveBayesClassifier.train` method we will use in the next step does not accept a pandas DataFrame. Convert your feature set to a Python list of `(features, label)` tuples, where `features` is a dictionary mapping each word to `True` or `False`, like this:

```python
[
	({
		'one': True,
		'two': False,
		'three': False,
		'four': True,
		'five': False
	}, True),
	({
		'one': False,
		'two': False,
		'three': False,
		'four': True,
		'five': True
	}, False),
	({
		'one': False,
		'two': True,
		'three': False,
		'four': False,
		'five': False
	}, True),
	({
		'one': True,
		'two': False,
		'three': False,
		'four': False,
		'five': True
	}, False)
]
```

To help you in this step, watch the [following video](https://www.youtube.com/watch?v=-vVskDsHcVc) to learn how to build the feature set with Python and NLTK. The source code in this video can be found [here](https://pythonprogramming.net/words-as-features-nltk-tutorial/).

[![Building Features](building-features.jpg)](https://www.youtube.com/watch?v=-vVskDsHcVc)
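
The conversion described above can be sketched as follows. This is a minimal illustration on the five-word toy example, not the code from the video: `find_features`, `vocabulary`, and `documents` are hypothetical names, and the toy token lists stand in for your `text_processed` column and `topwords` list.

```python
def find_features(tokens, vocabulary):
    """Map one tokenized document to a {word: bool} dictionary."""
    present = set(tokens)  # set lookup keeps each membership check cheap
    return {word: (word in present) for word in vocabulary}


# Toy stand-ins (assumptions): the bag of words and four labelled documents.
vocabulary = ['one', 'two', 'three', 'four', 'five']
documents = [
    (['one', 'four'], True),    # document A, positive
    (['four', 'five'], False),  # document B, negative
    (['two'], True),            # document C, positive
    (['one', 'five'], False),   # document D, negative
]

# List of (feature dict, label) tuples, the format NLTK expects.
feature_set = [(find_features(tokens, vocabulary), label)
               for tokens, label in documents]

print(feature_set[0])
```

For the real data you would pass `topwords` as the vocabulary and derive the label from the `target` column (assuming the Sentiment140 convention of 0 for negative and 4 for positive). It is also worth checking that your sample contains both classes before training.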

In [None]:
# your code here

### Building and Training Naive Bayes Model

In this step you will split your feature set into a training set and a test set. Then you will train a Naive Bayes classifier on the training set by calling `nltk.NaiveBayesClassifier.train` ([example](https://www.nltk.org/book/ch06.html)).

After training the model, call `classifier.show_most_informative_features()` to inspect the most important features. The output will look like:

```
Most Informative Features
	    snow = True            False : True   =     34.3 : 1.0
	  easter = True            False : True   =     26.2 : 1.0
	 headach = True            False : True   =     20.9 : 1.0
	    argh = True            False : True   =     17.6 : 1.0
	unfortun = True            False : True   =     16.9 : 1.0
	    jona = True             True : False  =     16.2 : 1.0
	     ach = True            False : True   =     14.9 : 1.0
	     sad = True            False : True   =     13.0 : 1.0
	  parent = True            False : True   =     12.9 : 1.0
	  spring = True            False : True   =     12.7 : 1.0
```

The [following video](https://www.youtube.com/watch?v=rISOsUaTrO4) will help you complete this step. The source code in this video can be found [here](https://pythonprogramming.net/naive-bayes-classifier-nltk-tutorial/).

[![Building and Training NB](nb-model-building.jpg)](https://www.youtube.com/watch?v=rISOsUaTrO4)
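
As a rough sketch of this step, the snippet below shuffles a feature set, holds out 20% for testing, and trains the classifier. The two-word `feature_set`, the 80/20 ratio, and the seed are assumptions chosen for illustration; substitute the feature set you built in the previous step.

```python
import random

import nltk

# Hypothetical toy feature set in the (feature dict, label) format.
feature_set = (
    [({'sad': True, 'love': False}, False)] * 10
    + [({'sad': False, 'love': True}, True)] * 10
)

# Shuffle so both classes appear in the training and the test portion.
random.seed(42)
random.shuffle(feature_set)

# Hold out the last 20% of the shuffled examples for testing.
split = int(len(feature_set) * 0.8)
train_set, test_set = feature_set[:split], feature_set[split:]

classifier = nltk.NaiveBayesClassifier.train(train_set)
classifier.show_most_informative_features(5)
```

Shuffling before splitting matters when the rows are ordered by label, because a plain slice could otherwise leave one class out of the training set entirely.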

In [1]:
# your code here

### Testing Naive Bayes Model

Now we'll test our classifier with the test dataset. This is done by calling `nltk.classify.accuracy(classifier, test)`.

As mentioned in one of the tutorial videos, a Naive Bayes model is considered OK if your accuracy score is over 0.6. If your accuracy score is over 0.7, you've done a great job!
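
To make the accuracy call concrete, here is a small self-contained sketch. The one-feature `train_set` and `test_set` are hypothetical toy data, and the last test example is labelled against the pattern on purpose so that the score falls below 1.0.

```python
import nltk

# Hypothetical toy data: tweets containing 'sad' are negative (False).
train_set = (
    [({'sad': True}, False)] * 5
    + [({'sad': False}, True)] * 5
)
test_set = [
    ({'sad': True}, False),
    ({'sad': False}, True),
    ({'sad': True}, False),
    ({'sad': True}, True),  # deliberately contradicts the training pattern
]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Fraction of test examples whose predicted label matches the true label.
accuracy = nltk.classify.accuracy(classifier, test_set)
print(f"Accuracy: {accuracy:.2f}")
```

The score is only meaningful on examples the classifier did not see during training, which is why the test set is kept separate.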

In [None]:
# your code here

## Bonus Question 1: Improve Model Performance

If you still have energy and want to dig deeper, try to improve your classifier's performance. There are many aspects you can explore, for example:

* Improve stemming and lemmatization. Inspect your bag of words and the most informative features. Are there any words you should remove from the analysis? You can append these words to the stop words list.

* Remember that we used only the top 5,000 features to build the model? Try different numbers of top features. The goal is to use as few features as you can without compromising model performance. The fewer features you include, the faster your model trains, which in turn lets you use a larger sample to improve your accuracy score.
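
One way to explore the second idea is to retrain with several vocabulary sizes and compare the scores. The sketch below is an assumption-laden illustration: `accuracy_for_top_n` is a hypothetical helper, the toy documents are invented, and `collections.Counter` is used here in place of `FreqDist` for counting.

```python
from collections import Counter

import nltk


def accuracy_for_top_n(train_docs, test_docs, n):
    """Train on the n most frequent training words and return test accuracy."""
    counts = Counter(word for tokens, _ in train_docs for word in tokens)
    vocabulary = [word for word, _ in counts.most_common(n)]

    def featurize(docs):
        rows = []
        for tokens, label in docs:
            present = set(tokens)
            rows.append(({word: word in present for word in vocabulary}, label))
        return rows

    classifier = nltk.NaiveBayesClassifier.train(featurize(train_docs))
    return nltk.classify.accuracy(classifier, featurize(test_docs))


# Hypothetical toy data: (tokens, is_positive) pairs.
train_docs = [(['im', 'sad'], False)] * 4 + [(['im', 'love'], True)] * 4
test_docs = [(['im', 'sad'], False), (['im', 'love'], True)]

# Compare a vocabulary of 1 word against a vocabulary of 3 words.
results = {n: accuracy_for_top_n(train_docs, test_docs, n) for n in (1, 3)}
for n, score in results.items():
    print(f"top {n} words -> accuracy {score:.2f}")
```

In the toy data the single most frequent word appears in every tweet and carries no signal, so the smallest vocabulary performs at chance. On real data you would sweep values such as 500, 1,000, 2,500, and 5,000 and keep the smallest size that does not reduce accuracy.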

In [None]:
# your code here

## Bonus Question 2: Machine Learning Pipeline

In a new Jupyter Notebook, combine all your code into a function (or a class). The function should run the complete machine learning pipeline: it receives the dataset location and returns the trained classifier. You can then use it to predict the sentiment of any tweet in real time.
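
A possible shape for such a function is sketched below. Everything here is an assumption made for illustration: the names `build_sentiment_classifier`, `tokenize`, and `featurize` are hypothetical, the regex tokenizer is a simplified stand-in for your cleaning, stemming, and lemmatizing steps, and the CSV is assumed to have a header row with `text` and `target` columns where a target of 4 means positive.

```python
import csv
import os
import random
import re
import tempfile
from collections import Counter

import nltk


def tokenize(text):
    """Simplified stand-in for the cleaning, tokenizing, and stemming steps."""
    return re.findall(r"[a-z]+", text.lower())


def build_sentiment_classifier(csv_path, top_n=5000, test_fraction=0.2, seed=0):
    """Train a Naive Bayes classifier from a labelled CSV file.

    Returns (classifier, featurize, test_accuracy).
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = [(row["text"], row["target"] == "4")
                for row in csv.DictReader(handle)]

    # Shuffle, then hold out a fraction of the rows for testing.
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    # The vocabulary is built from the training rows only.
    counts = Counter(w for text, _ in train_rows for w in tokenize(text))
    vocabulary = [word for word, _ in counts.most_common(top_n)]

    def featurize(text):
        present = set(tokenize(text))
        return {word: word in present for word in vocabulary}

    train_set = [(featurize(text), label) for text, label in train_rows]
    test_set = [(featurize(text), label) for text, label in test_rows]
    classifier = nltk.NaiveBayesClassifier.train(train_set)
    accuracy = nltk.classify.accuracy(classifier, test_set)
    return classifier, featurize, accuracy


# Demo on a tiny, made-up CSV written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "tweets.csv")
    with open(demo_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["target", "text"])
        writer.writerows([["0", "i am so sad today"]] * 10)
        writer.writerows([["4", "i love this day"]] * 10)
    classifier, featurize, accuracy = build_sentiment_classifier(demo_path)

prediction = classifier.classify(featurize("love it"))
print(prediction, accuracy)
```

Returning `featurize` together with the classifier means a new tweet is converted with the same vocabulary that was used for training, which is what real-time prediction requires.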

In [None]:
# your code here

## Bonus Question 3: Apache Spark

If you have completed the Apache Spark advanced topic lab, you can migrate your pipeline from your local machine to a Databricks Notebook. Share your notebook with your instructor and classmates to show off your achievements!

In [None]:
# your code here