<h1>Twitter Text Analysis With Pandas and Matplotlib</h1>
<h2>Perform Word Counts From Tweet Text</h2>
<p>In these lessons, you will learn how to compute word frequency counts from tweet text.</p>
<p>We will work with a simplified version of the dataset, converted from JSON to CSV (comma-separated values) format and opened with pandas read_csv.</p>
<p>The original dataset was created by querying the Twitter Search API for the hashtag 'nerd'. Tweets were collected every 15 minutes and saved to a file. After two weeks, the files were processed to remove duplicate tweets and combined into a single file. Duplicate tweets are an artifact of requesting the maximum number of tweets for each 15-minute epoch. Twitter limits the Search API to 100 tweets per 15-minute epoch; its documentation states 150, but we have observed 100.</p>
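<p>The deduplication step described above is not shown in this notebook. A minimal sketch of how it might look with pandas follows, assuming each collection file has a tweet_id column; the two small frames and their contents are hypothetical stand-ins for the real 15-minute files.</p>

```python
import pandas as pd

# Hypothetical stand-ins for two 15-minute collection files; the column
# names mirror the tweet_id/text columns used later in this notebook.
batch_1 = pd.DataFrame({"tweet_id": [101, 102, 103],
                        "text": ["first", "second", "third"]})
batch_2 = pd.DataFrame({"tweet_id": [103, 104],
                        "text": ["third", "fourth"]})

# Stack the batches, then keep only the first occurrence of each tweet_id.
combined = pd.concat([batch_1, batch_2], ignore_index=True)
deduped = combined.drop_duplicates(subset="tweet_id").reset_index(drop=True)

print(len(combined), "rows before,", len(deduped), "rows after")
```

<p>Keying on tweet_id rather than on the whole row means a tweet that was fetched twice is dropped even if other fields, such as follower counts, changed between requests.</p>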


In [1]:
import csv
import re
import datetime
import os
import sys
import math
import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from pprint import pprint

In [10]:
dftweet = pd.read_csv('csv/nerd_main.csv')
dftweet.tweet_created_at = pd.to_datetime(dftweet.tweet_created_at)
dftweet.head()

Unnamed: 0,tweet_id,tweet_created_at,language,user_screen_name,user_created_at,user_id,followers_count,friends_count,time_zone,utc_offset,retweeted_status,retweet_id,retweet_user_screen_name,retweet_user_id
0,1018178822087192581,2018-07-14 17:02:02,es,Huntersephiroth,Mon Jan 02 22:18:36 +0000 2012,453415076,48,229,,,1,1.0181734475817204e+18,LuisGyG,824881.0
1,1018179236220297216,2018-07-14 17:03:41,en,WikakomSteam,Thu May 05 17:26:18 +0000 2016,728274635712770049,5589,3475,,,0,,,
2,1018179317392650241,2018-07-14 17:04:00,en,Book4Creative,Sat Jul 15 18:30:51 +0000 2017,886291986596192258,24,29,,,1,9.873995085092824e+17,Book4Creative,8.862919865961921e+17
3,1018179688093478912,2018-07-14 17:05:29,en,Voldrega,Tue May 24 02:43:55 +0000 2011,304189053,314,405,,,0,,,
4,1018180125697830912,2018-07-14 17:07:13,en,Voldrega,Tue May 24 02:43:55 +0000 2011,304189053,314,405,,,0,,,


In [2]:
dftext = pd.read_csv('csv/nerd_text.csv', encoding="ISO-8859-1")
dftext.tail()

Unnamed: 0,tweet_id,text
15364,1023257523082604544,Yo siendo gd nivel 1000 y no pienso parar jamá...
15365,1023257692087832576,What it looks like when you splash hot oil int...
15366,1023258269656199173,RT @DonnabellaMorte: Did anyone else have to d...
15367,1023258771953516544,RT @nerd100nerd: Sexta feira sua linda #Friday...
15368,1023259017278377984,LIVE We are live right now with more @BeatSabe...
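<p>The encoding argument decides how each byte in the file becomes a character. With ISO-8859-1, bytes in the range 0x80 to 0x9F decode to invisible control characters, which pandas displays as escapes such as \x95 or \x85. If the file was actually written as Windows-1252 (an assumption, not something this notebook confirms), the same bytes are a bullet and an ellipsis. The sketch below compares the two decodings on a hypothetical byte string.</p>

```python
# Two bytes that are printable punctuation in Windows-1252 but
# invisible C1 control characters in ISO-8859-1.
raw = b"Les cuento. \x95 #geek\x85"

as_latin1 = raw.decode("ISO-8859-1")
as_cp1252 = raw.decode("cp1252")

# repr() makes the control characters visible as escapes.
print(repr(as_latin1))
print(repr(as_cp1252))
```

<p>If the second form looks right for your data, passing encoding="cp1252" to read_csv may be worth trying.</p>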


In [3]:
dftext['text'][0]

'RT @LuisGyG: Hoy toca switch de teléfono y me estaré mudando al @motorola_mx #motoz3play Les cuento. \x95 \x95 \x95 \x95 \x95 #electronics #glasses #geek\x85'
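<p>Text like this is the input for a word frequency count. One possible approach, sketched here with a hypothetical two-tweet sample in place of dftext['text'], is to lowercase each tweet, split it into tokens with a regular expression, and tally the tokens with collections.Counter. The tokenize helper and its pattern are illustrative choices, not part of the original notebook.</p>

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample standing in for the dftext['text'] column.
sample = pd.Series([
    "RT @someone: I love my new #geek glasses",
    "New #geek podcast is live, love it",
])

def tokenize(text):
    """Lowercase the text and return word-like tokens, keeping # and @ prefixes."""
    return re.findall(r"[#@]?\w+", text.lower())

# Accumulate token counts across all tweets.
counts = Counter()
for tweet in sample:
    counts.update(tokenize(tweet))

print(counts.most_common(3))
```

<p>Keeping the # and @ prefixes treats hashtags and mentions as distinct tokens; dropping them from the pattern would merge '#geek' with 'geek'.</p>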

In [4]:
dates = pd.date_range('20130101', periods=6)

In [6]:
id_var = 1023257523082604544
dftext[dftext.tweet_id == id_var].text

15364    Yo siendo gd nivel 1000 y no pienso parar jamá...
Name: text, dtype: object

In [11]:
topretweets = dftweet.groupby('retweet_id').size().sort_values(ascending=False).reset_index()

In [12]:
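# Print the text of the top retweeted tweets at positions 1 through 18.
for i in range(1, 19):
    id_var = topretweets['retweet_id'][i]
    # Cast the ID to int so it can be compared with the integer tweet_id column.
    print(dftext[dftext.tweet_id == int(id_var)].text)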

4843    ¡Es oficial!  el 16 de Agosto será el estreno ...
Name: text, dtype: object
5185    NEW #PODCAST! This week were joined by Steven...
Name: text, dtype: object
9566    This picture won the Internet for me today!  R...
Name: text, dtype: object
14230    Did anyone else have to diagram sentences in g...
Name: text, dtype: object
12071    How do I know Im losing my eyesight? I CAN NO...
Name: text, dtype: object
6699    Follow @GeekSquee Retweet and Like for a chanc...
Name: text, dtype: object
13969    201820NEXTFUJI ROCK FESTIVAL '183515&lt;DAY1&g...
Name: text, dtype: object
14048    Fairly guilty of this... #DoctorWho #DrWho #Ta...
Name: text, dtype: object
5489    Ya falta poco para el estreno de #PlanV @planv...
Name: text, dtype: object
12236    My newest post is about my reading quirks and ...
Name: text, dtype: object
4545    Me years from now:Spurs fan: Why did Kawhi lea...
Name: text, dtype: object
5238    Kanojo wa dare to demo sex surufull hentai - h...
Name: text, d
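<p>The loop above filters dftext once per retweet ID. A single merge is an alternative that returns the same texts in one table. The sketch below uses hypothetical miniature frames, and the column name count is an assumption made for readability.</p>

```python
import pandas as pd

# Hypothetical miniature versions of topretweets and dftext.
topretweets_demo = pd.DataFrame({"retweet_id": ["11", "12", "13"],
                                 "count": [9, 5, 2]})
dftext_demo = pd.DataFrame({"tweet_id": [13, 11, 14],
                            "text": ["third", "first", "unrelated"]})

# Cast the ID to int so both key columns share a dtype, then left-join
# so every ranked retweet keeps its row even when its text is missing.
topretweets_demo["retweet_id"] = topretweets_demo["retweet_id"].astype("int64")
ranked = topretweets_demo.merge(dftext_demo, how="left",
                                left_on="retweet_id", right_on="tweet_id")

print(ranked[["retweet_id", "count", "text"]])
```

<p>A left join preserves the ranking order of the left frame and leaves a missing value in text where the original tweet was not captured in the dataset.</p>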

In [13]:
topretweets['retweet_id'][1]

'1019765122468892672'
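<p>Whether retweet_id arrives as a string or a float matters. The scientific notation in the dftweet.head() output suggests the column can be parsed as float64, which cannot represent every 19-digit ID exactly. The sketch below shows the effect and one way to avoid it by reading the column as text. The CSV content is hypothetical, including the retweet ID value.</p>

```python
import io

import pandas as pd

# Hypothetical two-row CSV: one retweet and one original tweet (empty retweet_id).
csv_data = io.StringIO(
    "tweet_id,retweet_id\n"
    "1018178822087192581,1018173447581720577\n"
    "1018179236220297216,\n"
)

# Default parsing: the missing value forces retweet_id to float64.
as_float = pd.read_csv(csv_data)
csv_data.seek(0)

# Reading the column as str keeps every digit of the ID.
as_text = pd.read_csv(csv_data, dtype={"retweet_id": str})

print(int(as_float["retweet_id"][0]))
print(as_text["retweet_id"][0])
```

<p>With the float version, an equality test against tweet_id can silently fail to match because the last digits have been rounded.</p>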