## Wrangle "@weratedogs" Twitter data
The objective of this document is to wrangle the tweet archive of the @weratedogs user on Twitter: inspect the data and augment it with other sources of information, including data fetched from the Twitter API. Once the data has been gathered from all sources, it will be assessed and cleaned.

In [99]:
# import necessary libraries
import pandas as pd
import numpy as np
import requests # to make http requests
import tweepy # to work with the twitter api

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [65]:
# file and URL names to process
twitter_archive_file = 'twitter-archive-enhanced.csv'
image_predictions_url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
tweets_file = 'tweet_json.txt'

### Gather
Get data from the initially known data sources: the tweet archive CSV and the image predictions TSV.

In [5]:
# get data from WeRateDogs twitter archive
twitter_archive = pd.read_csv(twitter_archive_file)
twitter_archive.head(1)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,


In [12]:
# get tweet image predictions data from url
image_predictions = pd.read_csv(image_predictions_url, sep='\t')
image_predictions.head(1)

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True


### Assess
Take a quick look at the data gathered so far before fetching the tweets.

In [13]:
twitter_archive.shape

(2356, 17)

In [15]:
image_predictions.shape

(2075, 12)

### Gather
The __`twitter_archive` table__ has more rows (2356) than `image_predictions` (2075), so the tweets are fetched based on its `tweet_id` values.
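Row counts alone do not show how far the two tables overlap. A minimal sketch of comparing the `tweet_id` sets, using small made-up frames in place of the real tables (the IDs and the helper name `ids_missing_from` are hypothetical, for illustration only):

```python
import pandas as pd

def ids_missing_from(left, right, key="tweet_id"):
    """Return the sorted key values present in `left` but absent from `right`."""
    return sorted(set(left[key].tolist()) - set(right[key].tolist()))

# Toy stand-ins for twitter_archive and image_predictions (made-up IDs)
archive_demo = pd.DataFrame({"tweet_id": [101, 102, 103, 104]})
predictions_demo = pd.DataFrame({"tweet_id": [101, 103]})

missing = ids_missing_from(archive_demo, predictions_demo)
print(missing)  # [102, 104]
```

Applied to the real tables, the same call would list the archive tweets that have no image prediction.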

In [29]:
# get keys and secrets from the environment
import os
consumer_key = os.environ['TWITTER_API_KEY']
consumer_secret = os.environ['TWITTER_API_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_SECRET']

In [86]:
# initialize the tweepy client (tweepy is already imported above)
import json
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

In [76]:
# get the full tweet json for each tweet and write it to a file, one line per tweet
with open(tweets_file, 'w') as file:
    # statuses_lookup can fetch 100 tweets at a time,
    # so take 100 tweet ids at a time from twitter_archive
    for index in range(0, len(twitter_archive), 100):
        id_list = twitter_archive.tweet_id.iloc[index:index+100]
        try:
            # pass plain Python ints rather than a pandas array
            statuses = api.statuses_lookup(id_list.tolist())
            for status in statuses:
                file.write(json.dumps(status._json) + '\n')
        except Exception as e:
            # report which batch failed and carry on with the next one
            print(f'batch starting at row {index} failed: {e}')
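
The batching above relies on slicing the ID column 100 rows at a time. The slicing logic can be checked without calling the API. A minimal sketch, where `batches` is a hypothetical helper and the IDs are made up:

```python
def batches(items, size=100):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 250 made-up IDs split into batches of 100, 100 and 50
demo_ids = list(range(250))
batch_sizes = [len(batch) for batch in batches(demo_ids)]
print(batch_sizes)  # [100, 100, 50]
```

The last batch is simply shorter, so no ID is dropped when the total is not a multiple of 100.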

In [85]:
# read this data into a dataframe
detailed_tweets = pd.read_json(tweets_file, lines=True)
detailed_tweets.head(1)

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
0,,,2017-06-18 16:57:37,"{'hashtags': [], 'symbols': [], 'user_mentions...",,18150,False,,876484053909872640,876484053909872640,...,,,,2268,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Benedict. He wants to thank you for th...,True,"{'id': 4196983835, 'id_str': '4196983835', 'na..."


### Assess

#### Analyze the `twitter_archive` data

In [77]:
twitter_archive.sample(10)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
496,813157409116065792,,,2016-12-25 23:00:08 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Layla. It is her first Christmas. She ...,,,,https://twitter.com/dog_rates/status/813157409...,12,10,Layla,,,,
720,783347506784731136,,,2016-10-04 16:46:14 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: This is Kenny. He just wants to...,6.742918e+17,4196984000.0,2015-12-08 18:17:56 +0000,https://twitter.com/dog_rates/status/674291837...,11,10,Kenny,,,,
1109,733828123016450049,,,2016-05-21 01:13:53 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Terry. The harder you hug him the fart...,,,,https://twitter.com/dog_rates/status/733828123...,10,10,Terry,,,,
916,756998049151549440,,,2016-07-23 23:42:53 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Oliver. He's an English Creamschnitzel...,,,,https://twitter.com/dog_rates/status/756998049...,11,10,Oliver,,,,
1332,705475953783398401,,,2016-03-03 19:32:29 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Say hello to Zara. She found a sandal and coul...,,,,https://twitter.com/dog_rates/status/705475953...,12,10,Zara,,,,
1596,686286779679375361,,,2016-01-10 20:41:33 +0000,"<a href=""http://vine.co"" rel=""nofollow"">Vine -...",When bae calls your name from across the room....,,,,https://vine.co/v/iMZx6aDbExn,12,10,,,,,
1623,684902183876321280,,,2016-01-07 00:59:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Perry. He's an Augustus Gloopster. Ver...,,,,https://twitter.com/dog_rates/status/684902183...,11,10,Perry,,,,
725,782722598790725632,,,2016-10-02 23:23:04 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Penny. She fought a bee and the bee wo...,,,,https://twitter.com/dog_rates/status/782722598...,10,10,Penny,,,,
1557,688804835492233216,,,2016-01-17 19:27:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you stumble but recover quickly cause you...,,,,https://twitter.com/dog_rates/status/688804835...,12,10,,,,,
2165,669367896104181761,,,2015-11-25 04:11:57 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Chip. Chip's pretending to be choked. ...,,,,https://twitter.com/dog_rates/status/669367896...,10,10,Chip,,,,


In [88]:
twitter_archive.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [90]:
twitter_archive.describe()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,rating_numerator,rating_denominator
count,2356.0,78.0,78.0,181.0,181.0,2356.0,2356.0
mean,7.427716e+17,7.455079e+17,2.014171e+16,7.7204e+17,1.241698e+16,13.126486,10.455433
std,6.856705e+16,7.582492e+16,1.252797e+17,6.236928e+16,9.599254e+16,45.876648,6.745237
min,6.660209e+17,6.658147e+17,11856340.0,6.661041e+17,783214.0,0.0,0.0
25%,6.783989e+17,6.757419e+17,308637400.0,7.186315e+17,4196984000.0,10.0,10.0
50%,7.196279e+17,7.038708e+17,4196984000.0,7.804657e+17,4196984000.0,11.0,10.0
75%,7.993373e+17,8.257804e+17,4196984000.0,8.203146e+17,4196984000.0,12.0,10.0
max,8.924206e+17,8.862664e+17,8.405479e+17,8.87474e+17,7.874618e+17,1776.0,170.0


In [126]:
twitter_archive[twitter_archive.rating_numerator < 10].sample(10)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
2255,667773195014021121,,,2015-11-20 18:35:10 +0000,"<a href=""http://twitter.com"" rel=""nofollow"">Tw...",This is a rare Hungarian Pinot named Jessiga. ...,,,,https://twitter.com/dog_rates/status/667773195...,8,10,a,,,,
1861,675483430902214656,,,2015-12-12 01:12:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Rare shielded battle dog here. Very happy abou...,,,,https://twitter.com/dog_rates/status/675483430...,5,10,,,,,
2347,666057090499244032,,,2015-11-16 00:55:59 +0000,"<a href=""http://twitter.com/download/iphone"" r...",My oh my. This is a rare blond Canadian terrie...,,,,https://twitter.com/dog_rates/status/666057090...,9,10,a,,,,
1735,679729593985699840,,,2015-12-23 18:25:38 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Hunter. He was playing with his ball m...,,,,https://twitter.com/dog_rates/status/679729593...,8,10,Hunter,,,,
1756,678767140346941444,,,2015-12-21 02:41:11 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Mia. She makes awful decisions. 8/10 h...,,,,https://twitter.com/dog_rates/status/678767140...,8,10,Mia,,,,
1484,693231807727280129,,,2016-01-30 00:38:37 +0000,"<a href=""http://twitter.com/download/iphone"" r...","This is Bodie. He's not proud of what he did, ...",,,,https://twitter.com/dog_rates/status/693231807...,9,10,Bodie,,,,
1978,672984142909456390,,,2015-12-05 03:41:37 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Very happy pup here. Always smiling. Loves his...,,,,https://twitter.com/dog_rates/status/672984142...,9,10,,,,,
2302,667012601033924608,,,2015-11-18 16:12:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Klevin. He laughs a lot. Very cool dog...,,,,https://twitter.com/dog_rates/status/667012601...,9,10,Klevin,,,,
1803,676948236477857792,,,2015-12-16 02:13:31 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Karl. Karl thinks he's slick. 6/10 sne...,,,,https://twitter.com/dog_rates/status/676948236...,6,10,Karl,,,,
2242,667911425562669056,,,2015-11-21 03:44:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Wow. Armored dog here. Ready for battle. Face ...,,,,https://twitter.com/dog_rates/status/667911425...,5,10,,,,,


In [112]:
twitter_archive.iloc[315].expanded_urls

'https://twitter.com/dog_rates/status/835152434251116546/photo/1,https://twitter.com/dog_rates/status/835152434251116546/photo/1,https://twitter.com/dog_rates/status/835152434251116546/photo/1'

In [120]:
twitter_archive.iloc[2317].expanded_urls

'https://twitter.com/dog_rates/status/666644823164719104/photo/1'

In [128]:
twitter_archive[twitter_archive.rating_numerator < 10].shape

(440, 17)

In [113]:
twitter_archive[twitter_archive.rating_numerator > 15].sample(10)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
1120,731156023742988288,,,2016-05-13 16:15:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Say hello to this unbelievably well behaved sq...,,,,https://twitter.com/dog_rates/status/731156023...,204,170,this,,,,
1712,680494726643068929,,,2015-12-25 21:06:00 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have uncovered an entire battalion of ...,,,,https://twitter.com/dog_rates/status/680494726...,26,10,,,,,
1351,704054845121142784,,,2016-02-28 21:25:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a whole flock of puppers. 60/50 I'll ...,,,,https://twitter.com/dog_rates/status/704054845...,60,50,a,,,,
313,835246439529840640,8.35246e+17,26259576.0,2017-02-24 21:54:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",@jonnysun @Lin_Manuel ok jomny I know you're e...,,,,,960,0,,,,,
2074,670842764863651840,,,2015-11-29 05:52:33 +0000,"<a href=""http://twitter.com/download/iphone"" r...",After so many requests... here you go.\n\nGood...,,,,https://twitter.com/dog_rates/status/670842764...,420,10,,,,,
433,820690176645140481,,,2017-01-15 17:52:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",The floofs have been released I repeat the flo...,,,,https://twitter.com/dog_rates/status/820690176...,84,70,,,,,
188,855862651834028034,8.558616e+17,194351775.0,2017-04-22 19:15:32 +0000,"<a href=""http://twitter.com/download/iphone"" r...",@dhmontgomery We also gave snoop dogg a 420/10...,,,,,420,10,,,,,
1254,710658690886586372,,,2016-03-18 02:46:49 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here's a brigade of puppers. All look very pre...,,,,https://twitter.com/dog_rates/status/710658690...,80,80,,,,,
1779,677716515794329600,,,2015-12-18 05:06:23 +0000,"<a href=""http://twitter.com/download/iphone"" r...",IT'S PUPPERGEDDON. Total of 144/120 ...I think...,,,,https://twitter.com/dog_rates/status/677716515...,144,120,,,,,
1228,713900603437621249,,,2016-03-27 01:29:02 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Happy Saturday here's 9 puppers on a bench. 99...,,,,https://twitter.com/dog_rates/status/713900603...,99,90,,,,,


In [110]:
twitter_archive.iloc[979].expanded_urls

'https://twitter.com/dog_rates/status/749981277374128128/photo/1'

In [129]:
twitter_archive[twitter_archive.rating_numerator > 15].shape

(26, 17)

In [130]:
twitter_archive[twitter_archive.name == "None"].sample(10)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
1459,695064344191721472,,,2016-02-04 02:00:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This may be the greatest video I've ever been ...,,,,https://twitter.com/dog_rates/status/695064344...,4,10,,,,,
1111,733482008106668032,,,2016-05-20 02:18:32 +0000,"<a href=""http://twitter.com/download/iphone"" r...","""Ello this is dog how may I assist"" ...10/10 h...",,,,https://twitter.com/dog_rates/status/733482008...,10,10,,,,,
195,855138241867124737,,,2017-04-20 19:16:59 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @frasercampbell_: oh my... what's that... b...,8.551225e+17,7.475543e+17,2017-04-20 18:14:33 +0000,https://twitter.com/frasercampbell_/status/855...,14,10,,,,,
1689,681340665377193984,6.813394e+17,4196984000.0,2015-12-28 05:07:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",I've been told there's a slight possibility he...,,,,,5,10,,,,,
1654,683449695444799489,,,2016-01-03 00:47:59 +0000,"<a href=""http://twitter.com/download/iphone"" r...",I just want to be friends with this dog. Appea...,,,,https://twitter.com/dog_rates/status/683449695...,10,10,,,,,
1224,714214115368108032,,,2016-03-27 22:14:49 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Happy Easter from the squad! 🐇🐶 13/10 for all ...,,,,https://twitter.com/dog_rates/status/714214115...,13,10,,,,,
1506,691756958957883396,,,2016-01-25 22:58:05 +0000,"<a href=""http://twitter.com/download/iphone"" r...",THE BRITISH ARE COMING\nTHE BRITISH ARE COMING...,,,,https://twitter.com/dog_rates/status/691756958...,10,10,,,,,
1068,740373189193256964,,,2016-06-08 02:41:38 +0000,"<a href=""http://twitter.com/download/iphone"" r...","After so many requests, this is Bretagne. She ...",,,,https://twitter.com/dog_rates/status/740373189...,9,11,,,,,
1263,710117014656950272,,,2016-03-16 14:54:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This pupper got her hair chalked for her birth...,,,,https://twitter.com/dog_rates/status/710117014...,11,10,,,,pupper,
411,823269594223824897,,,2017-01-22 20:42:21 +0000,"<a href=""http://twitter.com/download/iphone"" r...",RT @dog_rates: We only rate dogs. Please don't...,8.222448e+17,4196984000.0,2017-01-20 00:50:15 +0000,https://twitter.com/dog_rates/status/822244816...,11,10,,,,,


In [124]:
twitter_archive[twitter_archive.name == "None"].shape

(745, 17)
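
The `name` column reports 2356 non-null values in `info()`, yet 745 rows hold the string "None", so missing names are hidden from null checks. A minimal sketch of how such placeholders could be surfaced, on a made-up frame (treating "None" as a missing value is an assumption about how the archive encodes absent names):

```python
import numpy as np
import pandas as pd

# Made-up rows; "None" is a placeholder string, not a real missing value
names_demo = pd.DataFrame({"name": ["Phineas", "None", "a", "None"]})

# Before: pandas sees no missing names
print(names_demo["name"].isna().sum())  # 0

# Replace the placeholder with NaN so null checks report it
cleaned = names_demo["name"].replace("None", np.nan)
print(cleaned.isna().sum())  # 2
```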

#### Analyze the `image_predictions` data

In [122]:
image_predictions.sample(10)

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
2064,890006608113172480,https://pbs.twimg.com/media/DFnwSY4WAAAMliS.jpg,1,Samoyed,0.957979,True,Pomeranian,0.013884,True,chow,0.008167,True
460,674793399141146624,https://pbs.twimg.com/media/CV1ZA3oWEAA1HW_.jpg,1,giant_schnauzer,0.119693,True,Afghan_hound,0.072763,True,miniature_schnauzer,0.063786,True
1709,818145370475810820,https://pbs.twimg.com/media/C1qi26rW8AMaj9K.jpg,1,golden_retriever,0.621931,True,Labrador_retriever,0.364997,True,redbone,0.003971,True
2048,886983233522544640,https://pbs.twimg.com/media/DE8yicJW0AAAvBJ.jpg,2,Chihuahua,0.793469,True,toy_terrier,0.143528,True,can_opener,0.032253,False
1987,872620804844003328,https://pbs.twimg.com/media/DBwr_hzXkAEnZBW.jpg,1,cocker_spaniel,0.513191,True,Sussex_spaniel,0.159088,True,standard_poodle,0.149509,True
685,683857920510050305,https://pbs.twimg.com/media/CX2NJmRWYAAxz_5.jpg,1,bluetick,0.174738,True,Shetland_sheepdog,0.126101,True,beagle,0.122887,True
1647,808733504066486276,https://pbs.twimg.com/media/Czky0v9VIAEXRkd.jpg,1,seat_belt,0.779137,False,toy_poodle,0.036927,True,golden_retriever,0.016972,True
116,668113020489474048,https://pbs.twimg.com/media/CUWdPsqWcAERQVv.jpg,1,Pembroke,0.548896,True,Cardigan,0.191101,True,collie,0.059814,True
733,686749460672679938,https://pbs.twimg.com/media/CYfS75fWAAAllde.jpg,1,cheeseburger,0.643808,False,hotdog,0.201378,False,bagel,0.06388,False
438,674422304705744896,https://pbs.twimg.com/media/CVwHgblWcAACWOD.jpg,1,golden_retriever,0.964497,True,Labrador_retriever,0.009006,True,tennis_ball,0.007139,False


In [131]:
image_predictions.describe()

Unnamed: 0,tweet_id,img_num,p1_conf,p2_conf,p3_conf
count,2075.0,2075.0,2075.0,2075.0,2075.0
mean,7.384514e+17,1.203855,0.594548,0.1345886,0.06032417
std,6.785203e+16,0.561875,0.271174,0.1006657,0.05090593
min,6.660209e+17,1.0,0.044333,1.0113e-08,1.74017e-10
25%,6.764835e+17,1.0,0.364412,0.05388625,0.0162224
50%,7.119988e+17,1.0,0.58823,0.118181,0.0494438
75%,7.932034e+17,1.0,0.843855,0.1955655,0.09180755
max,8.924206e+17,4.0,1.0,0.488014,0.273419


In [132]:
image_predictions.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


In [133]:
image_predictions[~image_predictions.p1_dog]

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
6,666051853826850816,https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg,1,box_turtle,0.933012,False,mud_turtle,4.588540e-02,False,terrapin,1.788530e-02,False
8,666057090499244032,https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg,1,shopping_cart,0.962465,False,shopping_basket,1.459380e-02,False,golden_retriever,7.958960e-03,True
17,666104133288665088,https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg,1,hen,0.965932,False,cock,3.391940e-02,False,partridge,5.206580e-05,False
18,666268910803644416,https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg,1,desktop_computer,0.086502,False,desk,8.554740e-02,False,bookcase,7.947970e-02,False
21,666293911632134144,https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg,1,three-toed_sloth,0.914671,False,otter,1.525000e-02,False,great_grey_owl,1.320720e-02,False
22,666337882303524864,https://pbs.twimg.com/media/CT9OwFIWEAMuRje.jpg,1,ox,0.416669,False,Newfoundland,2.784070e-01,True,groenendael,1.026430e-01,True
25,666362758909284353,https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg,1,guinea_pig,0.996496,False,skunk,2.402450e-03,False,hamster,4.608630e-04,False
29,666411507551481857,https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg,1,coho,0.404640,False,barracouta,2.714850e-01,False,gar,1.899450e-01,False
33,666430724426358785,https://pbs.twimg.com/media/CT-jNYqW4AAPi2M.jpg,1,llama,0.505184,False,Irish_terrier,1.041090e-01,True,dingo,6.207120e-02,False
43,666776908487630848,https://pbs.twimg.com/media/CUDeDoWUYAAD-EM.jpg,1,seat_belt,0.375057,False,miniature_pinscher,1.671750e-01,True,Chihuahua,8.695060e-02,True


In [138]:
image_predictions[~image_predictions.p1_dog & ~image_predictions.p2_dog & ~image_predictions.p3_dog]

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
6,666051853826850816,https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg,1,box_turtle,0.933012,False,mud_turtle,4.588540e-02,False,terrapin,1.788530e-02,False
17,666104133288665088,https://pbs.twimg.com/media/CT56LSZWoAAlJj2.jpg,1,hen,0.965932,False,cock,3.391940e-02,False,partridge,5.206580e-05,False
18,666268910803644416,https://pbs.twimg.com/media/CT8QCd1WEAADXws.jpg,1,desktop_computer,0.086502,False,desk,8.554740e-02,False,bookcase,7.947970e-02,False
21,666293911632134144,https://pbs.twimg.com/media/CT8mx7KW4AEQu8N.jpg,1,three-toed_sloth,0.914671,False,otter,1.525000e-02,False,great_grey_owl,1.320720e-02,False
25,666362758909284353,https://pbs.twimg.com/media/CT9lXGsUcAAyUFt.jpg,1,guinea_pig,0.996496,False,skunk,2.402450e-03,False,hamster,4.608630e-04,False
29,666411507551481857,https://pbs.twimg.com/media/CT-RugiWIAELEaq.jpg,1,coho,0.404640,False,barracouta,2.714850e-01,False,gar,1.899450e-01,False
45,666786068205871104,https://pbs.twimg.com/media/CUDmZIkWcAAIPPe.jpg,1,snail,0.999888,False,slug,5.514170e-05,False,acorn,2.625800e-05,False
50,666837028449972224,https://pbs.twimg.com/media/CUEUva1WsAA2jPb.jpg,1,triceratops,0.442113,False,armadillo,1.140710e-01,False,common_iguana,4.325530e-02,False
51,666983947667116034,https://pbs.twimg.com/media/CUGaXDhW4AY9JUH.jpg,1,swab,0.589446,False,chain_saw,1.901420e-01,False,wig,3.450970e-02,False
53,667012601033924608,https://pbs.twimg.com/media/CUG0bC0U8AAw2su.jpg,1,hyena,0.987230,False,African_hunting_dog,1.260080e-02,False,coyote,5.735010e-05,False
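
The three-way condition above can also be written with a column subset and `any`. A minimal sketch on a made-up frame (the rows are hypothetical; only the `p1`, `p1_dog`, `p2_dog` and `p3_dog` column names come from the table):

```python
import pandas as pd

preds_demo = pd.DataFrame({
    "p1": ["box_turtle", "shopping_cart", "Samoyed"],
    "p1_dog": [False, False, True],
    "p2_dog": [False, False, True],
    "p3_dog": [False, True, True],
})

dog_flags = ["p1_dog", "p2_dog", "p3_dog"]
# True for rows where none of the three predictions is a dog
no_dog = ~preds_demo[dog_flags].any(axis=1)
print(preds_demo.loc[no_dog, "p1"].tolist())  # ['box_turtle']
```

Keeping the flag columns in a list makes it easy to reuse the same selection elsewhere.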


#### Analyze the `detailed_tweets` table

In [139]:
detailed_tweets.sample(10)

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
1037,,,2016-06-13 01:06:33,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 742161185223630852, 'id_str'...",4468,False,,742161199639494656,742161199639494656,...,,,,1429,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Doug. He's trying to float away. 12/10...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
2018,,,2015-12-03 02:45:32,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 672245248555409408, 'id_str'...",680,False,,672245253877968896,672245253877968896,...,,,,159,False,,"<a href=""http://twitter.com/download/iphone"" r...",Meet Snickers. He's adorable. Also comes in t-...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
2099,,,2015-11-28 19:04:19,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 670679606182055936, 'id_str'...",745,False,,670679630144274432,670679630144274432,...,,,,278,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Pluto. He's holding little waddling do...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
2109,,,2015-11-25 01:20:08,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 669324644248219648, 'id_str'...",494,False,,669324657376567296,669324657376567296,...,,,,207,False,,"<a href=""http://twitter.com/download/iphone"" r...",Meet Ralf. He's a miniature Buick DiCaprio. Ca...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
1616,,,2016-01-08 19:45:39,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 685547930707750912, 'id_str'...",33890,False,,685547936038666240,685547936038666240,...,,,,16185,False,,"<a href=""http://twitter.com/download/iphone"" r...",Everybody needs to read this. Jack is our firs...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
2147,,,2015-11-24 02:29:49,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 668979801907175424, 'id_str'...",793,False,,668979806671884288,668979806671884288,...,,,,348,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Chaz. He's an X Games half pipe supers...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
1550,,,2016-01-12 02:06:41,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 686730987418550272, 'id_str'...",4262,False,,686730991906516992,686730991906516992,...,,,,1235,False,,"<a href=""http://twitter.com/download/iphone"" r...",I just love this picture. 12/10 lovely af http...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
1192,,,2016-03-14 02:39:42,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 709207338792525824, 'id_str'...",12890,False,,709207347839836162,709207347839836160,...,,,,5980,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Penny. She's trying on her prom dress....,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
2048,,,2015-12-01 19:10:13,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 671768277677441024, 'id_str'...",1184,False,,671768281401958400,671768281401958400,...,,,,507,False,,"<a href=""http://twitter.com/download/iphone"" r...",When you try to recreate the scene from Lady &...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
1908,,,2015-12-07 03:45:53,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 673709986195890176, 'id_str'...",849,False,,673709992831262724,673709992831262720,...,,,,275,False,,"<a href=""http://twitter.com/download/iphone"" r...",I know a lot of you are studying for finals. G...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
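
In the sample above, row 1192 shows `id` 709207347839836162 but `id_str` 709207347839836160, and row 1908 differs in the same way, which suggests that precision was lost when `id_str` was converted to a number during parsing. A minimal sketch of one way to keep the IDs exact, by parsing each line with the standard `json` module (the two lines below are modeled on the sample and carry only three fields; the real file has many more):

```python
import io
import json
import pandas as pd

# Two JSON lines standing in for the contents of tweet_json.txt
demo_file = io.StringIO(
    '{"id": 709207347839836162, "id_str": "709207347839836162", "retweet_count": 5980}\n'
    '{"id": 673709992831262724, "id_str": "673709992831262724", "retweet_count": 275}\n'
)

# json.loads keeps integers exact and leaves id_str as a Python string
records = [json.loads(line) for line in demo_file if line.strip()]
tweets_demo = pd.DataFrame(records)

# id_str stays text, so it still matches id digit for digit
print((tweets_demo["id"].astype(str) == tweets_demo["id_str"]).all())  # True
```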


In [140]:
detailed_tweets.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2333 entries, 0 to 2332
Data columns (total 29 columns):
contributors                 0 non-null float64
coordinates                  0 non-null float64
created_at                   2333 non-null datetime64[ns]
entities                     2333 non-null object
extended_entities            1818 non-null object
favorite_count               2333 non-null int64
favorited                    2333 non-null bool
geo                          0 non-null float64
id                           2333 non-null int64
id_str                       2333 non-null int64
in_reply_to_screen_name      77 non-null object
in_reply_to_status_id        77 non-null float64
in_reply_to_status_id_str    77 non-null float64
in_reply_to_user_id          77 non-null float64
in_reply_to_user_id_str      77 non-null float64
is_quote_status              2333 non-null bool
lang                         2333 non-null object
place                        1 non-null object
possibl

In [141]:
detailed_tweets.describe()

Unnamed: 0,contributors,coordinates,favorite_count,geo,id,id_str,in_reply_to_status_id,in_reply_to_status_id_str,in_reply_to_user_id,in_reply_to_user_id_str,possibly_sensitive,quoted_status_id,quoted_status_id_str,retweet_count
count,0.0,0.0,2333.0,0.0,2333.0,2333.0,77.0,77.0,77.0,77.0,2199.0,26.0,26.0,2333.0
mean,,,7781.91213,,7.419023e+17,7.419023e+17,7.440692e+17,7.440692e+17,2.040329e+16,2.040329e+16,0.0,8.113972e+17,8.113972e+17,2814.629233
std,,,12072.355158,,6.818084e+16,6.818084e+16,7.524295e+16,7.524295e+16,1.260797e+17,1.260797e+17,0.0,6.295843e+16,6.295843e+16,4760.842662
min,,,0.0,,6.660209e+17,6.660209e+17,6.658147e+17,6.658147e+17,11856340.0,11856340.0,0.0,6.721083e+17,6.721083e+17,1.0
25%,,,1355.0,,6.782786e+17,6.782786e+17,6.757073e+17,6.757073e+17,358972800.0,358972800.0,0.0,7.761338e+17,7.761338e+17,565.0
50%,,,3390.0,,7.184547e+17,7.184547e+17,7.032559e+17,7.032559e+17,4196984000.0,4196984000.0,0.0,8.281173e+17,8.281173e+17,1316.0
75%,,,9545.0,,7.98644e+17,7.98644e+17,8.233264e+17,8.233264e+17,4196984000.0,4196984000.0,0.0,8.637581e+17,8.637581e+17,3281.0
max,,,161207.0,,8.924206e+17,8.924206e+17,8.862664e+17,8.862664e+17,8.405479e+17,8.405479e+17,0.0,8.860534e+17,8.860534e+17,80810.0


In [142]:
detailed_tweets[detailed_tweets.favorited]

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user


In [143]:
detailed_tweets[detailed_tweets.retweeted]

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user


In [144]:
detailed_tweets[detailed_tweets.is_quote_status]

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
6,,,2017-06-24 13:24:20,"{'hashtags': [], 'symbols': [], 'user_mentions...",,29162,False,,878604707211726852,878604707211726848,...,{'created_at': 'Sat Jun 24 13:05:06 +0000 2017...,8.785999e+17,8.785999e+17,6778,False,,"<a href=""http://twitter.com/download/iphone"" r...",Martha is stunning how h*ckin dare you. 13/10 ...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
12,,,2017-07-13 15:19:09,"{'hashtags': [], 'symbols': [], 'user_mentions...",,19734,False,,885518971528720385,885518971528720384,...,,,,3522,False,,"<a href=""http://twitter.com/download/iphone"" r...",I have a new hero and his name is Howard. 14/1...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
54,,,2017-07-15 02:45:48,"{'hashtags': [{'text': 'BATP', 'indices': [21,...",,0,False,,886054160059072513,886054160059072512,...,,8.860534e+17,8.860534e+17,104,False,{'created_at': 'Sat Jul 15 02:44:07 +0000 2017...,"<a href=""http://twitter.com/download/iphone"" r...",RT @Athletics: 12/10 #BATP https://t.co/WxwJmv...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
68,,,2017-06-18 20:30:39,"{'hashtags': [], 'symbols': [], 'user_mentions...",,22785,False,,876537666061221889,876537666061221888,...,{'created_at': 'Sat Jun 17 19:41:50 +0000 2017...,8.76163e+17,8.76163e+17,4427,False,,"<a href=""http://twitter.com/download/iphone"" r...",I can say with the pupmost confidence that the...,True,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
75,,,2017-06-14 21:06:43,"{'hashtags': [], 'symbols': [], 'user_mentions...",,26517,False,,875097192612077568,875097192612077568,...,{'created_at': 'Mon Jun 12 23:49:34 +0000 2017...,8.744134e+17,8.744134e+17,5767,False,,"<a href=""http://twitter.com/download/iphone"" r...",You'll get your package when that precious man...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
93,,,2017-07-10 03:08:17,"{'hashtags': [], 'symbols': [], 'user_mentions...",,70304,False,,884247878851493888,884247878851493888,...,{'created_at': 'Sun Jul 09 08:26:49 +0000 2017...,8.839657e+17,8.839657e+17,19255,False,,"<a href=""http://twitter.com/download/iphone"" r...",OMG HE DIDN'T MEAN TO HE WAS JUST TRYING A LIT...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
114,,,2017-05-22 18:21:28,"{'hashtags': [], 'symbols': [], 'user_mentions...",,19617,False,,866720684873056260,866720684873056256,...,{'created_at': 'Mon May 22 01:00:31 +0000 2017...,8.664587e+17,8.664587e+17,4650,False,,"<a href=""http://twitter.com/download/iphone"" r...",He was providing for his family 13/10 how dare...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
138,,,2017-04-22 16:18:34,"{'hashtags': [], 'symbols': [], 'user_mentions...",,26581,False,,855818117272018944,855818117272018944,...,{'created_at': 'Sat Apr 22 05:36:05 +0000 2017...,8.556564e+17,8.556564e+17,5420,False,,"<a href=""http://twitter.com/download/iphone"" r...",I HEARD HE TIED HIS OWN BOWTIE MARK AND HE JUS...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
142,,,2017-06-03 20:33:19,"{'hashtags': [], 'symbols': [], 'user_mentions...",,20328,False,,871102520638267392,871102520638267392,...,{'created_at': 'Sat Jun 03 18:46:59 +0000 2017...,8.710758e+17,8.710758e+17,5290,False,,"<a href=""http://twitter.com/download/iphone"" r...",Never doubt a doggo 14/10 https://t.co/AbBLh2FZCH,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
157,,,2017-04-22 18:55:51,"{'hashtags': [], 'symbols': [], 'user_mentions...",,11830,False,,855857698524602368,855857698524602368,...,,,,2102,False,,"<a href=""http://twitter.com/download/iphone"" r...","HE'S LIKE ""WAIT A MINUTE I'M AN ANIMAL THIS IS...",False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."


In [147]:
detailed_tweets[detailed_tweets.place.notnull()]

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
837,,,2016-08-10 01:23:03,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 763183833575481344, 'id_str'...",5599,False,,763183847194451968,763183847194451968,...,,,,1546,False,,"<a href=""http://twitter.com/download/iphone"" r...",This is Clark. He collects teddy bears. It's a...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."


In [152]:
detailed_tweets.iloc[837]

contributors                                                               NaN
coordinates                                                                NaN
created_at                                                 2016-08-10 01:23:03
entities                     {'hashtags': [], 'symbols': [], 'user_mentions...
extended_entities            {'media': [{'id': 763183833575481344, 'id_str'...
favorite_count                                                            5599
favorited                                                                False
geo                                                                        NaN
id                                                          763183847194451968
id_str                                                      763183847194451968
in_reply_to_screen_name                                                   None
in_reply_to_status_id                                                      NaN
in_reply_to_status_id_str                           

In [159]:
detailed_tweets[detailed_tweets.in_reply_to_screen_name.notnull()]

Unnamed: 0,contributors,coordinates,created_at,entities,extended_entities,favorite_count,favorited,geo,id,id_str,...,quoted_status,quoted_status_id,quoted_status_id_str,retweet_count,retweeted,retweeted_status,source,text,truncated,user
7,,,2017-06-27 12:14:36,"{'hashtags': [], 'symbols': [], 'user_mentions...",,302,False,,879674319642796034,879674319642796032,...,,,,10,False,,"<a href=""http://twitter.com/download/iphone"" r...",@RealKentMurphy 14/10 confirmed,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
21,,,2017-07-15 16:51:35,"{'hashtags': [], 'symbols': [], 'user_mentions...",,116,False,,886267009285017600,886267009285017600,...,,,,4,False,,"<a href=""http://twitter.com/download/iphone"" r...",@NonWhiteHat @MayhewMayhem omg hello tanner yo...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
38,,,2017-07-02 21:58:53,"{'hashtags': [], 'symbols': [], 'user_mentions...",,123,False,,881633300179243008,881633300179243008,...,,,,7,False,,"<a href=""http://twitter.com/download/iphone"" r...",@roushfenway These are good dogs but 17/10 is ...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
103,,,2017-04-22 19:05:32,"{'hashtags': [], 'symbols': [], 'user_mentions...",,4987,False,,855860136149123072,855860136149123072,...,,,,1007,False,,"<a href=""http://twitter.com/download/iphone"" r...",@s8n You tried very hard to portray this good ...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
115,,,2017-05-13 16:15:35,"{'hashtags': [], 'symbols': [], 'user_mentions...",,2205,False,,863427515083354112,863427515083354112,...,,,,94,False,,"<a href=""http://twitter.com/download/iphone"" r...",@Jack_Septic_Eye I'd need a few more pics to p...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
136,,,2017-05-12 17:12:53,"{'hashtags': [], 'symbols': [], 'user_mentions...",,8624,False,,863079547188785154,863079547188785152,...,,,,1079,False,,"<a href=""http://twitter.com/download/iphone"" r...",Ladies and gentlemen... I found Pipsy. He may ...,True,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
151,,,2017-04-24 15:13:52,"{'hashtags': [], 'symbols': [], 'user_mentions...","{'media': [{'id': 856526604033556482, 'id_str'...",11798,False,,856526610513747968,856526610513747968,...,,,,1865,False,,"<a href=""http://twitter.com/download/iphone"" r...","THIS IS CHARLIE, MARK. HE DID JUST WANT TO SAY...",False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
169,,,2017-04-22 19:15:32,"{'hashtags': [], 'symbols': [], 'user_mentions...",,344,False,,855862651834028034,855862651834028032,...,,,,26,False,,"<a href=""http://twitter.com/download/iphone"" r...",@dhmontgomery We also gave snoop dogg a 420/10...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
183,,,2017-04-26 12:48:51,"{'hashtags': [], 'symbols': [], 'user_mentions...",,227,False,,857214891891077121,857214891891077120,...,,,,17,False,,"<a href=""http://twitter.com/download/iphone"" r...",@Marc_IRL pixelated af 12/10,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."
186,,,2017-06-02 19:38:25,"{'hashtags': [], 'symbols': [], 'user_mentions...",,118,False,,870726314365509632,870726314365509632,...,,,,3,False,,"<a href=""http://twitter.com/download/iphone"" r...",@ComplicitOwl @ShopWeRateDogs &gt;10/10 is res...,False,"{'id': 4196983835, 'id_str': '4196983835', 'na..."


In [160]:
detailed_tweets.iloc[21]

contributors                                                               NaN
coordinates                                                                NaN
created_at                                                 2017-07-15 16:51:35
entities                     {'hashtags': [], 'symbols': [], 'user_mentions...
extended_entities                                                          NaN
favorite_count                                                             116
favorited                                                                False
geo                                                                        NaN
id                                                          886267009285017600
id_str                                                      886267009285017600
in_reply_to_screen_name                                            NonWhiteHat
in_reply_to_status_id                                              8.86266e+17
in_reply_to_status_id_str                           

### Quality
#### `twitter_archive` table
- 78 non-original tweets (replies) that have values in in_reply_to_status_id and in_reply_to_user_id
- in_reply_to_status_id is a float, could be a string
- in_reply_to_user_id is a float, could be a string
- timestamp is an object, should be a datetime
- retweeted_status_id is a float, could be a string
- retweeted_status_user_id is a float, could be a string
- retweeted_status_timestamp is an object, should be a datetime
- 181 retweets present in the table, with values in retweeted_status_id, retweeted_status_user_id, and retweeted_status_timestamp
- missing values in expanded_urls
- some names have the value None
- some rows have no entries in any of doggo, floofer, pupper, puppo
- ~~some rating_numerators have value 0~~
- ~~some rating_numerators have very large values~~
- ~~some rating_denominators have value 0~~
- ~~some rating_denominators have very large values~~
- some of the animals (with low scores) are not dogs (need to decide how to process these)
- some of the animals (with very high scores) are in fact dogs and valid tweets
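
As a rough illustration of how the retweet, reply, and timestamp issues above might be handled, the sketch below keeps only original tweets and converts `timestamp` to a datetime. The column names come from `twitter_archive`; the toy values and the names `archive`, `is_original`, and `originals` are made up for the example and are not part of the notebook.

```python
import pandas as pd

# Toy frame using twitter_archive column names; all values are illustrative
archive = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'in_reply_to_status_id': [float('nan'), 8.86266e17, float('nan')],
    'retweeted_status_id': [float('nan'), float('nan'), 8.860534e17],
    'timestamp': ['2017-08-01 16:23:56 +0000',
                  '2017-07-15 16:51:35 +0000',
                  '2017-07-15 02:45:48 +0000'],
})

# A tweet is treated as original when it is neither a reply nor a retweet
is_original = (archive['in_reply_to_status_id'].isnull()
               & archive['retweeted_status_id'].isnull())
originals = archive[is_original].copy()

# Convert the timestamp strings (object dtype) to datetimes
originals['timestamp'] = pd.to_datetime(originals['timestamp'])
```

Whether replies should be dropped along with retweets is a design choice; the mask can be reduced to the retweet condition alone if replies are to be kept.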

#### `image_predictions` table
- record 1647 shows a dog, but p1 predicts a seat belt
- record 1906 is clearly a dog, but p1, p2, and p3 all predict otherwise
- record 1953 is a dog in the bushes, but the predictions say otherwise
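
Misclassified records like these could be surfaced programmatically rather than by eye. The sketch below, under the assumption that the `p1_dog`, `p2_dog`, and `p3_dog` flags are booleans as shown in the `image_predictions` preview, selects rows where no prediction is a dog and rows where only the top prediction misses. The toy data and the names `preds`, `no_dog`, and `p1_miss` are hypothetical.

```python
import pandas as pd

# Toy frame using image_predictions column names; all values are illustrative
preds = pd.DataFrame({
    'tweet_id': [10, 11, 12],
    'p1': ['seat_belt', 'collie', 'bath_towel'],
    'p1_dog': [False, True, False],
    'p2_dog': [True, True, False],
    'p3_dog': [False, True, False],
})

# Rows where none of the three predictions is a dog breed
no_dog = preds[~(preds['p1_dog'] | preds['p2_dog'] | preds['p3_dog'])]

# Rows where the top prediction is not a dog but a lower-ranked one is
p1_miss = preds[~preds['p1_dog'] & (preds['p2_dog'] | preds['p3_dog'])]
```

Rows in `no_dog` still need a visual check, since they mix genuine non-dog images with dogs the classifier failed to recognise.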

#### `detailed_tweets` table
- contributors has empty values
- coordinates has empty values
- geo has empty values
- place has empty values
- in_reply_to_status_id is a float, could be a string
- quoted_status_id is a float, could be a string
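
The float ID columns matter because a 64-bit float cannot represent every 18-digit tweet ID exactly, which is likely why `id` and `id_str` differ in the output above (for example 878604707211726852 versus 878604707211726848). The sketch below shows the effect and one possible remedy, reading ID columns as strings. The CSV text is made up for illustration, including the `quoted_status_id` value, and the names `raw`, `as_float`, and `as_text` are hypothetical.

```python
import io
import pandas as pd

# Round-tripping a tweet ID through a float changes its last digits
original_id = 878604707211726852
rounded_id = int(float(original_id))  # 878604707211726848

# Illustrative CSV: the second row has no quoted_status_id
raw = ("id,quoted_status_id\n"
       "878604707211726852,878599868507402241\n"
       "885518971528720385,\n")

# Default parsing: the missing value forces quoted_status_id to float64
as_float = pd.read_csv(io.StringIO(raw))

# Reading the ID columns as strings keeps every digit; missing stays NaN
as_text = pd.read_csv(io.StringIO(raw),
                      dtype={'id': str, 'quoted_status_id': str})
```

Strings are used here rather than integers because a plain integer column cannot hold missing values, and IDs are never used in arithmetic.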

### Tidiness
#### `twitter_archive` table
- Dog 'state' is split into 4 columns - doggo, floofer, pupper, puppo
- `image_predictions` could be combined into `twitter_archive` table
- `detailed_tweets` could be combined into `twitter_archive` table
- contributors (in `detailed_tweets`) is an unnecessary column
- coordinates (in `detailed_tweets`) is an unnecessary column
- geo (in `detailed_tweets`) is an unnecessary column
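
One possible way to address the tidiness issues above is sketched below: the four dog 'state' columns are collapsed into a single column, and a predictions table is joined on `tweet_id`. This assumes that a missing stage is stored as NaN; if the file stores the literal string 'None' instead, those values would need to be replaced with NaN first. The toy data, the `stage` column name, and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

stage_cols = ['doggo', 'floofer', 'pupper', 'puppo']

# Toy frame using twitter_archive column names; all values are illustrative
archive = pd.DataFrame({
    'tweet_id': [1, 2, 3],
    'doggo': ['doggo', np.nan, np.nan],
    'floofer': [np.nan, np.nan, np.nan],
    'pupper': [np.nan, 'pupper', np.nan],
    'puppo': [np.nan, np.nan, np.nan],
})

# Join the non-missing labels in each row; rows with no label become NaN
stage = archive[stage_cols].apply(lambda row: ','.join(row.dropna()), axis=1)
archive['stage'] = stage.replace('', np.nan)
tidy = archive.drop(columns=stage_cols)

# Left join keeps every archive tweet, even those without a prediction
predictions = pd.DataFrame({'tweet_id': [1, 3], 'p1': ['collie', 'seat_belt']})
combined = tidy.merge(predictions, on='tweet_id', how='left')
```

Joining labels with a comma preserves tweets that mention more than one stage instead of silently picking one.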

