# Wrangle and Analyze Data - WeRateDogs

## Introduction

The goal of this project is to wrangle WeRateDogs Twitter data to produce interesting and trustworthy analyses and visualizations. I will use Python and its libraries to gather data from several sources and in several formats, assess its quality and tidiness, and then clean it. The dataset is the tweet archive of Twitter user @dog_rates, also known as WeRateDogs, an account that rates people's dogs and adds a humorous comment about each one.

In [11]:
# Import the libraries.
import pandas as pd
import requests
import os
import tweepy
import time
import json


#import numpy as np

#from datetime import datetime
#import matplotlib.pyplot as plt
#%matplotlib inline
#import seaborn as sns

# Display entire cells
#pd.set_option('display.max_colwidth', -1)

## Gathering Data

#### 1. Twitter csv file - Archive

In [2]:
# Read Twitter archive csv file into DataFrame 
archive_df = pd.read_csv('twitter-archive-enhanced.csv')
archive_df.head(1)

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,


#### 2. Twitter tsv file - Image Predictions

In [3]:
# Download the image predictions file and save it locally
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

with open('image-predictions.tsv', 'wb') as file:
    file.write(response.content)
    
predictive_df = pd.read_csv('image-predictions.tsv', sep = '\t', encoding = 'utf-8')
predictive_df.head(1)

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True


#### 3. Twitter API

In [4]:
# Create an API object
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)

In [6]:
# Retrieve each tweet's ID, favorite count, and retweet count from the Twitter API as a list of dictionaries
df_list = []
errors = []
for tweet_id in archive_df['tweet_id']:  # avoid shadowing the built-in id()
    try:
        tweet = api.get_status(tweet_id, tweet_mode='extended')
        df_list.append({'tweet_id': str(tweet.id),
                        'favorite_count': int(tweet.favorite_count),
                        'retweet_count': int(tweet.retweet_count)})
    except Exception as e:
        # Report the failure and remember the ID that could not be retrieved
        print(str(tweet_id) + " : " + str(e))
        errors.append(tweet_id)

888202515573088257 : [{'code': 144, 'message': 'No status found with that ID.'}]
873697596434513921 : [{'code': 144, 'message': 'No status found with that ID.'}]
872668790621863937 : [{'code': 144, 'message': 'No status found with that ID.'}]
872261713294495745 : [{'code': 144, 'message': 'No status found with that ID.'}]
869988702071779329 : [{'code': 144, 'message': 'No status found with that ID.'}]
866816280283807744 : [{'code': 144, 'message': 'No status found with that ID.'}]
861769973181624320 : [{'code': 144, 'message': 'No status found with that ID.'}]
845459076796616705 : [{'code': 144, 'message': 'No status found with that ID.'}]
842892208864923648 : [{'code': 144, 'message': 'No status found with that ID.'}]
837012587749474308 : [{'code': 144, 'message': 'No status found with that ID.'}]
827228250799742977 : [{'code': 144, 'message': 'No status found with that ID.'}]
812747805718642688 : [{'code': 144, 'message': 'No status found with that ID.'}]
802247111496568832 : [{'code

747963614829678593 : [{'message': 'Rate limit exceeded', 'code': 88}]
747933425676525569 : [{'message': 'Rate limit exceeded', 'code': 88}]
747885874273214464 : [{'message': 'Rate limit exceeded', 'code': 88}]
747844099428986880 : [{'message': 'Rate limit exceeded', 'code': 88}]
747816857231626240 : [{'message': 'Rate limit exceeded', 'code': 88}]
747651430853525504 : [{'message': 'Rate limit exceeded', 'code': 88}]
747648653817413632 : [{'message': 'Rate limit exceeded', 'code': 88}]
747600769478692864 : [{'message': 'Rate limit exceeded', 'code': 88}]
747594051852075008 : [{'message': 'Rate limit exceeded', 'code': 88}]
747512671126323200 : [{'message': 'Rate limit exceeded', 'code': 88}]
747461612269887489 : [{'message': 'Rate limit exceeded', 'code': 88}]
747439450712596480 : [{'message': 'Rate limit exceeded', 'code': 88}]
747242308580548608 : [{'message': 'Rate limit exceeded', 'code': 88}]
747219827526344708 : [{'message': 'Rate limit exceeded', 'code': 88}]
747204161125646336 :

732005617171337216 : [{'message': 'Rate limit exceeded', 'code': 88}]
731285275100512256 : [{'message': 'Rate limit exceeded', 'code': 88}]
731156023742988288 : [{'message': 'Rate limit exceeded', 'code': 88}]
730924654643314689 : [{'message': 'Rate limit exceeded', 'code': 88}]
730573383004487680 : [{'message': 'Rate limit exceeded', 'code': 88}]
730427201120833536 : [{'message': 'Rate limit exceeded', 'code': 88}]
730211855403241472 : [{'message': 'Rate limit exceeded', 'code': 88}]
730196704625098752 : [{'message': 'Rate limit exceeded', 'code': 88}]
729854734790754305 : [{'message': 'Rate limit exceeded', 'code': 88}]
729838605770891264 : [{'message': 'Rate limit exceeded', 'code': 88}]
729823566028484608 : [{'message': 'Rate limit exceeded', 'code': 88}]
729463711119904772 : [{'message': 'Rate limit exceeded', 'code': 88}]
729113531270991872 : [{'message': 'Rate limit exceeded', 'code': 88}]
728986383096946689 : [{'message': 'Rate limit exceeded', 'code': 88}]
728760639972315136 :

712438159032893441 : [{'message': 'Rate limit exceeded', 'code': 88}]
712309440758808576 : [{'message': 'Rate limit exceeded', 'code': 88}]
712097430750289920 : [{'message': 'Rate limit exceeded', 'code': 88}]
712092745624633345 : [{'message': 'Rate limit exceeded', 'code': 88}]
712085617388212225 : [{'message': 'Rate limit exceeded', 'code': 88}]
712065007010385924 : [{'message': 'Rate limit exceeded', 'code': 88}]
711998809858043904 : [{'message': 'Rate limit exceeded', 'code': 88}]
711968124745228288 : [{'message': 'Rate limit exceeded', 'code': 88}]
711743778164514816 : [{'message': 'Rate limit exceeded', 'code': 88}]
711732680602345472 : [{'message': 'Rate limit exceeded', 'code': 88}]
711694788429553666 : [{'message': 'Rate limit exceeded', 'code': 88}]
711652651650457602 : [{'message': 'Rate limit exceeded', 'code': 88}]
711363825979756544 : [{'message': 'Rate limit exceeded', 'code': 88}]
711306686208872448 : [{'message': 'Rate limit exceeded', 'code': 88}]
711008018775851008 :

703631701117943808 : [{'message': 'Rate limit exceeded', 'code': 88}]
703611486317502464 : [{'message': 'Rate limit exceeded', 'code': 88}]
703425003149250560 : [{'message': 'Rate limit exceeded', 'code': 88}]
703407252292673536 : [{'message': 'Rate limit exceeded', 'code': 88}]
703382836347330562 : [{'message': 'Rate limit exceeded', 'code': 88}]
703356393781329922 : [{'message': 'Rate limit exceeded', 'code': 88}]
703268521220972544 : [{'message': 'Rate limit exceeded', 'code': 88}]
703079050210877440 : [{'message': 'Rate limit exceeded', 'code': 88}]
703041949650034688 : [{'message': 'Rate limit exceeded', 'code': 88}]
702932127499816960 : [{'message': 'Rate limit exceeded', 'code': 88}]
702899151802126337 : [{'message': 'Rate limit exceeded', 'code': 88}]
702684942141153280 : [{'message': 'Rate limit exceeded', 'code': 88}]
702671118226825216 : [{'message': 'Rate limit exceeded', 'code': 88}]
702598099714314240 : [{'message': 'Rate limit exceeded', 'code': 88}]
702539513671897089 :

693942351086120961 : [{'message': 'Rate limit exceeded', 'code': 88}]
693647888581312512 : [{'message': 'Rate limit exceeded', 'code': 88}]
693644216740769793 : [{'message': 'Rate limit exceeded', 'code': 88}]
693642232151285760 : [{'message': 'Rate limit exceeded', 'code': 88}]
693629975228977152 : [{'message': 'Rate limit exceeded', 'code': 88}]
693622659251335168 : [{'message': 'Rate limit exceeded', 'code': 88}]
693590843962331137 : [{'message': 'Rate limit exceeded', 'code': 88}]
693582294167244802 : [{'message': 'Rate limit exceeded', 'code': 88}]
693486665285931008 : [{'message': 'Rate limit exceeded', 'code': 88}]
693280720173801472 : [{'message': 'Rate limit exceeded', 'code': 88}]
693267061318012928 : [{'message': 'Rate limit exceeded', 'code': 88}]
693262851218264065 : [{'message': 'Rate limit exceeded', 'code': 88}]
693231807727280129 : [{'message': 'Rate limit exceeded', 'code': 88}]
693155686491000832 : [{'message': 'Rate limit exceeded', 'code': 88}]
693109034023534592 :

686618349602762752 : [{'message': 'Rate limit exceeded', 'code': 88}]
686606069955735556 : [{'message': 'Rate limit exceeded', 'code': 88}]
686394059078897668 : [{'message': 'Rate limit exceeded', 'code': 88}]
686386521809772549 : [{'message': 'Rate limit exceeded', 'code': 88}]
686377065986265092 : [{'message': 'Rate limit exceeded', 'code': 88}]
686358356425093120 : [{'message': 'Rate limit exceeded', 'code': 88}]
686286779679375361 : [{'message': 'Rate limit exceeded', 'code': 88}]
686050296934563840 : [{'message': 'Rate limit exceeded', 'code': 88}]
686035780142297088 : [{'message': 'Rate limit exceeded', 'code': 88}]
686034024800862208 : [{'message': 'Rate limit exceeded', 'code': 88}]
686007916130873345 : [{'message': 'Rate limit exceeded', 'code': 88}]
686003207160610816 : [{'message': 'Rate limit exceeded', 'code': 88}]
685973236358713344 : [{'message': 'Rate limit exceeded', 'code': 88}]
685943807276412928 : [{'message': 'Rate limit exceeded', 'code': 88}]
685906723014619143 :

680798457301471234 : [{'message': 'Rate limit exceeded', 'code': 88}]
680609293079592961 : [{'message': 'Rate limit exceeded', 'code': 88}]
680583894916304897 : [{'message': 'Rate limit exceeded', 'code': 88}]
680497766108381184 : [{'message': 'Rate limit exceeded', 'code': 88}]
680494726643068929 : [{'message': 'Rate limit exceeded', 'code': 88}]
680473011644985345 : [{'message': 'Rate limit exceeded', 'code': 88}]
680440374763077632 : [{'message': 'Rate limit exceeded', 'code': 88}]
680221482581123072 : [{'message': 'Rate limit exceeded', 'code': 88}]
680206703334408192 : [{'message': 'Rate limit exceeded', 'code': 88}]
680191257256136705 : [{'message': 'Rate limit exceeded', 'code': 88}]
680176173301628928 : [{'message': 'Rate limit exceeded', 'code': 88}]
680161097740095489 : [{'message': 'Rate limit exceeded', 'code': 88}]
680145970311643136 : [{'message': 'Rate limit exceeded', 'code': 88}]
680130881361686529 : [{'message': 'Rate limit exceeded', 'code': 88}]
680115823365742593 :

676440007570247681 : [{'message': 'Rate limit exceeded', 'code': 88}]
676430933382295552 : [{'message': 'Rate limit exceeded', 'code': 88}]
676263575653122048 : [{'message': 'Rate limit exceeded', 'code': 88}]
676237365392908289 : [{'message': 'Rate limit exceeded', 'code': 88}]
676219687039057920 : [{'message': 'Rate limit exceeded', 'code': 88}]
676215927814406144 : [{'message': 'Rate limit exceeded', 'code': 88}]
676191832485810177 : [{'message': 'Rate limit exceeded', 'code': 88}]
676146341966438401 : [{'message': 'Rate limit exceeded', 'code': 88}]
676121918416756736 : [{'message': 'Rate limit exceeded', 'code': 88}]
676101918813499392 : [{'message': 'Rate limit exceeded', 'code': 88}]
676098748976615425 : [{'message': 'Rate limit exceeded', 'code': 88}]
676089483918516224 : [{'message': 'Rate limit exceeded', 'code': 88}]
675898130735476737 : [{'message': 'Rate limit exceeded', 'code': 88}]
675891555769696257 : [{'message': 'Rate limit exceeded', 'code': 88}]
675888385639251968 :

673708611235921920 : [{'message': 'Rate limit exceeded', 'code': 88}]
673707060090052608 : [{'message': 'Rate limit exceeded', 'code': 88}]
673705679337693185 : [{'message': 'Rate limit exceeded', 'code': 88}]
673700254269775872 : [{'message': 'Rate limit exceeded', 'code': 88}]
673697980713705472 : [{'message': 'Rate limit exceeded', 'code': 88}]
673689733134946305 : [{'message': 'Rate limit exceeded', 'code': 88}]
673688752737402881 : [{'message': 'Rate limit exceeded', 'code': 88}]
673686845050527744 : [{'message': 'Rate limit exceeded', 'code': 88}]
673680198160809984 : [{'message': 'Rate limit exceeded', 'code': 88}]
673662677122719744 : [{'message': 'Rate limit exceeded', 'code': 88}]
673656262056419329 : [{'message': 'Rate limit exceeded', 'code': 88}]
673636718965334016 : [{'message': 'Rate limit exceeded', 'code': 88}]
673612854080196609 : [{'message': 'Rate limit exceeded', 'code': 88}]
673583129559498752 : [{'message': 'Rate limit exceeded', 'code': 88}]
673580926094458881 :

671163268581498880 : [{'message': 'Rate limit exceeded', 'code': 88}]
671159727754231808 : [{'message': 'Rate limit exceeded', 'code': 88}]
671154572044468225 : [{'message': 'Rate limit exceeded', 'code': 88}]
671151324042559489 : [{'message': 'Rate limit exceeded', 'code': 88}]
671147085991960577 : [{'message': 'Rate limit exceeded', 'code': 88}]
671141549288370177 : [{'message': 'Rate limit exceeded', 'code': 88}]
671138694582165504 : [{'message': 'Rate limit exceeded', 'code': 88}]
671134062904504320 : [{'message': 'Rate limit exceeded', 'code': 88}]
671122204919246848 : [{'message': 'Rate limit exceeded', 'code': 88}]
671115716440031232 : [{'message': 'Rate limit exceeded', 'code': 88}]
671109016219725825 : [{'message': 'Rate limit exceeded', 'code': 88}]
670995969505435648 : [{'message': 'Rate limit exceeded', 'code': 88}]
670842764863651840 : [{'message': 'Rate limit exceeded', 'code': 88}]
670840546554966016 : [{'message': 'Rate limit exceeded', 'code': 88}]
670838202509447168 :

669000397445533696 : [{'message': 'Rate limit exceeded', 'code': 88}]
668994913074286592 : [{'message': 'Rate limit exceeded', 'code': 88}]
668992363537309700 : [{'message': 'Rate limit exceeded', 'code': 88}]
668989615043424256 : [{'message': 'Rate limit exceeded', 'code': 88}]
668988183816871936 : [{'message': 'Rate limit exceeded', 'code': 88}]
668986018524233728 : [{'message': 'Rate limit exceeded', 'code': 88}]
668981893510119424 : [{'message': 'Rate limit exceeded', 'code': 88}]
668979806671884288 : [{'message': 'Rate limit exceeded', 'code': 88}]
668975677807423489 : [{'message': 'Rate limit exceeded', 'code': 88}]
668967877119254528 : [{'message': 'Rate limit exceeded', 'code': 88}]
668960084974809088 : [{'message': 'Rate limit exceeded', 'code': 88}]
668955713004314625 : [{'message': 'Rate limit exceeded', 'code': 88}]
668932921458302977 : [{'message': 'Rate limit exceeded', 'code': 88}]
668902994700836864 : [{'message': 'Rate limit exceeded', 'code': 88}]
668892474547511297 :

667070482143944705 : [{'message': 'Rate limit exceeded', 'code': 88}]
667065535570550784 : [{'message': 'Rate limit exceeded', 'code': 88}]
667062181243039745 : [{'message': 'Rate limit exceeded', 'code': 88}]
667044094246576128 : [{'message': 'Rate limit exceeded', 'code': 88}]
667012601033924608 : [{'message': 'Rate limit exceeded', 'code': 88}]
666996132027977728 : [{'message': 'Rate limit exceeded', 'code': 88}]
666983947667116034 : [{'message': 'Rate limit exceeded', 'code': 88}]
666837028449972224 : [{'message': 'Rate limit exceeded', 'code': 88}]
666835007768551424 : [{'message': 'Rate limit exceeded', 'code': 88}]
666826780179869698 : [{'message': 'Rate limit exceeded', 'code': 88}]
666817836334096384 : [{'message': 'Rate limit exceeded', 'code': 88}]
666804364988780544 : [{'message': 'Rate limit exceeded', 'code': 88}]
666786068205871104 : [{'message': 'Rate limit exceeded', 'code': 88}]
666781792255496192 : [{'message': 'Rate limit exceeded', 'code': 88}]
666776908487630848 :

In [8]:
# Count the tweet IDs that could not be retrieved
len(errors)

1471
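
The log above shows two kinds of failure: code 144 ("No status found with that ID"), which is permanent, and code 88 ("Rate limit exceeded"), which is temporary. A minimal sketch of a loop that waits and retries only the rate-limited calls follows. `FetchError`, `fetch_all`, and the `fetch` callable are hypothetical names introduced for illustration, and the 900-second wait assumes a 15-minute rate-limit window; none of this is the code that produced the output above.

```python
import time

RATE_LIMIT_CODE = 88   # 'Rate limit exceeded' in the log above
NOT_FOUND_CODE = 144   # 'No status found with that ID.' in the log above


class FetchError(Exception):
    """Hypothetical stand-in for an API error that carries a numeric code."""

    def __init__(self, code, message=""):
        super().__init__(message)
        self.code = code


def fetch_all(ids, fetch, max_retries=3, wait_seconds=900, sleep=time.sleep):
    """Call fetch(tweet_id) for every ID, retrying only rate-limited calls.

    Returns (results, failed), where failed is a list of (tweet_id, code)
    pairs for IDs that could not be retrieved.
    """
    results, failed = [], []
    for tweet_id in ids:
        for attempt in range(max_retries + 1):
            try:
                results.append(fetch(tweet_id))
                break
            except FetchError as err:
                if err.code == RATE_LIMIT_CODE and attempt < max_retries:
                    sleep(wait_seconds)  # assumed window length; wait, then retry
                    continue
                failed.append((tweet_id, err.code))  # permanent or retries exhausted
                break
    return results, failed
```

With tweepy, `fetch` could wrap `api.get_status` and translate its exceptions into `FetchError`; that mapping is left out here. tweepy's `API` constructor also accepts `wait_on_rate_limit=True`, which may make a manual loop like this unnecessary.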

In [12]:
# Write the list of dictionaries to a JSON file
with open('tweet_json.txt', 'w') as outfile:
    json.dump(df_list, outfile)

In [13]:
# Read the JSON file into a DataFrame
with open('tweet_json.txt', 'r') as file:
    api_df = pd.DataFrame(json.load(file), columns=['tweet_id', 'favorite_count', 'retweet_count'])
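
The write and read steps above can be checked with a small round trip. The sketch below assumes a hypothetical helper, `round_trip`, and a temporary file path; the sample record reuses the first row of the API data shown later in this notebook. It also illustrates why keeping `tweet_id` as a string is a reasonable choice: an 18-digit ID does not survive a conversion to floating point.

```python
import json
import os
import tempfile


def round_trip(records, path):
    """Write records to path as JSON, then read them back."""
    with open(path, 'w') as outfile:
        json.dump(records, outfile)
    with open(path) as infile:
        return json.load(infile)


# One record in the same shape as the entries of df_list
records = [{'tweet_id': '892420643555336193',
            'favorite_count': 37694,
            'retweet_count': 8215}]

# Hypothetical location; the notebook itself writes to 'tweet_json.txt'
path = os.path.join(tempfile.mkdtemp(), 'tweet_json.txt')
loaded = round_trip(records, path)

# String IDs come back unchanged, while a float cannot hold this ID exactly
print(loaded == records)
print(int(float(892420643555336193)) == 892420643555336193)
```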

## Assessing Data

### 1. Visual assessment

#### Twitter csv file - Archive

In [5]:
# Display the Archive csv file
archive_df

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
5,891087950875897856,,,2017-07-29 00:08:17 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a majestic great white breaching ...,,,,https://twitter.com/dog_rates/status/891087950...,13,10,,,,,
6,890971913173991426,,,2017-07-28 16:27:12 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Meet Jax. He enjoys ice cream so much he gets ...,,,,"https://gofundme.com/ydvmve-surgery-for-jax,ht...",13,10,Jax,,,,
7,890729181411237888,,,2017-07-28 00:22:40 +0000,"<a href=""http://twitter.com/download/iphone"" r...",When you watch your owner call another dog a g...,,,,https://twitter.com/dog_rates/status/890729181...,13,10,,,,,
8,890609185150312448,,,2017-07-27 16:25:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Zoey. She doesn't want to be one of th...,,,,https://twitter.com/dog_rates/status/890609185...,13,10,Zoey,,,,
9,890240255349198849,,,2017-07-26 15:59:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Cassie. She is a college pup. Studying...,,,,https://twitter.com/dog_rates/status/890240255...,14,10,Cassie,doggo,,,


In [7]:
archive_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
tweet_id                      2356 non-null int64
in_reply_to_status_id         78 non-null float64
in_reply_to_user_id           78 non-null float64
timestamp                     2356 non-null object
source                        2356 non-null object
text                          2356 non-null object
retweeted_status_id           181 non-null float64
retweeted_status_user_id      181 non-null float64
retweeted_status_timestamp    181 non-null object
expanded_urls                 2297 non-null object
rating_numerator              2356 non-null int64
rating_denominator            2356 non-null int64
name                          2356 non-null object
doggo                         2356 non-null object
floofer                       2356 non-null object
pupper                        2356 non-null object
puppo                         2356 non-null object
dtypes: float64(4), int64(3), ob

In [8]:
archive_df.describe()

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,retweeted_status_id,retweeted_status_user_id,rating_numerator,rating_denominator
count,2356.0,78.0,78.0,181.0,181.0,2356.0,2356.0
mean,7.427716e+17,7.455079e+17,2.014171e+16,7.7204e+17,1.241698e+16,13.126486,10.455433
std,6.856705e+16,7.582492e+16,1.252797e+17,6.236928e+16,9.599254e+16,45.876648,6.745237
min,6.660209e+17,6.658147e+17,11856340.0,6.661041e+17,783214.0,0.0,0.0
25%,6.783989e+17,6.757419e+17,308637400.0,7.186315e+17,4196984000.0,10.0,10.0
50%,7.196279e+17,7.038708e+17,4196984000.0,7.804657e+17,4196984000.0,11.0,10.0
75%,7.993373e+17,8.257804e+17,4196984000.0,8.203146e+17,4196984000.0,12.0,10.0
max,8.924206e+17,8.862664e+17,8.405479e+17,8.87474e+17,7.874618e+17,1776.0,170.0
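
The summary above shows a `rating_denominator` that ranges from 0 to 170 and a `rating_numerator` as high as 1776, so some ratings do not follow the usual "n/10" pattern. A minimal sketch of how such rows could be flagged is given below. The function name, the numerator threshold of 20, and the four sample rows (made-up IDs paired with extreme values from the summary) are assumptions for illustration only.

```python
import pandas as pd


def flag_unusual_ratings(df, expected_denominator=10, max_numerator=20):
    """Return rows whose rating deviates from the usual 'n/10' pattern."""
    unusual = (
        (df['rating_denominator'] != expected_denominator)
        | (df['rating_numerator'] > max_numerator)
    )
    return df[unusual]


# Illustrative rows only; the tweet IDs are placeholders, not real tweets
sample = pd.DataFrame({
    'tweet_id': [1, 2, 3, 4],
    'rating_numerator': [13, 1776, 12, 11],
    'rating_denominator': [10, 10, 170, 0],
})

flagged = flag_unusual_ratings(sample)
print(flagged)
```

On the real data this would be called as `flag_unusual_ratings(archive_df)`, and the flagged rows would then be inspected by reading the tweet text.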


#### Twitter tsv file - Image Predictions

In [6]:
# Display the Image Predictions File
predictive_df

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.072010,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.596461,True,malinois,0.138584,True,bloodhound,0.116197,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.408143,True,redbone,0.360687,True,miniature_pinscher,0.222752,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.560311,True,Rottweiler,0.243682,True,Doberman,0.154629,True
5,666050758794694657,https://pbs.twimg.com/media/CT5Jof1WUAEuVxN.jpg,1,Bernese_mountain_dog,0.651137,True,English_springer,0.263788,True,Greater_Swiss_Mountain_dog,0.016199,True
6,666051853826850816,https://pbs.twimg.com/media/CT5KoJ1WoAAJash.jpg,1,box_turtle,0.933012,False,mud_turtle,0.045885,False,terrapin,0.017885,False
7,666055525042405380,https://pbs.twimg.com/media/CT5N9tpXIAAifs1.jpg,1,chow,0.692517,True,Tibetan_mastiff,0.058279,True,fur_coat,0.054449,False
8,666057090499244032,https://pbs.twimg.com/media/CT5PY90WoAAQGLo.jpg,1,shopping_cart,0.962465,False,shopping_basket,0.014594,False,golden_retriever,0.007959,True
9,666058600524156928,https://pbs.twimg.com/media/CT5Qw94XAAA_2dP.jpg,1,miniature_poodle,0.201493,True,komondor,0.192305,True,soft-coated_wheaten_terrier,0.082086,True


In [9]:
predictive_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
tweet_id    2075 non-null int64
jpg_url     2075 non-null object
img_num     2075 non-null int64
p1          2075 non-null object
p1_conf     2075 non-null float64
p1_dog      2075 non-null bool
p2          2075 non-null object
p2_conf     2075 non-null float64
p2_dog      2075 non-null bool
p3          2075 non-null object
p3_conf     2075 non-null float64
p3_dog      2075 non-null bool
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


In [11]:
predictive_df.describe()

Unnamed: 0,tweet_id,img_num,p1_conf,p2_conf,p3_conf
count,2075.0,2075.0,2075.0,2075.0,2075.0
mean,7.384514e+17,1.203855,0.594548,0.1345886,0.06032417
std,6.785203e+16,0.561875,0.271174,0.1006657,0.05090593
min,6.660209e+17,1.0,0.044333,1.0113e-08,1.74017e-10
25%,6.764835e+17,1.0,0.364412,0.05388625,0.0162224
50%,7.119988e+17,1.0,0.58823,0.118181,0.0494438
75%,7.932034e+17,1.0,0.843855,0.1955655,0.09180755
max,8.924206e+17,4.0,1.0,0.488014,0.273419


#### Twitter API

In [15]:
# Display the Twitter API DataFrame
api_df

Unnamed: 0,tweet_id,favorite_count,retweet_count
0,892420643555336193,37694,8215
1,892177421306343426,32379,6075
2,891815181378084864,24380,4017
3,891689557279858688,41011,8362
4,891327558926688256,39208,9074
5,891087950875897856,19715,3007
6,890971913173991426,11529,1989
7,890729181411237888,63583,18252
8,890609185150312448,27095,4133
9,890240255349198849,31076,7135


In [17]:
api_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 885 entries, 0 to 884
Data columns (total 3 columns):
tweet_id          885 non-null object
favorite_count    885 non-null int64
retweet_count     885 non-null int64
dtypes: int64(2), object(1)
memory usage: 20.8+ KB
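
The summary above shows `tweet_id` stored as `object` (strings) in `api_df`, while the archive stores it as `int64`, and only 885 of the 2356 archive tweets were retrieved. Before the tables are compared or merged, the key columns need a common type. The sketch below uses a hypothetical helper, `missing_from_api`, that casts both keys to strings and uses a left merge with `indicator=True` to list archive IDs that have no API record. The three sample IDs are taken from the outputs earlier in this notebook.

```python
import pandas as pd


def missing_from_api(archive, api):
    """Return archive tweet IDs (as strings) that have no row in the API table."""
    left = archive[['tweet_id']].astype({'tweet_id': str})
    right = api[['tweet_id']].astype({'tweet_id': str})
    merged = left.merge(right, on='tweet_id', how='left', indicator=True)
    return merged.loc[merged['_merge'] == 'left_only', 'tweet_id'].tolist()


# Small stand-ins for archive_df (integer IDs) and api_df (string IDs)
archive_sample = pd.DataFrame({'tweet_id': [892420643555336193,
                                            892177421306343426,
                                            888202515573088257]})
api_sample = pd.DataFrame({'tweet_id': ['892420643555336193',
                                        '892177421306343426']})

missing = missing_from_api(archive_sample, api_sample)
print(missing)
```

Casting to strings rather than to integers keeps the IDs safe from any later float conversion, which cannot represent 18-digit values exactly.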


In [18]:
api_df.describe()

Unnamed: 0,favorite_count,retweet_count
count,885.0,885.0
mean,14405.610169,5080.988701
std,15149.543837,5703.863232
min,0.0,1.0
25%,5457.0,2094.0
50%,11019.0,3463.0
75%,20340.0,5947.0
max,139047.0,60376.0


### 2. Programmatic assessment

## Clean

### Issue 1: 

#### Define

#### Code

#### Test

## Clean