# Project: Wrangling WeRateDogs' Enhanced Twitter Archive

### Scope
---------------------------------------------------------
WeRateDogs (hereafter WRD) is a Twitter account that rates people's dogs with a humorous comment about each dog. These ratings almost always have a denominator of 10. The numerators, though? Almost always greater than 10: 11/10, 12/10, 13/10, etc. Why? Because "they're good dogs Brent." WRD has over 4 million followers and has received international media coverage.

This project applies the three steps of data wrangling (gathering, assessing, and cleaning) to the WRD Twitter archive. WRD downloaded its Twitter archive and emailed it to Udacity exclusively for use in this project. The archive contains basic tweet data (tweet ID, timestamp, text, etc.) for all 5000+ of their tweets as they stood on August 1, 2017. The enhanced version loaded below has 2,356 rows.

### Modules used in this notebook:
* Pandas
* NumPy
* Matplotlib
* Seaborn
* Tweepy
* Requests
* Plotly
* Datetime
* Pydash
* python-dotenv

In [4]:
%pip install python-dotenv tweepy pydash

Collecting python-dotenv
  Downloading python_dotenv-0.20.0-py3-none-any.whl (17 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-0.20.0
Note: you may need to restart the kernel to use updated packages.

In [5]:
import requests
import tweepy as twpy
from dotenv import load_dotenv
import timeit
import datetime as dt
import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os

from pydash import at
import plotly.express as px

%matplotlib inline

### Gather

#### This project involves gathering three separate datasets from different sources, each obtained with a different method as described below.

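The custom function below wraps Pandas' `read_csv()` method and is used to read each dataset. Its `header` and `names` parameters are passed straight through: with `names=None`, the row given by `header` supplies the column names; with a list of `names` and `header=0`, that list replaces the file's header row; with `names` and `header=None`, the file is treated as having no header row.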

In [6]:
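def open_set(csv, sep=',', header=0, names=None):
    # Thin wrapper around pd.read_csv(); None (rather than a mutable []) is the safe default for names
    df = pd.read_csv(csv, low_memory=False, sep=sep, names=names, header=header)
    
    return df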
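A quick way to see how `header` and `names` interact is to read the same in-memory CSV three ways. This sketch uses `io.StringIO` in place of a file path and a self-contained copy of `open_set()` with `names=None` as its default (a mutable `[]` default is a common Python pitfall, and `read_csv()` accepts `None` to mean "no explicit names"). The two-column sample is built from values in the archive; the renamed columns `id` and `numerator` are illustrative only.

```python
import io
import pandas as pd

def open_set(csv, sep=',', header=0, names=None):
    # Self-contained copy of the helper above
    return pd.read_csv(csv, low_memory=False, sep=sep, names=names, header=header)

raw = "tweet_id,rating_numerator\n892420643555336193,13\n892177421306343426,13\n"

# header=0, names=None: the first row supplies the column names
a = open_set(io.StringIO(raw))

# header=0 with names: the file's header row is discarded and replaced by `names`
b = open_set(io.StringIO(raw), names=['id', 'numerator'])

# header=None with names: every row, including the original header, is read as data
c = open_set(io.StringIO(raw), header=None, names=['id', 'numerator'])

print(list(a.columns), len(a))  # ['tweet_id', 'rating_numerator'] 2
print(list(b.columns), len(b))  # ['id', 'numerator'] 2
print(list(c.columns), len(c))  # ['id', 'numerator'] 3
```

The third case is the one to use for a file written without a header row; using it on a file that does have one silently turns the header into a data row.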

**`twitter-archive-enhanced` table**

The WRD Twitter archive data was provided by Udacity and **downloaded manually through the Chrome browser.**

In [7]:
df_tw_arch = open_set('data/twitter-archive-enhanced.csv', header=0, names=None)
df_tw_arch

Unnamed: 0,tweet_id,in_reply_to_status_id,in_reply_to_user_id,timestamp,source,text,retweeted_status_id,retweeted_status_user_id,retweeted_status_timestamp,expanded_urls,rating_numerator,rating_denominator,name,doggo,floofer,pupper,puppo
0,892420643555336193,,,2017-08-01 16:23:56 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Phineas. He's a mystical boy. Only eve...,,,,https://twitter.com/dog_rates/status/892420643...,13,10,Phineas,,,,
1,892177421306343426,,,2017-08-01 00:17:27 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Tilly. She's just checking pup on you....,,,,https://twitter.com/dog_rates/status/892177421...,13,10,Tilly,,,,
2,891815181378084864,,,2017-07-31 00:18:03 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Archie. He is a rare Norwegian Pouncin...,,,,https://twitter.com/dog_rates/status/891815181...,12,10,Archie,,,,
3,891689557279858688,,,2017-07-30 15:58:51 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Darla. She commenced a snooze mid meal...,,,,https://twitter.com/dog_rates/status/891689557...,13,10,Darla,,,,
4,891327558926688256,,,2017-07-29 16:00:24 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is Franklin. He would like you to stop ca...,,,,https://twitter.com/dog_rates/status/891327558...,12,10,Franklin,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2351,666049248165822465,,,2015-11-16 00:24:50 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here we have a 1949 1st generation vulpix. Enj...,,,,https://twitter.com/dog_rates/status/666049248...,5,10,,,,,
2352,666044226329800704,,,2015-11-16 00:04:52 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a purebred Piers Morgan. Loves to Netf...,,,,https://twitter.com/dog_rates/status/666044226...,6,10,a,,,,
2353,666033412701032449,,,2015-11-15 23:21:54 +0000,"<a href=""http://twitter.com/download/iphone"" r...",Here is a very happy pup. Big fan of well-main...,,,,https://twitter.com/dog_rates/status/666033412...,9,10,a,,,,
2354,666029285002620928,,,2015-11-15 23:05:30 +0000,"<a href=""http://twitter.com/download/iphone"" r...",This is a western brown Mitsubishi terrier. Up...,,,,https://twitter.com/dog_rates/status/666029285...,7,10,a,,,,



In [8]:
df_tw_arch.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2356 entries, 0 to 2355
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   tweet_id                    2356 non-null   int64  
 1   in_reply_to_status_id       78 non-null     float64
 2   in_reply_to_user_id         78 non-null     float64
 3   timestamp                   2356 non-null   object 
 4   source                      2356 non-null   object 
 5   text                        2356 non-null   object 
 6   retweeted_status_id         181 non-null    float64
 7   retweeted_status_user_id    181 non-null    float64
 8   retweeted_status_timestamp  181 non-null    object 
 9   expanded_urls               2297 non-null   object 
 10  rating_numerator            2356 non-null   int64  
 11  rating_denominator          2356 non-null   int64  
 12  name                        2356 non-null   object 
 13  doggo                       2356 

In [9]:
df_tw_arch.shape

(2356, 17)

**`image-predictions` table**

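The `image-predictions.tsv` file lists, for each tweet image, what breed of dog (or other object or animal) a neural network predicts it contains. The `p1`, `p2` and `p3` columns hold the top three predictions, with `_conf` giving each prediction's confidence and `_dog` whether it is a dog breed. The file is hosted on Udacity's servers and will be **downloaded programmatically using the Requests library**.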

The content returned from the URL will be written to a file on the local machine using Python's built-in **open()** function.

In [16]:
url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'

* Use **requests.get()** to obtain the data from the URL.
* Write the content into a new file named `image-predictions.tsv`. (Note that the file is opened in `wb` (write-binary) mode because `r.content` returns bytes.)

In [20]:
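# Download the file and stop with an error if the request failed (e.g. a 404)
r = requests.get(url)
r.raise_for_status()

# r.content is bytes, so the file is opened in binary write mode ('wb').
# It is saved under data/, where the read_csv() call below looks for it.
os.makedirs('data', exist_ok=True)
with open('data/image-predictions.tsv', 'wb') as f:
    f.write(r.content)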
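Why `'wb'` matters: `r.content` is a `bytes` object, and a file opened in text mode only accepts `str`. The sketch below runs offline by using a literal bytes value as a stand-in for `requests.get(url).content` (an assumption made only so the example needs no network access), writes it in binary mode, and shows that the same write in text mode raises `TypeError`. Using `r.text` with mode `'w'` would also work, but then the bytes are decoded with whatever encoding Requests infers for the response.

```python
import os
import tempfile

# Stand-in for requests.get(url).content, which is always bytes.
# This literal is illustrative only: a tab-separated header row.
content = b"tweet_id\tjpg_url\timg_num\n"

folder = tempfile.mkdtemp()
path = os.path.join(folder, "image-predictions.tsv")

# Binary write mode stores the bytes unchanged
with open(path, "wb") as f:
    f.write(content)

# Text write mode expects str, so passing bytes raises TypeError
text_mode_error = None
try:
    with open(os.path.join(folder, "text-mode.tsv"), "w", encoding="utf-8") as f:
        f.write(content)
except TypeError as err:
    text_mode_error = err

# Reading back in binary mode returns exactly what was written
with open(path, "rb") as f:
    round_trip = f.read()

print(round_trip == content)           # True
print(type(text_mode_error).__name__)  # TypeError
```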

* TSV stands for tab-separated values, so tabs (`\t`) are passed as the separator to Pandas' `read_csv()` method.

In [299]:
df_image_pred = open_set('data/image-predictions.tsv', sep='\t', names=None, header=0)
df_image_pred.head()

Unnamed: 0,tweet_id,jpg_url,img_num,p1,p1_conf,p1_dog,p2,p2_conf,p2_dog,p3,p3_conf,p3_dog
0,666020888022790149,https://pbs.twimg.com/media/CT4udn0WwAA0aMy.jpg,1,Welsh_springer_spaniel,0.465074,True,collie,0.156665,True,Shetland_sheepdog,0.061428,True
1,666029285002620928,https://pbs.twimg.com/media/CT42GRgUYAA5iDo.jpg,1,redbone,0.506826,True,miniature_pinscher,0.074192,True,Rhodesian_ridgeback,0.07201,True
2,666033412701032449,https://pbs.twimg.com/media/CT4521TWwAEvMyu.jpg,1,German_shepherd,0.596461,True,malinois,0.138584,True,bloodhound,0.116197,True
3,666044226329800704,https://pbs.twimg.com/media/CT5Dr8HUEAA-lEu.jpg,1,Rhodesian_ridgeback,0.408143,True,redbone,0.360687,True,miniature_pinscher,0.222752,True
4,666049248165822465,https://pbs.twimg.com/media/CT5IQmsXIAAKY4A.jpg,1,miniature_pinscher,0.560311,True,Rottweiler,0.243682,True,Doberman,0.154629,True
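To see why the separator matters, the sketch below parses two rows copied from the `head()` output above (trimmed to four columns) once with `sep='\t'` and once with the default comma. With the default, Pandas finds no commas and keeps each line as a single column. It also shows that `read_csv()` infers the `True`/`False` strings in `p1_dog` as booleans, consistent with the `bool` dtypes that `df_image_pred.info()` reports.

```python
import io
import pandas as pd

# Two rows copied from the head() output above, trimmed to four columns
tsv_text = (
    "tweet_id\tp1\tp1_conf\tp1_dog\n"
    "666020888022790149\tWelsh_springer_spaniel\t0.465074\tTrue\n"
    "666029285002620928\tredbone\t0.506826\tTrue\n"
)

# Tab separator: one column per field
df_tab = pd.read_csv(io.StringIO(tsv_text), sep='\t')

# Default comma separator: no commas are found, so each line stays a single column
df_comma = pd.read_csv(io.StringIO(tsv_text))

print(df_tab.shape)    # (2, 4)
print(df_comma.shape)  # (2, 1)
print(df_tab.dtypes)   # p1_dog is parsed as bool, p1_conf as float64
```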


In [30]:
df_image_pred.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2075 entries, 0 to 2074
Data columns (total 12 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   tweet_id  2075 non-null   int64  
 1   jpg_url   2075 non-null   object 
 2   img_num   2075 non-null   int64  
 3   p1        2075 non-null   object 
 4   p1_conf   2075 non-null   float64
 5   p1_dog    2075 non-null   bool   
 6   p2        2075 non-null   object 
 7   p2_conf   2075 non-null   float64
 8   p2_dog    2075 non-null   bool   
 9   p3        2075 non-null   object 
 10  p3_conf   2075 non-null   float64
 11  p3_dog    2075 non-null   bool   
dtypes: bool(3), float64(3), int64(2), object(4)
memory usage: 152.1+ KB


In [32]:
df_image_pred.shape

(2075, 12)

In [151]:
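# Exploratory cell: fetch the full JSON payload of each tweet in `ids`.
# Assumes `api` (an authenticated twpy.API object) and `ids` (a list of tweet IDs) already exist.
new = []
for tw_id in ids:
    new.append(api.get_status(tw_id, tweet_mode='extended')._json)
print(new)
type(new)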

[{'created_at': 'Tue Aug 01 16:23:56 +0000 2017', 'id': 892420643555336193, 'id_str': '892420643555336193', 'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU", 'truncated': False, 'display_text_range': [0, 85], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 892420639486877696, 'id_str': '892420639486877696', 'indices': [86, 109], 'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg', 'url': 'https://t.co/MgUWQ76dJU', 'display_url': 'pic.twitter.com/MgUWQ76dJU', 'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'medium': {'w': 540, 'h': 528, 'resize': 'fit'}, 'small': {'w': 540, 'h': 528, 'resize': 'fit'}, 'large': {'w': 540, 'h': 528, 'resize': 'fit'}}}]}, 'extended_entities': {'medi

list

In [156]:
new

[{'created_at': 'Tue Aug 01 16:23:56 +0000 2017',
  'id': 892420643555336193,
  'id_str': '892420643555336193',
  'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU",
  'truncated': False,
  'display_text_range': [0, 85],
  'entities': {'hashtags': [],
   'symbols': [],
   'user_mentions': [],
   'urls': [],
   'media': [{'id': 892420639486877696,
     'id_str': '892420639486877696',
     'indices': [86, 109],
     'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
     'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
     'url': 'https://t.co/MgUWQ76dJU',
     'display_url': 'pic.twitter.com/MgUWQ76dJU',
     'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
     'type': 'photo',
     'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
      'medium': {'w': 540, 'h': 528, 'resize': 'fit'},
      'small': {'w': 540, 'h': 528, 'resize': 'fit'}

In [157]:
new[0]

{'created_at': 'Tue Aug 01 16:23:56 +0000 2017',
 'id': 892420643555336193,
 'id_str': '892420643555336193',
 'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU",
 'truncated': False,
 'display_text_range': [0, 85],
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [],
  'urls': [],
  'media': [{'id': 892420639486877696,
    'id_str': '892420639486877696',
    'indices': [86, 109],
    'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'url': 'https://t.co/MgUWQ76dJU',
    'display_url': 'pic.twitter.com/MgUWQ76dJU',
    'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
    'type': 'photo',
    'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
     'medium': {'w': 540, 'h': 528, 'resize': 'fit'},
     'small': {'w': 540, 'h': 528, 'resize': 'fit'},
     'large': {'w': 

In [155]:
new[0]['created_at']

'Tue Aug 01 16:23:56 +0000 2017'

In [93]:
new[0]['geo']

In [95]:
new[0]['retweet_count']

7010

In [97]:
new[0]['favorite_count']

33829

In [98]:
new[0]['lang']

'en'

In [132]:
my_dict = new[0]
my_dict

{'created_at': 'Tue Aug 01 16:23:56 +0000 2017',
 'id': 892420643555336193,
 'id_str': '892420643555336193',
 'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 https://t.co/MgUWQ76dJU",
 'truncated': False,
 'display_text_range': [0, 85],
 'entities': {'hashtags': [],
  'symbols': [],
  'user_mentions': [],
  'urls': [],
  'media': [{'id': 892420639486877696,
    'id_str': '892420639486877696',
    'indices': [86, 109],
    'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
    'url': 'https://t.co/MgUWQ76dJU',
    'display_url': 'pic.twitter.com/MgUWQ76dJU',
    'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
    'type': 'photo',
    'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'},
     'medium': {'w': 540, 'h': 528, 'resize': 'fit'},
     'small': {'w': 540, 'h': 528, 'resize': 'fit'},
     'large': {'w': 

In [137]:
at(my_dict, 'id', 'retweet_count', 'favorite_count', 'place', 'geo', 'lang')

[892420643555336193, 7010, 33829, None, None, 'en']
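`pydash.at()` returns the value at each requested path, with `None` for anything missing. For readers who prefer to avoid the extra dependency, the sketch below is a plain-Python stand-in: `pick()` is a hypothetical helper, not part of pydash, and the tweet dict is trimmed to values shown in the outputs above. Dotted paths such as `'entities.hashtags'` walk into nested dictionaries, which covers the cases used here but not the list-index paths pydash also accepts.

```python
from functools import reduce

def pick(record, *paths):
    """Return the value at each dotted path in `record`, or None if any level is missing.

    Hypothetical plain-Python stand-in for pydash's at().
    """
    def lookup(obj, path):
        # Walk one key at a time; once a level is missing (or not a dict), the result is None
        return reduce(lambda cur, key: cur.get(key) if isinstance(cur, dict) else None,
                      path.split('.'), obj)
    return [lookup(record, p) for p in paths]

# A trimmed tweet payload using values from the outputs above
tweet = {
    'id': 892420643555336193,
    'retweet_count': 7010,
    'favorite_count': 33829,
    'place': None,
    'geo': None,
    'lang': 'en',
    'entities': {'hashtags': []},
}

print(pick(tweet, 'id', 'retweet_count', 'favorite_count', 'place', 'geo', 'lang'))
# [892420643555336193, 7010, 33829, None, None, 'en']
print(pick(tweet, 'entities.hashtags', 'user.screen_name'))
# [[], None]
```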

**Obtaining additional tweet data from the `Twitter API` with `Tweepy`**

For our final dataset, we will use the Twitter API and Python's Tweepy library to query each tweet's **retweet_count**, **favorite_count**, **geo** data and **lang** (language) data. These attributes will later be used to generate insights.

**Note that querying the Twitter API requires either a consumer key, consumer secret, access token and access token secret (OAuth 1.0a user context) or a bearer token (OAuth 2.0 app-only).**
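The two credential options can be made explicit in code. This is a minimal sketch: `BEARER_TOKEN` is the variable this notebook loads from `.env`, while `CONSUMER_KEY`, `CONSUMER_SECRET`, `ACCESS_TOKEN` and `ACCESS_TOKEN_SECRET` are assumed names for the OAuth 1.0a keys. `pick_credentials()` and `make_auth()` are hypothetical helpers; `make_auth()` imports Tweepy lazily and builds either `OAuth2BearerHandler` or `OAuth1UserHandler`, both of which exist in recent Tweepy 4.x releases.

```python
import os

# BEARER_TOKEN matches this notebook's .env; the other four names are assumptions
OAUTH1_VARS = ('CONSUMER_KEY', 'CONSUMER_SECRET', 'ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

def pick_credentials(env=os.environ):
    """Return ('bearer', token) or ('oauth1', (key, secret, token, token_secret))."""
    token = env.get('BEARER_TOKEN')
    if token:
        return 'bearer', token
    values = tuple(env.get(name) for name in OAUTH1_VARS)
    if all(values):
        return 'oauth1', values
    raise RuntimeError('Set BEARER_TOKEN, or all four of: ' + ', '.join(OAUTH1_VARS))

def make_auth(env=os.environ):
    """Build the matching Tweepy auth handler (requires tweepy to be installed)."""
    import tweepy
    kind, creds = pick_credentials(env)
    if kind == 'bearer':
        return tweepy.OAuth2BearerHandler(creds)
    return tweepy.OAuth1UserHandler(*creds)
```

The bearer token is checked first because app-only access is sufficient for reading public tweets with `get_status()`; the OAuth 1.0a keys are only needed for actions on behalf of a user.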

In [10]:
# Obtain credentials
load_dotenv('.env')

True

In [11]:
# Confirm the token was loaded without printing the secret itself
print('BEARER_TOKEN loaded:', os.getenv('BEARER_TOKEN') is not None)


I opted for a custom function that obtains tweet data through Tweepy's [**_get_status()_**](http://docs.tweepy.org/en/v3.5.0/api.html) method. The `tweet_id` column of the `df_tw_arch` dataset is converted to a list and passed into the function. From each tweet's JSON payload, the fields this project needs (`id`, `retweet_count`, `favorite_count`, `geo` and `lang`) are appended to a new file, `tweet_json.txt`, as one comma-separated line per tweet.

Some tweets and retweets may have been deleted since WRD submitted its archive. A `try`/`except` block captures the **tweet_id** of each tweet that cannot be retrieved in a separate list for later analysis.

In [235]:
# Custom function to extract tweet data through the Twitter API
def get_tweets(ids):
    
    # Authenticate with the bearer token (OAuth 2.0 app-only)
    auth = twpy.OAuth2BearerHandler(os.getenv('BEARER_TOKEN'))
    
    # Create the API client; it sleeps automatically whenever the rate limit is reached
    api = twpy.API(auth, wait_on_rate_limit=True)
    
    # Collect the IDs of tweets that could not be retrieved
    del_tweets = []
    
    # Start a timer for the loop (timeit.timeit() would instead benchmark an empty statement)
    start = timeit.default_timer()
    
    # Open the output file once and append one line per retrieved tweet
    with open('tweet_json.txt', 'a', encoding='utf-8') as f:
        for tw_id in ids:
            try:
                tw_status = api.get_status(tw_id, tweet_mode='extended')._json
            except twpy.TweepyException:
                # Deleted tweets (and those from suspended or protected accounts) raise an error
                print(f'This tweet -> {tw_id} has been deleted')
                del_tweets.append({'tweet_id': tw_id})
                continue
            
            rt_count = tw_status['retweet_count']
            fv_count = tw_status['favorite_count']
            f.write(f"{tw_status['id']},{rt_count},{fv_count},{tw_status['geo']},{tw_status['lang']}\n")
            
            print(f'This tweet -> {tw_id} has {rt_count} retweets and {fv_count} likes')
    
    # Stop the timer and report how long the loop took
    end = timeit.default_timer()
    print(f'This code took {end - start:.1f} seconds to run')
    
    # Return the deleted tweet_ids for later analysis
    return del_tweets
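Since `tweet_json.txt` has no header row, reading it back needs `header=None` plus explicit `names`. A minimal sketch, writing two lines in the same `id,retweet_count,favorite_count,geo,lang` format to a temporary file first; the IDs and counts come from the printed output below, while `geo` and `lang` on the second line are illustrative. The f-string writes a missing `geo` as the literal text `None`, so `na_values=['None']` is passed to turn it into `NaN`. This only parses cleanly while `geo` is `None`: a populated geo dictionary contains commas and would spill into extra columns.

```python
import os
import tempfile
import pandas as pd

# Two lines in the format get_tweets() appends (second line's geo/lang are illustrative)
sample = (
    "892420643555336193,7010,33829,None,en\n"
    "892177421306343426,5301,29340,None,en\n"
)
path = os.path.join(tempfile.mkdtemp(), 'tweet_json.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

# No header row in the file, so header=None and names supplies the column labels;
# na_values=['None'] turns the literal "None" written for missing geo data into NaN
df_tw_api = pd.read_csv(
    path,
    header=None,
    names=['tweet_id', 'retweet_count', 'favorite_count', 'geo', 'lang'],
    na_values=['None'],
)
print(df_tw_api)
```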

* Convert the `tweet_id` column of `df_tw_arch` into a list

In [160]:
tw_ids = list(df_tw_arch.tweet_id)
tw_ids

[892420643555336193,
 892177421306343426,
 891815181378084864,
 891689557279858688,
 891327558926688256,
 891087950875897856,
 890971913173991426,
 890729181411237888,
 890609185150312448,
 890240255349198849,
 890006608113172480,
 889880896479866881,
 889665388333682689,
 889638837579907072,
 889531135344209921,
 889278841981685760,
 888917238123831296,
 888804989199671297,
 888554962724278272,
 888202515573088257,
 888078434458587136,
 887705289381826560,
 887517139158093824,
 887473957103951883,
 887343217045368832,
 887101392804085760,
 886983233522544640,
 886736880519319552,
 886680336477933568,
 886366144734445568,
 886267009285017600,
 886258384151887873,
 886054160059072513,
 885984800019947520,
 885528943205470208,
 885518971528720385,
 885311592912609280,
 885167619883638784,
 884925521741709313,
 884876753390489601,
 884562892145688576,
 884441805382717440,
 884247878851493888,
 884162670584377345,
 883838122936631299,
 883482846933004288,
 883360690899218434,
 883117836046

* Pass the list of tweet_ids into our custom function `get_tweets()`
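Because `get_tweets()` appends to `tweet_json.txt`, running the cell twice writes duplicate rows, and a long run interrupted partway has to start over. One option, sketched here with a hypothetical helper `remaining_ids()`, is to skip IDs that already have a line in the file before calling `get_tweets()`. It assumes every line starts with the tweet ID followed by a comma, as `get_tweets()` writes it. Deleted tweets never get a line, so they are re-requested on each pass.

```python
import os
import tempfile

def remaining_ids(ids, path='tweet_json.txt'):
    """Return the IDs in `ids` that do not yet have a line in `path` (hypothetical helper)."""
    done = set()
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            for line in f:
                if line.strip():
                    # The tweet ID is everything before the first comma
                    done.add(int(line.split(',', 1)[0]))
    # int() also handles NumPy integers such as those from a DataFrame column
    return [tw_id for tw_id in ids if int(tw_id) not in done]

# Usage with a throwaway file standing in for tweet_json.txt
path = os.path.join(tempfile.mkdtemp(), 'tweet_json.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write("892420643555336193,7010,33829,None,en\n")

todo = remaining_ids([892420643555336193, 892177421306343426], path)
print(todo)  # [892177421306343426]
```

In the notebook this would look like `get_tweets(remaining_ids(tw_ids))`.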

In [236]:
get_tweets(tw_ids)

This tweet -> 892420643555336193 has 7010 retweets and 33829 likes
This tweet -> 892177421306343426 has 5301 retweets and 29340 likes
This tweet -> 891815181378084864 has 3482 retweets and 22070 likes
This tweet -> 891689557279858688 has 7228 retweets and 36953 likes
This tweet -> 891327558926688256 has 7765 retweets and 35324 likes
This tweet -> 891087950875897856 has 2602 retweets and 17815 likes
This tweet -> 890971913173991426 has 1666 retweets and 10370 likes
This tweet -> 890729181411237888 has 15761 retweets and 56891 likes
This tweet -> 890609185150312448 has 3624 retweets and 24530 likes
This tweet -> 890240255349198849 has 6101 retweets and 27973 likes
This tweet -> 890006608113172480 has 6154 retweets and 27051 likes
This tweet -> 889880896479866881 has 4168 retweets and 24574 likes
This tweet -> 889665388333682689 has 8354 retweets and 42073 likes
This tweet -> 889638837579907072 has 3718 retweets and 23688 likes
This tweet -> 889531135344209921 has 1884 retweets and 13358 

This tweet -> 868622495443632128 has 4474 retweets and 23656 likes
This tweet -> 868552278524837888 has 1754 retweets and 8972 likes
This tweet -> 867900495410671616 has 3574 retweets and 21620 likes
This tweet -> 867774946302451713 has 6258 retweets and 30288 likes
This tweet -> 867421006826221569 has 2118 retweets and 14355 likes
This tweet -> 867072653475098625 has 101 retweets and 0 likes
This tweet -> 867051520902168576 has 6792 retweets and 28728 likes
This tweet -> 866816280283807744 has been deleted
This tweet -> 866720684873056260 has 4133 retweets and 17882 likes
This tweet -> 866686824827068416 has 2973 retweets and 17143 likes
This tweet -> 866450705531457537 has 30222 retweets and 108951 likes
This tweet -> 866334964761202691 has 12218 retweets and 46650 likes
This tweet -> 866094527597207552 has 7172 retweets and 0 likes
This tweet -> 865718153858494464 has 4853 retweets and 22948 likes
This tweet -> 865359393868664832 has 4295 retweets and 23627 likes
This tweet -> 86500

This tweet -> 844979544864018432 has 2318 retweets and 12701 likes
This tweet -> 844973813909606400 has 2862 retweets and 13955 likes
This tweet -> 844704788403113984 has been deleted
This tweet -> 844580511645339650 has 2822 retweets and 15313 likes
This tweet -> 844223788422217728 has 1994 retweets and 12721 likes
This tweet -> 843981021012017153 has 2703 retweets and 14155 likes
This tweet -> 843856843873095681 has 4176 retweets and 19998 likes
This tweet -> 843604394117681152 has 2485 retweets and 15740 likes
This tweet -> 843235543001513987 has 5430 retweets and 20013 likes
This tweet -> 842892208864923648 has been deleted
This tweet -> 842846295480000512 has 3319 retweets and 14245 likes
This tweet -> 842765311967449089 has 1177 retweets and 6304 likes
This tweet -> 842535590457499648 has 3191 retweets and 16953 likes
This tweet -> 842163532590374912 has 5274 retweets and 22782 likes
This tweet -> 842115215311396866 has 2764 retweets and 13061 likes
This tweet -> 8418339930205388

This tweet -> 828011680017821696 has 2000 retweets and 9786 likes
This tweet -> 827933404142436356 has 4843 retweets and 18879 likes
This tweet -> 827653905312006145 has 2798 retweets and 14611 likes
This tweet -> 827600520311402496 has 865 retweets and 7064 likes
This tweet -> 827324948884643840 has 2812 retweets and 14970 likes
This tweet -> 827228250799742977 has been deleted
This tweet -> 827199976799354881 has 2064 retweets and 10012 likes
This tweet -> 826958653328592898 has 4616 retweets and 20325 likes
This tweet -> 826848821049180160 has 9356 retweets and 34080 likes
This tweet -> 826615380357632002 has 3686 retweets and 0 likes
This tweet -> 826598799820865537 has 230 retweets and 4865 likes
This tweet -> 826598365270007810 has 2183 retweets and 9460 likes
This tweet -> 826476773533745153 has 3896 retweets and 17394 likes
This tweet -> 826240494070030336 has 2423 retweets and 12540 likes
This tweet -> 826204788643753985 has 856 retweets and 4602 likes
This tweet -> 8261152722

This tweet -> 813081950185472002 has 2598 retweets and 9422 likes
This tweet -> 813066809284972545 has 1830 retweets and 7545 likes
This tweet -> 813051746834595840 has 6867 retweets and 19931 likes
This tweet -> 812781120811126785 has 1787 retweets and 7158 likes
This tweet -> 812747805718642688 has been deleted
This tweet -> 812709060537683968 has 1353 retweets and 6297 likes
This tweet -> 812503143955202048 has 1148 retweets and 5745 likes
This tweet -> 812466873996607488 has 1802 retweets and 7569 likes
This tweet -> 812372279581671427 has 3427 retweets and 12946 likes
This tweet -> 811985624773361665 has 1312 retweets and 6920 likes
This tweet -> 811744202451197953 has 1479 retweets and 7165 likes
This tweet -> 811647686436880384 has 682 retweets and 5323 likes
This tweet -> 811627233043480576 has 2856 retweets and 12084 likes
This tweet -> 811386762094317568 has 5989 retweets and 19887 likes
This tweet -> 810984652412424192 has 1335 retweets and 5062 likes
This tweet -> 810896069

This tweet -> 794926597468000259 has 2164 retweets and 9694 likes
This tweet -> 794355576146903043 has 9674 retweets and 0 likes
This tweet -> 794332329137291264 has 2541 retweets and 9144 likes
This tweet -> 794205286408003585 has 3117 retweets and 8780 likes
This tweet -> 793962221541933056 has 4584 retweets and 15923 likes
This tweet -> 793845145112371200 has 1750 retweets and 8783 likes
This tweet -> 793614319594401792 has 2932 retweets and 0 likes
This tweet -> 793601777308463104 has 1519 retweets and 7589 likes
This tweet -> 793500921481273345 has 2237 retweets and 10137 likes
This tweet -> 793286476301799424 has 8604 retweets and 23371 likes
This tweet -> 793271401113350145 has 2250 retweets and 8263 likes
This tweet -> 793256262322548741 has 7815 retweets and 18957 likes
This tweet -> 793241302385262592 has 3082 retweets and 10004 likes
This tweet -> 793226087023144960 has 2721 retweets and 9293 likes
This tweet -> 793210959003287553 has 2614 retweets and 8489 likes
This tweet 

This tweet -> 778774459159379968 has 9165 retweets and 0 likes
This tweet -> 778764940568104960 has 338 retweets and 823 likes
This tweet -> 778748913645780993 has 1208 retweets and 6466 likes
This tweet -> 778650543019483137 has 1393 retweets and 5429 likes
This tweet -> 778624900596654080 has 943 retweets and 4346 likes
This tweet -> 778408200802557953 has 4011 retweets and 12876 likes
This tweet -> 778396591732486144 has 11350 retweets and 0 likes
This tweet -> 778383385161035776 has 1019 retweets and 5508 likes
This tweet -> 778286810187399168 has 3065 retweets and 9747 likes
This tweet -> 778039087836069888 has 2469 retweets and 7997 likes
This tweet -> 778027034220126208 has 1494 retweets and 6203 likes
This tweet -> 777953400541634568 has 3273 retweets and 0 likes
This tweet -> 777885040357281792 has 1517 retweets and 6000 likes
This tweet -> 777684233540206592 has 2728 retweets and 10632 likes
This tweet -> 777641927919427584 has 3924 retweets and 0 likes
This tweet -> 77762151

This tweet -> 760641137271070720 has 1190 retweets and 4685 likes
This tweet -> 760539183865880579 has 3349 retweets and 7086 likes
This tweet -> 760521673607086080 has 1288 retweets and 3929 likes
This tweet -> 760290219849637889 has 10582 retweets and 25113 likes
This tweet -> 760252756032651264 has 798 retweets and 3714 likes
This tweet -> 760190180481531904 has 1655 retweets and 5363 likes
This tweet -> 760153949710192640 has 28 retweets and 0 likes
This tweet -> 759943073749200896 has 1934 retweets and 5541 likes
This tweet -> 759923798737051648 has been deleted
This tweet -> 759846353224826880 has 1801 retweets and 6299 likes
This tweet -> 759793422261743616 has 1764 retweets and 5591 likes
This tweet -> 759566828574212096 has been deleted
This tweet -> 759557299618865152 has 1106 retweets and 4408 likes
This tweet -> 759447681597108224 has 2277 retweets and 7954 likes
This tweet -> 759446261539934208 has 454 retweets and 1566 likes
This tweet -> 759197388317847553 has 1804 retwe

This tweet -> 747651430853525504 has 144 retweets and 1276 likes
This tweet -> 747648653817413632 has 5342 retweets and 11992 likes
This tweet -> 747600769478692864 has 523 retweets and 2171 likes
This tweet -> 747594051852075008 has 951 retweets and 3428 likes
This tweet -> 747512671126323200 has 1478 retweets and 5142 likes
This tweet -> 747461612269887489 has 950 retweets and 3601 likes
This tweet -> 747439450712596480 has 1753 retweets and 5046 likes
This tweet -> 747242308580548608 has 2638 retweets and 0 likes
This tweet -> 747219827526344708 has 1444 retweets and 4902 likes
This tweet -> 747204161125646336 has 833 retweets and 3131 likes
This tweet -> 747103485104099331 has 3632 retweets and 8907 likes
This tweet -> 746906459439529985 has 271 retweets and 2726 likes
This tweet -> 746872823977771008 has 1953 retweets and 5585 likes
This tweet -> 746818907684614144 has 1599 retweets and 4956 likes
This tweet -> 746790600704425984 has 1467 retweets and 4531 likes
This tweet -> 7467

This tweet -> 729113531270991872 has 288 retweets and 1743 likes
This tweet -> 728986383096946689 has 742 retweets and 2897 likes
This tweet -> 728760639972315136 has 1547 retweets and 4304 likes
This tweet -> 728751179681943552 has 619 retweets and 2553 likes
This tweet -> 728653952833728512 has 949 retweets and 3068 likes
This tweet -> 728409960103686147 has 1827 retweets and 4511 likes
This tweet -> 728387165835677696 has 880 retweets and 3432 likes
This tweet -> 728046963732717569 has 1086 retweets and 3952 likes
This tweet -> 728035342121635841 has 1516 retweets and 4170 likes
This tweet -> 728015554473250816 has 993 retweets and 3798 likes
This tweet -> 727685679342333952 has 577 retweets and 2737 likes
This tweet -> 727644517743104000 has 1592 retweets and 5403 likes
This tweet -> 727524757080539137 has 1089 retweets and 4122 likes
This tweet -> 727314416056803329 has 673 retweets and 3073 likes
This tweet -> 727286334147182592 has 748 retweets and 2834 likes
This tweet -> 72717

This tweet -> 710588934686908417 has 1702 retweets and 4187 likes
This tweet -> 710296729921429505 has 672 retweets and 2209 likes
This tweet -> 710283270106132480 has 482 retweets and 2003 likes
This tweet -> 710272297844797440 has 1146 retweets and 4157 likes
This tweet -> 710269109699739648 has 1019 retweets and 2205 likes
This tweet -> 710153181850935296 has 814 retweets and 2692 likes
This tweet -> 710140971284037632 has 807 retweets and 2526 likes
This tweet -> 710117014656950272 has 1785 retweets and 5043 likes
This tweet -> 709918798883774466 has 996 retweets and 3238 likes
This tweet -> 709901256215666688 has 92 retweets and 610 likes
This tweet -> 709852847387627521 has 1092 retweets and 3245 likes
This tweet -> 709566166965075968 has 1078 retweets and 3260 likes
This tweet -> 709556954897764353 has 976 retweets and 3029 likes
This tweet -> 709519240576036864 has 219 retweets and 1389 likes
This tweet -> 709449600415961088 has 520 retweets and 2030 likes
This tweet -> 7094094

This tweet -> 700864154249383937 has 561 retweets and 2405 likes
This tweet -> 700847567345688576 has 453 retweets and 2283 likes
This tweet -> 700796979434098688 has 864 retweets and 2263 likes
This tweet -> 700747788515020802 has 8474 retweets and 21064 likes
This tweet -> 700518061187723268 has 723 retweets and 2401 likes
This tweet -> 700505138482569216 has 524 retweets and 2114 likes
This tweet -> 700462010979500032 has 1628 retweets and 3791 likes
This tweet -> 700167517596164096 has 657 retweets and 2439 likes
This tweet -> 700151421916807169 has 611 retweets and 2061 likes
This tweet -> 700143752053182464 has 2444 retweets and 6945 likes
This tweet -> 700062718104104960 has 615 retweets and 2454 likes
This tweet -> 700029284593901568 has 532 retweets and 1920 likes
This tweet -> 700002074055016451 has 1206 retweets and 3038 likes
This tweet -> 699801817392291840 has 852 retweets and 2780 likes
This tweet -> 699788877217865730 has 470 retweets and 2066 likes
This tweet -> 699779

This tweet -> 691483041324204033 has 502 retweets and 2208 likes
This tweet -> 691459709405118465 has 1028 retweets and 3743 likes
This tweet -> 691444869282295808 has 755 retweets and 2442 likes
This tweet -> 691416866452082688 has 6933 retweets and 17899 likes
This tweet -> 691321916024623104 has 602 retweets and 2392 likes
This tweet -> 691096613310316544 has 797 retweets and 2768 likes
This tweet -> 691090071332753408 has 307 retweets and 1606 likes
This tweet -> 690989312272396288 has 2621 retweets and 5457 likes
This tweet -> 690959652130045952 has 1120 retweets and 3326 likes
This tweet -> 690938899477221376 has 1757 retweets and 3781 likes
This tweet -> 690932576555528194 has 911 retweets and 3031 likes
This tweet -> 690735892932222976 has 1158 retweets and 3510 likes
This tweet -> 690728923253055490 has 469 retweets and 2009 likes
This tweet -> 690690673629138944 has 751 retweets and 2158 likes
This tweet -> 690649993829576704 has 250 retweets and 1202 likes
This tweet -> 6906

This tweet -> 684225744407494656 has 184 retweets and 1149 likes
This tweet -> 684222868335505415 has 1241 retweets and 3524 likes
This tweet -> 684200372118904832 has 946 retweets and 1988 likes
This tweet -> 684195085588783105 has 457 retweets and 1765 likes
This tweet -> 684188786104872960 has 1054 retweets and 3227 likes
This tweet -> 684177701129875456 has 594 retweets and 1867 likes
This tweet -> 684147889187209216 has 1376 retweets and 2748 likes
This tweet -> 684122891630342144 has 426 retweets and 1864 likes
This tweet -> 684097758874210310 has 1299 retweets and 3802 likes
This tweet -> 683857920510050305 has 1022 retweets and 3528 likes
This tweet -> 683852578183077888 has 316 retweets and 1797 likes
This tweet -> 683849932751646720 has 857 retweets and 2418 likes
This tweet -> 683834909291606017 has 1025 retweets and 2421 likes
This tweet -> 683828599284170753 has 973 retweets and 2589 likes
This tweet -> 683773439333797890 has 1221 retweets and 3079 likes
This tweet -> 6837

This tweet -> 678708137298427904 has 2194 retweets and 5137 likes
This tweet -> 678675843183484930 has 1321 retweets and 2608 likes
This tweet -> 678643457146150913 has 360 retweets and 1882 likes
This tweet -> 678446151570427904 has 1372 retweets and 3658 likes
This tweet -> 678424312106393600 has 2281 retweets and 5005 likes
This tweet -> 678410210315247616 has 1610 retweets and 3832 likes
This tweet -> 678399652199309312 has 28617 retweets and 73558 likes
This tweet -> 678396796259975168 has 375 retweets and 1460 likes
This tweet -> 678389028614488064 has 383 retweets and 1716 likes
This tweet -> 678380236862578688 has 822 retweets and 2249 likes
This tweet -> 678341075375947776 has 473 retweets and 1577 likes
This tweet -> 678334497360859136 has 224 retweets and 1184 likes
This tweet -> 678278586130948096 has 5497 retweets and 10426 likes
This tweet -> 678255464182861824 has 327 retweets and 1454 likes
This tweet -> 678023323247357953 has 341 retweets and 1745 likes
This tweet -> 6

This tweet -> 674790488185167872 has 219 retweets and 986 likes
This tweet -> 674788554665512960 has 181 retweets and 719 likes
This tweet -> 674781762103414784 has 1050 retweets and 1808 likes
This tweet -> 674774481756377088 has 412 retweets and 991 likes
This tweet -> 674767892831932416 has 692 retweets and 1634 likes
This tweet -> 674764817387900928 has 200 retweets and 700 likes
This tweet -> 674754018082705410 has 384 retweets and 1229 likes
This tweet -> 674752233200820224 has 402 retweets and 1314 likes
This tweet -> 674743008475090944 has 469 retweets and 1261 likes
This tweet -> 674742531037511680 has 45 retweets and 430 likes
This tweet -> 674739953134403584 has 327 retweets and 967 likes
This tweet -> 674737130913071104 has 80 retweets and 580 likes
This tweet -> 674690135443775488 has 413 retweets and 1027 likes
This tweet -> 674670581682434048 has 569 retweets and 1450 likes
This tweet -> 674664755118911488 has 205 retweets and 812 likes
This tweet -> 674646392044941312 h

This tweet -> 672231046314901505 has 876 retweets and 1624 likes
This tweet -> 672222792075620352 has 176 retweets and 703 likes
This tweet -> 672205392827572224 has 1010 retweets and 1994 likes
This tweet -> 672169685991993344 has 320 retweets and 908 likes
This tweet -> 672160042234327040 has 304 retweets and 766 likes
This tweet -> 672139350159835138 has 597 retweets and 1548 likes
This tweet -> 672125275208069120 has 993 retweets and 2120 likes
This tweet -> 672095186491711488 has 328 retweets and 879 likes
This tweet -> 672082170312290304 has 319 retweets and 840 likes
This tweet -> 672068090318987265 has 440 retweets and 1141 likes
This tweet -> 671896809300709376 has 3679 retweets and 7631 likes
This tweet -> 671891728106971137 has 484 retweets and 1176 likes
This tweet -> 671882082306625538 has 1215 retweets and 3117 likes
This tweet -> 671879137494245376 has 575 retweets and 1249 likes
This tweet -> 671874878652489728 has 479 retweets and 1107 likes
This tweet -> 6718663421826

This tweet -> 669993076832759809 has 70 retweets and 288 likes
This tweet -> 669972011175813120 has 134 retweets and 396 likes
This tweet -> 669970042633789440 has 43 retweets and 272 likes
This tweet -> 669942763794931712 has 133 retweets and 445 likes
This tweet -> 669926384437997569 has 81 retweets and 339 likes
This tweet -> 669923323644657664 has 50 retweets and 202 likes
This tweet -> 669753178989142016 has 341 retweets and 710 likes
This tweet -> 669749430875258880 has 55 retweets and 229 likes
This tweet -> 669684865554620416 has 75 retweets and 448 likes
This tweet -> 669683899023405056 has 94 retweets and 339 likes
This tweet -> 669682095984410625 has 116 retweets and 312 likes
This tweet -> 669680153564442624 has 245 retweets and 588 likes
This tweet -> 669661792646373376 has 367 retweets and 715 likes
This tweet -> 669625907762618368 has 1573 retweets and 3130 likes
This tweet -> 669603084620980224 has 316 retweets and 846 likes
This tweet -> 669597912108789760 has 133 retw

This tweet -> 667502640335572993 has 194 retweets and 485 likes
This tweet -> 667495797102141441 has 232 retweets and 458 likes
This tweet -> 667491009379606528 has 190 retweets and 459 likes
This tweet -> 667470559035432960 has 85 retweets and 225 likes
This tweet -> 667455448082227200 has 53 retweets and 171 likes
This tweet -> 667453023279554560 has 74 retweets and 279 likes
This tweet -> 667443425659232256 has 490 retweets and 682 likes
This tweet -> 667437278097252352 has 199 retweets and 396 likes
This tweet -> 667435689202614272 has 76 retweets and 266 likes
This tweet -> 667405339315146752 has 197 retweets and 412 likes
This tweet -> 667393430834667520 has 50 retweets and 168 likes
This tweet -> 667369227918143488 has 145 retweets and 328 likes
This tweet -> 667211855547486208 has 210 retweets and 425 likes
This tweet -> 667200525029539841 has 228 retweets and 546 likes
This tweet -> 667192066997374976 has 86 retweets and 332 likes
This tweet -> 667188689915760640 has 333 retwe

[{'tweet_id': 888202515573088257},
 {'tweet_id': 873697596434513921},
 {'tweet_id': 872668790621863937},
 {'tweet_id': 872261713294495745},
 {'tweet_id': 869988702071779329},
 {'tweet_id': 866816280283807744},
 {'tweet_id': 861769973181624320},
 {'tweet_id': 856602993587888130},
 {'tweet_id': 856330835276025856},
 {'tweet_id': 851953902622658560},
 {'tweet_id': 851861385021730816},
 {'tweet_id': 845459076796616705},
 {'tweet_id': 844704788403113984},
 {'tweet_id': 842892208864923648},
 {'tweet_id': 837366284874571778},
 {'tweet_id': 837012587749474308},
 {'tweet_id': 829374341691346946},
 {'tweet_id': 827228250799742977},
 {'tweet_id': 812747805718642688},
 {'tweet_id': 802247111496568832},
 {'tweet_id': 779123168116150273},
 {'tweet_id': 775096608509886464},
 {'tweet_id': 771004394259247104},
 {'tweet_id': 770743923962707968},
 {'tweet_id': 766864461642756096},
 {'tweet_id': 759923798737051648},
 {'tweet_id': 759566828574212096},
 {'tweet_id': 754011816964026368},
 {'tweet_id': 699370

* I will then read the data from `tweet_json.txt` with the custom `open_set()` function, which is built on Pandas' `read_csv()`. I set `header=None` and supply the column names through the `names` parameter.
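The body of `open_set()` is not shown in this section. The sketch below therefore calls `pd.read_csv()` directly on an in-memory copy of two rows taken from the output further down. It assumes that `tweet_json.txt` is laid out as comma-separated values without a header row, which is what the `header=None` argument suggests.

```python
import io

import pandas as pd

# Two rows copied from the df_tw_data output; the comma-separated,
# header-less layout of data/tweet_json.txt is an assumption
raw = io.StringIO(
    "892420643555336193,7010,33829,,en\n"
    "892177421306343426,5301,29340,,en\n"
)

cols = ['tweet_id', 'retweet_count', 'favorite_count', 'geo_data', 'lang_data']

# header=None: the first line is data, not column labels
# names=cols: supply the column labels explicitly
df = pd.read_csv(raw, header=None, names=cols)
print(df)
```

If `header=0` were passed together with `names`, pandas would treat the first line as a header row and replace it with `names`. That first tweet would then be silently lost.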

In [12]:
df_tw_data = open_set('data/tweet_json.txt', names=['tweet_id', 'retweet_count', 'favorite_count', 'geo_data', 'lang_data'], header=None)

In [13]:
df_tw_data

| | tweet_id | retweet_count | favorite_count | geo_data | lang_data |
|---|---|---|---|---|---|
| 0 | 892420643555336193 | 7010 | 33829 | | en |
| 1 | 892177421306343426 | 5301 | 29340 | | en |
| 2 | 891815181378084864 | 3482 | 22070 | | en |
| 3 | 891689557279858688 | 7228 | 36953 | | en |
| 4 | 891327558926688256 | 7765 | 35324 | | en |
| ... | ... | ... | ... | ... | ... |
| 2320 | 666049248165822465 | 37 | 89 | | en |
| 2321 | 666044226329800704 | 115 | 247 | | en |
| 2322 | 666033412701032449 | 36 | 100 | | en |
| 2323 | 666029285002620928 | 39 | 112 | | en |
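In the rows printed above, `geo_data` is empty while `lang_data` reads `en`. Those rows alone do not show whether `geo_data` is empty for every tweet. A quick programmatic check is to compute the share of missing values per column. The sketch below runs on a small stand-in frame built from the printed rows, not on the full `df_tw_data`.

```python
import numpy as np
import pandas as pd

# Stand-in for df_tw_data; rows copied from the output above
sample = pd.DataFrame({
    'tweet_id': [892420643555336193, 892177421306343426, 666029285002620928],
    'retweet_count': [7010, 5301, 39],
    'favorite_count': [33829, 29340, 112],
    'geo_data': [np.nan, np.nan, np.nan],
    'lang_data': ['en', 'en', 'en'],
})

# Fraction of missing values per column; 1.0 means the column is entirely empty
missing_share = sample.isna().mean()
print(missing_share)

# Drop only the columns in which every value is missing
trimmed = sample.dropna(axis=1, how='all')
print(trimmed.columns.tolist())
```

On the real data, `df_tw_data.isna().mean()` would show whether `geo_data` is worth keeping at all.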


----

### Assess

#### `df_tw_arch` dataset

##### Visual Assessment
----

##### Programmatic Assessment
----

In [243]:
df_tw_arch.head()

| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | | | 2017-08-01 16:23:56 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Phineas. He's a mystical boy. Only eve... | | | | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | | | | |
| 1 | 892177421306343426 | | | 2017-08-01 00:17:27 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Tilly. She's just checking pup on you.... | | | | https://twitter.com/dog_rates/status/892177421... | 13 | 10 | Tilly | | | | |
| 2 | 891815181378084864 | | | 2017-07-31 00:18:03 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Archie. He is a rare Norwegian Pouncin... | | | | https://twitter.com/dog_rates/status/891815181... | 12 | 10 | Archie | | | | |
| 3 | 891689557279858688 | | | 2017-07-30 15:58:51 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Darla. She commenced a snooze mid meal... | | | | https://twitter.com/dog_rates/status/891689557... | 13 | 10 | Darla | | | | |
| 4 | 891327558926688256 | | | 2017-07-29 16:00:24 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Franklin. He would like you to stop ca... | | | | https://twitter.com/dog_rates/status/891327558... | 12 | 10 | Franklin | | | | |
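`head()` only shows five rows, so checks such as duplicate IDs, unusual rating denominators and retweets have to be run programmatically. The sketch below applies three such checks to a tiny stand-in frame. The first three rows reuse values from the output above. The fourth row is made up, not taken from the archive, so that each check has something to flag. On the real data the same expressions would be run against `df_tw_arch`.

```python
import numpy as np
import pandas as pd

# Stand-in for df_tw_arch: rows 0-2 reuse values from the head() output,
# row 3 is MADE UP (not from the archive) to trigger every check
arch = pd.DataFrame({
    'tweet_id': [892420643555336193, 892177421306343426,
                 891815181378084864, 892420643555336193],
    'retweeted_status_id': [np.nan, np.nan, np.nan, 1.0],
    'rating_numerator': [13, 13, 12, 9],
    'rating_denominator': [10, 10, 10, 11],
})

# 1. Each tweet should appear only once
n_dup_ids = arch['tweet_id'].duplicated().sum()

# 2. Ratings are expected to use a denominator of 10
odd_denominators = arch.loc[arch['rating_denominator'] != 10, 'tweet_id']

# 3. Retweets are the rows with a non-null retweeted_status_id
n_retweets = arch['retweeted_status_id'].notna().sum()

print(n_dup_ids, len(odd_denominators), n_retweets)  # 1 1 1
```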



In [275]:
df_tw_arch.dtypes

tweet_id                        int64
in_reply_to_status_id         float64
in_reply_to_user_id           float64
timestamp                      object
source                         object
text                           object
retweeted_status_id           float64
retweeted_status_user_id      float64
retweeted_status_timestamp     object
expanded_urls                  object
rating_numerator                int64
rating_denominator              int64
name                           object
doggo                          object
floofer                        object
pupper                         object
puppo                          object
dtype: object
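Two quality issues stand out in these dtypes. First, `timestamp` is stored as `object`, meaning plain strings. Second, ID columns such as `in_reply_to_status_id` are `float64`. The sketch below converts two timestamps copied from the `head()` output with `pd.to_datetime()`. It also shows why 18-digit tweet IDs cannot be held exactly in a `float64`: the mantissa has only 53 bits. Converting such a column back to integers after loading cannot recover digits that were already rounded away.

```python
import pandas as pd

# Timestamps copied from the df_tw_arch.head() output (dtype object, i.e. strings)
ts = pd.Series(['2017-08-01 16:23:56 +0000', '2017-08-01 00:17:27 +0000'])

# %z parses the "+0000" offset; utc=True returns timezone-aware UTC datetimes
parsed = pd.to_datetime(ts, format='%Y-%m-%d %H:%M:%S %z', utc=True)
print(parsed.dtype)

# An odd 18-digit ID is beyond float64's exact-integer range (2**53),
# so a round trip through float changes its value
tweet_id = 892420643555336193
print(int(float(tweet_id)) == tweet_id)  # False
```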

In [None]:
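# json.load() expects an open file object, not a path string
with open('tweet-') as json_file:
    tweet_data = json.load(json_file)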
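`json.load()` parses exactly one JSON document from an open file object. If the file targeted by the cell above stores one tweet object per line (JSON Lines), a single `json.load()` call will raise an error, and each line has to be parsed with `json.loads()` instead. The sketch below assumes that line-per-object layout. The key names `id`, `retweet_count` and `favorite_count` are assumptions. The values are copied from `df_tw_data`.

```python
import io
import json

# Hypothetical file content: one JSON object per line (JSON Lines);
# key names are assumed, values come from the df_tw_data output
fake_file = io.StringIO(
    '{"id": 892420643555336193, "retweet_count": 7010, "favorite_count": 33829}\n'
    '{"id": 892177421306343426, "retweet_count": 5301, "favorite_count": 29340}\n'
)

# Parse each non-blank line as its own JSON document
records = [json.loads(line) for line in fake_file if line.strip()]
print(records[0]['retweet_count'])  # 7010
```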