# Project Motivation

The goal is to wrangle WeRateDogs Twitter data to create interesting and trustworthy analyses and visualizations. The Twitter archive is a good starting point, but it contains only basic tweet information. Additional gathering, followed by assessing and cleaning, is required for "Wow!"-worthy analyses and visualizations.

In [3]:
import pandas as pd
import numpy as np
import requests
import tweepy
import json

In [2]:
json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])

'["foo", {"bar": ["baz", null, 1.0, 2]}]'

## Gathering Data for the project

The twitter-archive-enhanced.csv file was provided. Here I load it into a pandas DataFrame.

In [4]:
twitter_archive = pd.read_csv('twitter-archive-enhanced.csv')

In [6]:
twitter_archive.sample()

|  | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 789 | 773985732834758656 |  |  | 2016-09-08 20:45:53 +0000 | `<a href="http://twitter.com/download/iphone" r...` | Meet Winnie. She just made awkward eye contact... |  |  |  | https://twitter.com/dog_rates/status/773985732... | 11 | 10 | Winnie |  |  | pupper |  |


The second dataset is the tweet image predictions, i.e., what breed of dog (or other object, animal, etc.) is present in each tweet according to a neural network. This file (image_predictions.tsv) is hosted on Udacity's servers and should be downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv

In [8]:
prediction_file_url = 'https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv'

In [10]:
response = requests.get(prediction_file_url)
with open('prediction_file.tsv', 'wb') as file:
    file.write(response.content)
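
The download cell writes whatever the server returns, even if that is an error page. A minimal sketch of a more defensive variant follows, assuming a Requests-style response object. `save_download` is a hypothetical helper name introduced here for illustration; `raise_for_status()` and `content` are the standard Requests response members.

```python
from pathlib import Path


def save_download(response, path):
    """Write a Requests-style response body to disk after checking its status.

    Hypothetical helper: `response` only needs `raise_for_status()` and
    `content`, which a `requests.Response` provides.
    """
    # Raises for 4xx/5xx responses, so no file is written on failure.
    response.raise_for_status()
    target = Path(path)
    target.write_bytes(response.content)
    # Return the number of bytes written as a quick sanity check.
    return target.stat().st_size
```

With the names used in this notebook, the call might look like `save_download(requests.get(prediction_file_url), 'prediction_file.tsv')`.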

In [11]:
image_predictions = pd.read_csv('prediction_file.tsv', sep='\t')

In [12]:
image_predictions.sample()

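|  | tweet_id | jpg_url | img_num | p1 | p1_conf | p1_dog | p2 | p2_conf | p2_dog | p3 | p3_conf | p3_dog |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 588 | 679111216690831360 | https://pbs.twimg.com/ext_tw_video_thumb/67911... | 1 | kelpie | 0.189423 | True | beagle | 0.121988 | True | basset | 0.121171 | True |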


The third dataset is each tweet's retweet count and favorite ("like") count at minimum, plus any additional data you find interesting. Using the tweet IDs in the WeRateDogs Twitter archive, query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called tweet_json.txt. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with (at minimum) tweet ID, retweet count, and favorite count. Note: do not include your Twitter API keys, secrets, and tokens in your project submission.

In [65]:
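import json
import tweepy
# settings.py is kept out of the submission; it must define these four credentials.
from settings import consumer_key, consumer_secret, access_token, access_secret

# Authenticate with the app's consumer key pair and the user's access token pair.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

api = tweepy.API(auth)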

In [66]:
tweet = api.get_status(1263258342325182464, tweet_mode='extended')
print(tweet._json)


{'created_at': 'Thu May 21 00:00:33 +0000 2020', 'id': 1263258342325182464, 'id_str': '1263258342325182464', 'full_text': "You may have noticed that we're getting less snow in winter as the climate warms. But how much? https://t.co/5ACAhGsVxr", 'truncated': False, 'display_text_range': [0, 119], 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/5ACAhGsVxr', 'expanded_url': 'http://cbc.ca/1.5577270', 'display_url': 'cbc.ca/1.5577270', 'indices': [96, 119]}]}, 'source': '<a href="https://buffer.com" rel="nofollow">Buffer</a>', 'in_reply_to_status_id': None, 'in_reply_to_status_id_str': None, 'in_reply_to_user_id': None, 'in_reply_to_user_id_str': None, 'in_reply_to_screen_name': None, 'user': {'id': 6433472, 'id_str': '6433472', 'name': 'CBC News', 'screen_name': 'CBCNews', 'location': 'Canada', 'description': 'Canadian breaking news and analysis from CBCNews.ca, TV and radio.', 'url': 'http://t.co/ZLZObOnET1', 'entities': {'url': {'urls': [{

In [67]:
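# Write the tweet's JSON as a single line. Mode 'a' appends, so every run
# of this cell adds another copy of the same tweet to tweet.txt.
with open('tweet.txt', 'a') as outfile:
    json.dump(tweet._json, outfile)
    outfile.write('\n')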
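
The cell above stores a single tweet. A minimal sketch of how the same pattern could extend to a list of tweet IDs follows, with one JSON object per line. `dump_tweets` and its `fetch_json` parameter are hypothetical names introduced here. The fetch function is passed in so the loop does not depend on a live API connection, and the broad `except` is an assumption that should be narrowed to the API client's own error class (for example `tweepy.TweepyException` in Tweepy 4).

```python
import json


def dump_tweets(tweet_ids, fetch_json, path):
    """Write one JSON object per line; return the IDs that could not be fetched.

    Hypothetical helper: `fetch_json` is any callable mapping a tweet ID
    to a JSON-serializable dict.
    """
    failed = []
    # Mode 'w' starts a fresh file, so re-running does not duplicate lines.
    with open(path, 'w', encoding='utf-8') as outfile:
        for tweet_id in tweet_ids:
            try:
                payload = fetch_json(tweet_id)
            except Exception:  # assumption: narrow this to the client's error type
                failed.append(tweet_id)
                continue
            json.dump(payload, outfile)
            outfile.write('\n')
    return failed
```

With the objects defined in this notebook, `fetch_json` could be `lambda i: api.get_status(i, tweet_mode='extended')._json` and `tweet_ids` could be `twitter_archive.tweet_id`. The returned list shows which IDs failed, for instance because the tweet was deleted.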
    

In [68]:
df = pd.read_json('tweet.txt', lines=True)

In [69]:
df

|  | created_at | id | id_str | full_text | truncated | display_text_range | entities | source | in_reply_to_status_id | in_reply_to_status_id_str | ... | place | contributors | is_quote_status | retweet_count | favorite_count | favorited | retweeted | possibly_sensitive | possibly_sensitive_appealable | lang |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2020-05-21 00:00:33+00:00 | 1263258342325182464 | 1263258342325182464 | You may have noticed that we're getting less s... | False | [0, 119] | `{'hashtags': [], 'symbols': [], 'user_mentions...` | `<a href="https://buffer.com" rel="nofollow">Bu...` |  |  | ... |  |  | False | 89 | 202 | False | False | False | False | en |
| 1 | 2020-05-21 00:00:33+00:00 | 1263258342325182464 | 1263258342325182464 | You may have noticed that we're getting less s... | False | [0, 119] | `{'hashtags': [], 'symbols': [], 'user_mentions...` | `<a href="https://buffer.com" rel="nofollow">Bu...` |  |  | ... |  |  | False | 89 | 202 | False | False | False | False | en |
| 2 | 2020-05-21 00:00:33+00:00 | 1263258342325182464 | 1263258342325182464 | You may have noticed that we're getting less s... | False | [0, 119] | `{'hashtags': [], 'symbols': [], 'user_mentions...` | `<a href="https://buffer.com" rel="nofollow">Bu...` |  |  | ... |  |  | False | 94 | 218 | False | False | False | False | en |
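
The project brief asks for the file to be read line by line, keeping at minimum the tweet ID, retweet count, and favorite count. A minimal sketch of that step follows, using only the standard library. `read_tweet_counts` is a hypothetical helper name, and the output key `tweet_id` is an assumption chosen to match the archive's column name.

```python
import json


def read_tweet_counts(path):
    """Read a JSON-lines file and keep only the ID and the two counts.

    Hypothetical helper: returns a list of dicts, one per non-empty line.
    """
    rows = []
    with open(path, encoding='utf-8') as infile:
        for line in infile:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            tweet = json.loads(line)
            rows.append({
                'tweet_id': tweet['id'],
                'retweet_count': tweet['retweet_count'],
                'favorite_count': tweet['favorite_count'],
            })
    return rows
```

The result can be passed straight to pandas, for example `pd.DataFrame(read_tweet_counts('tweet.txt'))`. Chaining `.drop_duplicates('tweet_id', keep='last')` would collapse repeated entries for the same tweet, such as the three rows shown above, to the most recently written one.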
