## Key Points
Keep the following in mind when wrangling data for this project:

- You only want original ratings (no retweets) that have images. Though there are 5000+ tweets in the dataset, not all are dog ratings and some are retweets.
- Assessing and cleaning the entire dataset completely would require a lot of time, and is not necessary to practice and demonstrate your skills in data wrangling. Therefore, the requirements of this project are only to assess and clean at least 8 quality issues and at least 2 tidiness issues in this dataset.
- Cleaning includes merging individual pieces of data according to the rules of tidy data.
- The fact that the rating numerators are greater than the denominators does not need to be cleaned. This unique rating system is a big part of the popularity of WeRateDogs.
- You do not need to gather the tweets beyond August 1st, 2017. You can, but note that you won't be able to gather the image predictions for these tweets since you don't have access to the algorithm used.
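
The first point above (original ratings only, with images) can be sketched as a two-step filter. This is a minimal illustration on made-up toy data, not the notebook's actual cleaning code. The column names `retweeted_status_id` and `jpg_url` are assumptions about the archive and image prediction files, and `archive_demo` and `preds_demo` are hypothetical stand-ins.

```python
import pandas as pd

# Toy stand-ins for the archive and the image predictions (all values are made up)
archive_demo = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    # assumed column: non-null means the row is a retweet
    "retweeted_status_id": [None, 99.0, None, None],
    "text": ["good dog 13/10", "RT good dog 13/10", "pupper 12/10", "floofer 11/10"],
})
preds_demo = pd.DataFrame({
    "tweet_id": [1, 2, 4],
    "jpg_url": ["a.jpg", "b.jpg", "d.jpg"],  # assumed column name
})

# Step 1: keep rows that are not retweets
originals = archive_demo[archive_demo["retweeted_status_id"].isna()]

# Step 2: keep only tweets that also have an image prediction
with_images = originals[originals["tweet_id"].isin(preds_demo["tweet_id"])]

kept_ids = with_images["tweet_id"].tolist()
print(kept_ids)
```

Tweet 2 is dropped as a retweet and tweet 3 is dropped because it has no image, so only tweets 1 and 4 remain.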

In [21]:
# data gathering imports
import os
import requests
import time
import tweepy
import json

# standard data manipulation libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Data Gathering

In [15]:
# import WeRateDogs twitter archive (provided by Udacity)
archive = pd.read_csv('twitter-archive-enhanced.csv')

# download Udacity's tweet image predictions (public URL, so no credentials are needed)
preds = requests.get('https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv')
preds.raise_for_status()  # stop early if the download failed
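
The response held in `preds` only lives in memory, so it probably needs to be written to disk before it can be loaded as a table. Below is a rough sketch of that step using only the standard library. `save_response_body` is a hypothetical helper, and the sample bytes stand in for `preds.content` so the snippet runs without a network connection; the column names in the sample are assumptions.

```python
import csv
import os
import tempfile

def save_response_body(content: bytes, path: str) -> int:
    """Write raw response bytes to disk and return the number of bytes written."""
    with open(path, "wb") as f:
        return f.write(content)

# Stand-in for preds.content (made-up row, assumed column names)
sample = b"tweet_id\tjpg_url\tp1\n1\ta.jpg\tgolden_retriever\n"

demo_path = os.path.join(tempfile.mkdtemp(), "image-predictions.tsv")
n_written = save_response_body(sample, demo_path)

# Read the file back as tab-separated values to confirm it round-trips
with open(demo_path, newline="", encoding="utf-8") as f:
    rows = list(csv.DictReader(f, delimiter="\t"))

print(n_written, rows[0]["p1"])
```

In the notebook itself, the saved file could then be loaded with `pd.read_csv(path, sep='\t')`.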

In [18]:
# save tweet IDs to a variable
tweet_id = archive['tweet_id']
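
These IDs are what the API is queried with, one tweet at a time, and some lookups may fail (for example, if a tweet has since been deleted). One possible way to structure that loop is sketched below. `collect_tweet_json`, `fake_fetch`, and the file name `tweet_json.txt` are hypothetical. The real fetcher is passed in as a callable, so the sketch runs here with a stub instead of a live API client.

```python
import json
import os
import tempfile

def collect_tweet_json(ids, fetch, out_path):
    """Call fetch(id) for each ID, write one JSON object per line,
    and return a list of (id, error message) pairs for lookups that failed."""
    failed = []
    with open(out_path, "w", encoding="utf-8") as f:
        for tid in ids:
            try:
                payload = fetch(tid)
            except Exception as err:  # broad on purpose: record the failure and move on
                failed.append((tid, str(err)))
                continue
            f.write(json.dumps(payload) + "\n")
    return failed

def fake_fetch(tid):
    """Stub standing in for a real API call; pretends tweet 2 no longer exists."""
    if tid == 2:
        raise LookupError("No status found with that ID")
    return {"id": tid, "retweet_count": 10 * tid, "favorite_count": 100 * tid}

demo_out = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
failed_ids = collect_tweet_json([1, 2, 3], fake_fetch, demo_out)

# Read the file back: one JSON record per line
with open(demo_out, encoding="utf-8") as f:
    records = [json.loads(line) for line in f]

print(failed_ids, [r["id"] for r in records])
```

With a Tweepy client, the callable might look like `lambda i: api.get_status(i, tweet_mode='extended')._json`, assuming the installed Tweepy version exposes the raw payload that way.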

In [8]:
# Create API object to gather twitter data

consumer_key = os.environ.get("CONSUMER_KEY")
consumer_secret = os.environ.get("CONSUMER_SECRET")
access_token = os.environ.get("ACCESS_TOKEN")
access_secret = os.environ.get("ACCESS_SECRET")

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth,
                 wait_on_rate_limit = True,
                 wait_on_rate_limit_notify = True,)

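# Using the tweet IDs in the WeRateDogs Twitter archive,
# query the Twitter API for each tweet's JSON data.
# get_status() takes a single ID, so test the call on the first tweet;
# with tweet_mode='extended' the text is stored in full_text
tweet = api.get_status(int(tweet_id.iloc[0]), tweet_mode='extended')
print(tweet.full_text)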

# Write JSON to txt file
data = {}
