# Data Wrangling Template

In [1]:
import pandas as pd
import numpy as np
import requests
import tweepy
import json

## Key Points

* You only want `original ratings` (no retweets) that have images. Though there are 5000+ tweets in the dataset, not all are dog ratings and some are retweets.
* Assessing and cleaning the entire dataset completely would require a lot of time, and is not necessary to practice and demonstrate your skills in data wrangling. Therefore, the requirements of this project are only to assess and clean at least `8 quality issues` and at least `2 tidiness issues` in this dataset.
* Cleaning includes merging individual pieces of data according to the rules of tidy data.
* The fact that the rating numerators are greater than the denominators does not need to be cleaned. This unique rating system is a big part of the popularity of WeRateDogs.
* You do not need to gather the tweets beyond `August 1st, 2017`. You can, but note that you won't be able to gather the image predictions for these tweets since you don't have access to the algorithm used.
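The first key point (original ratings only, with images) can be sketched as a filter on the archive. This is a minimal sketch, not the required solution: the helper name `keep_original_ratings` and the toy data are hypothetical, and it assumes that "has an image" means the tweet ID appears in the image predictions table.

```python
import numpy as np
import pandas as pd


def keep_original_ratings(archive: pd.DataFrame, predictions: pd.DataFrame) -> pd.DataFrame:
    """Keep tweets that are not retweets and that have an image prediction row."""
    # A retweet carries a non-null retweeted_status_id in the archive.
    originals = archive[archive["retweeted_status_id"].isna()]
    # Assumption: a tweet "has an image" if its ID occurs in the predictions table.
    with_images = originals[originals["tweet_id"].isin(predictions["tweet_id"])]
    return with_images.reset_index(drop=True)


# Toy data (hypothetical): tweet 2 is a retweet, tweet 3 has no image prediction.
toy_archive = pd.DataFrame({
    "tweet_id": [1, 2, 3],
    "retweeted_status_id": [np.nan, 99.0, np.nan],
})
toy_predictions = pd.DataFrame({"tweet_id": [1, 2]})

kept = keep_original_ratings(toy_archive, toy_predictions)
```

On the toy data only tweet 1 survives, because it is neither a retweet nor missing from the predictions.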

## Data Source

* `twitter-archive-enhanced.csv` --> download locally
* `image_predictions.tsv`--> downloaded programmatically using the Requests library and the following URL: https://d17h27t6h515a5.cloudfront.net/topher/2017/August/599fd2ad_image-predictions/image-predictions.tsv 
* `tweet_json.txt` --> query the Twitter API for each tweet's JSON data using Python's Tweepy library and store each tweet's entire set of JSON data in a file called `tweet_json.txt`. Each tweet's JSON data should be written to its own line. Then read this .txt file line by line into a pandas DataFrame with **(at minimum) tweet ID, retweet count, and favorite count**. Note: do not include your Twitter API keys, secrets, and tokens in your project submission.
    * Credentials (consumer API key and secret, access token and access token secret) are deliberately not listed here; they are read from a local `tweety_auth.csv` file that is kept out of the submission.
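The programmatic download of `image_predictions.tsv` with the Requests library could look roughly like this. It is a sketch under assumptions: the helper name `download_file`, the optional `session` parameter, and the 30-second timeout are illustrative choices, not part of the project requirements.

```python
import requests


def download_file(url, path, session=None, timeout=30):
    """Download `url` and write the raw response body to `path`."""
    # Assumption: a plain GET is enough; `session` only exists so the helper
    # can be exercised without network access.
    http = session or requests.Session()
    response = http.get(url, timeout=timeout)
    # Fail loudly on 4xx/5xx instead of silently saving an error page.
    response.raise_for_status()
    with open(path, "wb") as handle:
        handle.write(response.content)
    return path
```

With the URL from the list above stored in a variable `url`, `download_file(url, 'image_predictions.tsv')` would save the file, and `pd.read_csv('image_predictions.tsv', sep='\t')` would then load it, since the file is tab-separated.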

### Test Twitter

In [37]:
# read the secrets 
access_data = pd.read_csv('tweety_auth.csv')

In [40]:
# load the WeRateDogs Twitter archive from the local CSV file
twitter_archive = pd.read_csv('twitter-archive-enhanced.csv')

In [41]:
consumer_key = access_data.consumer_key.iloc[0]
consumer_secret = access_data.consumer_secret.iloc[0]
access_token = access_data.access_token.iloc[0]
access_secret = access_data.access_secret.iloc[0]
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

In [42]:
test = api.get_status(twitter_archive.tweet_id.iloc[0], tweet_mode='extended')

In [44]:
print('Test is successful, returned ID: {}'.format(test.id))

Test is successful, returned ID: 892420643555336193


## Gather

In [8]:
# 
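The `tweet_json.txt` step from the Data Source section (one JSON document per line, then read back line by line) might be sketched as follows. The helper names `write_tweet_json` and `read_tweet_counts` and the `fetch_tweet_json` callable are hypothetical; the field names `id`, `retweet_count`, and `favorite_count` are those of the Twitter API's tweet object.

```python
import json


def write_tweet_json(tweet_ids, fetch_tweet_json, path):
    """Write one JSON document per line; return the IDs that could not be fetched."""
    failed = []
    with open(path, "w", encoding="utf-8") as handle:
        for tweet_id in tweet_ids:
            try:
                tweet = fetch_tweet_json(tweet_id)
            except Exception:
                # Assumption: any error (e.g. a deleted tweet) just skips that ID.
                # A real run would catch Tweepy's own error class, whose name
                # differs between Tweepy versions.
                failed.append(tweet_id)
                continue
            handle.write(json.dumps(tweet) + "\n")
    return failed


def read_tweet_counts(path):
    """Read the file line by line and keep tweet ID, retweet and favorite counts."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            tweet = json.loads(line)
            rows.append({
                "tweet_id": tweet["id"],
                "retweet_count": tweet["retweet_count"],
                "favorite_count": tweet["favorite_count"],
            })
    return rows
```

With the `api` object from the Test Twitter section, `fetch_tweet_json` could be something like `lambda tweet_id: api.get_status(tweet_id, tweet_mode='extended')._json`, assuming the Tweepy status object exposes the raw payload as `_json`. The returned rows can be passed to `pd.DataFrame(...)` to obtain the required DataFrame.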


## Assess

In [36]:
twitter_archive.head(1)

| | tweet_id | in_reply_to_status_id | in_reply_to_user_id | timestamp | source | text | retweeted_status_id | retweeted_status_user_id | retweeted_status_timestamp | expanded_urls | rating_numerator | rating_denominator | name | doggo | floofer | pupper | puppo |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 892420643555336193 | | | 2017-08-01 16:23:56 +0000 | `<a href="http://twitter.com/download/iphone" r...` | This is Phineas. He's a mystical boy. Only eve... | | | | https://twitter.com/dog_rates/status/892420643... | 13 | 10 | Phineas | | | | |


## Clean

#### Define

#### Code

#### Test

## Analyze and visualize your wrangled data

At least `three (3) insights` and `one (1) visualization` must be produced.