# Guided Project: API and Web Data Scraping
## Part 1: API
- I started the project by looking for an API in [Public APIs](https://github.com/toddmotto/public-apis) and selected the Twitter API
- I went to Twitter and created a developer account
- I set up a new app in Twitter and got my credentials
- To keep my credentials safe, I created the following files:
  - a file called .env in Visual Studio Code, where I created variables to hold the tokens
  - a file called .gitignore in Visual Studio Code, where I listed the .env file so that the tokens are not uploaded to GitHub
  - a file called loadCredentials.py to read the .env file
- With these files in place, I imported tweepy and loaded my credentials into the Jupyter notebook
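
The contents of loadCredentials.py are not shown in this notebook. The block below is only a sketch of what such a helper might look like, assuming the .env file holds plain `KEY=VALUE` lines. The `parse_env_file` function and the `path` parameter are hypothetical names added for illustration; only the `loadCredentials(names)` call shape comes from the notebook.

```python
import os


def parse_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict (hypothetical helper)."""
    values = {}
    with open(path, encoding="utf-8") as env_file:
        for raw_line in env_file:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without an '=' sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Drop surrounding whitespace and optional quotes around the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


def loadCredentials(names, path=".env"):
    """Return {name: value} for the requested names, raising KeyError if one is missing."""
    file_values = parse_env_file(path) if os.path.exists(path) else {}
    credentials = {}
    for name in names:
        # Prefer the .env file, then fall back to the process environment.
        value = file_values.get(name, os.environ.get(name))
        if value is None:
            raise KeyError(f"Missing credential: {name}")
        credentials[name] = value
    return credentials
```

Failing loudly on a missing name makes a misspelled variable show up when the credentials are loaded rather than later, as an authentication error from the API.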

In [1]:
import tweepy

In [2]:
from loadCredentials import loadCredentials

cred = loadCredentials(["TWITTER_API_KEY","TWITTER_API_SECRET","TWITTER_ACCESS_TOKEN","TWITTER_ACCESS_TOKEN_SECRET"])
auth = tweepy.OAuthHandler(cred["TWITTER_API_KEY"], cred["TWITTER_API_SECRET"])
auth.set_access_token(cred["TWITTER_ACCESS_TOKEN"], cred["TWITTER_ACCESS_TOKEN_SECRET"])
api = tweepy.API(auth)

- I queried the Twitter API for my personal account information using the me method
- I imported pandas and json_normalize
- I created a data frame from my account information
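
As a rough illustration of the data-frame step, the sketch below uses a made-up dictionary in place of the real account payload (`mytw._json`), so the field values are assumptions. It compares the one-Series-per-record approach used in this notebook with `pd.json_normalize`, which is available under that name in pandas 1.0 and later.

```python
import pandas as pd

# Made-up stand-in for the account payload; real payloads carry many more fields.
profile = {
    "id": 1,
    "screen_name": "example_user",
    "entities": {"description": {"urls": []}},
}

# Approach used in the notebook: one Series per record.
# Nested dictionaries stay as Python objects inside a single column.
as_series = pd.DataFrame([pd.Series(profile)])

# Alternative: json_normalize flattens nested dictionaries into dotted column names.
flattened = pd.json_normalize(profile)

print(list(as_series.columns))
print(list(flattened.columns))
```

Both calls give a one-row data frame. The difference is whether nested fields such as 'entities' remain a single object column or are spread across several columns.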

In [3]:
mytw = api.me()

In [4]:
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

In [5]:
mytw = api.me()
mytwit = pd.DataFrame([pd.Series(mytw._json)])
mytwit

| | id | id_str | name | screen_name | location | profile_location | description | url | entities | protected | ... | profile_use_background_image | has_extended_profile | default_profile | default_profile_image | following | follow_request_sent | notifications | translator_type | suspended | needs_phone_verification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 360391229 | 360391229 | Maris Font | marisfont | Miami | | | | `{'description': {'urls': []}}` | True | ... | True | False | True | False | False | False | False | none | False | False |


- I queried the Twitter API for tweets containing "friyay" using the search method
- I created a data frame of the tweets that contain "friyay"
- Since the data frame was huge, I printed all the columns to figure out what to work with

In [6]:
friyay = api.search("friyay")

In [7]:
tweets = pd.DataFrame([pd.Series(tweet._json) for tweet in friyay])
print(type(tweets))
tweets.head()

<class 'pandas.core.frame.DataFrame'>


| | created_at | id | id_str | text | truncated | entities | metadata | source | in_reply_to_status_id | in_reply_to_status_id_str | ... | contributors | is_quote_status | retweet_count | favorite_count | favorited | retweeted | possibly_sensitive | lang | retweeted_status | extended_entities |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fri Nov 02 15:13:01 +0000 2018 | 1058376438515142657 | 1058376438515142657 | Finishing the week strong with a 5-star review... | True | `{'hashtags': [], 'symbols': [], 'user_mentions...` | `{'iso_language_code': 'en', 'result_type': 're...` | `<a href="http://twitter.com" rel="nofollow">Tw...` | | | ... | | False | 0 | 0 | False | False | False | en | | |
| 1 | Fri Nov 02 15:12:56 +0000 2018 | 1058376416402624513 | 1058376416402624513 | FRIYAY 🍿 | False | `{'hashtags': [], 'symbols': [], 'user_mentions...` | `{'iso_language_code': 'en', 'result_type': 're...` | `<a href="http://twitter.com/download/iphone" r...` | | | ... | | False | 0 | 0 | False | False | | en | | |
| 2 | Fri Nov 02 15:12:48 +0000 2018 | 1058376386061180928 | 1058376386061180928 | RT @BerlitzLanguage: See y'all on Monday Learn... | False | `{'hashtags': [{'text': 'friyay', 'indices': [5...` | `{'iso_language_code': 'en', 'result_type': 're...` | `<a href="http://twitter.com/download/android" ...` | | | ... | | False | 2 | 0 | False | False | | en | `{'created_at': 'Fri Nov 02 15:11:22 +0000 2018...` | |
| 3 | Fri Nov 02 15:12:46 +0000 2018 | 1058376376775008257 | 1058376376775008257 | because 2 drinks are better than one and its #... | False | `{'hashtags': [{'text': 'friyay', 'indices': [4...` | `{'iso_language_code': 'en', 'result_type': 're...` | `<a href="http://twitter.com/download/iphone" r...` | | | ... | | False | 0 | 0 | False | False | False | en | | `{'media': [{'id': 1058376368206069762, 'id_str...` |
| 4 | Fri Nov 02 15:12:45 +0000 2018 | 1058376371980910597 | 1058376371980910597 | RT @MistralKDawn: Will Petri be able to #prote... | False | `{'hashtags': [{'text': 'protect', 'indices': [...` | `{'iso_language_code': 'en', 'result_type': 're...` | `<a href="http://twitter.com/download/android" ...` | | | ... | | False | 5 | 0 | False | False | False | en | `{'created_at': 'Fri Nov 02 11:45:01 +0000 2018...` | |


In [8]:
tweets.columns

Index(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities',
       'metadata', 'source', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo',
       'coordinates', 'place', 'contributors', 'is_quote_status',
       'retweet_count', 'favorite_count', 'favorited', 'retweeted',
       'possibly_sensitive', 'lang', 'retweeted_status', 'extended_entities'],
      dtype='object')

- To get the 'user' screen name, I first had to normalize the column using json_normalize
  - I tried normalizing 'entities' several times, but since it is nested more than two levels deep, the method we currently know did not work. I tried other tools, but continuously failed.
  - I wanted to normalize 'entities' because I wanted to include the 'hashtags' information in my data frame.
- With this, I created a second data frame with the 'user', 'id', 'text', and 'retweet_count' columns
- I also sorted the data frame by 'retweet_count' to see which were the most popular tweets
- Then I exported the output as .csv
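
One possible way to pull the hashtags out of 'entities' without json_normalize is to apply a small function to each row. This is a sketch on made-up rows, not part of the original notebook. The rows are shaped like the 'entities' values shown in the output above, and `hashtag_texts` is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows shaped like the search results; only the fields used here.
sample = pd.DataFrame(
    [
        {"id": 1, "entities": {"hashtags": [{"text": "friyay", "indices": [0, 7]}]}},
        {"id": 2, "entities": {"hashtags": []}},
        {"id": 3, "entities": {"hashtags": [
            {"text": "friyay", "indices": [5, 12]},
            {"text": "weekend", "indices": [13, 21]},
        ]}},
    ]
)


def hashtag_texts(entities):
    """Return the list of hashtag strings from one tweet's 'entities' dict."""
    if not isinstance(entities, dict):
        return []  # Guards against NaN or otherwise missing values.
    return [tag["text"] for tag in entities.get("hashtags", [])]


# New column holding a plain list of hashtag strings per tweet.
sample["hashtags"] = sample["entities"].apply(hashtag_texts)

# One row per (tweet, hashtag); tweets without hashtags get a single NaN row.
per_hashtag = sample[["id", "hashtags"]].explode("hashtags")
```

The list column is convenient for display next to 'text', while the exploded frame is easier to group and count. `DataFrame.explode` needs pandas 0.25 or later.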

In [9]:
tweets['user'] = json_normalize(tweets['user'])['screen_name']

In [10]:
tweets_final = tweets[['user','id','text','retweet_count']].sort_values('retweet_count', ascending=False)
print(type(tweets_final))
tweets_final

<class 'pandas.core.frame.DataFrame'>


| | user | id | text | retweet_count |
|---|---|---|---|---|
| 5 | sethsonika___ | 1058376323825946624 | RT @AliQuliMirzaAQM: Friday is now FRIYAY! Lov... | 38 |
| 6 | Jessicasquared9 | 1058376260286627840 | RT @HighheelsDes: It’s Friday..\nTime to make ... | 16 |
| 4 | AuthorMichael57 | 1058376371980910597 | RT @MistralKDawn: Will Petri be able to #prote... | 5 |
| 2 | nkay_yo | 1058376386061180928 | RT @BerlitzLanguage: See y'all on Monday Learn... | 2 |
| 7 | AbjWeekendVibe | 1058376243110907904 | RT @BerlitzLanguage: See y'all on Monday Learn... | 2 |
| 10 | RonicaDladloti | 1058376213302075393 | RT @ForRadioLovers: @tshiamo_nkocee @DbnNytsSA... | 2 |
| 8 | smileyscorpius | 1058376241609367552 | RT @MissIssyClifton: Listening to @HamiltonWes... | 1 |
| 0 | RentItNetwork | 1058376438515142657 | Finishing the week strong with a 5-star review... | 0 |
| 1 | Giaaa__n | 1058376416402624513 | FRIYAY 🍿 | 0 |
| 3 | LucysCantinaNYC | 1058376376775008257 | because 2 drinks are better than one and its #... | 0 |


In [11]:
# tweets_final.to_csv('output/API.csv', index=False)
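
The export line above writes into an output folder, and `to_csv` does not create missing folders on its own. The sketch below shows one way to guard against that. `export_csv` is a hypothetical helper name, and the folder layout is assumed from the path in the commented line.

```python
from pathlib import Path

import pandas as pd


def export_csv(frame, path):
    """Write `frame` to `path` as CSV, creating parent folders first."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # index=False keeps the data frame's row labels out of the file.
    frame.to_csv(target, index=False)
    return target
```

With this helper, the export step would read `export_csv(tweets_final, 'output/API.csv')`.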