# Guided Project: API and Web Data Scraping
## Part 1: API
- I started the project by browsing the [Public APIs](https://github.com/toddmotto/public-apis) list and selected the Twitter API.
- I went to Twitter and created a developer account.
- I set up a new app on Twitter and got my credentials (API key, API secret, access token, and access token secret).
- To keep my credentials safe, I created the following files:
  - `.env` (in Visual Studio Code), where I assigned each token to a variable
  - `.gitignore`, where I listed the `.env` file so that the tokens are never uploaded to GitHub
  - `loadCredentials.py`, a module that reads the `.env` file
- With these files in place, I imported tweepy and loaded my credentials into the Jupyter notebook.
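
The contents of `loadCredentials.py` are not shown in this project. As a hypothetical sketch, assuming `.env` holds plain `KEY=VALUE` lines such as `TWITTER_API_KEY=...` and `.gitignore` simply lists `.env` on its own line, a stdlib-only version could look like this. The `env_path` parameter and the fallback to environment variables are assumptions, not part of the original module.

```python
import os
from pathlib import Path


def loadCredentials(keys, env_path=".env"):
    """Return a dict mapping each requested key to its secret value.

    Hypothetical sketch: reads simple KEY=VALUE lines from env_path,
    skipping blank lines and # comments. Keys missing from the file
    fall back to real environment variables (an assumption).
    """
    values = {}
    path = Path(env_path)
    if path.exists():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            # Drop optional surrounding quotes, e.g. KEY="abc"
            values[name.strip()] = value.strip().strip('"').strip("'")

    creds = {}
    for key in keys:
        value = values.get(key, os.environ.get(key))
        if value is None:
            raise KeyError(f"Missing credential: {key}")
        creds[key] = value
    return creds
```

If you prefer a library, the python-dotenv package (`dotenv_values`) parses `.env` files more completely.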

In [1]:
import tweepy

In [2]:
from loadCredentials import loadCredentials

# Read the four secrets from .env instead of hard-coding them
cred = loadCredentials(["TWITTER_API_KEY","TWITTER_API_SECRET","TWITTER_ACCESS_TOKEN","TWITTER_ACCESS_TOKEN_SECRET"])
# OAuth 1.0a: the app's key/secret plus the account's access token/secret
auth = tweepy.OAuthHandler(cred["TWITTER_API_KEY"], cred["TWITTER_API_SECRET"])
auth.set_access_token(cred["TWITTER_ACCESS_TOKEN"], cred["TWITTER_ACCESS_TOKEN_SECRET"])
api = tweepy.API(auth)

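- I queried the Twitter API for my own account information using the `me()` method (tweepy 4.x removed it; `verify_credentials()` returns the same authenticated user)
- I imported pandas and its `json_normalize` function
- I built a one-row data frame from my account's raw JSON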
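
To see why the profile ends up as a single wide row, here is a small sketch on a made-up dictionary shaped like part of a user's `_json` payload (the values are illustrative, not real API output). `pd.DataFrame([pd.Series(...)])` keeps nested dictionaries packed in one cell, while `pd.json_normalize` expands them into dotted column names.

```python
import pandas as pd

# Made-up sample shaped like part of a user object's _json payload
# (illustrative values only, not real API output).
profile = {
    "id": 1,
    "screen_name": "example_user",
    "entities": {"description": {"urls": []}},
}

# One row; the nested 'entities' dict stays packed inside a single cell.
as_series = pd.DataFrame([pd.Series(profile)])

# One row; nested dicts are expanded into dotted column names.
flattened = pd.json_normalize(profile)

print(list(as_series.columns))  # ['id', 'screen_name', 'entities']
print(list(flattened.columns))  # ['id', 'screen_name', 'entities.description.urls']
```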

In [3]:
mytw = api.verify_credentials()  # tweepy 3.x also offered api.me(); it was removed in 4.0

In [4]:
import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated

In [5]:
# mytw was already fetched in the previous cell; no need to call the API again
mytwit = pd.DataFrame([pd.Series(mytw._json)])
mytwit

| | id | id_str | name | screen_name | location | profile_location | description | url | entities | protected | ... | profile_use_background_image | has_extended_profile | default_profile | default_profile_image | following | follow_request_sent | notifications | translator_type | suspended | needs_phone_verification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 360391229 | 360391229 | Maris Font | marisfont | Miami | | | | {'description': {'urls': []}} | True | ... | True | False | True | False | False | False | False | none | False | False |


- I searched the Twitter API for tweets containing "friyay" using the `search()` method (renamed `search_tweets()` in tweepy 4.x)
- I created a data frame of the tweets that contain "friyay"
- Since the data frame had many columns, I listed them all to figure out which ones to work with
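
One way to narrow down a wide frame like this is to check which columns still hold nested JSON (dictionaries or lists), since those are the ones that need flattening before their fields can be used. The helper `nested_columns` below is hypothetical and runs on made-up sample rows rather than the real search results.

```python
import pandas as pd


def nested_columns(df):
    """Return the names of columns whose cells hold dicts or lists.

    These columns need json_normalize (or a per-row function) before
    their inner fields can be used as ordinary columns.
    """
    return [
        col
        for col in df.columns
        if df[col].map(lambda v: isinstance(v, (dict, list))).any()
    ]


# Made-up rows shaped loosely like search results (not real tweets).
sample = pd.DataFrame([
    {"id": 1, "text": "happy friyay", "retweet_count": 3,
     "user": {"screen_name": "a"}, "entities": {"hashtags": []}},
    {"id": 2, "text": "#friyay", "retweet_count": 7,
     "user": {"screen_name": "b"},
     "entities": {"hashtags": [{"text": "friyay", "indices": [0, 7]}]}},
])

print(nested_columns(sample))  # ['user', 'entities']
```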

In [6]:
friyay = api.search_tweets("friyay")  # tweepy 4.x; in tweepy 3.x this method was api.search

In [7]:
tweets = pd.DataFrame([pd.Series(tweet._json) for tweet in friyay])
print(type(tweets))
tweets.head()

<class 'pandas.core.frame.DataFrame'>


| | created_at | id | id_str | text | truncated | entities | metadata | source | in_reply_to_status_id | in_reply_to_status_id_str | ... | contributors | is_quote_status | retweet_count | favorite_count | favorited | retweeted | lang | retweeted_status | extended_entities | possibly_sensitive |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Fri Nov 02 15:09:39 +0000 2018 | 1058375591299174400 | 1058375591299174400 | @radenrauf Thanks god its friyay | False | {'hashtags': [], 'symbols': [], 'user_mentions... | {'iso_language_code': 'en', 'result_type': 're... | `<a href="http://twitter.com/download/android" ...` | 1.05834e+18 | 1.0583395304182088e+18 | ... | | False | 0 | 0 | False | False | en | | | |
| 1 | Fri Nov 02 15:09:37 +0000 2018 | 1058375584710045696 | 1058375584710045696 | RT @AliQuliMirzaAQM: Friday is now FRIYAY! Lov... | False | {'hashtags': [{'text': 'AamirOnKBC', 'indices'... | {'iso_language_code': 'en', 'result_type': 're... | `<a href="http://twitter.com/download/android" ...` | | | ... | | False | 29 | 0 | False | False | en | {'created_at': 'Fri Nov 02 14:57:43 +0000 2018... | | |
| 2 | Fri Nov 02 15:09:25 +0000 2018 | 1058375534655234049 | 1058375534655234049 | RT @SharkMontauk: Happy #friyay friends! 😍🦈🎉 h... | False | {'hashtags': [{'text': 'friyay', 'indices': [2... | {'iso_language_code': 'en', 'result_type': 're... | `<a href="http://twitter.com" rel="nofollow">Tw...` | | | ... | | False | 16 | 0 | False | False | en | {'created_at': 'Fri Nov 02 11:32:43 +0000 2018... | {'media': [{'id': 1058320610282024960, 'id_str... | False |
| 3 | Fri Nov 02 15:09:25 +0000 2018 | 1058375531639529472 | 1058375531639529472 | Lucah is a handsome fellow. He has a large col... | False | {'hashtags': [{'text': 'friyay', 'indices': [6... | {'iso_language_code': 'en', 'result_type': 're... | `<a href="http://twitter.com" rel="nofollow">Tw...` | | | ... | | False | 0 | 0 | False | False | en | | {'media': [{'id': 1058374764958441473, 'id_str... | False |
| 4 | Fri Nov 02 15:09:24 +0000 2018 | 1058375529362059264 | 1058375529362059264 | RT @TeamTillett: Blimey!!!! That's a HUGE Till... | False | {'hashtags': [], 'symbols': [], 'user_mentions... | {'iso_language_code': 'en', 'result_type': 're... | `<a href="http://twitter.com/download/iphone" r...` | | | ... | | False | 76 | 0 | False | False | en | {'created_at': 'Fri Nov 02 12:34:23 +0000 2018... | | |


In [8]:
tweets.columns

Index(['created_at', 'id', 'id_str', 'text', 'truncated', 'entities',
       'metadata', 'source', 'in_reply_to_status_id',
       'in_reply_to_status_id_str', 'in_reply_to_user_id',
       'in_reply_to_user_id_str', 'in_reply_to_screen_name', 'user', 'geo',
       'coordinates', 'place', 'contributors', 'is_quote_status',
       'retweet_count', 'favorite_count', 'favorited', 'retweeted', 'lang',
       'retweeted_status', 'extended_entities', 'possibly_sensitive'],
      dtype='object')

- The 'user' column holds a nested dictionary for each tweet, so I first flattened it with json_normalize and kept only the 'screen_name' field
  - I tried normalizing 'entities' several times too, but its 'hashtags' field is a list of dictionaries nested inside another dictionary, so the method we had covered so far did not work. I tried other tools, but they kept failing.
  - I wanted to normalize 'entities' because I wanted to include the 'hashtags' information in my data frame.
- I then created a second data frame with the 'user', 'id', 'text', and 'retweet_count' columns and sorted it by 'retweet_count' (highest first) to see which tweets were the most popular
- Finally, I exported the result as a .csv file
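
A possible way to get the hashtags into the data frame, sketched on made-up sample rows: instead of flattening all of 'entities', map each row's dictionary to just the piece you need. The column names match the tweet JSON above, but the sample values are invented, and this swaps in `Series.map` in place of the notebook's `json_normalize` call.

```python
import pandas as pd

# Made-up rows shaped loosely like search results (not real tweets).
tweets = pd.DataFrame([
    {"id": 1, "text": "happy friyay", "retweet_count": 3,
     "user": {"screen_name": "a"}, "entities": {"hashtags": []}},
    {"id": 2, "text": "#friyay #weekend", "retweet_count": 7,
     "user": {"screen_name": "b"},
     "entities": {"hashtags": [{"text": "friyay"}, {"text": "weekend"}]}},
])

# 'user' is one dict per row: keep only the handle.
tweets["user"] = tweets["user"].map(lambda u: u["screen_name"])

# 'entities' -> 'hashtags' is a list of dicts per row: collect each
# hashtag's text into a plain Python list.
tweets["hashtags"] = tweets["entities"].map(
    lambda e: [h["text"] for h in e.get("hashtags", [])]
)

# Keep the interesting columns, most-retweeted first.
tweets_final = tweets[["user", "id", "text", "hashtags", "retweet_count"]]
tweets_final = tweets_final.sort_values("retweet_count", ascending=False)
print(tweets_final)
```

Each hashtag list stays in a single cell. If you need one row per hashtag instead, `pd.json_normalize` with its `record_path` argument can produce that layout.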

In [9]:
tweets['user'] = json_normalize(tweets['user'])['screen_name']  # flatten each user dict and keep only the handle

In [10]:
tweets_final = tweets[['user', 'id', 'text', 'retweet_count']].sort_values('retweet_count', ascending=False)
print(type(tweets_final))
tweets_final

<class 'pandas.core.frame.DataFrame'>


| | user | id | text | retweet_count |
|---|---|---|---|---|
| 4 | fr0styfr0sty | 1058375529362059264 | RT @TeamTillett: Blimey!!!! That's a HUGE Till... | 76 |
| 1 | Rashid8Patel | 1058375584710045696 | RT @AliQuliMirzaAQM: Friday is now FRIYAY! Lov... | 29 |
| 9 | imsaurav55 | 1058375446302085120 | RT @AliQuliMirzaAQM: Friday is now FRIYAY! Lov... | 29 |
| 2 | WhaleSharkRocky | 1058375534655234049 | RT @SharkMontauk: Happy #friyay friends! 😍🦈🎉 h... | 16 |
| 14 | HHSRegion5 | 1058375380992778241 | RT @HealthCareGov: Happy #Friyay! Whatever you... | 5 |
| 5 | MayKingTea | 1058375516670046208 | RT @Irish_IreneB: @MayKingTea @ZalkaB @GenePet... | 1 |
| 6 | olamideolakun10 | 1058375511053885441 | RT @powersoftonline: I come in different shape... | 1 |
| 7 | TheSwansonSquad | 1058375476866174981 | RT @IvyHillAP_MrsJ: Hearing our students say t... | 1 |
| 10 | shyamranjan93 | 1058375432272142338 | RT @officialluteri1: Hey people Friday is now ... | 1 |
| 0 | larasrizky3 | 1058375591299174400 | @radenrauf Thanks god its friyay | 0 |


In [11]:
# tweets_final.to_csv('output/API.csv', index=False)
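
The export cell above is commented out, and `to_csv` fails if the `output/` folder does not exist yet. A small hypothetical helper, `export_csv`, that creates the folder first could look like this:

```python
import os


def export_csv(df, path):
    """Write a DataFrame to path as CSV, creating the parent folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        # exist_ok=True makes re-running the notebook cell harmless
        os.makedirs(folder, exist_ok=True)
    df.to_csv(path, index=False)
    return path
```

For example, `export_csv(tweets_final, 'output/API.csv')` would write the sorted table without the index column.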