# Notebook 2: Requesting information

After getting the access token and refreshing it, we started requesting information for our analysis. As a reminder, our four goals are to find the top twenty friends who like our posts the most, the demographics of the places where we have been tagged, the reactions to each of Trump's posts, and lastly, the events on Facebook. My partner and I split the tasks in half to save time. I worked on finding the top friends who like our posts the most and the places where we have been tagged, and my partner worked on the other half.

Before requesting any information, we played around with the Graph API Explorer to see what kind of information we could get from Facebook. Then, we started on the first question by using Python to get the list of photos and see what format it came in. We used the GET method to get the list of photos by passing the URL "https://graph.facebook.com/me?fields=photos.limit(200)". We then decoded the result and converted it into JSON. Once it was in JSON, we created a list containing all the photo ids so that we could get the number of reactions for each of them. This works because, given only the post id, we can request any type of information about the post, such as likes, reactions, and even comments, through the Facebook Graph API.
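As a rough sketch of the id-extraction step described above, the snippet below pulls the ids out of a payload shaped like the decoded response. The sample payload is made up for illustration (the ids are placeholders), and `extract_photo_ids` is a hypothetical helper name that does not appear elsewhere in this notebook.

```python
def extract_photo_ids(payload):
    """Return the list of photo ids from a decoded /me?fields=photos response."""
    # The photos edge wraps its items in a 'data' list; each item carries an 'id'.
    photos = payload.get('photos', {}).get('data', [])
    return [photo['id'] for photo in photos]


# Placeholder payload that only mimics the shape of the real response.
sample_payload = {
    'photos': {'data': [{'id': '111'}, {'id': '222'}, {'id': '333'}]},
    'id': '999',
}
photo_ids = extract_photo_ids(sample_payload)
```

Using `.get` with defaults means an account without any photos yields an empty list instead of raising a `KeyError`.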

After creating the list, we passed each item in it (the photo ids) into a GET request to obtain the reactions. We then struggled with counting the total number of likes from each friend across all photos, so we had to stop for a moment and design an algorithm for it. Eventually, we figured it out and created a dictionary that maps every friend to the total number of likes they gave across all of our photos. Lastly, we wrapped up the first question by generating a dictionary that contains the friends who like our posts the most. We also loaded the dictionary into a dataframe and exported it to CSV to prepare for the third notebook.
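The counting step can be illustrated with `collections.Counter`. This is a minimal sketch rather than the code we actually ran: `count_likes_per_friend` is a hypothetical name, and the sample data only imitates the shape of the reactions response (a `data` list of objects with a `name`).

```python
from collections import Counter


def count_likes_per_friend(reaction_pages):
    """Count how many times each friend appears across all reaction lists."""
    likes = Counter()
    for page in reaction_pages:
        # Each decoded response keeps the reacting users under 'data'.
        for reaction in page.get('data', []):
            likes[reaction['name']] += 1
    return likes


# Placeholder responses for two photos; the names are not real friends.
sample_pages = [
    {'data': [{'name': 'Friend A'}, {'name': 'Friend B'}]},
    {'data': [{'name': 'Friend A'}]},
]
like_counts = count_likes_per_friend(sample_pages)
```

A `Counter` starts every missing key at zero, which removes the need for an explicit "have we seen this friend before" check.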

As explained in Notebook 1: 

Facebook does not provide a way to refresh its tokens once they have expired. To get a new token, the login flow must be followed again to obtain a short-lived token, which then needs to be exchanged, once again, for a long-lived token. This is expressed in the Facebook documentation as follows:

"Even the long-lived access token will eventually expire. At any point, you can generate a new long-lived token by sending the person back to the login flow used by your web app - note that the person will not actually need to login again, they have already authorized your app, so they will immediately redirect back to your app from the login flow with a refreshed token" 


For this notebook, we are using the long-lived token, which lasts for about 60 days.
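For reference, the exchange of a short-lived token for a long-lived one is a single GET request. The sketch below only builds the request URL and does not send it. The parameter names follow our reading of Facebook's access token documentation and should be checked against the current docs; `build_token_exchange_url` and the placeholder credentials are assumptions made for illustration.

```python
from urllib.parse import urlencode


def build_token_exchange_url(app_id, app_secret, short_lived_token):
    """Build the URL that exchanges a short-lived token for a long-lived one."""
    params = {
        'grant_type': 'fb_exchange_token',       # asks for a token exchange
        'client_id': app_id,                     # the app's id
        'client_secret': app_secret,             # the app's secret
        'fb_exchange_token': short_lived_token,  # token obtained from the login flow
    }
    return 'https://graph.facebook.com/oauth/access_token?' + urlencode(params)


# Placeholder credentials; real values would come from the keychain.
exchange_url = build_token_exchange_url('APP_ID', 'APP_SECRET', 'SHORT_TOKEN')
```

Because the URL contains the app secret, a request like this belongs on a trusted machine and should not be logged or shared.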

In [3]:
import requests
import importlib
import json
import pandas as pd
import keys_project
importlib.reload(keys_project)
keychain = keys_project.keychain

d={}
d['access_token']=keychain['facebook']['access_token'] # Getting the long-lived access token

Below are all of the helper functions that we used. The raw response from the Graph API is not easy to parse, so we convert all responses to JSON. The other functions support our data requests and modifications, as described in their docstrings.

In [5]:
def response_to_json(response):
    '''
    This function converts the response into json format
    Parameter:
        response: the request response to convert to json
    Return: 
        the response in json
    
    ''' 
    string_response = response.content.decode('utf-8') #decoding the response to string
    return json.loads(string_response) # converting the string to json

def get_reaction_count(object_id,reaction_type):
    '''
    This function gets the total reactions for each post 
    Parameter:
        object_id: the id of the object to get reaction data
        reaction_type: the reaction_type to retrieve from NONE, LIKE, LOVE, WOW, HAHA, SAD, ANGRY, THANKFUL
    Return: 
        the number of reactions on the request object of type reaction_type
    '''
    request_url="https://graph.facebook.com/"+str(object_id)+\
                           "/reactions?summary=true&type="+reaction_type # getting reaction summary data

    response= requests.get(request_url,params=d)
    response_json=response_to_json(response)
    return response_json['summary']['total_count'] #getting the count for reaction reaction_type

def most_frequent(myDict,number_top):
    '''
    This function creates a dictionary which includes the friend's name and the number of likes
    Parameter:
        myDict: A dictionary with the key as facebook friend's name and value of the number of times they liked the upload type
        number_top: The number of top friends to keep
    Return: 
        A dictionary of the top number_top friends
    
    '''
    
    # Collect every like count, then keep the number_top largest values
    value = []

    for key in myDict:
        value.append(myDict[key])
    value = sorted(value,reverse=True)
    values = value[0:number_top]
    most_liked_Dict = {}
    
    # Keep every friend whose like count is among the top values.
    # Note: friends tied at the cutoff value are all kept, so the
    # result can hold more than number_top entries.
    for key in myDict:
        if myDict[key] in values:
            most_liked_Dict[key] = myDict[key]
            
    return most_liked_Dict

def feed_(feed_id):
    '''
    This function get the feed data from Facebook
    Parameter:
        feed_id:the id of the feed in string
    Return: 
        a dictionary of feed data
    '''
    
    
    request_url="https://graph.facebook.com/"+feed_id+\
    "?fields=type,name,created_time,status_type,shares" #creating the url based on the feed_id
    
    response= requests.get(request_url,params=d)
    response_json=response_to_json(response)
    
    return response_json

def to_csv(filename,df):
    '''
    This function creates a CSV file. It exports data from a pandas dataframe to the file. 
    
    Parameters: 
        String of filename desired, pandas dataframe
    Returns: 
        None
    '''
    df.to_csv(filename,encoding='utf-8') # exporting to a csv file
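
The helpers above assume that every request succeeds. When a request fails (for example, because the token has expired), the Graph API returns a JSON body with an `error` object instead of the expected data. The sketch below shows one possible way to check for that before reading the payload. `GraphAPIError` and `check_graph_response` are hypothetical names, and the sample error is a placeholder that only imitates the shape of such a response.

```python
class GraphAPIError(Exception):
    """Raised when a decoded Graph API payload contains an 'error' object."""


def check_graph_response(payload):
    """Return the payload unchanged, or raise if it describes an error."""
    error = payload.get('error')
    if error is not None:
        raise GraphAPIError('{} (type={}, code={})'.format(
            error.get('message'), error.get('type'), error.get('code')))
    return payload


# Placeholder payloads: one successful, one shaped like an error response.
ok_payload = check_graph_response({'summary': {'total_count': 3}})
bad_payload = {'error': {'message': 'Token expired',
                         'type': 'OAuthException', 'code': 190}}
try:
    check_graph_response(bad_payload)
    error_raised = False
except GraphAPIError:
    error_raised = True
```

Calling such a check right after `response_to_json` would turn a confusing `KeyError` further down into a message that names the actual problem.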
    
    

Last but not least, we exported the dictionary to a CSV file for later analysis in Notebook 3. This question took us quite a long time. However, the later questions were fairly straightforward and similar to this one.

### Question: Getting the number of Facebook reactions of each reaction type for a particular upload type

This function takes a user_id, which can be any Facebook user or page; a limit, which is the number of uploads we want to check; and an upload type, which is a kind of Facebook upload object such as photos or posts. These parameters give us flexibility in the kind of data we receive. Initially, we used Facebook's Graph API Explorer to test our requests. The link to the explorer is: https://developers.facebook.com/tools/explorer/.

In the Facebook graph, information is organized as follows:

1. nodes: "things" such as a User, a Photo, a Page, a Comment
2. edges: the connections between those "things", such as a Page's Photos, or a Photo's Comments
3. fields: info about those "things", such as a person's birthday, or the name of a Page

Understanding how to query all three of these parts of the social graph was important in obtaining good data. For this question, we first had to get a 'User' or 'Page' node. From there, we queried the node's edges to find its uploads (posts or photos). Once we had the ID associated with each upload, we queried that upload to get its reaction counts.
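To make the node, edge, and field distinction concrete, the sketch below builds one URL of each kind. `graph_url` is a hypothetical helper written for this illustration. It only assembles strings and does not contact Facebook, and a real request would still need the access token.

```python
GRAPH_ROOT = 'https://graph.facebook.com'


def graph_url(node, edge=None, fields=None):
    """Build a Graph API URL for a node, optionally with an edge or fields."""
    url = GRAPH_ROOT + '/' + str(node)
    if edge:
        # An edge is addressed as a path segment after the node id.
        url += '/' + edge
    if fields:
        # Fields are passed as a comma-separated query parameter.
        url += '?fields=' + ','.join(fields)
    return url


node_url = graph_url('me')                                 # the User node
edge_url = graph_url('me', edge='photos')                  # the User's Photos edge
field_url = graph_url('me', fields=['name', 'birthday'])   # two fields of the User
```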

For our analysis, we get the reaction counts for Donald Trump and Hillary Clinton to compare their social media presence and following.

For each of our questions, we also had to modify the JSON response to clear it of noise and put it in a format accepted by a pandas dataframe.


In [6]:
def reaction_statistics(id_,limit,fb_upload_type):
    '''
    This function gets the total reactions of each upload
    Parameter:
        id_: a string id to a facebook object such as a page or person
        limit: the limit to the number of posts obtained from the request in string
        fb_upload_type: a valid type of upload as specified in FB docs: photos, posts, videos etc in string
    Return: 
        a list of dictionaries holding the count of each kind of reaction for each post
    '''
    request_url="https://graph.facebook.com/"+id_+"?fields="+fb_upload_type+".limit("+limit+"){id}" #creating request url
    
    response= requests.get(request_url,params=d)
    response_json=response_to_json(response) # converting response to json
    user=[]
    reaction_type=['LIKE','LOVE','WOW','HAHA','SAD','ANGRY','THANKFUL']

    for object_ in response_json[fb_upload_type]['data']:
        buffer={}
        for type_ in reaction_type:
            buffer[type_]=get_reaction_count(object_['id'],type_) #getting the count of each reaction
            
        buffer['id']=object_['id']
        user.append(buffer)
        
    return user

In [7]:
donald_trump=pd.DataFrame(reaction_statistics('153080620724','5','posts'))
hillary_clinton=pd.DataFrame(reaction_statistics('889307941125736','5','posts'))

donald_trump.head(5)

Unnamed: 0,ANGRY,HAHA,LIKE,LOVE,SAD,THANKFUL,WOW,id
0,676,131,33867,4263,2822,0,132,889307941125736_1752482928141562
1,21,280,5276,798,9,0,58,889307941125736_1745231725533349
2,25,119,2321,402,16,0,35,889307941125736_1725441390845716
3,68,189,44905,10030,31,0,213,889307941125736_1720108711378984
4,16,130,12660,1228,2,0,30,889307941125736_1718925254830663


In [8]:
hillary_clinton.head(5)

Unnamed: 0,ANGRY,HAHA,LIKE,LOVE,SAD,THANKFUL,WOW,id
0,676,131,33867,4263,2822,0,132,889307941125736_1752482928141562
1,21,280,5276,798,9,0,58,889307941125736_1745231725533349
2,25,119,2321,402,16,0,35,889307941125736_1725441390845716
3,68,189,44905,10030,31,0,213,889307941125736_1720108711378984
4,16,130,12660,1228,2,0,30,889307941125736_1718925254830663


Each row shows the ID that identifies the post or photo, along with the number of reactions of each type for that upload.
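
One simple way to compare two pages is to sum each reaction type over the rows returned by `reaction_statistics`. The sketch below assumes the list-of-dictionaries shape returned by that function. `total_reactions` is a hypothetical helper, and the two sample rows are copied from the first two rows of the output above.

```python
REACTION_TYPES = ['LIKE', 'LOVE', 'WOW', 'HAHA', 'SAD', 'ANGRY', 'THANKFUL']


def total_reactions(rows):
    """Sum the count of each reaction type over a list of per-post rows."""
    totals = {reaction: 0 for reaction in REACTION_TYPES}
    for row in rows:
        for reaction in REACTION_TYPES:
            # A missing reaction type is treated as zero.
            totals[reaction] += row.get(reaction, 0)
    return totals


# First two rows of the table shown above.
sample_rows = [
    {'ANGRY': 676, 'HAHA': 131, 'LIKE': 33867, 'LOVE': 4263, 'SAD': 2822,
     'THANKFUL': 0, 'WOW': 132, 'id': '889307941125736_1752482928141562'},
    {'ANGRY': 21, 'HAHA': 280, 'LIKE': 5276, 'LOVE': 798, 'SAD': 9,
     'THANKFUL': 0, 'WOW': 58, 'id': '889307941125736_1745231725533349'},
]
totals = total_reactions(sample_rows)
```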

### Question: Obtaining feed data to analyze the kinds, times and popularity of a user or page's feed

In this question, we get feed information for the artist Bob Dylan (though our function is abstracted to get information for any user whose feed is publicly available or who has authenticated us through OAuth 2.0).

After obtaining the user ID, we used the Facebook Graph API Explorer to see the contents of the response to a request for the user's feed. Various kinds of data were available, which are also listed in FB's docs (https://developers.facebook.com/docs/graph-api/reference/v2.11/user/feed). From the different fields, we picked the ones that would be interesting to look at, such as the number of shares of each feed post, the times and dates of the posts (to see how frequently the user posts on FB), and the kind of post (status, story, video etc). Once again, we had to modify the JSON response so that it would be accepted by a pandas DF.

In [10]:
def feed_data(object_id,limit):
    '''
    This function generates a list of dictionaries, one for each feed post
    Parameters:
        object_id: the id of the object whose feed is requested, in string
        limit: the number of most recent feed posts, in string
    Return: 
        a list of dictionaries where each dictionary holds the data of a single feed post
    
    '''
    request_url="https://graph.facebook.com/"+object_id+"?fields=feed.limit("+str(limit)+"){id}"
    
    response= requests.get(request_url,params=d)
    response_json=response_to_json(response) # converting response to json
    
    feed_list=[] # creating an empty list to hold feed dictionaries
    for feed_id in response_json['feed']['data']:
        
        feed_info= feed_(feed_id['id']) # requesting the details of this feed post
        # the 'shares' field may be missing (e.g. a post with no shares), so default to 0
        feed_info['share_count']=feed_info.pop('shares',{}).get('count',0)
        feed_list.append(feed_info)
        
    return feed_list #returning the feed list

In [11]:
Bob_Dylan=pd.DataFrame(feed_data('153080620724','10'))
Bob_Dylan.head(5)

Unnamed: 0,created_time,id,name,share_count,status_type,type
0,2017-12-12T12:18:10+0000,153080620724_10160285573170725,,262,mobile_status_update,status
1,2017-12-11T20:11:00+0000,153080620724_10160282509845725,MAGA!,3133,added_video,video
2,2017-12-11T18:14:50+0000,153080620724_10160282276790725,More Fake News,3055,added_video,video
3,2017-12-11T16:25:39+0000,153080620724_10160281903875725,,1353,added_photos,photo
4,2017-12-11T12:15:00+0000,153080620724_10160280109815725,,685,mobile_status_update,status
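
To look at how often a page posts, the `created_time` strings can be parsed and grouped by day. The sketch below is one possible approach using only the standard library. `posts_per_day` is a hypothetical helper, and the sample timestamps are the five shown in the output above.

```python
from collections import Counter
from datetime import datetime


def posts_per_day(created_times):
    """Count posts per calendar day from Graph API 'created_time' strings."""
    days = Counter()
    for stamp in created_times:
        # Timestamps look like 2017-12-12T12:18:10+0000 (with a UTC offset).
        parsed = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%S%z')
        days[parsed.date().isoformat()] += 1
    return days


# The five created_time values from the table above.
sample_times = [
    '2017-12-12T12:18:10+0000',
    '2017-12-11T20:11:00+0000',
    '2017-12-11T18:14:50+0000',
    '2017-12-11T16:25:39+0000',
    '2017-12-11T12:15:00+0000',
]
daily_counts = posts_per_day(sample_times)
```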


### Question: Get the top twenty friends who like our posts the most

The cell below contains our code for the first question: finding the friends who like our posts the most. First, we created a function to convert the response into JSON, since we would be making many requests and building dictionaries from them. This was quite easy and did not take much of our time. Next, we wrote a function to get the total number of reactions by passing the ids of the objects (the posts) into a GET request. Then, we wrote another function called most_frequent to pick out the friends with the most likes. Counting the likes took most of our time, since we had to design an algorithm to sum up the likes from every friend. Once that worked, the rest was easier, since we only had to put the counts in a dictionary and keep the top 20. Lastly, we loaded the top 20 dictionary into a dataframe and exported it to CSV.

Another problem we struggled with in this question was getting the top 20 friends. First, after getting the total likes from everyone, we appended the like counts to a list. Then, we sorted the list from most likes to least and took the top 20 values. Next, we checked whether each friend's like count in the total likes dictionary was also in the top 20 list. If it was, we put that friend into a new dictionary whose keys were names and whose values were numbers of likes. Besides the top 20 and total likes algorithms that we designed, the other functions were quite straightforward. The function takes a Facebook object id, which could be a user or page, the number of posts or photos we want to check, and the type of upload we want to check. Hence, it offers a good amount of flexibility.
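
The value-membership check described above keeps every friend tied at the cutoff, so it can return more than 20 names. As an alternative sketch (not the code we ran), `collections.Counter.most_common` returns exactly the requested number of entries. `top_friends` is a hypothetical name and the sample counts are placeholders.

```python
from collections import Counter


def top_friends(like_counts, number_top=20):
    """Return a dictionary with the number_top most frequent likers."""
    # most_common sorts by count, highest first, and truncates to number_top.
    return dict(Counter(like_counts).most_common(number_top))


# Placeholder counts with a tie between Friend B and Friend C.
sample_counts = {'Friend A': 5, 'Friend B': 3, 'Friend C': 3, 'Friend D': 1}
top_two = top_friends(sample_counts, 2)
```

With these sample counts the membership check would keep three friends (A, B and C), while this version keeps two. Which behaviour is preferable depends on whether ties should be shown.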

In [63]:
def friend_likes(id_,limit,fb_upload_type):
    '''
    This function counts how many times each friend reacted to the uploads and returns the top 20 friends
    Parameter:
        id_: a string id to a facebook object such as a page or person
        limit: the limit to the number of posts obtained from the request in string
        fb_upload_type: a valid type of upload as specified in FB docs: photos, posts, videos etc in string
    Return: 
        a dictionary whose keys are friends' names and whose values are the number of uploads they reacted to
    '''
    request_url="https://graph.facebook.com/"+id_+"?fields="+fb_upload_type+".limit("+limit+"){id}"
    
    response= requests.get(request_url,params=d)
    photoID_list=response_to_json(response) # converting response to json
      
    myDict={} # Dictionary that contains the frequency of likes for each friend
    
    
    for object_ in photoID_list[fb_upload_type]['data']:
        
        response=requests.get("https://graph.facebook.com/"+object_['id']+"/reactions",params=d) # Get the reactions data
                                                                                            
        response_json=response_to_json(response)
        # For each upload, get the list of friends who reacted and count how many times each friend appears
        for name_dict in response_json['data']:
            name=name_dict['name'] 
            if name not in myDict: # First time we see this friend
                myDict[name] = 1
            else:
                myDict[name]= myDict[name]+1
                
    return most_frequent(myDict,20)

friend_likes('me','200','posts')

{'Ananya Nigam': 18,
 'Ashray Shome': 19,
 'Ayan Sarkar': 12,
 'Chirag Varun Shukla': 30,
 'Ishan Shah': 15,
 'Jastej Singh': 11,
 'Karan Tibrewal': 15,
 'Keshav Khemka': 40,
 'Mallika Kapur': 13,
 'Nikhil Shrestha': 23,
 'Pragya Chopra': 18,
 'Rahul Ganguly': 14,
 'Rhea Arora': 14,
 'Riyan Vatcha': 11,
 'Sadashiv Mitra': 13,
 'Shrivats Modi': 14,
 'Varun Shah': 53,
 'Vivan Bhagat': 10,
 'Yamir Tainwala': 13,
 'Yash Bajaj': 11}
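
Graph API edges return their results in pages, and a response that has more results includes a `paging` object with a `next` URL. Since `friend_likes` reads only the first response for each upload, the counts above may miss reactions on uploads with many reactions. That is an assumption about our data that we did not verify. The sketch below shows one way to walk through all pages. `iter_all_items` is a hypothetical helper, and the fake pages exist only to exercise the loop without network access.

```python
def iter_all_items(first_page, fetch_page):
    """Yield every item in 'data', following 'paging.next' until it runs out."""
    page = first_page
    while page:
        for item in page.get('data', []):
            yield item
        # A missing 'next' link means this was the last page.
        next_url = page.get('paging', {}).get('next')
        page = fetch_page(next_url) if next_url else None


# Fake second page keyed by a placeholder URL; names are not real friends.
fake_pages = {'page-2': {'data': [{'name': 'Friend C'}]}}
first = {
    'data': [{'name': 'Friend A'}, {'name': 'Friend B'}],
    'paging': {'next': 'page-2'},
}
names = [item['name'] for item in iter_all_items(first, fake_pages.get)]
```

In the notebook, `fetch_page` could be something like `lambda url: response_to_json(requests.get(url))`, since the `next` URL returned by Facebook is already a complete request URL.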

In [69]:
# Getting the like frequency for top 20 friends for past 200 posts
df_likes_posts= pd.DataFrame([friend_likes('me','200','posts')])
# Getting the like frequency for top 20 friends for past 200 photos
df_likes_photo= pd.DataFrame([friend_likes('me','200','photos')])

In [72]:
to_csv('df_likes_posts.csv',df_likes_posts)
to_csv('df_likes_photos.csv',df_likes_photo)

In [85]:
df_likes_posts

Unnamed: 0,Ananya Nigam,Ashray Shome,Ayan Sarkar,Chirag Varun Shukla,Ishan Shah,Jastej Singh,Karan Tibrewal,Keshav Khemka,Mallika Kapur,Nikhil Shrestha,Pragya Chopra,Rahul Ganguly,Rhea Arora,Riyan Vatcha,Sadashiv Mitra,Shrivats Modi,Varun Shah,Vivan Bhagat,Yamir Tainwala,Yash Bajaj
0,18,19,12,30,15,11,15,40,13,23,18,14,14,11,13,14,53,10,13,11


In [86]:
df_likes_photo

Unnamed: 0,Alex Rivera Fajardo,Asesha Dayal,Ashmita Das,Farhan Zaki,Gargi H Malhotra,Hai Nghiem,Hussein Bakry,Keshav Khemka,Mallika Kapur,Mihika Raj,Moe Kyaw Thu,Neel Kejriwal,Nikhil Shrestha,Norma Sance,Sherief Magdy Shahin,Sherif Mohsen,Soham Chopra,Varun Shah,Viraj Bhatia,Yassein Ahmed
0,7,5,5,26,5,6,5,5,11,7,5,6,9,6,6,7,5,20,8,7


### Question: Demographic analysis for place that we have been tagged 
In this question, we want to explore the places where we have travelled and been tagged on Facebook. We want to create a plot that shows where we have been, based on latitudes and longitudes. Since we already knew how to perform a GET request from the previous questions, this question did not take us a lot of time. We answered it by writing a function called tagged_data.

First, this function takes object_id, the id of the user or page whose tagged places we want, as a parameter. The parameter is then passed into the GET request to get the tagged places for that id. Once the request succeeds, we convert the response into JSON and iterate over it. We use a list comprehension to create a list that includes the data for the places where we have been tagged. Then, for each location in the list, we create a dictionary whose keys are latitude, longitude and name, and whose values are the corresponding values for that place. Finally, we append each tagged location dictionary to a list.

In [78]:
def tagged_data(object_id):
    '''
    This function generates a list of dictionaries which include the longitudes, latitudes, and names of places.
    Parameter:
        object_id: a string id to a facebook object such as a page or person
    Return: 
        a list of dictionaries of latitude, longitude and name of tagged places
    '''
    
    request_url="https://graph.facebook.com/"+object_id+"?fields=tagged_places.limit(200)"
    response= requests.get(request_url,params=d)
    place_list=response_to_json(response) # converting response to json

    tagged_place_list = [element['place'] for element in place_list['tagged_places']['data']] # Create a list of tagged places
    tagged_list=[]
    for place in tagged_place_list:
        buffer_dict={} #creating a buffer dictionary
        
        buffer_dict['latitude']= place['location']['latitude']
        buffer_dict['longitude']= place['location']['longitude']
        buffer_dict['name']=place['name']
        
        tagged_list.append(buffer_dict) # appending each tagged location dictionary to a list
        
    return tagged_list
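
The function above assumes that every tagged place has a `location` with coordinates. As a defensive sketch, the variant below skips places without them instead of raising a `KeyError`. `extract_places` is a hypothetical helper, the first sample place is taken from our output, and the second is a made-up entry without coordinates.

```python
def extract_places(payload):
    """Return latitude, longitude and name for each tagged place with coordinates."""
    places = []
    for element in payload.get('tagged_places', {}).get('data', []):
        place = element.get('place', {})
        location = place.get('location', {})
        # Skip places that lack coordinates, since they cannot be plotted.
        if 'latitude' not in location or 'longitude' not in location:
            continue
        places.append({
            'latitude': location['latitude'],
            'longitude': location['longitude'],
            'name': place.get('name'),
        })
    return places


sample_payload = {'tagged_places': {'data': [
    {'place': {'name': 'Denison University',
               'location': {'latitude': 40.072166, 'longitude': -82.5225}}},
    {'place': {'name': 'Placeholder without coordinates'}},
]}}
plottable_places = extract_places(sample_payload)
```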

We then built a dataframe that contains the latitude, longitude, and name of each place, and created a CSV file from it.

In [83]:
df_tagged_places= pd.DataFrame(tagged_data('me'))
to_csv('df_tagged_places.csv',df_tagged_places) 

Unnamed: 0,latitude,longitude,name
0,40.087238,-82.427376,Steak 'n Shake
1,39.98578,-82.89909,Brio Tuscan Grille
2,51.835402,-2.220937,Taylor House
3,40.0675,-82.5122,"Granville, Ohio"
4,22.542297,88.385868,The GRID
5,22.542297,88.385868,The GRID
6,55.969382,-3.159516,"Canton Hill, Edinburgh"
7,22.54806,88.35329,Monkey Bar Kolkata
8,40.072166,-82.5225,Denison University
9,22.533726,88.364239,ZEST at CCFC


We then displayed the first ten rows of this dataframe.

In [None]:
df_tagged_places.head(10)

### Conclusion
This notebook is the major part of this project, where we try to achieve the goals that we set out. The four goals are getting the top 20 friends who like our posts the most, the reactions to Donald Trump's posts, events on Facebook, and the places where we have been tagged. Of all the notebooks in this API project, the second was the most time-consuming and also the most complicated, since we had to figure out so many things. First, we had to play around with the Graph API Explorer to learn the syntax for our GET requests. Then, we had to design several algorithms that would return what we wanted to analyze, for example the algorithm for the reactions to Trump's posts and the algorithms for the total likes from every friend on Facebook. Lastly, we had to manipulate the data we got to turn it into dataframes for our third notebook. By and large, this notebook was the most complicated, but it was also the most fun. I learned a lot from it, not just about computer science or the API itself, but also about how to work with a partner and how to explore on my own. Throughout this notebook, I sharpened many skills for my future career.