# Connecting to Foursquare and Yelp APIs

### Task 1: Connect to the [Foursquare](https://developer.foursquare.com/places) API

The *Foursquare* website offers a self-explanatory documentation about how to get API at the link <https://location.foursquare.com/developer/reference/search-data>. Under the section *Search and Data*, they offer the possibility to return places based on what especific "user-submitted keywords" by calling out the search from specific categories, such as: 

- "place search" based on a location and other parameters;
- "get place details" based on information and metadata of a place queried by using the fsq_id;
- "get place photos" based on retrieving photos of a place by using the fsq_id;
- "get place tips" based on retrieving tips for a place using the fsq_id;
- "place match" based on returning the record of a POI (via FSQ_ID) when it is given a name and a location.

Therefore, there are specific keywords and parameters which can be used in the queries to match names, categories and tips not only to retrieve the information but to filter the one desired. As an example, you can use such keywords as the *latitude* and *longitude* ("ll" string), *price range* ("min_price" and "max_price"), *time of the venue* ("open_at" and "open_now"), *sorting order* ("sort" string by distance, rating or relevance) among others.

In [1]:
# First, let's import sobre libraries to explore the Foursquare API:

import requests
from IPython.display import JSON 
import pandas as pd
import numpy as np
import os

In [10]:
# print(os.environ)

In [7]:
FSQ_API_KEY=os.environ['FOURSQUARE_API_KEY']

In [14]:
YELP_API_KEY=os.environ['YELP_API_KEY']

In [15]:
# As we have a population of 90 stations, it is important to do a sample of this distribution.
# Also, as this project must be done in a short time, let's first extract a sample from the population of citybikes stations in Recife with a random function.
# But, for that, let's download our dataframe created during Task 1 and then use a for loop:

df_city_bikes_recife = pd.read_json("http://api.citybik.es/v2/networks/bikerecife")
total_city_bikes_stations_recife = df_city_bikes_recife["network"]["stations"]

def array_of_parsed_stations():
    array_of_elements = []

    for index, each_station in enumerate(total_city_bikes_stations_recife):
        # print(index % 10 == 0)
        if index % 10 == 0:
            id = each_station["id"]
            uid = each_station["extra"]["uid"]
            name = each_station["name"]
            address = each_station["extra"]["address"]
            empty_slots = each_station["empty_slots"]
            free_bikes = each_station["free_bikes"]
            last_updated = each_station["extra"]["last_updated"]
            payment = each_station["extra"]["payment"]		
            payment_terminal = each_station["extra"]["payment-terminal"]
            renting_bikes = each_station["extra"]["renting"]
            returning_bikes = each_station["extra"]["returning"]
            latitude = each_station["latitude"]
            longitude = each_station["longitude"]
            timestamp = each_station["timestamp"]
            parsed_station_obj = {"Id": id, "Uid": uid, "Name": name, "Address": address, "Empty Slots": empty_slots, "Free Bikes": free_bikes, "Last Updated": last_updated, "Payment": payment, "Payment Terminal": payment_terminal, "Renting Bikes": renting_bikes, "Returning Bikes": returning_bikes, "Latitude": latitude, "Longitude": longitude, "Timestamp": timestamp} 
            array_of_elements.append(parsed_station_obj)
            
    stations_count = len(array_of_elements)
    print(stations_count, "stations returned")
    
    return array_of_elements

parsed_stations = array_of_parsed_stations()


9 stations returned


In [16]:
# Now, let's save the sample in a data frame in a .csv file:

df = pd.DataFrame(parsed_stations)
df.to_csv("df_city_bikes_recife_network_sample.csv", index = True)


In [17]:
first_station_lat = parsed_stations[0]["Latitude"]
first_station_lon = parsed_stations[0]["Longitude"]
# print(parsed_stations)
# print(first_station_lon)


In [18]:
# Then, let's explore the structure of the data from the object requested by trying to get a request from the station of the index 0.
# For that get request, let's use the "ll" string (latitude and longitude) keywords:

foursquare_url = "https://api.foursquare.com/v3/places/search?ll=" + str(first_station_lat) + "," + str(first_station_lon)

headers = {
    "accept": "application/json",
    "Authorization": FSQ_API_KEY
}

response = requests.get(foursquare_url, headers = headers)
foursquare_response = response.json()
# print(foursquare_response)


With this API, we can see that there a huge number of points of interest (POIs) and entertainment around the station of index 0. Although is is not well organized, it is possible to pin point museums, coffe shops, theaters, bakeries and many others POI. This could be related with the number of bikes available in the station (it might be 11 in total if we consider the variables *empty_slots* and *free_bike* in our model afterwards as counting the number of total bikes available in each station). But, to be sure of that, we would need to start our research with a sample before moving on to do our search with the Yelp API. 

### Task 2: Connect to the [Yelp](https://www.yelp.com/developers/documentation/v3/get_started) API

Yelp offers a clear documentation with an example about how to use their API at their website: <https://docs.developer.yelp.com/docs/fusion-intro>. However, they also require you to create an app to obtain an API key otherwise you won't be able to request their information. Because of that, I had to create a "test" app to generate an API key in order to start using it. 

In [21]:
yelp_url = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&latitude="+ str(first_station_lat) + "&longitude=" + str(first_station_lon)

headers = { 
    "accept": "application/json", 
    "Authorization": YELP_API_KEY
}

response = requests.get(yelp_url, headers = headers)
yelp_response = response.json()
# print(yelp_response)


As we can see, this API also offers similar services covered by Foursquare. With that, museums, coffe shops, diners, entertainment places, among others, were retrieved. On the top of that, Yelp provide their developers or users like me to retrieve the API with a filter called *sort by best* which is a good keyword to be used in the API to retrieve information about only especific places marked with the best metrics (ranked probably after a very especific amount of positive users reviewers that positively marked those places. 

### Task 3: For each of the bike stations in Part 1, query both APIs to retrieve information for the following in that location:

 - Restaurants or bars
 - Various POIs (points of interest) of your choice

In [22]:
# Let's search for places under the category "sort by rating" in Foursquare wich includes restaurants, coffee shops and more retrived by users rating:

def get_pois_from_foursquare():
    foursquare_pois = []
    for each_station in parsed_stations:
    
        foursquare_url = "https://api.foursquare.com/v3/places/search?ll=" + str(each_station["Latitude"]) + "," + str(each_station["Longitude"]) + "&sort=RATING"

        headers = {
        "accept": "application/json",
        "Authorization": FSQ_API_KEY
        }

        response = requests.get(foursquare_url, headers = headers)
        foursquare_response = response.json()
        foursquare_pois.append(foursquare_response)
    return foursquare_pois

# print(get_pois_from_foursquare())


In [23]:
foursquare_pois = get_pois_from_foursquare()
# print(foursquare_pois)


In [24]:
# # Let's search for places under the category "sort by best" in Yelp which includes restaurantes, bars and bakeries:

def get_pois_from_yelp():
    yelp_pois = []
    for each_station in parsed_stations:
        
        yelp_url = "https://api.yelp.com/v3/businesses/search?sort_by=best_match&latitude=" + str(each_station["Latitude"]) + "&longitude=" + str(each_station["Longitude"])
       
        headers = { 
        "accept": "application/json", 
        "Authorization": YELP_API_KEY
        }
        
        response = requests.get(yelp_url, headers = headers)
        yelp_response = response.json()
        yelp_pois.append(yelp_response)
    return yelp_pois

# print(get_pois_from_yelp())


In [25]:
yelp_pois = get_pois_from_yelp()
# print(yelp_pois)


### Task 4: Send a request to Foursquare and Yelp with a small radius (1000m) for all the bike stations in your city of choice. 

In [26]:
# Let's send a request to Foursquare considering the small radius of 1000m considering our sample of bike stations:

radius = 1000

def get_pois_with_radius_foursquare():
    foursquare_radius_1000 = []
    for each_station in parsed_stations:
    
        foursquare_url = "https://api.foursquare.com/v3/places/search?ll=" + str(each_station["Latitude"]) + "," + str(each_station["Longitude"]) + "&radius=" + str(radius) + "&sort=RATING"
        headers = {
        "accept": "application/json",
        "Authorization": FSQ_API_KEY
        }

        response = requests.get(foursquare_url, headers = headers)
        foursquare_response = response.json()
        foursquare_radius_1000.append(foursquare_response)
    return foursquare_radius_1000


In [27]:
foursquare_pois_with_radius = get_pois_with_radius_foursquare()
# print(foursquare_pois_with_radius)


In [28]:
# Let's send a request to Yelp considering the small radius of 1000m considering our sample of bike stations:

def get_pois_with_radius_yelp():
    yelp_radius_1000 = []
    for each_station in parsed_stations:
    
        yelp_url = "https://api.yelp.com/v3/businesses/search?radius="+ str(radius) + "&latitude="+ str(each_station["Latitude"]) + "&longitude=" + str(each_station["Longitude"])
       
        headers = { 
        "accept": "application/json", 
        "Authorization": YELP_API_KEY
        }

        response = requests.get(yelp_url, headers = headers)
        yelp_response = response.json()
        yelp_radius_1000.append(yelp_response)
    return yelp_radius_1000

# print(get_pois_with_radius_yelp())


In [29]:
yelp_pois_with_radius = get_pois_with_radius_yelp()
# print(yelp_pois_with_radius)


### Task 5: Parse through the response to get the POI (such as restaurants, bars, etc) details you want (ratings, name, location, etc.)

In [30]:
# At Foursquare, it was retrieved 10 results by station (9 stations). Also, per station it is shown the context of the radius from a geolocation key.
# This means, it was found 90 places of interest in total found in a radius of 1000m per station.
# However, for this project it was only retrieved the first 10 results (station of index 0):

import pandas

def foursquare_array_of_pois():
    final_array = []
    foursquare_array_of_elements = []
    
    for each_poi in foursquare_pois_with_radius:
        results = each_poi["results"]
        for each_result in results:
            id = each_result["fsq_id"]
            name = each_result["name"]
            category = each_result["categories"]
            distance = each_result["distance"]
            address = each_result["location"]
            latitude = each_result["geocodes"]["main"]["latitude"]
            longitude = each_result["geocodes"]["main"]["longitude"]
            dict_of_each_poi = {"Id": id, "Name": name, "Category": category, "Distance": distance, "Address": address, "Latitude": latitude, "Longitude": longitude}
            foursquare_array_of_elements.append(dict_of_each_poi)
        final_array.append(foursquare_array_of_elements)
        
    return final_array

fsq_results_to_convert = foursquare_array_of_pois()
# print(fsq_results_to_convert)


In [31]:
# The function sum is used to flatten the nested list (https://stackoverflow.com/a/20113075): 
fsq_flattened = sum(fsq_results_to_convert, [])
# print(fsq_flattened)

fsq_dataframe = pd.DataFrame(fsq_flattened, columns=["Id", "Name", "Category", "Distance", "Address", "Latitude", "Longitude"])
display(fsq_dataframe)


Unnamed: 0,Id,Name,Category,Distance,Address,Latitude,Longitude
0,4c052782187ec92880fbb77b,Entre Amigos o Bode,"[{'id': 13087, 'name': 'Northeastern Brazilian...",559,"{'address': 'Rua da Hora, 695', 'country': 'BR...",-8.047191,-34.894740
1,57f70444498e2bb7c521914f,Confraria da Barba,"[{'id': 11061, 'name': 'Health and Beauty Serv...",909,"{'address': 'Rua do Cupim, 53', 'country': 'BR...",-8.047324,-34.898247
2,5022acf5e4b057124f5c6d21,Villa Sandino,"[{'id': 13030, 'name': 'Buffet', 'icon': {'pre...",671,"{'address': 'Rua Nicarágua, 139', 'country': '...",-8.048720,-34.894997
3,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101
4,54986bb9498eccc96c75144e,Lala Cafe & Cozinha Afetiva,"[{'id': 13034, 'name': 'Café', 'icon': {'prefi...",289,"{'address': 'Rua Amelia, 470', 'country': 'BR'...",-8.043648,-34.893257
...,...,...,...,...,...,...,...
805,5064df26e4b03541b86d9797,Subway,"[{'id': 13039, 'name': 'Deli', 'icon': {'prefi...",841,"{'country': 'BR', 'cross_street': 'R. Jose Bon...",-8.040979,-34.909780
806,558b4cd9498e09c09e19740c,FriSabor Madalena,"[{'id': 13046, 'name': 'Ice Cream Parlor', 'ic...",613,"{'address': 'Avenida Visconde de Albuquerque, ...",-8.052216,-34.911827
807,508dc7a2e4b0b69fe94d8799,Mak Burguer,"[{'id': 13031, 'name': 'Burger Joint', 'icon':...",529,"{'country': 'BR', 'cross_street': '', 'formatt...",-8.045932,-34.918789
808,507c19dce4b0112fc4f473de,Academia Corpore,"[{'id': 18000, 'name': 'Sports and Recreation'...",291,"{'address': 'Rua Tomaz Gonzaga', 'country': 'B...",-8.049831,-34.914409


In [32]:
columns = ['Name', 'Distance', 'Latitude', 'Longitude']
new_fsq_df_parsed = fsq_dataframe[columns]
display(new_fsq_df_parsed)


Unnamed: 0,Name,Distance,Latitude,Longitude
0,Entre Amigos o Bode,559,-8.047191,-34.894740
1,Confraria da Barba,909,-8.047324,-34.898247
2,Villa Sandino,671,-8.048720,-34.894997
3,Dalena,786,-8.047569,-34.897101
4,Lala Cafe & Cozinha Afetiva,289,-8.043648,-34.893257
...,...,...,...,...
805,Subway,841,-8.040979,-34.909780
806,FriSabor Madalena,613,-8.052216,-34.911827
807,Mak Burguer,529,-8.045932,-34.918789
808,Academia Corpore,291,-8.049831,-34.914409


In [33]:
new_fsq_df_parsed_and_grouped_by_name = new_fsq_df_parsed.sort_values(by = 'Name', inplace = False)
print(new_fsq_df_parsed_and_grouped_by_name)

                 Name  Distance  Latitude  Longitude
808  Academia Corpore       291 -8.049831 -34.914409
718  Academia Corpore       291 -8.049831 -34.914409
178  Academia Corpore       291 -8.049831 -34.914409
538  Academia Corpore       291 -8.049831 -34.914409
88   Academia Corpore       291 -8.049831 -34.914409
..                ...       ...       ...        ...
602     Villa Sandino       875 -8.048720 -34.894997
782     Villa Sandino       875 -8.048720 -34.894997
152     Villa Sandino       875 -8.048720 -34.894997
362     Villa Sandino       671 -8.048720 -34.894997
452     Villa Sandino       671 -8.048720 -34.894997

[810 rows x 4 columns]


In [34]:
# identify duplicates
duplicates = new_fsq_df_parsed_and_grouped_by_name.duplicated(subset='Name')

# drop duplicates
new_fsq_df_parsed_and_grouped_by_name.drop_duplicates(subset='Name', keep='first', inplace=True)

In [35]:
# Now that we identified the duplicates and dropped them, let's create our fsq dataset of pois .csv file:

new_fsq_df_parsed_and_grouped_by_name.to_csv("new_fsq_df_parsed_and_grouped_by_name.csv")

In [37]:
# At Yelp, different quantity of points of interest per station were found.

def yelp_array_of_pois():
    final_array = []
    yelp_array_of_elements = []
    
    for each_poi in yelp_pois_with_radius:
        businesses = each_poi["businesses"]
        for each_business in businesses:
            id = each_business["id"]
            name = each_business["name"]
            category = each_business["categories"]
            distance = int(each_business["distance"])
            address = each_business["location"]["address1"]
            lat = each_business["coordinates"]["latitude"]
            lon = each_business["coordinates"]["longitude"]
            review_count = each_business["review_count"]
            rating = each_business["rating"]
            dict_of_each_poi = {"Id": id, "Name": name, "Address": address, "Latitude": lat, "Longitude": lon, "Review Count": review_count, "Rating": rating}
            yelp_array_of_elements.append(dict_of_each_poi)
            
        final_array.append(yelp_array_of_elements)
   
    return final_array

yelp_pois_to_convert = yelp_array_of_pois()
# print(yelp_pois)


In [38]:
yelp_flattened = sum(yelp_pois_to_convert, [])
# print(yelp_converted)

yelp_dataframe = pd.DataFrame(yelp_flattened, columns=["Id", "Name", "Category", "Distance", "Address", "Latitude", "Longitude", "Rating", "Review Count"])
display(yelp_dataframe)



Unnamed: 0,Id,Name,Category,Distance,Address,Latitude,Longitude,Rating,Review Count
0,pSJjWwWclIojJHdpmZ9nIQ,Empório Sertanejo,,,"Rua da Hora, 34",-8.042300,-34.891411,4.5,12
1,sLM_e4kcUdNj7A5jfrJocA,O Rei das Coxinhas,,,"Rua Santo Elias, 223",-8.044310,-34.894249,4.5,11
2,5VIe3yjg05FEkUT1VOLQ4w,Dalena,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
3,zmR2B9wNwLuxMlI3ygm1yw,Chiwake,,,"Rua da Hora, 820",-8.047924,-34.895447,5.0,5
4,M85R9TH9EkvvtAJPRrs5eQ,ZEN,,,"R. da Hora, 295",-8.044134,-34.892897,3.5,13
...,...,...,...,...,...,...,...,...,...
1471,WWOxHgs5wiVd4NzUDRVbeg,Irajá,,,"R. Conde de Irajá, 811",-8.046530,-34.910920,4.0,3
1472,4TusYTIMZyMNxjmnRvxMUw,Pizzaria La Cuca,,,"Rua Vitoriano Palhares, 144, Torre",-8.043670,-34.909302,4.5,3
1473,YNCY-YmXDhGSAU5C_P46tw,Santa Pizza,,,"Rua José Bonifácio, 667",-8.042166,-34.908485,4.0,4
1474,_vxHUiL9pH8XC1qxKC6VOA,Nova Torre,,,R. Visconde de Itaparica 99,-8.042292,-34.909802,5.0,2


In [39]:
columns = ['Name', 'Latitude', 'Longitude', 'Rating', 'Review Count']
new_yelp_df_parsed = yelp_dataframe[columns]
display(new_yelp_df_parsed)

Unnamed: 0,Name,Latitude,Longitude,Rating,Review Count
0,Empório Sertanejo,-8.042300,-34.891411,4.5,12
1,O Rei das Coxinhas,-8.044310,-34.894249,4.5,11
2,Dalena,-8.047616,-34.897091,4.5,9
3,Chiwake,-8.047924,-34.895447,5.0,5
4,ZEN,-8.044134,-34.892897,3.5,13
...,...,...,...,...,...
1471,Irajá,-8.046530,-34.910920,4.0,3
1472,Pizzaria La Cuca,-8.043670,-34.909302,4.5,3
1473,Santa Pizza,-8.042166,-34.908485,4.0,4
1474,Nova Torre,-8.042292,-34.909802,5.0,2


In [40]:
# new_yelp_df_parsed.to_csv("new_yelp_df_parsed.csv")

In [41]:
new_yelp_df_parsed_and_grouped_by_name = new_yelp_df_parsed.sort_values(by = 'Name', inplace = False)
print(new_yelp_df_parsed_and_grouped_by_name)

              Name  Latitude  Longitude  Rating  Review Count
192      Alphaiate -8.104383 -34.887301     4.5             4
356      Alphaiate -8.104383 -34.887301     4.5             4
520      Alphaiate -8.104383 -34.887301     4.5             4
1340     Alphaiate -8.104383 -34.887301     4.5             4
1176     Alphaiate -8.104383 -34.887301     4.5             4
...            ...       ...        ...     ...           ...
858   Ça Va Bistrô -8.095550 -34.884041     4.5             4
1022  Ça Va Bistrô -8.095550 -34.884041     4.5             4
1186  Ça Va Bistrô -8.095550 -34.884041     4.5             4
694   Ça Va Bistrô -8.095550 -34.884041     4.5             4
530   Ça Va Bistrô -8.095550 -34.884041     4.5             4

[1476 rows x 5 columns]


In [42]:
# identify duplicates
duplicates = new_yelp_df_parsed_and_grouped_by_name.duplicated(subset='Name')

# drop duplicates
new_yelp_df_parsed_and_grouped_by_name.drop_duplicates(subset='Name', keep='first', inplace=True)

In [43]:
# Now that we identified the duplicates and drop them,let's create our yelp dataset of pois .csv file:

new_yelp_df_parsed_and_grouped_by_name.to_csv("new_yelp_df_parsed_and_grouped_by_name.csv")


In [44]:
yelp_dataframe_grouped_by_lat_and_lon = yelp_dataframe.grouped = yelp_dataframe.groupby(['Latitude', 'Longitude']).sum()
# print(yelp_dataframe_grouped_by_lat_and_lon)

# yelp_dataframe_grouped_by_lat_and_lon.to_csv("yelp_dataframe_grouped_by_lat_and_lon.csv")


  yelp_dataframe_grouped_by_lat_and_lon = yelp_dataframe.grouped = yelp_dataframe.groupby(['Latitude', 'Longitude']).sum()
  yelp_dataframe_grouped_by_lat_and_lon = yelp_dataframe.grouped = yelp_dataframe.groupby(['Latitude', 'Longitude']).sum()


In [45]:
yelp_dataframe_sorted_by_name = yelp_dataframe.sort_values(by = 'Name', inplace = False)
# print(yelp_dataframe_sorted_by_name)

yelp_counts_of_pois = yelp_dataframe_sorted_by_name['Name'].value_counts()
display(yelp_counts_of_pois)

Bar Central                  27
Empório Sertanejo            27
Papaya Verde                 18
Pão de Açúcar                18
Restaurante Puxinanã         18
                             ..
Ganache Tortas Finas          9
Frontal Ponto de Encontro     9
Famiglia Lucco Pizzas         9
Famiglia Lucco                9
Ça Va Bistrô                  9
Name: Name, Length: 126, dtype: int64

In [46]:
# yelp_counts_of_pois.to_csv("yelp_counts_of_pois.csv")


### Task 6: Put your parsed results into a DataFrame

In [47]:
df_fsq_and_yelp_merged_by_name = pd.merge(fsq_dataframe, yelp_dataframe, on = "Name", how = "inner")

fqs_and_yelp_merged_by_concat = pd.concat([fsq_dataframe, yelp_dataframe], ignore_index=True, sort=False)
display(df_fsq_and_yelp_merged_by_name)


Unnamed: 0,Id_x,Name,Category_x,Distance_x,Address_x,Latitude_x,Longitude_x,Id_y,Category_y,Distance_y,Address_y,Latitude_y,Longitude_y,Rating,Review Count
0,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101,5VIe3yjg05FEkUT1VOLQ4w,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
1,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101,5VIe3yjg05FEkUT1VOLQ4w,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
2,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101,5VIe3yjg05FEkUT1VOLQ4w,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
3,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101,5VIe3yjg05FEkUT1VOLQ4w,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
4,4be5a8decf200f47b881133c,Dalena,"[{'id': 13040, 'name': 'Dessert Shop', 'icon':...",786,"{'address': 'Avenida Cons. Rosa e Silva, 431',...",-8.047569,-34.897101,5VIe3yjg05FEkUT1VOLQ4w,,,"Av. Conselheiro Rosa e Silva, 431",-8.047616,-34.897091,4.5,9
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1372,510226c4e4b0b1a7982b2980,Parque Santana,"[{'id': 10000, 'name': 'Arts and Entertainment...",765,"{'address': 'Rua Astério Rufino Alves, Recife'...",-8.040998,-34.916865,H2BcFkCm3bFNfZ6XxLjk4g,,,R. Jorge Gomes de Sá,-8.006160,-34.927231,5.0,4
1373,510226c4e4b0b1a7982b2980,Parque Santana,"[{'id': 10000, 'name': 'Arts and Entertainment...",765,"{'address': 'Rua Astério Rufino Alves, Recife'...",-8.040998,-34.916865,H2BcFkCm3bFNfZ6XxLjk4g,,,R. Jorge Gomes de Sá,-8.006160,-34.927231,5.0,4
1374,510226c4e4b0b1a7982b2980,Parque Santana,"[{'id': 10000, 'name': 'Arts and Entertainment...",765,"{'address': 'Rua Astério Rufino Alves, Recife'...",-8.040998,-34.916865,H2BcFkCm3bFNfZ6XxLjk4g,,,R. Jorge Gomes de Sá,-8.006160,-34.927231,5.0,4
1375,510226c4e4b0b1a7982b2980,Parque Santana,"[{'id': 10000, 'name': 'Arts and Entertainment...",765,"{'address': 'Rua Astério Rufino Alves, Recife'...",-8.040998,-34.916865,H2BcFkCm3bFNfZ6XxLjk4g,,,R. Jorge Gomes de Sá,-8.006160,-34.927231,5.0,4


In [48]:
# Finally, let's save our merged data frame in a .csv file:

df_fsq_and_yelp_merged_by_name.to_csv("df_fsq_and_yelp_merged_by_name.csv", index = True)
# print(df_fsq_and_yelp_merged_by_name)


In [49]:
# Also, let's try to do a merge by grouping our pois information from fsq and yelp:

df_fsq_and_yelp_parsed_and_grouped_by_name = pd.merge(new_fsq_df_parsed_and_grouped_by_name, new_yelp_df_parsed_and_grouped_by_name, on = "Name", how = "outer")

fqs_and_yelp_merged_by_concat = pd.concat([fsq_dataframe, yelp_dataframe], ignore_index=True, sort=False)
display(df_fsq_and_yelp_parsed_and_grouped_by_name)



Unnamed: 0,Name,Distance,Latitude_x,Longitude_x,Latitude_y,Longitude_y,Rating,Review Count
0,Academia Corpore,291.0,-8.049831,-34.914409,,,,
1,Altar Cozinha Ancestral,447.0,-8.046033,-34.881433,,,,
2,Açaí,518.0,-8.051654,-34.887046,,,,
3,Baillar Escola de Dança,644.0,-8.047550,-34.888216,,,,
4,Bar do Neno,526.0,-8.032669,-34.909412,,,,
...,...,...,...,...,...,...,...,...
186,Wiella Bistrô,,,,-8.100730,-34.887611,4.5,3.0
187,Winner Sports Bar,,,,-8.047380,-34.893791,4.5,5.0
188,Yantai Express,,,,-8.032550,-34.904148,3.0,6.0
189,ZEN,,,,-8.044134,-34.892897,3.5,13.0


In [50]:
df_fsq_and_yelp_parsed_and_grouped_by_name.to_csv("df_fsq_and_yelp_parsed_and_grouped_by_name.csv", index = False)
# print(df_fsq_and_yelp_parsed_and_grouped_by_name)


### Task 7: Comparing results between Foursquare and Yelp

#### 7.1 Which API provided you with more complete data? Provide an explanation. 

To answer that question, it is important to do interpret some of the data above by a summary of key points:

- First, it is noted that each API provided different locations or points of interest (POIs) by a radium of 1000m. Although it was used the same combination of geolocation (latitude and longitude) data by station, the results retrieved of the first station (index 0 station) were different, except by one restaurant called *Papaya Verde* which appeared in both of them.

- Second, while Foursquare API provides different information of the locations, such as *related places* and *timezones* which weren't selected to appear in this final data frame, parameters such as *name*, *distance*, *latitude* and *longitude* were part of the commonalities found between Foursquare and Yelp APIs. At the same time, those parameters are classified with some differences. Due to that, the distance parameter from the first API had its data type changed to become an integer (it was previosly a float) to make it more compatible to be read at the data frame as the distance parameter from the Yelp API was formatted already as a integer number.

- Third, with the Yelp API, we have the parameters of *reviews* and *rating count* besides name and location. Those could give a broader idea of ranking of the locations and why they were selected to appear in the API requested. As it was used a filter *sort by best match* associated with the station geolocation, the results retrieved would fit this filter. A similar filter called *popularity* was used in the Foursquare API. At this website, it wasn't possible to visualize the results by rating or reviews.

With all those keypoints in mind, I would say that both APIs provided a list of POIs that would complete each other and, therefore, by putting them together in a same dataframe helped us to visualize that there are many more possibilities of choices provided in the area. 

#### 7.2 Get the top 10 restaurants according to their rating.

In [51]:
# As the Foursquare API couldn't provide the POIs by rating, the information below was gotten from the Yelp API. 
# Below is a list of the top 10 rated POIs privded by the get request of the Yelp API:

yelp_dataframe_sorted_by_the_top_10 = new_yelp_df_parsed_and_grouped_by_name.sort_values(by = "Rating", ascending = False).head(10)

display(yelp_dataframe_sorted_by_the_top_10)


Unnamed: 0,Name,Latitude,Longitude,Rating,Review Count
112,Com.Pão,-8.04157,-34.898178,5.0,4
947,Parque da Jaqueira,-8.03734,-34.90303,5.0,4
1420,Chiwake,-8.047924,-34.895447,5.0,5
242,Cioccolatte Gelateria,-8.054922,-34.869442,5.0,4
371,Club Bardot,-8.11014,-34.892849,5.0,2
318,Recife Cocos,-8.008492,-34.936516,5.0,3
417,Praça da República,-8.06134,-34.87772,5.0,5
17,Ponte Nova,-8.046531,-34.898122,5.0,4
148,Parque Santana,-8.00616,-34.927231,5.0,4
1170,Capitania Forneria & Mar,-7.995275,-34.838913,5.0,1
