# Mini-project II
The details for the miniproject from Week 2


In this miniproject, we will combine and practice topics that we have learned in previous two weeks:
- APIs
- Databases (SQL)
- Pandas
- Processing special data types in Python

We will work with these APIs:
1. [Foursquare](https://developer.foursquare.com/places) - we have already come across this one
2. [Yelp](https://www.yelp.com/developers/documentation/v3/get_started) - this API offers similar services as Foursquare.
3. (Stretch) [Google Places API](https://developers.google.com/places/web-service/intro) - this google api offers similar service as well.

The main goal of the mini-project is to build the database of restaurants, bars and various points of interest (POIs) in the area of your choice and find out which API has better coverage in the selected area. The APIs have limited number of requests for free, so start with the smaller area. The project consists of following tasks:

- pull the data about various POIs in the area through API. (Search Yelp for companiees that are in the area using [these instructions](https://www.yelp.com/developers/documentation/v3/business_search)). If you run out of requests for any of the APIs, don't worry, it's ok to use only sample data you already have or the POIs from the Yelp API. It's approach and process that counts, not the actual number of places we were able to get.
- create own SQLite database and store the data about the POIs. Think about what will be the best structure of the database. We've used and created sqlite3 databases before in the activity [**SQL in Python**](https://data.compass.lighthouselabs.ca/b9e08cd5-68c6-490c-a32b-a66f01bf53e1).
- compare the results using SQL or Pandas (it's up to you:)) and see which API has a better coverage of the area.
- choose the top 10 POIs based on the popularity (number of reviews or average rating) ([Yelp](https://www.yelp.com/developers/documentation/v3/business), [Foursquare](https://developer.foursquare.com/docs/api-reference/venues/details/)).
- (Stretch) By implementing [travelling salesman problem (TSP)](https://en.wikipedia.org/wiki/Travelling_salesman_problem), how much time would it take to visit all of these places? ([Directions API](https://developers.google.com/maps/documentation/directions/start) from google will be helpful here). We will have to find travel time between all places (top 10). We can use [ortools](https://developers.google.com/optimization/routing/tsp) from Google to effectively implement TSP. These tools are very powerful and [easy to install](https://developers.google.com/optimization/install).

We have a lot of work so let's start right away. Enjoy!!


# Importing Libraries

In [33]:
import json
import requests
import pandas as pd
from functools import reduce

# Importing Dataframe from Yelp

In [29]:
#yelp can only return 50 bussiness at a time, banff has a total of 153 so well have to make 4 requests

api_key='YR8ml2OwoNPZ_T_78EEnJfvfSCuoe27242AgQhPinyHXImDP-tvk5PlUwxclYvSHVUnZYeTbcC-C2UzGNpEsvYB9YolbhlLBKZ5mnBN3aelSCsqf1oP3JwaRmE1MYXYx'
headers = {'Authorization': 'Bearer %s' % api_key}
url='https://api.yelp.com/v3/businesses/search'

# In the dictionary, term can take values like food, cafes etc lets just get all yelp has to offer
params0 = {'longitude':'-115.5708', 'latitude':'51.1784', 'radius':'30000', 'limit':'50'}
params1 = {'longitude':'-115.5708', 'latitude':'51.1784', 'radius':'30000', 'limit':'50', 'offset':'51'} #50-100
params2 = {'longitude':'-115.5708', 'latitude':'51.1784', 'radius':'30000', 'limit':'50', 'offset':'101'} #100-150
params3 = {'longitude':'-115.5708', 'latitude':'51.1784', 'radius':'30000', 'limit':'50', 'offset':'151'} #150-153

# Making a get request to the API
req0=requests.get(url, params=params0, headers=headers)
req1=requests.get(url, params=params1, headers=headers)
req2=requests.get(url, params=params2, headers=headers)
req3=requests.get(url, params=params3, headers=headers)

# proceed only if the status code is 200
print('The status code is {}'.format(req0.status_code))
print('The status code is {}'.format(req1.status_code))
print('The status code is {}'.format(req2.status_code))
print('The status code is {}'.format(req3.status_code))

# Storing the text from the response 
yelp_data0 = json.loads(req0.text)
yelp_data1 = json.loads(req1.text)
yelp_data2 = json.loads(req2.text)
yelp_data3 = json.loads(req3.text)

# Converting to Dataframes
yelp_df0 = pd.json_normalize(yelp_data0['businesses'])
yelp_df1 = pd.json_normalize(yelp_data1['businesses'])
yelp_df2 = pd.json_normalize(yelp_data2['businesses'])
yelp_df3 = pd.json_normalize(yelp_data3['businesses'])

The status code is 200
The status code is 200
The status code is 200
The status code is 200


In [95]:
#Compile all Dataframes
yelp_data_frames = [yelp_df0, yelp_df1, yelp_df2, yelp_df3]

In [96]:
#Merging Dataframes
yelp_df = reduce(lambda  left,right: pd.merge(left,right,on=['id'],
                                            how='outer'), yelp_data_frames)

In [97]:
yelp_df.head(5) 

Unnamed: 0,id,alias_x,name_x,image_url_x,is_closed_x,url_x,review_count_x,categories_x,rating_x,transactions_x,...,coordinates.longitude_y,location.address1_y,location.address2_y,location.address3_y,location.city_y,location.zip_code_y,location.country_y,location.state_y,location.display_address_y,price_y
0,xwT3xNz0Vxyr9Y1diOky9w,block-kitchen-bar-banff,Block Kitchen + Bar,https://s3-media3.fl.yelpcdn.com/bphoto/kHLpQz...,False,https://www.yelp.com/biz/block-kitchen-bar-ban...,445.0,"[{'alias': 'sandwiches', 'title': 'Sandwiches'...",4.5,[],...,,,,,,,,,,
1,mp_NFG4X7md3ejFXnroVLw,banff-national-park-banff,Banff National Park,https://s3-media3.fl.yelpcdn.com/bphoto/8H8OD1...,False,https://www.yelp.com/biz/banff-national-park-b...,137.0,"[{'alias': 'parks', 'title': 'Parks'}]",5.0,[],...,,,,,,,,,,
2,ybq66bJSDIhRz-2_0tGt_g,park-distillery-banff,Park Distillery,https://s3-media3.fl.yelpcdn.com/bphoto/OZp0W5...,False,https://www.yelp.com/biz/park-distillery-banff...,588.0,"[{'alias': 'newcanadian', 'title': 'Canadian (...",4.0,[],...,,,,,,,,,,
3,dCCAne5Kkxn8LBjrDVYwHg,bear-street-tavern-banff,Bear Street Tavern,https://s3-media3.fl.yelpcdn.com/bphoto/oVfpTX...,False,https://www.yelp.com/biz/bear-street-tavern-ba...,411.0,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.0,[],...,,,,,,,,,,
4,hKE5JbNakbXWRO0MN1nvpw,tooloulous-banff,Tooloulou's,https://s3-media4.fl.yelpcdn.com/bphoto/THQ1hw...,False,https://www.yelp.com/biz/tooloulous-banff?adju...,424.0,"[{'alias': 'cajun', 'title': 'Cajun/Creole'}, ...",4.0,[],...,,,,,,,,,,


In [98]:
list(yelp_df.columns) #There are duplicate columns

['id',
 'alias_x',
 'name_x',
 'image_url_x',
 'is_closed_x',
 'url_x',
 'review_count_x',
 'categories_x',
 'rating_x',
 'transactions_x',
 'price_x',
 'phone_x',
 'display_phone_x',
 'distance_x',
 'coordinates.latitude_x',
 'coordinates.longitude_x',
 'location.address1_x',
 'location.address2_x',
 'location.address3_x',
 'location.city_x',
 'location.zip_code_x',
 'location.country_x',
 'location.state_x',
 'location.display_address_x',
 'alias_y',
 'name_y',
 'image_url_y',
 'is_closed_y',
 'url_y',
 'review_count_y',
 'categories_y',
 'rating_y',
 'transactions_y',
 'price_y',
 'phone_y',
 'display_phone_y',
 'distance_y',
 'coordinates.latitude_y',
 'coordinates.longitude_y',
 'location.address1_y',
 'location.address2_y',
 'location.address3_y',
 'location.city_y',
 'location.zip_code_y',
 'location.country_y',
 'location.state_y',
 'location.display_address_y',
 'alias_x',
 'name_x',
 'image_url_x',
 'is_closed_x',
 'url_x',
 'review_count_x',
 'categories_x',
 'rating_x',
 't

In [99]:
# #removing duplicate columns
# yelp_df = yelp_df.loc[:,~yelp_df.columns.duplicated()]
# # yelp_df["name"] = DataFrame["column1"] + DataFrame["column2"]

In [100]:
# #combine x and y columns
# yelp_df["name"] = yelp_df["name_x"] + yelp_df["name_y"]
# yelp_df["review_count"] = yelp_df["review_count_x"] + yelp_df["review_count_y"]
# yelp_df["rating"] = yelp_df["rating_x"] + yelp_df["rating_y"]

In [101]:
yelp_df = yelp_df[['name_x','review_count_x','rating_x','name_y','review_count_y','rating_y']]
yelp_df

Unnamed: 0,name_x,name_x.1,review_count_x,review_count_x.1,rating_x,rating_x.1,name_y,name_y.1,review_count_y,review_count_y.1,rating_y,rating_y.1
0,Block Kitchen + Bar,,445.0,,4.5,,,,,,,
1,Banff National Park,,137.0,,5.0,,,,,,,
2,Park Distillery,,588.0,,4.0,,,,,,,
3,Bear Street Tavern,,411.0,,4.0,,,,,,,
4,Tooloulou's,,424.0,,4.0,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...
148,,Cascade Mountain,,1.0,,5.0,,,,,,
149,,Cave N Basin,,3.0,,2.5,,,,,,
150,,,,,,,,Northern Lights Alpine Kitchen,,5.0,,3.5
151,,,,,,,,Castle Pantry,,1.0,,5.0


In [107]:
#Tidying data
yelp_df = yelp_df.dropna(how='all')
yelp_df = yelp_df.dropna(how='all', axis=1)
yelp_df = yelp_df.fillna(value = 0)

yelp_df.tail()

Unnamed: 0,name_x,name_x.1,review_count_x,review_count_x.1,rating_x,rating_x.1,name_y,name_y.1,review_count_y,review_count_y.1,rating_y,rating_y.1
148,0,Cascade Mountain,0.0,1.0,0.0,5.0,0,0,0.0,0.0,0.0,0.0
149,0,Cave N Basin,0.0,3.0,0.0,2.5,0,0,0.0,0.0,0.0,0.0
150,0,0,0.0,0.0,0.0,0.0,0,Northern Lights Alpine Kitchen,0.0,5.0,0.0,3.5
151,0,0,0.0,0.0,0.0,0.0,0,Castle Pantry,0.0,1.0,0.0,5.0
152,0,0,0.0,0.0,0.0,0.0,0,Panorama Restaurant,0.0,1.0,0.0,4.0


In [116]:
yelp_df1 = yelp_df.loc[:,~yelp_df.columns.duplicated()]
yelp_df1.sort_values(by=['review_count_x'], axis=0)
yelp_df1.head(20)

Unnamed: 0,name_x,review_count_x,rating_x,name_y,review_count_y,rating_y
0,Block Kitchen + Bar,445.0,4.5,0,0.0,0.0
1,Banff National Park,137.0,5.0,0,0.0,0.0
2,Park Distillery,588.0,4.0,0,0.0,0.0
3,Bear Street Tavern,411.0,4.0,0,0.0,0.0
4,Tooloulou's,424.0,4.0,0,0.0,0.0
5,Wild Flour - Banff's Artisan Bakery Cafe,290.0,4.0,0,0.0,0.0
6,The Bison Restaurant,263.0,4.0,0,0.0,0.0
7,Indian Curry House,233.0,4.0,0,0.0,0.0
8,Ramen Arashi,177.0,4.5,0,0.0,0.0
9,Whitebark Cafe,162.0,4.5,0,0.0,0.0


In [124]:
yelp_df1 = yelp_df1.sort_values(['review_count_x'], ascending=False)
yelp_df1.head(10)

Unnamed: 0,name_x,review_count_x,rating_x,name_y,review_count_y,rating_y
2,Park Distillery,588.0,4.0,0,0.0,0.0
0,Block Kitchen + Bar,445.0,4.5,0,0.0,0.0
4,Tooloulou's,424.0,4.0,0,0.0,0.0
3,Bear Street Tavern,411.0,4.0,0,0.0,0.0
13,The Maple Leaf,301.0,4.0,0,0.0,0.0
5,Wild Flour - Banff's Artisan Bakery Cafe,290.0,4.0,0,0.0,0.0
24,Grizzly House Restaurant,272.0,3.5,0,0.0,0.0
6,The Bison Restaurant,263.0,4.0,0,0.0,0.0
15,Eddie Burger + Bar,254.0,4.0,0,0.0,0.0
14,Saltlik,250.0,4.0,0,0.0,0.0


# Importing data from Foursquare

In [71]:
import json, requests
url2 = 'https://api.foursquare.com/v2/venues/explore'

params = dict(
client_id='EFWV2G0ZXXI5SEHKD2BRP15QZHHDEE5ZXUVZ5OUT0AUHLJGH',
client_secret='JUDUS2HYIXJN4N5W1ZETPKTWPSRPNLQVXU3AGUXKRSR2WRYG',
v='20180323', query="trails",
near='Banff, AB', radius = '30000',
limit=1
)
resp = requests.get(url=url2, params=params)
foursquare_data = json.loads(resp.text)

foursquare_data


{'meta': {'code': 200, 'requestId': '6184b9859fd95162fef39c98'},
 'response': {'geocode': {'what': '',
   'where': 'banff ab',
   'center': {'lat': 51.17622, 'lng': -115.56982},
   'displayString': 'Banff, AB, Canada',
   'cc': 'CA',
   'geometry': {'bounds': {'ne': {'lat': 51.247463, 'lng': -115.490334},
     'sw': {'lat': 51.118393, 'lng': -115.732536}}},
   'slug': 'banff-alberta-canada',
   'longId': '72057594043820468'},
  'headerLocation': 'Banff',
  'headerFullLocation': 'Banff',
  'headerLocationGranularity': 'city',
  'query': 'trails',
  'totalResults': 27,
  'suggestedBounds': {'ne': {'lat': 51.17852581666324,
    'lng': -115.55841597139674},
   'sw': {'lat': 51.17582615687779, 'lng': -115.56282424262925}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4d9cf