## Finding the Best Location for a Halal Restaurant
### Halal is Arabic for permissible. Halal food is that which adheres to Islamic law, as defined in the Qur'an. By ensuring food meets halal criteria, retailers and restaurants make their products suitable for the Muslim population. Many restaurants have chosen to move to halal processes so they do not lose Muslim customers. A 2017 study estimated that 3.45 million Muslims were living in the United States, about 1.1 percent of the total U.S. population. Our stakeholder wants to open a halal restaurant in Virginia.
## Business Problem:
The task is to select a suitable location for a halal restaurant.
Points to consider:
1. The client would like to stay within the state of Virginia.
2. The client would like to compare areas and see which is the most suitable and cost-effective.
3. The client would like to have the least amount of competition.

##### Let's start by importing all necessary libraries

In [1]:
import os
import re
import csv

import pandas as pd
import numpy as np

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import requests
from bs4 import BeautifulSoup
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
#!conda install -c conda-forge folium
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib as mp
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

print('Libraries imported.')

Libraries imported.


### Rental DataSet
##### I have gathered the data and uploaded it to my GitHub repository. It contains rent amounts for VA neighborhoods for July 2019.

In [52]:
git = 'https://raw.githubusercontent.com/mahumz/DataScience_Capstone/master/Neighborhoods_VA.csv'
VA_Data = pd.read_csv(git)
VA_Data

Unnamed: 0,Column1,Column2,Column3,Column4
0,Neighborhood,City,State,2019-06
1,Bayside,Virginia Beach,VA,1395
2,Alexandria West,Alexandria,VA,1600
3,Landmark,Alexandria,VA,1850
4,Seatack,Virginia Beach,VA,1325
5,Ballston-Virginia Square,Arlington,VA,2350
6,Fairlington-Shirlington,Arlington,VA,2250
7,Sunset Hills,Reston,VA,1980
8,Lynnhaven Shores,Virginia Beach,VA,1700
9,Kempsville,Virginia Beach,VA,1300
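
The output above shows placeholder column names (Column1 to Column4) with the real header sitting in row 0, while the later cells index columns by name (for example 'Neighborhood'). One possible way to load the file so the names line up is sketched below. It assumes the raw CSV has exactly two header-like lines, as the output suggests; the inline `raw_csv` text is a hypothetical stand-in for the real file.

```python
import io
import pandas as pd

# Hypothetical inline copy of the first rows shown above, standing in for the real CSV.
raw_csv = io.StringIO(
    "Column1,Column2,Column3,Column4\n"
    "Neighborhood,City,State,2019-06\n"
    "Bayside,Virginia Beach,VA,1395\n"
    "Alexandria West,Alexandria,VA,1600\n"
)

# header=1 uses the second line as the column names and discards the line before it.
va_data = pd.read_csv(raw_csv, header=1)

print(va_data.columns.tolist())
print(va_data['2019-06'].dtype)
```

Loading the data this way also lets pandas parse the rent column as numbers rather than strings, which the score calculations rely on.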


### We will retrieve coordinates of each neighborhood.

In [32]:
# Collect the coordinates in a list, then build the dataframe once at the end
# (DataFrame.append was removed in pandas 2.0).
geolocator = Nominatim(user_agent="my-data")
coords = []

# Loop over each neighborhood and geocode "Neighborhood,City,State".
for row,neighborhood in VA_Data.iterrows():
    address = neighborhood['Neighborhood'] + ',' + neighborhood['City'] + ',' + neighborhood['State']
    try:
        location = geolocator.geocode(address)
        coords.append({'Latitude':location.latitude,'Longitude':location.longitude})
    except Exception:
        # Keep a placeholder row so the index stays aligned with VA_Data, and print the address that failed.
        coords.append({'Latitude':np.nan,'Longitude':np.nan})
        print(address)

coordinates = pd.DataFrame(coords, columns=['Latitude','Longitude'])
coordinates

Unnamed: 0,Latitude,Longitude
0,36.902925,-76.13438
1,38.80511,-77.047023
2,38.80511,-77.047023
3,36.841538,-75.996319
4,38.881977,-77.111549
5,38.83677,-77.097439
6,38.954564,-77.350123
7,36.910703,-76.0816
8,36.826815,-76.160214


### Let's reduce our DataFrame to the neighborhood and rent columns and add the coordinates.

In [33]:
#Let's now take only Neighborhood and rent data for further testing.
VA_Data = VA_Data[['Neighborhood','2019-06']]

# Adding the VA_Data and coordinates in one new dataframe.
VA_Neighborhood = VA_Data.join(coordinates, how='outer')
VA_Neighborhood

Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude
0,Bayside,1395,36.902925,-76.13438
1,Alexandria West,1600,38.80511,-77.047023
2,Landmark,1850,38.80511,-77.047023
3,Seatack,1325,36.841538,-75.996319
4,Ballston-Virginia Square,2350,38.881977,-77.111549
5,Fairlington-Shirlington,2250,38.83677,-77.097439
6,Sunset Hills,1980,38.954564,-77.350123
7,Lynnhaven Shores,1700,36.910703,-76.0816
8,Kempsville,1300,36.826815,-76.160214
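
In the joined table above, Alexandria West and Landmark carry identical coordinates, which may mean the geocoder resolved both addresses to the same point. A quick check for such cases is sketched below; the `coords` frame is a hypothetical four-row excerpt of the table, and what to do with any flagged rows is left open.

```python
import pandas as pd

# Hypothetical excerpt of the joined table above.
coords = pd.DataFrame({
    'Neighborhood': ['Bayside', 'Alexandria West', 'Landmark', 'Seatack'],
    'Latitude': [36.902925, 38.80511, 38.80511, 36.841538],
    'Longitude': [-76.13438, -77.047023, -77.047023, -75.996319],
})

# keep=False marks every member of a duplicate group, not just the later ones.
shared = coords[coords.duplicated(subset=['Latitude', 'Longitude'], keep=False)]

print(shared['Neighborhood'].tolist())
```

Neighborhoods that share a point will also share the same venue search results later on, so it can be worth reviewing them before interpreting the competition counts.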


#### Let's calculate rentScore and ratingScore for our formula, i.e. finalScore = (rentScore) × 0.6 + (ratingScore) × 0.4
#### We will use the following formula to calculate rentScore: (maxrentofN-currentrentofN)/(maxrentofN-minrentofN). Maximum and minimum values can be retrieved using the built-in max and min functions.
#### The closer the finalScore value is to 1, the better the neighborhood is as a choice for the client.
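
As a compact illustration of the rentScore formula, here is a minimal vectorized sketch. It uses the rent values from the table above; the `rents` and `rent_score` names are introduced here for illustration only.

```python
import pandas as pd

# Rents copied from the '2019-06' column of the table above.
rents = pd.Series(
    [1395, 1600, 1850, 1325, 2350, 2250, 1980, 1700, 1300],
    index=['Bayside', 'Alexandria West', 'Landmark', 'Seatack',
           'Ballston-Virginia Square', 'Fairlington-Shirlington',
           'Sunset Hills', 'Lynnhaven Shores', 'Kempsville'],
)

# Inverted min-max scaling: the cheapest neighborhood scores 1, the most expensive scores 0.
rent_score = (rents.max() - rents) / (rents.max() - rents.min())

print(rent_score.round(6))
```

Because the numerator is the distance from the maximum rent, a lower rent always yields a higher score, which matches the client's cost-effectiveness goal.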

In [35]:
# Max value in '2019-06'
maxrentofN = max(VA_Neighborhood['2019-06'])
print('Max rent value of VA Neighborhood is $',maxrentofN)
minrentofN = min(VA_Neighborhood['2019-06'])
print('Min rent value VA Neighborhood is $',minrentofN)

Max rent value of VA Neighborhood is $ 2350
Min rent value VA Neighborhood is $ 1300


In [36]:
#collect the rent scores in a list, then build a dataframe to store them.
rent_scores = []
for index, Neighborhood in VA_Neighborhood.iterrows():
    currentrentofN = Neighborhood['2019-06']
    rentscore = (maxrentofN-currentrentofN)/(maxrentofN-minrentofN)
    rent_scores.append(rentscore)
rent_Score = pd.DataFrame({'RentScore': rent_scores})

rent_Score

Unnamed: 0,RentScore
0,0.909524
1,0.714286
2,0.47619
3,0.97619
4,0.0
5,0.095238
6,0.352381
7,0.619048
8,1.0


In [37]:
#adding rent score information in VA_Neighborhood dataframe. 
VA_Neighborhood = VA_Neighborhood.join(rent_Score, how='outer')
VA_Neighborhood

Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude,RentScore
0,Bayside,1395,36.902925,-76.13438,0.909524
1,Alexandria West,1600,38.80511,-77.047023,0.714286
2,Landmark,1850,38.80511,-77.047023,0.47619
3,Seatack,1325,36.841538,-75.996319,0.97619
4,Ballston-Virginia Square,2350,38.881977,-77.111549,0.0
5,Fairlington-Shirlington,2250,38.83677,-77.097439,0.095238
6,Sunset Hills,1980,38.954564,-77.350123,0.352381
7,Lynnhaven Shores,1700,36.910703,-76.0816,0.619048
8,Kempsville,1300,36.826815,-76.160214,1.0


### Calculate ratingScore using data for Halal restaurants only.
#### Coordinates (latitude and longitude) are obtained by passing each neighborhood name through geocoding. Restaurant details can be retrieved from the Foursquare search endpoint. I only need Halal restaurant data, and the search endpoint accepts a categoryId attribute that restricts results to one category (such as Indian, Mexican, or Halal Restaurant); Foursquare defines a categoryId for each. From the search response we retrieve the venue id of every Halal venue, then pass each venue id to the venue details endpoint to get that restaurant's rating. The data is saved in a dataframe for further testing.
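
To make the search request concrete, the sketch below assembles the query string from a dictionary instead of a long format string. `build_search_url` is a hypothetical helper, the `MY_ID` and `MY_SECRET` credentials are placeholders, and no request is sent; the parameter names are the ones used in this notebook's search URL.

```python
from urllib.parse import urlencode


def build_search_url(api_endpoint, client_id, client_secret, version,
                     lat, lng, radius, category_id):
    """Return a venues/search URL (hypothetical helper, not part of any library)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'categoryId': category_id,
    }
    # urlencode escapes each value, so the comma in 'll' becomes %2C.
    return '{}v2/venues/search?{}'.format(api_endpoint, urlencode(params))


url = build_search_url('https://api.foursquare.com/', 'MY_ID', 'MY_SECRET',
                       '20180605', 36.826815, -76.160214, 500,
                       '52e81612bcbc57f1066b79ff')
print(url)
```

Keeping the parameters in a dictionary makes it easier to see which filters are applied and to change the radius or category without touching the URL template.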

In [16]:
#set variables
LIMIT=100
CLIENT_ID = 'WMSWX5A0CUF43HBHPBM1BSNOMZ2X11WWBZQF0W5GSHJY5PIU' # Foursquare ID, note there is a daily call quota limit
CLIENT_SECRET = 'LFZOKRJBO4NJB3ARALFUCXTHMOSRDBR1OZRUNXQ1P1M1ZXQQ' # Foursquare Secret, note there is a daily call quota limit
VERSION = '20180605' # Foursquare API version

In [38]:
# Halal restaurant categoryId on Foursquare
categoryId = '52e81612bcbc57f1066b79ff'
# the search covers a 500 meter radius around each latitude and longitude.
radius = 500
api_endpoint = 'https://api.foursquare.com/'
# the URL below overrides the one above; it is a proxy used to cache the requests made to the Foursquare API
api_endpoint = 'http://cladiusfernando-eval-test.apigee.net/foursquare/'

# rows for the venue information and for the count of well-rated restaurants per neighborhood.
# (collected in lists because DataFrame.append was removed in pandas 2.0)
venue_rows = []
good_rows = []

# 'for loop' to get venue information from the search endpoint
for row,neighborhood in VA_Neighborhood.iterrows():
    venue_NeighborhoodName = neighborhood['Neighborhood']
    lat = neighborhood['Latitude']
    long = neighborhood['Longitude']
    url = '{}v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}'.format(
        api_endpoint,
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        categoryId)

    venues = requests.get(url).json()['response']['venues']
    goodrating = 0
    for venue in venues:
        venuen = venue['name']         #retrieving venue name neighborhood-wise
        VENUE_ID = venue['id']         #retrieving venue id neighborhood-wise
        url2 = '{}v2/venues/{}?client_id={}&client_secret={}&v={}'.format(
            api_endpoint,
            VENUE_ID,
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION)
        ven = requests.get(url2).json()['response']['venue']

        #record the rating only when the venue has one.
        if 'rating' in ven:
            rating = ven['rating']
            venue_rows.append({'VenueNeighborhoodName': venue_NeighborhoodName, 'VenueName': venuen, 'VenueRating': rating})

            #count the restaurants rated 7.0 or higher in this neighborhood.
            if rating >= 7.0:
                goodrating = goodrating+1
    #one row per neighborhood with its count of well-rated restaurants
    good_rows.append({'Neighborhood': venue_NeighborhoodName, 'GoodRatingRestaurant': goodrating})

venue_Details = pd.DataFrame(venue_rows, columns=['VenueNeighborhoodName','VenueName','VenueRating'])
goodRating = pd.DataFrame(good_rows, columns=['Neighborhood','GoodRatingRestaurant'])
goodRating

Unnamed: 0,Neighborhood,GoodRatingRestaurant
0,Bayside,0
1,Alexandria West,0
2,Landmark,0
3,Seatack,0
4,Ballston-Virginia Square,0
5,Fairlington-Shirlington,0
6,Sunset Hills,0
7,Lynnhaven Shores,0
8,Kempsville,0


### Let's display the rated Halal restaurants found in each neighborhood.

In [40]:
venue_Details

Unnamed: 0,VenueNeighborhoodName,VenueName,VenueRating


There are no Halal restaurants with any ratings in these areas.

In [42]:
# Let's merge the good-rating restaurant counts into the VA_Neighborhood dataframe.
VA_Neighborhood = VA_Neighborhood.merge(goodRating, how='outer')
VA_Neighborhood

Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude,RentScore,GoodRatingRestaurant
0,Bayside,1395,36.902925,-76.13438,0.909524,0
1,Alexandria West,1600,38.80511,-77.047023,0.714286,0
2,Landmark,1850,38.80511,-77.047023,0.47619,0
3,Seatack,1325,36.841538,-75.996319,0.97619,0
4,Ballston-Virginia Square,2350,38.881977,-77.111549,0.0,0
5,Fairlington-Shirlington,2250,38.83677,-77.097439,0.095238,0
6,Sunset Hills,1980,38.954564,-77.350123,0.352381,0
7,Lynnhaven Shores,1700,36.910703,-76.0816,0.619048,0
8,Kempsville,1300,36.826815,-76.160214,1.0,0


#### The formula for calculating ratingScore is (maxgoodrest-currentrestofN)/(maxgoodrest-mingoodrest). Since we now know from above that no halal restaurants with any rating, good or bad, were found, maxgoodrest and mingoodrest are both zero and the formula would divide by zero, so the code sets ratingScore to zero in that case. Let's test it anyway.
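
The zero-range case can be isolated in a small function, sketched below in plain Python. `inverted_min_max` and its `fallback` parameter are hypothetical names; the fallback of 0 mirrors the choice made in this notebook, though other values could be argued for.

```python
def inverted_min_max(values, fallback=0.0):
    """Scale values so the smallest maps to 1 and the largest to 0.

    When every value is identical the range is zero, so `fallback` is
    returned for each entry instead of dividing by zero.
    """
    hi, lo = max(values), min(values)
    if hi == lo:
        return [fallback for _ in values]
    return [(hi - v) / (hi - lo) for v in values]


print(inverted_min_max([0, 0, 0]))   # all counts equal: the fallback is used
print(inverted_min_max([0, 2, 4]))   # fewest well-rated competitors scores highest
```

With all counts at zero, every neighborhood receives the same ratingScore, so the ranking is then decided by rentScore alone.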

In [45]:
maxgoodrest = max(VA_Neighborhood['GoodRatingRestaurant'])
print('Maximum good restaurant count',maxgoodrest)
mingoodrest = min(VA_Neighborhood['GoodRatingRestaurant'])
print('Minimum good restaurant count',mingoodrest)

# collect the rating scores in a list; fall back to 0 when every count is equal to avoid dividing by zero.
rating_scores = []
for index, Neighborhood in VA_Neighborhood.iterrows():
    currentrestofN = Neighborhood['GoodRatingRestaurant']
    ratingScore = (maxgoodrest-currentrestofN)/(maxgoodrest-mingoodrest) if maxgoodrest-mingoodrest != 0 else 0
    rating_scores.append(ratingScore)
rating_Score = pd.DataFrame({'RatingScore': rating_scores})

VA_Neighborhood = VA_Neighborhood.join(rating_Score, how='outer')
VA_Neighborhood

Maximum good restaurant count 0
Minimum good restaurant count 0


Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore
0,Bayside,1395,36.902925,-76.13438,0.909524,0,0
1,Alexandria West,1600,38.80511,-77.047023,0.714286,0,0
2,Landmark,1850,38.80511,-77.047023,0.47619,0,0
3,Seatack,1325,36.841538,-75.996319,0.97619,0,0
4,Ballston-Virginia Square,2350,38.881977,-77.111549,0.0,0,0
5,Fairlington-Shirlington,2250,38.83677,-77.097439,0.095238,0,0
6,Sunset Hills,1980,38.954564,-77.350123,0.352381,0,0
7,Lynnhaven Shores,1700,36.910703,-76.0816,0.619048,0,0
8,Kempsville,1300,36.826815,-76.160214,1.0,0,0


### Now that we have the rent scores and rating scores, let's calculate the Final Score to decide which neighborhoods are suitable for a new Halal Restaurant.

In [47]:
# collect the weighted final scores in a list: 60% rent score, 40% rating score.
final_scores = []
for index, Neighborhood in VA_Neighborhood.iterrows():
    rentScore = Neighborhood['RentScore']
    ratingScore = Neighborhood['RatingScore']
    finalScore = (rentScore)*0.6+(ratingScore)*0.4
    final_scores.append(finalScore)
final_Score = pd.DataFrame({'FinalScore': final_scores})

VA_Neighborhood = VA_Neighborhood.join(final_Score, how='outer')
VA_Neighborhood

Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore,FinalScore
0,Bayside,1395,36.902925,-76.13438,0.909524,0,0,0.545714
1,Alexandria West,1600,38.80511,-77.047023,0.714286,0,0,0.428571
2,Landmark,1850,38.80511,-77.047023,0.47619,0,0,0.285714
3,Seatack,1325,36.841538,-75.996319,0.97619,0,0,0.585714
4,Ballston-Virginia Square,2350,38.881977,-77.111549,0.0,0,0,0.0
5,Fairlington-Shirlington,2250,38.83677,-77.097439,0.095238,0,0,0.057143
6,Sunset Hills,1980,38.954564,-77.350123,0.352381,0,0,0.211429
7,Lynnhaven Shores,1700,36.910703,-76.0816,0.619048,0,0,0.371429
8,Kempsville,1300,36.826815,-76.160214,1.0,0,0,0.6


### Results

In [51]:
Results = VA_Neighborhood[(VA_Neighborhood['FinalScore'] <= 0.8)].sort_values('FinalScore', ascending=False)
Results

Unnamed: 0,Neighborhood,2019-06,Latitude,Longitude,RentScore,GoodRatingRestaurant,RatingScore,FinalScore
8,Kempsville,1300,36.826815,-76.160214,1.0,0,0,0.6
3,Seatack,1325,36.841538,-75.996319,0.97619,0,0,0.585714
0,Bayside,1395,36.902925,-76.13438,0.909524,0,0,0.545714
1,Alexandria West,1600,38.80511,-77.047023,0.714286,0,0,0.428571
7,Lynnhaven Shores,1700,36.910703,-76.0816,0.619048,0,0,0.371429
2,Landmark,1850,38.80511,-77.047023,0.47619,0,0,0.285714
6,Sunset Hills,1980,38.954564,-77.350123,0.352381,0,0,0.211429
5,Fairlington-Shirlington,2250,38.83677,-77.097439,0.095238,0,0,0.057143
4,Ballston-Virginia Square,2350,38.881977,-77.111549,0.0,0,0,0.0


#### Above are our results for the neighborhoods in Virginia best suited to open a new Halal restaurant! Since no rated Halal restaurants were found in these areas, the only factor to consider is rentScore, so the results list all locations from most to least cost-effective.
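
For reference, the weighted score and the ranking can be reproduced in a few vectorized lines. This is a minimal sketch that copies the RentScore and RatingScore values from the tables above into a hypothetical `scores` frame rather than recomputing them.

```python
import pandas as pd

# Values copied from the RentScore and RatingScore columns above.
scores = pd.DataFrame({
    'Neighborhood': ['Bayside', 'Alexandria West', 'Landmark', 'Seatack',
                     'Ballston-Virginia Square', 'Fairlington-Shirlington',
                     'Sunset Hills', 'Lynnhaven Shores', 'Kempsville'],
    'RentScore': [0.909524, 0.714286, 0.476190, 0.976190, 0.0,
                  0.095238, 0.352381, 0.619048, 1.0],
    'RatingScore': [0] * 9,
})

# Weighted sum: 60% rent score plus 40% rating score.
scores['FinalScore'] = 0.6 * scores['RentScore'] + 0.4 * scores['RatingScore']

# Highest final score first.
ranking = scores.sort_values('FinalScore', ascending=False)
print(ranking[['Neighborhood', 'FinalScore']].head(3))
```

With every RatingScore at zero, the highest possible FinalScore is 0.6, which is why no neighborhood comes close to 1 in the results.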