# Capstone Project - Find the best location to open a car wash center (Week 2)
### Applied Data Science Capstone by IBM/Coursera

1. Business Problem 
2. Data 
3. Methodology
4. Analysis
5. Results and discussion 
6. Conclusion 

#### 1. Introduction: Business problem 

Starting a car wash can be an interesting and profitable business. With the right location and top-notch service, you can draw in numerous customers who need their cars washed quickly, efficiently, and at a good price. In this project we look for the best location to open a car wash center. The project is aimed at stakeholders and business people who want to open a **car wash center**.
 

 1. **Find the existing car wash centers across the city of Toronto**. 
 2. **Find the boroughs that have a dense population and few car wash centers**.
 3. **Analyse the venues near the existing car wash centers to find the best location for a new car wash center**.
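
The second goal can be made concrete as a ratio of residents to existing car washes per borough. The following is a minimal sketch under stated assumptions: the borough names, populations and counts are made-up placeholders (not values from this project), and `residents_per_carwash` is a hypothetical helper.

```python
# Hypothetical figures for illustration only; not data from this project.
population = {"Borough A": 600_000, "Borough B": 250_000, "Borough C": 100_000}
carwash_count = {"Borough A": 30, "Borough B": 5, "Borough C": 10}


def residents_per_carwash(population, carwash_count):
    """Return (borough, residents per car wash) pairs, most underserved first."""
    ratios = {}
    for borough, people in population.items():
        # Treat a borough with no car wash as having one, to avoid dividing by zero.
        ratios[borough] = people / max(carwash_count.get(borough, 0), 1)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


ranking = residents_per_carwash(population, carwash_count)
print(ranking)
```

A borough near the top of this ranking has many residents for each existing car wash, which is the kind of candidate the second goal is looking for.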
 

#### 2. Data

a) Toronto neighborhood information is scraped from
 https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
 
b) **Population** figures for the boroughs are taken from https://en.wikipedia.org/wiki/Demographics_of_Toronto.

c) Geographical coordinates are required to explore the neighborhoods. They are taken from **Geospatial_Coordinates.csv**.

d) Foursquare API calls are used to get the venues (**car wash centers**) nearest to each neighborhood.

#### Install the required packages 

In [128]:
!pip install beautifulsoup4



#### Import the packages 

In [129]:
import requests               # To get the web page (response)
import pandas as pd           # For Data analysis
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup # For Web scraping
from pandas import json_normalize # To convert JSON to a data frame (top-level import, pandas >= 1.0)

#### Web scraping and data wrangling 

##### Get the neighborhoods of Toronto 

In [130]:
def get_toronto_neighborhoods(): 
    url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
    page = requests.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    neighborhoods = soup.find('table', attrs={'class': 'wikitable sortable'})
    
    rows = neighborhoods.find_all('tr')
    data = []

    cols = rows[0].find_all('th')
    headers = [ele.text.strip() for ele in cols]

    for row in rows[1:]:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])
        
    neighborhoods = pd.DataFrame(data, columns=headers)   
    neighborhoods['Postcode'] = neighborhoods['Postcode'].astype(str)
    print(neighborhoods.dtypes)
    neighborhoods = cleanNeighborhoods(neighborhoods)
    neighborhoods = mergeAllDuplicatePostalCodes(neighborhoods)
    return neighborhoods
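
The row-extraction logic in `get_toronto_neighborhoods` can be tried offline on an inline snippet. The sketch below is only an illustration under assumptions: it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` so that it runs without extra packages or network access, `TableRows` is a hypothetical helper, and the sample HTML is hand-written to mirror the three-column table.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None       # cells of the row currently being read
        self._in_cell = False  # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


# Hand-written sample, not the live Wikipedia page.
sample_html = """
<table class="wikitable sortable">
<tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
<tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)
headers = [cell.strip() for cell in parser.rows[0]]            # first row holds the column names
data = [[cell.strip() for cell in row] for row in parser.rows[1:]]
print(headers)
print(data)
```

As in the function above, the first row supplies the column names and the remaining rows become the records passed to `pd.DataFrame`.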

#### Clean the neighbourhoods 
##### 1. Remove the rows whose borough is "Not assigned".
##### 2. Where the neighbourhood is "Not assigned", use the borough name as the neighbourhood.

In [131]:
#Clean the neighborhood data which was received from the web page
def cleanNeighborhoods(neighborhoods):
    neighborhoods = neighborhoods[neighborhoods['Borough']!="Not assigned"].reset_index(drop=True)
    for ind in range(len(neighborhoods)):
        if (neighborhoods.loc[ind, "Neighbourhood"] == "Not assigned"):
            neighborhoods.at[ind, 'Neighbourhood'] = neighborhoods.loc[ind, "Borough"] 
            
    return neighborhoods
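
The row-by-row loop above can also be written with a boolean mask. This is a sketch of one possible alternative, not the notebook's own method: `clean_neighborhoods_vectorized` is a hypothetical name and the three sample rows are made up for illustration.

```python
import pandas as pd

# Small made-up sample in the same shape as the scraped table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})


def clean_neighborhoods_vectorized(df):
    # Step 1: drop rows whose borough is unknown.
    df = df[df["Borough"] != "Not assigned"].reset_index(drop=True)
    # Step 2: where the neighbourhood is unknown, reuse the borough name.
    missing = df["Neighbourhood"] == "Not assigned"
    df.loc[missing, "Neighbourhood"] = df.loc[missing, "Borough"]
    return df


cleaned = clean_neighborhoods_vectorized(sample)
print(cleaned)
```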

#### Merge the rows that share a postal code by combining their neighbourhoods 

In [1]:
# Merge rows that share a postal code, because the latitude and longitude are the same for those neighborhoods
def mergeAllDuplicatePostalCodes(neighborhoods):
    nws_by_postcode = {} # postal code -> list of neighbourhood names (avoids shadowing the built-in dict)
    
    # Collect the neighbourhoods of each unique postal code
    for ind in range(len(neighborhoods)):
        postcode = neighborhoods.loc[ind, "Postcode"]
        nws_by_postcode.setdefault(postcode, []).append(neighborhoods.loc[ind, "Neighbourhood"])
    
    # Drop the duplicate postal code rows, keeping the first row of each group
    neighborhoods = neighborhoods.drop_duplicates(subset="Postcode", keep='first').reset_index(drop=True)

    for k, v in nws_by_postcode.items():
        combined_nws = ','.join(v)
        # Assign through a boolean mask with .loc; .at only accepts a single scalar row label
        neighborhoods.loc[neighborhoods['Postcode'] == k, 'Neighbourhood'] = combined_nws
  
    return neighborhoods
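
The same merge can be expressed as a group-by. The sketch below is an alternative shown for comparison, under the assumption that every row of a given postal code has the same borough; the sample rows are illustrative only.

```python
import pandas as pd

# Illustrative rows: two postal codes, each listed once per neighbourhood.
sample = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# Group on the postal code (and borough), then join the neighbourhood names.
merged = (
    sample.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .agg(",".join)
    .reset_index()
)
print(merged)
```

Passing `sort=False` keeps the postal codes in the order in which they first appear, which matches the behaviour of keeping the first row of each duplicate group.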

In [133]:
neighborhoods = get_toronto_neighborhoods()

Postcode         object
Borough          object
Neighbourhood    object
dtype: object


#### Read Geospatial_Coordinates.csv
##### 1. Read the latitude and longitude information from the Geospatial_Coordinates.csv
##### 2. Add the latitude and longitude information to the neighbourhoods data frame.

In [134]:
def addLongLatToNeighborhoods(neighborhoods):
    geo_df = pd.read_csv("Geospatial_Coordinates.csv")
    longlat_dict = {}
    lat=[]
    long=[]
    for code, lt, ln in zip(geo_df['Postal Code'], geo_df['Latitude'], geo_df['Longitude']):
        longlat_dict[code] = [lt,ln]

    for code in neighborhoods['Postcode']:
        lonlat = longlat_dict[code]
        lat.append(lonlat[0])
        long.append(lonlat[1])
    neighborhoods['Latitude'] = lat 
    neighborhoods['Longitude'] = long
    return neighborhoods
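
The dictionary lookup above raises a `KeyError` if a postal code is missing from the CSV. A left merge is one way to surface such gaps as missing values instead. This is a sketch on made-up rows: the coordinates are rounded placeholders and `M9Z` is an invented code with no match.

```python
import pandas as pd

# Made-up rows; "M9Z" deliberately has no coordinates.
neighborhoods_sample = pd.DataFrame({
    "Postcode": ["M3A", "M4A", "M9Z"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Unknown"],
})
geo_sample = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.75, 43.73],
    "Longitude": [-79.33, -79.32],
})

# Left merge keeps every neighbourhood row and fills unmatched codes with NaN.
with_coords = (
    neighborhoods_sample
    .merge(geo_sample, left_on="Postcode", right_on="Postal Code", how="left")
    .drop(columns="Postal Code")
)

# Rows that did not find coordinates can be inspected before calling the API.
missing = with_coords[with_coords["Latitude"].isna()]
print(with_coords)
print(missing["Postcode"].tolist())
```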

In [135]:
neighborhoods = addLongLatToNeighborhoods(neighborhoods)

### View the existing car wash centers using folium map 

#### Foursquare credentials 

In [136]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20191210' # Foursquare API version
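
Credentials are usually safer outside the notebook. One possible approach, sketched here under assumptions, reads them from environment variables; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices, not something Foursquare requires.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Read the client ID and secret from a mapping (the process environment by default)."""
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError("Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first")
    return client_id, client_secret


# Demonstration with a stand-in mapping instead of the real environment.
demo_env = {"FOURSQUARE_CLIENT_ID": "demo-id", "FOURSQUARE_CLIENT_SECRET": "demo-secret"}
creds = load_foursquare_credentials(demo_env)
print(creds)
```

In the notebook this would be called without arguments, for example `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.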

#### Get the existing **Car wash centers** by neighbourhoods

In [137]:
def get_existing_carwash_centers_by_search(neighborhoods):
    radius = 10000 # search radius in metres (10 km)
    limit = 100 
    car_wash = '4f04ae1f2fb6e1c99f3db0ba' # Foursquare category ID passed as categoryId
    
    venues_list =[]
    for borough, name, lat, long in zip(neighborhoods['Borough'],neighborhoods['Neighbourhood'], neighborhoods['Latitude'], neighborhoods['Longitude']):
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&limit={}&radius={}&categoryId={}'.format(
                CLIENT_ID,CLIENT_SECRET, lat, long, VERSION, limit, radius, car_wash)
        
        response = requests.get(url).json()
        response = response['response']['venues']
        
        venues_list.append([(borough,name, venue['name'], 
                          venue['id'],
                          venue['categories'][0]['name'], 
                          venue['location']['lat'],
                          venue['location']['lng']) for venue in response])
    carwash_centers = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    carwash_centers.columns = ['Borough', 'Neighbourhood', 'Venue', 'Venue_id','Category', 'Location.lat', 'Location.lng']
    return carwash_centers
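
The two steps inside the loop above, building the request URL and flattening the response, can be tried without calling the API. This is a sketch under assumptions: `build_search_url` and `flatten_venues` are hypothetical helpers, the response shape is assumed to be the one the function above indexes into, and the second sample venue is invented to show a venue with no category.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, lat, lng, version, category_id,
                     radius=10000, limit=100):
    # urlencode escapes each value, so the comma in "lat,lng" becomes %2C.
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "v": version,
        "limit": limit,
        "radius": radius,
        "categoryId": category_id,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)


def flatten_venues(borough, neighbourhood, venues):
    rows = []
    for venue in venues:
        # Guard against venues that come back with an empty category list.
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((borough, neighbourhood, venue["name"], venue["id"], category,
                     venue["location"]["lat"], venue["location"]["lng"]))
    return rows


url = build_search_url("id", "secret", 43.75, -79.33, "20191210",
                       "4f04ae1f2fb6e1c99f3db0ba")

sample_venues = [
    {"name": "Petro-Canada", "id": "4c361e9118e72d7fca4714f5",
     "categories": [{"name": "Gas Station"}],
     "location": {"lat": 43.75795, "lng": -79.315187}},
    {"name": "Invented Venue", "id": "made-up-id", "categories": [],
     "location": {"lat": 43.0, "lng": -79.0}},
]
rows = flatten_venues("North York", "Parkwoods", sample_venues)
print(url)
print(rows)
```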

#### Get the nearby venues of each car wash center in Toronto, analyse where these centers are located, and find what they have in common. 

In [None]:
def getNearbyVenues(venue_ids, names, latitudes, longitudes, radius=500):
    venues_list=[]
    # venue_ids is the full sequence; vid is the ID of the car wash being processed
    for vid, name, lat, lng in zip(venue_ids, names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            30)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # return only relevant information for each nearby venue
        venues_list.append([(vid,
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Venue_id', 
                             'Venue',
                             'Venue Latitude', 
                             'Venue Longitude', 
                             'Nearby Venue',
                             'Nearby Venue Category']
    
    return(nearby_venues)
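
One way to look for what the car wash centers have in common is to count, for each nearby category, how many distinct car washes have it in their surroundings. The sketch below assumes made-up `(Venue_id, Nearby Venue Category)` pairs and a hypothetical helper `category_coverage`; it is not the analysis performed in this notebook.

```python
from collections import Counter

# Made-up rows in the shape (Venue_id, Nearby Venue Category).
nearby = [
    ("cw1", "Gas Station"), ("cw1", "Coffee Shop"), ("cw1", "Grocery Store"),
    ("cw1", "Coffee Shop"),  # a second coffee shop near the same car wash
    ("cw2", "Gas Station"), ("cw2", "Coffee Shop"),
    ("cw3", "Gas Station"),
]


def category_coverage(rows):
    """Count how many distinct car washes have each category nearby."""
    # A set of pairs gives one vote per car wash per category.
    seen = {(vid, category) for vid, category in rows}
    return Counter(category for _, category in seen)


coverage = category_coverage(nearby)
print(coverage.most_common(2))
```

Counting distinct car washes rather than raw rows keeps one busy location from dominating the result.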

In [138]:
exsiting_carwash_centers = get_existing_carwash_centers_by_search(neighborhoods) 
exsiting_carwash_centers

| | Borough | Neighbourhood | Venue | Venue_id | Category | Location.lat | Location.lng |
|---|---|---|---|---|---|---|---|
| 0 | North York | Parkwoods | Petro-Canada | 4c361e9118e72d7fca4714f5 | Gas Station | 43.757950 | -79.315187 |
| 1 | North York | Parkwoods | Petro-Canada | 4b940518f964a520dc6134e3 | Gas Station | 43.800817 | -79.296738 |
| 2 | North York | Parkwoods | Canadian Tire Gas+ | 4db48b6f4df05e5aaae21a1f | Gas Station | 43.770526 | -79.373126 |
| 3 | North York | Parkwoods | Petro-Canada | 4d3f1dc83ec9a35df6916081 | Gas Station | 43.775024 | -79.333001 |
| 4 | North York | Parkwoods | Petro-Canada | 4c1aab3ee9c4ef3bad7f45aa | Gas Station | 43.819847 | -79.326214 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4391 | Etobicoke | Kingsway Park South West,Mimico NW,The Queensw... | Petro Canada Car Wash | 58cc148514fb41604232c278 | Car Wash | 43.616960 | -79.546160 |
| 4392 | Etobicoke | Kingsway Park South West,Mimico NW,The Queensw... | Weston Coin Car Wash | 4becab5a8bbcc9283ca98cb1 | Car Wash | 43.708600 | -79.534276 |
| 4393 | Etobicoke | Kingsway Park South West,Mimico NW,The Queensw... | Original Six Car Wash | 4d56ddd6fb65236a88390bb4 | Car Wash | 43.603296 | -79.518927 |
| 4394 | Etobicoke | Kingsway Park South West,Mimico NW,The Queensw... | Popular Car Wash & Detailing - FREE VACUUMS | 4e2ad147d22d3f83c887b668 | Car Wash | 43.619797 | -79.562225 |


In [139]:
exsiting_carwash_centers.shape

(4396, 7)

#### Remove the duplicates from the existing car wash centers data frame and create the unique car wash centers data frame. 

In [140]:
exsiting_carwash_centers_unqiue = exsiting_carwash_centers.drop_duplicates(subset=['Venue_id'], keep ='first')
exsiting_carwash_centers_unqiue.reset_index(drop=True)
exsiting_carwash_centers_unqiue

| | Borough | Neighbourhood | Venue | Venue_id | Category | Location.lat | Location.lng |
|---|---|---|---|---|---|---|---|
| 0 | North York | Parkwoods | Petro-Canada | 4c361e9118e72d7fca4714f5 | Gas Station | 43.757950 | -79.315187 |
| 1 | North York | Parkwoods | Petro-Canada | 4b940518f964a520dc6134e3 | Gas Station | 43.800817 | -79.296738 |
| 2 | North York | Parkwoods | Canadian Tire Gas+ | 4db48b6f4df05e5aaae21a1f | Gas Station | 43.770526 | -79.373126 |
| 3 | North York | Parkwoods | Petro-Canada | 4d3f1dc83ec9a35df6916081 | Gas Station | 43.775024 | -79.333001 |
| 4 | North York | Parkwoods | Petro-Canada | 4c1aab3ee9c4ef3bad7f45aa | Gas Station | 43.819847 | -79.326214 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3828 | Etobicoke | Albion Gardens,Beaumond Heights,Humbergate,Jam... | In N Out Car Wash | 4c0bb886340720a160798893 | Car Wash | 43.754805 | -79.709485 |
| 3974 | Etobicoke | Alderwood,Long Branch | Petro-Pass Truck Stop | 57f535b5498e072826b23699 | Gas Station | 43.660486 | -79.646277 |
| 3988 | Etobicoke | Alderwood,Long Branch | Ovation Car Wash | 4fa7423fe4b0e7038ca31501 | Car Wash | 43.587890 | -79.641162 |
| 4006 | Etobicoke | Alderwood,Long Branch | Shell Car Wash | 4f391b03e4b02a70e2f845cc | Car Wash | 43.629402 | -79.668731 |
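
The output above keeps the original row labels (3828, 3974, ...) because `reset_index` returns a new frame rather than changing the existing one, and its result is not assigned. A sketch on made-up rows of chaining the two calls so that the unique venues are renumbered from 0:

```python
import pandas as pd

# Made-up search results: the same venue is returned for two neighbourhoods.
found = pd.DataFrame({
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Victoria Village"],
    "Venue": ["Petro-Canada", "Petro-Canada", "Shell Car Wash"],
    "Venue_id": ["4c361e9118e72d7fca4714f5", "4c361e9118e72d7fca4714f5",
                 "4f391b03e4b02a70e2f845cc"],
})

# Keep the first row per venue ID, then assign the re-indexed result.
unique = (
    found.drop_duplicates(subset=["Venue_id"], keep="first")
    .reset_index(drop=True)
)
print(unique)
```

With a 10 km search radius, neighbouring postal codes return many of the same venues, so de-duplicating on `Venue_id` is what turns the raw search results into a list of distinct car wash centers.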
