## Capstone Project
#### Title: Find areas for an ethnic cuisine restaurant
<hr>

#### Introduction

If somebody is going to open a restaurant serving an ethnic cuisine, they should
look for a location in an area where many ethnic restaurants already exist (which
suggests that there is enough demand) but where no restaurant serves the cuisine
they plan to offer.

For example, suppose I'm looking for a place for a new Thai restaurant. If I open
it in an area where two other Thai restaurants exist, I risk facing strong
competition. If I open it in the city suburbs with no other Thai restaurants
nearby, I risk low demand for such a restaurant. However, if I open it in
a neighborhood with several Chinese, Indian, etc. restaurants but no Thai
restaurant, the chance of success should be higher.

Therefore, my application will create a map of a city and mark all areas
with at least N other ethnic restaurants but no restaurant of the selected
ethnic cuisine.
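
The selection rule described above can be written as a small predicate. This is only a minimal sketch: the function name `is_candidate_area` and the sample counts are illustrative assumptions, not part of the notebook, and the threshold is read as "at least N", which is the comparison Step 6 uses.

```python
def is_candidate_area(same_count, similar_count, min_similar):
    """Hypothetical helper: True when the area has no restaurant of the
    selected cuisine and at least `min_similar` other ethnic restaurants."""
    return same_count == 0 and similar_count >= min_similar


# Illustrative counts only (not real data)
print(is_candidate_area(same_count=0, similar_count=12, min_similar=10))  # True: demand, no competitor
print(is_candidate_area(same_count=2, similar_count=12, min_similar=10))  # False: competitors exist
print(is_candidate_area(same_count=0, similar_count=3, min_similar=10))   # False: too little demand
```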

In [1]:
# Step 1. Initialize folium for maps
!conda install -c conda-forge folium=0.5.0 --yes
import folium


In [2]:
# Step 2. Import libraries

import pandas as pd
import numpy as np
import requests # library to handle requests
from bs4 import BeautifulSoup # library to decode HTML pages
from geopy.geocoders import Nominatim
import matplotlib.cm as cm # Matplotlib and associated plotting modules
import matplotlib.colors as colors

print("Libraries are ready!")

Libraries are ready!


In [3]:
# Step 3. Load Wikipedia page with areas

wiki_url = 'https://en.wikipedia.org/wiki/List_of_areas_of_London'
wiki_page = requests.get(wiki_url).text
soup = BeautifulSoup(wiki_page,'lxml')

print("Wiki page is loaded!")

Wiki page is loaded!


In [42]:
# Step 4. Load areas and their coordinates

areas_table = soup.find('table',{'class':'wikitable sortable'})
areas_lines = areas_table.find_all('tr')

col_Area = []
col_Latitude = []
col_Longitude = []

for tr in areas_lines:
    tds = tr.find_all('td')
    if not tds:
        continue

    # Load cells. Skip rows that have no link in the sixth cell (index 5)
    if len(tds) < 6:
        continue
    area = tds[0].text
    links = tds[5].find_all('a')
    if not links:
        continue

    # Extract coordinates from the link
    url = links[0]['href']
    coordinates = url.split('/en/')[1].split('_region')[0]
    latitude, longitude = [float(i) for i in coordinates.split(';')]
    col_Area.append(area)
    col_Latitude.append(latitude)
    col_Longitude.append(longitude)

# Create a pandas DataFrame
dfAreas = pd.DataFrame({
    'Area': col_Area,
    'Latitude': col_Latitude,
    'Longitude': col_Longitude,
})
dfAreas.head()

          Area   Latitude  Longitude
0   Abbey Wood   51.48648   0.108592
1        Acton  51.510588  -0.264989
2    Addington  51.362931  -0.026374
3   Addiscombe  51.381622  -0.068682
4  Albany Park  51.434926   0.124921
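
The coordinate extraction in Step 4 raises an exception if a link does not have the expected shape. A more defensive variant might look like the sketch below. The helper name `parse_coordinates` is hypothetical, and the sample link is made up; it only mimics the `/en/<lat>;<lon>_region` pattern that the notebook code splits on.

```python
def parse_coordinates(href):
    """Hypothetical helper: extract (latitude, longitude) from a link of the
    assumed form '.../en/<lat>;<lon>_region...'. Returns None if the link
    does not match that shape."""
    try:
        coordinates = href.split('/en/')[1].split('_region')[0]
        latitude, longitude = (float(part) for part in coordinates.split(';'))
    except (IndexError, ValueError):
        return None
    return latitude, longitude


# Made-up link that only mimics the assumed shape
sample = 'https://example.org/en/51.48648;0.108592_region:GB'
print(parse_coordinates(sample))                                 # (51.48648, 0.108592)
print(parse_coordinates('https://example.org/no-coordinates'))   # None
```

Rows for which the helper returns `None` could then be skipped instead of stopping the whole loop.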


In [58]:
# Step 5. Count restaurants in each area

CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
RADIUS = 500
LIMIT = 100

MY_CATEGORY = 'Thai Restaurant'
SIMILAR_CATEGORY = ['Afghan Restaurant',
'African Restaurant',
'Cambodian Restaurant',
'Filipino Restaurant',
'Himalayan Restaurant',
'Japanese Restaurant',
'Korean Restaurant',
'Malaysian Restaurant',
'Mongolian Restaurant',
'Australian Restaurant',
'Austrian Restaurant',
'Belgian Restaurant',
'Brazilian Restaurant',
'Caribbean Restaurant',
'Chinese Restaurant',
'Czech Restaurant',
'Eastern European Restaurant',
'French Restaurant',
'German Restaurant',
'Greek Restaurant',
'Hawaiian Restaurant',
'Hungarian Restaurant',
'Indian Restaurant',
'Indonesian Restaurant',
'Italian Restaurant',
'Jewish Restaurant',
'Latin American Restaurant',
'Moroccan Restaurant',
'Mexican Restaurant',
'Persian Restaurant',
'Pakistani Restaurant',
'Polish Restaurant',
'Portuguese Restaurant',
'Russian Restaurant',
'Scandinavian Restaurant',
'Argentinian Restaurant',
'Peruvian Restaurant',
'Spanish Restaurant',
'Sri Lankan Restaurant',
'Swiss Restaurant',
'Turkish Restaurant',
'Ukrainian Restaurant']

nSameTotal = 0
nSimilarTotal = 0

col_SameRest = []
col_SimilarRest = []

for area, lat, lng in zip(dfAreas['Area'], dfAreas['Latitude'], dfAreas['Longitude']):

    nSameRest = 0
    nSimilarRest = 0

    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng,
        RADIUS, 
        LIMIT)
    response = requests.get(url)
    response.raise_for_status() # stop early if the request failed
    results = response.json()['response']['groups'][0]['items']
    
    # Count venues of the selected cuisine and of the other ethnic cuisines
    for venue in results:
        for category in venue['venue']['categories']:
            
            if category['name'] == MY_CATEGORY:
                nSameRest += 1
                nSameTotal += 1
            elif category['name'] in SIMILAR_CATEGORY:
                nSimilarRest += 1
                nSimilarTotal += 1
        
    col_SameRest.append(nSameRest)
    col_SimilarRest.append(nSimilarRest)
    
dfAreas['Same'] = col_SameRest
dfAreas['Similar'] = col_SimilarRest

print('Total number of the same restaurants:', nSameTotal, "and other ethnic restaurants:", nSimilarTotal)

Total number of the same restaurants: 89 and other ethnic restaurants: 1327
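
The per-area counting in Step 5 can be isolated from the network call, which makes it possible to check it without an API key. The following is a sketch under assumptions: `count_categories` is a hypothetical helper, and `sample_items` is made-up data that only imitates the `items` structure the notebook reads from the Foursquare response.

```python
def count_categories(items, my_category, similar_categories):
    """Hypothetical helper: return (same, similar) counts for one area.

    `items` is assumed to have the shape the notebook reads from the
    response: a list of {'venue': {'categories': [{'name': ...}]}}.
    """
    similar_set = set(similar_categories)  # set lookup instead of list scan
    same = similar = 0
    for item in items:
        for category in item['venue']['categories']:
            name = category['name']
            if name == my_category:
                same += 1
            elif name in similar_set:
                similar += 1
    return same, similar


# Made-up items for illustration only
sample_items = [
    {'venue': {'categories': [{'name': 'Thai Restaurant'}]}},
    {'venue': {'categories': [{'name': 'Indian Restaurant'}]}},
    {'venue': {'categories': [{'name': 'Chinese Restaurant'}]}},
    {'venue': {'categories': [{'name': 'Coffee Shop'}]}},
]
print(count_categories(sample_items, 'Thai Restaurant',
                       ['Indian Restaurant', 'Chinese Restaurant']))  # (1, 2)
```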


In [65]:
# Step 6. Draw the map. Red dot = recommended area, grey dot = other areas

MIN_SIMILAR = 10

# Create map of the city
address = 'London, UK'
geolocator = Nominatim(user_agent="city_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
city_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# Add markers to map
for lat, lng, area, same, similar in zip(dfAreas['Latitude'], dfAreas['Longitude'], dfAreas['Area'], dfAreas['Same'], dfAreas['Similar']):

    label = folium.Popup(area, parse_html=True)
    lbl_color = 'grey'
    lbl_opacity = 0.4
    if same == 0 and similar >= MIN_SIMILAR:
        lbl_color = 'red'
        lbl_opacity = 0.8
    
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=lbl_color,
        fill=True,
        fill_color=lbl_color,
        fill_opacity=lbl_opacity).add_to(city_map)

# Draw the map
city_map
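
Alongside the map, the recommended (red) areas could also be listed as text, ordered by the number of other ethnic restaurants. The sketch below is illustrative: `rank_recommended` is a hypothetical helper and the rows are made-up values, not results from the notebook. With the notebook's data, the rows could be built with `zip(dfAreas['Area'], dfAreas['Same'], dfAreas['Similar'])`.

```python
def rank_recommended(rows, min_similar):
    """Hypothetical helper: keep areas with no restaurant of the selected
    cuisine and at least `min_similar` other ethnic restaurants, sorted by
    the number of other ethnic restaurants (largest first).

    `rows` is an iterable of (area, same, similar) tuples.
    """
    candidates = [row for row in rows if row[1] == 0 and row[2] >= min_similar]
    return sorted(candidates, key=lambda row: row[2], reverse=True)


# Made-up rows for illustration only: (area, same, similar)
rows = [
    ('Area A', 0, 11),
    ('Area B', 1, 20),
    ('Area C', 0, 14),
    ('Area D', 0, 4),
]

for area, same, similar in rank_recommended(rows, min_similar=10):
    print(area, similar)
# Area C 14
# Area A 11
```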