<h1>The maps cannot be rendered on GitHub.<br />You can follow the <a href="https://eu-gb.dataplatform.cloud.ibm.com/analytics/notebooks/v2/3d642503-55d2-4707-a269-1b848e2b6031/view?access_token=6c7584512d5ff60808afd00ae896cbf5b8f711e2e0b075f75001609af8d7d1bb">Watson Studio link</a> to view the maps generated for this project.</h1>

<h1><strong>Introduction</strong></h1>
<h2>Background Information</h2>
<p>Recognised as the capital of Scotland since at least the 15th century, Edinburgh is the seat of the Scottish Government, the Scottish Parliament and the highest courts in Scotland. The city's Palace of Holyroodhouse is the official residence of the monarch in Scotland. The city has long been a centre of education, particularly in the fields of medicine, Scots law, literature, philosophy, the sciences and engineering. It is the second-largest financial centre in the United Kingdom, and the city's historical and cultural attractions have made it the UK's second-most visited tourist destination attracting 4.9 million visits, including 2.4 million from overseas in 2018.</p>
<p>Edinburgh's official population estimates are 488,050 (mid-2016) for the Edinburgh locality, 518,500 (mid-2019) for the City of Edinburgh council area, and 1,339,380 (2014) for the wider city region. Edinburgh lies at the heart of the Edinburgh and South East Scotland city region comprising East Lothian, Edinburgh, Fife, Midlothian, Scottish Borders and West Lothian.</p>
<p>There are several business opportunities around Edinburgh, and the food-and-beverage (F&amp;B) sector has long been an attractive target for investors. Within this sector, coffee shops have been a booming business in Edinburgh. A study by the Local Data Company found that Scotland overall has seen a 7.4 per cent increase in the number of coffee outlets, giving it a six per cent share of the UK&rsquo;s dedicated cafes. The study reflected a significant rise in the number of outlets and in domestic coffee consumption in recent years, and it counted only outlets dedicated specifically to coffee, including independents and chains such as Costa and Starbucks.</p>
<h2>Problem Statement</h2>
<p>Karabal is a company that specializes in third-wave coffee equipment, roasting techniques and the distribution of coffee beans from around the world. Given this outlook, the company is interested in exploring coffee shop business opportunities in Edinburgh. This data science project was carried out to help answer the following question: which regions of Edinburgh are strategic for its operations, and where can it find potential customers?</p>
<h2><strong>Data</strong></h2>
<p>In order to provide the necessary information, the following data sets are required:</p>
<ul>
<li>Zone information of Edinburgh. (Obtained from <a href="https://martinjc.github.io/UK-GeoJSON/">https://martinjc.github.io/UK-GeoJSON/</a>, extracted from JSON file.)</li>
<li>Coordinates of zones. (Obtained through geocoder.)</li>
<li>Data on coffee shops and cafes near each zone. (Obtained from the <strong>Foursquare API</strong>.)</li>
</ul>
<p>The data will be used to visualize the distribution of coffee shops and cafes around Edinburgh and give potential investors some insight.</p>

<h2>Methodology</h2>
<p>This section covers the main components of the report. It starts by extracting Edinburgh&rsquo;s zone data from a JSON file downloaded over HTTP, then retrieves the location (latitude and longitude) of each zone. After that, we explore venues within each zone through the Foursquare API.</p>
<p>One-hot encoding is used to find the most common venue categories in each zone. Finally, the collected data is used to visualize coffee shops and cafes across the zones of Edinburgh to give insights to potential investors.</p>
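<p>As a rough illustration of the one-hot step described above, the sketch below uses a tiny made-up venue table (the zone names and categories are illustrative, not taken from the Foursquare results). It encodes each venue category as a 0/1 column, averages those columns per zone to get category frequencies, and picks the most frequent category in each zone.</p>

```python
import pandas as pd

# Toy venue table: illustrative values only, not real Foursquare data
venues = pd.DataFrame({
    'Neighborhood': ['Zone A', 'Zone A', 'Zone A', 'Zone B', 'Zone B', 'Zone B'],
    'Venue Category': ['Coffee Shop', 'Coffee Shop', 'Pub', 'Park', 'Park', 'Café'],
})

# One 0/1 column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of a zone's venues in that category
freq = onehot.groupby('Neighborhood').mean()

# Most frequent category per zone
top_category = freq.idxmax(axis=1)

print(freq.round(2))
print(top_category)
```

<p>Averaging rather than summing makes zones with very different venue counts comparable, which is why the notebook groups with <code>mean()</code> before ranking categories.</p>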
<h2>The following cell contains all the necessary Python libraries.</h2>

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

! pip install geocoder
import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

! pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


<h2>We extract the zone names of Edinburgh from a <a href="https://raw.githubusercontent.com/martinjc/UK-GeoJSON/master/json/statistical/sco/idz_by_lad/S12000036.json">JSON file</a>.</h2>

In [4]:
url = 'https://raw.githubusercontent.com/martinjc/UK-GeoJSON/master/json/statistical/sco/idz_by_lad/S12000036.json'

# download the zone file and parse it as JSON
data = requests.get(url).json()

# each feature's 'properties' dict holds the zone code and name
df = pd.DataFrame(data["features"])
liste = df['properties'].values.tolist()
liste

[{'IZ_CODE': 'S02000330',
  'IZ_NAME': 'Balerno',
  'STDAREA_HA': 4810.563479,
  'Shape_Leng': 46774.0138662,
  'Shape_Area': 48096010.4342},
 {'IZ_CODE': 'S02000331',
  'IZ_NAME': 'Bonaly and Pentlands',
  'STDAREA_HA': 2016.463533,
  'Shape_Leng': 27083.3016086,
  'Shape_Area': 20153941.918},
 {'IZ_CODE': 'S02000332',
  'IZ_NAME': 'South East Bypass',
  'STDAREA_HA': 648.52097,
  'Shape_Leng': 20234.4503264,
  'Shape_Area': 6479605.99718},
 {'IZ_CODE': 'S02000333',
  'IZ_NAME': 'Gracemount, Southouse and Burdiehouse',
  'STDAREA_HA': 115.75742,
  'Shape_Leng': 7044.62009187,
  'Shape_Area': 1121372.01068},
 {'IZ_CODE': 'S02000334',
  'IZ_NAME': 'Mortonhall',
  'STDAREA_HA': 171.284239,
  'Shape_Leng': 7188.59443301,
  'Shape_Area': 1706759.85172},
 {'IZ_CODE': 'S02000335',
  'IZ_NAME': 'Fairmilehead',
  'STDAREA_HA': 248.243334,
  'Shape_Leng': 11720.7395023,
  'Shape_Area': 2456465.12855},
 {'IZ_CODE': 'S02000336',
  'IZ_NAME': 'Comiston and Swanston',
  'STDAREA_HA': 131.841032,
  

In [5]:
areas = []
for dic in liste:
    areas.append(dic['IZ_NAME'])
print(areas)

['Balerno', 'Bonaly and Pentlands', 'South East Bypass', 'Gracemount, Southouse and Burdiehouse', 'Mortonhall', 'Fairmilehead', 'Comiston and Swanston', 'Currie East', 'Hyvots and Gilmerton Dykes', 'Currie West', 'Baberton and Juniper Green', 'Oxgangs and Firrhill', 'Liberton East', 'Ferniehill,  South Moredun and Craigour', 'Colinton and Kingsknowe', 'Moredun', 'Liberton West', 'Braids', 'Clovenstone and Drumbryden', 'The Inch', 'Calders', 'Craiglockhart', 'Craighouse and South Morningside', 'Parkhead', 'Greendykes and Niddrie Mains', 'Blackford', 'Longstone and Saughton Mains', 'Morningside', 'Broomhouse and Sighthill', 'Hutchison and Moat', 'Merchiston and Greenhill', 'Grange', 'Niddrie', 'South Gyle', 'Prestonfield', 'Shandon', 'Forrester Park and Broomhall', 'Stenhouse', 'Jewel, Brunstane and Newcraighall', 'Ratho, Gogarburn and Airport', 'Dalkeith Rd', 'Marchmont West', 'Carrick Knowe', 'Gorgie West', 'Marchmont East and Sciennes', 'Polwarth', 'Bruntsfield', 'Gorgie East', 'Bingh
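
<p>The extraction above can be tried offline with a small stand-in for the downloaded file. The sketch below assumes the file is shaped like a GeoJSON feature collection whose features carry the zone code and name under <code>properties</code>. The two entries reuse the first two zones from the output above, and the geometry is left out for brevity.</p>

```python
import json

# Minimal stand-in for the downloaded file (geometry omitted, structure assumed)
sample = json.loads('''
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"IZ_CODE": "S02000330", "IZ_NAME": "Balerno"},
     "geometry": null},
    {"type": "Feature",
     "properties": {"IZ_CODE": "S02000331", "IZ_NAME": "Bonaly and Pentlands"},
     "geometry": null}
  ]
}
''')

# Collect the zone names in file order
zone_names = [feature['properties']['IZ_NAME'] for feature in sample['features']]

# Map each zone code to its name, handy for later lookups
code_to_name = {f['properties']['IZ_CODE']: f['properties']['IZ_NAME']
                for f in sample['features']}

print(zone_names)
```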

<h2>We start populating our dataframe with the zone names (Areas).</h2>

In [6]:
df = pd.DataFrame(areas, columns =['Areas'])
df

Unnamed: 0,Areas
0,Balerno
1,Bonaly and Pentlands
2,South East Bypass
3,"Gracemount, Southouse and Burdiehouse"
4,Mortonhall
5,Fairmilehead
6,Comiston and Swanston
7,Currie East
8,Hyvots and Gilmerton Dykes
9,Currie West


<h2>Let's get the location data of our areas with the geocoder library.</h2>

In [7]:
def getLatLong(row, max_retries=5):
    """Geocode a zone name with ArcGIS and return [latitude, longitude]."""
    lat_lng_coords = None
    search_query = '{}, Edinburgh,UK'.format(row)
    # retry a bounded number of times instead of looping forever
    for _ in range(max_retries):
        g = geocoder.arcgis(search_query)
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break

    if lat_lng_coords is None:
        # geocoding failed: fall back to a sentinel value
        print('BACKUP')
        return [0.0, 0.0]

    latitude, longitude = lat_lng_coords
    print(latitude, longitude)
    return [latitude, longitude]

In [8]:
coords_list = df['Areas'].apply(getLatLong).tolist()

55.95333950100007 -3.1891068959999416
55.90368544700004 -3.264008652999962
55.93017503662509 -3.1308299317633868
55.89559002588297 -3.1616199310690263
55.899734168000066 -3.17582131599994
55.90115642200004 -3.2034376379999685
55.907725175768114 -3.209938005970912
55.925299208955934 -3.3126655982126163
55.90369004565463 -3.139319877670772
55.91259000000008 -3.3145599999999718
55.90873674800008 -3.272587567999949
55.911240044939916 -3.2313499055948682
55.91481935200005 -3.1617513729999587
55.91358006275891 -3.137239908397788
55.91595046200007 -3.26402388799994
55.913701228000036 -3.136184081999943
55.91481935200005 -3.1617513729999587
55.91742736300006 -3.2036223029999746
55.91655310226167 -3.278743846031478
55.91897906500003 -3.150323404999938
55.92439000000007 -3.2778299999999376
55.92182941400006 -3.2381469619999734
55.9220700505758 -3.2232299368936594
55.92316174600006 -3.2713961419999578
55.9322900212986 -3.1300099301757314
55.92635099100005 -3.186839066999937
55.927310039255445 -3.
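
<p>Geocoders sometimes fall back to a city-level match when they cannot resolve a zone name, so it may be worth checking the returned coordinates before using them. The sketch below is one possible sanity check, not part of the original analysis. It flags zones whose coordinates lie within an assumed 100 m of the city-centre coordinates that Nominatim returns for "Edinburgh" further down in this notebook. The three zones are the first three from the output above, and the threshold is an arbitrary choice.</p>

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# City-centre coordinates printed for 'Edinburgh' later in the notebook
CITY_CENTRE = (55.9533456, -3.1883749)
THRESHOLD_M = 100  # assumed cut-off; tune as needed

# First three geocoded zones from the output above
geocoded = {
    'Balerno': (55.95333950100007, -3.1891068959999416),
    'Bonaly and Pentlands': (55.90368544700004, -3.264008652999962),
    'South East Bypass': (55.93017503662509, -3.1308299317633868),
}

# Zones whose coordinates are suspiciously close to the city centre
suspicious = [name for name, (lat, lng) in geocoded.items()
              if haversine_m(lat, lng, *CITY_CENTRE) < THRESHOLD_M]

print(suspicious)
```

<p>A flagged zone is not necessarily wrong, since a zone can genuinely sit in the city centre, but it is a good candidate for a manual check or a second geocoding provider.</p>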

<h2>We add location data to our dataframe.</h2>

In [9]:
df[['Latitude','Longitude']]=pd.DataFrame(coords_list,columns=['Latitude', 'Longitude'])
df.head()

Unnamed: 0,Areas,Latitude,Longitude
0,Balerno,55.95334,-3.189107
1,Bonaly and Pentlands,55.903685,-3.264009
2,South East Bypass,55.930175,-3.13083
3,"Gracemount, Southouse and Burdiehouse",55.89559,-3.16162
4,Mortonhall,55.899734,-3.175821


<h2>Let's check our zones on the map.</h2>

In [10]:
address = 'Edinburgh'

geolocator = Nominatim(user_agent="ldn_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Edinburgh are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Edinburgh are 55.9533456, -3.1883749.


In [11]:
import folium

map_edinburgh = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, Areas in zip(df['Latitude'], df['Longitude'], df['Areas']):
    label = '{}'.format(Areas)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_edinburgh)  
    
map_edinburgh

<h2>Let's get the venues close to our zones through the Foursquare API.</h2>

In [12]:
# The code was removed by Watson Studio for sharing.

In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=1500, limit=200):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
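
<p>The flattening step inside <code>getNearbyVenues</code> can be exercised without calling the API. The sketch below uses a hand-written payload that mirrors only the fields the function reads (<code>response.groups[0].items[].venue</code>). The helper name <code>extract_venues</code> and the second, uncategorised venue are hypothetical. Unlike the original list comprehension, the helper tolerates a venue with an empty category list instead of raising an <code>IndexError</code>.</p>

```python
def extract_venues(payload, zone_name):
    """Flatten one explore-style response into (zone, venue, lat, lng, category) rows."""
    rows = []
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue has no category
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((zone_name,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written payload mirroring the fields read by getNearbyVenues
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'The Milkman',
                   'location': {'lat': 55.95065, 'lng': -3.19101},
                   'categories': [{'name': 'Coffee Shop'}]}},
        {'venue': {'name': 'Uncategorised Place',  # hypothetical venue
                   'location': {'lat': 55.95, 'lng': -3.19},
                   'categories': []}},
    ]}]}
}

rows = extract_venues(sample_payload, 'Balerno')
for row in rows:
    print(row)
```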

In [14]:
edi_venues = getNearbyVenues(names=df['Areas'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude'])

Balerno
Bonaly and Pentlands
South East Bypass
Gracemount, Southouse and Burdiehouse
Mortonhall
Fairmilehead
Comiston and Swanston
Currie East
Hyvots and Gilmerton Dykes
Currie West
Baberton and Juniper Green
Oxgangs and Firrhill
Liberton East
Ferniehill,  South Moredun and Craigour
Colinton and Kingsknowe
Moredun
Liberton West
Braids
Clovenstone and Drumbryden
The Inch
Calders
Craiglockhart
Craighouse and South Morningside
Parkhead
Greendykes and Niddrie Mains
Blackford
Longstone and Saughton Mains
Morningside
Broomhouse and Sighthill
Hutchison and Moat
Merchiston and Greenhill
Grange
Niddrie
South Gyle
Prestonfield
Shandon
Forrester Park and Broomhall
Stenhouse
Jewel, Brunstane and Newcraighall
Ratho, Gogarburn and Airport
Dalkeith Rd
Marchmont West
Carrick Knowe
Gorgie West
Marchmont East and Sciennes
Polwarth
Bruntsfield
Gorgie East
Bingham, Magdalene and the Christians
Meadows
Balgreen and Roseburn
Dalry and Fountainbridge
Willowbrae and Duddingston Village
Corstorphine
Duddingsto

<h2>Since some of our zones are close to each other, the same venue may be returned more than once. Let's find and remove the duplicates.</h2>

In [15]:
edi_venues_clean = edi_venues.drop_duplicates(subset=['Venue','Venue Latitude','Venue Longitude'])

In [16]:
print(edi_venues.shape)
print(edi_venues_clean.shape)

(5129, 7)
(1208, 7)
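
<p>One side effect of this step is worth keeping in mind: <code>drop_duplicates</code> keeps the first occurrence by default, so a venue that lies within the search radius of several zones stays attributed to whichever zone appears first in the dataframe. The toy example below (made-up zones and venues) shows the effect on per-zone counts.</p>

```python
import pandas as pd

# Toy data: 'Shared Cafe' lies within the search radius of both zones
venues = pd.DataFrame({
    'Neighborhood': ['Zone A', 'Zone A', 'Zone B', 'Zone B'],
    'Venue': ['Shared Cafe', 'Cafe One', 'Shared Cafe', 'Cafe Two'],
    'Venue Latitude': [55.95, 55.96, 55.95, 55.94],
    'Venue Longitude': [-3.19, -3.20, -3.19, -3.18],
})

# keep='first' is the default: the first zone in row order keeps the shared venue
clean = venues.drop_duplicates(subset=['Venue', 'Venue Latitude', 'Venue Longitude'])

# Per-zone counts after de-duplication
counts = clean.groupby('Neighborhood')['Venue'].count()
print(counts)
```

<p>Here Zone A ends up with two venues and Zone B with one, although both zones are equally close to the shared cafe. Assigning each venue to its nearest zone centre would be one alternative, at the cost of an extra distance computation.</p>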


<h2>Checking the number of venues for each zone.</h2>

In [17]:
edi_venues_clean.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
"Abbeyhill, Meadowbank and Marionville",34,34,34,34,34,34
Baberton and Juniper Green,7,7,7,7,7,7
Balerno,100,100,100,100,100,100
Balgreen and Roseburn,23,23,23,23,23,23
Barnton and Cammo,7,7,7,7,7,7
"Bingham, Magdalene and the Christians",26,26,26,26,26,26
Blackford,26,26,26,26,26,26
Blackhall,5,5,5,5,5,5
Bonaly and Pentlands,4,4,4,4,4,4
Bonnington and Pilrig,2,2,2,2,2,2


<h2>Time to explore the most common venues of each zone.</h2>

In [18]:
# one hot encoding
edi_onehot = pd.get_dummies(edi_venues_clean[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
edi_onehot['Neighborhood'] = edi_venues_clean['Neighborhood'] 

# move neighborhood column to the first column
cols=list(edi_onehot.columns.values)
cols.pop(cols.index('Neighborhood'))
edi_onehot=edi_onehot[['Neighborhood']+cols]

edi_onehot.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Aquarium,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bike Shop,Bike Trail,Bistro,Boat or Ferry,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridge,Burger Joint,Burrito Place,Bus Stop,Butcher,Cafeteria,Café,Campground,Canal,Candy Store,Casino,Castle,Cheese Shop,Chinese Restaurant,Chocolate Shop,Church,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Curling Ice,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Doner Restaurant,Donut Shop,Eastern European Restaurant,Electronics Store,English Restaurant,Escape Room,Event Service,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Food & Drink Shop,Food Court,Food Truck,Forest,Fountain,French Restaurant,Furniture / Home Store,Garden,Gas Station,Gastropub,Gay Bar,Gelato Shop,Gift Shop,Go Kart Track,Golf Course,Golf Driving Range,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hardware Store,Hill,Historic Site,Home Service,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Lake,Latin American Restaurant,Lawyer,Lebanese Restaurant,Light Rail Station,Liquor Store,Malay Restaurant,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Mini Golf,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Motorcycle Shop,Mountain,Movie Theater,Multiplex,Museum,Nature Preserve,Newsagent,Optical Shop,Organic Grocery,Outdoor Sculpture,Outdoor Supply Store,Outlet Store,Palace,Paper / Office Supplies Store,Park,Parking,Performing Arts Venue,Pet Store,Pharmacy,Piano Bar,Pier,Pizza 
Place,Planetarium,Platform,Playground,Plaza,Pool,Pool Hall,Portuguese Restaurant,Pub,Ramen Restaurant,Recreation Center,Rental Car Location,Reservoir,Rest Area,Restaurant,Road,Rugby Pitch,Rugby Stadium,Sandwich Place,Scenic Lookout,School,Science Museum,Scottish Restaurant,Sculpture Garden,Seafood Restaurant,Shopping Mall,Shopping Plaza,Skate Park,Skating Rink,Smoothie Shop,Soccer Field,Soccer Stadium,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tram Station,Turkish Restaurant,Vegetarian / Vegan Restaurant,Veterinarian,Video Game Store,Vietnamese Restaurant,Warehouse Store,Waterfall,Waterfront,Well,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Yoga Studio,Zoo,Zoo Exhibit
0,Balerno,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Balerno,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Balerno,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Balerno,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Balerno,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [19]:
edi_grouped = edi_onehot.groupby('Neighborhood').mean().reset_index()

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [21]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = edi_grouped['Neighborhood']

for ind in np.arange(edi_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(edi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Abbeyhill, Meadowbank and Marionville",Hotel,French Restaurant,Bar,Restaurant,Pizza Place,Mexican Restaurant,Escape Room,Scottish Restaurant,Sculpture Garden,Liquor Store
1,Baberton and Juniper Green,Grocery Store,Home Service,Shopping Mall,Discount Store,Train Station,Multiplex,Supermarket,Food & Drink Shop,Food Court,Food Truck
2,Balerno,Pub,Bar,Hotel,Café,Coffee Shop,Cocktail Bar,Bakery,Museum,Theater,Historic Site
3,Balgreen and Roseburn,Hotel,Coffee Shop,Indian Restaurant,Art Gallery,Café,Bar,Japanese Restaurant,French Restaurant,Pharmacy,Pool Hall
4,Barnton and Cammo,Park,Steakhouse,Diner,Bus Stop,Café,Pharmacy,Zoo Exhibit,Farmers Market,Forest,Food Truck


<h2>Now we have a general picture of Edinburgh. Let's filter the venues and keep only the coffee shops and cafes of each zone.</h2>

In [22]:
# Filter the cleaned venue data down to coffee shops and cafes:
# keep only rows whose category is 'Coffee Shop' or 'Café'.
coffee_categories = ['Coffee Shop', 'Café']
cs_data = edi_venues_clean.loc[edi_venues_clean['Venue Category'].isin(coffee_categories)].copy()
cs_data.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
4,Balerno,55.95334,-3.189107,The Milkman,55.95065,-3.19101,Coffee Shop
5,Balerno,55.95334,-3.189107,Fortitude,55.955844,-3.192544,Café
17,Balerno,55.95334,-3.189107,The Edinburgh Larder,55.95008,-3.186088,Café
24,Balerno,55.95334,-3.189107,Artisan Roast,55.957839,-3.189027,Coffee Shop
26,Balerno,55.95334,-3.189107,Lowdown,55.953386,-3.197936,Café


<h2>Let's see those coffee shops and cafes on the map.</h2>

In [23]:
import folium

map_edinburgh_coffee = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, Venue in zip(cs_data['Venue Latitude'], cs_data['Venue Longitude'], cs_data['Venue']):
    label = '{}'.format(Venue)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_edinburgh_coffee)  
    
map_edinburgh_coffee

<h2>Heatmap for spotting potential customer areas.</h2>

In [24]:
# Heatmap for spotting potential customer areas.
from folium.plugins import HeatMap

# Make the lists of latitudes and longitudes
latt = cs_data['Venue Latitude'].tolist()
lngt = cs_data['Venue Longitude'].tolist()

# Create the map (named heat_map to avoid shadowing the built-in map())
heat_map = folium.Map(
    location=[latitude, longitude],
    tiles='cartodbdark_matter',
    zoom_start=12
)
HeatMap(list(zip(latt, lngt))).add_to(heat_map)
heat_map

<h2>Let's create a proper dataframe to be used in choropleth map.</h2>

In [25]:
# Number of coffee shops and cafes per zone.
cs_data.groupby('Neighborhood').count().reset_index()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Balerno,9,9,9,9,9,9
1,Balgreen and Roseburn,5,5,5,5,5,5
2,Barnton and Cammo,1,1,1,1,1,1
3,"Bingham, Magdalene and the Christians",2,2,2,2,2,2
4,Blackford,3,3,3,3,3,3
5,Blackhall,1,1,1,1,1,1
6,Braids,4,4,4,4,4,4
7,Bughtlin and Parkgrove,1,1,1,1,1,1
8,Clerwood and Corstorphine (Hillview),1,1,1,1,1,1
9,Clovenstone and Drumbryden,1,1,1,1,1,1


In [26]:
# zone boundaries for the choropleth (despite its name, world_geo covers Edinburgh only)
world_geo = requests.get("https://raw.githubusercontent.com/martinjc/UK-GeoJSON/master/json/statistical/sco/idz_by_lad/S12000036.json").json()
map_data = cs_data.groupby('Neighborhood').count().reset_index()
# create a base map centred on Edinburgh
edinburgh_Coffee_map = folium.Map(location=[latitude, longitude], zoom_start=11)

In [29]:
# keep only the zone name and its coffee shop / cafe count
map_data2 = map_data.drop(['Neighborhood Latitude', 'Neighborhood Longitude','Venue Latitude','Venue Longitude','Venue Category'], axis=1)
# left-join the counts onto the full zone list; zones without a match get NaN
choro = df.join(map_data2.set_index('Neighborhood'), on='Areas')
# zones with no coffee shop or cafe get a count of 0
choro = choro.fillna(0)
choro.sort_values(by=['Venue'],ascending=False)

Unnamed: 0,Areas,Latitude,Longitude,Venue
74,Stockbridge,55.958779,-3.210016,12.0
0,Balerno,55.95334,-3.189107,9.0
30,Merchiston and Greenhill,55.931749,-3.221226,6.0
27,Morningside,55.923009,-3.214501,5.0
79,Lorne,55.965973,-3.172764,5.0
31,Grange,55.933664,-3.193482,5.0
50,Balgreen and Roseburn,55.942681,-3.237693,5.0
77,Craigleith,55.964587,-3.233234,4.0
17,Braids,55.917427,-3.203622,4.0
57,Southside and Canongate,55.951652,-3.178614,4.0


<h2>Our dataframe is ready. Time to create the choropleth map.</h2>

In [40]:
# create the base map
edinburgh_Coffee_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# generate choropleth map 
choropleth = folium.Choropleth(
    geo_data=world_geo,
    data=choro,
    columns=['Areas', 'Venue'],
    key_on='feature.properties.IZ_NAME',
    fill_color='YlOrRd', 
    fill_opacity=0.7, 
    line_opacity=0.2,
    legend_name='Coffee Shops and Cafes in Edinburgh',
    highlight=True,
    smooth_factor=0).add_to(edinburgh_Coffee_map)

# add tooltips showing the name of each zone
tooltip_style = "font-size: 15px; font-weight: bold"
choropleth.geojson.add_child(
    folium.features.GeoJsonTooltip(['IZ_NAME'], style=tooltip_style, labels=False))

# create a layer control
folium.LayerControl().add_to(edinburgh_Coffee_map)

# display map
edinburgh_Coffee_map
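
<p>The choropleth links each row of the dataframe to a shape by comparing the zone name with the <code>IZ_NAME</code> property, so names that differ even slightly (extra spaces, spelling) will not line up. The helper below is a small, hypothetical pre-flight check, not part of the original notebook, that lists data names with no matching shape. The GeoJSON-like dictionary is a toy with the same property layout as the zone file.</p>

```python
def unmatched_keys(geojson, names, property_name='IZ_NAME'):
    """Return the names that have no feature with a matching property value."""
    available = {f['properties'][property_name] for f in geojson['features']}
    return sorted(set(names) - available)

# Toy GeoJSON-like dict with the same property layout as the zone file
toy_geo = {'features': [
    {'properties': {'IZ_NAME': 'Balerno'}},
    {'properties': {'IZ_NAME': 'Stockbridge'}},
]}

# 'Stock bridge' is a deliberately misspelled, hypothetical entry
missing = unmatched_keys(toy_geo, ['Balerno', 'Stock bridge'])
print(missing)
```

<p>In this notebook the names come from the same file as the shapes, so the check should return an empty list. It becomes more useful when the counts come from a different source than the boundaries.</p>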

## Results and Discussion <a name="results"></a>

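<p>Exploratory data analysis and visualization have provided some insight into the distribution of coffee-related businesses in Edinburgh. Stockbridge, Balerno and Merchiston &amp; Greenhill are the zones with the most coffee shops and cafes, with 12, 9 and 6 businesses respectively. As the choropleth map shows, the area around the Old Town remains the hottest spot for Edinburgh's growing coffee businesses. Balerno appears as a strong outlier on the map, but this result should be treated with caution: the coordinates returned for Balerno by the geocoder (55.9533, -3.1891) almost coincide with the city-centre coordinates used to centre the maps (55.9533, -3.1884), so the venues counted for Balerno are probably city-centre venues. The zone should be re-geocoded before any conclusion is drawn about it.</p>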

## Conclusion <a name="conclusion"></a>

<p>The purpose of this project was to identify the areas of Edinburgh with a high number of coffee shops and cafes in order to spot potential customers for Karabal, a company that specializes in third-wave coffee equipment, roasting techniques and the distribution of coffee beans from around the world. By counting coffee shops and cafes per zone from Foursquare data, we first obtained a general picture of the venue distribution and then identified the main zones of interest.</p>
<p>The final investment decision will be made by stakeholders based on the specific characteristics of each recommended zone and of the locations within it, taking into account additional factors such as the presence of chains like Starbucks and Costa.</p>