# Opening a new coffee shop in the City of Sydney, Australia

##  Introduction/Business Problem

Opening a coffee shop in the City of Sydney is a challenging but potentially rewarding journey. Have you heard that Sydney is one of the cities with the best coffee? Personally, I agree (Melbourne's and Auckland's are also good). It is a competitive business environment: customers look not only for the best cup of coffee but also for a convenient location, with good customer service as the foundation of everything. This is a preliminary analysis for experienced coffee shop owners looking to expand their business into the City of Sydney.

## Overview of how data will be collected and examined

First, the locations, coordinates, and ratings of all the coffee shops in the City of Sydney will be gathered via the Foursquare API (https://developer.foursquare.com/). These coffee shops will then be split into 10 areas (village groups) according to the GeoJSON file from the City of Sydney LGA (local government area, https://data.cityofsydney.nsw.gov.au/). With the total area and total employment of each village known from the same data source, the ratio of coffee shops to employment will be plotted to visually narrow down potential villages.
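As a rough illustration of the ratio described above, the sketch below computes coffee shops per 1,000 jobs for each village. The village names and every figure are made-up placeholders, not City of Sydney data, and the function name is hypothetical.

```python
def shops_per_thousand_jobs(shop_counts, employment):
    """Return {village: coffee shops per 1,000 jobs} for villages present in both inputs."""
    ratios = {}
    for village, jobs in employment.items():
        if village in shop_counts and jobs > 0:
            ratios[village] = 1000 * shop_counts[village] / jobs
    return ratios


# Placeholder inputs (NOT real data): shop counts and total employment per village.
example_counts = {"Village A": 30, "Village B": 12}
example_jobs = {"Village A": 60000, "Village B": 8000}

ratios = shops_per_thousand_jobs(example_counts, example_jobs)

# Villages with the lowest ratio have the fewest shops relative to local jobs.
for village, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(village, round(ratio, 2))
```

A low ratio only suggests a village is worth a closer look; it says nothing yet about foot traffic or the quality of the existing shops.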

To leverage the data from Foursquare, the locations of train stations and bus stops will also be identified to estimate the areas with the best pedestrian traffic in the potential villages. The ratings of existing coffee shops will be used as a guide for the final recommendation.

## Step by step plan on data collection and manipulation

1) Gather coffee shop data from Foursquare, using the search endpoint with limit and query inputs (Applied Data Science Capstone, week 2).
2) Clean up the data into a dataframe that contains the venue name, unique ID, coordinates, and category (Applied Data Science Capstone, week 2).
3) Gather GeoJSON data from the API or a file from the City of Sydney website, and display it using Folium (Data Visualization with Python, week 3).
4) Allocate coffee shops to each village (point in polygon, online self-study).
5) Count the number of coffee shops in each village. Gather the employment and resident population figures for each village.
6) Examine the data visually and decide which kind of regression best describes the relationship when employment and resident population are used to predict the number of coffee shops in each village.
7) Identify villages (1-3) with fewer coffee shops than the value predicted by the regression.
8) Utilise premium calls to get the average ratings of the coffee shops in these village(s).
9) Find out which village has the coffee shops with the lowest average rating.
10) Report additional findings on the data.
11) Conclude the preliminary study.
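Steps 4 to 7 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the polygons, shop coordinates, village names, and employment figures are invented placeholders; the helper names are hypothetical; and the regression uses a single predictor for brevity, whereas the plan above considers both employment and resident population.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lng, lat) vertices, as in a GeoJSON ring."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def count_shops_per_village(shops, villages):
    """Steps 4-5: assign each (lng, lat) shop to the first village containing it and count."""
    counts = {name: 0 for name in villages}
    for lng, lat in shops:
        for name, ring in villages.items():
            if point_in_polygon(lng, lat, ring):
                counts[name] += 1
                break
    return counts


def fit_line(xs, ys):
    """Step 6 (simplified): ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def under_served(names, xs, ys):
    """Step 7: villages whose observed count is below the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [name for name, x, y in zip(names, xs, ys) if y < slope * x + intercept]


# Placeholder geometry (NOT real village boundaries): two adjacent unit squares.
villages = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
shops = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5)]
counts = count_shops_per_village(shops, villages)

# Placeholder figures (NOT real data): employment in thousands vs. number of shops.
names = ["Village A", "Village B", "Village C", "Village D"]
employment = [10, 20, 30, 40]
shop_counts = [5, 12, 13, 22]
candidates = under_served(names, employment, shop_counts)
```

Real village boundaries may be multi-polygons or contain holes, which this ray-casting helper does not handle; a geometry library would be the safer choice for the actual analysis.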

### Step 1, collect data from Foursquare

In [2]:
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# json_normalize flattens nested JSON into a pandas dataframe
try:
    from pandas import json_normalize # pandas 1.0 and later
except ImportError:
    from pandas.io.json import json_normalize # older pandas versions

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library


In [4]:
# On Google Maps, Prince Alfred Park appears to sit roughly in the middle of the City of Sydney LGA. Let's find its coordinates.
# this is the address of Prince Alfred Park:
venue_PA = 'Prince Alfred Park'
address_PA = '207 Cleveland St, Surry Hills NSW 2016, Australia'

geolocator_PA = Nominatim(user_agent="foursquare_agent")
location_PA = geolocator_PA.geocode(address_PA) # use the geolocator and address defined above
latitude_PA = location_PA.latitude
longitude_PA = location_PA.longitude
print('This is the central of the City of Sydney LGA:')
print(venue_PA)
print('address:',address_PA)
print('geo locator:',geolocator_PA)
print('location:',location_PA)
print(latitude_PA, longitude_PA)

This is the central of the City of Sydney LGA
Prince Alfred Park
address: 207 Cleveland St, Surry Hills NSW 2016, Australia
geo locator: <geopy.geocoders.osm.Nominatim object at 0x7fb9b684c198>
location: Cleveland Street, Strawberry Hills, Redfern, Gosford, Council of the City of Sydney, New South Wales, 2016, Australia
-33.8896267 151.2057093
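The geocoder resolved the address to a street-level match rather than to the park itself, and geopy's `geocode` returns `None` when it finds nothing at all. A small guard like the one below makes that failure explicit. The helper name is hypothetical, and the stand-in geocoder exists only so the sketch runs without a network call; in the notebook the first argument would be `geolocator_PA.geocode`.

```python
from types import SimpleNamespace


def coordinates_or_error(geocode, address):
    """Call `geocode(address)` and return (latitude, longitude), or raise if there is no match."""
    location = geocode(address)
    if location is None:
        raise ValueError("No geocoding result for address: " + address)
    return location.latitude, location.longitude


# Stand-in geocoder for illustration only (an assumption, not part of geopy).
def fake_geocode(address):
    known = {
        "207 Cleveland St, Surry Hills NSW 2016, Australia": SimpleNamespace(
            latitude=-33.8896267, longitude=151.2057093
        )
    }
    return known.get(address)


latitude, longitude = coordinates_or_error(
    fake_geocode, "207 Cleveland St, Surry Hills NSW 2016, Australia"
)
```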


In [5]:
# let's define the Foursquare credentials and the remaining API parameters
CLIENT_ID = 'BESBPEDHNTKUY32TFLTWUP3HLLAG45FNSH4DB1PQI2KGPJP2' # your Foursquare ID
CLIENT_SECRET = 'MMB0U1FGJMTCDWEHNOI1YSWMKCGT1E0RBRH3MBBFKNEXLO2M' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 500

# let's search within a 5 km radius of Prince Alfred Park, which covers the whole City of Sydney area.
search_query = 'coffee'
radius = 5000

In [6]:
# Let's build the URL for the search request.
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(
    CLIENT_ID,
    CLIENT_SECRET,
    latitude_PA,
    longitude_PA,
    VERSION,
    search_query,
    radius,
    LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=BESBPEDHNTKUY32TFLTWUP3HLLAG45FNSH4DB1PQI2KGPJP2&client_secret=MMB0U1FGJMTCDWEHNOI1YSWMKCGT1E0RBRH3MBBFKNEXLO2M&ll=-33.8896267,151.2057093&v=20180604&query=coffee&radius=5000&limit=500'

In [7]:
results_all_coffee = requests.get(url).json()
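One possible variation, sketched here rather than taken from the notebook, is to keep the credentials out of the code and let the HTTP library assemble the query string. The environment variable names and the helper name are assumptions made for this example.

```python
import os
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.foursquare.com/v2/venues/search"


def build_search_params(lat, lng, query, radius, limit, version="20180604"):
    """Collect the query parameters for the venue search in one dictionary."""
    return {
        # Hypothetical environment variable names; the fallbacks are placeholders.
        "client_id": os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
        "client_secret": os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }


params = build_search_params(-33.8896267, 151.2057093, "coffee", 5000, 500)

# Only for inspecting the request; note that the credentials appear in this string.
preview_url = SEARCH_ENDPOINT + "?" + urlencode(params)

# With the requests library, the same dictionary can be passed directly:
#   response = requests.get(SEARCH_ENDPOINT, params=params, timeout=30)
#   response.raise_for_status()  # fail loudly on HTTP errors
#   results_all_coffee = response.json()
```

Passing `params` also takes care of URL-encoding, so a query containing spaces or special characters would not need manual escaping.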

In [9]:
# Get relevant part of JSON and transform it into a *pandas* dataframe
# assign relevant part of JSON to venues
venues_all_coffee = results_all_coffee['response']['venues']

# transform venues into a dataframe
dataframe_all_coffee = json_normalize(venues_all_coffee)
dataframe_all_coffee.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.crossStreet,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId,venuePage.id
0,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",False,4da92642a86e771ea704da83,93B Redfern St,AU,Redfern,Australia,,380,"[93B Redfern St, Redfern NSW 2016, Australia]","[{'label': 'display', 'lat': -33.8927595184862...",-33.89276,151.204072,,2016.0,NSW,Coffee Tea & Me,v-1589170021,
1,"[{'id': '4bf58dd8d48988d1e0931735', 'name': 'C...",False,4b737261f964a520f1ae2de3,Chalmers St.,AU,Surry Hills,Australia,Devonshire St.,463,"[Chalmers St. (Devonshire St.), Surry Hills NS...","[{'label': 'display', 'lat': -33.8855716578249...",-33.885572,151.206862,,2010.0,NSW,Campos Coffee Cart,v-1589170021,
2,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,4ff8ebc6e4b0aeae9f387ae4,,AU,,Australia,,162,[Australia],"[{'label': 'display', 'lat': -33.8889111224469...",-33.888911,151.207242,,,,Belvoir Crave Coffee Bar,v-1589170021,
3,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,4eee9991a69d8afe6c57a76e,61 Albion St.,AU,Surry Hills,Australia,btwn Commonwealth & Belmore,898,"[61 Albion St. (btwn Commonwealth & Belmore), ...","[{'label': 'display', 'lat': -33.8828664315345...",-33.882866,151.211014,,2010.0,NSW,Reuben Hills Coffee Roastery & Café,v-1589170021,
4,"[{'id': '4bf58dd8d48988d16d941735', 'name': 'C...",False,4b0c5685f964a520703b23e3,5/2 Lee St.,AU,Sydney,Australia,,657,"[5/2 Lee St., Sydney NSW 2000, Australia]","[{'label': 'display', 'lat': -33.8839158111476...",-33.883916,151.203888,,2000.0,NSW,Coffee Trails,v-1589170021,35970513.0
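The dotted column names such as `location.lat` come from flattening the nested venue records. The pure-Python function below imitates that idea on a trimmed copy of the first record. It is a simplified stand-in written for illustration, not pandas' implementation, and the function name is hypothetical.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into one level, joining keys with `sep`. Lists are kept as-is."""
    flat = {}
    for key, value in record.items():
        full_key = parent_key + sep + key if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# A trimmed version of the first venue shown above (most fields omitted).
venue = {
    "id": "4da92642a86e771ea704da83",
    "name": "Coffee Tea & Me",
    "location": {"city": "Redfern", "lat": -33.89276, "lng": 151.204072},
}

flat_venue = flatten(venue)
print(sorted(flat_venue))
```

Because lists are left untouched, a field like `categories` stays a list of dictionaries, which is why a separate step is needed later to pull out the category name.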


In [15]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe_all_coffee.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe_all_coffee.loc[:, filtered_columns].copy() # copy so the column assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered.head()

Unnamed: 0,name,categories,address,cc,city,country,crossStreet,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Coffee Tea & Me,Coffee Shop,93B Redfern St,AU,Redfern,Australia,,380,"[93B Redfern St, Redfern NSW 2016, Australia]","[{'label': 'display', 'lat': -33.8927595184862...",-33.89276,151.204072,,2016.0,NSW,4da92642a86e771ea704da83
1,Campos Coffee Cart,Coffee Shop,Chalmers St.,AU,Surry Hills,Australia,Devonshire St.,463,"[Chalmers St. (Devonshire St.), Surry Hills NS...","[{'label': 'display', 'lat': -33.8855716578249...",-33.885572,151.206862,,2010.0,NSW,4b737261f964a520f1ae2de3
2,Belvoir Crave Coffee Bar,Café,,AU,,Australia,,162,[Australia],"[{'label': 'display', 'lat': -33.8889111224469...",-33.888911,151.207242,,,,4ff8ebc6e4b0aeae9f387ae4
3,Reuben Hills Coffee Roastery & Café,Café,61 Albion St.,AU,Surry Hills,Australia,btwn Commonwealth & Belmore,898,"[61 Albion St. (btwn Commonwealth & Belmore), ...","[{'label': 'display', 'lat': -33.8828664315345...",-33.882866,151.211014,,2010.0,NSW,4eee9991a69d8afe6c57a76e
4,Coffee Trails,Café,5/2 Lee St.,AU,Sydney,Australia,,657,"[5/2 Lee St., Sydney NSW 2000, Australia]","[{'label': 'display', 'lat': -33.8839158111476...",-33.883916,151.203888,,2000.0,NSW,4b0c5685f964a520703b23e3
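The `distance` column can be sanity-checked against the search centre with the haversine formula. The sketch below is an independent check, not something Foursquare documents as its own method; the function name and the Earth radius value are assumptions. For the first row it should land close to the 380 m reported above.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2, radius_m=6371000.0):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Search centre from the geocoding step and the first venue in the table above.
centre = (-33.8896267, 151.2057093)
first_venue = (-33.89276, 151.204072)

distance_m = haversine_m(centre[0], centre[1], first_venue[0], first_venue[1])
print(round(distance_m))
```

The same function could be used to confirm that every returned venue falls inside the 5 km search radius before allocating shops to villages.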


In [18]:
# let's have a look at what we've got so far on a map.

coffee_map = folium.Map(location=[latitude_PA, longitude_PA], zoom_start=13) # generate map centred around Prince Alfred Park

# add a red circle marker to represent the Prince Alfred Park
folium.features.CircleMarker(
    [latitude_PA, longitude_PA],
    radius=10,
    color='red',
    popup='Prince Alfred Park',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(coffee_map)

# add the coffee shops and cafés as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(coffee_map)

# display map
coffee_map
