# Location for opening up a new organic grocery in Toronto
Wang, Junyu  
2020.05.13

## 1. Introduction

### 1.1 Background

Technology has made our lives much better than before, and people no longer need to struggle for food. In big cities there is a growing trend toward an organic lifestyle, a sustainable way of living that aims to protect our planet. People who follow it eat organic food, wear organic clothes and try to use public transportation instead of driving their own cars.

### 1.2 Problem

Income may influence the lifestyle that people choose. We therefore combine income data with the distribution of organic groceries in Toronto to discuss where a new organic grocery could be opened.

### 1.3 Interest

Because people living in a city cannot produce their own food and clothes, more people pursuing this lifestyle will lead to a growing demand for organic groceries. For an entrepreneur this could be a good opportunity to open such a store. However, organic products are more expensive than comparable conventional products, and not everyone is willing to pay the difference, so identifying the right target group is essential, and choosing the right location for a new organic grocery store could be decisive for its success. In this report we show how many organic groceries exist in Toronto and how they are distributed across the city. This information could help people decide which borough is a good place to open a new organic grocery store.

## 2. Data source and cleaning

### 2.1 Data source

The location data of the neighborhoods in Toronto were obtained in a previous assignment: data from [wikipedia](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) and [IBM](http://cocl.us/Geospatial_data) provide the postal codes, boroughs and neighborhoods of Toronto together with their coordinates. The economic data of Toronto are taken from [wikipedia](https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods).

### 2.2 Data cleaning

The location data from Wikipedia and IBM are scraped and combined into one dataframe. Requests are then sent to the Foursquare API to find organic groceries within a radius of 10 km of the center of each postal code. Because the same grocery may be found from several centers, each grocery is assigned to the neighborhood whose center is closest to it. Finally, the organic groceries are visualized on a map of Toronto using folium.
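The notebook relies on the `distance` that Foursquare reports for each hit, which is measured from the search center. As an alternative, the "nearest center" rule can also be computed directly from coordinates with the haversine (great-circle) formula. The following is a minimal sketch under that assumption; `haversine_km` and `nearest_neighborhood` are illustrative helpers, not part of the analysis, and the coordinates are copied from the tables in Section 3.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighborhood(store, centres):
    """Return the name of the neighborhood center closest to `store`.

    store   -- (lat, lng) tuple of the grocery
    centres -- dict mapping neighborhood name -> (lat, lng) of its center
    """
    return min(centres, key=lambda name: haversine_km(*store, *centres[name]))


# Two neighborhood centers from the postal-code table in Section 3
centres = {
    'Parkwoods': (43.753259, -79.329656),
    'Regent Park, Harbourfront': (43.65426, -79.360636),
}
# Coordinates of Essence of Life Organics from the de-duplicated table in Section 3
print(nearest_neighborhood((43.654111, -79.400431), centres))
```

Using Foursquare's own `distance` avoids extra computation; computing it explicitly is only needed when the stores come from a source that does not report a distance.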

The population and income data are read from Wikipedia and processed to calculate the population and average income of each borough. The number of organic groceries in each borough is then combined with the population and income data to give a picture of the current market for organic products.



## 3. Data Analysis

First, we import the packages needed for the analysis and scrape the data from our sources.

```python
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```


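```python
# Download the Wikipedia page listing Toronto's postal codes (prefix M)
url_file = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

# Parse the first table on the page, drop rows with missing values and renumber the rows
toronto_df = pd.read_html(url_file.text, header=0)[0].dropna(axis=0).reset_index(drop=True)
```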

```python
# Add the latitude and longitude of each postal code from the IBM geospatial CSV
df_1 = pd.read_csv('http://cocl.us/Geospatial_data')
toronto_df = toronto_df.set_index('Postal Code').join(df_1.set_index('Postal Code')).reset_index()
toronto_df.head(5)
```

| | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |


Next, we use Foursquare to find venues in the organic grocery category. The search radius is set to 10 km so that every possible target is included.

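```python
# The code was removed by Watson Studio for sharing.
# (This cell defined client_id, client_secret, v, limit, radius and categoryId, which get_store uses.)
```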
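Because the parameter cell was removed before sharing, here is a hypothetical stand-in for readers who want to rerun the notebook. Only the 10 km radius comes from the text; the credentials and `categoryId` are placeholders, and the `v` date and `limit` value are assumptions. The illustrative helper `search_url` builds the same `venues/search` request with `urllib.parse.urlencode`, so every value is URL-encoded rather than pasted in with `str.format`.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: the real values were removed from the notebook.
client_id = 'YOUR_CLIENT_ID'
client_secret = 'YOUR_CLIENT_SECRET'
v = '20200513'                              # API version date (assumed, format YYYYMMDD)
limit = 50                                  # venues per request (assumed)
radius = 10000                              # 10 km in meters, as stated in the text
categoryId = 'ORGANIC_GROCERY_CATEGORY_ID'  # placeholder for the Foursquare category id


def search_url(lat, lng):
    """Build a venues/search URL with the parameters above, URL-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': v,
        'll': '{},{}'.format(lat, lng),
        'limit': limit,
        'radius': radius,
        'categoryId': categoryId,
    }
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


print(search_url(43.753259, -79.329656))
```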

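```python
# Return all organic groceries within `radius` meters of the given location (10 km here).
# client_id, client_secret, v, limit, radius and categoryId come from the parameter cell above.
def get_store(lat, lng):
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': v,
        'll': '{},{}'.format(lat, lng),
        'limit': limit,
        'radius': radius,
        'categoryId': categoryId,
    }
    # requests URL-encodes the query parameters
    res = requests.get('https://api.foursquare.com/v2/venues/search', params=params)
    try:
        venues = res.json()['response']['venues']
        res_df = pd.DataFrame(venues).loc[:, ['id', 'name', 'categories', 'location']]
        # keep only the name of the first (primary) category
        res_df['categories'] = res_df['categories'].apply(lambda c: c[0]['name'] if c else None)
        # flatten the nested location dict into lat, lng and distance columns
        for key in ['lat', 'lng', 'distance']:
            res_df[key] = res_df['location'].apply(lambda loc: loc.get(key))
        return res_df.drop(columns='location')
    except (ValueError, KeyError):
        # invalid JSON, an error response without venues, or an empty result
        return None
```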
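To make the shape of the data concrete, here is a minimal standard-library sketch that reads the same fields from a search response: `id`, `name`, the first entry of `categories`, and `lat`, `lng` and `distance` from `location`. The `sample` payload is hypothetical and trimmed down, and `flatten_venues` is an illustrative helper. It also shows that an empty `categories` list has to be handled explicitly, since indexing it with `[0]` raises `IndexError`.

```python
import json

# A hypothetical, trimmed-down venues/search response
sample = json.loads('''
{"response": {"venues": [
  {"id": "a1", "name": "Store A",
   "categories": [{"name": "Organic Grocery"}],
   "location": {"lat": 43.65, "lng": -79.40, "distance": 105}},
  {"id": "b2", "name": "Store B",
   "categories": [],
   "location": {"lat": 43.64, "lng": -79.42, "distance": 172}}
]}}
''')


def flatten_venues(payload):
    """Turn a venues/search payload into flat rows with the columns used in the notebook."""
    rows = []
    for venue in payload['response']['venues']:
        cats = venue.get('categories') or []
        loc = venue['location']
        rows.append({
            'id': venue['id'],
            'name': venue['name'],
            # primary category name, or None when the venue lists no category
            'categories': cats[0]['name'] if cats else None,
            'lat': loc['lat'],
            'lng': loc['lng'],
            'distance': loc.get('distance'),
        })
    return rows


for row in flatten_venues(sample):
    print(row)
```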


```python
# Query Foursquare around the center of every neighborhood and tag each hit
# with the neighborhood and borough it was found from.
frames = []
for lat, lng, neighborhood, borough in zip(toronto_df.Latitude, toronto_df.Longitude,
                                           toronto_df.Neighborhood, toronto_df.Borough):
    tmp = get_store(lat, lng)
    if tmp is not None:
        tmp['Neighborhood'] = neighborhood
        tmp['Borough'] = borough
        frames.append(tmp)

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
grocery_df = pd.concat(frames)
grocery_df.index = grocery_df['id']
grocery_df.head(5)
```

| id | name | categories | lat | lng | distance (m) | Neighborhood | Borough |
|---|---|---|---|---|---|---|---|
| 57ffc0acd67cbfb360268a82 | Whole Foods Market | Grocery Store | 43.857808 | -79.322328 | 11653 | Parkwoods | North York |
| 57ffd7b2d67c8ac5e8b34dd9 | Whole Foods Market | Grocery Store | 43.760662 | -79.411044 | 6595 | Parkwoods | North York |
| 4ad4c063f964a52052f820e3 | Whole Foods Market | Grocery Store | 43.671954 | -79.395543 | 10489 | Parkwoods | North York |
| 5c3c8a9dccad6b00390ec282 | McEwan | Grocery Store | 43.66973 | -79.386872 | 10375 | Parkwoods | North York |
| 58f8f6bce679bc03cd143eec | Whole Foods Market | Grocery Store | 43.71416 | -79.377966 | 5834 | Parkwoods | North York |


```python
grocery_df.shape
```

(1336, 8)

The search returned 1336 hits, but many of them are the same grocery found from several neighborhood centers, so the duplicates need to be removed.
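The de-duplication rule is: for each venue id, keep the hit with the smallest distance. A minimal plain-Python sketch of that rule, with an illustrative helper `keep_nearest` and a toy input built from the two McEwan hits shown in this section (found from Parkwoods at 10375 m and from Church and Wellesley at 524 m):

```python
def keep_nearest(hits):
    """Keep one hit per venue id: the one with the smallest distance.

    hits -- iterable of dicts with at least 'id', 'distance' and 'Neighborhood' keys
    """
    best = {}
    for hit in hits:
        current = best.get(hit['id'])
        if current is None or hit['distance'] < current['distance']:
            best[hit['id']] = hit
    # return the survivors ordered by distance, like sort_values('distance')
    return sorted(best.values(), key=lambda h: h['distance'])


hits = [
    {'id': '5c3c8a9dccad6b00390ec282', 'distance': 10375, 'Neighborhood': 'Parkwoods'},
    {'id': '5c3c8a9dccad6b00390ec282', 'distance': 524, 'Neighborhood': 'Church and Wellesley'},
]
print(keep_nearest(hits))
```

In pandas the same effect comes from sorting by distance and then keeping the first row per id, which is what `sort_values('distance').drop_duplicates('id')` does.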

```python
# Remove stores found more than once. The neighborhood whose center is nearest
# to a store is taken as the neighborhood the store belongs to.
grocery_df = grocery_df.sort_values('distance').drop_duplicates('id').reset_index(drop=True)
print(grocery_df.shape)
grocery_df
```

(27, 8)


| | id | name | categories | lat | lng | distance (m) | Neighborhood | Borough |
|---|---|---|---|---|---|---|---|---|
| 0 | 4adccb27f964a520e02f21e3 | Essence of Life Organics | Organic Grocery | 43.654111 | -79.400431 | 105 | Kensington Market, Chinatown, Grange Park | Downtown Toronto |
| 1 | 598a34315a2c911048fc6433 | Little Green Planet | Organic Grocery | 43.654163 | -79.39992 | 107 | Kensington Market, Chinatown, Grange Park | Downtown Toronto |
| 2 | 5b882947c53093002ccfc604 | Fresh City Farms | Organic Grocery | 43.646386 | -79.419503 | 172 | Little Portugal, Trinity | West Toronto |
| 3 | 4ecfbb9a5c5c9528f868d81c | Manotas Organics and Fine Foods | Organic Grocery | 43.648703 | -79.371659 | 360 | Stn A PO Boxes | Downtown Toronto |
| 4 | 5c3c8a9dccad6b00390ec282 | McEwan | Grocery Store | 43.66973 | -79.386872 | 524 | Church and Wellesley | Downtown Toronto |
| 5 | 5ad67ae60a464d29cd35c317 | Fresh City @ Roncesvalles | Organic Grocery | 43.653 | -79.451902 | 573 | Parkdale, Roncesvalles | West Toronto |
| 6 | 59601a846e465003934d4788 | Organic Garage | Organic Grocery | 43.667743 | -79.463271 | 693 | High Park, The Junction South | West Toronto |
| 7 | 5bbdf6a8c53093002c37c8d0 | Organic Garage- Liberty Village | Organic Grocery | 43.639194 | -79.419889 | 718 | Brockton, Parkdale Village, Exhibition Place | West Toronto |
| 8 | 5c1a8579f1fdaf002cd5269f | Adam Premium Meats | Organic Grocery | 43.777487 | -79.232623 | 733 | Cedarbrae | Scarborough |
| 9 | 5bbdf64d97cf5a002c686de1 | Organic Garage | Organic Grocery | 43.63928 | -79.419717 | 734 | Brockton, Parkdale Village, Exhibition Place | West Toronto |


After removing the duplicates, 27 unique stores remain that Foursquare returns for the organic grocery category. Given that the City of Toronto has a population of roughly 3 million, there may well be demand for more such stores.

However, this ignores the fact that many supermarkets also have sections selling organic products. This should also be taken into account before making a final decision.

```python
# In a notebook, install folium first if needed:  !pip install folium
import folium
```



```python
# Center the map on downtown Toronto and add one circle marker per grocery;
# the popup shows the grocery's category
toronto_map = folium.Map([43.6532, -79.3832], zoom_start=10)

for lat, lng, label in zip(grocery_df.lat, grocery_df.lng, grocery_df.categories):
    folium.vector_layers.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6,
    ).add_to(toronto_map)
```

```python
# Display the map in the notebook
toronto_map
```

This map shows how the organic groceries are distributed in Toronto. Most of them are located in the city center, as expected. In the rest of the analysis we focus on the stores within the City of Toronto, because the interest in starting an organic grocery store there is larger than in the surrounding areas.

```python
# Import the economic data of the neighborhoods from Wikipedia using requests
# (the second table on the page)
toronto_eco = requests.get("https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods").text
toronto_eco_df = pd.read_html(toronto_eco)[1].loc[:, ['FM', 'Name', 'Population', 'Average Income']]
```

The first row of the table is the summary for the Toronto CMA (census metropolitan area), with a population of 5,113,149. We use the remaining neighborhood rows to calculate the population and average income of each borough.
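The borough figures computed below are population-weighted averages of the neighborhood averages. With $P_n$ the population and $I_n$ the average income of neighborhood $n$, and $B$ the set of neighborhoods in a borough:

$$
\bar{I}_B = \frac{\sum_{n \in B} P_n \, I_n}{\sum_{n \in B} P_n}
$$

The numerator is the sum of the `Total Income` column ($P_n I_n$) and the denominator the sum of `Population`. The weighting matters: a plain mean of neighborhood averages would let a small, wealthy neighborhood such as Allenby (population 2,513, average income 245,592) count as much as Agincourt (population 44,577).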

```python
toronto_eco_df.head(5)
```

| | FM | Name | Population | Average Income |
|---|---|---|---|---|
| 0 | | Toronto CMA Average | 5113149 | 40704 |
| 1 | S | Agincourt | 44577 | 25750 |
| 2 | E | Alderwood | 11656 | 35239 |
| 3 | OCoT | Alexandra Park | 4355 | 19687 |
| 4 | OCoT | Allenby | 2513 | 245592 |


```python
# Total income of each neighborhood = population * average income
toronto_eco_df.loc[:, 'Total Income'] = toronto_eco_df.Population * toronto_eco_df.loc[:, 'Average Income']
```

```python
# Add up the population and total income of the neighborhoods in each former municipality (FM).
# The CMA summary row has no FM code, so groupby drops it.
toronto_eco_borough = toronto_eco_df.groupby('FM')[['Population', 'Total Income']].sum()
# groupby sorts the FM codes alphabetically; these names follow that order
toronto_eco_borough.index = ['Etobicoke', 'East York', 'North York', 'Old City of Toronto', 'Scarborough', 'York']
# Average income of each borough = total income / population
toronto_eco_borough['Average Income'] = (toronto_eco_borough['Total Income'] / toronto_eco_borough['Population']).astype(np.int32)
toronto_eco_borough = toronto_eco_borough.drop(columns='Total Income')
```

```python
toronto_eco_borough
```

| Borough | Population | Average Income |
|---|---|---|
| Etobicoke | 313772 | 41251 |
| East York | 112054 | 39185 |
| North York | 621068 | 38537 |
| Old City of Toronto | 624910 | 53401 |
| Scarborough | 600715 | 28806 |
| York | 143255 | 32430 |


Next, we count the organic groceries in each borough and then join the counts with population and income to form a single table.

```python
# Number of organic groceries in each postal-code borough
grocery_count = grocery_df.groupby('Borough').count().loc[:, ["Neighborhood"]]
grocery_count.columns = ['Organic groceries']
```

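```python
# Merge the postal-code boroughs whose names contain "Toronto" (e.g. Downtown Toronto,
# West Toronto) into one "Old City of Toronto" row to match the income data
count_toronto = grocery_count[grocery_count.index.str.contains('Toronto')].sum(axis=0).astype(np.int64)
grocery_count = grocery_count[~grocery_count.index.str.contains('Toronto')]
grocery_count.loc["Old City of Toronto"] = count_toronto
grocery_count
```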

| Borough | Organic groceries |
|---|---|
| East York | 3 |
| Etobicoke | 1 |
| Mississauga | 2 |
| North York | 4 |
| Scarborough | 3 |
| York | 1 |
| Old City of Toronto | 13 |
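The counting step treats every postal-code borough whose name contains "Toronto" as part of the Old City of Toronto, so that the counts line up with the former municipalities in the income table. A minimal standard-library sketch of the same rule using `collections.Counter`; the helper `count_by_borough` and the toy input are illustrative:

```python
from collections import Counter


def count_by_borough(boroughs):
    """Count stores per borough, merging every borough name that contains
    'Toronto' (e.g. 'Downtown Toronto', 'West Toronto') into 'Old City of Toronto'."""
    return Counter('Old City of Toronto' if 'Toronto' in b else b for b in boroughs)


# Toy input: one entry per store, using borough names that appear in Section 3
stores = ['Downtown Toronto', 'West Toronto', 'Scarborough', 'Downtown Toronto', 'East York']
print(count_by_borough(stores))
```

Note that 'East York' is not merged, because the substring test only matches names containing "Toronto".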


```python
# Left join: boroughs without income data (Mississauga) are dropped
toronto_eco_borough.join(grocery_count)
```

| Borough | Population | Average Income | Organic groceries |
|---|---|---|---|
| Etobicoke | 313772 | 41251 | 1 |
| East York | 112054 | 39185 | 3 |
| North York | 621068 | 38537 | 4 |
| Old City of Toronto | 624910 | 53401 | 13 |
| Scarborough | 600715 | 28806 | 3 |
| York | 143255 | 32430 | 1 |


The joined table covers 25 stores: the other 2 of the 27 are in Mississauga, which lies outside the City of Toronto and has no population or income data in this table.
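Raw counts favor large boroughs, so it can help to normalize by population. A minimal sketch that computes organic groceries per 100,000 residents from the joined table (values copied by hand; `boroughs` and `density` are illustrative names):

```python
# Population and number of organic groceries per borough, from the joined table
boroughs = {
    'Etobicoke': (313772, 1),
    'East York': (112054, 3),
    'North York': (621068, 4),
    'Old City of Toronto': (624910, 13),
    'Scarborough': (600715, 3),
    'York': (143255, 1),
}

# Organic groceries per 100,000 residents
density = {name: stores / pop * 100_000 for name, (pop, stores) in boroughs.items()}

# Print from lowest to highest density
for name, d in sorted(density.items(), key=lambda item: item[1]):
    print(f'{name:<20} {d:.2f}')
```

Sorted from lowest to highest, this gives Etobicoke (0.32), Scarborough (0.50), North York (0.64), York (0.70), Old City of Toronto (2.08) and East York (2.68). With counts this small, a single store shifts these ratios noticeably, so they are only indicative.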

## 4. Results and discussion

Although the organic lifestyle has received a lot of attention in the last few years, there are only 25 organic groceries in Toronto. With so few stores, it is hard to extract much information from Foursquare map data alone. Nevertheless, the collected data still give an overview of the market for organic products.

More than half of the organic groceries (13 of 25) are located in the Old City of Toronto, which has both the largest population and the highest average income. This is to be expected, as such stores tend to be located in the inner city, where public transportation is easily accessible.

Opening an organic grocery outside the Old City of Toronto requires an entrepreneur to know the corresponding borough better. Etobicoke could be of interest: it has the second-highest average income and a relatively large population, but only a single organic grocery is located there.

## 5. Conclusion

In this report we gave an overview of the current situation of organic groceries in Toronto. We plotted the groceries on a folium map and calculated the population, average income and number of organic groceries in each borough of Toronto.

Organic products still have a long way to go before they truly flourish, as there are fewer than 30 such stores in Toronto. The Old City of Toronto would still be the first choice for opening a new organic grocery because of its population and average income, although several stores already operate there. Etobicoke may also be worth considering: its residents have the second-highest average income of the six boroughs, yet only one organic grocery is located there.