# Location for opening up a new organic grocery in Toronto
Wang, Junyu  
2020.05.13

## 1. Introduction

### 1.1 Background

Technology has made life far more comfortable than it used to be, and people no longer need to struggle for food. In big cities there is a trend toward an organic lifestyle: a sustainable way of living that aims to protect our planet. People who follow it eat organic food, wear organic clothes, and try to use public transportation instead of driving their own car.

### 1.2 Problem

Income influences the lifestyle that people choose. We will combine income data with the distribution of organic groceries in Toronto to discuss which neighborhoods could suit a new organic grocery store.

### 1.3 Interest

People living in a city cannot produce their own food and clothes, so as more people pursue this lifestyle, the demand for organic groceries will grow. For an entrepreneur this could be a good opportunity to open such a store. However, organic products cost more than comparable conventional products, and not everyone is willing to pay the difference, so finding the right target group is very important. Choosing the right place for a new organic grocery store could therefore be decisive for business success. In this report we show how many organic groceries exist in Toronto and how they are distributed across the city. This information could help people decide which neighborhood would be a good place to open a new organic grocery store.

## 2. Data source and cleaning

### 2.1 Data source

The location data for the neighborhoods in Toronto is the same as in the previous assignment. Data from [Wikipedia](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M) and [IBM](http://cocl.us/Geospatial_data) are used to get the location of each neighborhood in Toronto. The economic data for Toronto is acquired from [Wikipedia](https://en.wikipedia.org/wiki/Demographics_of_Toronto_neighbourhoods). [GeoJSON](https://jasonicarter.github.io/Toronto-Neighbourhood-geojson/) data will be used to show the distribution of income.

### 2.2 Data cleaning

The location data from Wikipedia and IBM are scraped and combined into one dataframe. Requests to Foursquare are then made to find the organic groceries within a radius of 10 km of the center of each postal code. Because the search areas overlap, the same store may be found more than once; each store is assigned to the neighborhood whose center is closest to it.
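The notebook relies on the `distance` value returned by Foursquare for this assignment. As an independent illustration of the same rule, the following minimal sketch computes great-circle distances with the haversine formula and picks the nearest center. The helper names `haversine_km` and `nearest_center` and the store position are assumptions made for this example; the three centers are taken from the postal code table in this report.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def nearest_center(store, centers):
    """Return the name of the center closest to the store.

    store   -- (lat, lng) tuple
    centers -- dict mapping neighborhood name to (lat, lng)
    """
    return min(centers, key=lambda name: haversine_km(*store, *centers[name]))

# Postal code centers copied from the table in this report
centers = {
    'Parkwoods': (43.753259, -79.329656),
    'Victoria Village': (43.725882, -79.315572),
    'Don Mills': (43.745906, -79.352188),
}

# Hypothetical store position, for illustration only
store = (43.730000, -79.320000)
print(nearest_center(store, centers))
```

With real data, `store` would be the `lat` and `lng` of a venue and `centers` would be built from the neighborhood dataframe.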

Then the organic groceries will be visualized on a map of Toronto using folium, and income will be visualized using a choropleth map. The two maps will then be compared to draw the final conclusion.
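Besides comparing the maps by eye, the two data sets could also be put side by side in a table. The sketch below is one possible way to do that, assuming a store table with a `Neighborhood` column and an income table keyed by the same name. All names and numbers in it are placeholders invented for the example, not results of this report.

```python
import pandas as pd

# Placeholder data, NOT real results: neighborhoods and incomes are invented
stores = pd.DataFrame({
    'name': ['Store 1', 'Store 2', 'Store 3'],
    'Neighborhood': ['Neighborhood A', 'Neighborhood A', 'Neighborhood B'],
})
income = pd.DataFrame({
    'Neighborhood': ['Neighborhood A', 'Neighborhood B', 'Neighborhood C'],
    'income': [50000, 60000, 40000],
})

# Count the stores per neighborhood
counts = stores.groupby('Neighborhood').size().rename('n_stores').reset_index()

# Keep every neighborhood from the income table; those without a store get 0
summary = income.merge(counts, on='Neighborhood', how='left')
summary['n_stores'] = summary['n_stores'].fillna(0).astype(int)
print(summary)
```

A left merge is used so that neighborhoods without any organic grocery stay in the table, since those are the candidates of interest.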



# Assignment for week 4 ends here. 

_Use the data from Toronto and process it as in the previous assignment_

In [1]:
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


In [2]:
url_file = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

In [3]:
toronto_df = pd.read_html(url_file.text, header=0)[0].dropna(axis = 0).reset_index(drop=True)

In [4]:
df_1 = pd.read_csv('http://cocl.us/Geospatial_data')
toronto_df = toronto_df.set_index('Postal Code').join(df_1.set_index('Postal Code')).reset_index()
toronto_df.head(20)

|   | Postal Code | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |


_Use Foursquare to get information on venues that are grocery shops._

In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
# Return all organic groceries near the given location (within 10 km in this case).
# client_id, client_secret, v, limit, radius and categoryId are expected to be
# defined in the hidden cell above.

def get_store(lat, lng):
    res = requests.get('https://api.foursquare.com/v2/venues/search',
                       params={'client_id': client_id,
                               'client_secret': client_secret,
                               'v': v,
                               'll': '{},{}'.format(lat, lng),
                               'limit': limit,
                               'radius': radius,
                               'categoryId': categoryId})
    try:
        res_df = pd.DataFrame(res.json()['response']['venues']).loc[:, ['id', 'name', 'categories', 'location']]
        # keep only the name of the first category
        res_df.loc[:, 'categories'] = res_df['categories'].map(lambda x: x[0]['name'])
        # pull latitude, longitude and distance out of the nested location dict
        for i in ['lat', 'lng', 'distance']:
            res_df.loc[:, i] = res_df['location'].map(lambda x: x[i]).values
        res_df.drop('location', axis=1, inplace=True)
        return res_df
    except (KeyError, IndexError, TypeError, ValueError):
        # no venues, an error payload, or a response that is not JSON
        return None
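
The flattening step in `get_store` can be tried without spending API calls. The following is a minimal sketch that runs on a hand-written payload containing only the fields the function reads (`id`, `name`, `categories`, `location`). The payload values and the helper name `venues_to_frame` are assumptions for illustration; `pd.json_normalize` is used here as an alternative to the per-column extraction, not as the notebook's method.

```python
import pandas as pd

# Hand-written payload mimicking only the fields that get_store reads;
# the values are placeholders, not real Foursquare data.
sample = {
    'response': {
        'venues': [
            {
                'id': 'abc123',
                'name': 'Example Organic Market',
                'categories': [{'name': 'Organic Grocery'}],
                'location': {'lat': 43.65, 'lng': -79.38, 'distance': 1200},
            },
        ]
    }
}

def venues_to_frame(payload):
    """Flatten the venue list into id, name, categories, lat, lng, distance."""
    venues = payload.get('response', {}).get('venues', [])
    if not venues:
        return None
    # Nested dicts become dotted columns such as 'location.lat'
    df = pd.json_normalize(venues)
    # Keep only the name of the first category, if there is one
    df['categories'] = df['categories'].map(lambda c: c[0]['name'] if c else None)
    df = df.rename(columns={'location.lat': 'lat',
                            'location.lng': 'lng',
                            'location.distance': 'distance'})
    return df[['id', 'name', 'categories', 'lat', 'lng', 'distance']]

print(venues_to_frame(sample))
```

Returning `None` for an empty venue list mirrors how the calling loop skips neighborhoods without results.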


In [9]:
# Get all the groceries using the location of every neighborhood

frames = []
for lat, lng, neighborhood in zip(toronto_df.Latitude, toronto_df.Longitude, toronto_df.Neighborhood):
    tmp = get_store(lat, lng)
    if tmp is not None:
        tmp.loc[:, 'Neighborhood'] = neighborhood
        frames.append(tmp)

# DataFrame.append was removed in pandas 2.0, so concatenate once instead
grocery_df = pd.concat(frames)
grocery_df.index = grocery_df.loc[:, 'id']
# Display only: the 'id' column is kept in grocery_df for the de-duplication step
grocery_df.drop('id', axis=1)

In [None]:
# Remove stores that were found more than once. The neighborhood whose center is
# nearest to the store (smallest 'distance') is considered the neighborhood that
# the store belongs to.

grocery_df = (grocery_df
              .sort_values('distance')                     # nearest record first
              .drop_duplicates(subset='id', keep='first')  # one row per store id
              .reset_index(drop=True))
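
The rule in the cell above (per store id, keep the record with the smallest distance) can be checked on a small example. The rows below are toy values invented for illustration, and sorting followed by `drop_duplicates` is shown as one compact way to express the rule.

```python
import pandas as pd

# Toy rows, for illustration only: store 's1' was returned by two searches
found = pd.DataFrame({
    'id': ['s1', 's2', 's1'],
    'distance': [900, 300, 250],
    'Neighborhood': ['A', 'B', 'C'],
})

unique = (found.sort_values('distance')                     # nearest record first
               .drop_duplicates(subset='id', keep='first')  # one row per store
               .sort_values('id')
               .reset_index(drop=True))
print(unique)
```

Store `s1` ends up in neighborhood `C`, because that search reported the smaller distance of the two.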

In total, 27 stores are labeled as organic groceries. Because the quota of Foursquare API calls was exceeded, the results will be shown in a future update.
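One way to avoid spending the quota again when the notebook is rerun is to store each response on disk and reuse it. The following is a minimal sketch under that assumption; `cached_call`, the cache file name, and `fake_fetch` (a stand-in for the real request) are hypothetical names introduced for this example.

```python
import json
import os
import tempfile

def cached_call(cache_path, key, fetch):
    """Return the stored result for key, calling fetch() only on a cache miss."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            cache = json.load(f)
    if key not in cache:
        cache[key] = fetch()
        with open(cache_path, 'w') as f:
            json.dump(cache, f)
    return cache[key]

# Demonstration with a stand-in for the real API request
calls = []

def fake_fetch():
    calls.append(1)  # record that a "request" was made
    return {'venues': []}

path = os.path.join(tempfile.mkdtemp(), 'foursquare_cache.json')
first = cached_call(path, '43.753259,-79.329656', fake_fetch)
second = cached_call(path, '43.753259,-79.329656', fake_fetch)
print(len(calls))  # the second lookup is served from the file
```

In the notebook, `fetch` would wrap the request for one postal code center, with the `lat,lng` string as the key.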

In [None]:
!pip install folium
import folium

In [None]:
# Center the map on Toronto
toronto_map = folium.Map([43.6532, -79.3832], zoom_start=12)

# One circle marker per organic grocery, labeled with its category
for lat, lng, label in zip(grocery_df.lat, grocery_df.lng, grocery_df.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        fill=True,
        color='blue',
        fill_color='blue',
        fill_opacity=0.6
        ).add_to(toronto_map)

In [None]:
toronto_map