# New yoga class in Toronto

### Introduction

According to [these statistics](https://www.thegoodbody.com/yoga-statistics/), yoga has grown massively in popularity over the past few years, with passionate yogis practising around the world. It is reported to have many health benefits, particularly for those suffering from back pain, and with spending on yoga products growing annually the trend shows no sign of slowing down.
The yoga market also accounts for substantial annual spending worldwide.
This makes it an interesting trend to take part in.

### Task

The goal is to find a good location for a new yoga class in Toronto. Using several open data sources, we want to find out which neighbourhood attributes have the most influence on the number of yoga studios, and where the best place for a new one is, given its surroundings.

### How the data should be used

1. As noted [here](https://www.thegoodbody.com/yoga-statistics/), yoga is popular with certain segments of the population. We should try to get these statistics from Neighbourhood Profiles 2016 (data source 1). This source includes demographic, social, and employment data for each neighbourhood. I plan to select: total population, age (by group), sex (by group), marital status, education, and employment status.
2. After that we take data from Foursquare (data source 2) and find out which parameters affect the number of yoga studios in these neighbourhoods the most, and where there are not enough yoga studios.
3. We should also find the best neighbours for yoga studios (fitness studios, health bars, and so on) from Foursquare.
4. After that we can merge the government statistics with the best neighbours of yoga studios and find the places where a new studio should be.
5. Check for existing yoga studios in that neighbourhood and decide whether another one can be placed there.

#### Data sources

1. Neighbourhood Profiles 2016 (CSV) https://www.toronto.ca/city-government/data-research-maps/open-data/open-data-catalogue/locations-and-mapping/#8c732154-5012-9afe-d0cd-ba3ffc813d5a
2. Foursquare.com


## Implementation

In [1]:
import pandas as pd
import numpy as np

##### Load data

In [2]:
!wget -q -O 'TorrontoData.csv' https://www.toronto.ca/ext/open_data/catalog/data_set_files/2016_neighbourhood_profiles.csv

In [3]:
TorrontoCsv = pd.read_csv('TorrontoData.csv',encoding = 'unicode_escape')
TorrontoCsv.drop(['Topic','Category','Data Source'], axis=1, inplace=True)
TorrontoCsv.rename(columns={"Characteristic": "Idx"},inplace=True)
TorrontoCsv.set_index('Idx',inplace = True)

#### Saving Neighbourhood Numbers

In [4]:
TorrontoNeighbourhood = TorrontoCsv.filter(regex='Neighbourhood Number', axis=0).transpose()[1:]
TorrontoNeighbourhood.rename(columns={"Neighbourhood Number": "id"},inplace=True)
TorrontoNeighbourhood['id']=TorrontoNeighbourhood['id'].apply(lambda x: x.zfill(3))
TorrontoNeighbourhood.head()

Idx,id
Agincourt North,129
Agincourt South-Malvern West,128
Alderwood,20
Annex,95
Banbury-Don Mills,42
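
The zero-padding step can be illustrated on a few made-up ids. This is only a sketch: the sample values are hypothetical, and the idea that padding is needed so the ids match zero-padded codes used later (such as `001`) is inferred from the outputs further down, not stated by the source.

```python
import pandas as pd

# Hypothetical neighbourhood numbers as they might be read from the CSV (strings)
ids = pd.Series(["1", "20", "129"], name="id")

# Left-pad with zeros to a fixed width of 3, as the lambda with zfill(3) does
padded = ids.str.zfill(3)
print(padded.tolist())
```

Keeping ids as fixed-width strings makes later joins on `id` independent of how many digits a neighbourhood number has.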


In [5]:
TorrontoNeighbourhood.shape

(140, 1)

In [6]:
TorrontoCsv.dropna(axis=0,inplace=True)
TorrontoCsv.head()

Unnamed: 0_level_0,City of Toronto,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,Bathurst Manor,Bay Street Corridor,Bayview Village,Bayview Woods-Steeles,...,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
Idx,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
"Population, 2016",2731571,29113,23757,12054,30526,27695,15873,25797,21396,13154,...,16936,22156,53485,12541,7865,14349,11817,12528,27593,14804
"Population, 2011",2615060,30279,21988,11904,29177,26918,15434,19348,17671,13530,...,15004,21343,53350,11703,7826,13986,10578,11652,27713,14687
Population Change 2011-2016,4.50%,-3.90%,8.00%,1.30%,4.60%,2.90%,2.80%,33.30%,21.10%,-2.80%,...,12.90%,3.80%,0.30%,7.20%,0.50%,2.60%,11.70%,7.50%,-0.40%,0.80%
Total private dwellings,1179057,9371,8535,4732,18109,12473,6418,18436,10111,4895,...,8054,8721,19098,5620,3604,6185,6103,7475,11051,5847
Private dwellings occupied by usual residents,1112929,9120,8136,4616,15934,12124,6089,15074,9532,4698,...,7549,8509,18436,5454,3449,5887,5676,7012,10170,5344


##### For consistency, ensure that all column labels are of type string.

In [7]:
all(isinstance(column, str) for column in TorrontoCsv.columns)

True

#### Convert to numbers

In [8]:
TorrontoCsv.replace("%",'', regex=True,inplace = True)
TorrontoCsv.replace(",",'', regex=True,inplace = True)
TorrontoCsv=TorrontoCsv.apply(pd.to_numeric, errors='coerce').fillna(0)
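
A minimal sketch of the same cleaning idea on a toy frame, assuming values arrive as strings with thousands separators and percent signs. The frame and its values are hypothetical, and a single character-class pattern is used here instead of the two separate `replace` calls.

```python
import pandas as pd

# Toy frame shaped like two rows of the Neighbourhood Profiles data (hypothetical values)
toy = pd.DataFrame(
    {"City of Toronto": ["2,731,571", "4.50%"], "Annex": ["30,526", "4.60%"]},
    index=["Population, 2016", "Population Change 2011-2016"],
)

# Strip commas and percent signs, then coerce every column to numbers;
# anything that still cannot be parsed becomes NaN and is replaced by 0
cleaned = (
    toy.replace("[,%]", "", regex=True)
       .apply(pd.to_numeric, errors="coerce")
       .fillna(0)
)
print(cleaned)
```

Note that `errors='coerce'` followed by `fillna(0)` silently turns unparseable cells into zeros, so a zero in the result does not always mean a measured zero.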

##### Filtering data

In [9]:
TorrontoFilteredData=pd.concat([
                        TorrontoCsv.filter(regex='^Male:', axis=0),
                        TorrontoCsv.filter(regex='^Female:', axis=0),
                        TorrontoCsv.filter(regex='Married or living common law', axis=0),
                        TorrontoCsv.filter(regex='Not married and not living common law', axis=0),
                        TorrontoCsv.filter(regex='No certificate, diploma or degree', axis=0).head(1),
                        TorrontoCsv.filter(regex=r'Secondary \(high\) school diploma or equivalency certificate', axis=0).head(1),
                        TorrontoCsv.filter(regex='Postsecondary certificate, diploma or degree', axis=0).head(1),
                        TorrontoCsv.filter(regex='Employed$', axis=0),
                        TorrontoCsv.filter(regex='Unemployed$', axis=0),
                        TorrontoCsv.filter(regex='Population, 2016', axis=0),
])


#### Convert absolute values to shares of the city total

In [10]:
col = TorrontoFilteredData.columns[1:]
clmn = list(col) 
for i in clmn: 
    TorrontoFilteredData[i]=TorrontoFilteredData[i] / TorrontoFilteredData['City of Toronto']
TorrontoFilteredData.drop(['City of Toronto'], axis=1, inplace=True)  
TorrontoFilteredData.head(3)

Unnamed: 0_level_0,Agincourt North,Agincourt South-Malvern West,Alderwood,Annex,Banbury-Don Mills,Bathurst Manor,Bay Street Corridor,Bayview Village,Bayview Woods-Steeles,Bedford Park-Nortown,...,Willowdale West,Willowridge-Martingrove-Richview,Woburn,Woodbine Corridor,Woodbine-Lumsden,Wychwood,Yonge-Eglinton,Yonge-St.Clair,York University Heights,Yorkdale-Glen Park
Idx,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Male: 0 to 04 years,0.009443,0.008227,0.005151,0.006367,0.008155,0.006224,0.006724,0.00651,0.002933,0.009657,...,0.005079,0.00887,0.023249,0.006581,0.003219,0.00465,0.004292,0.003148,0.010802,0.004578
Male: 05 to 09 years,0.010022,0.007787,0.003893,0.005263,0.009517,0.005119,0.003317,0.005696,0.003749,0.011464,...,0.00447,0.009012,0.024585,0.005768,0.002596,0.005047,0.004398,0.003172,0.009877,0.004542
Male: 10 to 14 years,0.010162,0.007083,0.003464,0.005004,0.010393,0.00639,0.002002,0.006313,0.004927,0.01355,...,0.00408,0.009393,0.024636,0.005081,0.002772,0.004773,0.004311,0.003003,0.009778,0.005697
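
The conversion to relative values can also be written without a loop. The sketch below uses a hypothetical three-column frame and `DataFrame.div`; it is an alternative formulation under those assumptions, not the notebook's own code.

```python
import pandas as pd

# Hypothetical counts: one city-wide column plus two neighbourhoods
toy = pd.DataFrame(
    {"City of Toronto": [100.0, 40.0], "A": [60.0, 10.0], "B": [40.0, 30.0]},
    index=["Population, 2016", "Employed"],
)

# Divide each neighbourhood column by the city-wide value of the same row
shares = toy.drop(columns="City of Toronto").div(toy["City of Toronto"], axis=0)

# When the neighbourhoods cover the whole city, each row of shares sums to 1
row_sums = shares.sum(axis=1)
print(shares)
print(row_sums)
```

Checking the row sums is a quick way to spot rows where the city-wide total does not match the neighbourhood values.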


#### Transpose data and get result

In [11]:
TorrontoData=TorrontoFilteredData.transpose()

##### Get result DataFrame

In [12]:
TorrontoData.reset_index(inplace=True)
TorrontoData.rename(columns={
    "index": "Neighbourhood", 
    "Population, 2016":"Population",
    "  Married or living common law":"Married",
    "  Not married and not living common law":"Not married",
    "  No certificate, diploma or degree":"No diploma",
    "  Secondary (high) school diploma or equivalency certificate":"High school",
    "  Postsecondary certificate, diploma or degree":"Postsecondary diploma",
    "    Employed":"Employed",
    "    Unemployed":"Unemployed"
},inplace=True)
TorrontoData.head()

Idx,Neighbourhood,Male: 0 to 04 years,Male: 05 to 09 years,Male: 10 to 14 years,Male: 15 to 19 years,Male: 20 to 24 years,Male: 25 to 29 years,Male: 30 to 34 years,Male: 35 to 39 years,Male: 40 to 44 years,...,Female: 95 to 99 years,Female: 100 years and over,Married,Not married,No diploma,High school,Postsecondary diploma,Employed,Unemployed,Population
0,Agincourt North,0.009443,0.010022,0.010162,0.011315,0.010419,0.008911,0.007668,0.007229,0.008783,...,0.017266,0.015385,0.01202,0.009613,0.017358,0.013296,0.008099,0.009189,0.01112,0.010658
1,Agincourt South-Malvern West,0.008227,0.007787,0.007083,0.010506,0.010265,0.009174,0.00753,0.006644,0.007049,...,0.005755,0.0,0.009485,0.008224,0.010693,0.010854,0.007583,0.007974,0.009689,0.008697
2,Alderwood,0.005151,0.003893,0.003464,0.003839,0.003644,0.003117,0.003765,0.004837,0.004854,...,0.001439,0.007692,0.004855,0.003949,0.005314,0.005275,0.003908,0.004705,0.003393,0.004413
3,Annex,0.006367,0.005263,0.005004,0.006263,0.012472,0.018261,0.014785,0.011215,0.009649,...,0.025899,0.038462,0.010391,0.013793,0.0042,0.00761,0.015062,0.012711,0.01022,0.011175
4,Banbury-Don Mills,0.008155,0.009517,0.010393,0.009631,0.007186,0.005663,0.00675,0.007813,0.009418,...,0.030216,0.030769,0.011207,0.009418,0.006082,0.009179,0.011759,0.009557,0.008258,0.010139


In [13]:
TorrontoData=TorrontoData.set_index('Neighbourhood').join(TorrontoNeighbourhood, lsuffix='_caller', rsuffix='_other')

In [14]:
TorrontoData.head()

Idx,Male: 0 to 04 years,Male: 05 to 09 years,Male: 10 to 14 years,Male: 15 to 19 years,Male: 20 to 24 years,Male: 25 to 29 years,Male: 30 to 34 years,Male: 35 to 39 years,Male: 40 to 44 years,Male: 45 to 49 years,...,Female: 100 years and over,Married,Not married,No diploma,High school,Postsecondary diploma,Employed,Unemployed,Population,id
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Agincourt North,0.009443,0.010022,0.010162,0.011315,0.010419,0.008911,0.007668,0.007229,0.008783,0.009795,...,0.015385,0.01202,0.009613,0.017358,0.013296,0.008099,0.009189,0.01112,0.010658,129
Agincourt South-Malvern West,0.008227,0.007787,0.007083,0.010506,0.010265,0.009174,0.00753,0.006644,0.007049,0.008365,...,0.0,0.009485,0.008224,0.010693,0.010854,0.007583,0.007974,0.009689,0.008697,128
Alderwood,0.005151,0.003893,0.003464,0.003839,0.003644,0.003117,0.003765,0.004837,0.004854,0.004843,...,0.007692,0.004855,0.003949,0.005314,0.005275,0.003908,0.004705,0.003393,0.004413,20
Annex,0.006367,0.005263,0.005004,0.006263,0.012472,0.018261,0.014785,0.011215,0.009649,0.009355,...,0.038462,0.010391,0.013793,0.0042,0.00761,0.015062,0.012711,0.01022,0.011175,95
Banbury-Don Mills,0.008155,0.009517,0.010393,0.009631,0.007186,0.005663,0.00675,0.007813,0.009418,0.011116,...,0.030769,0.011207,0.009418,0.006082,0.009179,0.011759,0.009557,0.008258,0.010139,42


### Getting Toronto coordinates

In [15]:
!pip install folium
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values



In [16]:
address = 'Toronto'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.


In [17]:
import json
import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

In [18]:
from branca.colormap import linear
import branca

colormap = linear.YlGn_09.scale(
    TorrontoData.Population.min(),
    TorrontoData.Population.max())

print(colormap(5.0))

colormap

#004529


#### Map of population across Toronto

In [19]:
topo = f'https://raw.githubusercontent.com/smtkvb/Python_Coursera_Capstone/master/toronto_topo.json'

TorrontoData_dict =  TorrontoData[['Population','id']].reset_index()
TorrontoData_dict['id'] = TorrontoData_dict['id'].astype(int)
TorrontoData_dict=TorrontoData_dict.set_index('id')

m = folium.Map(location=[latitude, longitude], zoom_start=11)

colorscale = branca.colormap.linear.YlOrRd_09.scale(0, 50e3)

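def style_function(feature):
    # Colour each neighbourhood by its population share; black if the id is unknown
    nid = int(feature['properties']['id'])
    known = nid in TorrontoData_dict.index
    return {
        'fillOpacity': 0.7,
        'weight': 0,
        'fillColor': colormap(TorrontoData_dict.loc[nid, 'Population']) if known else 'black'
    }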

folium.TopoJson(
    json.loads(requests.get(topo).text),
    'objects.toronto',
    name='topojson',
    style_function=style_function
).add_to(m)

folium.LayerControl().add_to(m)


m

### Getting geo polygon data as a DataFrame

Why this is needed:
1. We need the centre of each neighbourhood, which is then used to query Foursquare.
2. We need a polygon object for each neighbourhood to check that a venue returned by Foursquare really lies inside it.

In [20]:
# GeoJSON from the same source as the TopoJSON above
geojson = f'https://raw.githubusercontent.com/smtkvb/Python_Coursera_Capstone/master/toronto_crs84.geojson'

In [21]:
from urllib.request import urlopen
import json
from pandas.io.json import json_normalize

response = urlopen(geojson)
elevations = response.read()
data = json.loads(elevations)
geo_df=json_normalize(data['features'])
geo_df.head()

Unnamed: 0,geometry.coordinates,geometry.type,properties.AREA_NAME,properties.AREA_S_CD,type
0,"[[[-79.39119482699992, 43.68108112399995], [-7...",Polygon,Yonge-St.Clair (97),97,Feature
1,"[[[-79.50528791599992, 43.759873493999955], [-...",Polygon,York University Heights (27),27,Feature
2,"[[[-79.43998431099992, 43.761557654999955], [-...",Polygon,Lansing-Westgate (38),38,Feature
3,"[[[-79.43968732599991, 43.70560981799996], [-7...",Polygon,Yorkdale-Glen Park (31),31,Feature
4,"[[[-79.49262119699992, 43.6474363499999], [-79...",Polygon,Stonegate-Queensway (16),16,Feature


In [22]:
geo_df.rename(columns={"geometry.coordinates": "coordinates", "properties.AREA_NAME": "Neighbourghood", "properties.AREA_S_CD": "id"},inplace=True)

#### Function for centre of district

In [23]:
def GetCentre(d) :
    x,y=zip(*d)
    center=(max(x)+min(x))/2., (max(y)+min(y))/2.
    return(center)
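
Note that `GetCentre` returns the midpoint of the bounding box, not a geometric centroid. The sketch below, using a hypothetical L-shaped ring and a renamed copy of the function, shows that for a concave shape this midpoint can fall outside the polygon.

```python
from matplotlib.path import Path

def get_centre(ring):
    # Same idea as GetCentre: midpoint of the bounding box of the ring
    x, y = zip(*ring)
    return (max(x) + min(x)) / 2.0, (max(y) + min(y)) / 2.0

# Hypothetical L-shaped ring (closed: first vertex repeated at the end)
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4), (0, 0)]

centre = get_centre(l_shape)
inside = Path(l_shape).contains_point(centre)
print(centre, inside)
```

For this shape the midpoint is `(2.0, 2.0)`, which lies outside the L. Because the midpoint is only used as the centre of a 3 km search radius and each venue is later checked against the polygon, this approximation is likely acceptable here.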

In [24]:
from matplotlib.path import Path
geo_df['coordinates'].iloc[0]
geo_df['Poligon'] = geo_df.apply(lambda row: Path(row.coordinates[0]), axis=1)
geo_df['Centre'] = geo_df.apply(lambda row: GetCentre(row.coordinates[0]), axis=1)
geo_df.head()

Unnamed: 0,coordinates,geometry.type,Neighbourghood,id,type,Poligon,Centre
0,"[[[-79.39119482699992, 43.68108112399995], [-7...",Polygon,Yonge-St.Clair (97),97,Feature,"Path(array([[-79.39119483, 43.68108112],\n ...","(-79.39853901899993, 43.68808772449996)"
1,"[[[-79.50528791599992, 43.759873493999955], [-...",Polygon,York University Heights (27),27,Feature,"Path(array([[-79.50528792, 43.75987349],\n ...","(-79.4922534984999, 43.76490100749996)"
2,"[[[-79.43998431099992, 43.761557654999955], [-...",Polygon,Lansing-Westgate (38),38,Feature,"Path(array([[-79.43998431, 43.76155765],\n ...","(-79.42441129199992, 43.75211632699995)"
3,"[[[-79.43968732599991, 43.70560981799996], [-7...",Polygon,Yorkdale-Glen Park (31),31,Feature,"Path(array([[-79.43968733, 43.70560982],\n ...","(-79.45555520499992, 43.71487854199995)"
4,"[[[-79.49262119699992, 43.6474363499999], [-79...",Polygon,Stonegate-Queensway (16),16,Feature,"Path(array([[-79.4926212 , 43.64743635],\n ...","(-79.49884145449992, 43.63527967899996)"


#### Example of coordinates check

In [25]:
venue=[[-79.397842,43.690523]]
p = Path(geo_df['coordinates'].iloc[0][0]) 
p.contains_points(venue)

array([ True])

### Getting data from Foursquare

In [26]:
# The code was removed by Watson Studio for sharing.
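
The hidden cell presumably defines the `CLIENT_ID`, `CLIENT_SECRET`, and `VERSION` names used by the requests below. One way to supply them without putting secrets in the notebook is sketched here; the environment variable names and the default date are assumptions, not the author's values.

```python
import os

# Hypothetical environment variable names; the original cell's contents are not shown
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# The v2 API expects the version as a YYYYMMDD date string; this default is a placeholder
VERSION = os.environ.get("FOURSQUARE_VERSION", "20190101")
```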

In [27]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 3000 # define radius 3km
categoryId='4bf58dd8d48988d102941735'# yoga centre



In [28]:
d=[]
for index, row in geo_df.iterrows():  
    url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&near={},{}&radius={}&limit={}&categoryId={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    geo_df['Centre'][index][1], 
    geo_df['Centre'][index][0], 
    radius, 
    LIMIT,
    categoryId)
    results = requests.get(url).json()    
    d.append({'Venues' : results, 'id':geo_df['id'][index]})
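
The request URL in the loop above is assembled with positional `format` arguments, which makes it easy to swap latitude and longitude by accident (`Centre` stores longitude first). A hedged alternative is to build the query from named parameters; `build_search_url` is a hypothetical helper, and the credentials below are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lng, radius, limit, category_id):
    # Hypothetical helper: same endpoint and parameters as the loop above,
    # with urlencode handling the escaping of each value
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
        "categoryId": category_id,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials and a neighbourhood centre taken from the table above
url = build_search_url("ID", "SECRET", "20190101",
                       43.68808772449996, -79.39853901899993,
                       3000, 100, "4bf58dd8d48988d102941735")
print(url)
```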

In [29]:
pd.DataFrame(d).head()

Unnamed: 0,Venues,id
0,"{'meta': {'code': 200, 'requestId': '5d46cd670...",97
1,"{'meta': {'code': 200, 'requestId': '5d46cd678...",27
2,"{'meta': {'code': 200, 'requestId': '5d46cd678...",38
3,"{'meta': {'code': 200, 'requestId': '5d46cd688...",31
4,"{'meta': {'code': 200, 'requestId': '5d46cd68b...",16


In [30]:
geo_df=geo_df.set_index('id').join(pd.DataFrame(d).set_index('id'))

In [31]:
venuesList=pd.DataFrame()
for index, row in geo_df.iterrows(): 
    responce = row['Venues']['response']
    if 'venues' in responce:
        venues=responce['venues']
        if venues and 'name' in venues[0]:
            nearby_venues = json_normalize(venues) # flatten JSON
            temp=nearby_venues[['name','location.lat','location.lng']]
            temp.insert(0, 'id', index) 
            temp.insert(4, 'poligon', row.Poligon) 
            
            venuesList=venuesList.append(temp, ignore_index=True)
venuesList.columns = [col.split(".")[-1] for col in venuesList.columns]            
venuesList.head()


Unnamed: 0,id,name,lat,lng,poligon
0,97,Moksha Yoga Uptown,43.688799,-79.394484,"Path(array([[-79.39119483, 43.68108112],\n ..."
1,97,Noor Light Healing Arts Studio,43.665129,-79.410511,"Path(array([[-79.39119483, 43.68108112],\n ..."
2,97,Rainbow Body Yoga,43.664529,-79.380073,"Path(array([[-79.39119483, 43.68108112],\n ..."
3,97,House Of Yoga,43.66378,-79.417956,"Path(array([[-79.39119483, 43.68108112],\n ..."
4,97,Totum Life Science St. Clair,43.686525,-79.383449,"Path(array([[-79.39119483, 43.68108112],\n ..."


In [32]:
def myfunc(lat, lng, poligon):
    venue=[[lng,lat]]
    return poligon.contains_points(venue)

#### Filter the yoga studios that lie inside each neighbourhood, then join the counts into the neighbourhood table and the geodata table

In [33]:
filteredvenuesList= venuesList[ venuesList.apply(lambda x: myfunc(x.lat,x.lng,x.poligon)[0], axis=1) ]
#filteredvenuesList['id'] = filteredvenuesList['id'].astype(int)

In [34]:
venueData=pd.to_numeric(filteredvenuesList.groupby(['id']).count().rename(columns={'name':'VenueCount'})['VenueCount'])
venueData=venueData/sum(venueData)
venueData.head()

id
001    0.012658
008    0.006329
009    0.006329
013    0.006329
014    0.012658
Name: VenueCount, dtype: float64
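
The count-then-normalise step can be sketched on a small made-up venue list. The studio names and ids are hypothetical; only the grouping logic mirrors the cell above.

```python
import pandas as pd

# Hypothetical filtered venue list: one row per yoga studio inside its neighbourhood
toy = pd.DataFrame({
    "id": ["001", "001", "008", "014"],
    "name": ["Studio A", "Studio B", "Studio C", "Studio D"],
})

# Count studios per neighbourhood id, then convert counts to shares of all studios
counts = toy.groupby("id")["name"].count().rename("VenueCount")
shares = counts / counts.sum()
print(shares)
```

After this step `VenueCount` is a share of all studios found rather than a raw count, which puts it on the same relative scale as the demographic columns.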

In [35]:
geo_df=geo_df.join(venueData).fillna(0)
TorrontoData=TorrontoData.set_index('id').join(venueData).fillna(0)

In [36]:
TorrontoData.head()

Unnamed: 0_level_0,Male: 0 to 04 years,Male: 05 to 09 years,Male: 10 to 14 years,Male: 15 to 19 years,Male: 20 to 24 years,Male: 25 to 29 years,Male: 30 to 34 years,Male: 35 to 39 years,Male: 40 to 44 years,Male: 45 to 49 years,...,Female: 100 years and over,Married,Not married,No diploma,High school,Postsecondary diploma,Employed,Unemployed,Population,VenueCount
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
129,0.009443,0.010022,0.010162,0.011315,0.010419,0.008911,0.007668,0.007229,0.008783,0.009795,...,0.015385,0.01202,0.009613,0.017358,0.013296,0.008099,0.009189,0.01112,0.010658,0.0
128,0.008227,0.007787,0.007083,0.010506,0.010265,0.009174,0.00753,0.006644,0.007049,0.008365,...,0.0,0.009485,0.008224,0.010693,0.010854,0.007583,0.007974,0.009689,0.008697,0.0
20,0.005151,0.003893,0.003464,0.003839,0.003644,0.003117,0.003765,0.004837,0.004854,0.004843,...,0.007692,0.004855,0.003949,0.005314,0.005275,0.003908,0.004705,0.003393,0.004413,0.0
95,0.006367,0.005263,0.005004,0.006263,0.012472,0.018261,0.014785,0.011215,0.009649,0.009355,...,0.038462,0.010391,0.013793,0.0042,0.00761,0.015062,0.012711,0.01022,0.011175,0.037975
42,0.008155,0.009517,0.010393,0.009631,0.007186,0.005663,0.00675,0.007813,0.009418,0.011116,...,0.030769,0.011207,0.009418,0.006082,0.009179,0.011759,0.009557,0.008258,0.010139,0.006329


In [37]:
geo_df.head()

Unnamed: 0_level_0,coordinates,geometry.type,Neighbourghood,type,Poligon,Centre,Venues,VenueCount
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
97,"[[[-79.39119482699992, 43.68108112399995], [-7...",Polygon,Yonge-St.Clair (97),Feature,"Path(array([[-79.39119483, 43.68108112],\n ...","(-79.39853901899993, 43.68808772449996)","{'meta': {'code': 200, 'requestId': '5d46cd670...",0.006329
27,"[[[-79.50528791599992, 43.759873493999955], [-...",Polygon,York University Heights (27),Feature,"Path(array([[-79.50528792, 43.75987349],\n ...","(-79.4922534984999, 43.76490100749996)","{'meta': {'code': 200, 'requestId': '5d46cd678...",0.006329
38,"[[[-79.43998431099992, 43.761557654999955], [-...",Polygon,Lansing-Westgate (38),Feature,"Path(array([[-79.43998431, 43.76155765],\n ...","(-79.42441129199992, 43.75211632699995)","{'meta': {'code': 200, 'requestId': '5d46cd678...",0.0
31,"[[[-79.43968732599991, 43.70560981799996], [-7...",Polygon,Yorkdale-Glen Park (31),Feature,"Path(array([[-79.43968733, 43.70560982],\n ...","(-79.45555520499992, 43.71487854199995)","{'meta': {'code': 200, 'requestId': '5d46cd688...",0.0
16,"[[[-79.49262119699992, 43.6474363499999], [-79...",Polygon,Stonegate-Queensway (16),Feature,"Path(array([[-79.4926212 , 43.64743635],\n ...","(-79.49884145449992, 43.63527967899996)","{'meta': {'code': 200, 'requestId': '5d46cd68b...",0.0


#### Map of the distribution of yoga studios across Toronto

In [38]:
topo = f'https://raw.githubusercontent.com/smtkvb/Python_Coursera_Capstone/master/toronto_topo.json'

geo_df_dict=  geo_df['VenueCount'].reset_index()
geo_df_dict['id'] = geo_df_dict['id'].astype(int)
geo_df_dict=geo_df_dict.set_index('id')

m = folium.Map(location=[latitude, longitude], zoom_start=11)

colorscale = branca.colormap.linear.YlOrRd_09.scale(0, 50e3)

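def style_function(feature):
    # Colour each neighbourhood by its share of yoga studios; black if the id is unknown
    nid = int(feature['properties']['id'])
    known = nid in geo_df_dict.index
    return {
        'fillOpacity': 0.7,
        'weight': 0,
        'fillColor': colormap(geo_df_dict.loc[nid, 'VenueCount']) if known else 'black'
    }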

folium.TopoJson(
    json.loads(requests.get(topo).text),
    'objects.toronto',
    name='topojson',
    style_function=style_function
).add_to(m)

folium.LayerControl().add_to(m)


m


### Intermediate conclusions
As the maps show, the population is distributed differently from the yoga studios, which are mostly located in the centre of the city. I assume this is because business centres and other city life are concentrated there. <br>Even if the population parameters, which we check below, give a negative result, we can still expect a useful result from the venue surroundings.

#### Check correlation for data

In [39]:
corr = TorrontoData.corr()
print(corr["VenueCount"].abs().sort_values(ascending=False).head(10))


VenueCount                1.000000
Male: 30 to 34 years      0.574702
Female: 25 to 29 years    0.565828
Male: 25 to 29 years      0.552819
Female: 30 to 34 years    0.546951
Male: 35 to 39 years      0.538898
Postsecondary diploma     0.522884
Employed                  0.480992
Male: 40 to 44 years      0.451718
Not married               0.438912
Name: VenueCount, dtype: float64
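
Because the ranking above uses `.abs()`, the sign of each correlation is lost in the printed list. A small sketch on hypothetical data shows one way to rank by strength while still displaying the signed value.

```python
import pandas as pd

# Hypothetical data: A rises with VenueCount, B falls, C is only loosely related
toy = pd.DataFrame({
    "VenueCount": [1, 2, 3, 4],
    "A": [2, 4, 6, 8],
    "B": [4, 3, 2, 1],
    "C": [1, 3, 2, 4],
})

# Signed Pearson correlation of every feature with VenueCount
corr = toy.corr()["VenueCount"].drop("VenueCount")

# Order by absolute strength, but keep the signed values for reading
ranked = corr.reindex(corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Here `B` is as strongly related to `VenueCount` as `A`, but negatively, which matters when deciding whether a feature argues for or against a location.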


<hr>

## Checking nearby venues for each yoga class

In [40]:
filteredvenuesList.head()

Unnamed: 0,id,name,lat,lng,poligon
0,97,Moksha Yoga Uptown,43.688799,-79.394484,"Path(array([[-79.39119483, 43.68108112],\n ..."
48,27,York Quad,43.773494,-79.502334,"Path(array([[-79.50528792, 43.75987349],\n ..."
74,63,Beaches Hot Yoga,43.669057,-79.304331,"Path(array([[-79.31485087, 43.66673977],\n ..."
75,63,Bikram Yoga Beaches,43.668817,-79.304346,"Path(array([[-79.31485087, 43.66673977],\n ..."
76,63,Prana Fitness,43.67107,-79.295043,"Path(array([[-79.31485087, 43.66673977],\n ..."


In [41]:
def getNearbyVenues(names, latitudes, longitudes, id, radius=1000):
    
    venues_list=[]
    for name, lat, lng, id in zip(names, latitudes, longitudes, id):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        res=requests.get(url).json()["response"]
        if 'groups' in res:
            results = res['groups'][0]['items']

            # return only relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng,
                id,
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['YogaName', 
                  'Yoga Latitude', 
                  'Yoga Longitude', 
                  'id',           
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [42]:
toronto_nearbly_venues = getNearbyVenues(names=filteredvenuesList['name'],
                                   latitudes=filteredvenuesList['lat'],
                                   longitudes=filteredvenuesList['lng'],
                                   id=filteredvenuesList['id']
                                  )

Moksha Yoga Uptown
York Quad
Beaches Hot Yoga
Bikram Yoga Beaches
Prana Fitness
Downward Dog Yoga Centre-Beaches
Main Fitness Beaches
Afterglow Yoga Studio
Glanville Mediation Services
Beach Yoga Centre
One Tiger Yoga
Bikram Yoga East York
Power Yoga Canada Leaside
Power Yoga Canada Etobicoke
Moksha Yoga Etobicoke
Serendipity Yoga &  Pilates
YogaFit Training Centre
ANKH YOGA
RAH Centre
Pure Yoga Toronto
LiV Yoga Studio
Good Space
Gyan yoga
Chang'e Studio
chi junky
Energy Exchange
Seven Seeds Yoga Studio
Framewrk
YOGAthletix
Spirit Loft Yoga
Setu Yoga Studio
The Flying Yogi
Leslieville Sanctuary
Still Light Centre
BeHot Yoga Toronto
Extreme Fitness Yoga/Hot Yoga Studio
Buddha Body Yoga
Bikram Yoga Yonge
Iam Yoga
Breathe Yoga Studio
fifty-seven
Green Lavender
Lokasa Yoga
Om Daisy
FitBot Studio
Toronto Yoga Mamas
Bomb Wellness
Muse Movement Studio
YogaSpace
Laya Spa & Yoga
Trinity Yoga
Y Yoga
Mind & Body Yoga
Yoga Tree Downtown
Barreworks
Core Studio Yoga & Pilates
Hotel Yoga & Fitness In

In [43]:
print(toronto_nearbly_venues.shape)

(12292, 8)


In [44]:
toronto_nearbly_venues.head()

Unnamed: 0,YogaName,Yoga Latitude,Yoga Longitude,id,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Moksha Yoga Uptown,43.688799,-79.394484,97,The Bagel House,43.687374,-79.393696,Bagel Shop
1,Moksha Yoga Uptown,43.688799,-79.394484,97,Cava Restaurant,43.689809,-79.394932,Tapas Restaurant
2,Moksha Yoga Uptown,43.688799,-79.394484,97,9bars,43.68866,-79.39194,Café
3,Moksha Yoga Uptown,43.688799,-79.394484,97,DAVIDsTEA,43.688376,-79.394158,Tea Room
4,Moksha Yoga Uptown,43.688799,-79.394484,97,Capocaccia Café,43.685915,-79.393305,Italian Restaurant


#### Analyze yoga neighbours for each Neighbourhood

In [45]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_nearbly_venues[['Venue Category']], prefix="", prefix_sep="")

# add the yoga studio name and neighbourhood id columns back to the dataframe
toronto_onehot['YogaName'] = toronto_nearbly_venues['YogaName'] 
toronto_onehot['id'] = toronto_nearbly_venues['id'] 

# move the id column (added last) to the first position
fixed_columns = [toronto_onehot.columns[-1]] + list(toronto_onehot.columns[:-1])
toronto_onehot = toronto_onehot[fixed_columns]

toronto_onehot.head()

Unnamed: 0,id,Accessories Store,Afghan Restaurant,African Restaurant,American Restaurant,Animal Shelter,Antique Shop,Aquarium,Arcade,Arepa Restaurant,...,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Xinjiang Restaurant,Yoga Studio,YogaName
0,97,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Moksha Yoga Uptown
1,97,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Moksha Yoga Uptown
2,97,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Moksha Yoga Uptown
3,97,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Moksha Yoga Uptown
4,97,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,Moksha Yoga Uptown


In [46]:
toronto_onehot.shape

(12292, 344)
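
The one-hot encoding above, together with the per-neighbourhood averaging applied to it a few cells later, can be sketched on a tiny made-up venue table. The categories and ids are hypothetical, and `dtype=float` is added here only so that the averages are computed on numeric columns regardless of pandas version.

```python
import pandas as pd

# Hypothetical nearby venues: neighbourhood id and venue category
toy = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "Venue Category": ["Cafe", "Gym", "Cafe", "Cafe"],
})

# One indicator column per category
onehot = pd.get_dummies(toy[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
onehot["id"] = toy["id"]

# The mean of the indicators per id is the share of each category among nearby venues
freq = onehot.groupby("id").mean()
print(freq)
```

Each row of `freq` sums to 1, so a value reads as "the fraction of venues around this neighbourhood's yoga studios that belong to this category".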

In [47]:
venueData=venueData/venueData.sum()

In [48]:
venueData.head()

id
001    0.012658
008    0.006329
009    0.006329
013    0.006329
014    0.012658
Name: VenueCount, dtype: float64

In [49]:
VenuesListCorrelatedTemp=toronto_onehot.drop(['YogaName'], axis=1).groupby(['id']).mean().join(venueData).fillna(0).corr()

In [50]:
VenuesListCorrelatedTemp['VenueCount'].drop(['VenueCount'], axis=0).abs().sort_values(ascending=False).head(10)

Latin American Restaurant    0.644522
Hotel Bar                    0.574975
Arepa Restaurant             0.547231
Other Nightlife              0.528123
Lake                         0.528123
Basketball Stadium           0.528123
Baseball Stadium             0.528123
Aquarium                     0.528123
Roof Deck                    0.528123
Street Art                   0.519939
Name: VenueCount, dtype: float64

As we can see, many of these categories have nothing to do with yoga, but they are most common in the central districts, so we can use them to estimate how many yoga centres such surroundings can support.

In [51]:
Columns=''
for items in VenuesListCorrelatedTemp['VenueCount'].drop(['VenueCount'], axis=0).abs().sort_values(ascending=False).head(10).iteritems(): 
    Columns=Columns+'\''+items[0]+'\','
Columns    

"'Latin American Restaurant','Hotel Bar','Arepa Restaurant','Other Nightlife','Lake','Basketball Stadium','Baseball Stadium','Aquarium','Roof Deck','Street Art',"

In [52]:
CategoryesData=toronto_onehot[['id', 'Other Nightlife','Aquarium','Basketball Stadium','Roof Deck','Lake','Baseball Stadium','Arepa Restaurant','Latin American Restaurant','Street Art','Udon Restaurant']]
CategoryesData=CategoryesData.groupby(['id']).mean().join(venueData).fillna(0)
CategoryesData.head()

id,Other Nightlife,Aquarium,Basketball Stadium,Roof Deck,Lake,Baseball Stadium,Arepa Restaurant,Latin American Restaurant,Street Art,Udon Restaurant,VenueCount
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006329
9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006329
13,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.006329
14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012658


From these categories, let's keep the ones related to sport and health: 'Aquarium', 'Basketball Stadium', 'Lake', 'Baseball Stadium'.

### Creating the model

Let's create a model that combines the venue categories neighbouring yoga classes with the population data.

In [53]:
from sklearn.linear_model import LinearRegression
from sklearn import linear_model

In [54]:
CategoryesData['VenueCount'].shape

(61,)

In [55]:
ComposedData=pd.concat(
    [
        CategoryesData[['Aquarium','Basketball Stadium','Lake','Baseball Stadium']],
        TorrontoData[['Male: 30 to 34 years','Female: 25 to 29 years','Female: 30 to 34 years','Male: 25 to 29 years','Male: 35 to 39 years']],
        CategoryesData['VenueCount']
    ], 
    axis=1, 
    sort=False
).fillna(0)
ComposedData.head()


id,Aquarium,Basketball Stadium,Lake,Baseball Stadium,Male: 30 to 34 years,Female: 25 to 29 years,Female: 30 to 34 years,Male: 25 to 29 years,Male: 35 to 39 years,VenueCount
1,0.0,0.0,0.0,0.0,0.010469,0.011467,0.010331,0.012467,0.010205,0.012658
8,0.0,0.0,0.0,0.0,0.002663,0.002394,0.00268,0.002107,0.002817,0.006329
9,0.0,0.0,0.0,0.0,0.003903,0.003528,0.004106,0.004214,0.004093,0.006329
13,0.0,0.0,0.0,0.0,0.003857,0.00399,0.003804,0.004521,0.003614,0.006329
14,0.0,0.0,0.0,0.0,0.019055,0.017137,0.017247,0.017032,0.018284,0.012658


In [56]:
X = ComposedData[['Aquarium','Basketball Stadium','Lake','Baseball Stadium','Male: 30 to 34 years','Female: 25 to 29 years','Female: 30 to 34 years','Male: 25 to 29 years','Male: 35 to 39 years']]
Y = ComposedData['VenueCount']

In [57]:
regr = linear_model.LinearRegression()
regr.fit(X, Y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [58]:
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)

Intercept: 
 0.0018975963564597928
Coefficients: 
 [ 4.22495483  1.05623871  1.05623871  4.22495483  3.72596594  5.23733584
 -5.51983556 -4.80230652  2.06594061]


In [77]:
ComposedData['ModelValues']=regr.predict(X)
ComposedData['res']=ComposedData['ModelValues']-ComposedData['VenueCount']

Let's select the 2 neighbourhoods where the difference between the model value and the actual share of yoga centres is the largest.

In [78]:
ComposedData.sort_values(by=(['res']), ascending=False).head(2)

id,Aquarium,Basketball Stadium,Lake,Baseball Stadium,Male: 30 to 34 years,Female: 25 to 29 years,Female: 30 to 34 years,Male: 25 to 29 years,Male: 35 to 39 years,VenueCount,ModelValues,res
82,0.0,0.0,0.0,0.0,0.031223,0.030326,0.02922,0.028664,0.023334,0.0,0.026323,0.026323
80,0.0,0.0,0.0,0.0,0.008357,0.011089,0.008169,0.00913,0.006272,0.0,0.015126,0.015126


They are neighbourhoods 82 and 80.
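The selection logic can be sketched on a tiny example. Everything here is hypothetical (four neighbourhoods, a single `demographic_share` feature), and NumPy's least-squares solver stands in for `LinearRegression`; both compute an ordinary least-squares fit with an intercept. A positive residual means the model expects a larger share of yoga studios than the neighbourhood actually has.

```python
import numpy as np
import pandas as pd

# Hypothetical neighbourhoods: share of the target demographic
# and share of all yoga studios located there.
data = pd.DataFrame(
    {
        "demographic_share": [0.10, 0.20, 0.30, 0.40],
        "VenueCount": [0.10, 0.20, 0.00, 0.40],
    },
    index=pd.Index(["A", "B", "C", "D"], name="id"),
)

# Ordinary least squares with an intercept column of ones.
X = np.column_stack([np.ones(len(data)), data["demographic_share"].to_numpy()])
coef, *_ = np.linalg.lstsq(X, data["VenueCount"].to_numpy(), rcond=None)

# Predicted share minus actual share, as in the notebook's 'res' column.
data["ModelValues"] = X @ coef
data["res"] = data["ModelValues"] - data["VenueCount"]

# The neighbourhood with the largest positive residual is the most under-served.
underserved = data.sort_values("res", ascending=False).head(1)
print(underserved)
```

Neighbourhood "C" has the demographics of a place with studios but none of its own, so it comes out on top. Note that the residuals are computed on the same rows the model was fitted on, so they always sum to zero and only their ranking is informative.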

#### Let's look at the map and show the existing yoga studios there

In [74]:
topo = 'https://raw.githubusercontent.com/smtkvb/Python_Coursera_Capstone/master/toronto_topo.json'
temp_data=geo_df_dict.reset_index()
temp_data.loc[(temp_data['id'] !=82) & (temp_data['id'] !=80),'VenueCount']=0
temp_data.loc[(temp_data['id'] ==82) | (temp_data['id'] ==80),'VenueCount']=1
temp_data=temp_data.set_index('id')

m = folium.Map(location=[latitude, longitude], zoom_start=11)

colormap = linear.YlGn_09.scale(0,1)

def style_function(feature):
    data = temp_data.loc[int(feature['properties']['id'])]
    return {
        'fillOpacity': 0.7,
        'weight': 0,
        'fillColor': 'black' if data is None else colormap(data.VenueCount)
    }

folium.TopoJson(
    json.loads(requests.get(topo).text),
    'objects.toronto',
    name='topojson',
    style_function=style_function
).add_to(m)

folium.LayerControl().add_to(m)

yogas = filteredvenuesList.loc[(filteredvenuesList['id']=="082") | (filteredvenuesList['id']=="080")]


# add a circle marker for each existing yoga studio in the two selected neighbourhoods
for lat, lng in zip(yogas['lat'],yogas['lng']):
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        ).add_to(m)



m


In [73]:
TorrontoNeighbourhood.loc[(TorrontoNeighbourhood['id']=="082") | (TorrontoNeighbourhood['id']=="080")]

Idx,id
Niagara,82
Palmerston-Little Italy,80


### So, as we can see, we can place new yoga classes in the Niagara and Palmerston-Little Italy neighbourhoods

In [90]:
ComposedData["resValue"]=ComposedData.res*filteredvenuesList.id.count()

In [92]:
ComposedData.sort_values(by=(['res']), ascending=False).head(2)

id,Aquarium,Basketball Stadium,Lake,Baseball Stadium,Male: 30 to 34 years,Female: 25 to 29 years,Female: 30 to 34 years,Male: 25 to 29 years,Male: 35 to 39 years,VenueCount,ModelValues,res,resValue
82,0.0,0.0,0.0,0.0,0.031223,0.030326,0.02922,0.028664,0.023334,0.0,0.026323,0.026323,4.158982
80,0.0,0.0,0.0,0.0,0.008357,0.011089,0.008169,0.00913,0.006272,0.0,0.015126,0.015126,2.389905
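
The `resValue` column converts a residual share back into a number of studios by multiplying it by the total number of yoga studios found. A short worked example with the figures displayed above follows. The total of 158 is not stated anywhere in the notebook: it is inferred here on the assumption that the smallest displayed share, 0.006329, corresponds to exactly one studio.

```python
# Smallest VenueCount share shown in the notebook output.
# Assumption: it corresponds to a neighbourhood with exactly one studio.
single_studio_share = 0.006329
total_studios = round(1 / single_studio_share)  # inferred total, not stated in the source

# Residuals (model share minus actual share) for the two selected neighbourhoods.
residuals = {"82": 0.026323, "80": 0.015126}

# Residual share multiplied by the total gives the estimated number of missing studios.
estimated_missing = {k: r * total_studios for k, r in residuals.items()}
print(total_studios, estimated_missing)
```

Under that assumption the model suggests room for roughly four more studios in neighbourhood 82 and two in neighbourhood 80, which agrees with the `resValue` column up to rounding of the displayed residuals.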
