## The battle of Neighborhoods - Capstone Project at Coursera Applied Data Science

## Table of contents
* [Introduction. Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

### Introduction. Business problem <a name="introduction"></a>
STEM education (where STEM stands for **S**cience, **T**echnology, **E**ngineering, **M**athematics) is not new for developed countries, where technical education is already a part of the school curriculum. However, in developing countries like Ukraine, where I am from, STEM education is still rare at regular schools and is delivered as a service by technical clubs. So it is up to parents and kids whether a child attends such courses.

On the other hand, such clubs are commercial projects that aim to make a profit. Although more and more clubs have opened in recent years, a large share of the market of technical education for children is still uncovered. The most lucrative option is opening a new Robotics club in Kyiv - the capital of Ukraine, with more than 3 million people officially living there (the real number is much higher).

Starting such a business is not hard - a couple of franchise offers are readily available. But the main challenge that would define the success of this business is **choosing a proper place to start** - that is where Foursquare and data analysis come into play.

#### Business problem
A mistake in choosing the place might cost the business its existence and the loss of a large part of the investment. Hence the entrepreneur planning to start this business, as the main stakeholder, would welcome support in this choice.

To choose the best place for starting a Robotics club we need to take into account many aspects. The most important of them are:
- How to divide the city into neighborhoods (no such information is readily available, so a decision must be made)
- Number of schools in the neighborhood (the more - the better)
- Number of rivals (technical / math courses for school children; the fewer - the better)
- Number of other non-technical courses for the same audience (they might be interested in other education as well)
- Cost of rent (requires some work to get indirect markers of price from Foursquare, as rent prices are not available there)

Exploring Foursquare data for the city should answer most of the questions above (and, with some enhancements described in the data section, all of them), so let us move on to define what data we can collect and how to use it.

### Data - sourcing and usage <a name="data"></a>

First let's import the necessary libraries and then move on to an explanation of how the data will be sourced and used.

In [1]:
#importing libraries
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if geopy is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

To answer the main question of the project, i.e. to choose the best place to start a Robotics club, the following steps will be performed:

**Training data** 
We need to divide the city at least into districts, if not neighborhoods. Geospatial data is readily available for neither of them, so I have decided to use districts and pick one point for each as the center of a circle covering the most inhabited area (visible on satellite images). The main logic here is that a club that works on weekends and weekday evenings is better started where most people live.
This led me to create the following table:



In [2]:
district_centers = pd.read_csv('districts_rent.csv')
district_centers = district_centers.sort_values('District')
district_centers.reset_index(drop = True, inplace = True)
district_centers.head()

| Unnamed: 0 | District | Lat | Lon | Rent |
|---|---|---|---|---|
| 0 | Darnicky | 50.406387 | 30.648453 | 12000 |
| 1 | Desnansky | 50.517948 | 30.604093 | 7800 |
| 2 | Dniprovsky | 50.443046 | 30.62397 | 9200 |
| 3 | Holoseevsky | 50.349179 | 30.550377 | 13900 |
| 4 | Obolonsky | 50.557136 | 30.319664 | 11500 |


As you can see, I have already added a column with the rent rate (it is in UAH (Ukrainian hryvnias) per two-room flat). These are not commercial real estate prices, which I have not found on the web; however, they give a clue as to which districts are more expensive when building a model.
As a source of information I have used an infographic from olx.ua. In case you do not read Cyrillic: it shows the districts of Kyiv and the average prices for one-, two- and three-room flats during 1H 2019.

![Average rent rates by Kyiv district, 1H 2019 (olx.ua infographic)](1h2019_rentrates_districts.jpg)

**Parameters of model**
The main parameters of the model that will help us score the alternatives are primary schools / rivals / other courses / rent level.
All of these can be obtained from Foursquare data. A plain search request with search_query = 'primary school' in Kyiv returns only a couple of primary schools, and the rest is plenty of various courses, language schools, coworking spaces etc. So venue categories are used instead:
 1. Using the category we can filter only schools and, for each neighborhood, record the number of schools within a 1.5 km radius (a circle 3 km across, as in the requests below)
 2. Using the category we can also filter rivals (technical education) and other courses for the same age group (non-technical education). Those quantities give us rivals and complements.
 3. Finally, we already have rent prices for the districts, though for living space. So we should normalize them and use them in the model.
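
Step 3 mentions normalizing the residential rent. One possible way to do this is min-max scaling to the [0, 1] range; the sketch below is only an illustration under that assumption and is not necessarily what the final model uses. The helper `min_max_scale` and the column name `Rent_norm` are hypothetical; the values are the first five rows of the district table above.

```python
import pandas as pd

# First five districts and their average rent (UAH, two-room flat), copied from the table above
districts = pd.DataFrame({
    'District': ['Darnicky', 'Desnansky', 'Dniprovsky', 'Holoseevsky', 'Obolonsky'],
    'Rent': [12000, 7800, 9200, 13900, 11500],
})

def min_max_scale(series):
    """Rescale a numeric Series linearly so that its minimum maps to 0 and its maximum to 1."""
    span = series.max() - series.min()
    if span == 0:
        # all values are equal: return zeros instead of dividing by zero
        return series * 0.0
    return (series - series.min()) / span

# 'Rent_norm' is a hypothetical column name used only in this sketch
districts['Rent_norm'] = min_max_scale(districts['Rent'])
print(districts)
```

With this scaling the cheapest district in the sample gets 0 and the most expensive one gets 1, which makes rent comparable with count-based features on a similar scale.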


**Alternatives**
Using the same website, olx.ua, I have picked up to three alternatives per district (some districts did not have enough offers, so for them there are only 1-2) of possible office spaces that can be used for starting a Robotics club. I have chosen them by filtering on a floor area of 90-130 m2, readiness for use (no renovation needed) and easy access.
All alternatives are collected in the following table:

In [3]:
alternatives = pd.read_csv('alternatives.csv')
alternatives = alternatives.sort_values('District')
alternatives.reset_index(drop = True, inplace = True)
alternatives.head()

| Unnamed: 0 | District | Address | Square_m2 | Rent |
|---|---|---|---|---|
| 0 | Darnicky | Kyiv, Hryhorenka 20 | 90 | 24000 |
| 1 | Darnicky | Kyiv, Chavdar 3 | 110 | 38500 |
| 2 | Darnicky | Kyiv, Dniprovska Embarkment 26 | 120 | 60000 |
| 3 | Desnansky | Kyiv, Marshala Zhukova 33a | 112 | 11000 |
| 4 | Dniprovsky | Kyiv, Kharkivske Highway 188 | 90 | 28000 |


As you can see, the floor area varies, so for comparison purposes we should add a column with rent per m2 and remove the Square_m2 and Rent columns.

In [4]:
alternatives['Rent_m2'] = alternatives['Rent']/alternatives['Square_m2']
alternatives = alternatives.drop(columns=["Square_m2","Rent"])

In [8]:
alternatives.head()

| Unnamed: 0 | District | Address | Rent_m2 |
|---|---|---|---|
| 0 | Darnicky | Kyiv, Hryhorenka 20 | 266.666667 |
| 1 | Darnicky | Kyiv, Chavdar 3 | 350.0 |
| 2 | Darnicky | Kyiv, Dniprovska Embarkment 26 | 500.0 |
| 3 | Desnansky | Kyiv, Marshala Zhukova 33a | 98.214286 |
| 4 | Dniprovsky | Kyiv, Kharkivske Highway 188 | 311.111111 |


We have addresses, but for Foursquare and Folium we need coordinates. So let's use the Nominatim geolocator to turn addresses into coordinates. 
(It turned out that the geolocator accepts only the exact spelling of a street name, so finding the proper transliteration of Cyrillic letters cost me a couple of hours.)
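
Because a geocoder may fail on one transliteration and succeed on another, one option is to try several candidate spellings in turn and keep the first one that resolves. The sketch below is only an illustration: `first_resolved` and the alternative spellings are hypothetical, and `geocode` can be any callable that returns an object with `latitude` and `longitude` attributes or `None`. A stub is used instead of a real geocoder so that the example runs offline.

```python
from collections import namedtuple

# Stand-in for the location object returned by a geocoder (only the two fields used here)
Location = namedtuple('Location', ['latitude', 'longitude'])

def first_resolved(candidates, geocode):
    """Return (spelling, location) for the first candidate the geocoder resolves, else (None, None)."""
    for spelling in candidates:
        location = geocode(spelling)
        if location is not None:
            return spelling, location
    return None, None

# Offline stub that imitates a geocoder knowing only one exact spelling.
# The coordinates are the ones obtained for this address later in the notebook.
def stub_geocode(address):
    known = {'Kyiv, Hryhorenka 20': Location(50.410007, 30.626258)}
    return known.get(address)

# Hypothetical alternative transliterations of the same street name
candidates = ['Kyiv, Grigorenko 20', 'Kyiv, Hryhorenko 20', 'Kyiv, Hryhorenka 20']
spelling, location = first_resolved(candidates, stub_geocode)
print(spelling, location)
```

With geopy, the `geocode` method of a `Nominatim` instance could be passed in place of `stub_geocode`.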

In [5]:
#Getting coordinates for each address in the alternatives table
geolocator = Nominatim(user_agent="ny_explorer") # one geolocator instance is enough for all requests
lat = []
lon = []
for address in alternatives['Address']:
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    lat.append(location.latitude if location is not None else np.nan)
    lon.append(location.longitude if location is not None else np.nan)

#Adding those lists back to the alternatives DataFrame
alternatives['lat'] = lat
alternatives['lon'] = lon


In [7]:
alternatives.head()

| Unnamed: 0 | District | Address | Rent_m2 | lat | lon |
|---|---|---|---|---|---|
| 0 | Darnicky | Kyiv, Hryhorenka 20 | 266.666667 | 50.410007 | 30.626258 |
| 1 | Darnicky | Kyiv, Chavdar 3 | 350.0 | 50.393878 | 30.619779 |
| 2 | Darnicky | Kyiv, Dniprovska Embarkment 26 | 500.0 | 50.394111 | 30.613044 |
| 3 | Desnansky | Kyiv, Marshala Zhukova 33a | 98.214286 | 50.477461 | 30.636167 |
| 4 | Dniprovsky | Kyiv, Kharkivske Highway 188 | 311.111111 | 50.407044 | 30.673148 |


Let us see all these alternatives on the map of Kyiv

In [8]:
#getting coordinates for Kyiv 
address = 'Kyiv, Ukraine'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Kyiv using latitude and longitude values
map_kyiv = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers of alternatives to map
for lat, lng, label in zip(alternatives['lat'], alternatives['lon'], alternatives['Address']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kyiv)  

# add markers of districts to map
for lat, lng, label in zip(district_centers['Lat'], district_centers['Lon'], district_centers['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='red',
        fill=False,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kyiv)  
    

#showing map
map_kyiv

We can see that one of the alternatives appeared on the map not in Kyiv but in the neighboring city of Boryspil, so this alternative should be deleted.

In [9]:
#Get names of indexes for which column Address equals to Kyiv, Mechnikova 2
indexName = alternatives[alternatives['Address']=='Kyiv, Mechnikova 2'].index

# Delete these row indexes from dataFrame
alternatives.drop(indexName , inplace=True)

In [10]:
# create map of Kyiv using latitude and longitude values
map_kyiv = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers of alternatives to map
for lat, lng, label in zip(alternatives['lat'], alternatives['lon'], alternatives['Address']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kyiv)  

# add markers of districts to map
for lat, lng, label in zip(district_centers['Lat'], district_centers['Lon'], district_centers['District']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=7,
        popup=label,
        color='red',
        fill=False,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kyiv)  
    

#showing map
map_kyiv

### Foursquare - getting the key

So now we have the coordinates of the centers of all districts (call it training data) and the coordinates of the alternatives, as well as information on rent rates. In the next step we will explore Foursquare data for each place and its neighborhood to find the number of schools, entertainment venues and other courses. (I've decided not to use restaurants/cafes as a proxy, as that returns a lot of venues and quickly uses up the free daily limit.)

The necessary category IDs were found at https://developer.foursquare.com/docs/resources/categories

For schools they are:
- Preschool : 52e81612bcbc57f1066b7a45
- Elementary School : 4f4533804b9074f6e4fb0105
- Middle School : 4f4533814b9074f6e4fb0106

For possible rivals they are:
- Entertainment Service : 56aa371be4b08b9a8d573554

For alternative courses: 
- Language School : 52e81612bcbc57f1066b7a48
- Recreation Center : 52e81612bcbc57f1066b7a26
- Arts and Entertainment : 4d4b7104d754a06370d81259
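
The category IDs listed above can be kept in one dictionary so that a single loop covers all of them. The sketch below only builds the query parameters for the `venues/explore` endpoint used later in this notebook and sends no request. `CATEGORY_IDS` and `build_explore_params` are hypothetical names, the credentials are placeholders, and the default radius and limit simply mirror the values used in the cells below.

```python
# Category names and IDs as listed above
CATEGORY_IDS = {
    'Preschool': '52e81612bcbc57f1066b7a45',
    'Elementary School': '4f4533804b9074f6e4fb0105',
    'Middle School': '4f4533814b9074f6e4fb0106',
    'Entertainment Service': '56aa371be4b08b9a8d573554',
    'Language School': '52e81612bcbc57f1066b7a48',
    'Recreation Center': '52e81612bcbc57f1066b7a26',
    'Arts and Entertainment': '4d4b7104d754a06370d81259',
}

def build_explore_params(category_id, lat, lng, radius=1500, limit=100,
                         client_id='YOUR_ID', client_secret='YOUR_SECRET',
                         version='20180605'):
    """Build the query-string parameters for one venues/explore request."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'categoryId': category_id,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }

# One parameter set per category for a single point (Darnicky district center from the table above)
all_params = {name: build_explore_params(cat_id, 50.406387, 30.648453)
              for name, cat_id in CATEGORY_IDS.items()}
print(all_params['Preschool'])
```

A dictionary like this can be passed to `requests.get` through its `params` argument instead of formatting the URL by hand.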

I have found with some surprise that rivals (technical education clubs) are listed on Foursquare not in an education-related category but as an entertainment service. So I had to adapt my initial plan: instead of modelling the expected number of rivals only, I build a model for the overall number of entertainment service venues expected for a given surrounding and then compare it with the actual number. Then I would simply check the exact locations of all direct rivals and eliminate alternatives where a rival is less than 2 km away from the alternative's coordinates (straight-line distance; the actual distance via roads might be higher).
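
The 2 km rule described above can be sketched with the haversine (great-circle) distance. This is only an illustration: `haversine_km` and `has_rival_within` are hypothetical helpers, the rival location is invented, and 6371 km is the commonly used mean Earth radius.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def has_rival_within(point, rivals, max_km=2.0):
    """True if any rival lies closer than max_km to the point; both are (lat, lon) pairs."""
    return any(haversine_km(point[0], point[1], r[0], r[1]) < max_km for r in rivals)

# Two alternatives from the notebook (coordinates as geocoded above)
candidate_points = {
    'Kyiv, Hryhorenka 20': (50.410007, 30.626258),
    'Kyiv, Marshala Zhukova 33a': (50.477461, 30.636167),
}
# Hypothetical rival location, invented for this example only
rivals = [(50.412000, 30.630000)]

# Keep only the alternatives that have no rival within 2 km
kept = [addr for addr, pt in candidate_points.items() if not has_rival_within(pt, rivals)]
print(kept)
```

In this made-up case the first alternative is dropped because the invented rival is a few hundred metres away, while the second one stays.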

In [11]:
#Credentials
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder)
VERSION = '20180605' # Foursquare API version

In [12]:
#declaring function to collect venues data for one Foursquare category (Preschool by default)
def getNearbyVenues(names, latitudes, longitudes, category = '52e81612bcbc57f1066b7a45', radius=1500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION,
            category,
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only relevant information for each nearby venue
        venues_list.extend([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # naming the columns in the constructor also works when no venues were returned
    nearby_venues = pd.DataFrame(venues_list, columns=['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category'])
    
    return nearby_venues

#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
#Getting Preschools data
venues_districts_preschool = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon']                                   
                            )

venues_alternatives_preschool = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon']                                   
                            )

In [13]:
#Getting Elementary School data with the getNearbyVenues function declared above (only the category changes)
#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
venues_districts_elementary = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='4f4533804b9074f6e4fb0105',
                                   radius=1500
                            )

venues_alternatives_elementary = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='4f4533804b9074f6e4fb0105',
                                   radius=1500
                            )

In [14]:
#Getting Middle School data with the getNearbyVenues function declared above (only the category changes)
#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
venues_districts_middle = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='4f4533814b9074f6e4fb0106',
                                   radius=1500
                            )

venues_alternatives_middle = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='4f4533814b9074f6e4fb0106',
                                   radius=1500
                            )

In [15]:
#Getting Entertainment Service data with the getNearbyVenues function declared above (only the category changes)
#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
venues_districts_entertainmentservice = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='56aa371be4b08b9a8d573554',
                                   radius=1500
                            )

venues_alternatives_entertainmentservice = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='56aa371be4b08b9a8d573554',
                                   radius=1500
                            )

In [16]:
#Getting Language School data with the getNearbyVenues function declared above (only the category changes)
#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
venues_districts_languageschools = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='52e81612bcbc57f1066b7a48',
                                   radius=1500
                            )

venues_alternatives_languageschools = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='52e81612bcbc57f1066b7a48',
                                   radius=1500
                            )

In [17]:
#Getting Recreation Center data with the getNearbyVenues function declared above (only the category changes)
#Getting information of not more than LIMIT venues of the category for each pair of coordinates in Districts and Alternatives
LIMIT = 100
venues_districts_recreationcenters = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='52e81612bcbc57f1066b7a26',
                                   radius=1500
                            )

venues_alternatives_recreationcenters = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='52e81612bcbc57f1066b7a26',
                                   radius=1500
                            )

In [18]:
#Getting Arts and Entertainment data with the getNearbyVenues function declared above
#For this category the radius is 500 m and LIMIT is raised to 300
LIMIT = 300
venues_districts_artentertainment = getNearbyVenues(names=district_centers['District'], 
                                   latitudes=district_centers['Lat'],
                                   longitudes=district_centers['Lon'],
                                   category='4d4b7104d754a06370d81259',
                                   radius=500
                            )

venues_alternatives_artentertainment = getNearbyVenues(names=alternatives['Address'], 
                                   latitudes=alternatives['lat'],
                                   longitudes=alternatives['lon'],
                                   category='4d4b7104d754a06370d81259',
                                   radius=500
                            )

So we have collected lists of venues for both districts and alternatives in the following categories:

- Preschool: venues_districts_preschool / venues_alternatives_preschool
- Elementary School: venues_districts_elementary / venues_alternatives_elementary
- Middle School: venues_districts_middle / venues_alternatives_middle

Category which contains possible rivals: 

- Entertainment service: venues_districts_entertainmentservice / venues_alternatives_entertainmentservice

And categories whose clients might be interested in technical education:

- Language schools: venues_districts_languageschools / venues_alternatives_languageschools
- Recreation centers: venues_districts_recreationcenters / venues_alternatives_recreationcenters
- Art / Entertainment: venues_districts_artentertainment / venues_alternatives_artentertainment

For categories that have subgroups, the Venue Category field contains the name of the subgroup, so the groupby method in pandas would treat them as different categories. So let's unify the Venue Category column for each dataframe.

In [19]:
venues_districts_preschool['Venue Category'] = 'Preschool'
venues_alternatives_preschool['Venue Category'] = 'Preschool'

venues_districts_elementary['Venue Category'] ='Elementary school'
venues_alternatives_elementary['Venue Category'] ='Elementary school'

venues_districts_middle['Venue Category'] ='Middle school'
venues_alternatives_middle['Venue Category'] ='Middle school'

venues_districts_entertainmentservice['Venue Category'] ='Entertainment Service'
venues_alternatives_entertainmentservice['Venue Category'] ='Entertainment Service'

venues_districts_languageschools['Venue Category'] ='Language Schools'
venues_alternatives_languageschools['Venue Category'] ='Language Schools'

venues_districts_recreationcenters['Venue Category'] ='Recreation centers'
venues_alternatives_recreationcenters['Venue Category'] ='Recreation centers'

venues_districts_artentertainment['Venue Category'] ='Art and Entertainment'
venues_alternatives_artentertainment['Venue Category'] ='Art and Entertainment'
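
The assignments above repeat one pattern for every dataframe. Below is a minimal sketch of the same step written as a loop. The dictionary `frames_by_category` and the two toy frames are hypothetical stand-ins, not part of the notebook.

```python
import pandas as pd

# Toy stand-ins for two of the per-category dataframes (hypothetical data).
venues_districts_preschool = pd.DataFrame(
    {"Neighborhood": ["Darnicky", "Desnansky"],
     "Venue Category": ["Daycare", "Preschool"]})
venues_districts_middle = pd.DataFrame(
    {"Neighborhood": ["Darnicky"],
     "Venue Category": ["High School"]})

# Map each unified label to the frames that should carry it (assumed layout).
frames_by_category = {
    "Preschool": [venues_districts_preschool],
    "Middle school": [venues_districts_middle],
}

for label, frames in frames_by_category.items():
    for frame in frames:
        # Overwrite subgroup names with one label per category.
        frame["Venue Category"] = label

print(venues_districts_preschool["Venue Category"].unique())
```

Keeping the labels in one place also makes it harder to mistype a category name in only one of the two dataframes.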

Let's merge them into two dataframes: venues_districts and venues_alternatives.

In [20]:
venues_districts = pd.concat([venues_districts_preschool,venues_districts_elementary,venues_districts_middle,venues_districts_entertainmentservice,venues_districts_languageschools,venues_districts_recreationcenters,venues_districts_artentertainment])
venues_alternatives = pd.concat([venues_alternatives_preschool,venues_alternatives_elementary,venues_alternatives_middle,venues_alternatives_entertainmentservice,venues_alternatives_languageschools,venues_alternatives_recreationcenters,venues_alternatives_artentertainment])

In [21]:
#checking count of records in each dataframe
print(len(venues_districts))
print(len(venues_alternatives))

205
1181


For the analysis we need counts (frequencies) rather than lists of venues, so let's group those dataframes by point name and venue category.

In [22]:
venues_districts.groupby('Neighborhood')['Venue Category'].value_counts()

Neighborhood    Venue Category       
Darnicky        Language Schools         15
                Recreation centers        6
                Art and Entertainment     5
                Middle school             4
                Elementary school         3
                Preschool                 1
Desnansky       Language Schools          6
                Art and Entertainment     5
                Elementary school         4
                Middle school             4
                Preschool                 4
                Recreation centers        2
Dniprovsky      Language Schools          4
                Art and Entertainment     3
                Elementary school         3
                Recreation centers        3
                Preschool                 2
                Middle school             1
Holoseevsky     Recreation centers        1
Obolonsky       Recreation centers        1
Pechersky       Language Schools         11
                Art and Entertainment 

In [23]:
venues_alternatives.groupby('Neighborhood')['Venue Category'].value_counts()

Neighborhood                    Venue Category       
Kyiv, Andriivskyi descent 2     Language Schools         33
                                Art and Entertainment    30
                                Elementary school         4
                                Recreation centers        3
                                Middle school             1
                                Preschool                 1
Kyiv, Antonovycha 103           Language Schools         23
                                Art and Entertainment     9
                                Recreation centers        7
                                Elementary school         3
                                Preschool                 3
                                Entertainment Service     2
                                Middle school             2
Kyiv, Antonovycha 4/6           Language Schools         77
                                Art and Entertainment    19
                                Middle school 
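
For the later steps, a wide table with one row per point and one column per category may be easier to work with than the long listing above. Below is a minimal sketch using `pd.crosstab`. The small frame `venues` is a hypothetical sample in the same layout as `venues_districts`.

```python
import pandas as pd

# Hypothetical sample in the same layout as venues_districts.
venues = pd.DataFrame({
    "Neighborhood": ["Darnicky", "Darnicky", "Darnicky", "Desnansky", "Desnansky"],
    "Venue Category": ["Language Schools", "Language Schools", "Preschool",
                       "Preschool", "Middle school"],
})

# Rows: points; columns: categories; cells: venue counts (0 where none found).
counts = pd.crosstab(venues["Neighborhood"], venues["Venue Category"])
print(counts)
```

Categories that never occur for a point show up as 0 instead of being omitted, which matters for districts such as Holoseevsky where only one category was found.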

To summarize the data section: we have lists of venues in categories which might correlate with the number of entertainment places around. However, Entertainment service is a rare category, and within the districts there are 0 Entertainment services, so we cannot use it as the variable to predict. So I have decided to use the data on schools, recreation centers and language schools to estimate the corresponding number of Art/Entertainment places.

Now we move to the Methodology section, where I explain how I plan to use this data to build a model and find the best of the alternatives.

### Methodology <a name="methodology"></a>

In this section I will briefly explain how the data we collected will be used to choose the best of the alternatives.

**Step 1.**
We have counts and rent rates in absolute numbers. So in the first step we divide all values in each column by the maximum expected for that column.

**Step 2.**
The Districts dataframe will be our "train data" and the Alternatives dataframe our "test data". But before we move on, we should check whether all input parameters (venue categories) are of any use for a regression model. We do this by checking the correlation matrix.

**Step 3.** Create, train and review a Multiple Regression model to forecast Art&Entertainment venues based on the determinants selected in Step 2.

**Step 4.** Apply the model to the alternatives to get the expected number of Art/Entertainment venues around each of them. Keep the top 5 variants with the highest ratio of expected to actual count of Entertainment venues. These alternatives should be the most interesting ones, and the best one is chosen among them.

**Step 5.** You may have noticed that so far we have not mentioned direct rivals or used them in our comparison. I have collected a database of existing Robotics Clubs in Kyiv, so we need to define a function that calculates the distance to the nearest Robotics Club and apply it to the alternatives. Of two otherwise equivalent alternatives, the better one is the one farther from a direct competitor. Results will be discussed in the corresponding section.

Let us now apply the above methodology to our data.

### Analysis <a name="analysis"></a>

**Step 1. Input/Output values**

Let's first look once again at the data we collected, using the describe() method to check the range of values and decide how to bring them to the 0-1 range.

In [24]:
district_venues = pd.read_csv('districts_venues.csv')
district_venues.drop(district_venues.iloc[:, 10:13], inplace=True, axis=1)
district_venues.describe()

Unnamed: 0,Lat,Lon,Rent,Pre,Elem,Mid,Art_Enter,Recreation,Language
count,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0,10.0
mean,50.452297,30.496663,12190.0,1.9,2.3,2.0,4.3,2.7,7.2
std,0.059875,0.118864,4342.157682,1.66333,2.002776,1.943651,4.137901,1.888562,9.413938
min,50.349179,30.319664,7800.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,50.417255,30.38873,9475.0,0.25,0.25,0.0,1.25,1.25,0.75
50%,50.44992,30.522555,11400.0,2.0,3.0,2.0,4.0,2.5,3.5
75%,50.485487,30.590664,13275.0,3.5,3.0,4.0,5.0,3.75,9.75
max,50.557136,30.648453,23200.0,4.0,6.0,4.0,13.0,6.0,30.0


In [25]:
alternatives_venues = pd.read_csv('alternatives_venues.csv')
alternatives_venues.drop(alternatives_venues.iloc[:, 11:13], inplace=True, axis=1)
alternatives_venues.describe()

Unnamed: 0,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language
count,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0,27.0
mean,385.072225,50.447043,30.521239,3.074074,2.888889,2.518519,11.62963,4.111111,19.148148
std,135.20431,0.028896,0.079431,1.858989,1.577079,1.528458,11.23233,1.783112,21.315765
min,98.214286,50.393878,30.350865,0.0,0.0,0.0,1.0,0.0,0.0
25%,289.631463,50.429864,30.485419,2.0,2.0,1.0,5.0,3.0,4.5
50%,373.913043,50.446664,30.516659,3.0,3.0,2.0,6.0,4.0,10.0
75%,477.05,50.462555,30.568562,4.0,4.0,3.5,19.0,5.0,27.5
max,620.71,50.505603,30.673148,7.0,6.0,6.0,43.0,7.0,77.0


In [26]:
#dividing each column by max expected value (not max observed) 
district_venues['Rent'] = district_venues['Rent']/30000
alternatives_venues['Rent_m2'] = alternatives_venues['Rent_m2']/1000

district_venues['Pre'] = district_venues['Pre'] / 10
alternatives_venues['Pre'] = alternatives_venues['Pre'] / 10

district_venues['Elem'] = district_venues['Elem'] / 10
alternatives_venues['Elem'] = alternatives_venues['Elem'] / 10

district_venues['Mid'] = district_venues['Mid'] / 10
alternatives_venues['Mid'] = alternatives_venues['Mid'] / 10

district_venues['Art_Enter'] = district_venues['Art_Enter'] / 100
alternatives_venues['Art_Enter'] = alternatives_venues['Art_Enter'] / 100

district_venues['Recreation'] = district_venues['Recreation'] / 10
alternatives_venues['Recreation'] = alternatives_venues['Recreation'] / 10

district_venues['Language'] = district_venues['Language'] / 10
alternatives_venues['Language'] = alternatives_venues['Language'] / 10
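
The same scaling can be written once as a mapping from column name to expected maximum. This is only a sketch: the helper `scale_by_expected_max` and the constant `EXPECTED_MAX` are hypothetical names, the divisors are the ones used in the cell above, and the sample row holds the Darnicky values reconstructed from the scaled output (for example, rent 0.4 * 30000 = 12000).

```python
import pandas as pd

# Expected maxima copied from the cell above (rent differs between the two frames).
EXPECTED_MAX = {"Pre": 10, "Elem": 10, "Mid": 10,
                "Art_Enter": 100, "Recreation": 10, "Language": 10}

def scale_by_expected_max(df, expected_max):
    """Return a copy of df with each listed column divided by its expected maximum."""
    scaled = df.copy()
    for column, max_value in expected_max.items():
        scaled[column] = scaled[column] / max_value
    return scaled

# Darnicky row with raw values reconstructed from the scaled head() output.
sample = pd.DataFrame({"Rent": [12000.0], "Pre": [1], "Elem": [3], "Mid": [4],
                       "Art_Enter": [5], "Recreation": [6], "Language": [15]})
scaled = scale_by_expected_max(sample, {**EXPECTED_MAX, "Rent": 30000})
print(scaled)
```

In this sketch `Language` for Darnicky comes out as 1.5, so with a divisor of 10 that column does not stay within the 0-1 range.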

In [27]:
district_venues.head()

Unnamed: 0,District,Lat,Lon,Rent,Pre,Elem,Mid,Art_Enter,Recreation,Language
0,Darnicky,50.406387,30.648453,0.4,0.1,0.3,0.4,0.05,0.6,1.5
1,Desnansky,50.517948,30.604093,0.26,0.4,0.4,0.4,0.05,0.2,0.6
2,Dniprovsky,50.443046,30.62397,0.306667,0.2,0.3,0.1,0.03,0.3,0.4
3,Holoseevsky,50.349179,30.550377,0.463333,0.0,0.0,0.0,0.0,0.1,0.0
4,Obolonsky,50.557136,30.319664,0.383333,0.0,0.0,0.0,0.0,0.1,0.0


In [28]:
alternatives_venues.head()

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language
0,Podilsky,"Kyiv, Andriivskyi descent 2",0.5508,50.462261,30.517862,0.1,0.4,0.1,0.3,0.3,3.3
1,Holoseevsky,"Kyiv, Antonovycha 103",0.4541,50.423673,30.516659,0.3,0.3,0.2,0.09,0.7,2.3
2,Holoseevsky,"Kyiv, Antonovycha 4/6",0.424905,50.43892,30.512462,0.4,0.3,0.5,0.19,0.4,7.7
3,Svatoshinsky,"Kyiv, Bessarabska square 9/1",0.436975,50.44265,30.522994,0.3,0.2,0.5,0.31,0.3,7.1
4,Podilsky,"Kyiv, Borisoglibska 15",0.62071,50.463149,30.521558,0.1,0.3,0.1,0.22,0.3,2.5


**Step 2. Correlation matrices**

Using the corr() method we can get correlation matrices, but we should pay attention only to Art_Enter and its correlation with the other venue data.

In [29]:
district_venues.corr()

Unnamed: 0,Lat,Lon,Rent,Pre,Elem,Mid,Art_Enter,Recreation,Language
Lat,1.0,-0.505043,-0.399062,0.040708,0.08425,0.033329,0.05063,-0.267446,-0.124113
Lon,-0.505043,1.0,0.138656,0.378944,0.27268,0.356972,0.076352,0.436695,0.34264
Rent,-0.399062,0.138656,1.0,0.279838,-0.411027,0.065827,0.058315,0.110699,0.30721
Pre,0.040708,0.378944,0.279838,1.0,0.577023,0.68737,0.666729,0.449212,0.611667
Elem,0.08425,0.27268,-0.411027,0.577023,1.0,0.599413,0.738749,0.61396,0.668293
Mid,0.033329,0.356972,0.065827,0.68737,0.599413,1.0,0.828916,0.787012,0.661903
Art_Enter,0.05063,0.076352,0.058315,0.666729,0.738749,0.828916,1.0,0.723709,0.80551
Recreation,-0.267446,0.436695,0.110699,0.449212,0.61396,0.787012,0.723709,1.0,0.753706
Language,-0.124113,0.34264,0.30721,0.611667,0.668293,0.661903,0.80551,0.753706,1.0


In [30]:
alternatives_venues.corr()

Unnamed: 0,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language
Rent_m2,1.0,-0.160593,0.104354,0.092544,0.309565,-0.179491,0.344559,0.30791,0.266618
lat,-0.160593,1.0,-0.340897,-0.471114,-0.124666,-0.278296,0.117775,-0.52411,-0.063483
lon,0.104354,-0.340897,1.0,0.065695,-0.188127,0.012334,-0.087063,-0.131479,-0.025865
Pre,0.092544,-0.471114,0.065695,1.0,0.4096,0.337903,-0.07784,0.519558,0.102598
Elem,0.309565,-0.124666,-0.188127,0.4096,1.0,0.008864,0.509995,0.14133,0.442139
Mid,-0.179491,-0.278296,0.012334,0.337903,0.008864,1.0,0.076585,0.175618,0.342263
Art_Enter,0.344559,0.117775,-0.087063,-0.07784,0.509995,0.076585,1.0,-0.326245,0.808101
Recreation,0.30791,-0.52411,-0.131479,0.519558,0.14133,0.175618,-0.326245,1.0,-0.1745
Language,0.266618,-0.063483,-0.025865,0.102598,0.442139,0.342263,0.808101,-0.1745,1.0


It can be seen that in the training (Districts) dataset the correlation between the different factors is quite strong in most cases. Somewhat surprisingly for me, the correlation between rent and the count of Entertainment venues is weak in both sets (0.06 for Districts and 0.34 for Alternatives). This once again convinces me that rent price deserves attention in the results section. However, for building the model we will use all factors.
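
Since only the Art_Enter column of the matrix matters here, it can be read off directly. Below is a minimal sketch, assuming a frame laid out like `district_venues`. The frame `df` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical scaled values in the same layout as district_venues.
df = pd.DataFrame({
    "District": ["A", "B", "C", "D", "E"],
    "Mid": [0.4, 0.4, 0.1, 0.0, 0.2],
    "Language": [1.5, 0.6, 0.4, 0.0, 0.3],
    "Rent": [0.40, 0.26, 0.31, 0.46, 0.38],
    "Art_Enter": [0.05, 0.05, 0.03, 0.00, 0.02],
})

# Keep numeric columns only, then take the Art_Enter column of the matrix.
corr_with_target = (
    df.select_dtypes("number")
      .corr()["Art_Enter"]
      .drop("Art_Enter")
      .sort_values(ascending=False)
)
print(corr_with_target)
```

Sorting the column puts the strongest candidate predictors at the top, which is easier to scan than the full matrix.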

**Step 3. Multiple linear regression model**

First we need to prepare the train and test data and train the model.

In [31]:
from sklearn import linear_model
regr = linear_model.LinearRegression()
x = np.asanyarray(district_venues[['Rent','Pre','Elem','Mid','Recreation','Language']])
y = np.asanyarray(district_venues[['Art_Enter']])
regr.fit(x, y)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,
         normalize=False)

In [32]:
#quick glance at coefficients
regr.coef_

array([[ 0.17295088, -0.11417027,  0.21067547,  0.17305665, -0.08236479,
        -0.00157314]])

In [33]:
regr.intercept_

array([-0.06527889])
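
To review the model, it may help to pair each coefficient with its feature name and to look at R² on the training data. The sketch below uses random synthetic data in place of the 10 district rows, so its numbers are not the notebook's. The names `coef_by_feature` and `r2_train` are my own.

```python
import numpy as np
from sklearn import linear_model

features = ["Rent", "Pre", "Elem", "Mid", "Recreation", "Language"]

# Synthetic stand-in for the 10 district rows (random, NOT the notebook's data).
rng = np.random.default_rng(0)
x = rng.random((10, len(features)))
y = rng.random((10, 1))

regr = linear_model.LinearRegression()
regr.fit(x, y)

# Pair each coefficient with its feature name for easier reading.
coef_by_feature = dict(zip(features, regr.coef_.ravel()))
# R^2 on the training data; with 10 rows and 6 features it tends to be optimistic.
r2_train = regr.score(x, y)
print(coef_by_feature)
print(r2_train)
```

With so few rows relative to the number of features, a high training R² alone says little about how well the model generalizes to the alternatives.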

**Step 4. Expected amount of Art/Entertainment venues**

Now we get the expected Art_Enter for each alternative and finally add to alternatives_venues a column with the ratio of expected to actual Art_Enter.

In [34]:
#creating test data Numpy array
xtest = np.asanyarray(alternatives_venues[['Rent_m2','Pre','Elem','Mid','Recreation','Language']])

In [35]:
#getting predicted values
yhat = regr.predict(xtest)

In [36]:
#adding them to alternatives_venues dataframe
alternatives_venues['Enter_predict'] = pd.DataFrame(yhat)

In [37]:
#adding column with ratio of predicted and actual count
alternatives_venues['Enter_predict_rel'] = alternatives_venues['Enter_predict'] / alternatives_venues['Art_Enter'] 

In [38]:
#sorting values by the column we've just added
alternatives_venues = alternatives_venues.sort_values('Enter_predict_rel', ascending = False)
alternatives_venues.reset_index(drop = True, inplace = True)
alternatives_venues.head()

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language,Enter_predict,Enter_predict_rel
0,Dniprovsky,"Kyiv, Miropilska 25",0.25,50.468884,30.620705,0.2,0.1,0.4,0.01,0.3,0.4,0.020076,2.007629
1,Svatoshinsky,"Kyiv, Peremohy Ave 131A",0.130435,50.454428,30.350865,0.2,0.3,0.4,0.02,0.5,0.3,0.025217,1.260842
2,Dniprovsky,"Kyiv, Raisy Okipnoi 10",0.54529,50.44782,30.590449,0.3,0.4,0.2,0.05,0.6,1.0,0.062668,1.253358
3,Darnicky,"Kyiv, Dniprovska Embarkment 26",0.5,50.394111,30.613044,0.7,0.4,0.3,0.03,0.5,0.7,0.035181,1.172698
4,Svatoshinsky,"Kyiv, Palladina 20",0.373913,50.461059,30.356947,0.2,0.3,0.4,0.07,0.4,0.4,0.075406,1.077225
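
As a check on the numbers, the prediction for the top row can be reproduced by hand from the coefficients and intercept printed in Step 3. This is a worked example in plain Python, with the scaled inputs copied from the "Kyiv, Miropilska 25" row. The variable names are my own.

```python
# Coefficients and intercept as printed by regr.coef_ and regr.intercept_ in Step 3.
coef = [0.17295088, -0.11417027, 0.21067547, 0.17305665, -0.08236479, -0.00157314]
intercept = -0.06527889

# Scaled inputs for "Kyiv, Miropilska 25": Rent_m2, Pre, Elem, Mid, Recreation, Language.
features = [0.25, 0.2, 0.1, 0.4, 0.3, 0.4]
art_enter_actual = 0.01

# Linear model: intercept plus the sum of coefficient * feature value.
predicted = intercept + sum(c * v for c, v in zip(coef, features))
ratio = predicted / art_enter_actual
print(round(predicted, 6), round(ratio, 4))
```

The result agrees with the `Enter_predict` and `Enter_predict_rel` values in the first row of the table, up to the rounding of the printed coefficients.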


**Step 5. Distance to rivals**

In [40]:
#loading data on rivals
rivals = pd.read_csv('rivals.csv')
rivals.head()

Unnamed: 0,RivalName,Lat,Lon
0,Vunakhidnyk,50.499436,30.516768
1,Inventor,50.393479,30.627029
2,Vunakhidnyk,50.459785,30.401142
3,RoboKids,50.44542,30.440214
4,Vunakhidnyk,50.427346,30.542137


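I will use the equirectangular approximation to calculate the approximate distance between two points on a sphere (the Earth's surface):

$$x = (\lambda_2 - \lambda_1)\cos\left(\frac{\varphi_1 + \varphi_2}{2}\right)$$

$$y = \varphi_2 - \varphi_1$$

$$d = R\sqrt{x^2 + y^2}$$

where $\lambda$ is longitude and $\varphi$ is latitude, both in radians, and $R$ is the radius of the Earth.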

In [41]:
import math

In [42]:
#defining distance function
def dist_circle(lat1,lon1,lat2,lon2):
    #converting degrees into radians
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    lambda1 = math.radians(lon1)
    lambda2 = math.radians(lon2)
    x = (lambda2 - lambda1)*math.cos((phi1+phi2)/2)
    y = (phi2 - phi1)
    #6371 is the mean radius of Earth in km, so the result is in km as well
    dist = math.sqrt(x*x+y*y)*6371
    return dist
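
As a quick sanity check that is not part of the original notebook, the approximation can be compared with the haversine great-circle formula on a pair of points from the data. The function names `equirectangular_km` and `haversine_km` are my own. The coordinates are those of "Kyiv, Miropilska 25" and the rival "Inventor".

```python
import math

EARTH_RADIUS_KM = 6371  # same mean radius as in dist_circle

def equirectangular_km(lat1, lon1, lat2, lon2):
    """Equirectangular approximation, same formula as dist_circle."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    x = (math.radians(lon2) - math.radians(lon1)) * math.cos((phi1 + phi2) / 2)
    y = phi2 - phi1
    return math.sqrt(x * x + y * y) * EARTH_RADIUS_KM

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance by the haversine formula (reference value)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# "Kyiv, Miropilska 25" versus the rival "Inventor" from the rivals table.
approx = equirectangular_km(50.468884, 30.620705, 50.393479, 30.627029)
exact = haversine_km(50.468884, 30.620705, 50.393479, 30.627029)
print(approx, exact, abs(approx - exact))
```

At city scale the two values should differ by far less than a metre, which is why the simpler formula is adequate here.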

In [43]:
#creating lists of coordinates
rivals_lat = rivals['Lat']
rivals_lon = rivals['Lon']
altern_lat = alternatives_venues['lat']
altern_lon = alternatives_venues['lon']

In [44]:
#creating list of min distances
mindist = []
for a_lat, a_lon in zip(altern_lat, altern_lon):
    #distance from this alternative to its nearest rival
    mindist.append(min(dist_circle(a_lat, a_lon, r_lat, r_lon)
                       for r_lat, r_lon in zip(rivals_lat, rivals_lon)))
mindist

[2.5077214248310513,
 1.0688991767998663,
 0.4077698042143992,
 0.9938506675588153,
 1.385827479788947,
 0.7423600207150302,
 0.7907526180352739,
 0.3238899924218183,
 0.35680632383025684,
 0.20577164098892667,
 0.06231078981627709,
 0.5280534816070578,
 0.4704211201444258,
 0.38682293774400056,
 0.3452674140830465,
 0.1520788247531519,
 0.28666370997487317,
 0.24126605015430105,
 1.5518316810907788,
 0.6413419965800033,
 0.43856741008285927,
 0.5158477840476239,
 1.7210821117955344,
 3.958583917720604,
 1.3536770793657744,
 0.9951855181009452,
 0.6186977391443921]
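
The same nearest-rival search can be done without explicit loops by using NumPy broadcasting. This is a sketch under the same equirectangular approximation. The function `min_rival_distance_km` is a hypothetical name, and only the five rivals shown in `rivals.head()` are used, so the numbers will differ from the full list above.

```python
import numpy as np

def min_rival_distance_km(alt_lat, alt_lon, riv_lat, riv_lon, radius_km=6371):
    """Distance (km) from each alternative to its nearest rival (equirectangular)."""
    # Column vectors for alternatives, row vectors for rivals -> (n_alt, n_riv) grid.
    phi_a = np.radians(np.asarray(alt_lat, dtype=float))[:, None]
    lam_a = np.radians(np.asarray(alt_lon, dtype=float))[:, None]
    phi_r = np.radians(np.asarray(riv_lat, dtype=float))[None, :]
    lam_r = np.radians(np.asarray(riv_lon, dtype=float))[None, :]
    x = (lam_r - lam_a) * np.cos((phi_a + phi_r) / 2)
    y = phi_r - phi_a
    # Smallest distance in each row, i.e. per alternative.
    return (np.sqrt(x * x + y * y) * radius_km).min(axis=1)

# Two alternatives and the five rivals shown in rivals.head().
alt_lat, alt_lon = [50.468884, 50.454428], [30.620705, 30.350865]
riv_lat = [50.499436, 50.393479, 50.459785, 50.44542, 50.427346]
riv_lon = [30.516768, 30.627029, 30.401142, 30.440214, 30.542137]
nearest = min_rival_distance_km(alt_lat, alt_lon, riv_lat, riv_lon)
print(nearest)
```

For 27 alternatives the loop is fast enough, so this form is mainly a matter of taste. It pays off only if the lists grow much larger.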

In [45]:
#lets add these min distances to initial table
alternatives_venues['Min Distance to Rival'] = mindist

In [46]:
#let's see how the top 5 look now and whether the leader has changed
alternatives_venues.head()

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language,Enter_predict,Enter_predict_rel,Min Distance to Rival
0,Dniprovsky,"Kyiv, Miropilska 25",0.25,50.468884,30.620705,0.2,0.1,0.4,0.01,0.3,0.4,0.020076,2.007629,2.507721
1,Svatoshinsky,"Kyiv, Peremohy Ave 131A",0.130435,50.454428,30.350865,0.2,0.3,0.4,0.02,0.5,0.3,0.025217,1.260842,1.068899
2,Dniprovsky,"Kyiv, Raisy Okipnoi 10",0.54529,50.44782,30.590449,0.3,0.4,0.2,0.05,0.6,1.0,0.062668,1.253358,0.40777
3,Darnicky,"Kyiv, Dniprovska Embarkment 26",0.5,50.394111,30.613044,0.7,0.4,0.3,0.03,0.5,0.7,0.035181,1.172698,0.993851
4,Svatoshinsky,"Kyiv, Palladina 20",0.373913,50.461059,30.356947,0.2,0.3,0.4,0.07,0.4,0.4,0.075406,1.077225,1.385827


### Results and Discussion <a name="results"></a>

At the moment the alternative "Kyiv, Miropilska 25" looks like the best choice, with a large distance to rivals and a very high ratio of expected to actual entertainment venues. Its ratio is about 1.6 times that of the 2nd alternative (2.01 versus 1.26) and its distance to the nearest rival is more than twice as large (2.51 km versus 1.07 km), so let's have a closer look at this result.

In [48]:
#best alternative
best_alternative = pd.read_csv('alternatives_venues.csv')
best_alternative[best_alternative['Address'] == 'Kyiv, Miropilska 25']

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language,Entern_service
17,Dniprovsky,"Kyiv, Miropilska 25",250.0,50.468884,30.620705,2,1,4,1,3,4,1


In [49]:
#2nd alternative
best_alternative = pd.read_csv('alternatives_venues.csv')
best_alternative[best_alternative['Address'] == 'Kyiv, Peremohy Ave 131A']

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language,Entern_service
21,Svatoshinsky,"Kyiv, Peremohy Ave 131A",130.434783,50.454428,30.350865,2,3,4,2,5,3,0


In [50]:
#3rd alternative
best_alternative = pd.read_csv('alternatives_venues.csv')
best_alternative[best_alternative['Address'] == 'Kyiv, Raisy Okipnoi 10']

Unnamed: 0,District,Address,Rent_m2,lat,lon,Pre,Elem,Mid,Art_Enter,Recreation,Language,Entern_service
22,Dniprovsky,"Kyiv, Raisy Okipnoi 10",545.29,50.44782,30.590449,3,4,2,5,6,10,0
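
The three lookups above can also be done with one boolean mask. Below is a minimal sketch using `isin`. The small frame is a hypothetical excerpt of `alternatives_venues.csv` that keeps only two columns, and `top3_addresses` is a name of my own.

```python
import pandas as pd

# Hypothetical excerpt of alternatives_venues.csv (unscaled rent values).
best_alternative = pd.DataFrame({
    "Address": ["Kyiv, Miropilska 25", "Kyiv, Peremohy Ave 131A",
                "Kyiv, Raisy Okipnoi 10", "Kyiv, Palladina 20"],
    "Rent_m2": [250.0, 130.434783, 545.29, 373.913],
})

top3_addresses = ["Kyiv, Miropilska 25", "Kyiv, Peremohy Ave 131A",
                  "Kyiv, Raisy Okipnoi 10"]

# One boolean mask selects all three rows at once.
top3 = best_alternative[best_alternative["Address"].isin(top3_addresses)]
print(top3)
```

This reads the file once and shows the three candidates side by side, which makes comparing rent and venue counts easier.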


Let's put these three alternatives on the map together with all real rivals (robotics clubs).

In [53]:
# create map of Kyiv using latitude and longitude values
map_kyiv = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers of existing rivals (robotics clubs) to map
for lat, lng, label in zip(rivals['Lat'], rivals['Lon'], rivals['RivalName']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kyiv)

# add markers of best alternatives to map (outside the loop, so each is added once)
folium.CircleMarker(
    [50.468884, 30.620705],
    radius=7,
    popup="Number 1",
    color='red',
    fill=False,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_kyiv)
folium.CircleMarker(
    [50.454428, 30.350865],
    radius=7,
    popup="Number 2",
    color='yellow',
    fill=False,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_kyiv)
folium.CircleMarker(
    [50.44782, 30.590449],
    radius=7,
    popup="Number 3",
    color='green',
    fill=False,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(map_kyiv)

#showing map
map_kyiv

On the map above the red circle is alternative 1, the yellow circle is alternative 2 and the green circle is alternative 3, while all blue circles are existing Robotics Clubs.

This once again shows that the first alternative is the best one. Alternative 3 (green) is very close to two competitors. Moreover, the rent rate of alternative 3 is more than twice that of alternative 1 (545.29 versus 250.0).

Alternative 2 might be viable: it has a lower rent rate and many schools in its surroundings. This alternative might be used if alternative 1 cannot be realized for some reason.

And final thoughts regarding alternative 1. It is situated far from competitors, it is in a residential district where people spend most evenings and weekends, and it has 2 preschools and 4 middle schools nearby. This means that the best choice is to open a Robotics club for the full range of ages, from 4 to 15 years old.

### Conclusion <a name="conclusion"></a>

In this work a choice among alternatives had to be made. Among 27 possible locations across the capital of Ukraine the best one was chosen.
To do this, Folium was first used to represent the alternatives on a map. Then a couple of points in each district were taken to build the regression model used for "scoring" the alternatives. Foursquare data, obtained via HTTPS requests, was used to get the list (and then the number) of venues in those categories that might have (and actually do have) a strong correlation with the number of entertainment venues.

Based on that data and the rent rates, a multiple linear regression model was trained to predict the number of Entertainment venues from the counts of venues in other categories and the rent rate.

At the final stage a function to determine the distance between two points on the surface of the Earth was defined and used together with the list of existing Robotics clubs to find the distance to the nearest rival.

Only one of the alternatives turned out to be outstanding, i.e. having twice the distance to rivals and half as many entertainment venues as would be expected for such surroundings. Moreover, its rent rate is attractive.

So the outcome of this analysis is a recommendation for the investor to start a new Robotics Club at the address: Kyiv, Miropilska 25.

Good Luck!