# IBM Coursera Capstone

## Introduction

When moving to a new city to live and work, one often wants to find a neighbourhood that matches one's usual surroundings in the current city, in terms of urban infrastructure and services.

To solve this problem, you can use Foursquare venue data: divide the city you know into clusters of similar neighbourhoods, then apply the same clustering to the city you are moving to.

Thus, you can get an idea of the most suitable areas for you.

I will solve this problem using the example of the two capitals of my native Russia, Moscow and St. Petersburg. I live in Moscow and will try to find areas in St. Petersburg that suit my preferences.
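The clustering idea described above can be sketched in a few lines: cluster the neighbourhoods of the known city by their venue-category profiles, then assign each neighbourhood of the new city to the nearest cluster centre. This is a minimal NumPy-only k-means (Lloyd's algorithm) under stated assumptions: the feature matrices (rows = neighbourhoods, columns = share of venues per category) and their numbers are hypothetical, and this is not necessarily the clustering method used later in the notebook.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # start from k distinct rows of X
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assign every row to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


def assign_to_clusters(X_new, centroids):
    """Label rows of the other city with the nearest centroid learned on the first city."""
    dists = ((X_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


# hypothetical category shares, columns = [coffee shop, park, grocery]
moscow = np.array([[0.8, 0.1, 0.1],
                   [0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.7, 0.1]])
spb = np.array([[0.75, 0.15, 0.10],
                [0.15, 0.75, 0.10]])

centroids, msk_labels = kmeans(moscow, k=2)
spb_labels = assign_to_clusters(spb, centroids)
print(msk_labels, spb_labels)
```

The cluster numbers themselves are arbitrary; what matters is that a St. Petersburg neighbourhood receives the same label as the Moscow neighbourhoods it most resembles.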

In [339]:
# pip install lxml

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas.io.json.json_normalize was removed in pandas 2.0)

## Data

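The analysis uses three data sources: lists of Moscow districts and settlements and of St. Petersburg municipal formations scraped from Russian Wikipedia; the coordinates of each neighbourhood from the OpenStreetMap Nominatim geocoder; and, for each neighbourhood, the nearby venues and their categories from the Foursquare API (`venues/explore` endpoint, within a 1 km radius). For example, the Moscow district Академический is geocoded to about 55.692 N, 37.589 E, and one of the venues Foursquare returns near it is a Вкусвилл shop in the Health Food Store category.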

Start by reading the Moscow neighbourhood names from Wikipedia.

In [130]:
msk= pd.read_html('https://ru.wikipedia.org/wiki/%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA_%D1%80%D0%B0%D0%B9%D0%BE%D0%BD%D0%BE%D0%B2_%D0%B8_%D0%BF%D0%BE%D1%81%D0%B5%D0%BB%D0%B5%D0%BD%D0%B8%D0%B9_%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D1%8B')[0]
msk.drop(msk.columns[[0,1,2,6,7,8,9,10]], axis=1, inplace=True)
msk.columns =['Borough', 'Neighbourhood', 'Distict']

In [338]:
msk.head(3)

|   | Borough | Neighbourhood | Distict | lat | lon |
|---|---|---|---|---|---|
| 0 | Академический | Академический | ЮЗАО | 55.6920915 | 37.5892595 |
| 1 | Алексеевский | Алексеевский | СВАО | 55.82145565 | 37.643960745306856 |
| 2 | Алтуфьевский | Алтуфьевский | СВАО | 55.880255 | 37.5816349 |


Then read the boroughs and neighbourhoods of Saint Petersburg.

In [117]:
# table 1, column 1, row 0 of the category page: one string holding all the names
spb_neig = pd.read_html('https://ru.wikipedia.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%9C%D1%83%D0%BD%D0%B8%D1%86%D0%B8%D0%BF%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5_%D0%BE%D0%B1%D1%80%D0%B0%D0%B7%D0%BE%D0%B2%D0%B0%D0%BD%D0%B8%D1%8F_%D0%A1%D0%B0%D0%BD%D0%BA%D1%82-%D0%9F%D0%B5%D1%82%D0%B5%D1%80%D0%B1%D1%83%D1%80%D0%B3%D0%B0')[1][1][0]
spb_neig = re.sub('[¹³²⁴⁵⁷⁸]', '', spb_neig)   # drop footnote superscripts
spb_neig = re.sub('№ ', '№', spb_neig)          # keep "№ 7" as a single token "№7"
# remove generic words so the remaining names can be split on whitespace
for word in ['округ', 'остров', 'речка', 'аэродром', 'застава',
             'Остров', 'Озеро', 'меридиан', 'ворота']:
    spb_neig = re.sub(word, '', spb_neig)
spb_neig = re.sub('  ', ' ', spb_neig)

# one row per whitespace-separated token; note that two-word names
# such as "Большая Охта" end up as two separate rows
spb_neig = (pd.Series(spb_neig)
            .str.split(expand=True)
            .transpose()
            .rename(columns={0: 'Neighbourhood'}, errors="raise"))
spb_bor = pd.read_html('https://ru.wikipedia.org/wiki/%D0%90%D0%B4%D0%BC%D0%B8%D0%BD%D0%B8%D1%81%D1%82%D1%80%D0%B0%D1%82%D0%B8%D0%B2%D0%BD%D0%BE-%D1%82%D0%B5%D1%80%D1%80%D0%B8%D1%82%D0%BE%D1%80%D0%B8%D0%B0%D0%BB%D1%8C%D0%BD%D0%BE%D0%B5_%D0%B4%D0%B5%D0%BB%D0%B5%D0%BD%D0%B8%D0%B5_%D0%A1%D0%B0%D0%BD%D0%BA%D1%82-%D0%9F%D0%B5%D1%82%D0%B5%D1%80%D0%B1%D1%83%D1%80%D0%B3%D0%B0')[0]
spb_bor.drop(spb_bor.columns[[0,2,3,4]], axis=1, inplace=True)
spb_bor.rename(columns={"район": "Borough"}, inplace=True)
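To see what the cleaning chain does, here is the same sequence of substitutions applied to a short string in the format the code assumes for the scraped cell. The sample text is made up for illustration, not copied from the Wikipedia page. It also reproduces a side effect visible in the St. Petersburg output further down: splitting on whitespace turns a two-word name such as Большая Охта into two rows.

```python
import re

import pandas as pd

# hypothetical sample resembling the scraped cell: names separated by spaces,
# with footnote superscripts and generic words such as "округ" or "остров"
raw = 'Автово Аптекарский остров¹ Большая Охта № 7 округ Коломна'

text = re.sub('[¹³²⁴⁵⁷⁸]', '', raw)   # drop footnote markers
text = re.sub('№ ', '№', text)        # glue "№ 7" into a single token "№7"
for word in ['округ', 'остров', 'речка', 'аэродром', 'застава',
             'Остров', 'Озеро', 'меридиан', 'ворота']:
    text = re.sub(word, '', text)     # remove generic descriptor words

names = (pd.Series(text)
         .str.split(expand=True)      # split on any run of whitespace
         .transpose()
         .rename(columns={0: 'Neighbourhood'}))
print(names['Neighbourhood'].tolist())
# ['Автово', 'Аптекарский', 'Большая', 'Охта', '№7', 'Коломна']
```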

In [337]:
spb_neig.head(3)

|   | Neighbourhood |
|---|---|
| 0 | Автово |
| 1 | Адмиралтейский |
| 2 | Академическое |


Let's create a function that returns a neighbourhood's coordinates for a given place name, using the OpenStreetMap Nominatim geocoder.

In [303]:
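import time

def getCoordByRegionName(regionName):
    """Geocode a place name with the OpenStreetMap Nominatim search API.

    Returns a one-row DataFrame with 'lat' and 'lon' (as strings);
    both are None if nothing was found."""
    url = 'https://nominatim.openstreetmap.org/search'
    params = {'q': regionName, 'format': 'jsonv2'}
    # Nominatim's usage policy requires an identifying User-Agent
    # (the value below is a placeholder) and at most one request per second
    headers = {'User-Agent': 'ibm-coursera-capstone-notebook'}
    results = requests.get(url, params=params, headers=headers).json()
    time.sleep(1)

    r = pd.DataFrame({'lat': [None], 'lon': [None]})
    try:
        # take the best-ranked match
        r = pd.DataFrame([[results[0]['lat'], results[0]['lon']]], columns=['lat', 'lon'])
    except IndexError:
        print("Oops!  That was an error with: " + regionName)
    return r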
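The geocoding function sends the place name to Nominatim in the `q` query parameter. One way to check how such a Cyrillic query is encoded, without calling the service, is `urllib.parse.urlencode`. `build_nominatim_url` is a hypothetical helper for illustration; the endpoint and the `q`/`format` parameters are the ones the function uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

NOMINATIM_SEARCH = 'https://nominatim.openstreetmap.org/search'


def build_nominatim_url(region_name):
    """Hypothetical helper: build a Nominatim search URL with a percent-encoded query."""
    # urlencode turns spaces into '+' and Cyrillic letters into UTF-8 %XX escapes
    return NOMINATIM_SEARCH + '?' + urlencode({'q': region_name, 'format': 'jsonv2'})


url = build_nominatim_url('округ Арбат , Москва')
print(url)

# decoding the query string gives back the original place name unchanged
print(parse_qs(urlsplit(url).query)['q'])
```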

Then get the coordinates of the Moscow and Saint Petersburg neighbourhoods.

In [304]:
# collect one row per neighbourhood, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for index, x in msk.iterrows():
    coord = getCoordByRegionName('округ ' + x['Neighbourhood'] + ' , Москва')
    rows.append({'lat': coord.iloc[0, 0], 'lon': coord.iloc[0, 1]})
msk_c = pd.DataFrame(rows, columns=['lat', 'lon'])

Oops!  That was an error with: округ Вороновское, поселение , Москва
Oops!  That was an error with: округ Клёновское, поселение , Москва
Oops!  That was an error with: округ Кокошкино, поселение , Москва
Oops!  That was an error with: округ Краснопахорское, поселение , Москва
Oops!  That was an error with: округ Михайлово-Ярцевское, поселение , Москва
Oops!  That was an error with: округ Московский, поселение , Москва
Oops!  That was an error with: округ Мосрентген, поселение , Москва
Oops!  That was an error with: округ Первомайское, поселение , Москва
Oops!  That was an error with: округ Роговское, поселение , Москва
Oops!  That was an error with: округ Рязановское, поселение , Москва
Oops!  That was an error with: округ Троицк, городской округ , Москва
Oops!  That was an error with: округ Щербинка, городской округ , Москва
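The pattern used here (geocode each name, collect one row per neighbourhood, join the coordinates back by index, then drop the rows where the lookup failed, such as the settlements and urban okrugs listed above) can be tried offline with a stub in place of `getCoordByRegionName`. `fake_geocode`, its lookup table, and the coordinate values are hypothetical placeholders.

```python
import pandas as pd

# hypothetical stand-in for the Nominatim lookup: strings as the API returns them, or None
KNOWN = {'Арбат': ('55.75', '37.59'), 'Басманный': ('55.77', '37.66')}


def fake_geocode(name):
    lat, lon = KNOWN.get(name, (None, None))
    if lat is None:
        print('Oops!  That was an error with: ' + name)
    return pd.DataFrame({'lat': [lat], 'lon': [lon]})


neigh = pd.DataFrame({'Neighbourhood': ['Арбат', 'Троицк', 'Басманный']})

# build a list of dicts first; DataFrame.append no longer exists in pandas 2.x
rows = []
for _, x in neigh.iterrows():
    coord = fake_geocode(x['Neighbourhood'])
    rows.append({'lat': coord.iloc[0, 0], 'lon': coord.iloc[0, 1]})
coords = pd.DataFrame(rows, columns=['lat', 'lon'])

# join on the shared 0..n-1 index, drop failed lookups, convert strings to floats
neigh = neigh.join(coords)
neigh = neigh[neigh['lat'].notnull()].copy()
neigh['lat'] = neigh['lat'].astype(float)
neigh['lon'] = neigh['lon'].astype(float)
print(neigh)
```

The join works only because both frames share the same default integer index; if the neighbourhood table had been filtered or re-sorted beforehand, the rows would need to be merged on the name instead.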


In [309]:
msk = msk.join(msk_c)
msk = msk[msk.lat.notnull()].copy()  # drop neighbourhoods Nominatim could not find

In [398]:
msk['lat']= msk['lat'].astype(float)
msk['lon']= msk['lon'].astype(float)
msk.head(3)

|   | Borough | Neighbourhood | Distict | lat | lon |
|---|---|---|---|---|---|
| 0 | Академический | Академический | ЮЗАО | 55.692091 | 37.589259 |
| 1 | Алексеевский | Алексеевский | СВАО | 55.821456 | 37.643961 |
| 2 | Алтуфьевский | Алтуфьевский | СВАО | 55.880255 | 37.581635 |


In [345]:
# collect one row per neighbourhood, then build the DataFrame once
# (DataFrame.append was removed in pandas 2.0)
rows = []
for index, x in spb_neig.iterrows():
    coord = getCoordByRegionName('округ ' + x['Neighbourhood'] + ' , Санкт-Петербург')
    rows.append({'lat': coord.iloc[0, 0], 'lon': coord.iloc[0, 1]})
spb_c = pd.DataFrame(rows, columns=['lat', 'lon'])

Oops!  That was an error with: округ Аптекарский , Санкт-Петербург
Oops!  That was an error with: округ Красненькая , Санкт-Петербург
Oops!  That was an error with: округ Морские , Санкт-Петербург
Oops!  That was an error with: округ Невская , Санкт-Петербург
Oops!  That was an error with: округ Декабристов , Санкт-Петербург
Oops!  That was an error with: округ Сенной , Санкт-Петербург
Oops!  That was an error with: округ Сосновая , Санкт-Петербург
Oops!  That was an error with: округ Поляна , Санкт-Петербург


In [346]:
spb_neig = spb_neig.join(spb_c)
spb_neig = spb_neig[spb_neig.lat.notnull()].copy()  # drop neighbourhoods Nominatim could not find

In [405]:
spb_neig['lat']= spb_neig['lat'].astype(float)
spb_neig['lon']= spb_neig['lon'].astype(float)
spb_neig.head(3)

|   | Neighbourhood | lat | lon |
|---|---|---|---|
| 0 | Автово | 59.858637 | 30.278818 |
| 1 | Адмиралтейский | 59.931054 | 30.296798 |
| 2 | Академическое | 60.011526 | 30.394661 |


Let's visualise the results on Folium maps.

In [358]:
# Matplotlib and associated plotting modules
#!conda install -c conda-forge matplotlib --yes
import matplotlib.cm as cm
import matplotlib.colors as colors

In [356]:
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

In [379]:
msk_center= getCoordByRegionName('Москва')
spb_center= getCoordByRegionName('Санкт-Петербург')

In [404]:
map_moscow = folium.Map(location=msk_center.astype(float).values.tolist()[0], zoom_start=10)

# add a circle marker with a name popup for each neighbourhood
for lat, lon, poi in zip(msk['lat'], msk['lon'], msk['Neighbourhood']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_moscow)

map_moscow

In [407]:
map_spb = folium.Map(location=spb_center.astype(float).values.tolist()[0], zoom_start=10)

# add a circle marker with a name popup for each neighbourhood
for lat, lon, poi in zip(spb_neig['lat'], spb_neig['lon'], spb_neig['Neighbourhood']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_opacity=0.7).add_to(map_spb)

map_spb

The next step is to load venue data from Foursquare. Let's create a function that retrieves the venues around each Moscow and St. Petersburg neighbourhood.

In [408]:
import os

# read the Foursquare credentials from environment variables instead of
# hard-coding and printing them (the variable names are arbitrary)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Foursquare credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))


In [411]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
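The list comprehension inside `getNearbyVenues` relies on the nesting of the explore response: `response` → `groups[0]` → `items[]` → `venue`, with coordinates under `location` and the first entry of `categories` used as the venue category. The sketch below runs the same extraction on a hand-written, heavily trimmed response dictionary (venue names and values are made up), so the structure can be checked without credentials; a real response carries many more fields. `venues_from_response` is a hypothetical helper.

```python
import pandas as pd

# hand-written, trimmed example in the shape getNearbyVenues expects (values are made up)
mock_json = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Cafe A',
                           'location': {'lat': 55.70, 'lng': 37.59},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Park B',
                           'location': {'lat': 55.69, 'lng': 37.58},
                           'categories': [{'name': 'Park'}]}},
            ]
        }]
    }
}


def venues_from_response(json_body, name, lat, lng):
    """Same extraction as in getNearbyVenues, for a single neighbourhood."""
    items = json_body['response']['groups'][0]['items']
    return [(name, lat, lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name']) for v in items]


rows = venues_from_response(mock_json, 'Академический', 55.692091, 37.589259)
df = pd.DataFrame(rows, columns=['Neighborhood', 'Neighborhood Latitude',
                                 'Neighborhood Longitude', 'Venue',
                                 'Venue Latitude', 'Venue Longitude',
                                 'Venue Category'])
print(df[['Venue', 'Venue Category']])
```

Note that `categories[0]` raises an `IndexError` for a venue without categories; the real loop would need a guard if that occurs.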

Now run the function above on every neighbourhood to create venue dataframes for Moscow and St. Petersburg.

In [412]:
msk_venues = getNearbyVenues(names=msk['Neighbourhood'],
                                   latitudes=msk['lat'],
                                   longitudes=msk['lon']
                                  )

Академический
Алексеевский
Алтуфьевский
Арбат
Аэропорт
Бабушкинский
Басманный
Беговой
Бескудниковский
Бибирево
Бирюлёво Восточное
Бирюлёво Западное
Богородское
Братеево
Бутырский
Вешняки
Внуково
Войковский
Восточное Дегунино
Восточное Измайлово
Восточный
Выхино-Жулебино
Гагаринский
Головинский
Гольяново
Даниловский
Дмитровский
Донской
Дорогомилово
Замоскворечье
Западное Дегунино
Зюзино
Зябликово
Ивановское
Измайлово
Капотня
Коньково
Коптево
Косино-Ухтомский
Котловка
Красносельский
Крылатское
Крюково
Кузьминки
Кунцево
Куркино
Левобережный
Лефортово
Лианозово
Ломоносовский
Лосиноостровский
Люблино
Марфино
Марьина Роща
Марьино
Матушкино
Метрогородок
Мещанский
Митино
Можайский
Молжаниновский
Москворечье-Сабурово
Нагатино-Садовники
Нагатинский Затон
Нагорный
Некрасовка
Нижегородский
Новогиреево
Новокосино
Ново-Переделкино
Обручевский
Орехово-Борисово Северное
Орехово-Борисово Южное
Останкинский
Отрадное
Очаково-Матвеевское
Перово
Печатники
Покровское-Стрешнево
Преображенское
Пресненский
Про

In [414]:
spb_venues = getNearbyVenues(names=spb_neig['Neighbourhood'],
                                   latitudes=spb_neig['lat'],
                                   longitudes=spb_neig['lon']
                                  )

Автово
Адмиралтейский
Академическое
Балканский
Большая
Охта
Васильевский
Введенский
Владимирский
Волковское
Гавань
Гагаринское
Георгиевский
Горелово
Гражданка
Дачное
Дворцовый
Екатерингофский
Звёздное
Ивановский
Измайловское
Княжево
Коломна
Коломяги
Комендантский
Константиновское
Кронверкское
Купчино
Ланское
Лахта-Ольгино
Лиговка-Ямская
Литейный
Малая
Охта
Морской
Московская
Нарвский
Народный
Невский
Новоизмайловское
Обуховский
Долгое
Оккервиль
Петровский
Пискарёвка
Полюстрово
Пороховые
Посадский
Правобережный
Прометей
Пулковский
Ржевка
Рыбацкое
Сампсониевское
Светлановское
Северный
Семёновский
Сергиевское
Смольнинское
Сосновское
Ульянка
Урицк
Финляндский
Чкаловское
Шувалово-Озерки
Юго-Запад
Южно-Приморский
Юнтолово
№7
№15
№21
№54
№65
№72
№75
№78


Let's look at what the data looks like.

In [415]:
msk_venues

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Академический | 55.692091 | 37.589259 | Вкусвилл | 55.691800 | 37.588235 | Health Food Store |
| 1 | Академический | 55.692091 | 37.589259 | Парк «Новые Черёмушки» | 55.693547 | 37.589786 | Park |
| 2 | Академический | 55.692091 | 37.589259 | Spirit Fitness | 55.690272 | 37.601362 | Gym / Fitness Center |
| 3 | Академический | 55.692091 | 37.589259 | Подружка | 55.690292 | 37.601388 | Cosmetics Shop |
| 4 | Академический | 55.692091 | 37.589259 | Cats & Dogs | 55.689592 | 37.601016 | Pet Store |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6801 | Филимонковское, поселение | 55.586240 | 37.265358 | Остановка "Анино" | 55.585378 | 37.263972 | Bus Stop |
| 6802 | Щаповское, поселение | 55.352489 | 37.443953 | Atlantic City | 55.355982 | 37.450608 | Campground |
| 6803 | Щаповское, поселение | 55.352489 | 37.443953 | Nikulino Village | 55.344541 | 37.443891 | Resort |
| 6804 | Щаповское, поселение | 55.352489 | 37.443953 | Магазин "Никулинский" | 55.344530 | 37.444056 | Health Food Store |


In [416]:
spb_venues

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Автово | 59.858637 | 30.278818 | Fitness One | 59.853499 | 30.282053 | Gym / Fitness Center |
| 1 | Автово | 59.858637 | 30.278818 | Кода | 59.852220 | 30.283060 | Dance Studio |
| 2 | Автово | 59.858637 | 30.278818 | Подружка | 59.852164 | 30.273006 | Cosmetics Shop |
| 3 | Автово | 59.858637 | 30.278818 | Лэнд | 59.852476 | 30.281625 | Grocery Store |
| 4 | Автово | 59.858637 | 30.278818 | IMPERIAL SHOPS | 59.852355 | 30.268517 | Boutique |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 5062 | №78 | 59.861993 | 30.319444 | Дикси | 59.859379 | 30.303870 | Convenience Store |
| 5063 | №78 | 59.861993 | 30.319444 | Неон | 59.859336 | 30.303708 | Pool Hall |
| 5064 | №78 | 59.861993 | 30.319444 | Остановка "Новоизмайловский 45" | 59.860671 | 30.305284 | Bus Stop |
| 5065 | №78 | 59.861993 | 30.319444 | Лит.ра | 59.863323 | 30.304072 | Beer Store |
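Before clustering, a quick sanity check on these frames is how many venues each neighbourhood returned and how many distinct categories appear; neighbourhoods with very few venues carry little signal, and with `LIMIT = 100` none can return more than 100. The sketch below shows the pandas calls on a tiny frame with the same column names (the rows are made up); on the real data it would be applied to `msk_venues` or `spb_venues`.

```python
import pandas as pd

# tiny made-up frame with the same columns that getNearbyVenues produces
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'B'],
    'Venue': ['v1', 'v2', 'v3', 'v4'],
    'Venue Category': ['Park', 'Coffee Shop', 'Park', 'Bus Stop'],
})

# number of venues returned per neighbourhood
per_neigh = venues.groupby('Neighborhood').size()
print(per_neigh)

# number of distinct venue categories overall and the most common ones
print(venues['Venue Category'].nunique())
print(venues['Venue Category'].value_counts().head(3))
```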


> So we now have an overview of the data; the next step is to analyse the neighbourhoods and find similar clusters in the two cities.
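A common way to turn these venue lists into features is one-hot encoding of `Venue Category` followed by the mean per neighbourhood, which gives the share of each category. For applying a Moscow clustering to St. Petersburg, both cities must share the same feature columns, so the St. Petersburg table is reindexed to the Moscow categories. This is a sketch under assumptions: the rows are made up and `category_profile` is a hypothetical helper, not code from this notebook.

```python
import pandas as pd


def category_profile(venues):
    """Share of each venue category per neighbourhood (rows sum to 1)."""
    onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
    onehot['Neighborhood'] = venues['Neighborhood']
    return onehot.groupby('Neighborhood').mean()


# made-up rows with the same column names as msk_venues / spb_venues
msk_v = pd.DataFrame({'Neighborhood': ['A', 'A', 'B', 'B'],
                      'Venue Category': ['Park', 'Coffee Shop', 'Coffee Shop', 'Coffee Shop']})
spb_v = pd.DataFrame({'Neighborhood': ['X', 'X'],
                      'Venue Category': ['Park', 'Pool Hall']})

msk_prof = category_profile(msk_v)
# align St. Petersburg to Moscow's categories: categories unseen in Moscow are
# dropped and missing ones become 0, so these rows may no longer sum to 1
spb_prof = category_profile(spb_v).reindex(columns=msk_prof.columns, fill_value=0.0)
print(msk_prof)
print(spb_prof)
```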