# CAPSTONE PROJECT: THE BATTLE OF NEIGHBORHOODS

Applied Data Science Capstone Assignment

By. Leonardo Vega, MSc.

---

## GEOREFERENCED REAL ESTATE MARKET STUDY - SUPIA, CALDAS, COLOMBIA

### Business Understanding

A real estate project type must be defined for a plot of land near the town center of Supía, Caldas, Colombia. The landowner, acting as developer, wants a **georeferenced real estate market study** that provides insights to support the decision on which type of project to plan.

For higher geocoding accuracy, geopy's GoogleV3 geocoder is used. The Foursquare API is then used to explore a **Region of Interest** and understand its venue offer.

The study has the following scope:

1. **Environmental analysis (sociodemographics)**: national georeferenced sociodemographic analysis.
2. **Environmental analysis (economic sector)**: national georeferenced analysis of the construction sector.
3. **Market analysis**: national real estate market analysis.
4. **Region of Interest analysis**: building on the conclusions of the national analysis, a Region of Interest is analyzed to gain further insight into what sort of project to plan. The region is chosen for its proximity to Supía and its cultural and economic influence on it.
- Region of Interest
> - Supía
> - Riosucio
> - Marmato
> - Manizales
> - Pereira
> - Medellín
5. **Venue Cluster Analysis**: to refine the understanding of the customs and preferences of people in the Region of Interest.
6. **Places of Interest Analysis**: the real estate market study continues by georeferencing places of interest in Supía, each with its own data set, as shown below. Finally, a map evaluation (distance from a GPS-certified point on the land to the different places) is performed.

| `Places of Interest`            | `Data Source`                         | `Data of Interest`                                                |
| :---                            | :---                                  | :---                                                              |
| Project location                | Topography Study                      | Coordinates                                                       |
| Neighborhoods of Supía          | Supía Development Plan                | Coordinates                                                       |
| Educational Institutions        | Supía Development Plan                | Coordinates                                                       |
| Health Services Entities        | Supía Development Plan                | Coordinates                                                       |
| Current Real Estate Offer       | Real Estate Market Study              | Coordinates, Type, Available Offer, Rooms, Social Level, Price/m2 |
| Venue offer 2                   | Foursquare                            | Coordinates, venue, venue category                                |



In [1]:
# Importing Libraries

import requests # Library to handle requests
import os # Find current working directory

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis

from geopy.geocoders import GoogleV3 # convert an address into latitude and longitude values
from functools import partial # To pass extra options to the geocode
from geopy.extra.rate_limiter import RateLimiter # RateLimiter allows to perform bulk operations while gracefully handling error responses and adding delays when needed.
from geopy import distance

# Partial geopy parameter definition
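# Read the Google Maps API key from the environment instead of hard-coding it (the variable name is an assumption)
myapi = os.environ.get('GOOGLE_MAPS_API_KEY')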
geolocator = GoogleV3(myapi, user_agent='default_user_agent')
geocode = partial(geolocator.geocode, language='es')

#! pip install folium==0.5.0
import folium # plotting library

print('Libraries imported')

Libraries imported


### 1. Environmental Analysis (Sociodemographics)
#### 1a. Download and Explore the Census Dataset

**Downloading the latest census 2018** from the national statistics department, DANE.

In [4]:
# Read data from CSV file
os.listdir(os.getcwd())
df_censo2018 = pd.read_csv('censo2018.csv', encoding = 'latin-1')

print('The census has {} municipalities.'.format(df_censo2018.shape[0]))
df_censo2018.head()

The census has 1122 municipalities.


Unnamed: 0,Department,Municipality,Total 2018
0,Antioquia,Medellín,2427129
1,Antioquia,Abejorral,20367
2,Antioquia,Abriaquí,2695
3,Antioquia,Alejandría,4657
4,Antioquia,Amagá,30227


In [5]:
df = df_censo2018['Department']
df.unique()

array(['Antioquia', 'Atlántico', 'Bogotá, D.C.', 'Bolívar', 'Boyacá',
       'Caldas', 'Caquetá', 'Cauca', 'Cesar', 'Córdoba ', 'Cundinamarca',
       'Chocó', 'Huila', 'La Guajira', 'Magdalena', 'Meta', 'Nariño',
       'Norte de Santander', 'Quindío', 'Risaralda', 'Santander', 'Sucre',
       'Tolima', 'Valle del Cauca', 'Arauca', 'Casanare', 'Putumayo',
       'Archipiélago de San Andrés', 'Amazonas', 'Guainía', 'Guaviare',
       'Vaupés', 'Vichada'], dtype=object)

Checking whether the 2018 census contains duplicated municipalities:

In [6]:
mun=df_censo2018['Municipality']+', '+df_censo2018['Department']
mun_2018 = pd.DataFrame(mun, columns=['Municipality'])
print('The census has {} unique municipalities.'.format(mun_2018['Municipality'].nunique()))

The census has 1122 unique municipalities.


In [7]:
df_censo2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1122 entries, 0 to 1121
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Department    1122 non-null   object
 1   Municipality  1122 non-null   object
 2   Total 2018    1122 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 26.4+ KB


In [8]:
df_censo2018[['Location']] = mun_2018[['Municipality']]
df_censo2018.head()

Unnamed: 0,Department,Municipality,Total 2018,Location
0,Antioquia,Medellín,2427129,"Medellín, Antioquia"
1,Antioquia,Abejorral,20367,"Abejorral, Antioquia"
2,Antioquia,Abriaquí,2695,"Abriaquí, Antioquia"
3,Antioquia,Alejandría,4657,"Alejandría, Antioquia"
4,Antioquia,Amagá,30227,"Amagá, Antioquia"


The number of unique municipality and department pairs equals the number of rows, so there are **no duplicated municipalities in the 2018 census**. According to the latest census, Colombia has 1,122 municipalities.

**Downloading census 2005** from the national statistics department, DANE.

In [9]:
# Read data from CSV file
os.listdir(os.getcwd())
df_censo2005 = pd.read_csv('censo2005.csv', encoding = 'latin-1')

print('The census has {} municipalities.'.format(df_censo2005.shape[0]))
df_censo2005.head()

The census has 1114 municipalities.


Unnamed: 0,Department,Municipality,Total 2005
0,Antioquia,Medellín,2219861
1,Antioquia,Abejorral,19893
2,Antioquia,Abriaquí,2173
3,Antioquia,Alejandría,3730
4,Antioquia,Amagá,27115


In [10]:
df = df_censo2005['Department']
df.unique()

array(['Antioquia', 'Atlántico', 'Bogotá, D.C.', 'Bolívar', 'Boyacá',
       'Caldas', 'Caquetá', 'Cauca', 'Cesar', 'Córdoba', 'Cundinamarca',
       'Chocó', 'Huila', 'La Guajira', 'Magdalena', 'Meta', 'Nariño',
       'Norte de Santander', 'Quindío', 'Risaralda', 'Santander', 'Sucre',
       'Tolima', 'Valle del Cauca', 'Arauca', 'Casanare', 'Putumayo',
       'Archipiélago de San Andrés', 'Amazonas', 'Guainía', 'Guaviare',
       'Vaupés', 'Vichada'], dtype=object)

Checking whether the 2005 census contains duplicated municipalities:

In [11]:
mun=df_censo2005['Municipality']+', '+df_censo2005['Department']
mun_2005= pd.DataFrame(mun, columns=['Municipality'])
print('The census has {} unique municipalities.'.format(mun_2005['Municipality'].nunique()))

The census has 1114 unique municipalities.


In [12]:
df_censo2005.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1114 entries, 0 to 1113
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Department    1114 non-null   object
 1   Municipality  1114 non-null   object
 2   Total 2005    1114 non-null   int64 
dtypes: int64(1), object(2)
memory usage: 26.2+ KB


In [13]:
df_censo2005[['Location']] = mun_2005[['Municipality']]
df_censo2005.head()

Unnamed: 0,Department,Municipality,Total 2005,Location
0,Antioquia,Medellín,2219861,"Medellín, Antioquia"
1,Antioquia,Abejorral,19893,"Abejorral, Antioquia"
2,Antioquia,Abriaquí,2173,"Abriaquí, Antioquia"
3,Antioquia,Alejandría,3730,"Alejandría, Antioquia"
4,Antioquia,Amagá,27115,"Amagá, Antioquia"


Again the number of unique municipality and department pairs equals the number of rows, so there are no duplicated municipalities in the 2005 census. According to that census, Colombia had 1,114 municipalities at the time.

Setting the index of df_censo2018 to Location

In [14]:
# set the 2018 census index to Location (municipality, department) if it isn't already
if(df_censo2018.index.name != 'Location'):
    df_censo2018 = df_censo2018.set_index('Location')

df_censo2018.head()

Unnamed: 0_level_0,Department,Municipality,Total 2018
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"Medellín, Antioquia",Antioquia,Medellín,2427129
"Abejorral, Antioquia",Antioquia,Abejorral,20367
"Abriaquí, Antioquia",Antioquia,Abriaquí,2695
"Alejandría, Antioquia",Antioquia,Alejandría,4657
"Amagá, Antioquia",Antioquia,Amagá,30227


Setting the index of df_censo2005 to Location

In [15]:
# set the 2005 census index to Location (municipality, department) if it isn't already
if(df_censo2005.index.name != 'Location'):
    df_censo2005 = df_censo2005.set_index('Location')

df_censo2005.head()

Unnamed: 0_level_0,Department,Municipality,Total 2005
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"Medellín, Antioquia",Antioquia,Medellín,2219861
"Abejorral, Antioquia",Antioquia,Abejorral,19893
"Abriaquí, Antioquia",Antioquia,Abriaquí,2173
"Alejandría, Antioquia",Antioquia,Alejandría,3730
"Amagá, Antioquia",Antioquia,Amagá,27115


Comparing the municipalities of both censuses and keeping only those with data available in both:

In [16]:
frames = [df_censo2018, df_censo2005]
census = pd.concat(frames, axis=1, join='inner')
census

Unnamed: 0_level_0,Department,Municipality,Total 2018,Department,Municipality,Total 2005
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Medellín, Antioquia",Antioquia,Medellín,2427129,Antioquia,Medellín,2219861
"Abejorral, Antioquia",Antioquia,Abejorral,20367,Antioquia,Abejorral,19893
"Abriaquí, Antioquia",Antioquia,Abriaquí,2695,Antioquia,Abriaquí,2173
"Alejandría, Antioquia",Antioquia,Alejandría,4657,Antioquia,Alejandría,3730
"Amagá, Antioquia",Antioquia,Amagá,30227,Antioquia,Amagá,27115
...,...,...,...,...,...,...
"Taraira, Vaupés",Vaupés,Taraira,2215,Vaupés,Taraira,1015
"Puerto Carreño, Vichada",Vichada,Puerto Carreño,20936,Vichada,Puerto Carreño,12897
"La Primavera, Vichada",Vichada,La Primavera,10122,Vichada,La Primavera,4517
"Santa Rosalía, Vichada",Vichada,Santa Rosalía,4059,Vichada,Santa Rosalía,3188


Checking for null values in the total population of either census:

In [17]:
mun_nan_2005 = census['Total 2005'].isnull()
df = pd.DataFrame(mun_nan_2005)
df = df[df['Total 2005'] != False]
df

Unnamed: 0_level_0,Total 2005
Location,Unnamed: 1_level_1


In [18]:
mun_nan_2018 = census['Total 2018'].isnull()
df = pd.DataFrame(mun_nan_2018)
df = df[df['Total 2018'] != False]
df

Unnamed: 0_level_0,Total 2018
Location,Unnamed: 1_level_1


So far, it can be concluded that 1,067 of the 1,122 municipalities in the 2018 census are common to both censuses.
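
The inner join keeps only `Location` keys that match exactly in both tables. One possible cause of unmatched rows is visible in the department lists printed earlier: the 2018 data contains `'Córdoba '` with a trailing space, while the 2005 data contains `'Córdoba'`. The sketch below uses plain Python sets to show how unmatched keys could be listed and how stripping whitespace before building the key changes the result. The helper names `location_key` and `unmatched` are assumptions for illustration, and the Córdoba rows are illustrative rather than copied from the outputs.

```python
def location_key(municipality, department, normalize=False):
    """Build the 'Municipality, Department' key used as the join index."""
    if normalize:
        municipality, department = municipality.strip(), department.strip()
    return f"{municipality}, {department}"


def unmatched(rows_a, rows_b, normalize=False):
    """Return the keys present in rows_a but missing from rows_b."""
    keys_a = {location_key(m, d, normalize) for m, d in rows_a}
    keys_b = {location_key(m, d, normalize) for m, d in rows_b}
    return sorted(keys_a - keys_b)


# (municipality, department) pairs; the Córdoba rows are illustrative
rows_2018 = [("Medellín", "Antioquia"), ("Abejorral", "Antioquia"), ("Montería", "Córdoba ")]
rows_2005 = [("Medellín", "Antioquia"), ("Abejorral", "Antioquia"), ("Montería", "Córdoba")]

print(unmatched(rows_2018, rows_2005))                  # raw keys
print(unmatched(rows_2018, rows_2005, normalize=True))  # whitespace stripped
```

If whitespace turns out to be the cause, applying `.str.strip()` to the `Department` and `Municipality` columns before building `Location` would be one way to recover those rows.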

#### 1b. Data Preparation

Now, preparing the census table for the georeferenced analysis:

In [19]:
# Dropping unnecessary data
census.drop(columns=['Department', 'Municipality'], inplace=True)

# Checking data types
census.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1067 entries, Medellín, Antioquia to Cumaribo, Vichada
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   Total 2018  1067 non-null   int64
 1   Total 2005  1067 non-null   int64
dtypes: int64(2)
memory usage: 25.0+ KB


An **annual population variation** is computed as a first step toward understanding housing needs.
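
The variation used in this notebook is a compound annual growth rate, r = (P_2018 / P_2005)^(1/n) - 1, with n = 13 years between the two censuses. A minimal standalone sketch follows; the function name `annual_variation` is an assumption, and the Medellín figures are taken from the census tables above.

```python
def annual_variation(pop_start, pop_end, years):
    """Compound annual growth rate between two population counts."""
    if pop_start <= 0 or years <= 0:
        raise ValueError("pop_start and years must be positive")
    return (pop_end / pop_start) ** (1 / years) - 1


# Medellín: 2,219,861 inhabitants in 2005 and 2,427,129 in 2018
rate = annual_variation(2219861, 2427129, 2018 - 2005)
print(f"{rate:.6f}")
```

A rate of 0.01 means the population grew by about 1% per year on average over the period; negative values indicate a shrinking municipality.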

In [20]:
year1 = 2005
year2 = 2018
numberYears = year2-year1
# Compound annual variation: (Total 2018 / Total 2005)^(1 / numberYears) - 1
df = (census['Total 2018'] / census['Total 2005']) ** (1 / numberYears) - 1
df = df.to_frame('var_pop')
df

Unnamed: 0_level_0,var_pop
Location,Unnamed: 1_level_1
"Medellín, Antioquia",0.006890
"Abejorral, Antioquia",0.001813
"Abriaquí, Antioquia",0.016699
"Alejandría, Antioquia",0.017221
"Amagá, Antioquia",0.008393
...,...
"Taraira, Vaupés",0.061866
"Puerto Carreño, Vichada",0.037970
"La Primavera, Vichada",0.064033
"Santa Rosalía, Vichada",0.018754


In [21]:
frames = [census, df]
census = pd.concat(frames, axis=1, join='inner')
census

Unnamed: 0_level_0,Total 2018,Total 2005,var_pop
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
"Medellín, Antioquia",2427129,2219861,0.006890
"Abejorral, Antioquia",20367,19893,0.001813
"Abriaquí, Antioquia",2695,2173,0.016699
"Alejandría, Antioquia",4657,3730,0.017221
"Amagá, Antioquia",30227,27115,0.008393
...,...,...,...
"Taraira, Vaupés",2215,1015,0.061866
"Puerto Carreño, Vichada",20936,12897,0.037970
"La Primavera, Vichada",10122,4517,0.064033
"Santa Rosalía, Vichada",4059,3188,0.018754


In [22]:
census.describe()

Unnamed: 0,Total 2018,Total 2005,var_pop
count,1067.0,1067.0,1067.0
mean,43221.76,37288.98,0.009256
std,259738.4,236335.1,0.024748
min,1097.0,290.0,-0.055628
25%,6671.0,6253.0,-0.002834
50%,12664.0,11230.0,0.006422
75%,26477.5,22251.5,0.017654
max,7412566.0,6778691.0,0.413783


#### 1c. National sociodemographics georeferenced analysis

Use geopy library to get the latitude and longitude values of Colombia.

In [23]:
address = 'Colombia'

# Geocode the address through the Google Maps API
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Colombia are {}, {}.'.format(latitude, longitude))
print(location.raw)

The geographical coordinates of Colombia are 4.570868, -74.297333.
{'address_components': [{'long_name': 'Colombia', 'short_name': 'CO', 'types': ['country', 'political']}], 'formatted_address': 'Colombia', 'geometry': {'bounds': {'northeast': {'lat': 13.5177999, 'lng': -66.8463122}, 'southwest': {'lat': -4.227109899999999, 'lng': -81.8317}}, 'location': {'lat': 4.570868, 'lng': -74.297333}, 'location_type': 'APPROXIMATE', 'viewport': {'northeast': {'lat': 13.5177999, 'lng': -66.8463122}, 'southwest': {'lat': -4.227109899999999, 'lng': -81.8317}}}, 'place_id': 'ChIJo5QVrjqkFY4RQKPy7wSaDZo', 'types': ['country', 'political']}


Let's get the geographical coordinates of Municipalities

In [24]:
# Convenient function to delay between geocoding calls
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
#geocode = lambda query: geolocator.geocode("%s" % query)

df=census
df.reset_index(inplace=True)
df['location'] = df['Location'].apply(geocode)
df['point'] = df['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
df[['latitude', 'longitude', 'altitude']] = pd.DataFrame(df['point'].tolist(), index=df.index)

#census.head()
df

Unnamed: 0,Location,Total 2018,Total 2005,var_pop,location,point,latitude,longitude,altitude
0,"Medellín, Antioquia",2427129,2219861,0.006890,"(Medellín, Medellin, Antioquia, Colombia, (6.2...","(6.2476376, -75.56581530000001, 0.0)",6.247638,-75.565815,0.0
1,"Abejorral, Antioquia",20367,19893,0.001813,"(Abejorral, Antioquia, Colombia, (5.7897437, -...","(5.7897437, -75.4280185, 0.0)",5.789744,-75.428018,0.0
2,"Abriaquí, Antioquia",2695,2173,0.016699,"(Abriaquí, Antioquia, Colombia, (6.63266249999...","(6.632662499999999, -76.0663467, 0.0)",6.632662,-76.066347,0.0
3,"Alejandría, Antioquia",4657,3730,0.017221,"(Alejandría, Antioquia, Colombia, (6.3758292, ...","(6.3758292, -75.14173699999999, 0.0)",6.375829,-75.141737,0.0
4,"Amagá, Antioquia",30227,27115,0.008393,"(Amagá, Antioquia, Colombia, (6.03711300000000...","(6.037113000000001, -75.702277, 0.0)",6.037113,-75.702277,0.0
...,...,...,...,...,...,...,...,...,...
1062,"Taraira, Vaupés",2215,1015,0.061866,"(Taraira, Vaupés, Colombia, (-0.56442189999999...","(-0.5644218999999999, -69.63385339999999, 0.0)",-0.564422,-69.633853,0.0
1063,"Puerto Carreño, Vichada",20936,12897,0.037970,"(Puerto Carreño, Vichada, Colombia, (6.1899117...","(6.1899117, -67.48256959999999, 0.0)",6.189912,-67.482570,0.0
1064,"La Primavera, Vichada",10122,4517,0.064033,"(La Primavera, Vichada, Colombia, (5.4915919, ...","(5.4915919, -70.4139951, 0.0)",5.491592,-70.413995,0.0
1065,"Santa Rosalía, Vichada",4059,3188,0.018754,"(Santa Rosalia, Santa Rosalía, Vichada, Colomb...","(5.135381, -70.86496090000001, 0.0)",5.135381,-70.864961,0.0


In [25]:
# Drop rows with missing values (municipalities that could not be geocoded)
census = census.dropna() 
census

Unnamed: 0,Location,Total 2018,Total 2005,var_pop,location,point,latitude,longitude,altitude
0,"Medellín, Antioquia",2427129,2219861,0.006890,"(Medellín, Medellin, Antioquia, Colombia, (6.2...","(6.2476376, -75.56581530000001, 0.0)",6.247638,-75.565815,0.0
1,"Abejorral, Antioquia",20367,19893,0.001813,"(Abejorral, Antioquia, Colombia, (5.7897437, -...","(5.7897437, -75.4280185, 0.0)",5.789744,-75.428018,0.0
2,"Abriaquí, Antioquia",2695,2173,0.016699,"(Abriaquí, Antioquia, Colombia, (6.63266249999...","(6.632662499999999, -76.0663467, 0.0)",6.632662,-76.066347,0.0
3,"Alejandría, Antioquia",4657,3730,0.017221,"(Alejandría, Antioquia, Colombia, (6.3758292, ...","(6.3758292, -75.14173699999999, 0.0)",6.375829,-75.141737,0.0
4,"Amagá, Antioquia",30227,27115,0.008393,"(Amagá, Antioquia, Colombia, (6.03711300000000...","(6.037113000000001, -75.702277, 0.0)",6.037113,-75.702277,0.0
...,...,...,...,...,...,...,...,...,...
1062,"Taraira, Vaupés",2215,1015,0.061866,"(Taraira, Vaupés, Colombia, (-0.56442189999999...","(-0.5644218999999999, -69.63385339999999, 0.0)",-0.564422,-69.633853,0.0
1063,"Puerto Carreño, Vichada",20936,12897,0.037970,"(Puerto Carreño, Vichada, Colombia, (6.1899117...","(6.1899117, -67.48256959999999, 0.0)",6.189912,-67.482570,0.0
1064,"La Primavera, Vichada",10122,4517,0.064033,"(La Primavera, Vichada, Colombia, (5.4915919, ...","(5.4915919, -70.4139951, 0.0)",5.491592,-70.413995,0.0
1065,"Santa Rosalía, Vichada",4059,3188,0.018754,"(Santa Rosalia, Santa Rosalía, Vichada, Colomb...","(5.135381, -70.86496090000001, 0.0)",5.135381,-70.864961,0.0


In [26]:
varmax = census['var_pop'].max()
varmin = census['var_pop'].min()
t = np.linspace(varmin, varmax, 4)
print(t)

[-0.05562785  0.10084228  0.25731242  0.41378255]
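
The four thresholds printed above split the observed range of `var_pop` into three equal-width intervals. As a sketch of how a value could be mapped to a marker color with the standard library's `bisect` module (the function name `color_for` is an assumption; the colors are the ones used in the map cell that follows):

```python
from bisect import bisect_left

# Thresholds as printed above (rounded) and one color per interval
thresholds = [-0.05562785, 0.10084228, 0.25731242, 0.41378255]
colors = ["black", "grey", "orange", "cyan", "yellow"]


def color_for(var_pop):
    """Pick the color of the first threshold that var_pop does not exceed."""
    return colors[bisect_left(thresholds, var_pop)]


# Supía's variation (0.013461) lies between the first and second thresholds
print(color_for(0.013461))
```

Because the 75th percentile of `var_pop` is about 0.018, equal-width intervals place most municipalities in the same class. Quantile-based thresholds would be one alternative if more contrast on the map is wanted.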


Create a map of Colombia

In [29]:
# create map of Colombia using latitude and longitude values
map = folium.Map(location=[latitude, longitude], zoom_start=5)

# add markers "Municipalities" to map, colored by population variation
for name, lat, lng, var in zip(census['Location'], census['latitude'], census['longitude'], census['var_pop']):
    label = folium.Popup(name, parse_html=True)

    # choose the color from the variation interval defined by the thresholds in t
    if var <= t[0]:
        color = 'black'
    elif var <= t[1]:
        color = 'grey'
    elif var <= t[2]:
        color = 'orange'
    elif var <= t[3]:
        color = 'cyan'
    else:
        color = 'yellow'

    # highlight Supía with a large red ring; compare the name, not the Popup object
    if name != 'Supía, Caldas':
        radius=5 
        fill=True
    else:
        color='red'
        radius=50
        fill=False
    
    folium.CircleMarker(
        [lat, lng],
        radius=radius,
        popup=label,
        color=color,
        fill=fill,
        fill_color=color,
        fill_opacity=0.7,
        parse_html=False).add_to(map)

map


In [28]:
census[census['Location']=='Supía, Caldas']

Unnamed: 0,Location,Total 2018,Total 2005,var_pop,location,point,latitude,longitude,altitude
339,"Supía, Caldas",28642,24072,0.013461,"(Supía, Caldas, Colombia, (5.45589, -75.650401...","(5.45589, -75.65040100000002, 0.0)",5.45589,-75.650401,0.0


### Importing List of Neighbourhoods from CSV file

In [None]:
# Read data from CSV file

import os # Find current working directory
os.listdir(os.getcwd())
df_n = pd.read_csv('Neighborhood_Supia.csv') 

# Input Neighborhood Table
#neighbourhood = {'Neighbourhood':['Libertadores', 'Congo', 'El Progreso', 'Las Palmas', 'Guayabal', 'La Cruz', 'La Plazuela', 'Popular',\
#                     'Renán Barco', 'Villa Carmenza', 'La Estrella', 'Los Álamos', 'Codemas', 'La Moravia', 'La Julia', 'La Vega I',\
#                     'La Vega II', 'Los Colores', 'San Lorenzo', 'Torres de la Vega', 'La Pista']}
#df = pd.DataFrame(neighbourhood)

print('Supía has {} neighborhoods.'.format(len(df_n['Neighborhood'].unique())))

df_n[['Category']] = 'Neighborhood'
df_n

### Importing List of Educational institutions from CSV file

In [None]:
# Read data from CSV file

os.listdir(os.getcwd())
df_e = pd.read_csv('Educational_Supia.csv') 

print('Supía has {} educational institutions.'.format(len(df_e['Institutions'].unique())))

df_e[['Category']] = 'Educational Institution'
df_e

### Importing List of Health Services

In [None]:
# Read data from CSV file

os.listdir(os.getcwd())
df_s = pd.read_csv('Health_Supia.csv') 

print('Supía has {} health services.'.format(len(df_s['Entities'].unique())))

df_s[['Category']] = 'Health Services'
df_s

### Importing Real Estate Offer

In [None]:
# Read data from CSV file
os.listdir(os.getcwd())
df_i = pd.read_csv('Info_Inmobiliaria_Supia.csv')

print('Supía has {} real estate projects.'.format(len(df_i['Real State Offer'].unique())))

df_i[['Category']] = 'Real State Projects'
df_i

### Remove Neighborhoods where 'Address' is not provided

In [None]:
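# Treat empty addresses as missing; assign back instead of using an in-place call on the column
df_n['Address'] = df_n['Address'].replace('', np.nan)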
df_n.dropna(subset=['Address'], inplace=True)
df_n.shape

More than one neighborhood can share the same address area. These are combined into one row, with the neighborhood names separated by commas.

In [None]:
df_n["Neighborhood"] = df_n.groupby("Address")["Neighborhood"].transform(lambda neigh: ', '.join(neigh))

#remove duplicates
df_n = df_n.drop_duplicates()

#update index to be neighborhood if it isn't already
if(df_n.index.name != 'Neighborhood'):
    df_n = df_n.set_index('Neighborhood')
    
df_n

In [None]:
df_n.shape
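
The grouping step above can be pictured without pandas: every address keeps a single row whose name is the comma-separated list of its neighborhoods. A minimal sketch, assuming hypothetical address values (the neighborhood names come from the commented list earlier in the notebook, and `merge_by_address` is an illustrative helper):

```python
def merge_by_address(rows):
    """Join the neighborhood names that share an address into one string."""
    merged = {}
    for neighborhood, address in rows:
        merged.setdefault(address, []).append(neighborhood)
    return {address: ", ".join(names) for address, names in merged.items()}


# (neighborhood, address) pairs; the addresses are hypothetical
rows = [
    ("La Vega I", "Sector La Vega"),
    ("La Vega II", "Sector La Vega"),
    ("Los Colores", "Sector Los Colores"),
]
print(merge_by_address(rows))
```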

Use geopy library to get the latitude and longitude values of Supia, Caldas.

In [None]:
address = 'Supia, Caldas'

# Geocode the address through the Google Maps API
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

print('The geographical coordinates of Supia are {}, {}.'.format(latitude, longitude))
print(location.raw)

### Let's get the geographical coordinates of Neighborhoods of Supia

In [None]:
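# Rate-limited geocoder: wait at least 1 second between calls and
# append the town to each address (the wrapper must call the rate-limited function)
rate_limited_geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
geocode = lambda query: rate_limited_geocode("%s, Supia, Caldas" % query)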

df_n['location'] = df_n['Address'].apply(geocode)
df_n['Point'] = df_n['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
df_n[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(df_n['Point'].tolist(), index=df_n.index)

df_n.head()
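
When a query suffix such as `, Supia, Caldas` is added, the wrapper has to call the rate-limited function rather than the raw geocoder, otherwise the delay is bypassed. The standalone sketch below illustrates the composition with a stand-in geocoder and a simple stdlib throttle. `throttle`, `fake_geocode`, the addresses, and the short delay are hypothetical and only mimic the behavior of geopy's `RateLimiter`.

```python
import time


def throttle(func, min_delay_seconds):
    """Return a wrapper that waits at least min_delay_seconds between calls."""
    last_call = [None]  # mutable cell so the closure can update it

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return func(*args, **kwargs)

    return wrapper


queries_seen = []


def fake_geocode(query):
    """Stand-in for geolocator.geocode: records the query, returns no location."""
    queries_seen.append(query)
    return None


limited = throttle(fake_geocode, min_delay_seconds=0.05)
# Compose the suffix on top of the throttled function, not the raw one
geocode_in_supia = lambda address: limited("%s, Supia, Caldas" % address)

start = time.monotonic()
for address in ["Calle 1", "Calle 2", "Calle 3"]:  # hypothetical addresses
    geocode_in_supia(address)
elapsed = time.monotonic() - start
print(queries_seen, round(elapsed, 2))
```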

The number of unique coordinates returned by the geocoder is compared with the number of neighborhoods that have an address:

In [None]:
la = df_n['Latitude'].unique()
lo = df_n['Longitude'].unique()

print('Unique coordinates found by the geocoder for Supia: latitudes {}, longitudes {}, neighborhoods {}'.format(len(la), len(lo), len(df_n)))
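
If several distinct addresses resolve to the same coordinate pair, the geocoder may have fallen back to a coarser match, such as the town center. A sketch of how such collisions could be listed follows. The names and coordinates are hypothetical, and `shared_coordinates` is a helper that is not part of the notebook.

```python
from collections import defaultdict


def shared_coordinates(points, decimals=6):
    """Group names by rounded (lat, lng) and keep groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in points:
        groups[(round(lat, decimals), round(lng, decimals))].append(name)
    return {coord: names for coord, names in groups.items() if len(names) > 1}


# Hypothetical geocoding results: two addresses collapse onto the same point
points = [
    ("Barrio A", 5.455890, -75.650401),
    ("Barrio B", 5.455890, -75.650401),
    ("Barrio C", 5.454169, -75.644269),
]
print(shared_coordinates(points))
```

Counting unique latitudes and longitudes separately, as in the cell above, can overstate or understate collisions; grouping on the coordinate pair avoids that.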

### Let's get the geographical coordinates of Educational institutions of Supia

In [None]:
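# Rate-limited geocoder: wait at least 1 second between calls and
# append the town to each address (the wrapper must call the rate-limited function)
rate_limited_geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
geocode = lambda query: rate_limited_geocode("%s, Supia, Caldas" % query)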

df_e['location'] = df_e['Address'].apply(geocode)
df_e['Point'] = df_e['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
df_e[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(df_e['Point'].tolist(), index=df_e.index)

#update index to be Institutions if it isn't already
if(df_e.index.name != 'Institutions'):
    df_e = df_e.set_index('Institutions')

df_e.head()

### Let's get the geographical coordinates of Health Entities of Supia

In [None]:
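# Rate-limited geocoder: wait at least 1 second between calls and
# append the town to each address (the wrapper must call the rate-limited function)
rate_limited_geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
geocode = lambda query: rate_limited_geocode("%s, Supia, Caldas" % query)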

df_s['location'] = df_s['Address'].apply(geocode)
df_s['Point'] = df_s['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
df_s[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(df_s['Point'].tolist(), index=df_s.index)

#update index to be Entities if it isn't already
if(df_s.index.name != 'Entities'):
    df_s = df_s.set_index('Entities')

df_s.head()

### Let's get the geographical coordinates of the Real Estate Offer of Supia

In [None]:
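# Rate-limited geocoder: wait at least 1 second between calls and
# append the town to each address (the wrapper must call the rate-limited function)
rate_limited_geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
geocode = lambda query: rate_limited_geocode("%s, Supia, Caldas" % query)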

df_i['location'] = df_i['Address'].apply(geocode)
df_i['Point'] = df_i['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
df_i[['Latitude', 'Longitude', 'Altitude']] = pd.DataFrame(df_i['Point'].tolist(), index=df_i.index)

#update index to be Real State Offer if it isn't already
if(df_i.index.name != 'Real State Offer'):
    df_i = df_i.set_index('Real State Offer')

df_i.head()

### Let's get the geographical coordinates of the Project

In [None]:
# project location, latitude, longitude columns

project_loc = {'Location':['Parcelación La Cristalina', 'Parcelación La Cristalina Expansión Urbana ', 'GPS-1', 'GPS-2'],
               'Latitude':[5.454169, 5.455778, 5.455003, 5.45642], 'Longitude':[-75.644269, -75.647433, -75.647734,-75.647916]}
df_p = pd.DataFrame(project_loc)

#update index to be Location if it isn't already
if(df_p.index.name != 'Location'):
    df_p = df_p.set_index('Location')

df_p
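
The project points defined above serve as origins for the distance evaluation described in the scope. As a rough illustration of such a measurement, the sketch below computes a great-circle distance with the haversine formula using only the standard library. The notebook imports `geopy.distance`, whose `geodesic` function uses an ellipsoidal Earth model, so its results would differ slightly from this spherical approximation. The function name `haversine_km` is an assumption.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Coordinates of GPS-1 and GPS-2 from the project table above
gps1 = (5.455003, -75.647734)
gps2 = (5.45642, -75.647916)
print(f"{haversine_km(*gps1, *gps2) * 1000:.0f} m")
```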

### Create a Map of Supia

In [None]:
#! pip install folium==0.5.0
import folium # plotting library

In [None]:
# create map of Supia using latitude and longitude values
map = folium.Map(location=[latitude, longitude], zoom_start=15)

# add markers "Neighborhoods" to map
for label, lat, lng in zip(df_n.index, df_n['Latitude'], df_n['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)

# add markers "Educational institutions" to map
for label, lat, lng in zip(df_e.index, df_e['Latitude'], df_e['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)
 
# add markers "Health Entities" to map
for label, lat, lng in zip(df_s.index, df_s['Latitude'], df_s['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)

# add markers "Real State Offer" to map
for label, lat, lng in zip(df_i.index, df_i['Latitude'], df_i['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='cyan',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)
 

# add markers "Project" to map
for label, lat, lng in zip(df_p.index, df_p['Latitude'], df_p['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map)
    
map

Next, we are going to start utilizing the Foursquare API to explore the neighborhoods and segment them.

### Define Foursquare Credentials and Version

In [None]:
# Read the Foursquare credentials from the environment instead of hard-coding them
# (the environment variable names are an assumption)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '') # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

# Confirm the credentials are set without printing the secret
print('Credentials loaded: {}'.format(bool(CLIENT_ID and CLIENT_SECRET)))

#### Now, let's get the venues that are in Supía

In [None]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 5000 # search radius in meters

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)

url # display URL
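
As an alternative to formatting the query string by hand, the parameters could be encoded with `urllib.parse.urlencode`, which escapes special characters. The sketch below assumes the same endpoint and parameter names as the cell above; `build_explore_url` and the placeholder credentials are hypothetical, and the coordinates are the Supía values obtained earlier.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; never commit real ones
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 5.45589, -75.650401, 5000, 100)

# Round-trip check: decode the query string back into its parameters
query = parse_qs(urlsplit(url).query)
print(query["ll"], query["radius"])
```

With `requests`, the same dictionary could also be passed through the `params` argument of `requests.get`, which performs the encoding itself.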

Send the GET request and examine the results

In [None]:
results = requests.get(url).json()
results

In the Foursquare response, the venue information is under the `items` key. Before proceeding, let's borrow the `get_category_type` function from the Foursquare lab.

In [None]:
# function that extracts the category of the venue
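def get_category_type(row):
    # the category list is stored under 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # return the name of the first category, or None if there is none
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']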

Now we are ready to clean the json and structure it into a pandas dataframe.

In [None]:
from pandas import json_normalize

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues = nearby_venues.rename(columns={'name': 'Venue', 'categories': 'Category', 'lat': 'Latitude', 'lng': 'Longitude'})

nearby_venues.head()
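
The flattening performed above can also be pictured in plain Python: each item holds a `venue` object, and only its name, first category, and coordinates are kept. The sketch below uses a hypothetical response with the same nesting the cell above relies on. The venue names and coordinates are made up, and `extract_venues` is an illustrative helper.

```python
def extract_venues(results):
    """Turn an explore-style response into a list of flat venue records."""
    items = results["response"]["groups"][0]["items"]
    records = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories", [])
        records.append({
            "Venue": venue["name"],
            "Category": categories[0]["name"] if categories else None,
            "Latitude": venue["location"]["lat"],
            "Longitude": venue["location"]["lng"],
        })
    return records


# Hypothetical response; the second venue has no category
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Café Ejemplo", "categories": [{"name": "Café"}],
               "location": {"lat": 5.4560, "lng": -75.6500}}},
    {"venue": {"name": "Parque Ejemplo", "categories": [],
               "location": {"lat": 5.4550, "lng": -75.6510}}},
]}]}}

for record in extract_venues(sample):
    print(record)
```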

How many venues in Supía were returned by Foursquare?

In [None]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

Update index

In [None]:
#update index to be Venue if it isn't already
if(nearby_venues.index.name != 'Venue'):
    nearby_venues = nearby_venues.set_index('Venue')

nearby_venues.head()

### Update the Map of Supia with Venues

In [None]:
# create map of Supia using latitude and longitude values
map = folium.Map(location=[latitude, longitude], zoom_start=15)

# add markers "Neighborhoods" to map
for label, lat, lng in zip(df_n.index, df_n['Latitude'], df_n['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)

# add markers "Educational institutions" to map
for label, lat, lng in zip(df_e.index, df_e['Latitude'], df_e['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)
 
# add markers "Health Entities" to map
for label, lat, lng in zip(df_s.index, df_s['Latitude'], df_s['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)

# add markers "Real Estate Offer" to map
for label, lat, lng in zip(df_i.index, df_i['Latitude'], df_i['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='cyan',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map)
 

# add markers "Project" to map
for label, lat, lng in zip(df_p.index, df_p['Latitude'], df_p['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,
        parse_html=False).add_to(map)

# add markers "Venues" to map
for label, lat, lng in zip(nearby_venues.index, nearby_venues['Latitude'], nearby_venues['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='white',
        fill=True,
        fill_color='#3186cc',
        parse_html=False).add_to(map)
    
map
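
The six marker loops above differ only in the source table and the colours, so they could be driven from a single table of layers. The sketch below is a hedged illustration in plain Python: `LAYER_COLORS` and `marker_kwargs` are names introduced here, and the sample rows are made up. It only builds the keyword arguments; in the notebook each dictionary could then be unpacked into `folium.CircleMarker(**kwargs)` and added to the map.

```python
# Outline and fill colour per layer, copied from the loops above
LAYER_COLORS = {
    "Neighborhoods": ("black", "#3186cc"),
    "Educational institutions": ("blue", "#3186cc"),
    "Health Entities": ("red", "#3186cc"),
    "Real Estate Offer": ("cyan", "#3186cc"),
    "Project": ("orange", "red"),
    "Venues": ("white", "#3186cc"),
}


def marker_kwargs(layer, label, lat, lng, radius=5, fill_opacity=0.7):
    """Build CircleMarker keyword arguments for one point (illustrative helper)."""
    color, fill_color = LAYER_COLORS[layer]
    return {
        "location": [lat, lng],
        "radius": radius,
        "popup": label,
        "color": color,
        "fill": True,
        "fill_color": fill_color,
        "fill_opacity": fill_opacity,
    }


# Made-up rows standing in for (layer, index label, Latitude, Longitude)
sample_points = [
    ("Project", "GPS-2", 5.0, -75.0),
    ("Venues", "Example Cafe", 5.001, -75.001),
]

all_kwargs = [marker_kwargs(*point) for point in sample_points]
for kwargs in all_kwargs:
    print(kwargs["popup"], kwargs["color"], kwargs["fill_color"])
```

Note that this helper applies the same `fill_opacity` to every layer, whereas the venue loop above does not set one; pass a different value if that distinction matters.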

#### Let's find out how many unique categories can be curated from all the returned venues!

In [None]:
print('There are {} unique categories.'.format(len(nearby_venues['Category'].unique())))

#### Calculating Distances from Project

Consolidating all the coordinates of interest into one table

In [None]:
coord_int = [df_n[['Category','Latitude','Longitude']], df_e[['Category','Latitude','Longitude']], df_s[['Category','Latitude','Longitude']], 
             df_i[['Category','Latitude','Longitude']], nearby_venues[['Category','Latitude','Longitude']]]

c = pd.concat(coord_int)
c

Adding the project reference coordinates to calculate the distance to each coordinate of interest

In [None]:
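# reference point: the 'GPS-2' row of the project table
coord_ref = df_p.loc['GPS-2', ['Latitude', 'Longitude']]

# broadcast the reference coordinates to every row
c['Latitude_Ref'] = coord_ref['Latitude']
c['Longitude_Ref'] = coord_ref['Longitude']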
c.head()

#### Calculating distance to each coordinate of interest

In [None]:
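# distance from the reference point to each row; columns are accessed by label rather than by position
c['Distance'] = c.apply(
    lambda x: distance.distance((x['Latitude'], x['Longitude']),
                                (x['Latitude_Ref'], x['Longitude_Ref'])),
    axis=1)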
c
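
As a sanity check on these distances, a great-circle (haversine) calculation needs only the standard library. This is a deliberately simpler technique than the one used above: assuming `distance` is `geopy.distance`, the notebook computes distances on an ellipsoid, while the sketch below assumes a spherical Earth with a mean radius of 6371.0088 km, so results can differ slightly. The function name and the test coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator, roughly 111.2 km
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
print(round(one_degree, 2))
```

Applied to a row of the table, the call would take the row's `Latitude`, `Longitude`, `Latitude_Ref`, and `Longitude_Ref` values; over the short distances within a town the two methods should agree closely.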

Converting Distance to float

In [None]:
c['Distance'] = c['Distance'].apply(lambda d: d.km)  # read kilometres from the Distance object instead of slicing its string form
c = c.round(4)
c

In [None]:
c.info()

In [None]:
# create map of Supia using latitude and longitude values
map = folium.Map(location=[latitude, longitude], zoom_start=15)

# add all markers to map, appending the distance to each label
for label, lat, lng, cat, dist in zip(c.index, c['Latitude'], c['Longitude'], c['Category'], c['Distance']):
    label = label + ' ' + str(dist) + " km"
    
    if cat=='Neighborhood':
        color='black' 
        radius=5            
    elif cat=='Educational Institution': 
        color='blue'
        radius=8         
    elif cat=='Health Services':
        color='red'
        radius=8        
    elif cat=='Real State Projects':
        color='cyan'
        radius=9        
    else: 
        color='white' 
        radius=10 
           
    folium.CircleMarker(
        [lat, lng],
        radius=radius,
        popup=label,
        color=color,        
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,        
        parse_html=True).add_to(map)    
    
    #folium.Marker([lat, lng], popup=label).add_to(map)  


# add markers "Project" to map
for label, lat, lng in zip(df_p.index, df_p['Latitude'], df_p['Longitude']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='orange',
        fill=True,
        fill_color='red',
        fill_opacity=0.7,          
        ).add_to(map)
map
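
The `if`/`elif` chain that picks a colour and radius per category can also be written as a dictionary lookup with a default. A minimal sketch, assuming the category strings used above; `CATEGORY_STYLES`, `DEFAULT_STYLE`, `style_for`, and `popup_text` are names introduced here for illustration.

```python
# (outline colour, radius) per category, matching the branches above
CATEGORY_STYLES = {
    "Neighborhood": ("black", 5),
    "Educational Institution": ("blue", 8),
    "Health Services": ("red", 8),
    "Real State Projects": ("cyan", 9),  # spelled as in the notebook's category data
}
DEFAULT_STYLE = ("white", 10)  # any other category, e.g. Foursquare venues


def style_for(category):
    """Return (color, radius) for a category, falling back to the default."""
    return CATEGORY_STYLES.get(category, DEFAULT_STYLE)


def popup_text(name, distance_km):
    """Label shown in the popup: place name followed by its distance."""
    return f"{name} {distance_km} km"


print(style_for("Health Services"))
print(style_for("Coffee Shop"))
print(popup_text("Example Cafe", 0.4321))
```

Keeping the styles in one dictionary makes it easier to add a category or change a colour without touching the loop body.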


In [None]:
c.head()