## Geographical distribution of venues in Buenos Aires City and its correlation with the value added per commune

This notebook documents the data collection, processing and visualization steps behind the results analyzed in the report. It is organized as a step-by-step record of the operations and transformations applied to the data rather than as a report. For a more thorough analysis of this data, please refer to the [report](https://docs.google.com/document/d/1QU99UWwxyQdBpGJGGIP9DPBVH75K5ismrHXUH80m7CY/edit?usp=sharing).


First we install two libraries: geopy for geocoding locations and folium for plotting maps.

In [None]:
!pip install geopy
!pip install folium

Next we import the libraries used throughout the notebook.

In [None]:
import requests
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

Here we query the Foursquare API for venues in Buenos Aires, Argentina, and receive the response in JSON format. The name, location and category of each venue are then selected and stored in a list. Finally, pandas turns the list into a dataframe, and the total number of venues is displayed.

In [None]:
r = requests.get('https://api.foursquare.com/v2/venues/explore?near=Buenos%20Aires&limit=100&v=20200520&client_id=HNUUTNPHSEDUFHAXHQUO5JFYGABTZ4CLH1XOCJO14UHLJIBW&client_secret=WOHLD0PHKILZN1MCWVJTXL3TWBJH0AENXMSOKPXPRDK2ABKL')
ba_venues = r.json()['response']['groups'][0]['items']
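
The request above embeds the API credentials directly in the URL. As a minimal sketch of an alternative, the same query string can be assembled from parameters, with the credentials read from environment variables. The function name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are assumptions made for this illustration, not part of the original notebook.

```python
import os
from urllib.parse import quote, urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, near="Buenos Aires",
                      limit=100, version="20200520"):
    """Assemble the explore URL used in this notebook from its parts."""
    params = {
        "near": near,
        "limit": limit,
        "v": version,
        "client_id": client_id,
        "client_secret": client_secret,
    }
    # quote_via=quote encodes spaces as %20, matching the URL in the cell above.
    return EXPLORE_ENDPOINT + "?" + urlencode(params, quote_via=quote)


# Hypothetical environment variable names; placeholders are used when unset.
url = build_explore_url(
    os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID"),
    os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET"),
)
```

The resulting `url` could then be passed to `requests.get` in place of the literal string.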

In [None]:
ba_venues_list = []
for v in ba_venues:
    ba_venues_list.append([v['venue']['name'], v['venue']['location']['lat'], v['venue']['location']['lng'], v['venue']['categories'][0]['name']])

In [None]:
ba_df = pd.DataFrame(ba_venues_list, columns=['Name', 'Latitude', 'Longitude', 'Category'])
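# Print the shape explicitly: a cell only displays its last expression.
print(ba_df.shape)  # (number of venues, number of columns)
ba_df.head(10)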
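
The extraction loop assumes every venue has at least one category. The sketch below shows one way to make that step tolerant of venues with an empty category list. The helper `extract_venue_rows` and the two sample items are hypothetical; only the key names (`venue`, `name`, `location`, `lat`, `lng`, `categories`) come from the cells above.

```python
def extract_venue_rows(items):
    """Return [name, lat, lng, category] rows from a list of explore items."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]["name"] if categories else None
        rows.append([
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ])
    return rows


# Hypothetical sample shaped like the items read in this notebook.
sample_items = [
    {"venue": {"name": "Cafe A",
               "location": {"lat": -34.60, "lng": -58.38},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Plaza B",
               "location": {"lat": -34.61, "lng": -58.40},
               "categories": []}},
]
rows = extract_venue_rows(sample_items)
```

The returned rows have the same layout as `ba_venues_list`, so they can be passed to `pd.DataFrame` with the same column names.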

A map is then generated showing all the venues.

In [None]:
address = 'Buenos Aires, AR'
geolocator = Nominatim(user_agent='ba_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

ba_map = folium.Map(location=[latitude, longitude], zoom_start=12)

for lat, lng, name, category in zip(ba_df['Latitude'], ba_df['Longitude'], ba_df['Name'], ba_df['Category']):
    label = '{}, {}'.format(name, category)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(ba_map)
    
ba_map

Next we prepare the data for clustering. First we drop the name and category columns so that only latitude and longitude remain as dataframe columns. The K-means algorithm is then applied to this dataframe with K = 5, which yields an array with a cluster label for each row. We add the labels to the original dataframe as a new column and plot the result, obtaining three main clusters and two outlier clusters.

In [None]:
ba_venues_coordinates = ba_df.drop(['Name', 'Category'], axis=1)
ba_venues_coordinates.head(10)

In [None]:
kclusters = 5
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ba_venues_coordinates)
kmeans.labels_[0:10]

In [None]:
ba_df['Cluster Labels'] = kmeans.labels_
ba_df.head()
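
One way to check how many clusters are large and how many are small outliers is to count the members of each cluster. The sketch below does this for a hypothetical list of labels. The helper name `cluster_sizes` and the example labels are assumptions for illustration and are not results from this notebook.

```python
from collections import Counter


def cluster_sizes(labels):
    """Return (label, count) pairs sorted from the largest cluster down."""
    return Counter(int(label) for label in labels).most_common()


# Hypothetical labels standing in for kmeans.labels_.
example_labels = [0, 0, 0, 2, 2, 2, 2, 1, 3, 3, 3, 4]
sizes = cluster_sizes(example_labels)
```

In the notebook itself, `cluster_sizes(kmeans.labels_)` or `ba_df['Cluster Labels'].value_counts()` would give the same counts for the real data.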

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ba_df['Latitude'], ba_df['Longitude'], ba_df['Name'], ba_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # K-means labels are 0-based, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In the following cells, an Excel file listing the apartments for sale in Buenos Aires in 2012 is downloaded, loaded into a dataframe and processed to get the total number of apartments for sale per commune.

In [None]:
ba_apartments = pd.read_excel('http://cdn.buenosaires.gob.ar/datosabiertos/datasets/departamentos-en-venta/departamentos-en-venta-2012.xlsx')

In [None]:
res = ba_apartments.groupby('COMUNA').count()
res.sort_values(by=['CALLE'], axis=0, ascending=False, inplace=True)
res
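
Note that `count()` returns the number of non-null values in each column, so the totals can differ from column to column when cells are missing. As a minimal sketch on hypothetical data, `size()` counts rows per group regardless of missing values. Only the column name `COMUNA` comes from the notebook; the `PRECIO` column and all values below are made up for illustration.

```python
import pandas as pd

# Hypothetical listings; the real file has many more rows and columns.
listings = pd.DataFrame({
    "COMUNA": ["COMUNA 14", "COMUNA 13", "COMUNA 14",
               "COMUNA 02", "COMUNA 14", "COMUNA 13"],
    "PRECIO": [120000, 95000, None, 150000, 101000, 87000],
})

# Rows per commune, largest first; missing prices do not affect the result.
listings_per_commune = (
    listings.groupby("COMUNA")
    .size()
    .sort_values(ascending=False)
    .rename("listings")
)

# For comparison: count() skips the row whose PRECIO is missing.
priced_per_commune = listings.groupby("COMUNA")["PRECIO"].count()
```

Here `listings_per_commune` reports three listings for the first commune, while `priced_per_commune` reports two for it because one price is missing.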