<h1>Introduction

<h3>A description of the problem and a discussion of the background. (15 marks)</h3>
As a person who enjoys visiting parks and plazas, I want to live in a zip code with more parks in plazas. I prefer the cleaner air these spaces provide as well as the green with the grass and the trees. I enjoy listening to birds and playing in the park. Any chance to escape the city a bit is a chance I try to take.

<h3>A description of the data and how it will be used to solve the problem. (15 marks)</h3>
I will be using foursquare's api to pull data on parks around the city of Madrid such as their name and geo coordinates. These parks will be compared with zip codes within a walking distance of 1 km. All zip codes and parks will be placed on the map. Zip code points will be colored based on the number of parks that are contained within.

<h1>Data

<h3>This project begins by downloading all necessary dependencies

In [26]:
import requests
import urllib.request
import lxml.html as lh
import pandas as pd
import json
from pandas.io.json import json_normalize
import numpy as np
import geopy
import folium
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from geopy.geocoders import Nominatim
from apitest import *
print("Libraries imported")

Libraries imported


<h3>Reading the file with zip codes and their geocoordinates

In [27]:
df = pd.read_csv('madrid_zip_codes_latlon.csv')
df.head()

Unnamed: 0,Zip_Code,Latitude,Longitude
0,28001,40.424549,-3.68419
1,28002,40.449268,-3.67406
2,28003,40.443001,-3.69812
3,28004,40.424759,-3.6941
4,28005,40.40472,-3.71615


<h3>Determining the coordinates of Madrid

In [28]:
madrid_address = 'Madrid, Madrid'
geolocator = Nominatim(user_agent="madrid_explore")
location = geolocator.geocode(madrid_address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Madrid are {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Madrid are 40.4167047, -3.7035825.


<H3>This map showcases the location of each of the zip codes within Madrid

In [29]:
# create map
intro_map = folium.Map(location=[latitude, longitude], zoom_start=12)



# add markers to the map
markers_colors = []
for lat, lon, poi in zip(df['Latitude'], df['Longitude'], df['Zip_Code']):
    label = folium.Popup(str(poi), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(intro_map)
       
intro_map

<h3> Code to clean up the map zip codes, so they are more condensed and closer to the city

In [30]:
# Get names of indexes for which column category is a far zipcode
drops = df[df['Zip_Code'] == 28024].index
#Must remove 28070, 28048, 28042, 28052, 28051, 28021, 28054, 28044, 28024 
# Delete these row indexes from dataFrame
df.drop(drops , inplace=True)

In [31]:
df.shape

(56, 3)

<h1> API Credentials

In [32]:
CLIENT_ID = foursquareid
CLIENT_SECRET = foursquaresecret
VERSION = '20200124'
radius = 1000
LIMIT = 500
CATEGORY = '4bf58dd8d48988d163941735'  #category for parks in foursquare API

<H2>Main Function Call
    <h3>Gets data on how many parks are found in each of the zip codes' coordinates

In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)  - commented out once I got this working properly
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore/?&categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CATEGORY,
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Zip Code', 
                  'Zip Code Latitude', 
                  'Zip Code Longitude', 
                  'Park Name', 
                  'Park Latitude', 
                  'Park Longitude', 
                  'Park Category']
    
    return(nearby_venues)

In [None]:
df_counts = getNearbyVenues(names=df['Zip_Code'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

<h1>Methodology</h1>
Once we have the data, we can now see how many parks are within walking distance of the zip codes.  We also have the names of the parks.  From here we preview the names of the parks and determine the number of parks per zip code.
After that, we run a k means analysis which clusters the zip codes based on the availability of parks within that zip code.  We generate a new table with the k means information and merge that with the previous table to get our final df_merged table with all the data necessary to produce the map.

<h2>Determine the size of our sample.  I am not sure why I could not get more than 107 to show up.  This took many hours of playing before moving on

In [None]:
print(df_counts.shape)
df_counts

In [None]:
pd.unique(df_counts['Park Name'])

<h3>Determine how many parks are found within the zip code

In [None]:
df_final = df_counts.groupby('Zip Code')['Park Name'].nunique()
df_final = pd.DataFrame(df_final)
df_final.reset_index()

<h1> Cluster Zip Codes based on availability of parks

In [None]:
# set number of clusters
kclusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_final)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:] 

In [None]:
kmeans

In [None]:
df_final.insert(0, 'Cluster Labels', kmeans.labels_)

In [None]:
df_final.rename(columns={"Park Name": "Park Count"}, inplace=True)

<h3>Reset the index before joining the dataframes for the final map

In [None]:
df_final.reset_index()

<h1> Time to join the columns

In [None]:
df_merged = df_counts.join(df_final, on='Zip Code')

df_merged.head() # check the last columns!

<h1>Results</h1>
Final map display to show the zip codes with color based on the number of parks.  The black spots are parks or plazas.

In [None]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set colors for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding markers to the map
markers_colors = []
for lat, lon, poi, cluster, parks in zip(df_merged['Zip Code Latitude'], 
                                         df_merged['Zip Code Longitude'], 
                                         df_merged['Zip Code'], 
                                         df_merged['Cluster Labels'],
                                         df_merged['Park Count']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster) + "\nThere are " + str(parks) + " parks.", parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=8,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

# adding parks to the map
for lat, lon, name in zip(df_merged['Park Latitude'],
                          df_merged['Park Longitude'],
                          df_merged['Park Name']):
    label2 = folium.Popup(str(name), parse_html=True)
    folium.CircleMarker(
    [lat, lon],
    radius=3,
    popup=label2,
    color='black',
    fill=True).add_to(map_clusters)

       
map_clusters

<h1>Discussion</h1>
When looking at the map you can see that there are other parks visible that the foursquare api did not pick up.  I've looked through the documentation but have been unable to determine if they are a part of foursquare, or if I'd limited the api call in some way.  Either way this is a good start to analayzing zip codes and their availability of parks.  I would like to run this analaysis again trying other location api companies to see how it differs in quality and information.  For now, though it looks like there are two prime zip codes to live in for lots of green space.  These are 28013 and 28009 each with a count of 10 parks or plazas that were pulled from Foursquare's api.

<h1>Conclusion</h1>
Thank you for taking the time to look at this. This project is the very first of many data science projects that I will be working on in the future.  It brought me great pleasure to produce this.  I know that from here they will only get better.  This is definitely a journey of learning.