# Recommending Portland Neighborhoods for Renters: A Geospatial Analysis of Rent and Venues
#### Jonathan Jettenberger-Burleson
#### April 27, 2020
### Applied Data Science Capstone by IBM/Coursera
<hr>

## Table of Contents:
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Results](#results)
* [Discussion](#discussion)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

The cost of a home is just one aspect of housing expenses. For many people, buying a home is either overwhelming or beyond their financial means, and renting is a good alternative. Resources such as Zillow help renters and home buyers alike find the right home. Monthly rent can be affected by unit size, location, number of bedrooms, and many other features. In addition, a unit’s proximity to venues such as restaurants, parks, night clubs, and museums may play a role in pricing and, more importantly, in a renter’s interest in that location. Information relating monthly rent by neighborhood to nearby venues can therefore be an important tool for renters.

A prospective home buyer who decides to explore renting instead may weigh several criteria when looking for the right place. Beyond the monthly rent, a location’s proximity to certain venues can play a role in the decision. This report aims to recommend a neighborhood for a prospective renter based on venue type, venue proximity, and, of course, monthly rent. To limit the scope, we focus on the Portland, Oregon area.

This report is primarily of interest to renters, both those familiar with the Portland, Oregon area and those new to it. Home buyers and landlords planning to rent out a property may also find this information useful in gauging the potential value of their investment based on proximity to venues. Those in real estate or city planning may find it useful when predicting rent, property value, or interest in an area. Business owners, especially owners of small, local businesses, may find this report helpful in estimating their potential customer base.

## Data <a name="data"></a>

In this report, I leverage geospatial venue data from [Foursquare](https://foursquare.com/) and rental data from [Zillow](https://www.zillow.com/). The Foursquare API is used mainly for venue types and their locations in the Portland area; each request returns trending venues within the search radius. Because the Zillow API does not allow general searches over an area, rent information for multiple properties and their locations comes from the [Zillow Research data page](https://www.zillow.com/research/data/). This data is updated periodically and was current at the time of this report. Zillow uses a proprietary algorithm to estimate rent as an attribute it calls the Rent Zestimate. This value factors in a property’s characteristics, unique features, on-market data, and off-market data. We select the Zillow Rent Index (ZRI) data at neighborhood geography. Boundary data, found [here](https://gis-pdx.opendata.arcgis.com/datasets/1ef75e34b8504ab9b14bef0c26cade2c_3), is also used to model the neighborhoods. To assist in cleaning the data, information about neighborhood names was gathered from [PDX Listed](https://www.pdxlisted.com/neighborhoods/).

Data requested from Foursquare provides a great deal of information about venues in an area. Information can be gathered based on location, other users, or even just venue category. Foursquare also provides data on tips, hours, menus, photos, and events. Much of this information would be useful depending on the needs of the user, but it is out of scope for this report.

We will be focusing on the latitude and longitude location of each venue, and the categories the venue falls under. The names of the venues will also be included for labeling. To restrict the data size for this report, we will search for venues with Portland as a city value. Due to the 100-entry limit on returned data per request for the Foursquare API, five separate requests were made in an attempt to increase the dataset size. Each request was centered on one of the five sections of Portland and was searched in a radius of 8046 meters, or approximately 5 miles. Overlapping data points were also removed. This method yielded a dataset of 366 venues in the Portland area (Figure 1).
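To get a feel for why the five search circles return overlapping venues, the distance between two search centers can be compared with the search radius. The following is a minimal sketch and not part of the original notebook: the function name `haversine_m` and the mean Earth radius of 6,371,000 m are assumptions, while the coordinates and radius are the ones used in this report.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters

def haversine_m(lat1, lng1, lat2, lng2):
    """Approximate great-circle distance in meters between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Search centers used in this report: Northwest and North Portland
northwest = (45.5586, -122.7609)
north = (45.6104, -122.7034)
radius = 8046  # meters, ~5 miles

distance = haversine_m(*northwest, *north)
# Two circles of equal radius overlap when their centers are closer than 2 * radius
print("Centers are {:.0f} m apart; circles overlap: {}".format(distance, distance < 2 * radius))
```

Whenever two centers are closer than twice the radius, the same venue can be returned by both requests, which is why duplicate rows are dropped after the results are concatenated.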

In [1]:
#importing necessary libraries
import numpy as np
import pandas as pd
import requests
from pandas import json_normalize  #pandas >= 1.0; older versions import it from pandas.io.json
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium --yes
!pip install folium
import folium
print('Libraries Imported.')

Libraries Imported.


In [2]:
# The code was removed by Watson Studio for sharing.

In [3]:
VERSION = '20200427'
#8046 meters or ~5 miles radius
radius = 8046
limit = 100
#[Northwest, North, Northeast, Southwest, Southeast]
locations = [(45.5586,-122.7609), (45.6104,-122.7034), (45.5676,-122.6179), (45.4849,-122.7116), (45.4914,-122.5930)]
#Portland, OR location for folium maps
latitude = 45.5051
longitude = -122.6750

In [4]:
#request Foursquare venue data
df_ls = []
for s_lat, s_lng in locations:
    foursquare_url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, s_lat, s_lng, VERSION, radius, limit)
    foursquare_results = requests.get(foursquare_url).json()
    venue_data = foursquare_results['response']['groups'][0]['items']
    df = json_normalize(venue_data)
    df_ls.append(df)
venue_df = pd.concat(df_ls)
venue_df.reset_index(drop = True, inplace = True)

In [5]:
#filtering and cleaning venue data
filtered_columns = ['venue.name', 'venue.categories'] + [col for col in venue_df.columns if col.startswith('venue.location.')] + ['venue.id']
venue_df_filtered = venue_df.loc[:, filtered_columns]

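def get_category_type(row):
    #use 'categories' when present, otherwise fall back to 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    #return the name of the first (primary) category, if any
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']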

venue_df_filtered['venue.categories'] = venue_df_filtered.apply(get_category_type, axis=1)
venue_df_filtered.columns = [column.split('.')[-1] for column in venue_df_filtered.columns]
venue_df_filtered = venue_df_filtered[['name', 'categories', 'lat', 'lng']]
venue_df_filtered = venue_df_filtered.drop_duplicates()
print("Number of venues:", venue_df_filtered.shape[0])
venue_df_filtered.head()

Number of venues: 373


| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Forest Park | Forest | 45.560008 | -122.756338 |
| 1 | Cathedral Park | Park | 45.587744 | -122.759822 |
| 2 | Forest Park - Thurman Gate | Trail | 45.539759 | -122.723745 |
| 3 | Leif Erickson Trail | Trail | 45.539568 | -122.724785 |
| 4 | Hoplandia Beer | Beer Store | 45.589662 | -122.755614 |


### Figure 1.

In [6]:
#display venue points on map of Portland, OR
venues_map = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, name, categories in zip(venue_df_filtered.lat, venue_df_filtered.lng, venue_df_filtered.name, venue_df_filtered.categories):
    label = '{}, {}'.format(name, categories)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

venues_map

Zillow has detailed information on property characteristics and market data, and the dataset collected covers many locations. To refine it, I began by selecting the rows where City and State equal Portland and OR, respectively. Given the limited scope of this report, only the neighborhood names and the rent values (the `Zri` column) are examined, and these columns were renamed to Neighborhood and Rent. In Portland, OR, many neighborhoods are represented by a homeowners association or a league, and the two data sources do not name these neighborhoods consistently. To match the rent subset with the geospatial information from the boundary data, 17 neighborhood names had to be cleaned for this reason. One additional neighborhood was listed under its former name.
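One way to find which names need cleaning is to compare the two sets of names after normalizing case. The sketch below is illustrative only: the helper `unmatched_names` is hypothetical, and the sample lists are a small hand-picked subset of names that appear in this report, not the full datasets.

```python
def unmatched_names(rent_names, boundary_names):
    """Return rent-data names that have no exact (case-insensitive) match in the boundary data."""
    rent_set = {name.strip().upper() for name in rent_names}
    boundary_set = {name.strip().upper() for name in boundary_names}
    return sorted(rent_set - boundary_set)

# Small illustrative sample of names from the two sources
zillow_names = ['Alameda', 'Buckman', 'Goose Hollow', 'Boise']
boundary_names = ['ALAMEDA', 'BUCKMAN COMMUNITY ASSOCIATION', 'GOOSE HOLLOW FOOTHILLS LEAGUE', 'BOISE']

print(unmatched_names(zillow_names, boundary_names))
# → ['BUCKMAN', 'GOOSE HOLLOW']
```

Each name returned this way is a candidate for a manual rename to the association or league name used in the boundary data.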

In [7]:
#download geojson boundary data
!wget --quiet https://opendata.arcgis.com/datasets/1ef75e34b8504ab9b14bef0c26cade2c_3.geojson -O portland_boundaries.json
print('GeoJSON file downloaded.')

GeoJSON file downloaded.


### Figure 2.

In [8]:
#displaying boundary data
portland_geo = r'portland_boundaries.json'
boundaries_map = folium.Map(location=[latitude, longitude], zoom_start=11)
folium.GeoJson(portland_geo, name='geojson').add_to(boundaries_map)
boundaries_map

In [9]:
price_df = pd.read_csv('http://files.zillowstatic.com/research/public/Neighborhood/Neighborhood_Zri_AllHomesPlusMultifamily_Summary.csv')
price_df.head()

| | Date | RegionName | State | Metro | County | City | SizeRank | Zri | MoM | QoQ | YoY | ZriRecordCnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-31 | Northeast Dallas | TX | Dallas-Fort Worth-Arlington | Dallas County | Dallas | 0 | 1402 | 0.0002 | 0.0068 | 0.0275 | 70388 |
| 1 | 2020-01-31 | Maryvale | AZ | Phoenix-Mesa-Scottsdale | Maricopa County | Phoenix | 1 | 1260 | 0.0141 | 0.0263 | 0.0821 | 56470 |
| 2 | 2020-01-31 | Paradise | NV | Las Vegas-Henderson-Paradise | Clark County | Las Vegas | 2 | 1389 | 0.0046 | 0.0132 | 0.0616 | 73688 |
| 3 | 2020-01-31 | South Los Angeles | CA | Los Angeles-Long Beach-Anaheim | Los Angeles County | Los Angeles | 3 | 2351 | -0.0079 | -0.0127 | -0.0117 | 37112 |
| 4 | 2020-01-31 | Upper East Side | NY | New York-Newark-Jersey City | New York County | New York | 4 | 3772 | -0.0274 | -0.0664 | -0.0715 | 102099 |


In [10]:
#filtering and cleaning the rental data
#.copy() avoids pandas' SettingWithCopyWarning on the in-place edits below
price_df_filtered = price_df.loc[(price_df['State']=='OR') & (price_df['City']=='Portland'), ['RegionName', 'Zri']].copy()
price_df_filtered.rename(columns={"RegionName": "Neighborhood", "Zri": "Rent"}, inplace=True)

#Zillow neighborhood names mapped to the names used in the boundary data
name_map = {
    'Corbett-Terwilliger-Lair Hill': 'South Portland',  #former name
    'Argay': 'ARGAY TERRACE',
    'Brooklyn': 'BROOKLYN ACTION CORPS',
    'Buckman': 'BUCKMAN COMMUNITY ASSOCIATION',
    'Centennial': 'CENTENNIAL COMMUNITY ASSOCIATION',
    'Cully': 'CULLY ASSOCIATION OF NEIGHBORS',
    'Downtown': 'PORTLAND DOWNTOWN',
    'Goose Hollow': 'GOOSE HOLLOW FOOTHILLS LEAGUE',
    'Hayden Island': 'HAYDEN ISLAND NEIGHBORHOOD NETWORK',
    'Irvington': 'IRVINGTON COMMUNITY ASSOCIATION',
    'Mount Scott': 'MT. SCOTT-ARLETA',
    'Mount Tabor': 'MT. TABOR',
    'Northwest': 'NORTHWEST DISTRICT ASSOCIATION',
    'Powellhurst Gilbert': 'POWELLHURST-GILBERT',
    'Sellwood-Moreland': 'SELLWOOD-MORELAND IMPROVEMENT LEAGUE',
    'Southwest Hills': 'SOUTHWEST HILLS RESIDENTIAL LEAGUE',
    'St.Johns': 'ST. JOHNS',
    'Wilkes': 'WILKES COMMUNITY GROUP',
}
#rename exact matches, then upper-case every name to match the boundary data
price_df_filtered['Neighborhood'] = price_df_filtered['Neighborhood'].replace(name_map).str.upper()
price_df_filtered.sort_values(by='Neighborhood', ascending=True, inplace = True)
price_df_filtered.reset_index(drop = True, inplace = True)
price_df_filtered.head()

| | Neighborhood | Rent |
|---|---|---|
| 0 | ALAMEDA | 1914 |
| 1 | ARBOR LODGE | 1884 |
| 2 | ARGAY TERRACE | 1638 |
| 3 | BEAUMONT-WILSHIRE | 1972 |
| 4 | BOISE | 1781 |


In [11]:
print("Number of price points:", price_df_filtered.shape[0])

Number of price points: 60


In [12]:
#filtering and cleaning the boundary data
geo_df = pd.read_csv('https://opendata.arcgis.com/datasets/1ef75e34b8504ab9b14bef0c26cade2c_3.csv?outSR=%7B%22latestWkid%22%3A3857%2C%22wkid%22%3A102100%7D')
#.copy() avoids pandas' SettingWithCopyWarning when the subset is modified
geo_df_filtered = geo_df[['NAME', 'SHARED']].copy()
geo_df_filtered = geo_df_filtered[~geo_df_filtered['NAME'].str.contains('UNCLAIMED')]
geo_df_filtered = geo_df_filtered.rename(columns={"NAME": "Neighborhood", "SHARED": "Shared"})
geo_df_filtered = geo_df_filtered.drop_duplicates()
geo_df_filtered = geo_df_filtered.sort_values(by='Neighborhood', ascending=True).reset_index(drop=True)
#expanding 'ASSN.' so each part of a shared name matches a name in the rent data
geo_df_filtered.loc[geo_df_filtered['Neighborhood']=='ALAMEDA/IRVINGTON COMMUNITY ASSN.', 'Neighborhood'] = 'ALAMEDA/IRVINGTON COMMUNITY ASSOCIATION'
geo_df_filtered.loc[geo_df_filtered['Neighborhood']=='CENTENNIAL COMMUNITY ASSN./PLEASANT VALLEY', 'Neighborhood'] = 'CENTENNIAL COMMUNITY ASSOCIATION/PLEASANT VALLEY'
geo_df_filtered.loc[geo_df_filtered['Neighborhood']=='HILLSIDE/NORTHWEST DISTRICT ASSN.', 'Neighborhood'] = 'HILLSIDE/NORTHWEST DISTRICT ASSOCIATION'
geo_df_filtered.loc[geo_df_filtered['Neighborhood']=="LLOYD DISTRICT COMMUNITY ASSN./SULLIVAN'S GULCH", 'Neighborhood'] = "LLOYD DISTRICT COMMUNITY ASSOCIATION/SULLIVAN'S GULCH"
geo_df_filtered.loc[geo_df_filtered['Neighborhood']=="SABIN COMMUNITY ASSN./IRVINGTON COMMUNITY ASSN.", 'Neighborhood'] = "SABIN COMMUNITY ASSOCIATION/IRVINGTON COMMUNITY ASSOCIATION"

#merging the rent and boundary data
geo_price_df = geo_df_filtered.merge(price_df_filtered, how='outer')

#areas shared by more than one neighborhood
shared_df = geo_price_df[geo_price_df['Shared'] == 'Y'].reset_index(drop=True)



In [13]:
print("Number of neighborhoods:", geo_df.shape[0])
print("Number of shared neighborhoods:", shared_df.shape[0])

Number of neighborhoods: 130
Number of shared neighborhoods: 25


In [14]:
#filling in overlapping neighborhoods with average rent
for x in shared_df['Neighborhood']:
    args_ls = x.split('/')
    size = len(args_ls)
    total = 0
    for i in args_ls:
        if i in geo_price_df['Neighborhood'].values:
            if pd.isna(geo_price_df.loc[geo_price_df['Neighborhood']==i, 'Rent'].values[0]):  
                size-= 1
            else:
                value = geo_price_df.loc[geo_price_df['Neighborhood']==i, 'Rent'].values[0]
                total+= value
        else:
            size-= 1
    if size != 0:
        geo_price_df.loc[geo_price_df['Neighborhood']==x, 'Rent'] = total/size
        
geo_price_df.head()

| | Neighborhood | Shared | Rent |
|---|---|---|---|
| 0 | ALAMEDA | N | 1914.0 |
| 1 | ALAMEDA/BEAUMONT-WILSHIRE | Y | 1943.0 |
| 2 | ALAMEDA/IRVINGTON COMMUNITY ASSOCIATION | Y | 1983.0 |
| 3 | ARBOR LODGE | N | 1884.0 |
| 4 | ARDENWALD-JOHNSON CREEK | N | |


In [15]:
#reverting neighborhood names back to match the geojson file
geo_price_df.loc[geo_price_df['Neighborhood']=='ALAMEDA/IRVINGTON COMMUNITY ASSOCIATION', 'Neighborhood'] = 'ALAMEDA/IRVINGTON COMMUNITY ASSN.'
geo_price_df.loc[geo_price_df['Neighborhood']=='CENTENNIAL COMMUNITY ASSOCIATION/PLEASANT VALLEY', 'Neighborhood'] = 'CENTENNIAL COMMUNITY ASSN./PLEASANT VALLEY'
geo_price_df.loc[geo_price_df['Neighborhood']=='HILLSIDE/NORTHWEST DISTRICT ASSOCIATION', 'Neighborhood'] = 'HILLSIDE/NORTHWEST DISTRICT ASSN.'
geo_price_df.loc[geo_price_df['Neighborhood']=="LLOYD DISTRICT COMMUNITY ASSOCIATION/SULLIVAN'S GULCH", 'Neighborhood'] = "LLOYD DISTRICT COMMUNITY ASSN./SULLIVAN'S GULCH"
geo_price_df.loc[geo_price_df['Neighborhood']=='SABIN COMMUNITY ASSOCIATION/IRVINGTON COMMUNITY ASSOCIATION', 'Neighborhood'] = "SABIN COMMUNITY ASSN./IRVINGTON COMMUNITY ASSN."

In [16]:
print("Number of areas with no data:", pd.isna(geo_price_df['Rent']).sum())

Number of areas with no data: 37


### Figure 3.

In [17]:
#rent data heatmap
price_map = folium.Map(location=[latitude, longitude], zoom_start=11)

#folium.Choropleth replaces the deprecated Map.choropleth method
folium.Choropleth(
    geo_data=portland_geo,
    data=geo_price_df,
    columns=['Neighborhood', 'Rent'],
    key_on='feature.properties.NAME',
    fill_color='YlGnBu', 
    fill_opacity=0.8, 
    line_opacity=0.2,
    legend_name='Rent by Neighborhood'
).add_to(price_map)

price_map



The neighborhood boundary data includes 25 entries that represent areas where neighborhoods overlap. For these, the available rental data of the overlapping neighborhoods was averaged. Even so, the Zillow rental data filled only 93 of the 130 areas in Figure 3, leaving parts of the map with no rental data. A few of these areas are unclaimed, meaning no neighborhood association claims them. The others simply have no rental data on Zillow, either because little rental information has been posted there or because there are no rental properties to speak of.
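The averaging rule for overlapping areas can be illustrated with a small standalone example. This is a sketch in plain Python rather than the notebook's pandas code; the helper name `shared_area_rent` is hypothetical, and the rent values are the ones shown for Alameda and Beaumont-Wilshire in the tables above.

```python
import math

def shared_area_rent(shared_name, rent_by_neighborhood):
    """Average the known rents of the neighborhoods named in an 'A/B' shared area.

    Members with no rent data (missing or NaN) are ignored.
    Returns None when no member has rent data.
    """
    rents = []
    for member in shared_name.split('/'):
        rent = rent_by_neighborhood.get(member)
        if rent is not None and not math.isnan(rent):
            rents.append(rent)
    return sum(rents) / len(rents) if rents else None

rents = {'ALAMEDA': 1914, 'BEAUMONT-WILSHIRE': 1972}
print(shared_area_rent('ALAMEDA/BEAUMONT-WILSHIRE', rents))  # → 1943.0
```

When only one of the member neighborhoods has rent data, the shared area simply takes that neighborhood's rent, and when none do, the area stays empty on the map.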

## Methodology <a name="methodology"></a>

In analyzing the raw data, it is clear we have a reasonable spread of data points. As seen in Figure 4, there are some clear outliers. These venue points fall outside the neighborhood boundaries because of the way venues are requested: by searching within a radius of a given set of coordinates, we have also picked up venues beyond the neighborhood boundaries.
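If these outliers needed to be removed, one option would be a point-in-polygon test against the neighborhood boundaries. The sketch below uses the standard ray-casting approach in plain Python; it was not part of this analysis. The function name, the rectangular `portland_box` polygon, and the outside test point are all hypothetical stand-ins, and real GeoJSON boundaries (which may contain multiple rings or polygons) would need extra handling.

```python
def point_in_polygon(lng, lat, polygon):
    """Ray-casting test; polygon is a list of (lng, lat) vertices in order."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by a horizontal ray
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside

# Hypothetical rectangle standing in for a real neighborhood boundary
portland_box = [(-122.84, 45.43), (-122.47, 45.43), (-122.47, 45.65), (-122.84, 45.65)]

# Forest Park, using the coordinates from the venue table
print(point_in_polygon(-122.756338, 45.560008, portland_box))  # → True
# A made-up point north of the rectangle
print(point_in_polygon(-122.67, 45.70, portland_box))  # → False
```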

In [18]:
categories = venue_df_filtered['categories'].drop_duplicates()
print("Unique categories:", categories.shape[0])

Unique categories: 128


### Figure 4.

In [19]:
#rent data heatmap with venue overlay
merged_map = price_map

for lat, lng, name, categories in zip(venue_df_filtered.lat, venue_df_filtered.lng, venue_df_filtered.name, venue_df_filtered.categories):
    label = '{}, {}'.format(name, categories)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(price_map)
    
merged_map

In [20]:
#clustering venue data points
kclusters = 5
venue_clustering = venue_df_filtered.drop(['name','categories'], axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venue_clustering)
kmeans.labels_

array([3, 3, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 0, 3, 3, 0, 0, 3,
       3, 3, 0, 4, 3, 0, 4, 0, 3, 0, 3, 0, 0, 3, 3, 0, 3, 0, 3, 0, 0, 3,
       4, 3, 4, 0, 0, 4, 0, 3, 4, 3, 0, 3, 0, 0, 0, 4, 3, 3, 0, 0, 0, 3,
       0, 4, 4, 0, 0, 0, 0, 4, 4, 0, 3, 4, 0, 4, 0, 4, 4, 0, 4, 0, 4, 4,
       4, 0, 4, 0, 4, 4, 4, 4, 4, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 4, 1, 4, 1, 1, 4, 1, 1, 1, 1, 4, 4, 3, 4, 4, 3, 1, 1, 4, 3, 4,
       1, 1, 4, 1, 4, 4, 4, 4, 4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 2, 2, 4, 2,
       2, 2, 2, 2, 4, 2, 4, 4, 4, 2, 4, 4, 4, 2, 4, 2, 2, 2, 2, 2, 2, 2,
       4, 2, 4, 4, 2, 4, 2, 4, 2, 4, 2, 2, 2, 2, 4, 2, 2, 4, 2, 2, 4, 2,
       0, 2, 0, 2, 2, 0, 2, 2, 2, 2, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 2, 2, 0, 2, 0, 2, 0,
       0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2,

In [21]:
#copy so the cluster labels are not also inserted into venue_df_filtered
venue_grouped = venue_df_filtered.copy()
venue_grouped.insert(0, 'cluster labels', kmeans.labels_)

To analyze the relationship between venue locations and neighborhoods, the venue points were run through a k-means clustering algorithm. The venue data was grouped into 5 clusters, matching the number of sections of the city from which venue data points were gathered. With the cluster labels generated, the data points in Figure 4 were colored by cluster to produce Figure 5. A clear divide can be seen partitioning the venue points.
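For readers unfamiliar with k-means, the idea can be sketched in a few lines of plain Python. This is a simplified illustration of Lloyd's algorithm, not scikit-learn's implementation: the random initialization, the function name `kmeans`, and the choice of two clusters are assumptions made for the example. As in the analysis above, latitude and longitude are treated as plain 2-D coordinates. The five points are the first five venues in the venue table.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids (an assumption)
    labels = None
    for _ in range(iterations):
        # Assignment step: give each point the label of its nearest centroid
        new_labels = []
        for lat, lng in points:
            distances = [(lat - c_lat) ** 2 + (lng - c_lng) ** 2 for c_lat, c_lng in centroids]
            new_labels.append(distances.index(min(distances)))
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids

venues = [
    (45.560008, -122.756338),  # Forest Park
    (45.587744, -122.759822),  # Cathedral Park
    (45.539759, -122.723745),  # Forest Park - Thurman Gate
    (45.539568, -122.724785),  # Leif Erickson Trail
    (45.589662, -122.755614),  # Hoplandia Beer
]
labels, centroids = kmeans(venues, k=2)
print(labels)
```

Cluster numbers are arbitrary, so only the grouping is meaningful: the two trailheads near Thurman Street should share one label, while Cathedral Park and Hoplandia Beer share the other.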

### Figure 5.

In [22]:
#displaying clustered venue points on rent heatmap
map_clusters = price_map

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lng, name, categories, cluster in zip(venue_grouped['lat'], venue_grouped['lng'], venue_grouped['name'], venue_grouped['categories'], venue_grouped['cluster labels']):
    label = folium.Popup(str(name) + ', ' + str(categories) + ': Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results <a name="results"></a>

#### Cluster 1:

The first cluster represents the Southwest section of Portland. It has many venues with a reasonable spread of category types and multiple options for each type. Most of the venues are concentrated in the downtown area, but many are spaced farther out. This area contains the highest-priced rental properties in the city, with average rent in some neighborhoods topping out at $2346/month. While there appears to be enough rental data for this area, several patches have none, especially in the southern half of the cluster.

In [23]:
cluster_1 = venue_grouped.loc[venue_grouped['cluster labels'] == 0, venue_grouped.columns[[1, 2, 3]]]
cluster_1.head()

| | name | categories | lat |
|---|---|---|---|
| 2 | Forest Park - Thurman Gate | Trail | 45.539759 |
| 3 | Leif Erickson Trail | Trail | 45.539568 |
| 16 | Great Notion Brewing | Brewery | 45.539852 |
| 19 | Forest Park - Wildwood Trail | Trail | 45.527233 |
| 20 | Lower MacLeay Park | Park | 45.535837 |


In [24]:
cluster_1_categories = cluster_1['categories'].drop_duplicates()
print("Number of venues:", cluster_1.shape[0])
print("Unique categories:", cluster_1_categories.shape[0])

Number of venues: 107
Unique categories: 58


#### Cluster 2:

The second cluster represents the northernmost area of Portland. This area consists of several larger neighborhoods and has relatively low average rent. The cluster has relatively few venues, but very little overlap in their types. Most of its venues actually lie outside the Portland area, but are near each other.

In [25]:
cluster_2 = venue_grouped.loc[venue_grouped['cluster labels'] == 1, venue_grouped.columns[[1, 2, 3]]]
cluster_2.head()

| | name | categories | lat |
|---|---|---|---|
| 100 | Columbia River | River | 45.61405 |
| 101 | Stanford's Jantzen Beach | American Restaurant | 45.611653 |
| 102 | Vancouver Farmers Market | Farmers Market | 45.626614 |
| 103 | Loowit Brewing Company | Brewery | 45.62516 |
| 104 | Boomers | BBQ Joint | 45.613565 |


In [26]:
cluster_2_categories = cluster_2['categories'].drop_duplicates()
print("Number of venues:", cluster_2.shape[0])
print("Unique categories:", cluster_2_categories.shape[0])

Number of venues: 45
Unique categories: 33


#### Cluster 3:

The third cluster represents the Northwest section of Portland. This cluster contains the fewest venues and category types, and most of the venues are a good distance apart. Most of the area lies in neighborhoods that have no rental data. Where data exists, rent appears to be about average for the city.

In [27]:
cluster_3 = venue_grouped.loc[venue_grouped['cluster labels'] == 2, venue_grouped.columns[[1, 2, 3]]]
cluster_3.head()

| | name | categories | lat |
|---|---|---|---|
| 206 | Pip's Original | Donut Shop | 45.548389 |
| 218 | Mẹ Kha | Vietnamese Restaurant | 45.546846 |
| 221 | Pizzeria Otto | Pizza Place | 45.546202 |
| 223 | Hollywood Farmer's Market | Farmers Market | 45.536406 |
| 224 | Hollywood Theatre | Indie Movie Theater | 45.535728 |


In [32]:
cluster_3_categories = cluster_3['categories'].drop_duplicates()
print("Number of venues:", cluster_3.shape[0])
print("Unique categories:", cluster_3_categories.shape[0])

Number of venues: 37
Unique categories: 25


#### Cluster 4:

The fourth cluster represents the Southeast section of Portland. This cluster has the lowest average rent, with rent as low as $1573/month. It has many venues, with a variety of categories similar to that of Cluster 1. Venues are fairly spread out, though there is some clustering close to the downtown area. Rent averages decrease toward the east of this cluster.

In [33]:
cluster_4 = venue_grouped.loc[venue_grouped['cluster labels'] == 3, venue_grouped.columns[[1, 2, 3]]]
cluster_4.head()

| | name | categories | lat |
|---|---|---|---|
| 205 | Pip's Original | Donut Shop | 45.548389 |
| 208 | Fire on the Mountain | Wings Joint | 45.548068 |
| 218 | Roseway Theater | Movie Theater | 45.548703 |
| 223 | Mẹ Kha | Vietnamese Restaurant | 45.546846 |
| 226 | Pizzeria Otto | Pizza Place | 45.546202 |


In [34]:
cluster_4_categories = cluster_4['categories'].drop_duplicates()
print("Number of venues:", cluster_4.shape[0])
print("Unique categories:", cluster_4_categories.shape[0])

Number of venues: 110
Unique categories: 64


#### Cluster 5:

The fifth cluster represents the Northeast and part of the North section of Portland. This area has relatively average rent. Venues are spread out, with two areas of clustering along major streets, North Mississippi Avenue and Northeast Alberta Street. There is a fair number of venues in this area, with a decent spread in their categories.

In [35]:
cluster_5 = venue_grouped.loc[venue_grouped['cluster labels'] == 4, venue_grouped.columns[[1, 2, 3]]]
cluster_5.head()

| | name | categories | lat |
|---|---|---|---|
| 26 | Blend Coffee Lounge | Coffee Shop | 45.562699 |
| 29 | Columbia Park | Park | 45.580414 |
| 46 | King Burrito Mexican Food | Mexican Restaurant | 45.57697 |
| 49 | Mock Crest Tavern | Bar | 45.577226 |
| 52 | Arbor Lodge Park | Park | 45.572958 |


In [36]:
cluster_5_categories = cluster_5['categories'].drop_duplicates()
print("Number of venues:", cluster_5.shape[0])
print("Unique categories:", cluster_5_categories.shape[0])

Number of venues: 68
Unique categories: 42
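
The venue and category counts reported for each cluster can also be gathered in a single pass instead of one cell per cluster. The sketch below shows the idea in plain Python; the helper `summarize_clusters` is hypothetical, and the sample pairs are a handful of illustrative (label, category) values rather than the full venue table.

```python
from collections import defaultdict

def summarize_clusters(venues):
    """venues: iterable of (cluster_label, category) pairs.

    Returns {label: (venue_count, unique_category_count)}.
    """
    counts = defaultdict(int)
    categories = defaultdict(set)
    for label, category in venues:
        counts[label] += 1
        categories[label].add(category)
    return {label: (counts[label], len(categories[label])) for label in sorted(counts)}

# Illustrative sample of (cluster label, category) pairs
sample = [(0, 'Trail'), (0, 'Trail'), (0, 'Brewery'), (1, 'River'), (1, 'Brewery')]
print(summarize_clusters(sample))  # → {0: (3, 2), 1: (2, 2)}
```

The ratio of unique categories to venues gives a rough measure of how varied a cluster's venues are.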


## Discussion <a name="discussion"></a>

With the rent data from Zillow mapped by neighborhood, it is clear which neighborhoods of Portland, OR are more expensive. These areas also have a larger number of trending venues. Interestingly, the areas with low rent share this trend. Cluster 1, with its high-rent areas, contains downtown Portland. Right across the Willamette River is Cluster 4, with its low rent. The other clusters, representing the areas with relatively average rent, span the northern half of Portland. The lack of data in these northern regions could have played a role in their standing in this comparison. Cluster 3 may be an exception, though, as it contains Forest Park. This park covers much of the area, effectively dividing the region. This could be the reason for the lack of data, and it also contributes to the distance between venues. On the upside, this area contains the majority of the park and trail venues in the entire city.

For someone looking for a property to rent, it is important to take budget into account. For a renter on a budget, the Southeast section of Portland is the obvious choice. This area has the lowest rent, between $\$$1500 and $\$$2000/month, while still being close to downtown. Venues are spread out, but neighborhoods in the northwestern part of this area have venues clustered more closely. If budget is not a concern, the Southwest section of Portland is the right choice. With the downtown area, these neighborhoods contain numerous venues, and they are also close to venues in other sections of the city. Renters who choose this area should be prepared to see rent over $\$$2000/month in some neighborhoods.
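A budget-based shortlist like the one described here can be expressed as a simple filter over the rent table. The following is a minimal sketch: the function name and the 1800 budget are hypothetical, and the five rent values are the first rows of the cleaned rent data shown earlier, not the full dataset.

```python
def neighborhoods_within_budget(rent_by_neighborhood, max_rent):
    """Return (name, rent) pairs at or below max_rent, cheapest first."""
    affordable = [
        (name, rent) for name, rent in rent_by_neighborhood.items() if rent <= max_rent
    ]
    return sorted(affordable, key=lambda pair: pair[1])

# First five rows of the cleaned rent data
rents = {
    'ALAMEDA': 1914,
    'ARBOR LODGE': 1884,
    'ARGAY TERRACE': 1638,
    'BEAUMONT-WILSHIRE': 1972,
    'BOISE': 1781,
}
print(neighborhoods_within_budget(rents, 1800))
# → [('ARGAY TERRACE', 1638), ('BOISE', 1781)]
```

A fuller recommendation would combine this shortlist with the venue counts and categories near each neighborhood.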

## Conclusion <a name="conclusion"></a>

We set out to recommend neighborhoods for potential renters in the Portland, OR area. Data from Foursquare and Zillow was used to relate average rent by neighborhood to the location of trending venues. Foursquare provides a great deal of information on venues, but it needs cleaning and analysis to yield meaningful results. Data gathered from Zillow needed cleaning to match the boundary information. Once the two were paired, the resulting heatmaps showed a clear trend in the location of venues and the number of venue categories represented. The majority of venues are situated in areas of either high or low rent. This may be indicative of their quality, which could be explored in future work. Based on this analysis, someone looking to rent should be recommended either a high-rent or a low-rent neighborhood in the southern half of Portland, depending on their budget. Depending on how the data is interpreted, other interested parties may see an opportunity in the northern half of the city. Business owners, for example, may see fewer competing local venues. On the other hand, these areas do not stand out as much as those on the high and low ends of the rent spectrum. Moving forward, more data must be collected to fill in the missing areas of the heatmap.