## Mapping Real Estate Data

In this notebook, we use the Folium, pandas, and GeoPandas libraries to map data from a CSV file. We are examining a dataset of King County home sales from 2014 and 2015.

In [2]:
!pip install geopandas

Collecting geopandas
  Downloading geopandas-0.10.2-py2.py3-none-any.whl (1.0 MB)
Collecting fiona>=1.8
  Downloading Fiona-1.8.21-cp37-cp37m-manylinux2014_x86_64.whl (16.7 MB)
Collecting pyproj>=2.2.0
  Downloading pyproj-3.2.1-cp37-cp37m-manylinux2010_x86_64.whl (6.3 MB)
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting cligj>=0.5
  Downloading cligj-0.7.2-py3-none-any.whl (7.1 kB)
Collecting click-plugins>=1.0
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Installing collected packages: munch, cligj, click-plugins, pyproj, fiona, geopandas
Successfully installed click-plugins-1.1.1 cligj-0.7.2 fiona-1.8.21 geopandas-0.10.2 munch-2.5.0 pyproj-3.2.1


The home sales data comes from a public-access CSV describing King County, Washington home sales from 2014–2015. The file is in the King_County_Data folder of my GitHub repository (the raw URL appears in the code below). Each sale has 21 columns, including price, latitude, longitude and zipcode. Latitude, longitude and zipcode are what make the data mappable with our GIS libraries. Below is the .info() printout of the CSV.

In [4]:
#import libraries
import pandas as pd
import folium
import geopandas as gpd


#read in housing data
df = pd.read_csv('https://raw.githubusercontent.com/jadeadams517/Real-Estate-Data-Mapping/main/King_County_Data/kc_house_data.csv')

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21597 entries, 0 to 21596
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21597 non-null  int64  
 1   date           21597 non-null  object 
 2   price          21597 non-null  float64
 3   bedrooms       21597 non-null  int64  
 4   bathrooms      21597 non-null  float64
 5   sqft_living    21597 non-null  int64  
 6   sqft_lot       21597 non-null  int64  
 7   floors         21597 non-null  float64
 8   waterfront     19221 non-null  object 
 9   view           21534 non-null  object 
 10  condition      21597 non-null  object 
 11  grade          21597 non-null  object 
 12  sqft_above     21597 non-null  int64  
 13  sqft_basement  21597 non-null  object 
 14  yr_built       21597 non-null  int64  
 15  yr_renovated   17755 non-null  float64
 16  zipcode        21597 non-null  int64  
 17  lat            21597 non-null  float64
 18  long  

In [6]:
#read in geojson

geo_data = gpd.read_file('https://raw.githubusercontent.com/jadeadams517/Real-Estate-Data-Mapping/main/King_County_Data/kc_zipcodes.geojson')



In [15]:
#instantiate map

map = folium.Map(location = [47.5480,-121.9836], zoom_start = 10, tiles="Stamen Toner")
map

## Fitting Pandas Data
Now we can add the sales data to the map. To plot values by zipcode, we first aggregate with pandas groupby, starting with the mean sale price in each zipcode. We will then map it with Folium's Choropleth.
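As a minimal sketch of the groupby step, here are a few made-up rows standing in for kc_house_data.csv (the prices are hypothetical). The mean price per zipcode is turned back into a two-column DataFrame with reset_index. This is an alternative to copying .index and .values into a new DataFrame by hand, not the notebook's own code.

```python
import pandas as pd

# Hypothetical sample rows standing in for the King County sales data
df = pd.DataFrame({
    "zipcode": [98001, 98001, 98004, 98004, 98004],
    "price": [250000.0, 300000.0, 1200000.0, 1400000.0, 1500000.0],
})

# Mean sale price per zipcode, rounded to cents
avg_price_zipcode = df.groupby("zipcode")["price"].mean().round(2)

# reset_index turns the zipcode index back into a column;
# rename gives the columns the names the choropleth code uses
df_zip_price = avg_price_zipcode.reset_index().rename(
    columns={"zipcode": "ZIPCODE", "price": "mean_price"}
)
print(df_zip_price)
```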

## Mapping with Folium
In the cell below, we create a new dataframe of each zipcode's average price. Since we are matching on the ZIPCODE property in the geojson file, which stores zipcodes as strings, we convert our zipcodes to strings as well. We then plot the basic map with Folium's Choropleth and add more features from there. The input and output should look like:
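The string conversion matters because the join is an equality match on keys. The sketch below is a simplified model of that lookup, my own illustration rather than Folium's source. It puts the prices in a dict keyed by zipcode and looks each one up with the geojson's ZIPCODE value, a string. With integer keys nothing matches; after astype(str) every feature finds its price. The feature zipcodes and prices are hypothetical.

```python
import pandas as pd

# Hypothetical mean prices keyed by integer zipcode, as groupby returns them
df_zip_price = pd.DataFrame({"ZIPCODE": [98001, 98004],
                             "mean_price": [275000.0, 1366666.67]})

# Hypothetical ZIPCODE values from the geojson features, stored as text
feature_zipcodes = ["98001", "98004"]

def matched(frame):
    """Count features whose ZIPCODE finds a price in frame (simplified lookup)."""
    lookup = frame.set_index("ZIPCODE")["mean_price"].to_dict()
    return sum(z in lookup for z in feature_zipcodes)

print(matched(df_zip_price))  # integer keys: 0 matches

df_zip_price["ZIPCODE"] = df_zip_price["ZIPCODE"].astype(str)
print(matched(df_zip_price))  # string keys: 2 matches
```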

In [14]:
#groupby zipcode and create mean price column
avg_price_zipcode = df.groupby("zipcode")["price"].mean().round(2)
df_zip_price = pd.DataFrame()

#pull groupby values into new dataframe
df_zip_price["ZIPCODE"] = avg_price_zipcode.index
df_zip_price["mean_price"] = avg_price_zipcode.values
#convert zipcodes to strings so they match the ZIPCODE property in the geojson
df_zip_price["ZIPCODE"] = df_zip_price["ZIPCODE"].astype(str)
print(type(df_zip_price["ZIPCODE"].iloc[0]))  #confirm the values are now str

#add choropleth on top of map
map = folium.Map(location = [47.5480,-121.9836], zoom_start = 10, tiles="Stamen Toner")
folium.Choropleth(
    geo_data=geo_data,
    data=df_zip_price,
    columns=['ZIPCODE', 'mean_price'],
    key_on='feature.properties.ZIPCODE',
    fill_color="YlOrBr",
    fill_opacity=0.6,
    line_opacity=0.2,
    legend_name='MEAN SALE PRICE',
).add_to(map)
map


## Improving our Choropleth
Folium's Choropleth has several built-in parameters that make the map easier to read. For one, several color schemes exist, including 'YlGnBu' and 'BuPu'. Folium's documentation on GitHub is helpful here. I'm going to go over some major upgrades we can make to our map.
First, we can change the bins assigned to each color. The legend in the top right shows the range of mean prices for each color. By default, Folium splits the range between the lowest and highest value into six equal-width bins. We can replace these with quantile-based breakpoints by setting the threshold scale. I'm going to start a fresh map and add the threshold.

The legend, scaled by price, shows how big the gap is between the 90th-percentile zipcode and the most expensive zipcode. With an average sale price of over $2,000,000, 98039 (the city of Medina) is a true outlier. You can change the quantiles displayed by editing the values in my_thresh.
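To see how the two binning choices differ, here is a small sketch with hypothetical mean prices, including one large outlier playing the role of Medina. As I understand Folium's documentation, an integer number of bins means equal-width bins between the minimum and maximum. The quantile list mirrors my_thresh.

```python
import numpy as np
import pandas as pd

# Hypothetical zipcode mean prices with one large outlier
mean_price = pd.Series([250000, 300000, 350000, 400000, 450000, 500000,
                        550000, 600000, 700000, 2100000], dtype=float)

# Equal-width edges: 6 bins need 7 edges from min to max
equal_width = np.linspace(mean_price.min(), mean_price.max(), 7)

# Quantile edges, as in my_thresh: min, median, 90th percentile, max
my_thresh = mean_price.quantile((0, .5, .9, 1)).tolist()

# How many zipcodes land in each equal-width bin
counts, _ = np.histogram(mean_price, bins=equal_width)
print(equal_width)
print(counts)
print(my_thresh)
```

With equal-width bins, seven of the ten hypothetical zipcodes share the lowest color and three of the six bins are empty. This is how a single outlier flattens the default legend, and why quantile breakpoints spread the colors more evenly.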

In [16]:
#merge geopandas and df_zip_price file

geodata_price = geo_data.merge(df_zip_price, on = 'ZIPCODE')
geodata_price.head()

| | Name | description | OBJECTID | ZIP | ZIPCODE | COUNTY | ZIP_TYPE | COUNTY_NAME | PREFERRED_CITY | Shape_Length | Shape_Area | geometry | mean_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | 1 | 98001 | 98001 | 33 | Standard | King County | AUBURN | 148134.770976 | 526121400.0 | POLYGON ((-122.29061 47.35539, -122.29061 47.3... | 281194.87 |
| 1 | | | 2 | 98002 | 98002 | 33 | Standard | King County | AUBURN | 105168.476815 | 204445200.0 | POLYGON ((-122.22921 47.35375, -122.22992 47.3... | 234284.04 |
| 2 | | | 3 | 98003 | 98003 | 33 | Standard | King County | FEDERAL WAY | 121645.070704 | 316981200.0 | POLYGON ((-122.30300 47.35745, -122.30393 47.3... | 294111.28 |
| 3 | | | 4 | 98004 | 98004 | 33 | Standard | King County | BELLEVUE | 99252.932327 | 250546600.0 | POLYGON ((-122.21195 47.64642, -122.21191 47.6... | 1356523.99 |
| 4 | | | 5 | 98005 | 98005 | 33 | Standard | King County | BELLEVUE | 116930.355168 | 211273300.0 | POLYGON ((-122.15354 47.66056, -122.15358 47.6... | 810289.7 |


## Making the Map Interactive
Next, we are going to make the map more interactive. Two more features are available to us: highlighting, which changes a zipcode's appearance when we hover the mouse over it, and a tooltip, which shows that zipcode's data. Folium's Choropleth does not take a tooltip parameter, so we add a separate feature on top of the map.
folium.features.GeoJson takes a single data source rather than one for the data and one for the geojson, so we have to merge the price data with the locational data before mapping. We do this by using GeoPandas to read the geojson file as a dataframe and then merging the price dataframe onto it:
import geopandas as gpd
geo_data = gpd.read_file('King_County_Data/kc_zipcodes.geojson')
geodata_price = geo_data.merge(df_zip_price, on='ZIPCODE')
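Because merge performs an inner join by default, any zipcode shape without a matching price row is dropped from geodata_price, so it gets no tooltip. One way to check is a left merge with indicator=True. The sketch below uses plain pandas DataFrames in place of the GeoDataFrame, with geometry omitted. The rows, including the unmatched 98101, are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for geo_data's attribute table (geometry omitted)
geo_table = pd.DataFrame({
    "ZIPCODE": ["98001", "98004", "98101"],
    "PREFERRED_CITY": ["AUBURN", "BELLEVUE", "SEATTLE"],
})
# Hypothetical mean prices; there is no row for 98101
df_zip_price = pd.DataFrame({"ZIPCODE": ["98001", "98004"],
                             "mean_price": [275000.0, 1366666.67]})

# Default inner join: unmatched zipcodes are silently dropped
inner = geo_table.merge(df_zip_price, on="ZIPCODE")

# Left join keeps every shape; indicator=True adds a _merge column
checked = geo_table.merge(df_zip_price, on="ZIPCODE", how="left", indicator=True)
missing = checked.loc[checked["_merge"] == "left_only", "ZIPCODE"].tolist()

print(len(inner))   # 2
print(missing)      # ['98101']
```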
We can now plot the feature using geodata_price. folium.features.GeoJson has several parameters that matter here, including tooltip, style_function, and highlight_function. We create the feature as an instance of the class and then add it to the map:

In [17]:
#adjust threshold scale by adjusting reflected quantiles
#breakpoints at the minimum, median, 90th percentile, and maximum mean price
my_thresh = df_zip_price["mean_price"].quantile((0,.5,.9,1)).tolist()
map = folium.Map(location = [47.5480,-121.9836], zoom_start = 10, tiles="Stamen Toner")

#update threshold scale, show map
folium.Choropleth(
    geo_data=geo_data, data=df_zip_price,
    columns=['ZIPCODE', 'mean_price'], key_on='feature.properties.ZIPCODE',
    fill_color="YlOrBr", fill_opacity=0.6, line_opacity=0.2,
    legend_name='MEAN SALE PRICE', highlight=True, overlay=True,
    threshold_scale=my_thresh, smooth_factor=0,
).add_to(map)

map


In [19]:
map = folium.Map(location = [47.5480,-121.9836], zoom_start = 10)
folium.Choropleth(
    geo_data=geo_data, data=df_zip_price,
    columns=['ZIPCODE', 'mean_price'], key_on='feature.properties.ZIPCODE',
    fill_color="YlOrBr", fill_opacity=0.6, line_opacity=0.2,
    legend_name='MEAN SALE PRICE', overlay=True, smooth_factor=0,
).add_to(map)

#create highlight feature to examine individual zipcodes on the map
#the layer is invisible until hovered, then darkens the zipcode and shows a tooltip
highlights = folium.features.GeoJson(
    geodata_price,
    style_function=lambda x: {'color': 'transparent', 'fillColor': 'transparent', 'weight': 0},
    highlight_function=lambda x: {'fillColor': '#000000', 'color': '#000000',
                                  'fillOpacity': 0.50, 'weight': 0.1},
    tooltip=folium.features.GeoJsonTooltip(
        fields=["ZIPCODE", 'mean_price'],
        aliases=["Zipcode: ", "Mean price in USD: "],
        labels=True, sticky=False),
)

map.add_child(highlights)
map.keep_in_front(highlights)
folium.LayerControl().add_to(map)
map

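As far as I know, Folium calls style_function and highlight_function once per GeoJSON feature, passing a dict with a properties key, and uses the returned dict as Leaflet path options. That means the style can depend on the data. The sketch below is a hypothetical variation on the fixed black highlight above: it picks the highlight color from the feature's mean_price. The 1,000,000 cutoff and the colors are arbitrary choices of mine. The two sample prices are taken from the geodata_price.head() output.

```python
# Hypothetical highlight rule: dark red for zipcodes above a price cutoff
PRICE_CUTOFF = 1_000_000  # arbitrary illustrative threshold

def highlight_by_price(feature):
    """Return Leaflet path options for a hovered feature based on its mean_price."""
    price = feature["properties"]["mean_price"]
    color = "#800000" if price > PRICE_CUTOFF else "#000000"
    return {"fillColor": color, "color": color, "fillOpacity": 0.5, "weight": 0.1}

def invisible_style(feature):
    """Same as the notebook's style_function: draw nothing until hovered."""
    return {"color": "transparent", "fillColor": "transparent", "weight": 0}

# Features shaped like rows of geodata_price (prices from the head() output)
bellevue = {"type": "Feature", "properties": {"ZIPCODE": "98004", "mean_price": 1356523.99}}
auburn = {"type": "Feature", "properties": {"ZIPCODE": "98001", "mean_price": 281194.87}}

print(highlight_by_price(bellevue)["fillColor"])  # #800000
print(highlight_by_price(auburn)["fillColor"])    # #000000
```

To try it, pass highlight_function=highlight_by_price and style_function=invisible_style to folium.features.GeoJson in place of the two lambdas.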

## Adding Markers
Labels on a map can give the reader additional details or highlight the most important data points. I am going to add a marker using folium.Marker and folium.Icon. We can find different icon styles at this link, and use the name in the icon parameter of folium.Icon. For the location parameter, I looked up the latitude and longitude of downtown Seattle online. The code and output will look like this:

In [20]:
#add marker denoting missing downtown Seattle data
folium.Marker(
    location=[47.6097, -122.3422],
    popup="Downtown Seattle home sales were not included in the data set.",
    icon=folium.Icon(color='red', icon='arrow-down'),
).add_to(map)
map

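Markers can also come from the data itself. The sketch below assumes you want a marker on the most expensive sale: idxmax finds that row, and its lat and long columns give the location. The rows are made up. The folium call appears only as a comment, because it needs the map object built above.

```python
import pandas as pd

# Hypothetical sales rows using the same column names as kc_house_data.csv
df = pd.DataFrame({
    "price": [310000.0, 2450000.0, 540000.0],
    "zipcode": [98001, 98039, 98103],
    "lat": [47.31, 47.62, 47.68],
    "long": [-122.27, -122.23, -122.34],
})

# Row with the highest sale price
top = df.loc[df["price"].idxmax()]
location = [top["lat"], top["long"]]
popup = f"Highest sale: ${top['price']:,.0f} in {int(top['zipcode'])}"

print(location)
print(popup)
# With folium installed and `map` built as above:
# folium.Marker(location=location, popup=popup,
#               icon=folium.Icon(color="green", icon="star")).add_to(map)
```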