## Zillow API data

### Imports

In [1]:
import requests
import pandas as pd
import numpy as np
import xmltodict
import geopandas
import geopandas.tools
from shapely.geometry import Point
import pickle
from xml.etree import ElementTree
%matplotlib inline

### Collect & Process Data

Set up the Zillow API URL, request the neighborhood data, and parse the XML response into a dict

In [2]:
api_key = ''  # Zillow Web Services ID (ZWS-ID), left blank here
z_url = 'http://www.zillow.com/webservice/GetRegionChildren.htm?zws-id={}&state=ca&city=san+francisco&childtype=neighborhood'.format(api_key)
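The query string above is assembled by hand. A minimal sketch of building the same URL from a dict of parameters with the standard library, so that values such as the city name are escaped automatically. The helper name `build_region_url`, the placeholder key `'MY_KEY'`, and the state value `'ca'` are my assumptions for illustration.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.zillow.com/webservice/GetRegionChildren.htm'

def build_region_url(api_key, state, city, childtype='neighborhood'):
    """Hypothetical helper: build the GetRegionChildren URL from its parameters."""
    params = {
        'zws-id': api_key,
        'state': state,
        'city': city,          # urlencode turns the space into '+'
        'childtype': childtype,
    }
    return '{}?{}'.format(BASE_URL, urlencode(params))

# 'MY_KEY' is a placeholder, not a real key
example_url = build_region_url('MY_KEY', 'ca', 'san francisco')
print(example_url)
```

Passing the same dict to `requests.get(BASE_URL, params=params)` should give an equivalent request without formatting the string manually.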

In [3]:
# Request the data once and parse the XML body into nested dicts
response = requests.get(z_url)
response.raise_for_status()  # stop early on an HTTP error
doc = xmltodict.parse(response.content)

In [4]:
z_dict = dict(doc['RegionChildren:regionchildren']['response'])
z_dict = z_dict['list']['region']
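The notebook also imports `ElementTree`, so here is a rough sketch of walking the same structure without `xmltodict`. The sample XML is hypothetical: I modeled it on the keys accessed above (`response`, `list`, `region`, `zindex`), and the namespace URI is a made-up placeholder, so the real response may differ.

```python
from xml.etree import ElementTree

# Hypothetical sample shaped like the keys used in this notebook
SAMPLE_XML = """<RegionChildren:regionchildren xmlns:RegionChildren="http://example.com/hypothetical">
  <response>
    <list>
      <region>
        <name>Mission</name>
        <zindex currency="USD">1106500</zindex>
        <latitude>37.759892</latitude>
        <longitude>-122.415902</longitude>
      </region>
      <region>
        <name>No Index Yet</name>
        <latitude>37.7</latitude>
        <longitude>-122.4</longitude>
      </region>
    </list>
  </response>
</RegionChildren:regionchildren>"""

def parse_regions(xml_text):
    """Return one dict per <region>, using NaN when <zindex> is absent."""
    root = ElementTree.fromstring(xml_text)
    rows = []
    for region in root.findall('./response/list/region'):
        zindex_text = region.findtext('zindex')  # None if the tag is missing
        rows.append({
            'neighborhood': region.findtext('name').strip(),
            'z_index': float(zindex_text) if zindex_text else float('nan'),
            'latitude': float(region.findtext('latitude')),
            'longitude': float(region.findtext('longitude')),
        })
    return rows

regions = parse_regions(SAMPLE_XML)
print(regions)
```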

Extract the name, Zillow index, and coordinates of each neighborhood into a list, then load the list into a DataFrame

In [5]:
z_data = []
for region in z_dict:
    try:
        # Some neighborhoods have no zindex; fall back to NaN for those
        z_index = int(str(region['zindex']['#text']).replace(" ", ""))
    except (KeyError, TypeError, ValueError):
        z_index = np.nan
    name = str(region['name']).strip()
    lat = float(region['latitude'])
    lon = float(region['longitude'])
    z_data.append([name, z_index, lat, lon])

df = pd.DataFrame(z_data, columns=['neighborhood', 'z_index', 'latitude', 'longitude'])
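Two details of the extraction step are easy to trip over: `xmltodict` returns a single dict rather than a list when a tag appears only once, and a region may lack a `zindex` entry. A small sketch of guarding against both follows. The helper names `as_list` and `parse_zindex` and the sample dicts are hypothetical, shaped like the parsed regions above.

```python
import math

def as_list(value):
    """Wrap a lone dict in a list so a one-region response iterates like many."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]

def parse_zindex(region):
    """Return the region's zindex as a float, or NaN if missing or malformed."""
    try:
        return float(str(region['zindex']['#text']).replace(' ', ''))
    except (KeyError, TypeError, ValueError):
        return math.nan

# Hypothetical parsed regions: one complete, one without a zindex
sample_regions = as_list({'name': 'Mission',
                          'zindex': {'@currency': 'USD', '#text': '1106500'}})
sample_regions += as_list([{'name': 'Hypothetical'}])

values = [parse_zindex(r) for r in sample_regions]
print(values)
```

Catching only `KeyError`, `TypeError`, and `ValueError` keeps unrelated problems, such as a typo in a variable name, from being silently turned into NaN.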

Convert each latitude/longitude pair into a shapely Point, then convert the DataFrame into a GeoDataFrame

In [6]:
df['geometry'] = df.apply(lambda row: Point(row['longitude'], row['latitude']), axis=1)
df = geopandas.GeoDataFrame(df, geometry='geometry')
df.crs = "EPSG:4326"  # WGS 84 latitude/longitude; the {"init": ...} form is deprecated
df.head()

| | neighborhood | z_index | latitude | longitude | geometry |
|---|---|---|---|---|---|
| 0 | Mission | 1106500.0 | 37.759892 | -122.415902 | POINT (-122.415902 37.759892) |
| 1 | Bernal Heights | 1083700.0 | 37.740357 | -122.419694 | POINT (-122.419694 37.740357) |
| 2 | Central Richmond | 1284800.0 | 37.778349 | -122.482161 | POINT (-122.482161 37.778349) |
| 3 | Excelsior | 679900.0 | 37.72189 | -122.429767 | POINT (-122.429767 37.72189) |
| 4 | Bayview | 612400.0 | 37.729195 | -122.39123 | POINT (-122.39123 37.729195) |
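As a possible alternative to the row-wise `apply`, `geopandas.points_from_xy` builds the Point column in one call. This is a sketch on a two-row toy frame whose coordinates are copied from the output above, not a replacement for the cell itself.

```python
import geopandas
import pandas as pd

# Two rows copied from the notebook output
toy = pd.DataFrame({
    'neighborhood': ['Mission', 'Bayview'],
    'latitude': [37.759892, 37.729195],
    'longitude': [-122.415902, -122.39123],
})

# points_from_xy takes x (longitude) first, then y (latitude)
toy_gdf = geopandas.GeoDataFrame(
    toy,
    geometry=geopandas.points_from_xy(toy['longitude'], toy['latitude']),
    crs='EPSG:4326',
)
print(toy_gdf.head())
```

Note the argument order: longitude is the x coordinate, which matches `Point(row['longitude'], row['latitude'])` in the cell above.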


Load the zip code boundaries (GeoJSON) downloaded from the SF open data website, and drop the columns I don't need

In [7]:
zip_df = geopandas.read_file('zip.geojson')
zip_df.drop(['id','multigeom','objectid','po_name','st_area_sh','st_length_','state',
             'zip','pop10_sqmi','pop2010','sqmi'], axis=1, inplace=True)

Do a spatial join on the two GeoDataFrames so that each neighborhood's z_index sits next to the zip code whose polygon its point falls in

In [8]:
result = geopandas.tools.sjoin(df, zip_df, how='left')
result.drop(['geometry','index_right','latitude','longitude'], axis=1, inplace=True)
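To make the join less of a black box, here is a conceptual sketch of a left point-in-polygon join in plain Python. It is only an illustration of the idea, not how geopandas implements `sjoin`. The square "zip code" polygons, the point names, and the helper names are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def left_join(points, polygons):
    """Map each point name to the first polygon containing it, else None."""
    joined = {}
    for name, (x, y) in points.items():
        joined[name] = next(
            (zip_name for zip_name, poly in polygons.items()
             if point_in_polygon(x, y, poly)),
            None,
        )
    return joined

# Hypothetical toy data: two adjacent squares and three points
zip_polygons = {
    'zip_a': [(0, 0), (2, 0), (2, 2), (0, 2)],
    'zip_b': [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = {'n1': (1, 1), 'n2': (3, 0.5), 'n3': (5, 5)}

joined = left_join(points, zip_polygons)
print(joined)
```

As with `how='left'`, a point that falls in no polygon is kept and simply has no match.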

A zip code can contain several neighborhoods, so I take the mean z_index for each zip code

In [9]:
# Average only z_index; the text column 'neighborhood' cannot be averaged
result = result.groupby('zip_code')['z_index'].mean().reset_index()
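A quick sketch of what the per-zip-code mean does, on a toy frame with made-up values. It selects the numeric column explicitly and shows that a missing z_index is skipped rather than propagated.

```python
import numpy as np
import pandas as pd

# Hypothetical values, chosen only to make the arithmetic easy to follow
toy = pd.DataFrame({
    'neighborhood': ['A', 'B', 'C', 'D'],
    'z_index': [1000000.0, 800000.0, np.nan, 600000.0],
    'zip_code': ['94110', '94110', '94112', '94112'],
})

# mean() ignores NaN by default, so 94112 averages only neighborhood D
zip_means = toy.groupby('zip_code')['z_index'].mean().reset_index()
print(zip_means)
```

A zip code whose neighborhoods all lack a z_index would still come out as NaN, so it may be worth checking for those before using the result.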

### Save Data

All that's left is to pickle the DataFrame so it can be loaded later

In [10]:
# The context manager closes the file even if pickling fails
with open('z_index_data.pkl', 'wb') as f:
    pickle.dump(result, f)
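For completeness, a minimal round-trip sketch showing how the pickled object can be read back. It uses a plain dict with made-up values in a temporary directory, so it does not touch the real `z_index_data.pkl`.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the result DataFrame
toy_result = {'zip_code': ['94110', '94112'], 'z_index': [900000.0, 600000.0]}

path = os.path.join(tempfile.mkdtemp(), 'z_index_data.pkl')

# Write in binary mode; the file is closed when the block exits
with open(path, 'wb') as f:
    pickle.dump(toy_result, f)

# Read it back the same way
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == toy_result)
```

Only unpickle files from a source you trust, since loading a pickle can execute arbitrary code.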