# Read before running the code

1. The Zillow API has a limit of 1,000 requests per day, so please use your own zws_id and gkey (Google API key) to run the code for your cities (each person is responsible for two cities).
2. Several functions were written to pull: 1) a batch of random lat/long points per city per request (49 per call with the current `range(1,50)` setting); 2) the address generated from each lat/long; 3) Zillow information based on each address. Nothing in these functions needs to change when you run through the code.
3. The first time you run the code, please go to the next Markdown cell and read the instructions. Before running the code that gets the housing data, please __enter the city you want to get the data for__.
4. To gather valid data, we remove rows for which Zillow could not return complete and valid housing information. In addition, we remove any rows that do not belong to the selected city.
5. To ensure that we have at least 50 valid records for each city, a loop at the end keeps running while the valid record count is below 50.
6. Once we have at least 50 valid records for a city, the code saves the file into the Clean Data folder.
7. If the code that pulls lat/lng information from the Google API raises an "Index" error (`IndexError`), we have reached the API limit and need to change the API key.
8. If the message code from Zillow is 7, we have reached the API limit and need to switch to another key.
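
The message codes mentioned in these notes and in the code comments (0 for a valid property, 7 for the API limit, 508 and anything else for no match) can be collected in one small helper. This is only a sketch: the name `classify_message_code` and the status labels are made up for illustration and are not part of the notebook.

```python
# Codes as described in this notebook's notes; not an exhaustive list of Zillow codes.
ZILLOW_OK = "0"             # a valid property exists at the address
ZILLOW_RATE_LIMITED = "7"   # API limit reached, switch to another key


def classify_message_code(code):
    """Map a Zillow message code to a coarse status label (hypothetical helper)."""
    code = str(code).strip()
    if code == ZILLOW_OK:
        return "valid"
    if code == ZILLOW_RATE_LIMITED:
        return "rate_limited"
    # 508 and anything else: treat as "no property found at this address"
    return "no_match"


for sample in ["0", "508", "7"]:
    print(sample, "->", classify_message_code(sample))
```

Checking for `"rate_limited"` right after each call would make it obvious when it is time to change keys, instead of discovering it from a column full of invalid codes.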

In [22]:
# ZILLOW DATA EXTRACTION WRITTEN BY SONIA YANG

# Dependencies
import requests
import urllib.parse  # urllib.parse.quote is used when building the Zillow query URLs
import random
import math
import pandas as pd
import xml.etree.ElementTree as ET
import time
# from config import zws_id, gkey # please use your own Zillow & Google API keys!
zws_id='X1-ZWz1gbvc8dh5vv_1vfab'
gkey="AIzaSyDuR6Ej6fNbaY-gjZRaA0t3THaJw-UNai8"
from urllib.request import urlopen

In [23]:
# FUNCTION to grab the exact address based on longitude and latitude
# modified from here https://gist.github.com/bradmontgomery/5397472
# their example didn't include an API key, but I added it otherwise you'd hit the rate limit easily

def reverse_geocode(latitude, longitude):
    # Did the geocoding request come from a device with a
    # location sensor? Must be either true or false
    sensor = 'true'

    # Hit Google's reverse geocoder directly
    # NOTE: I *think* their terms state that you're supposed to
    # use google maps if you use their api for anything.
    base = "https://maps.googleapis.com/maps/api/geocode/json?"
    params = "latlng={lat},{lon}&sensor={sen}&key={key}".format(
        lat=latitude,
        lon=longitude,
        sen=sensor,
        key=gkey
    )
    url = "{base}{params}".format(base=base, params=params)
    #print(url)
    response = requests.get(url).json()
    address = response['results'][0]['formatted_address']
    return address
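
As a hedged alternative to the string formatting above, the query string can be built with `urllib.parse.urlencode`, and the lookup of the first result can be guarded so that an empty `results` list returns `None` instead of raising the "Index" error mentioned in the notes. The helper names `build_geocode_url` and `first_formatted_address` are assumptions made for this sketch, and the `sensor` parameter is left out for brevity.

```python
from urllib.parse import urlencode

GEOCODE_BASE = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(latitude, longitude, key):
    """Build the reverse-geocoding URL; urlencode takes care of escaping."""
    query = urlencode({"latlng": f"{latitude},{longitude}", "key": key})
    return f"{GEOCODE_BASE}?{query}"


def first_formatted_address(response):
    """Return the first formatted address from a decoded JSON response,
    or None when the response contains no results."""
    results = response.get("results", [])
    if not results:
        return None
    return results[0].get("formatted_address")


# Usage with hand-written dictionaries standing in for requests.get(url).json()
print(build_geocode_url(34.5729, -118.079, "YOUR_KEY"))
print(first_formatted_address({"results": []}))
print(first_formatted_address(
    {"results": [{"formatted_address": "2751 E Ave R, Palmdale, CA 93550, USA"}]}
))
```

Returning `None` lets the caller decide whether to skip the point or stop and switch keys.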

In [24]:
# FUNCTION to generate random lat & lng within a certain radius 
# modified from here: http://hadoopguru.blogspot.com/2014/12/python-generate-random-latitude-and.html
# changed to take in an empty initial dataframe and load in the data + return it
# this calls the reverse geocode function to grab the addresses of each randomly generated lat & lng

def generate_addresses(latitude, longitude, df):
    
    radius = 5000                         # search radius in meters; choose your own
    radiusInDegrees=radius/111300         # one degree of latitude is roughly 111,300 meters
    r = radiusInDegrees

    counter = 0
    
    for i in range(1,50):                 # range(1,50) generates 49 points; change it to generate more or fewer

        u = float(random.uniform(0.0,1.0))
        v = float(random.uniform(0.0,1.0))

        w = r * math.sqrt(u)
        t = 2 * math.pi * v
        x = w * math.cos(t) 
        y = w * math.sin(t)

        xLat  = x + latitude
        yLng = y + longitude

        # DataFrame.set_value was removed in pandas 1.0, so values are written with .at
        df.at[counter, "latitude"] = xLat
        df.at[counter, "longitude"] = yLng
        
        #print(format(counter) + ": " + format(xLat) + ", " + format(yLng))
        address = reverse_geocode(xLat, yLng).split(',')
        citystatezip = address[1] + address[2]
        
        df.at[counter, "address"] = address[0]
        df.at[counter, "city_state_zip"] = citystatezip
        
        # Add to counter
        counter = counter + 1
    
    return df
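
The sampling step in `generate_addresses` can be looked at in isolation. Taking the square root of the first random number spreads the points evenly over the area of the circle rather than clustering them near the center. The sketch below keeps the same constant and radius, but the function name `random_point_near` and the cosine correction on the longitude offset are additions for illustration (a degree of longitude is shorter than a degree of latitude away from the equator); the notebook's function applies the same scale to both axes.

```python
import math
import random

METERS_PER_DEGREE = 111300  # same constant as in generate_addresses


def random_point_near(latitude, longitude, radius_m=5000, rng=random):
    """Return one (lat, lng) pair drawn uniformly from a disk around the center."""
    r = radius_m / METERS_PER_DEGREE
    w = r * math.sqrt(rng.random())      # sqrt gives uniform density over the disk
    t = 2 * math.pi * rng.random()       # random direction
    d_lat = w * math.sin(t)
    # Assumption for this sketch: stretch the longitude offset by 1/cos(latitude)
    d_lng = w * math.cos(t) / math.cos(math.radians(latitude))
    return latitude + d_lat, longitude + d_lng


# Usage: a few points around the first Palmdale coordinates shown in this notebook
rng = random.Random(42)
for _ in range(3):
    print(random_point_near(34.5729, -118.079, rng=rng))
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing results between team members.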

In [25]:
# FUNCTION to call the Zillow API's GetSearchResults and check whether a house exists at that address
# message code will be written to dataframe
# zillow url format
# http://www.zillow.com/webservice/GetSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
        
def get_message_codes(df):

    for index, row in df.iterrows():

        try:
            url = 'https://www.zillow.com/webservice/GetSearchResults.htm?zws-id='
            address = row['address']
            citystatezip =row['city_state_zip']


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 
            #print(query_url)

            root = ET.parse(urlopen(query_url)).getroot()

            for message in root.iter('message'):
                message_code = message[1].text

            print(format(index) + ": " + message_code)

            df.at[index, 'message_code'] = message_code  # set_value was removed in pandas 1.0

            time.sleep(0.5) #necessary bc bombarding Zillow with API calls doesn't allow enough time to respond to each

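        except Exception as exc:
            # stop at the first failure (network error, unexpected XML, ...) and report why
            print("stopping at row " + format(index) + ": " + repr(exc))
            break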
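
The message-code lookup can be tried offline against a small XML string. The sample below is hand-written and only assumes what the function above relies on: a `message` element whose second child holds the code. The child element names `text` and `code` and the helper `extract_message_code` are illustrative assumptions, not copied from a real response.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a GetSearchResults response (shape is an assumption)
SAMPLE_XML = """<searchresults>
  <request><address>2751 E Ave R</address></request>
  <message>
    <text>Request successfully processed</text>
    <code>0</code>
  </message>
</searchresults>"""


def extract_message_code(root):
    """Return the text of the first <message><code>, or None if it is missing."""
    message = next(root.iter("message"), None)
    if message is None:
        return None
    return message.findtext("code")


root = ET.fromstring(SAMPLE_XML)
print(extract_message_code(root))
```

Looking the child up by name rather than by position (`message[1]`) keeps working if the order of the children changes, and returning `None` avoids an undefined `message_code` when no `message` element is present.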
    

In [26]:
# FUNCTION to call Zillow's GetDeepSearchResults and look up Zestimate, bed, and bath
# http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
# there are some limitations such as multiple zestimates depending on when the house was sold/if it was sold multiple times
# the code to handle that would get too convoluted so I am just writing in the most recent (according to the API) values
# probably not what we would do in real life
# but a decision we made re: the scope of a classroom project on a short time constraint

def search_zillow(df):
    
    for index, row in df.iterrows():
        try:
            url = 'https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id='
            address = df['address'][index]
            citystatezip = df['city_state_zip'][index]


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 


            root = ET.parse(urlopen(query_url)).getroot()

            print("row " + format(index) + ": " + address + citystatezip)
            print(query_url)

            '''
               "year built","lot size","finished sq ft"'''
            
            # DataFrame.set_value was removed in pandas 1.0, so values are written with .at

            #zpid
            for zpid in root.iter('zpid'):
                df.at[index, 'zpid'] = zpid.text
            
            # we already have the address from the address + citystatezip variables
            # so we don't need to grab it again
            # same with lat & lng already being in the table
            
            #valuation (high and low)
            for valuation in root.iter('valuationRange'):
                highValuation = valuation[1].text
                lowValuation = valuation[0].text
                df.at[index, 'valuation_high'] = highValuation
                df.at[index, 'valuation_low'] = lowValuation
            
            #zestimate
            for zestimate in root.iter('zestimate'):
                zestimate_value = zestimate[0].text

                if zestimate_value is None:
                    print('not for sale')
                else:
                    print ('zestimate (value): ' + format(zestimate[0].text)) 
                    df.at[index, 'zestimate'] = zestimate_value
             
            #home value index
            for zindexValue in root.iter('zindexValue'):
                df.at[index, 'home value index'] = zindexValue.text
            
            #tax assessment
            for taxAssessment in root.iter('taxAssessment'):
                df.at[index, 'tax assessment'] = taxAssessment.text
                
            #tax assessment year
            for taxAssessmentYear in root.iter('taxAssessmentYear'):
                df.at[index, 'tax assess year'] = taxAssessmentYear.text
                
            #year built
            for yearBuilt in root.iter('yearBuilt'):
                df.at[index, 'year built'] = yearBuilt.text
             
            #lot size sq ft
            for lotSizeSqFt in root.iter('lotSizeSqFt'):
                df.at[index, 'lot size'] = lotSizeSqFt.text
            
            #finished sq ft
            for finishedSqFt in root.iter('finishedSqFt'):
                df.at[index, 'finished sq ft'] = finishedSqFt.text
            
            #bedrooms
            for bedroom in root.iter('bedrooms'):
                bedrooms = bedroom.text
                #print("bedrooms: " + bedrooms)
                df.at[index, 'bedrooms'] = bedrooms

            #bathrooms
            for bathroom in root.iter('bathrooms'):
                bathrooms = bathroom.text
                #print("bathrooms: " + bathrooms + "\n")
                df.at[index, 'bathrooms'] = bathrooms
            
            print('\n')

            time.sleep(0.5) 


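        except Exception as exc:
            # stop at the first failure (network error, unexpected XML, ...) and report why
            print("stopping at row " + format(index) + ": " + repr(exc))
            break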
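
When a response contains several matches for one address, each `for ... in root.iter(...)` loop above overwrites the cell on every pass, so the dataframe ends up holding the value from the last match. If a single, predictable match is preferred, one option is to read every field from the first match only. The sketch below runs on a hand-written XML string; the element names `result` and `amount`, and the helper `first_result_fields`, are assumptions for illustration and should be checked against a real response before use.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a response with two matches (shape is an assumption)
SAMPLE_XML = """<searchresults><response><results>
  <result>
    <zpid>1</zpid><bedrooms>4</bedrooms>
    <zestimate><amount>360431</amount></zestimate>
  </result>
  <result>
    <zpid>2</zpid><bedrooms>3</bedrooms>
    <zestimate><amount>278219</amount></zestimate>
  </result>
</results></response></searchresults>"""

FIELDS = ["zpid", "bedrooms", "bathrooms"]


def first_result_fields(root, fields=FIELDS):
    """Collect the requested fields from the first <result> only.
    Missing fields come back as None."""
    result = next(root.iter("result"), None)
    if result is None:
        return {}
    values = {tag: result.findtext(tag) for tag in fields}
    values["zestimate"] = result.findtext("zestimate/amount")
    return values


print(first_result_fields(ET.fromstring(SAMPLE_XML)))
```

Whether the first or the last match is the better choice depends on how the API orders its results, which this sketch does not try to settle.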


<h2>HOW TO RUN THIS CODE</h2>
<ol>
<li>Initialize an empty dataframe with the fields as marked below</li>
<li>Call the <strong>generate_addresses</strong> function passing in your empty dataframe</li>
<li>Call the <strong>get_message_codes</strong> to update your dataframe with message codes indicating whether or not a valid property exists at each address. <strong>IMPORTANT:</strong> please register your own Zillow account/get your own key for this!! If we all keep using the same one we'll easily hit the rate limit </li>
<li>Drop the rows in the dataframe for which a property does not exist at that address</li>
<li>Call the <strong>search_zillow</strong> function to get the zestimate (aka price of the property), # of bedrooms, and # of bathrooms</li>
<li>I did not include it in my code, but once you get a sample size of data that you are satisfied with for the city, maybe write it out to a CSV so you don't have to keep running this code/can use it later</li>
</ol>

Feel free to comment out my print statements if you find them distracting while the functions are running.

In [27]:
# Read the cities file to pull the latitude and longitude
Cities=pd.read_csv('../Raw_Data/LA_cities_Lat_lng_codes_data.csv')
print(f'{Cities["address"]}')
city1 = input("Please input first city your want to pull data")
selectcity = Cities.loc[Cities["address"] == city1, :]
LAT = selectcity.iloc[0,1]
LNG = selectcity.iloc[0,2]

0       Los Angeles
1        Long Beach
2          Glendale
3         Lancaster
4          Palmdale
5     Santa Clarita
6            Pomona
7          Torrance
8          Pasadena
9         Inglewood
10          Compton
11           Downey
12      West Covina
13          Norwalk
14          Burbank
15       South Gate
16         El Monte
17         Whittier
18         Alhambra
Name: address, dtype: object
Please input first city your want to pull dataPalmdale


In [28]:
# HOW TO RUN ALL THE FUNCTIONS, USING LOS ANGELES AS AN EXAMPLE

# coordinates taken from the CitiesGeo_Output.csv
# we should manually run the following code on each individual city instead of nesting it in another loop
# while this may be hardcoded, it's better than waiting on one gigantic loop that takes forever

# STEP 1: INITIALIZE THE DATAFRAME
# if we need any more fields, let me know
la_df = pd.DataFrame({"zpid": '',
                      "address":'',
                      "city_state_zip":'',
                      "latitude":'',
                      "longitude":'',
                      "message_code":'',
                      "zestimate":'',
                      "valuation_high":'',
                      "valuation_low": '',
                      "home value index":'',
                      "tax assessment":'',
                      "tax assess year":'',
                      "year built":'',
                      "lot size":'',
                      "finished sq ft":'',
                      "bedrooms":'',
                      "bathrooms":''}, index=[0])

# reorder the columns
la_df = la_df[["zpid", "address","city_state_zip","latitude","longitude","message_code","zestimate",
               "valuation_high","valuation_low","home value index","tax assessment","tax assess year",
               "year built","lot size","finished sq ft","bedrooms","bathrooms"]]

# STEP 2: GENERATE RANDOM ADDRESSES IN THE DESIGNATED AREA
# pass in the coordinates for the selected city plus the empty dataframe
generate_addresses(LAT,LNG, la_df) 

#la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,,2751 E Ave R,Palmdale CA 93550,34.5729,-118.079,,,,,,,,,,,,
1,,37015 Tierra Subida Ave,Palmdale CA 93551,34.5538,-118.139,,,,,,,,,,,,
2,,36535 Oliver Ln,Palmdale CA 93551,34.5454,-118.144,,,,,,,,,,,,
3,,39474-39596 25th St E,Palmdale CA 93550,34.6077,-118.082,,,,,,,,,,,,
4,,37304 Harrow Ct,Palmdale CA 93550,34.5601,-118.104,,,,,,,,,,,,
5,,38906-38998 10th St E,Palmdale CA 93550,34.5914,-118.11,,,,,,,,,,,,
6,,460 Shasta Pl,Palmdale CA 93550,34.5421,-118.121,,,,,,,,,,,,
7,,Country Club Dr,Palmdale CA 93551,34.6015,-118.135,,,,,,,,,,,,
8,,W City Ranch Rd,Palmdale CA 93551,34.5673,-118.158,,,,,,,,,,,,
9,,38910 30th St E,Palmdale CA 93550,34.5891,-118.075,,,,,,,,,,,,


In [29]:
# STEP 3: CALL THE ZILLOW API TO GET MESSAGE CODES
# 0 means there is a valid property at that address
# 508 and anything else means there isn't
# if you get nothing but invalid message codes, re-run STEP 2
# you might have to sign up for a new Zillow account if you keep getting invalid results here
# there is a possibility you hit the rate limit

get_message_codes(la_df)

0: 0
1: 508
2: 508
3: 508
4: 508
5: 508
6: 508
7: 0
8: 508
9: 508
10: 508
11: 508
12: 508
13: 0
14: 0
15: 0
16: 508
17: 508
18: 508
19: 508
20: 508
21: 508
22: 0
23: 508
24: 0
25: 0
26: 508
27: 0
28: 508
29: 508
30: 508
31: 508
32: 0
33: 508
34: 508
35: 0
36: 508
37: 508
38: 508
39: 508
40: 508
41: 508
42: 508
43: 0
44: 508
45: 0
46: 508
47: 508
48: 508


In [30]:
# STEP 4: DROP INVALID ENTRIES FROM DATAFRAME 
# cull all the rows where houses do not exist at the address
# take what is valid (message code of '0')
# the code sometimes might break/not get a response from the server so it's better to take what IS valid

la_df = la_df[la_df.message_code == '0']

# drop rows that do not belong to the selected city
la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]

la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,,2751 E Ave R,Palmdale CA 93550,34.5729,-118.079,0,,,,,,,,,,,
7,,Country Club Dr,Palmdale CA 93551,34.6015,-118.135,0,,,,,,,,,,,
13,,36800 Sierra Hwy,Palmdale CA 93550,34.5499,-118.107,0,,,,,,,,,,,
14,,1750 E Ave Q14,Palmdale CA 93550,34.5739,-118.096,0,,,,,,,,,,,
15,,Division St,Palmdale CA 93551,34.6076,-118.128,0,,,,,,,,,,,
22,,143 W Ave S-14,Palmdale CA 93551,34.5445,-118.13,0,,,,,,,,,,,
24,,820 W Avenue P,Palmdale CA 93551,34.6016,-118.147,0,,,,,,,,,,,
25,,2634 E Ave Q-15,Palmdale CA 93550,34.5731,-118.083,0,,,,,,,,,,,
27,,38969 Yucca Tree St,Palmdale CA 93551,34.591,-118.15,0,,,,,,,,,,,
32,,450 W Ave O,Palmdale CA 93551,34.6175,-118.132,0,,,,,,,,,,,


In [31]:
# STEP 5: SEARCH ZILLOW AND GET ZESTIMATE, BEDROOMS, & BATHROOMS
# fill the dataframe with the data

search_zillow(la_df)
la_df

row 0: 2751 E Ave R Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=2751%20E%20Ave%20R&citystatezip=%20Palmdale%20CA%2093550
zestimate (value): 278219
zestimate (value): 377781
zestimate (value): 360431


row 7: Country Club Dr Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=Country%20Club%20Dr&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 286090
zestimate (value): 377011
zestimate (value): 313716
zestimate (value): 386083
zestimate (value): 313160
zestimate (value): 314113
zestimate (value): 379747
zestimate (value): 314172
zestimate (value): 285960
zestimate (value): 376242
zestimate (value): 422277
zestimate (value): 376941
zestimate (value): 314133
zestimate (value): 353620
zestimate (value): 313973
zestimate (value): 375427
zestimate (value): 315499
zestimate (value): 440641
zestimate (value): 313998
zestimate (value): 341711
zestimate 

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,65246195,2751 E Ave R,Palmdale CA 93550,34.5729,-118.079,0,360431.0,378453.0,342409.0,252000,289000.0,2017,2004.0,6275,2759.0,4.0,3.0
7,20244263,Country Club Dr,Palmdale CA 93551,34.6015,-118.135,0,313321.0,328987.0,297655.0,252000,269504.0,2017,1965.0,8925,1674.0,3.0,2.0
13,95677143,36800 Sierra Hwy,Palmdale CA 93550,34.5499,-118.107,0,,,,252000,78515.0,2017,,2716236,,,
14,20249421,1750 E Ave Q14,Palmdale CA 93550,34.5739,-118.096,0,144697.0,153379.0,134568.0,252000,45884.0,2017,1989.0,1306,832.0,3.0,2.0
15,95598445,Division St,Palmdale CA 93551,34.6076,-118.128,0,,,,252000,63250.0,2017,,39997,,,
22,20271504,143 W Ave S-14,Palmdale CA 93551,34.5445,-118.13,0,716538.0,773861.0,680711.0,252000,535715.0,2017,1992.0,98010,5622.0,6.0,6.0
24,20242766,820 W Avenue P,Palmdale CA 93551,34.6016,-118.147,0,477938.0,501835.0,454041.0,252000,313674.0,2017,1981.0,41250,2366.0,4.0,3.0
25,20250472,2634 E Ave Q-15,Palmdale CA 93550,34.5731,-118.083,0,301701.0,316786.0,286616.0,252000,143823.0,2017,1989.0,7125,2000.0,4.0,3.0
27,20239363,38969 Yucca Tree St,Palmdale CA 93551,34.591,-118.15,0,278073.0,291977.0,264169.0,252000,141814.0,2017,1953.0,10890,1476.0,3.0,2.0
32,20242991,450 W Ave O,Palmdale CA 93551,34.6175,-118.132,0,362255.0,380368.0,344142.0,252000,307400.0,2017,1987.0,8030,2309.0,5.0,3.0


In [13]:
# do any further data cleaning you need to yourself
# for example, dropping any rows with NaN values
la_df = la_df.dropna(axis=0, how='any')
la_df

# maybe write to CSV to store the data for usage later/before doing plots? so you don't have to rerun everything

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
3,9728752,1227 Willow Street Pike,Lancaster PA 17602,40.0145,-76.2964,0,677382,1083811,494489,229200,315800.0,2017,1927,261360,4213,6,4.0
11,9759174,288 Rhoda Dr,Lancaster PA 17601,40.0818,-76.3017,0,297885,312779,282991,212200,191700.0,2017,1969,20909,1350,4,2.5
13,9728353,1556 Braxton Dr,Lancaster PA 17602,40.0196,-76.2663,0,211033,221585,196261,229200,126000.0,2017,1989,9147,1367,3,2.0
14,9729126,1005 Willow Street Pike,Lancaster PA 17602,40.0213,-76.3032,0,237178,249037,225319,229200,148300.0,2017,1919,13068,2178,4,1.5


In [32]:
# review the current data and see if more is needed (at least 50 valid records per city)
add_df = pd.DataFrame(la_df)
final_df = add_df
final_df = final_df.reset_index(drop=True)
len(final_df)

13

In [33]:
# If we have fewer than 50 valid records, repeat the steps above until we have enough data
while(len(final_df)<50):
    generate_addresses(LAT,LNG, la_df) 
    get_message_codes(la_df)
    la_df = la_df[la_df.message_code == '0']
    la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]
    search_zillow(la_df)
    la_df = la_df.dropna(axis=0, how='any')
    add_df = pd.concat([add_df, la_df], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
    final_df = add_df.drop_duplicates()
len(final_df)

0: 508
7: 508
13: 508
14: 508
15: 508
22: 508
24: 508
25: 508
27: 0
32: 508
35: 508
43: 508
45: 508
1: 508
2: 508
3: 508
4: 508
5: 508
6: 508
8: 508
9: 508
10: 508
11: 0
12: 0
16: 0
17: 508
18: 508
19: 0
20: 508
21: 508
23: 508
26: 508
28: 0
29: 0
30: 508
31: 0
33: 0
34: 508
36: 508
37: 0
38: 508
39: 508
40: 0
41: 508
42: 508
44: 0
46: 508
47: 0
48: 508
row 27: 39261 10th St E Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=39261%2010th%20St%20E&citystatezip=%20Palmdale%20CA%2093550
zestimate (value): 340144


row 11: 1656 Korat Dr Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=1656%20Korat%20Dr&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 322891


row 12: 1607 Via Verde Ave Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=1607%20Via%20Verde%20Ave&citystatezip=%20Palmdale%2

23: 0
36: 508
39: 508
46: 508
48: 0
0: 508
1: 508
2: 0
3: 508
4: 0
6: 508
7: 508
8: 0
9: 508
10: 508
11: 508
12: 508
13: 0
14: 508
15: 508
16: 508
17: 508
19: 508
20: 508
21: 508
24: 508
25: 508
26: 0
27: 508
28: 508
29: 508
30: 508
32: 508
33: 0
34: 508
35: 508
37: 508
38: 0
40: 508
41: 508
42: 508
43: 0
44: 508
45: 508
47: 508
row 23: 39201 20th St E Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=39201%2020th%20St%20E&citystatezip=%20Palmdale%20CA%2093550
not for sale


row 48: 37135 Tovey Ave Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=37135%20Tovey%20Ave&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 315233


row 2: 36213 El Camino Dr Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=36213%20El%20Camino%20Dr&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 5990

row 23: 36506 China Pl Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=36506%20China%20Pl&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 557260


row 24: 2208 E Ave R12 Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=2208%20E%20Ave%20R12&citystatezip=%20Palmdale%20CA%2093550
zestimate (value): 281597


row 31: 38409 Sphynx Dr Palmdale CA 93551
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=38409%20Sphynx%20Dr&citystatezip=%20Palmdale%20CA%2093551
zestimate (value): 413033


row 36: 37302 Sand Brook Dr Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=37302%20Sand%20Brook%20Dr&citystatezip=%20Palmdale%20CA%2093550
zestimate (value): 288136


row 38: 2424 Swallow Ln Palmdale CA 93550
https://www.zillow.com/webservice/GetDeepSearc

59
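
The loop above only ends once 50 records have been collected. If an API limit is hit partway through, no new rows arrive and it can keep running indefinitely. A capped version of the same idea is sketched below in plain Python. `collect_until` and `fetch_batch` are hypothetical names: `fetch_batch` stands in for one pass of the generate, filter and search steps and is assumed to return a list of dictionaries that each carry a `zpid`.

```python
def collect_until(target, fetch_batch, max_rounds=10):
    """Call fetch_batch() until `target` unique records are collected
    or `max_rounds` passes have been made, whichever comes first."""
    seen = {}          # zpid -> record, so duplicates are kept only once
    rounds = 0
    while len(seen) < target and rounds < max_rounds:
        for record in fetch_batch():
            seen.setdefault(record["zpid"], record)
        rounds += 1
    return list(seen.values()), rounds


# Usage with canned batches standing in for real API calls
batches = iter([
    [{"zpid": "65246195"}, {"zpid": "20244263"}],
    [{"zpid": "20244263"}, {"zpid": "20249421"}],
])
records, rounds = collect_until(3, lambda: next(batches), max_rounds=5)
print(len(records), "records after", rounds, "rounds")
```

If the cap is reached before the target, the caller can report how many records were found and prompt for a new key rather than looping forever.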

In [34]:
# review the final city data before saving the file
final_df.head()

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,65246195,W 7th St,Palmdale CA 93551,34.5819,-118.142,0,360431.0,378453.0,342409.0,252000,289000.0,2017,2004.0,6275,2759.0,4.0,3.0
1,20244263,Country Club Dr,Palmdale CA 93551,34.6015,-118.135,0,313321.0,328987.0,297655.0,252000,269504.0,2017,1965.0,8925,1674.0,3.0,2.0
2,95677143,36800 Sierra Hwy,Palmdale CA 93550,34.5499,-118.107,0,,,,252000,78515.0,2017,,2716236,,,
3,20249421,1750 E Ave Q14,Palmdale CA 93550,34.5739,-118.096,0,144697.0,153379.0,134568.0,252000,45884.0,2017,1989.0,1306,832.0,3.0,2.0
4,95598445,Division St,Palmdale CA 93551,34.6076,-118.128,0,,,,252000,63250.0,2017,,39997,,,


In [35]:
# Save file into Clean Data folder
final_df.to_csv(f'../Clean_Data/{city1}_zillow_data.csv')