# Read before running the code

1. The Zillow API have 1000 daily request limitation, please use your own zws_id and gkey (google api) to run the code for your city (each person responsible for two city)
2. Various function were written to pull: 1) 200 lat/long by city per request; 2) Address generated by lat/long; 3) Zillow information based on address. Nothing need to change / update for the function when run through the code.
3. When first run the code, please go the the next Markdown to read the instruction. Before running the code to get housing data, please __enter the city you want to get the data with__.
4. To gather valid data, we removed data that could not generate complete and valid zillow housing information. In addition, we will remove any data that did not belong to the right city.
5. To insure that we have at least 50 valid data for each city, we run a loop at the end if valid data count is less than 50
6. Once we have more than 50 valid data for each city, the code will save the file into the Clean Data Folder
7. When code related to pull Lat/Lng information from Google API have "Index" error, that means we have reached the API limit and need to change the API key
8. When message code form zillow is 7, it means we have reached the API limit and we need to change to another key

In [1]:
# ZILLOW DATA EXTRACTION WRITTEN BY SONIA YANG

# Dependencies
import requests
import urllib
import random
import math
import pandas as pd
import xml.etree.ElementTree as ET
import time
from config import zws_id, gkey # please use your own Zillow & Google API keys!
from urllib.request import urlopen

In [2]:
# FUNCTION to grab the exact address based on longitude and latitude
# modified from here https://gist.github.com/bradmontgomery/5397472
# their example didn't include an API key, but I added it otherwise you'd hit the rate limit easily

def reverse_geocode(latitude, longitude):
    # Did the geocoding request comes from a device with a
    # location sensor? Must be either true or false
    sensor = 'true'

    # Hit Google's reverse geocoder directly
    # NOTE: I *think* their terms state that you're supposed to
    # use google maps if you use their api for anything.
    base = "https://maps.googleapis.com/maps/api/geocode/json?"
    params = "latlng={lat},{lon}&sensor={sen}&key={key}".format(
        lat=latitude,
        lon=longitude,
        sen=sensor,
        key=gkey
    )
    url = "{base}{params}".format(base=base, params=params)
    #print(url)
    response = requests.get(url).json()
    address = response['results'][0]['formatted_address']
    return address

In [3]:
# FUNCTION to generate random lat & lng within a certain radius 
# modified from here: http://hadoopguru.blogspot.com/2014/12/python-generate-random-latitude-and.html
# changed to take in an empty initial dataframe and load in the data + return it
# this calls the reverse geocode function to grab the addresses of each randomly generated lat & lng

def generate_addresses(latitude, longitude, df):
    
    radius = 5000                         #Choose your own radius
    radiusInDegrees=radius/111300            
    r = radiusInDegrees

    counter = 0
    
    for i in range(1,50):                 #Choose number of Lat Long to be generated

        u = float(random.uniform(0.0,1.0))
        v = float(random.uniform(0.0,1.0))

        w = r * math.sqrt(u)
        t = 2 * math.pi * v
        x = w * math.cos(t) 
        y = w * math.sin(t)

        xLat  = x + latitude
        yLng = y + longitude

        df.set_value(counter, "latitude", xLat)
        df.set_value(counter, "longitude", yLng)
        
        #print(format(counter) + ": " + format(xLat) + ", " + format(yLng))
        address = reverse_geocode(xLat, yLng).split(',')
        citystatezip = address[1] + address[2]
        
        df.set_value(counter, "address", address[0])
        df.set_value(counter, "city_state_zip", citystatezip)
        
        # Add to counter
        counter = counter + 1
    
    return df

In [4]:
# FUNCTION to call Zillow API's GetSearchResults and will check to see if a house exists at that address
# message code will be written to dataframe
# zillow url format
# http://www.zillow.com/webservice/GetSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
        
def get_message_codes(df):

    for index, row in df.iterrows():

        try:
            url = 'https://www.zillow.com/webservice/GetSearchResults.htm?zws-id='
            address = row['address']
            citystatezip =row['city_state_zip']


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 
            #print(query_url)

            root = ET.parse(urlopen(query_url)).getroot()

            for message in root.iter('message'):
                message_code = message[1].text

            print(format(index) + ": " + message_code)

            df.set_value(index, 'message_code', message_code)

            time.sleep(0.5) #necessary bc bombarding Zillow with API calls doesn't allow enough time to respond to each

        except:
            break
    

In [5]:
# FUNCTION to call Zillow's GetDeepSearchResults and look up Zestimate, bed, and bath
# http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
# there are some limitations such as multiple zestimates depending on when the house was sold/if it was sold multiple times
# the code to handle that would get too convoluted so I am just writing in the most recent (according to the API) values
# probably not what we would do in real life
# but a decision we made re: the scope of a classroom project on a short time constraint

def search_zillow(df):
    
    for index, row in df.iterrows():
        try:
            url = 'https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id='
            address = df['address'][index]
            citystatezip = df['city_state_zip'][index]


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 


            root = ET.parse(urlopen(query_url)).getroot()

            print("row " + format(index) + ": " + address + citystatezip)
            print(query_url)

            '''
               "year built","lot size","finished sq ft"'''
            
            #zpid
            for zpid in root.iter('zpid'):
                df.set_value(index,'zpid', zpid.text)
            
            # we already have the address from the address + citystatezip variables
            # so we don't need to grab it again
            # same with lat & lng already being in the table
            
            #valuation (high and low)
            for valuation in root.iter('valuationRange'):
                highValuation = valuation[1].text
                lowValuation = valuation[0].text
                df.set_value(index, 'valuation_high', highValuation)
                df.set_value(index, 'valuation_low', lowValuation)
            
            #zestimate
            for zestimate in root.iter('zestimate'):
                zestimate_value = zestimate[0].text

                if zestimate_value is None:
                    print('not for sale')
                else:
                    print ('zestimate (value): ' + format(zestimate[0].text)) 
                    df.set_value(index, 'zestimate', zestimate_value)
             
            #home value index
            for zindexValue in root.iter('zindexValue'):
                df.set_value(index, 'home value index', zindexValue.text)
            
            #tax assessment
            for taxAssessment in root.iter('taxAssessment'):
                df.set_value(index, 'tax assessment', taxAssessment.text)
                
            #tax assessment year
            for taxAssessmentYear in root.iter('taxAssessmentYear'):
                df.set_value(index, 'tax assess year', taxAssessmentYear.text)
                
            #year built
            for yearBuilt in root.iter('yearBuilt'):
                df.set_value(index, 'year built', yearBuilt.text)
             
            #lot size sq ft
            for lotSizeSqFt in root.iter('lotSizeSqFt'):
                df.set_value(index, 'lot size', lotSizeSqFt.text)
            
            #finished sq ft
            for finishedSqFt in root.iter('finishedSqFt'):
                df.set_value(index, 'finished sq ft', finishedSqFt.text)
            
            #bedrooms
            for bedroom in root.iter('bedrooms'):
                bedrooms = bedroom.text
                #print("bedrooms: " + bedrooms)
                df.set_value(index, 'bedrooms', bedrooms)

            #bathrooms
            for bathroom in root.iter('bathrooms'):
                bathrooms = bathroom.text
                #print("bathrooms: " + bathrooms + "\n")
                df.set_value(index, 'bathrooms', bathrooms)           
            
            print('\n')

            time.sleep(0.5) 


        except:
            break


<h2>HOW TO RUN THIS CODE</h2>
<ol>
<li>Initialize an empty dataframe with the fields as marked below</li>
<li>Call the <strong>generate_addresses</strong> function passing in your empty dataframe</li>
<li>Call the <strong>get_message_codes</strong> to update your dataframe with message codes indicating whether or not a valid property exists at each address. <strong>IMPORTANT:</strong> please register your own Zillow account/get your own key for this!! If we all keep using the same one we'll easily hit the rate limit </li>
<li>Drop the rows in the dataframe for which a property does not exist at that address</li>
<li>Call the <strong>search_zillow</strong> function to get the zestimate (aka price of the property), # of bedrooms, and # of bathrooms</li>
<li>I did not include it in my code, but once you get a sample size of data that you are satisfied with for the city, maybe write it out to a CSV so you don't have to keep running this code/can use it later</li>
</ol>

feel free to comment out my print statements while the functions are running if you find them distracting

In [16]:
# Read cities file to pull the Latitude and Longtitude
Cities=pd.read_csv('../Raw_Data/LA_cities_Lat_lng_codes_data.csv')
print(f'{Cities["address"]}')
city1 = input("Please input first city your want to pull data")
selectcity = Cities.loc[Cities["address"] == city1, :]
LAT = selectcity.iloc[0,1]
LNG = selectcity.iloc[0,2]

0       Los Angeles
1        Long Beach
2          Glendale
3         Lancaster
4          Palmdale
5     Santa Clarita
6            Pomona
7          Torrance
8          Pasadena
9         Inglewood
10          Compton
11           Downey
12      West Covina
13          Norwalk
14          Burbank
15       South Gate
16         El Monte
17         Whittier
18         Alhambra
Name: address, dtype: object
Please input first city your want to pull dataTorrance


In [17]:
# HOW TO RUN ALL THE FUNCTIONS, USING LOS ANGELES AS AN EXAMPLE

# coordinates taken from the CitiesGeo_Output.csv
# we should manually run the following code on each individual city instead of nesting it in another loop
# while this may be hardcoded, it's better than waiting on one gigantic loop that takes forever

# STEP 1: INITALIZE THE DATAFRAME
# if we need any more fields, let me know
la_df = pd.DataFrame({"zpid": '',
                      "address":'',
                      "city_state_zip":'',
                      "latitude":'',
                      "longitude":'',
                      "message_code":'',
                      "zestimate":'',
                      "valuation_high":'',
                      "valuation_low": '',
                      "home value index":'',
                      "tax assessment":'',
                      "tax assess year":'',
                      "year built":'',
                      "lot size":'',
                      "finished sq ft":'',
                      "bedrooms":'',
                      "bathrooms":''}, index=[0])

# reorder the columns
la_df = la_df[["zpid", "address","city_state_zip","latitude","longitude","message_code","zestimate",
               "valuation_high","valuation_low","home value index","tax assessment","tax assess year",
               "year built","lot size","finished sq ft","bedrooms","bathrooms"]]

# STEP 2: GENERATE RANDOM ADDRESSES IN THE DESIGNATED AREA
# pass in the coordinates for Los Angeles plus the empty dataframe
generate_addresses(LAT,LNG, la_df) 

#la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,,23536 Evalyn Ave,Torrance CA 90505,33.8117,-118.359,,,,,,,,,,,,
1,,3480 Del Amo Cir N,Torrance CA 90503,33.8338,-118.346,,,,,,,,,,,,
2,,841-899 Sartori Ave,Torrance CA 90501,33.8382,-118.319,,,,,,,,,,,,
3,,3110 Merrill Dr,Torrance CA 90503,33.8301,-118.342,,,,,,,,,,,,
4,,1536 W 222nd St,Torrance CA 90501,33.8255,-118.305,,,,,,,,,,,,
5,,3737 Pacific Coast Hwy,Torrance CA 90505,33.8051,-118.35,,,,,,,,,,,,
6,,4441 Redondo Beach Blvd,Lawndale CA 90260,33.8735,-118.354,,,,,,,,,,,,
7,,3703-3777 Redondo Beach Blvd,Torrance CA 90504,33.8802,-118.34,,,,,,,,,,,,
8,,1517-1521 Crenshaw Blvd,Torrance CA 90501,33.8332,-118.329,,,,,,,,,,,,
9,,1111 Barbara St,Redondo Beach CA 90277,33.831,-118.375,,,,,,,,,,,,


In [18]:
# STEP 3: CALL THE ZILLOW API TO GET MESSAGE CODES
# 0 means there is a valid property at that address
# 508 and anything else means there isn't
# if you get nothing but invalid message codes, re-run STEP 2
# you might have to sign up for a new Zillow account if you keep getting invalid results here
# there is a possibility you hit the rate limit

get_message_codes(la_df)

0: 0
1: 508
2: 508
3: 0
4: 0
5: 508
6: 508
7: 508
8: 508
9: 0
10: 0
11: 508
12: 508
13: 508
14: 0
15: 0
16: 508
17: 508
18: 508
19: 0
20: 0
21: 0
22: 508
23: 508
24: 508
25: 508
26: 508
27: 0
28: 508
29: 0
30: 508
31: 0
32: 0
33: 0
34: 508
35: 508
36: 508
37: 508
38: 508
39: 508
40: 508
41: 0
42: 508
43: 508
44: 508
45: 508
46: 0
47: 0
48: 508


In [19]:
# STEP 4: DROP INVALID ENTRIES FROM DATAFRAME 
# cull all the rows where houses do not exist at the address
# take what is valid (message code of '0')
# the code sometimes might break/not get a response from the server so it's better to take what IS valid

la_df = la_df[la_df.message_code == '0']

# take out items that does not belong to the select city
la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]

la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,,23536 Evalyn Ave,Torrance CA 90505,33.8117,-118.359,0,,,,,,,,,,,
3,,3110 Merrill Dr,Torrance CA 90503,33.8301,-118.342,0,,,,,,,,,,,
4,,1536 W 222nd St,Torrance CA 90501,33.8255,-118.305,0,,,,,,,,,,,
10,,2377 Del Amo Blvd,Torrance CA 90501,33.8473,-118.326,0,,,,,,,,,,,
14,,2801 Sepulveda Blvd,Torrance CA 90505,33.8245,-118.332,0,,,,,,,,,,,
15,,3132 185th St,Torrance CA 90504,33.8626,-118.328,0,,,,,,,,,,,
20,,4109 Emerald St,Torrance CA 90503,33.8421,-118.357,0,,,,,,,,,,,
21,,2127 W 237th St,Torrance CA 90501,33.81,-118.318,0,,,,,,,,,,,
27,,21734 Evalyn Ave,Torrance CA 90503,33.8307,-118.362,0,,,,,,,,,,,
29,,4202 Carmen St,Torrance CA 90503,33.833,-118.358,0,,,,,,,,,,,


In [20]:
# STEP 5: SEARCH ZILLOW AND GET ZESTIMATE, BEDROOMS, & BATHROOMS
# fill the dataframe with the data

search_zillow(la_df)
la_df

row 0: 23536 Evalyn Ave Torrance CA 90505
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=23536%20Evalyn%20Ave&citystatezip=%20Torrance%20CA%2090505
zestimate (value): 942689


row 3: 3110 Merrill Dr Torrance CA 90503
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=3110%20Merrill%20Dr&citystatezip=%20Torrance%20CA%2090503
zestimate (value): 457235
zestimate (value): 632955
zestimate (value): 574763
zestimate (value): 456297
zestimate (value): 618609
zestimate (value): 618609
zestimate (value): 565105
zestimate (value): 779474
zestimate (value): 570333
zestimate (value): 533059
zestimate (value): 439905
zestimate (value): 798844
zestimate (value): 709909
zestimate (value): 709529
zestimate (value): 572799
zestimate (value): 582634
zestimate (value): 892815
zestimate (value): 718690
zestimate (value): 450666
zestimate (value): 885144
zestimate (value): 739529
zestimate (value): 481015
zest

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,21334983,23536 Evalyn Ave,Torrance CA 90505,33.8117,-118.359,0,942689,989823,895555,855300.0,156339.0,2017.0,1962.0,6213.0,1764.0,3.0,2.0
3,2145608762,3110 Merrill Dr,Torrance CA 90503,33.8301,-118.342,0,721446,815234,483369,,,,1963.0,174240.0,1315.0,2.0,2.0
4,21265915,1536 W 222nd St,Torrance CA 90501,33.8255,-118.305,0,598835,628777,568893,459700.0,334276.0,2017.0,1952.0,7200.0,1549.0,3.0,2.0
10,21268886,2377 Del Amo Blvd,Torrance CA 90501,33.8473,-118.326,0,645824,678115,613533,,449747.0,2017.0,1949.0,4981.0,1016.0,3.0,1.0
14,21273938,2801 Sepulveda Blvd,Torrance CA 90505,33.8245,-118.332,0,801896,841991,761801,,595241.0,2017.0,1995.0,164221.0,1809.0,3.0,3.0
15,20372467,3132 185th St,Torrance CA 90504,33.8626,-118.328,0,721006,757056,684956,630700.0,320768.0,2017.0,1950.0,6499.0,1256.0,3.0,2.0
20,21332589,4109 Emerald St,Torrance CA 90503,33.8421,-118.357,0,1451973,1655249,1176098,744600.0,541658.0,2017.0,1963.0,6901.0,4302.0,8.0,6.0
21,21279166,2127 W 237th St,Torrance CA 90501,33.81,-118.318,0,875936,919733,832139,741900.0,105357.0,2017.0,1965.0,5784.0,2154.0,5.0,3.0
27,21328017,21734 Evalyn Ave,Torrance CA 90503,33.8307,-118.362,0,957442,1005314,909570,809400.0,934000.0,2017.0,1955.0,6227.0,1740.0,4.0,2.0
29,21333339,4202 Carmen St,Torrance CA 90503,33.833,-118.358,0,930185,976694,883676,809400.0,439353.0,2017.0,1956.0,5601.0,1720.0,3.0,3.0


In [21]:
# do any further data cleaning you need to yourself
# for example, dropping any rows with NaN values
la_df = la_df.dropna(axis=0, how='any')
la_df

# maybe write to CSV to store the data for usage later/before doing plots? so you don't have to rerun everything

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,21334983,23536 Evalyn Ave,Torrance CA 90505,33.8117,-118.359,0,942689,989823,895555,855300,156339.0,2017,1962,6213,1764,3,2.0
4,21265915,1536 W 222nd St,Torrance CA 90501,33.8255,-118.305,0,598835,628777,568893,459700,334276.0,2017,1952,7200,1549,3,2.0
15,20372467,3132 185th St,Torrance CA 90504,33.8626,-118.328,0,721006,757056,684956,630700,320768.0,2017,1950,6499,1256,3,2.0
20,21332589,4109 Emerald St,Torrance CA 90503,33.8421,-118.357,0,1451973,1655249,1176098,744600,541658.0,2017,1963,6901,4302,8,6.0
21,21279166,2127 W 237th St,Torrance CA 90501,33.81,-118.318,0,875936,919733,832139,741900,105357.0,2017,1965,5784,2154,5,3.0
27,21328017,21734 Evalyn Ave,Torrance CA 90503,33.8307,-118.362,0,957442,1005314,909570,809400,934000.0,2017,1955,6227,1740,4,2.0
29,21333339,4202 Carmen St,Torrance CA 90503,33.833,-118.358,0,930185,976694,883676,809400,439353.0,2017,1956,5601,1720,3,3.0
32,20372624,18700 Crenshaw Blvd,Torrance CA 90504,33.8613,-118.325,0,604186,634395,573977,630700,386720.0,2017,1951,5644,850,3,1.0
41,21333846,22733 Anza Ave,Torrance CA 90505,33.8216,-118.36,0,859126,902082,816170,869800,643589.0,2017,1953,5375,1616,3,1.0
46,2117437376,2602 Cabrillo Ave,Torrance CA 90501,33.8214,-118.315,0,753742,829116,693443,595500,461549.0,2017,1990,5505,1952,4,3.0


In [22]:
# review current data and see if more data is needed (at least 50 valid data per city)
add_df = pd.DataFrame(la_df)
final_df = add_df
final_df = final_df.reset_index(drop=True)
len(final_df)

10

In [23]:
# If minimum 50 valid data counts is not met, we will loop through the codes above to make sure we have sufficient data
while(len(final_df)<50):
    generate_addresses(LAT,LNG, la_df) 
    get_message_codes(la_df)
    la_df = la_df[la_df.message_code == '0']
    la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]
    search_zillow(la_df)
    la_df = la_df.dropna(axis=0, how='any')
    add_df = add_df.append(la_df, ignore_index=True)
    final_df = add_df.drop_duplicates()
len(final_df)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.loc[index, col] = value


0: 508
4: 508
15: 508
20: 508
21: 508
27: 508
29: 508
32: 0
41: 0
46: 0
1: 0
2: 0
3: 508
5: 0
6: 508
7: 0
8: 508
9: 0
10: 0
11: 508
12: 508
13: 508
14: 508
16: 0
17: 508
18: 508
19: 508
22: 0
23: 0
24: 508
25: 508
26: 508
28: 508
30: 508
31: 508
33: 508
34: 508
35: 508
36: 508
37: 0
38: 0
39: 508
40: 508
42: 508
43: 508
44: 508
45: 0
47: 508
48: 508
row 32: 1924 Middlebrook Rd Torrance CA 90501
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=1924%20Middlebrook%20Rd&citystatezip=%20Torrance%20CA%2090501
zestimate (value): 899318


row 41: 22741 Date Ave Torrance CA 90505
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=22741%20Date%20Ave&citystatezip=%20Torrance%20CA%2090505
zestimate (value): 805505


row 46: 19802 Tomlee Ave Torrance CA 90503
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=19802%20Tomlee%20Ave&citystatezip=%20Torrance%20C

row 6: 17318 Casimir Ave Torrance CA 90504
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=17318%20Casimir%20Ave&citystatezip=%20Torrance%20CA%2090504
zestimate (value): 528203


row 8: 2365 Plaza del Amo Torrance CA 90501
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=2365%20Plaza%20del%20Amo&citystatezip=%20Torrance%20CA%2090501
zestimate (value): 501828


row 9: 1435 W 218th St Torrance CA 90501
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=1435%20W%20218th%20St&citystatezip=%20Torrance%20CA%2090501
zestimate (value): 669819


row 24: 1648 W 227th St Torrance CA 90501
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz18olpf98c97_2xcn2&address=1648%20W%20227th%20St&citystatezip=%20Torrance%20CA%2090501
not for sale
zestimate (value): 589690
not for sale


row 26: 21730 Ladeene Ave Torrance CA 90503
https://www.z

52

In [24]:
# review final city data before save the file
final_df.head()

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,21334983,1101-1105 Lilienthal Ln,Redondo Beach CA 90278,33.8642,-118.366,0,942689,989823,895555,855300,156339.0,2017,1962,6213,1764,3,2.0
1,21265915,1536 W 222nd St,Torrance CA 90501,33.8255,-118.305,0,598835,628777,568893,459700,334276.0,2017,1952,7200,1549,3,2.0
2,20372467,3132 185th St,Torrance CA 90504,33.8626,-118.328,0,721006,757056,684956,630700,320768.0,2017,1950,6499,1256,3,2.0
3,21332589,4109 Emerald St,Torrance CA 90503,33.8421,-118.357,0,1451973,1655249,1176098,744600,541658.0,2017,1963,6901,4302,8,6.0
4,21279166,2127 W 237th St,Torrance CA 90501,33.81,-118.318,0,875936,919733,832139,741900,105357.0,2017,1965,5784,2154,5,3.0


In [25]:
# Save file into Clean Data folder
final_df.to_csv(f'../Clean_Data/5-1.{city1}_zillow_data.csv')