I spent this week exploring a couple of Zillow datasets for the group project, which was intended to examine post-wildfire rent gouging in Sonoma and Napa counties. Through this exploration, I found that the data I originally thought was promising would be insufficient for the project. This led my partner and me to switch directions.

I looked into several [Zillow datasets](https://www.zillow.com/research/data/):
* Zillow Observed Rent Index (ZORI)
"Zillow Observed Rent Index (ZORI): A smoothed measure of the typical observed market rate rent across a given region. ZORI is a repeat-rent index that is weighted to the rental housing stock to ensure representativeness across the entire market, not just those homes currently listed for-rent. The index is dollar-denominated by computing the mean of listed rents that fall into the 40th to 60th percentile range for all homes and apartments in a given region, which is once again weighted to reflect the rental housing stock."

* Zillow Home Value Index (ZHVI)
"A smoothed, seasonally adjusted measure of the typical home value and market changes across a given region and housing type. It reflects the typical value for homes in the 35th to 65th percentile range. The raw version of that mid-tier ZHVI time series is also available..."

## ZORI

First, I downloaded the ZORI data at the ZIP code level. Unfortunately, this dataset doesn't contain any data for Sonoma or Napa counties. I found this out by opening the .csv in Excel and searching for some of the largest MSAs in the area, including Santa Rosa.

In [4]:
# first, import the geopandas library
import geopandas

# next, import the data
ZORI_zip = geopandas.read_file('../Data/Zip_ZORI_CA.csv')

In [5]:
ZORI_zip

Unnamed: 0,RegionID,RegionName,SizeRank,MsaName,2014-01,2014-02,2014-03,2014-04,2014-05,2014-06,...,2021-01,2021-02,2021-03,2021-04,2021-05,2021-06,2021-07,2021-08,2021-09,geometry
0,97564,94109,14,"San Francisco, CA",2290,2336,2382,2428,2473,2518,...,2654,2645,2635,2626,2619,2613,2606,2603,2600,
1,96107,90250,23,"Los Angeles-Long Beach-Anaheim, CA",1081,1084,1087,1090,1094,1097,...,1689,1696,1702,1708,1715,1721,1727,1734,1740,
2,97771,94565,41,"San Francisco, CA",1451,,,1501,1518,1534,...,2415,2432,2449,2466,2483,2500,2518,2536,2555,
3,96027,90046,45,"Los Angeles-Long Beach-Anaheim, CA",1705,1726,1747,1769,1788,1808,...,2359,2367,2374,2382,2391,2400,2409,2419,2429,
4,97711,94501,89,"San Francisco, CA",1681,1697,1713,1729,,1760,...,2321,2328,2334,2340,2346,2353,2359,2366,2373,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
359,96466,91708,7028,"Riverside, CA",,,2133,2134,2135,2136,...,2759,2808,2857,2905,2955,3005,3055,3107,3159,
360,96234,90755,7127,"Los Angeles-Long Beach-Anaheim, CA",,,,1615,,,...,2266,2280,2293,2307,2321,2335,2350,2365,,
361,96946,92610,7249,"Los Angeles-Long Beach-Anaheim, CA",2202,2205,2208,,2216,2221,...,2727,2759,2791,2824,2857,2890,2923,2958,2993,
362,96087,90211,7515,"Los Angeles-Long Beach-Anaheim, CA",1955,2011,2066,2122,2174,2225,...,2993,2990,2987,2983,2980,2977,2974,2972,2970,
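
The same check I did by hand in Excel could also be done in code. Below is a minimal sketch, assuming the `MsaName` column shown above is the right one to search. The helper `rows_for_metros` and the small `zori_sample` frame (the first five rows of the output above) are names I made up for illustration; on the real data the function would be called on `ZORI_zip` instead.

```python
import re
import pandas as pd

def rows_for_metros(df, names, column="MsaName"):
    """Return the rows whose metro name contains any of the given names."""
    # Escape each name so it is matched literally, then OR them together
    pattern = "|".join(re.escape(name) for name in names)
    return df[df[column].str.contains(pattern, case=False, na=False)]

# Illustrative excerpt: the first five rows of the output above
# (RegionName holds the ZIP code)
zori_sample = pd.DataFrame({
    "RegionName": ["94109", "90250", "94565", "90046", "94501"],
    "MsaName": [
        "San Francisco, CA",
        "Los Angeles-Long Beach-Anaheim, CA",
        "San Francisco, CA",
        "Los Angeles-Long Beach-Anaheim, CA",
        "San Francisco, CA",
    ],
})

north_bay = rows_for_metros(zori_sample, ["Santa Rosa", "Napa"])
print(len(north_bay))  # number of matching ZIP codes in the excerpt
```

An empty result for the full dataset would confirm what the Excel search showed, without having to scroll through the file.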


## ZHVI Neighborhood
Since the ZORI data led to a dead end, I turned to the home value index. It turns out that the home value data covers a wider geographic area than the rent data. Below, I import and explore the data.

In [10]:
# first, import the geopandas library
import geopandas as gpd

# next, import the data
ZHVI_neighborhood = gpd.read_file('../Data/Neighborhood_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_month.csv')

In [11]:
ZHVI_neighborhood.shape

(16410, 270)

In [12]:
ZHVI_neighborhood.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 16410 entries, 0 to 16409
Columns: 270 entries, RegionID to geometry
dtypes: geometry(1), object(269)
memory usage: 33.8+ MB


In [13]:
ZHVI_neighborhood.head()

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,2000-01-31,...,2020-12-31,2021-01-31,2021-02-28,2021-03-31,2021-04-30,2021-05-31,2021-06-30,2021-07-31,2021-08-31,geometry
0,274772,0,Northeast Dallas,Neighborhood,TX,TX,Dallas,Dallas-Fort Worth-Arlington,Dallas County,152919.0,...,352199.0,356725.0,362165.0,367250.0,371000.0,374851.0,380332.0,385945.0,395018.0,
1,112345,1,Maryvale,Neighborhood,AZ,AZ,Phoenix,Phoenix-Mesa-Scottsdale,Maricopa County,83271.0,...,221874.0,227592.0,230860.0,236779.0,241819.0,250354.0,257117.0,264016.0,269076.0,
2,192689,2,Paradise,Neighborhood,NV,NV,Las Vegas,Las Vegas-Henderson-Paradise,Clark County,152657.0,...,292090.0,295361.0,298475.0,301960.0,306035.0,313424.0,323097.0,332994.0,341428.0,
3,270958,3,Upper West Side,Neighborhood,NY,NY,New York,New York-Newark-Jersey City,New York County,391890.0,...,1381298.0,1371749.0,1366293.0,1358496.0,1372387.0,1374302.0,1373921.0,1375277.0,1386160.0,
4,118208,4,South Los Angeles,Neighborhood,CA,CA,Los Angeles,Los Angeles-Long Beach-Anaheim,Los Angeles County,154000.0,...,579954.0,582101.0,587633.0,595565.0,604113.0,614690.0,626421.0,641772.0,653719.0,


Then I filtered for the counties in question.

In [14]:
# keep only California rows in Sonoma County or Napa County
ZHVI_neighborhood_Napa_Sonoma = ZHVI_neighborhood[
    (ZHVI_neighborhood['State'] == 'CA')
    & ZHVI_neighborhood['CountyName'].isin(['Sonoma County', 'Napa County'])
]
ZHVI_neighborhood_Napa_Sonoma

Unnamed: 0,RegionID,SizeRank,RegionName,RegionType,StateName,State,City,Metro,CountyName,2000-01-31,...,2020-12-31,2021-01-31,2021-02-28,2021-03-31,2021-04-30,2021-05-31,2021-06-30,2021-07-31,2021-08-31,geometry
2291,396534,2353,Larkfield-Wikiup,Neighborhood,CA,CA,Santa Rosa,Santa Rosa,Sonoma County,279813.0,...,713835.0,715399.0,716432.0,722818.0,731259.0,739457.0,747146.0,758879.0,773850.0,
2520,763792,2589,Pueblo Park,Neighborhood,CA,CA,Napa,Napa,Napa County,199007.0,...,638433.0,649564.0,657752.0,658621.0,661870.0,671561.0,695018.0,715431.0,729420.0,
2704,223728,2777,Westwood,Neighborhood,CA,CA,Napa,Napa,Napa County,167241.0,...,548105.0,555968.0,562252.0,559633.0,563969.0,571257.0,592974.0,613312.0,627593.0,
3578,763787,3662,Shurtleff,Neighborhood,CA,CA,Napa,Napa,Napa County,192909.0,...,611400.0,622115.0,627946.0,626955.0,627022.0,634876.0,652703.0,671133.0,679919.0,
3939,136125,4032,Central,Neighborhood,CA,CA,Napa,Napa,Napa County,188945.0,...,702848.0,720191.0,730554.0,736636.0,741844.0,748379.0,762820.0,779692.0,791484.0,
4544,261871,4655,Mcpherson,Neighborhood,CA,CA,Napa,Napa,Napa County,170880.0,...,621506.0,633620.0,639725.0,639327.0,642406.0,652481.0,675026.0,696702.0,711041.0,
4815,763782,4925,Fuller Park,Neighborhood,CA,CA,Napa,Napa,Napa County,195557.0,...,740427.0,755602.0,765463.0,767356.0,770284.0,778726.0,794240.0,816441.0,828589.0,
5034,763793,5148,Bel Aire,Neighborhood,CA,CA,Napa,Napa,Napa County,215269.0,...,635868.0,646849.0,655979.0,653544.0,653662.0,657702.0,678736.0,699149.0,714466.0,
5094,763798,5208,Vineyard Estates,Neighborhood,CA,CA,Napa,Napa,Napa County,292779.0,...,795914.0,809727.0,822053.0,825496.0,831460.0,841522.0,866011.0,891261.0,913592.0,
5859,763797,5992,Linda Vista,Neighborhood,CA,CA,Napa,Napa,Napa County,264740.0,...,747974.0,760750.0,771036.0,771906.0,775333.0,780682.0,804516.0,825702.0,843179.0,
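
Counting the filtered rows by county makes the coverage easier to judge at a glance. This is a small sketch, not part of my original notebook: `napa_sonoma_sample` is a stand-in frame I typed in from the `RegionName` and `CountyName` columns of the output above. On the real data, the same `value_counts()` call would go on `ZHVI_neighborhood_Napa_Sonoma['CountyName']`.

```python
import pandas as pd

# Stand-in for the filtered frame: names and counties copied from the output above
napa_sonoma_sample = pd.DataFrame({
    "RegionName": [
        "Larkfield-Wikiup", "Pueblo Park", "Westwood", "Shurtleff", "Central",
        "Mcpherson", "Fuller Park", "Bel Aire", "Vineyard Estates", "Linda Vista",
    ],
    "CountyName": ["Sonoma County"] + ["Napa County"] * 9,
})

# Number of neighborhoods available in each county
counts = napa_sonoma_sample["CountyName"].value_counts()
print(counts)
```

Of the ten neighborhoods in the output, nine are in Napa County and only one (Larkfield-Wikiup) is in Sonoma County.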


In [16]:
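# reset the index; reset_index returns a new DataFrame, so assign the result
# (drop=True discards the old index instead of adding it as a new column)
ZHVI_neighborhood_Napa_Sonoma = ZHVI_neighborhood_Napa_Sonoma.reset_index(drop=True)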

# check datatypes
ZHVI_neighborhood_Napa_Sonoma.dtypes

RegionID        object
SizeRank        object
RegionName      object
RegionType      object
StateName       object
                ...   
2021-05-31      object
2021-06-30      object
2021-07-31      object
2021-08-31      object
geometry      geometry
Length: 270, dtype: object

I noticed that most of the columns that are supposed to hold numerical data currently have the object dtype, so I set out to convert them.
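
A likely cause of the object dtypes is that I loaded a plain CSV through `geopandas.read_file`, which as far as I understand reads CSV fields as strings by default. One way to avoid the conversion step would be to load the file with `pandas.read_csv`, which infers numeric types. The sketch below shows this on a two-row excerpt I put together from values in the output above, with most columns left out. I have not rerun the notebook this way, so treat it as an assumption.

```python
import io
import pandas as pd

# Two-row excerpt in the same layout as the Zillow CSV (most columns omitted)
sample_csv = io.StringIO(
    "RegionID,RegionName,CountyName,2021-07-31,2021-08-31\n"
    "396534,Larkfield-Wikiup,Sonoma County,758879.0,773850.0\n"
    "763792,Pueblo Park,Napa County,715431.0,729420.0\n"
)

# read_csv infers a numeric dtype for columns that contain only numbers
sample = pd.read_csv(sample_csv)
print(sample.dtypes)
```

With this approach the monthly columns come in as floats directly. There is also no empty `geometry` column, since pandas does not add one.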

In [29]:
# The monthly values start at the column with index 9 (2000-01-31), so slice from there.
# Note that this slice also picks up the trailing, empty geometry column.
# .copy() gives an independent frame, so the assignments below don't touch the original.
ZHVI_neighborhood_Napa_Sonoma_data = ZHVI_neighborhood_Napa_Sonoma.iloc[:,9:].copy()

# Print the number of columns in the slice (the second dimension of its shape)
print(ZHVI_neighborhood_Napa_Sonoma_data.shape[1])

# Convert each column in the slice to float. I accessed these with iloc.
for x in range(ZHVI_neighborhood_Napa_Sonoma_data.shape[1]):
    ZHVI_neighborhood_Napa_Sonoma_data.iloc[:,x] = ZHVI_neighborhood_Napa_Sonoma_data.iloc[:,x].astype(float)
ZHVI_neighborhood_Napa_Sonoma_data.dtypes

261


2000-01-31    float64
2000-02-29    float64
2000-03-31    float64
2000-04-30    float64
2000-05-31    float64
               ...   
2021-05-31    float64
2021-06-30    float64
2021-07-31    float64
2021-08-31    float64
geometry      float64
Length: 261, dtype: object

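There! The datatypes are converted. Note that the slice also included the empty geometry column, so it now shows up as float64 as well.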
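
The column-by-column loop can also be written as one step with `pandas.to_numeric`. This is only a sketch under my own assumptions: `raw` is a made-up two-row frame of strings standing in for the real data, and `errors="coerce"` is my choice so that blank cells become NaN instead of raising an error.

```python
import pandas as pd

# Made-up stand-in: monthly values stored as strings, with one blank cell
raw = pd.DataFrame({
    "RegionName": ["Larkfield-Wikiup", "Pueblo Park"],
    "2021-07-31": ["758879.0", "715431.0"],
    "2021-08-31": ["773850.0", ""],
})

# Everything except the name column holds monthly values
value_cols = [col for col in raw.columns if col != "RegionName"]

# Apply to_numeric to each value column; unparseable cells become NaN
converted = raw[value_cols].apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```

Selecting the value columns by name rather than by position also keeps the `geometry` column out of the conversion.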