## Nationwide Home Values
#### Abby Hildebrandt, Summer 2023

Prompt: 

I want to showcase some of my Python skills for my portfolio by building a robust data analysis Jupyter notebook. I have a county dataset, which contains ZIP and FIPS codes for every county in the United States as well as the geometry of each county. I also have a Kaggle dataset of Zillow housing price data that shows median rent for units with different numbers of bedrooms, the Zillow Home Value Index, and various other housing data, all broken down by city name. Provide me with 10 prompts for ways I could use Python for different data visualizations and data science methodologies with this data.

ChatGPT Response: 
1. Geospatial Analysis of Housing Prices: Create choropleth maps using the county geometry data and overlay the Zillow housing price data to visualize the median rental or home value index across different counties in the United States.

2. Trends in Housing Prices over Time: Generate time series plots to visualize the trends in median rental or home value index for specific cities or counties over the years.

3. Correlation Analysis: Perform correlation analysis between housing prices and other variables in the Zillow dataset (e.g., population, unemployment rate) to identify potential relationships.

4. Top Performing Cities/Counties: Identify the top 10 cities or counties with the highest and lowest housing prices (rental or home value index) and create bar plots to compare them.

5. Housing Price Distribution: Create histograms or box plots to visualize the distribution of housing prices (rental or home value index) for different cities or counties.

6. Heatmaps of Housing Price Clusters: Use clustering algorithms (e.g., K-means) to group similar cities or counties based on housing prices and create heatmaps to visualize the clustering results.

7. Interactive Data Dashboard: Develop an interactive dashboard using libraries like Dash or Panel to allow users to explore and filter the housing price data based on various criteria (e.g., city, county, bedroom number, etc.).

8. Time Series Forecasting: Use time series forecasting models (e.g., ARIMA, Prophet) to predict future housing prices for specific cities or counties and visualize the forecasts.

9. Spatial Autocorrelation Analysis: Apply spatial autocorrelation techniques (e.g., Moran's I) to detect spatial patterns in housing prices and visualize the results on a map.

10. Comparative Analysis of Cities/Counties: Compare the housing prices of different cities or counties using side-by-side bar plots or box plots to identify any significant differences.
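Idea 9 mentions Moran's I without defining it. Global Moran's I compares each region's deviation from the overall mean with its neighbours' deviations: positive values indicate that similar prices cluster together, negative values that neighbouring regions tend to differ, and values near 0 suggest no spatial pattern. The sketch below is not from the original notebook; it assumes a dense N x N weight matrix (1 where two regions share a border, 0 otherwise) and uses a hypothetical `morans_i` helper on a toy four-region example. For the real county polygons, a spatial statistics library such as PySAL would normally build the weights from the geometry and add a significance test.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I for a 1-D array of values and an N x N spatial weight matrix.

    I = (N / W) * sum_ij(w_ij * z_i * z_j) / sum_i(z_i ** 2),
    where z = values - mean(values) and W is the sum of all weights.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    n = x.size
    z = x - x.mean()               # deviations from the mean
    total_weight = w.sum()         # W in the formula above
    cross_products = z @ w @ z     # sum over all pairs of w_ij * z_i * z_j
    return (n / total_weight) * cross_products / (z @ z)


# Toy example (hypothetical): four regions in a row, each bordering its immediate neighbours.
adjacency = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

smooth = morans_i([1, 2, 3, 4], adjacency)       # neighbours have similar values
alternating = morans_i([1, 0, 1, 0], adjacency)  # neighbours always differ
print(round(smooth, 3), round(alternating, 3))
```

The smooth gradient yields a positive statistic and the alternating pattern a negative one, which is the behaviour a map of clustered versus checkerboard housing prices would show.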

Important Links: 

https://www.zillow.com/research/data/

### Import Libraries

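```python
# Core data handling
import pandas as pd
import itertools
import json

# Geospatial data: GeoDataFrames, geospatial tables, KML-to-GeoJSON conversion
import geopandas as gpd
import geotable
from kml2geojson import convert

# PostgreSQL database driver
import psycopg2 as psy

# U.S. Census API wrapper and U.S. state metadata
from census import Census
from us import states

# Interactive plotting
import plotly.express as px
```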

#### Read in Zip Code Data

Source: https://www.kaggle.com/datasets/danofer/zipcodes-county-fips-crosswalk?resource=download

```python
# ZIP-to-county FIPS crosswalk. Read every column as text so codes keep their leading zeros
b = pd.read_csv("data/ZIP-COUNTY-FIPS_2017-06.csv", dtype=str)
b = b.rename({"ZIP": "zip", "STCOUNTYFP": "fips"}, axis='columns')
b
```

| | zip | COUNTYNAME | STATE | fips | CLASSFP |
|---|---|---|---|---|---|
| 0 | 36003 | Autauga County | AL | 01001 | H1 |
| 1 | 36006 | Autauga County | AL | 01001 | H1 |
| 2 | 36067 | Autauga County | AL | 01001 | H1 |
| 3 | 36066 | Autauga County | AL | 01001 | H1 |
| 4 | 36703 | Autauga County | AL | 01001 | H1 |
| ... | ... | ... | ... | ... | ... |
| 52884 | 00850 | St. Croix Island | VI | 78010 | H4 |
| 52885 | 00840 | St. Croix Island | VI | 78010 | H4 |
| 52886 | 00820 | St. Croix Island | VI | 78010 | H4 |
| 52887 | 00830 | St. John Island | VI | 78020 | H4 |
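Reading the crosswalk with `dtype=str` matters because ZIP and FIPS codes are identifiers, not numbers: parsed as integers, the Virgin Islands ZIP `00830` would become `830` and Autauga County's FIPS `01001` would become `1001`, and neither would match the five-character codes in other tables. This minimal sketch uses two rows copied from the output above as inline CSV text; the `str.zfill(5)` repair at the end is an illustrative fallback for a file that was already read numerically, not a step from the original notebook.

```python
import io

import pandas as pd

# Two rows copied from the crosswalk output above, as inline CSV text.
sample = """ZIP,COUNTYNAME,STATE,STCOUNTYFP,CLASSFP
36003,Autauga County,AL,01001,H1
00830,St. John Island,VI,78020,H4
"""

as_numbers = pd.read_csv(io.StringIO(sample))             # default type inference
as_strings = pd.read_csv(io.StringIO(sample), dtype=str)  # keep codes as text

print(as_numbers["ZIP"].tolist())  # leading zeros lost
print(as_strings["ZIP"].tolist())  # leading zeros kept

# Repairing numerically parsed codes: ZIP codes and county FIPS codes are both 5 digits.
repaired_zip = as_numbers["ZIP"].astype(str).str.zfill(5)
repaired_fips = as_numbers["STCOUNTYFP"].astype(str).str.zfill(5)
print(repaired_zip.tolist(), repaired_fips.tolist())
```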


#### Read in geometry data

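```python
# County polygons (GeoJSON). geopandas.read_file has no pandas-style dtype parameter
# (extra keyword arguments go to the I/O engine); FIPS is already stored as text (e.g. "02232")
c = gpd.read_file("data/usacounties.geojson")
c = c.rename({"FIPS": "fips"}, axis='columns')
c
```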


| | NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | fips | geometry |
|---|---|---|---|---|---|---|
| 0 | Lake of the Woods | Minnesota | 27 | 077 | 27077 | MULTIPOLYGON (((-95.34283 48.54668, -95.34105 ... |
| 1 | Ferry | Washington | 53 | 019 | 53019 | MULTIPOLYGON (((-118.85163 47.94956, -118.8484... |
| 2 | Stevens | Washington | 53 | 065 | 53065 | MULTIPOLYGON (((-117.43883 48.04412, -117.5421... |
| 3 | Okanogan | Washington | 53 | 047 | 53047 | MULTIPOLYGON (((-118.97209 47.93915, -118.9740... |
| 4 | Pend Oreille | Washington | 53 | 051 | 53051 | MULTIPOLYGON (((-117.43858 48.99992, -117.0320... |
| ... | ... | ... | ... | ... | ... | ... |
| 3136 | Skagway-Hoonah-Angoon | Alaska | 02 | 232 | 02232 | MULTIPOLYGON (((-137.80952 58.71648, -137.4674... |
| 3137 | Yukon-Koyukuk | Alaska | 02 | 290 | 02290 | MULTIPOLYGON (((-161.04770 62.20469, -160.9942... |
| 3138 | Southeast Fairbanks | Alaska | 02 | 240 | 02240 | MULTIPOLYGON (((-146.96382 63.46070, -146.9573... |
| 3139 | Denali | Alaska | 02 | 068 | 02068 | MULTIPOLYGON (((-152.98947 62.74900, -152.4877... |
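In the geometry table, `fips` is the two-digit `STATE_FIPS` followed by the three-digit `CNTY_FIPS` (27 followed by 077 gives 27077 for Lake of the Woods). A short consistency check can confirm that relationship before `fips` is used as a join key. This sketch runs on three rows copied from the output above with the geometry column left out; the derived `state_fips` column is an illustrative extra for joining to state-level tables, not a step from the original notebook.

```python
import pandas as pd

# Rows copied from the geometry table above (geometry column omitted).
counties = pd.DataFrame({
    "NAME": ["Lake of the Woods", "Ferry", "Denali"],
    "STATE_FIPS": ["27", "53", "02"],
    "CNTY_FIPS": ["077", "019", "068"],
    "fips": ["27077", "53019", "02068"],
})

# A county FIPS code is the 2-digit state code followed by the 3-digit county code.
derived = counties["STATE_FIPS"] + counties["CNTY_FIPS"]
mismatches = counties[derived != counties["fips"]]
print(len(mismatches))  # rows where the stored code disagrees with the derived one

# The first two characters give a state key for joining to state-level data.
counties["state_fips"] = counties["fips"].str[:2]
print(counties["state_fips"].tolist())
```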


---

#### Merge Zip, FIPS, & Geometry 

---

```python
# Collapse the crosswalk (b) to one row per county: 'fips' -> list of its ZIP codes
zip_codes = b.groupby('fips')['zip'].apply(list)

# Attach the ZIP lists to the county polygons (c). merge defaults to an inner join,
# so counties with no match in the crosswalk are dropped (3,140 rows become 3,135)
c = c.merge(zip_codes, on='fips')

# The new column keeps the Series name, 'zip', and holds a list of ZIP codes per county
c
```

| | NAME | STATE_NAME | STATE_FIPS | CNTY_FIPS | fips | geometry | zip |
|---|---|---|---|---|---|---|---|
| 0 | Lake of the Woods | Minnesota | 27 | 077 | 27077 | MULTIPOLYGON (((-95.34283 48.54668, -95.34105 ... | [56686, 56711, 56623, 56741, 56673] |
| 1 | Ferry | Washington | 53 | 019 | 53019 | MULTIPOLYGON (((-118.85163 47.94956, -118.8484... | [99138, 99140, 99146, 99107, 99118, 99160, 991... |
| 2 | Stevens | Washington | 53 | 065 | 53065 | MULTIPOLYGON (((-117.43883 48.04412, -117.5421... | [99013, 99148, 99026, 99126, 99040, 99006, 991... |
| 3 | Okanogan | Washington | 53 | 047 | 53047 | MULTIPOLYGON (((-118.97209 47.93915, -118.9740... | [98846, 98829, 99116, 98813, 98833, 98819, 988... |
| 4 | Pend Oreille | Washington | 53 | 051 | 53051 | MULTIPOLYGON (((-117.43858 48.99992, -117.0320... | [99139, 99180, 99152, 99006, 99119, 99156, 990... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3131 | Yakutat | Alaska | 02 | 282 | 02282 | MULTIPOLYGON (((-142.19485 60.02709, -142.1832... | [99689] |
| 3132 | Yukon-Koyukuk | Alaska | 02 | 290 | 02290 | MULTIPOLYGON (((-161.04770 62.20469, -160.9942... | [99757, 99758, 99733, 99740, 99768, 99724, 996... |
| 3133 | Southeast Fairbanks | Alaska | 02 | 240 | 02240 | MULTIPOLYGON (((-146.96382 63.46070, -146.9573... | [99764, 99738, 99776, 99731, 99780, 99732, 99737] |
| 3134 | Denali | Alaska | 02 | 068 | 02068 | MULTIPOLYGON (((-152.98947 62.74900, -152.4877... | [99729, 99744, 99743, 99755] |
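The groupby-then-merge pattern is easier to see on a tiny example. In this sketch the two frames, including the `Example County` row with FIPS `99001`, are hypothetical. It shows that the default inner join silently drops a county that has no ZIP codes, while `how="left"` with `indicator=True` keeps it and flags it as `left_only`, which is one way to find out which of the 3,140 counties in the geometry file fall out of the merge above.

```python
import pandas as pd

# Hypothetical miniature versions of the crosswalk (b) and the county table (c).
zips = pd.DataFrame({
    "zip":  ["36003", "36006", "56686", "56711"],
    "fips": ["01001", "01001", "27077", "27077"],
})
counties = pd.DataFrame({
    "NAME": ["Autauga", "Lake of the Woods", "Example County"],
    "fips": ["01001", "27077", "99001"],
})

# One row per county, with a Python list of its ZIP codes.
zip_lists = zips.groupby("fips")["zip"].apply(list).rename("zip_codes").reset_index()

inner = counties.merge(zip_lists, on="fips")  # default how="inner"
left = counties.merge(zip_lists, on="fips", how="left", indicator=True)

print(len(inner), len(left))
print(left.loc[left["_merge"] == "left_only", "NAME"].tolist())  # counties with no ZIPs
```

Renaming the aggregated Series before the merge also makes the list column's name explicit instead of inheriting `zip`.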


#### Rental Costs Time Series by State

```python
# Median rental price by city (Zillow, via Kaggle): one row per city, one column per month
a = pd.read_csv("data/City_MedianRentalPrice_AllHomes.csv")

# Average the city-level medians within each state. numeric_only=True skips the
# non-numeric columns; without it, pandas >= 2.0 raises an error on text columns
a_grouped = a.groupby('State').mean(numeric_only=True)

# Drop numeric columns that are not monthly rents
a_grouped = a_grouped.drop(columns=['SizeRank', 'Unnamed: 0'])
a_grouped
```

| State | 2010-01 | 2010-02 | 2010-03 | 2010-04 | 2010-05 | 2010-06 | 2010-07 | 2010-08 | 2010-09 | 2010-10 | ... | 2019-03 | 2019-04 | 2019-05 | 2019-06 | 2019-07 | 2019-08 | 2019-09 | 2019-10 | 2019-11 | 2019-12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AK | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1648.75 | 1642.5 | 1598.75 | 1568.75 | 1586.25 | 1618.75 | 1603.125 | 1587.5 | 1617.5 | 1598.75 |
| AL | NaN | NaN | NaN | 862.5 | 850.0 | 862.5 | 925.0 | 995.0 | 831.0 | 804.5 | ... | 1145.73913 | 1174.782609 | 1163.23913 | 1148.913043 | 1156.086957 | 1150.630435 | 1171.826087 | 1173.695652 | 1166.695652 | 1163.021739 |
| AR | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1114.84375 | 1129.375 | 1130.15625 | 1122.1875 | 1115.15625 | 1115.78125 | 1113.28125 | 1111.25 | 1110.15625 | 1111.71875 |
| AZ | NaN | 1300.0 | 1165.9 | 1143.125 | 1138.461538 | 1123.423077 | 1114.192308 | 1132.615385 | 1076.3 | 1069.533333 | ... | 1531.859375 | 1552.0 | 1553.484375 | 1562.859375 | 1575.359375 | 1582.140625 | 1594.765625 | 1565.546875 | 1556.609375 | 1551.890625 |
| CA | NaN | 2282.5 | 2249.0 | 2340.277778 | 2382.777778 | 2444.444444 | 2533.055556 | 2767.272727 | 2505.833333 | 2397.666667 | ... | 2942.944664 | 2961.187747 | 2976.782609 | 2989.721344 | 2997.79249 | 2989.013834 | 2963.454545 | 2951.086957 | 2950.677866 | 2960.675889 |
| CO | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1100.0 | ... | 1986.722222 | 2011.111111 | 2019.194444 | 2009.305556 | 2005.652778 | 2011.652778 | 2011.166667 | 1970.694444 | 1988.222222 | 1998.013889 |
| CT | NaN | NaN | NaN | NaN | NaN | 2108.5 | 2425.0 | 1775.0 | 1657.0 | 1555.0 | ... | 1927.616279 | 1975.906977 | 2009.872093 | 2019.05814 | 1992.034884 | 1976.360465 | 1995.453488 | 1968.872093 | 1967.186047 | 1971.616279 |
| DC | NaN | 2275.0 | 1972.5 | 1999.0 | 2000.0 | 2125.0 | 2300.0 | 2200.0 | 2180.0 | 2200.0 | ... | 2600.0 | 2650.0 | 2700.0 | 2750.0 | 2730.0 | 2700.0 | 2700.0 | 2700.0 | 2700.0 | 2730.0 |
| DE | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | 1285.625 | 1300.0 | 1346.875 | 1337.5 | 1367.5 | 1380.625 | 1371.875 | 1361.125 | 1323.75 | 1325.0 |
| FL | 895.0 | 1216.916667 | 1204.970588 | 1232.35 | 1226.0 | 1229.25 | 1282.125 | 1420.978261 | 1432.724138 | 1373.782051 | ... | 1910.64978 | 1916.314978 | 1909.936123 | 1912.909692 | 1916.4163 | 1931.352423 | 1932.535242 | 1926.193833 | 1925.30837 | 1927.284141 |
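Each state value above is an unweighted mean of city-level medians, and the number of cities reporting changes over time, as the missing (NaN) 2010 values for AK, AR, and DE show. A hedged sketch, using a hypothetical three-city frame, pairs the mean with a count of contributing cities and masks state-months backed by too few. Selecting the `YYYY-MM` columns explicitly is an alternative to `numeric_only=True` followed by dropping `SizeRank`; the `min_cities` threshold of 2 is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the city-level rent file: one row per city,
# one column per month, NaN where there is no estimate yet.
cities = pd.DataFrame({
    "RegionName": ["City A", "City B", "City C"],
    "State": ["AL", "AL", "AK"],
    "SizeRank": [1, 2, 3],
    "2010-01": [800.0, np.nan, np.nan],
    "2010-02": [850.0, 900.0, np.nan],
})

# Keep only the "YYYY-MM" month columns.
month_cols = [col for col in cities.columns if col[:4].isdigit()]

grouped = cities.groupby("State")[month_cols]
state_mean = grouped.mean()    # NaNs are skipped: each cell averages only reporting cities
state_count = grouped.count()  # how many cities contributed to each state-month

# Keep only averages backed by at least `min_cities` cities (arbitrary threshold).
min_cities = 2
reliable = state_mean.where(state_count >= min_cities)
print(state_mean, state_count, reliable, sep="\n\n")
```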


```python
# Convert the "YYYY-MM" column labels to datetimes so the columns form a time axis
a_grouped.columns = pd.to_datetime(a_grouped.columns)
```
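With datetime column labels, `a_grouped` can be transposed so that months run down the index, which is the layout pandas' time-series tools expect for idea 2. The sketch below uses a hypothetical two-state frame (`AA` and `BB`, with made-up rents) in the same shape, then computes calendar-year averages and year-over-year percentage change; these particular summaries are illustrative choices, not steps from the original notebook.

```python
import pandas as pd

# Hypothetical state-by-month frame shaped like `a_grouped` after the
# column conversion: one row per state, one datetime column per month.
months = pd.date_range("2018-01-01", periods=24, freq="MS")
rents = pd.DataFrame(
    [[1000 + 10 * i for i in range(24)], [2000.0] * 24],
    index=pd.Index(["AA", "BB"], name="State"),
    columns=months,
)

# Transpose so time runs down the rows and each state becomes a column.
by_month = rents.T

annual_mean = by_month.groupby(by_month.index.year).mean()  # calendar-year averages
yoy_change = by_month.pct_change(periods=12) * 100           # % change vs. same month a year earlier

print(annual_mean)
print(yoy_change.dropna().round(2).head())
```

On the real data the same operations would apply to `a_grouped.T`, and a wide frame like `by_month` could be passed to plotly express's `px.line` for an interactive chart.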
