# **DTSA 5304 Fundamentals of Visualization Final Project**  
Jake McConnell  
12/8/25

### Introduction  

Recently I listened to a podcast featuring Rory Sutherland, a British advertising executive. In the podcast, he claimed that Texas home prices are low compared to places like California because of higher property taxes in Texas. I wanted to determine whether this claim is credible.  
  
More precisely: if property taxes increase, does the value of a house decrease, or is there a better predictor for determining the value of a house?
  
For this analysis, I will use data that I compiled from multiple online sources, which are listed in the References section. The data is from 2023; it will be loaded with the Pandas library and visualized with the Altair library. To complete this analysis, I first need to clean and combine all of my data into one dataframe. From there I can begin exploring the data through Altair plots. Let's get started!

In [45]:
import pandas as pd
import altair as alt

In [46]:
data = pd.read_csv(r'C:\Users\jrmba\Documents\College\Master of Data Science CU Bolder\Fundamentals of Data Visualization\State Data.csv')
data.head()

Unnamed: 0,State,Average Income 2023 ($),Region,Division,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi),Population 2023,GDP 2023 ($Mil)
0,Alabama,60660,South,East South Central,52420,50645,1775,5117673,304936
1,Alaska,98190,West,Pacific,665384,570641,94743,736510,68056
2,Arizona,82660,West,Mountain,113990,113594,396,7473027,522767
3,Arkansas,63250,South,West South Central,53179,52035,1143,3069463,178606
4,California,89870,West,Pacific,163695,155779,7916,39198693,3870379


In [47]:
data1 = pd.read_csv(r'C:\Users\jrmba\Documents\College\Master of Data Science CU Bolder\Fundamentals of Data Visualization\Property Taxes by State and County, 2025  Tax Foundation Maps.csv')
data1.head()

Unnamed: 0,State,County,"Median Housing Value, 2023 ($)","Median Property Taxes Paid, 2023 ($)(5-Year Estimate)",Effective Property Tax Rate (2023)
0,Alabama,Autauga County,197900.0,564,0.2850%
1,Alabama,Baldwin County,287000.0,881,0.3070%
2,Alabama,Barbour County,109900.0,415,0.3776%
3,Alabama,Bibb County,132600.0,271,0.2044%
4,Alabama,Blount County,169700.0,508,0.2994%


### Data Cleaning

We can see that *data1* has one row per county, while *data* has one row per state. To make the two sets compatible, I will average the following columns by state: Median Housing Value, 2023; Median Property Taxes Paid, 2023; and Effective Property Tax Rate (2023). This is an unweighted mean of the county values, so every county counts equally regardless of its population.

In [48]:
# Column names that are reused below
value_col = 'Median Housing Value, 2023 ($)'
tax_col = 'Median Property Taxes Paid, 2023 ($)(5-Year Estimate)'
rate_col = 'Effective Property Tax Rate (2023)'

# Drop the county label and any rows with missing values
data1 = data1.drop(columns=['County']).dropna()
# Strip "%" and thousands separators so the columns can be converted to numbers
data1[rate_col] = data1[rate_col].str.replace("%", "", regex=False)
data1[tax_col] = data1[tax_col].str.replace(",", "", regex=False)
data1[[value_col, tax_col, rate_col]] = data1[[value_col, tax_col, rate_col]].astype('float64')
# Average the county rows within each state (unweighted mean)
data1 = data1.groupby('State').mean()
data1[[value_col, tax_col]] = data1[[value_col, tax_col]].round(2)
#data1.head()

Now that our *data1* data frame is averaged, we can combine both *data* and *data1*.

In [None]:
State_df = pd.merge(data, data1, on='State')
State_df.head()

Unnamed: 0,State,Average Income 2023 ($),Region,Division,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi),Population 2023,GDP 2023 ($Mil),"Median Housing Value, 2023 ($)","Median Property Taxes Paid, 2023 ($)(5-Year Estimate)",Effective Property Tax Rate (2023)
0,Alabama,60660,South,East South Central,52420,50645,1775,5117673,304936,149701.49,511.49,0.339172
1,Alaska,98190,West,Pacific,665384,570641,94743,736510,68056,283580.77,2187.42,0.771877
2,Arizona,82660,West,Mountain,113990,113594,396,7473027,522767,243360.0,1349.13,0.58066
3,Arkansas,63250,South,West South Central,53179,52035,1143,3069463,178606,131966.67,705.24,0.531793
4,California,89870,West,Pacific,163695,155779,7916,39198693,3870379,569022.41,4045.48,0.713778
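
Because the merge above uses the default inner join, any state whose name is spelled differently in the two tables would be dropped silently. A minimal sketch of one way to check for that is shown here. The helper name `unmatched_states` and the two miniature tables are hypothetical stand-ins for *data* and *data1*, and the "District of Columbia" row is an invented example of a name that appears in only one table.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two tables being merged.
left = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona"],
    "Population 2023": [5117673, 736510, 7473027],
})
right = pd.DataFrame({
    "State": ["Alabama", "Alaska", "District of Columbia"],
    # The last value is deliberately missing; the row only illustrates a mismatch.
    "Median Housing Value, 2023 ($)": [149701.49, 283580.77, float("nan")],
})


def unmatched_states(left, right, key="State"):
    """Return the rows whose key appears in only one of the two tables."""
    # An outer join keeps every row; the indicator column records its origin.
    merged = pd.merge(left, right, on=key, how="outer", indicator=True)
    return merged.loc[merged["_merge"] != "both", [key, "_merge"]]


print(unmatched_states(left, right))
```

An empty result would mean every state matched, so the inner join lost nothing.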


### Data Exploration

Perfect! Let's start our exploration and determine if there is a correlation between housing value and property tax rate by graphing the data.

In [73]:
Tax_plot = alt.Chart(State_df).mark_circle().encode(
    x = "Effective Property Tax Rate (2023)",
    y = "Median Housing Value, 2023 ($)",
    color = "Region",
    tooltip=['State','Effective Property Tax Rate (2023)','Median Housing Value, 2023 ($)']
).properties(
    width=500,
    title="Effective Property Tax Rate vs Median Housing Value"
)
Tax_plot

There is no clear trend indicating that as property tax rates increase the value of a house decreases. Let's compare other factors now.
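
To back up the visual impression with a number, one option is a Pearson correlation between each numeric column and the median housing value. The sketch below is only an illustration: the helper `correlation_with_target` is a hypothetical name, and the sample holds just the first five rows of the merged table shown earlier, which is far too few for a real conclusion. On the full data the same call could be made with `State_df`.

```python
import pandas as pd

# First five rows of the merged table shown earlier (illustration only).
sample = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Arizona", "Arkansas", "California"],
    "Effective Property Tax Rate (2023)": [0.339172, 0.771877, 0.58066, 0.531793, 0.713778],
    "Average Income 2023 ($)": [60660, 98190, 82660, 63250, 89870],
    "Median Housing Value, 2023 ($)": [149701.49, 283580.77, 243360.0, 131966.67, 569022.41],
})


def correlation_with_target(df, target):
    """Pearson correlation of every other numeric column with the target column."""
    numeric = df.select_dtypes("number")
    return numeric.corr()[target].drop(target).sort_values(ascending=False)


print(correlation_with_target(sample, "Median Housing Value, 2023 ($)"))
```

Values near 0 indicate a weak linear relationship, while values near 1 or -1 indicate a strong one.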

In [75]:
Pop_plot = alt.Chart(State_df).mark_circle().encode(
    x = "Population 2023",
    y = "Median Housing Value, 2023 ($)",
    color = "Region",
    tooltip=['State','Population 2023','Median Housing Value, 2023 ($)']
).properties(
    width=500,
    title="Population vs Median Housing Value"
)

GDP_plot = alt.Chart(State_df).mark_circle().encode(
    x = "GDP 2023 ($Mil)",
    y = "Median Housing Value, 2023 ($)",
    color = "Region",
    tooltip=['State','GDP 2023 ($Mil)','Median Housing Value, 2023 ($)']
).properties(
    width=500,
    title="GDP vs Median Housing Value"
)

Inc_plot = alt.Chart(State_df).mark_circle().encode(
    x = "Average Income 2023 ($)",
    y = "Median Housing Value, 2023 ($)",
    color = "Region",
    tooltip=['State','Median Housing Value, 2023 ($)','Average Income 2023 ($)']
).properties(
    width=500,
    title="Average Income vs Median Housing Value"
)

# "&" stacks charts vertically and "|" places them side by side
(Tax_plot & Pop_plot) | (GDP_plot & Inc_plot)

One graph stands out from the others. The first three are ambiguous, with no clear trends. However, the Average Income vs Median Housing Value graph shows a clear upward trend: as the average household income increases, the median housing value increases.

### Discussion  

This project is intended for real estate professionals and investors. Unfortunately, I don't have an expert to consult, so I chose family instead. My mother, father, and brother evaluated my design. Although they have limited knowledge of visualization design, they were able to follow my design process and deduce from the graphs above that the average household income had the most significant impact on the cost of a house.  

In future iterations, I would like to collect the average square footage of houses in each state. It goes without saying that the price of a house increases with its size. However, perhaps property taxes have an effect on the price per square foot. Maybe as property tax rates increase, the price per square foot drops, making larger houses more affordable in states with higher property taxes? Another change could be to make my plots one color instead of coloring them by region. I left the color as is to add more information, but it may be unnecessary.

### Conclusion  

From the limited data collected, Rory Sutherland's assertion that higher property taxes reduce the cost of houses appears to be incorrect. However, as mentioned earlier, there may be other data I could collect that would reveal relationships through which property taxes indirectly affect the affordability of houses. Further analysis should be conducted to determine whether property taxes affect the cost of a house through such an indirect route.  

### References  
1. https://taxfoundation.org/data/all/state/property-taxes-by-state-county/  
2. https://fred.stlouisfed.org/release/tables?eid=259515&rid=249  
3. https://en.wikipedia.org/wiki/List_of_regions_of_the_United_States  
4. https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_area  
5. https://www.census.gov/data/tables/time-series/demo/popest/2020s-state-total.html  
6. https://www.bea.gov/data/gdp/gdp-state