# Austin Housing Market Analysis

## Visual 1: Property Value Distribution Over Time (Austin vs. U.S.)

### Purpose
This visualization shows how Austin's housing market shifted toward higher-priced homes between 2015 and 2023, compared with national trends. By highlighting Austin's disproportionate share of expensive homes, it directly supports our argument about affordability challenges.

### Design & Interactivity
- **Stacked bar chart** showing the proportion of homes within each price range by region (Austin vs. U.S.) for each year
- **Year slider** allows viewers to move through time (2015-2023)
- **Hover tooltips** show the region, year, price range, share, and number of homes for each bar segment
- Makes it easy to compare how each price tier changes over the years

### Colors, Legends, and Annotations
- **Ordered multi-hue color palette** distinguishes the price bins:
  - Lower price categories: lighter blue tones
  - Middle tiers: greens and yellows
  - Higher tiers: deep orange and red hues
- **Legend**: "Home Price Ranges" appears to the right of the chart
- Clear visual gradient from affordable to expensive homes

### Static Preview
Below is a static snapshot showing the trend across three key years (2015, 2019, 2023):

![Property Value Distribution](images/property_value_visualization.png)

### Key Insights
Use the year slider in the interactive chart below to observe:
- How Austin's share of lower-priced homes (< $200,000) has declined significantly (from 38% in 2015 to 5.7% in 2023)
- The growing proportion of homes in higher price ranges ($400,000+)
- How Austin's distribution compares with the national (U.S.) distribution over time

**Dramatic Change**: Between 2015 and 2023, the share of Austin homes valued under $200,000 fell by 32.3 percentage points, while the share valued at $500,000 or more rose by 38.0 percentage points.
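The figures in this section mix two kinds of change that are easy to conflate. This minimal sketch uses only the 2015 and 2023 shares quoted above (38% and 5.7% of homes under $200,000). It separates the absolute change, in percentage points, from the relative change. The helper names are hypothetical and not part of the notebook.

```python
def percentage_point_change(start_pct: float, end_pct: float) -> float:
    """Absolute change between two shares expressed in percent (end minus start)."""
    return end_pct - start_pct


def relative_change(start_pct: float, end_pct: float) -> float:
    """Change as a fraction of the starting share (e.g. -0.5 means the share halved)."""
    if start_pct == 0:
        raise ValueError("relative change is undefined when the starting share is 0")
    return (end_pct - start_pct) / start_pct


# Austin's share of homes under $200,000, as quoted in the text
share_2015 = 38.0
share_2023 = 5.7

pp = percentage_point_change(share_2015, share_2023)
rel = relative_change(share_2015, share_2023)
print(f"{pp:.1f} percentage points")  # -32.3 percentage points
print(f"{rel:.0%} relative change")   # -85% relative change
```

So a 32.3-point drop from a 38% base means the 2023 share is about 15% of its 2015 level.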


```python
from itertools import product

import altair as alt
import pandas as pd

# Load the data
df = pd.read_csv("data files/austin_property_value.csv")

# Map each detailed value bucket to one of seven consolidated price ranges
def consolidate_price_bins(value_bucket_id):
    """Consolidate 26 price buckets into 7 major price ranges."""
    if value_bucket_id <= 12:  # Less than $100,000
        return "< $100,000"
    elif value_bucket_id <= 16:  # $100,000 to $199,999
        return "$100,000 - $199,999"
    elif value_bucket_id <= 18:  # $200,000 to $299,999
        return "$200,000 - $299,999"
    elif value_bucket_id <= 19:  # $300,000 to $399,999
        return "$300,000 - $399,999"
    elif value_bucket_id <= 20:  # $400,000 to $499,999
        return "$400,000 - $499,999"
    elif value_bucket_id <= 21:  # $500,000 to $749,999
        return "$500,000 - $749,999"
    else:  # $750,000 or more
        return "$750,000+"

# Apply consolidation
df['Price Range'] = df['Value Bucket ID'].apply(consolidate_price_bins)

# Display order for price ranges, from cheapest to most expensive
price_order = [
    "< $100,000",
    "$100,000 - $199,999",
    "$200,000 - $299,999",
    "$300,000 - $399,999",
    "$400,000 - $499,999",
    "$500,000 - $749,999",
    "$750,000+"
]

# Numeric position of each range so bars stack in price order
order_map = {price: i for i, price in enumerate(price_order)}
df['Price Range Order'] = df['Price Range'].map(order_map)

# Sum counts and shares within each consolidated price range
df_grouped = df.groupby(['Place', 'Year', 'Price Range', 'Price Range Order'], as_index=False).agg({
    'Property Value by Bucket': 'sum',
    'share': 'sum'
})

# Rename columns for clarity
df_grouped = df_grouped.rename(columns={
    'share': 'Share',
    'Property Value by Bucket': 'Count'
})

# Ensure every Place/Year/Price Range combination exists (fill missing with 0)
places = df_grouped['Place'].unique()
years = df_grouped['Year'].unique()

complete_index = pd.DataFrame(list(product(places, years, price_order)),
                              columns=['Place', 'Year', 'Price Range'])
complete_index['Price Range Order'] = complete_index['Price Range'].map(order_map)
df_grouped = complete_index.merge(df_grouped, on=['Place', 'Year', 'Price Range', 'Price Range Order'], how='left')
df_grouped['Share'] = df_grouped['Share'].fillna(0)
df_grouped['Count'] = df_grouped['Count'].fillna(0)

# Color scale ordered from low to high price:
# lower price categories in light blues, middle tiers in greens and yellows,
# higher tiers in deep orange and red hues
color_scale = alt.Scale(
    domain=price_order,
    range=['#deebf7', '#9ecae1', '#4daf4a', '#ffff33', '#ff7f00', '#e31a1c', '#800026']
)

# Base stacked bar chart: one bar per region, normalized to 100%
base = alt.Chart(df_grouped).mark_bar().encode(
    x=alt.X('Place:N', title='Region', axis=alt.Axis(labelAngle=0)),
    y=alt.Y('Share:Q', title='Share of Homes', axis=alt.Axis(format='%'), stack='normalize'),
    color=alt.Color('Price Range:N',
                    scale=color_scale,
                    legend=alt.Legend(title='Home Price Ranges', orient='right'),
                    sort=price_order),
    order=alt.Order('Price Range Order:Q'),
    tooltip=[
        alt.Tooltip('Place:N', title='Region'),
        alt.Tooltip('Year:Q', title='Year', format='d'),  # 'd' avoids "2,015"
        alt.Tooltip('Price Range:N', title='Price Range'),
        alt.Tooltip('Share:Q', title='Share', format='.2%'),
        alt.Tooltip('Count:Q', title='Number of Homes', format=',')
    ]
).properties(
    width=600,
    height=400,
    title={
        "text": "Visual 1: Property Value Distribution Over Time (Austin vs. U.S.)",
        "subtitle": "Proportion of homes in each price range by region; use the slider to explore different years",
        "fontSize": 16,
        "anchor": "start"
    }
)

# Year slider: a point selection on Year bound to a range input,
# starting at the earliest year in the data
year_min = int(df_grouped['Year'].min())
year_max = int(df_grouped['Year'].max())
slider = alt.binding_range(min=year_min, max=year_max, step=1, name='Year: ')
select_year = alt.selection_point(name='year_select', fields=['Year'],
                                  bind=slider, value=[{'Year': year_min}])

# Show only the rows for the year currently selected on the slider
chart = base.add_params(select_year).transform_filter(select_year)

# Display the chart
chart
```
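The if/elif chain in `consolidate_price_bins` tests inclusive upper bounds in ascending order. That makes it equivalent to a lookup in a sorted list of thresholds using the standard library's `bisect` module. The sketch below is an alternative, not what the notebook runs. `UPPER_BUCKET_IDS`, `PRICE_LABELS`, and `consolidate_price_bins_lookup` are hypothetical names, and the thresholds are copied from the function above.

```python
from bisect import bisect_left

# Inclusive upper bucket ID for each consolidated range except the last,
# copied from the thresholds in consolidate_price_bins
UPPER_BUCKET_IDS = [12, 16, 18, 19, 20, 21]
PRICE_LABELS = [
    "< $100,000",
    "$100,000 - $199,999",
    "$200,000 - $299,999",
    "$300,000 - $399,999",
    "$400,000 - $499,999",
    "$500,000 - $749,999",
    "$750,000+",  # any ID above 21
]


def consolidate_price_bins_lookup(value_bucket_id: int) -> str:
    """Return the consolidated price range for a value bucket ID."""
    # bisect_left gives the index of the first bound >= the ID, which is the
    # first `<=` branch that would match in the if/elif chain; IDs above every
    # bound land at index 6, the open-ended top range.
    return PRICE_LABELS[bisect_left(UPPER_BUCKET_IDS, value_bucket_id)]


print([consolidate_price_bins_lookup(i) for i in (12, 13, 21, 22)])
# ['< $100,000', '$100,000 - $199,999', '$500,000 - $749,999', '$750,000+']
```

It plugs in the same way, for example `df['Value Bucket ID'].apply(consolidate_price_bins_lookup)`. Keeping the boundaries in one list means a cutoff can change without editing several branches.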