## Trend analysis and an understanding of the % variation in a property's price are crucial for both buyers and sellers.
### In this notebook, I will try to understand the year-over-year (YoY) % fluctuation in home prices.

### Importing Libraries 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import warnings
warnings.filterwarnings('ignore')

### Loading CSV data

In [None]:
all_home_prices = pd.read_csv('/kaggle/input/zillow-home-value-index/ZHVI.csv')

In [None]:
all_home_prices.head()

In [None]:
all_home_prices.shape

In [None]:
all_home_prices.info()

Based on the `info()` output, the data is already clean, with no null values.
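To confirm the "no nulls" observation programmatically instead of reading the `info()` output, a small check like the one below can help. This is a sketch: `null_counts` is a hypothetical helper, and the toy DataFrame (with made-up `Alabama`/`Alaska` values) stands in for `all_home_prices`.

```python
import numpy as np
import pandas as pd


def null_counts(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, keeping only columns that have any."""
    counts = df.isna().sum()
    return counts[counts > 0]


# Hypothetical toy data standing in for all_home_prices
toy = pd.DataFrame({"Alabama": [150000.0, 151000.0], "Alaska": [300000.0, np.nan]})
print(null_counts(toy))  # only 'Alaska' is listed, with 1 missing value

# An empty result means the frame has no nulls at all
print(null_counts(toy.dropna()).empty)  # True
```

In the notebook, `null_counts(all_home_prices).empty` returning `True` would back up the observation above.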

### Replacing the column name from 'Unnamed: 0' to 'Duration'

In [None]:
all_home_prices.rename(columns={'Unnamed: 0': 'Duration'}, inplace=True)

In [None]:
all_home_prices.info()

In [None]:
# Convert 'Duration' to datetime format and set it as the index for date-based retrieval
all_home_prices['Duration'] = pd.to_datetime(all_home_prices['Duration'], format='%Y-%m-%d')
all_home_prices.set_index('Duration', inplace=True)
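
With `Duration` as a `DatetimeIndex`, pandas can also select rows by date strings (partial-string indexing). The sketch below uses a hypothetical two-year frame (the `Hawaii` column and its values are made up) to show selecting a whole year and a month range; the same `.loc` calls should work on `all_home_prices`.

```python
import pandas as pd

# Hypothetical monthly prices for two years, indexed by month-start dates
dates = pd.date_range("2019-01-01", periods=24, freq="MS")
prices = pd.DataFrame({"Hawaii": range(24)}, index=dates)

# Partial-string indexing: select every row in a given year or month range
rows_2020 = prices.loc["2020"]
spring_2020 = prices.loc["2020-03":"2020-05"]  # both endpoints are included

print(len(rows_2020))    # 12
print(len(spring_2020))  # 3
```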

In [None]:
all_home_prices.head()

## Explanation of the formula used to calculate YoY change

#### Given Data:

* feb_2020_price: Price in February 2020 (547607.8042)
* jan_2020_price: Price in January 2020 (545361.7393)

### Formula:

The percentage change between two periods is:
% Change = ((Current Period Price - Previous Period Price) / Previous Period Price) * 100

For a true year-over-year (YoY) change, the previous period is the same month one year earlier (e.g. February 2019 for February 2020). The example below compares two consecutive months, so it is a month-over-month (MoM) change; the arithmetic is the same.

### Calculation:

Calculate the price difference:
Price Difference = February 2020 Price - January 2020 Price
Price Difference = 547607.8042 - 545361.7393
Price Difference = 2246.0649

Calculate the MoM percentage change:
MoM Change = (Price Difference / January 2020 Price) * 100
MoM Change = (2246.0649 / 545361.7393) * 100
MoM Change = 0.4118%
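The same arithmetic can be checked with pandas. On monthly data, `pct_change()` compares each row with the previous row (a month-over-month change), while `pct_change(periods=12)` compares each month with the same month one year earlier (a YoY change). The two 2020 prices come from the example above; the 13-month series is hypothetical.

```python
import pandas as pd

# Prices from the worked example (January and February 2020)
jan_2020_price = 545361.7393
feb_2020_price = 547607.8042

# Direct application of the formula
manual_change = (feb_2020_price - jan_2020_price) / jan_2020_price * 100

# pandas equivalent: pct_change() compares each row with the row before it
pair = pd.Series([jan_2020_price, feb_2020_price])
pandas_change = pair.pct_change().iloc[-1] * 100
print(f"{manual_change:.4f}% vs {pandas_change:.4f}%")  # 0.4118% vs 0.4118%

# Hypothetical monthly series rising by 1,000 per month for 13 months
monthly = pd.Series(
    [100_000 + 1_000 * i for i in range(13)],
    index=pd.date_range("2019-01-01", periods=13, freq="MS"),
)
# periods=12 compares each month with the same month one year earlier (true YoY)
yoy = monthly.pct_change(periods=12) * 100
print(f"{yoy.iloc[-1]:.2f}%")  # 12.00% (112,000 vs 100,000)
```

The first 12 values of `yoy` are `NaN`, because those months have no prior-year value to compare against.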

In [None]:
# Select the 4 states with the highest average home value over the full period
top_states = all_home_prices.mean().sort_values(ascending=False).head(4).index.tolist()

# Check the top states
print(top_states)

# Filter data for the last 5 years (60 monthly observations)
start_date = all_home_prices.index[-60]

print(start_date)

last_5_years_data = all_home_prices.loc[start_date:]

# Calculate year-over-year (YoY) percentage change: each month vs. the same month 12 rows earlier.
# It is computed on the full history and then sliced, so the first months of the 5-year
# window still have a prior-year value to compare against.
df_pct_change_last_5 = (all_home_prices[top_states].pct_change(periods=12) * 100).loc[start_date:]
print(df_pct_change_last_5)

# Plot YoY price changes for the top 4 states over the last 5 years
plt.figure(figsize=(12, 6))
for state in top_states:
    plt.plot(df_pct_change_last_5[state], label=state)

# Optional: add data labels to the plot (commented out by default)
# for state in top_states:
#     for x, y in zip(df_pct_change_last_5.index, df_pct_change_last_5[state]):
#         plt.text(x, y, f'{y:.1f}%', ha='center', va='bottom', fontsize=8)

plt.title('Year-over-Year Price Change for Top 4 States (Last 5 Years)')
plt.xlabel('Date')
plt.ylabel('YoY Price Change (%)')
plt.legend()
plt.grid(True)
plt.show()


## **What is CAGR (Compound Annual Growth Rate)?**

* **Definition:** CAGR is a financial metric that represents the average annualized rate of return of an investment over a specific period. 
* **Key Features:**
    * **Smoothed Rate:** CAGR provides a smoothed rate of return, even if the actual growth of the investment wasn't consistent year-over-year.
    * **Compounding:** CAGR assumes that any profits earned during each period are reinvested, allowing for compounding growth.
    * **Comparison Tool:** CAGR is used to compare the performance of different investments over the same time period.

**CAGR** is a valuable tool for real estate analysis because it provides a standardized measure of long-term growth. By understanding the historical CAGR of different markets and properties, investors can make more informed decisions about where and how to allocate their capital.
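A small worked example makes the "smoothed rate" idea concrete. The values are hypothetical: a property worth 100 rises to 150 after one year and falls to 121 after two. The yearly returns are +50% and about -19.33% (a simple average of about 15.33%), but the CAGR is 10%. Compounding 10% for two years reproduces the ending value of 121, while compounding the simple average does not. The `calculate_cagr` formula here is the same one the notebook uses in the next cell.

```python
def calculate_cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.10 == 10%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical values at year 0, 1 and 2
values = [100.0, 150.0, 121.0]

yearly_returns = [values[i + 1] / values[i] - 1 for i in range(len(values) - 1)]
average_return = sum(yearly_returns) / len(yearly_returns)
cagr = calculate_cagr(values[0], values[-1], years=len(values) - 1)

print([f"{r:.2%}" for r in yearly_returns])      # ['50.00%', '-19.33%']
print(f"arithmetic mean: {average_return:.2%}")  # 15.33%
print(f"CAGR: {cagr:.2%}")                       # 10.00%

# Compounding at the CAGR recovers the ending value; compounding at the mean overshoots it
print(values[0] * (1 + cagr) ** 2)
print(values[0] * (1 + average_return) ** 2)
```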


In [None]:
# Calculate compound annual growth rate (CAGR) as a fraction (0.05 == 5%)
def calculate_cagr(start_value, end_value, years):
    return (end_value / start_value) ** (1 / years) - 1

# Number of calendar years between the first and last observation
start_year = all_home_prices.index[0].year
end_year = all_home_prices.index[-1].year
years = end_year - start_year

# CAGR per state from the first to the last value in the dataset
cagr = calculate_cagr(all_home_prices[top_states].iloc[0], all_home_prices[top_states].iloc[-1], years)


# Visualize CAGR with a bar plot
plt.figure(figsize=(8, 6))
sns.barplot(x=cagr.index, y=cagr.values)

# Add data labels to the bars, formatted as percentages (0.0633 -> 6.33%)
for i, v in enumerate(cagr):
    plt.text(i, v, f'{v:.2%}', ha='center', va='bottom', fontsize=10)

plt.title('Compound Annual Growth Rate (CAGR) for Top 4 States')
plt.xlabel('State')
plt.ylabel('CAGR (fraction, 0.05 = 5%)')
plt.show()
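
The cell above measures the period as `end_year - start_year`, which counts calendar years only, so it can be off by up to almost a year in either direction (January 2000 to December 2023 counts as 23 years although about 23.92 have elapsed). A sketch of a fractional alternative, assuming day-level precision is wanted: divide the days between the first and last dates by 365.25. `years_between` is a hypothetical helper, and the dates and prices below are made up; using it in the notebook would shift the CAGR values slightly.

```python
import pandas as pd


def years_between(start: pd.Timestamp, end: pd.Timestamp) -> float:
    """Elapsed time in years, using 365.25 days per year to average over leap years."""
    return (end - start).days / 365.25


def calculate_cagr(start_value, end_value, years):
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical first and last observation dates
start = pd.Timestamp("2000-01-31")
end = pd.Timestamp("2023-12-31")

print(end.year - start.year)                # 23 (calendar-year difference)
print(round(years_between(start, end), 2))  # 23.92 (elapsed time)

# Hypothetical prices: the same doubling gives a slightly lower CAGR over the longer, exact span
print(f"{calculate_cagr(100.0, 200.0, end.year - start.year):.4%}")
print(f"{calculate_cagr(100.0, 200.0, years_between(start, end)):.4%}")
```

In the notebook this would read `years = years_between(all_home_prices.index[0], all_home_prices.index[-1])`.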

## **Interpretation of the CAGR Values:**

The CAGR values represent the estimated average annual growth rate of housing prices in each of the listed states over the full period covered by the dataset (from its first year to its last year, as computed above).

* **Hawaii: 0.063296**
    - This translates to a CAGR of approximately 6.33%. 
    - It suggests that, on average, housing prices in Hawaii have increased by about 6.33% per year.

* **California: 0.061061**
    - This translates to a CAGR of approximately 6.11%.
    - It suggests that, on average, housing prices in California have increased by about 6.11% per year.

* **The District of Columbia: 0.056205**
    - This translates to a CAGR of approximately 5.62%.
    - It suggests that, on average, housing prices in the District of Columbia have increased by about 5.62% per year.

* **Massachusetts: 0.050402**
    - This translates to a CAGR of approximately 5.04%.
    - It suggests that, on average, housing prices in Massachusetts have increased by about 5.04% per year.

**Key Takeaways:**

* **Hawaii** shows the highest CAGR, indicating the strongest historical growth in housing prices among these states.
* **California** has the second-highest CAGR, suggesting a significant historical appreciation in housing values.
* **The District of Columbia** and **Massachusetts** also demonstrate strong historical growth, with CAGRs above 5%.
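To put these annual rates in perspective, a CAGR can be turned into a cumulative growth multiple with `(1 + CAGR) ** n`. The sketch below uses the CAGR values reported above and a hypothetical 10-year horizon; it only illustrates what those rates would imply if they persisted, not a forecast.

```python
# CAGR values reported above (as fractions)
reported_cagr = {
    "Hawaii": 0.063296,
    "California": 0.061061,
    "District of Columbia": 0.056205,
    "Massachusetts": 0.050402,
}

HORIZON_YEARS = 10  # hypothetical horizon chosen for illustration

multiples = {}
for state, rate in reported_cagr.items():
    # Cumulative growth factor after compounding the annual rate HORIZON_YEARS times
    multiples[state] = (1 + rate) ** HORIZON_YEARS
    print(f"{state}: {rate:.2%} per year -> x{multiples[state]:.2f} over {HORIZON_YEARS} years")
```

Hawaii's 6.33% per year, for example, compounds to roughly 1.85 times the starting value over 10 years.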


In [None]:
!pip install -q ipywidgets
print("Installation Complete")

## Calculate the YoY percentage change and render the trend for a selected state and start year (through the latest month in the data)

In [None]:
import ipywidgets as widgets
from IPython.display import display

# Calculates the year-over-year percentage change for each state in the DataFrame.
def calculate_year_over_year_change(df):
    # Create a copy of the DataFrame to avoid modifying the original
    df_copy = df.copy()

    # Shift the DataFrame by 12 rows (one year of monthly data)
    df_shifted = df_copy.shift(periods=12)

    # Calculate year-over-year change
    year_over_year_change = ((df_copy - df_shifted) / df_shifted) * 100

    # Drop the first 12 rows, which have no prior-year value to compare against
    year_over_year_change = year_over_year_change.iloc[12:]

    # Replace any remaining NaN values (e.g. from missing prices) with 0
    year_over_year_change = year_over_year_change.fillna(0)

    return year_over_year_change

# Select all states ('Duration' is now the index, so every column is a state)
all_states = all_home_prices.columns

# Filter data for the last 15 years (assuming monthly data)
start_date = all_home_prices.index[-180]  # Select last 180 months (15 years)
last_15_years_data = all_home_prices.loc[start_date:]


# Calculate YoY percentage change for the last 15 years
df_pct_change_last_15 = calculate_year_over_year_change(last_15_years_data) 


# Create a dropdown for state selection
state_dropdown = widgets.Dropdown(
    options=sorted(all_states),
    description='Select State:',
    disabled=False,
)

# Create a dropdown for start year selection (years with YoY data in the last 15 years)
start_year_dropdown = widgets.Dropdown(
    options=sorted(df_pct_change_last_15.index.year.unique()),  # Unique years that have YoY values
    description='Select Start Year:',
    disabled=False,
)

# Create an interactive output
output = widgets.Output()

def on_button_clicked(b):
    selected_state = state_dropdown.value
    start_year = start_year_dropdown.value

    # Filter data based on selected state and start year
    filtered_data = df_pct_change_last_15[selected_state][df_pct_change_last_15.index.year >= start_year]

    # Clear previous output so repeated clicks don't stack plots
    output.clear_output()

    with output:
        plt.figure(figsize=(12, 6))
        plt.plot(filtered_data, label=selected_state)

        # Add data labels to the line plot
        for x, y in zip(filtered_data.index, filtered_data):
            plt.text(x, y, f'{y:.2f}%', ha='center', va='bottom', fontsize=8)

        plt.title(f'Year-over-Year Price Change for {selected_state} (Since {start_year})')
        plt.xlabel('Date')
        plt.ylabel('YoY Price Change (%)')
        plt.legend()
        plt.grid(True)
        plt.show()

# Create a button
button = widgets.Button(description="Analyze")
button.on_click(on_button_clicked)

# Display the UI elements with spacing
display(widgets.Label('Select State:'), state_dropdown)
display(widgets.Label('Select Start Year:'), start_year_dropdown)
display(button, output)
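
ipywidgets controls only respond in a live kernel, so in a static notebook view the Analyze button does nothing. A widget-free sketch of the same selection logic follows, assuming YoY is computed with `pct_change(periods=12)`, which matches the shift-based function above for data without gaps. `yoy_since` is a hypothetical helper, and the two-column frame is made up.

```python
import pandas as pd


def yoy_since(prices: pd.DataFrame, state: str, start_year: int) -> pd.Series:
    """YoY % change for one state, keeping only months from start_year onwards."""
    yoy = prices[state].pct_change(periods=12) * 100
    yoy = yoy.dropna()  # the first 12 months have no prior-year value
    return yoy[yoy.index.year >= start_year]


# Hypothetical monthly prices for three years (2019-2021)
idx = pd.date_range("2019-01-01", periods=36, freq="MS")
prices = pd.DataFrame(
    {
        "Hawaii": [500_000 * 1.005 ** i for i in range(36)],  # steady 0.5% per month
        "Ohio": [150_000 + 500 * i for i in range(36)],
    },
    index=idx,
)

result = yoy_since(prices, "Hawaii", 2021)
print(result.round(2).head(3))
# In the notebook, e.g.: yoy_since(all_home_prices, "California", 2015).plot()
```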

