### Data analysis

#### Current Home Value

In [8]:
import pandas as pd
import numpy as np

# Read in the data
df = pd.read_csv('pearl_city_sales.csv')

# Convert 'Sale Date' to datetime and extract year
df['Sale Date'] = pd.to_datetime(df['Sale Date'])
df['Year'] = df['Sale Date'].dt.year

# Assign weights linearly by year: the earliest year gets weight 1,
# and the most recent year gets the highest weight
df['weight'] = df['Year'] - df['Year'].min() + 1

# Calculate weighted average sale price
weighted_avg_price = np.average(df['Sale Price'], weights=df['weight'])

print(f"Estimated current value of a typical home in Pearl City: ${weighted_avg_price:,.0f}")


Estimated current value of a typical home in Pearl City: $919,278
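
To make the effect of the year-based weights concrete, here is a minimal sketch on a small made-up table. The `toy` DataFrame and its prices are hypothetical and are not drawn from `pearl_city_sales.csv`. It applies the same linear weighting as the cell above and compares the result with a plain mean.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from pearl_city_sales.csv
toy = pd.DataFrame({
    "Year": [2021, 2022, 2023, 2023],
    "Sale Price": [800_000, 900_000, 1_000_000, 1_100_000],
})

# Same linear scheme as above: earliest year -> 1, each later year -> +1
toy["weight"] = toy["Year"] - toy["Year"].min() + 1

# Weighted mean = sum(price * weight) / sum(weight)
weighted = np.average(toy["Sale Price"], weights=toy["weight"])
unweighted = toy["Sale Price"].mean()

print(f"Weighted:   ${weighted:,.0f}")
print(f"Unweighted: ${unweighted:,.0f}")
```

Because the later sales in this toy table are the more expensive ones, the weighted figure lands above the unweighted one. The weights only shift emphasis toward recent sales; they do not adjust older prices for market appreciation.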


#### Best Time to Sell

In [12]:
def assign_season(month):
    if month in [12, 1, 2]:
        return 'Winter'
    elif month in [3, 4, 5]:
        return 'Spring'
    elif month in [6, 7, 8]:
        return 'Summer'
    else:
        return 'Fall'

def best_time_to_sell_by_season(df):
    """
    Determines the best season(s) to sell based on average sale price.
    Prints the average sale price per season and the season name(s)
    with the highest average sale price.
    """
    df = df.copy()  # work on a copy so the caller's DataFrame is not modified
    df['Sale Date'] = pd.to_datetime(df['Sale Date'])
    df['Month'] = df['Sale Date'].dt.month
    df['Season'] = df['Month'].apply(assign_season)
    seasonal_avg = df.groupby('Season')['Sale Price'].mean().round(0).astype(int)
    max_avg = seasonal_avg.max()
    best_seasons = seasonal_avg[seasonal_avg == max_avg].index.tolist()
    print("Average sale price by season:")
    for season, avg in seasonal_avg.sort_values(ascending=False).items():
        print(f"  {season}: ${avg:,.0f}")
    print(f"\nBest time to sell: {', '.join(best_seasons)} (highest average sale price)")

# Example usage:
df = pd.read_csv('pearl_city_sales.csv')
best_time_to_sell_by_season(df)

Average sale price by season:
  Summer: $952,480
  Fall: $941,502
  Spring: $921,252
  Winter: $867,560

Best time to sell: Summer (highest average sale price)
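
A seasonal mean can be pulled around by a handful of unusually high or low sales, so it may help to look at the number of sales and the median next to it. The sketch below does that on hypothetical toy data (the `toy` table and the `season_of_month` lookup are illustrative names, not part of the analysis above). The lookup is assumed to be equivalent to `assign_season`.

```python
import pandas as pd

# Hypothetical toy data, not taken from pearl_city_sales.csv
toy = pd.DataFrame({
    "Sale Date": ["2023-01-15", "2023-04-10", "2023-07-04", "2023-07-20", "2023-10-31"],
    "Sale Price": [850_000, 900_000, 950_000, 1_050_000, 940_000],
})
toy["Sale Date"] = pd.to_datetime(toy["Sale Date"])

# Month -> season lookup, mirroring the assign_season function
season_of_month = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}
toy["Season"] = toy["Sale Date"].dt.month.map(season_of_month)

# Report sample size and median alongside the mean for each season
summary = toy.groupby("Season")["Sale Price"].agg(["count", "mean", "median"])
print(summary.sort_values("mean", ascending=False))
```

If a season's `count` is small, or its mean and median differ widely, the ranking by mean alone deserves less confidence.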


#### Best Potential Home Improvements

In [14]:
df = pd.read_csv('pearl_city_sales.csv')

# Select relevant features for correlation analysis
features = ['Square Footage', 'Bedrooms', 'Bathrooms', 'Garage Spaces']
correlations = df[features + ['Sale Price']].corr()['Sale Price'].drop('Sale Price').sort_values(ascending=False)

print("Correlation of features with Sale Price:")
for feature, corr in correlations.items():
    print(f"  {feature}: {corr:.2f}")

print("\nFeatures with the highest positive correlation to sale price are likely to yield the best ROI if improved.")

Correlation of features with Sale Price:
  Bedrooms: 0.08
  Garage Spaces: 0.03
  Bathrooms: -0.13
  Square Footage: -0.20

Features with the highest positive correlation to sale price are likely to yield the best ROI if improved.


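Based on the data, none of the four features shows a strong linear relationship with sale price. Bedrooms (0.08) and garage spaces (0.03) are only marginally positive, so adding either may provide at most a modest positive impact. Bathrooms (-0.13) and square footage (-0.20) are weakly negative, so improving bathrooms or expanding square footage is unlikely to yield a strong return on investment in this market. These are pairwise correlations only: they do not show that changing a feature would cause the price to change.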
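
Pairwise correlation looks at one feature at a time, so a feature that tends to move with another (for example, bedrooms with square footage) can look better or worse than it is. One possible follow-up, which is not what the cell above does, is an ordinary least squares fit that estimates the price change per unit of each feature while holding the others fixed. The sketch below uses `np.linalg.lstsq` on hypothetical toy data whose prices are generated from known coefficients, purely to show the mechanics.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from pearl_city_sales.csv
toy = pd.DataFrame({
    "Square Footage": [1000, 1200, 1500, 1800, 2000, 2400],
    "Bedrooms": [2, 3, 3, 4, 3, 5],
})
# Prices built from known, made-up coefficients so the fit can be checked
toy["Sale Price"] = 200_000 + 300 * toy["Square Footage"] + 20_000 * toy["Bedrooms"]

features = ["Square Footage", "Bedrooms"]

# Design matrix: a column of ones for the intercept, then one column per feature
X = np.column_stack([np.ones(len(toy)), toy[features].to_numpy(dtype=float)])
y = toy["Sale Price"].to_numpy(dtype=float)

# Least-squares solution of X @ coef ~= y
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept = coef[0]
per_feature = dict(zip(features, coef[1:]))

print("Estimated price change per additional unit:")
for name, value in per_feature.items():
    print(f"  {name}: ${value:,.0f}")
```

On real sales data the coefficients would come with noise and should be read as associations rather than guaranteed returns, since unrecorded factors such as location or condition can still drive both the features and the price.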