## __Analyze the housing dataset, focusing on date and categorical data, to gain insights into house sales over time and into how a house's characteristics influence its price__
Given: housing_data.csv

## __Steps to Perform:__
- Convert the __YearBuilt__ and __YearRemodAdd__ columns to datetime format (if they are not already converted)
- Extract useful date components, such as the year, month, or day
- Calculate the time difference between the year each house was built and the year it was remodeled
- Perform any further arithmetic the analysis needs on the derived date columns
- Count the occurrences of each category in the categorical features
- Create dummy variables for the categorical features

In [1]:
import pandas as pd

# Load the dataset
housing_data = pd.read_csv('housing_data.csv')
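The preview output further down contains an `Unnamed: 0` column, which usually means the CSV was written together with its row index. A minimal sketch of two ways to avoid carrying that column along, using a small in-memory CSV in place of `housing_data.csv`. The two rows are copied from the preview; the leading unnamed column is an assumption about how the file was saved.

```python
import io

import pandas as pd

# Two rows copied from the preview, laid out the way a DataFrame saved
# together with its index looks on disk (empty first header cell).
csv_text = ",LotFrontage,LotArea\n0,65.0,8450\n1,80.0,9600\n"

# Option 1: use the first column as the index while reading.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: read normally, then drop the leftover column pandas named "Unnamed: 0".
df_dropped = pd.read_csv(io.StringIO(csv_text)).drop(columns=["Unnamed: 0"])

print(df_indexed.columns.tolist())
print(df_dropped.columns.tolist())
```

Option 1 keeps the saved row labels as the index; option 2 discards them and falls back to a default `RangeIndex`.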


In [2]:

# Convert YearBuilt and YearRemodAdd to datetime format
housing_data['YearBuilt'] = pd.to_datetime(housing_data['YearBuilt'], format='%Y')
housing_data['YearRemodAdd'] = pd.to_datetime(housing_data['YearRemodAdd'], format='%Y')
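The first step asks for the conversion only if the columns are not already datetime. A minimal sketch of that guard, assuming integer year columns like the ones in the preview output; `ensure_year_datetime` is a hypothetical helper name, and the sample years are copied from the preview. Casting to `str` before parsing with `format="%Y"` is a defensive choice, not a requirement stated in the notebook.

```python
import pandas as pd

# Build and remodel years copied from the preview output.
df = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
    "YearRemodAdd": [2003, 1976, 2002, 1970, 2000],
})


def ensure_year_datetime(frame, cols):
    """Hypothetical helper: parse year columns to datetime unless already converted."""
    for col in cols:
        if not pd.api.types.is_datetime64_any_dtype(frame[col]):
            # Cast to strings so "%Y" parses each value as a four-digit year.
            frame[col] = pd.to_datetime(frame[col].astype(str), format="%Y")
    return frame


df = ensure_year_datetime(df, ["YearBuilt", "YearRemodAdd"])
# A second call leaves the columns untouched because they are now datetime64.
df = ensure_year_datetime(df, ["YearBuilt", "YearRemodAdd"])
print(df.dtypes)
```

Re-running the notebook cell with this guard is then safe: it will not try to re-parse columns that were converted on an earlier run.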


In [3]:

# Extract the year component from each datetime column
housing_data['YearBuilt_Year'] = housing_data['YearBuilt'].dt.year
housing_data['YearRemodAdd_Year'] = housing_data['YearRemodAdd'].dt.year
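Because these dates were parsed from bare years, every value falls on 1 January, so month and day carry no information for these two columns even though the `.dt` accessor exposes them. A minimal sketch using the build years from the preview output; the `decade` bucket is an illustrative extra, not a column the notebook creates.

```python
import pandas as pd

# Build years copied from the preview output, parsed the same way as above.
built = pd.to_datetime(pd.Series(["2003", "1976", "2001", "1915", "2000"]), format="%Y")

components = pd.DataFrame({
    "year": built.dt.year,
    "month": built.dt.month,  # always 1 here: only the year was given
    "day": built.dt.day,      # always 1 here for the same reason
    # Illustrative coarser bucket, e.g. 1976 -> 1970.
    "decade": (built.dt.year // 10) * 10,
})
print(components)
```

A decade bucket like this is often easier to count or group by than individual build years.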


In [4]:

# Years elapsed between construction (YearBuilt) and remodeling (YearRemodAdd)
housing_data['YearsToRemodel'] = housing_data['YearRemodAdd_Year'] - housing_data['YearBuilt_Year']
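The difference can also be taken directly on the datetime columns. That yields `Timedelta` values measured in days, whereas subtracting the extracted years gives whole years. A minimal sketch using the years from the preview output. The `Remodeled` flag is a hypothetical derived column, and treating a remodel year equal to the build year as "no remodel" is an assumption about how the dataset encodes houses that were never remodeled.

```python
import pandas as pd

# Build and remodel years copied from the preview output.
df = pd.DataFrame({
    "YearBuilt": pd.to_datetime(pd.Series(["2003", "1976", "2001", "1915", "2000"]), format="%Y"),
    "YearRemodAdd": pd.to_datetime(pd.Series(["2003", "1976", "2002", "1970", "2000"]), format="%Y"),
})

# Datetime subtraction gives Timedelta values; .dt.days extracts whole days.
gap_days = (df["YearRemodAdd"] - df["YearBuilt"]).dt.days

# Subtracting the year components gives the gap in whole years.
gap_years = df["YearRemodAdd"].dt.year - df["YearBuilt"].dt.year

# Hypothetical flag: assumes equal years mean the house was never remodeled.
df["Remodeled"] = gap_years > 0

print(pd.DataFrame({"gap_days": gap_days, "gap_years": gap_years, "Remodeled": df["Remodeled"]}))
```

For year-only dates the year subtraction is the simpler choice. The day-based version becomes useful once full dates (for example, a sale date) are involved.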


In [5]:

# Count the occurrences of each category in categorical features
categorical_cols = housing_data.select_dtypes(include=['object']).columns
category_counts = {col: housing_data[col].value_counts() for col in categorical_cols}
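The cell above stores the counts in `category_counts` without displaying them. A minimal sketch of one way to inspect them, with missing values kept visible and each category's share added. The frame is made up: the category labels mirror the `SaleType` dummy names in the preview, and the numbers are arbitrary.

```python
import pandas as pd

# Made-up data; labels mirror SaleType dummy names seen in the preview.
df = pd.DataFrame({
    "SaleType": pd.Series(["WD", "WD", "New", "WD", None, "Oth"], dtype="object"),
    "LotArea": [8450, 9600, 11250, 9550, 14260, 10000],
})

categorical_cols = df.select_dtypes(include=["object"]).columns

for col in categorical_cols:
    # dropna=False reports missing values as their own category.
    counts = df[col].value_counts(dropna=False)
    # Dividing by the total keeps the same index, so the two columns align.
    shares = (counts / counts.sum()).round(3)
    print(pd.DataFrame({"count": counts, "share": shares}))
```

Checking the counts before creating dummies also shows which rare categories will turn into nearly empty dummy columns.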


In [6]:

# Convert categorical variables into dummy variables
housing_data = pd.get_dummies(housing_data, columns=categorical_cols, drop_first=True)
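With `drop_first=True`, the first level of each category (in sorted order) gets no column of its own. A row with zeros in all remaining dummies belongs to that baseline level, which avoids perfectly collinear columns in linear models. Since pandas 2.0 the dummies default to `bool`, which is why the preview shows `False`; passing `dtype=int` gives 0/1 instead. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Made-up frame with one categorical column.
df = pd.DataFrame({
    "SaleType": pd.Series(["WD", "New", "WD", "Oth"], dtype="object"),
    "LotArea": [8450, 9600, 11250, 9550],
})

# All levels encoded, as 0/1 integers.
full = pd.get_dummies(df, columns=["SaleType"], dtype=int)
# First sorted level ("New") dropped; it becomes the all-zero baseline.
reduced = pd.get_dummies(df, columns=["SaleType"], drop_first=True, dtype=int)

print(full.columns.tolist())
print(reduced.columns.tolist())
print(reduced)
```

Non-encoded columns such as `LotArea` keep their place at the front, and the dummy columns are appended after them.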


In [7]:

# Display a preview of the processed data
print(housing_data.head())


   Unnamed: 0  LotFrontage  LotArea  OverallQual  OverallCond  YearBuilt  \
0           0         65.0     8450            7            5 2003-01-01   
1           1         80.0     9600            6            8 1976-01-01   
2           2         68.0    11250            7            5 2001-01-01   
3           3         60.0     9550            7            5 1915-01-01   
4           4         84.0    14260            8            5 2000-01-01   

  YearRemodAdd  MasVnrArea  BsmtFinSF1  BsmtFinSF2  ...  SaleType_ConLI  \
0   2003-01-01       196.0         706           0  ...           False   
1   1976-01-01         0.0         978           0  ...           False   
2   2002-01-01       162.0         486           0  ...           False   
3   1970-01-01         0.0         216           0  ...           False   
4   2000-01-01       350.0         655           0  ...           False   

   SaleType_ConLw  SaleType_New  SaleType_Oth  SaleType_WD  \
0           False         Fals

In [8]:

# Optionally, save the processed data to a CSV file for further analysis
housing_data.to_csv("processed_housing_data.csv", index=False)

