# Data Preparation and Model Training

This notebook demonstrates the workflow for preparing the house price dataset, training a regression model, making predictions, and analyzing results.

## 1. Import Required Libraries and Modules

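We start by importing the standard-library modules we need and adding the parent directory to `sys.path`, so that the `my_project` package can be imported from this notebook. Calling `importlib.reload` picks up edits to `my_project.dataset` without restarting the kernel.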

In [None]:
import importlib
import sys
import os
# Add the parent directory to sys.path to allow imports from my_project
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))

import my_project.dataset as ds
importlib.reload(ds)


## 2. Initialize and Prepare Data

We point `HousePricingDataModule` at the raw CSV file and call `prepare_data()` to process it for training.

In [None]:
# Initialize data module and process data
dm = ds.HousePricingDataModule(data_dir="../data/raw/house_price_regression_dataset.csv")
dm.prepare_data()

## 3. Train the Model

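We train the house price regression model on the processed data. Because `train.main()` expects parsed command-line arguments, we pass it an `argparse.Namespace` that carries the batch size, number of workers, learning rate, weight decay, and number of epochs.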

In [None]:
import argparse

# The parent directory was already added to sys.path in section 1
from my_project.modeling import train, predict

# Simulate command-line arguments for train.main()
args = argparse.Namespace(
    batch_size=64,
    num_workers=4,
    lr=1e-3,
    weight_decay=0.0,
    epochs=20,
)

# Start training
train.main(args)
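
`argparse.Namespace` is a plain attribute container, which is why it can stand in for parsed command-line arguments. The sketch below shows this in isolation; `fake_main` is a hypothetical stand-in for an entry point such as `train.main`, whose real signature and behavior are not shown in this notebook.

```python
import argparse


def fake_main(args):
    """Hypothetical entry point that only reads attributes from `args`."""
    return f"batch_size={args.batch_size}, lr={args.lr}, epochs={args.epochs}"


# Build the namespace by hand instead of parsing sys.argv
args = argparse.Namespace(batch_size=64, lr=1e-3, epochs=20)

# vars() exposes the fields as a dict, which is handy for logging a run's settings
settings = vars(args)
summary = fake_main(args)
```

If the called function reads an attribute that the namespace lacks, Python raises `AttributeError`, so the namespace has to carry every field the function uses.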

## 4. Make Predictions

After training, we run the trained model on the test set. `predict.run_predict` is given `models/test_predictions.csv` as the output path, so the predictions are saved as a CSV file for further analysis.

In [None]:
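# Run prediction.
# These paths have no "../" prefix, while section 5 reads
# "../models/test_predictions.csv"; check that both refer to the same file.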
output_csv = predict.run_predict(
    data_dir="data/processed",
    models_dir="models",
    output_path="models/test_predictions.csv",
    device="auto",
    target_col="House_Price",
)


## 5. Analyze Results

Finally, we load the predictions CSV into a pandas DataFrame and display the first five rows to inspect the results.

In [None]:
import pandas as pd
df_preds = pd.read_csv("../models/test_predictions.csv")
df_preds.head()
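
Beyond looking at the first rows, a couple of summary error metrics give a quicker read on model quality. The sketch below computes mean absolute error (MAE) and root mean squared error (RMSE) from two lists of numbers using only the standard library. The sample values are made up for illustration, and the column names that hold actual and predicted prices in the CSV are not shown in this notebook, so extracting them from `df_preds` is left open.

```python
import math


def regression_errors(actual, predicted):
    """Return (MAE, RMSE) for two equal-length sequences of numbers."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / len(residuals)
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return mae, rmse


# Made-up values for illustration only, not taken from the dataset
actual = [200000.0, 350000.0, 500000.0]
predicted = [210000.0, 340000.0, 530000.0]
mae, rmse = regression_errors(actual, predicted)
```

RMSE weights large misses more heavily than MAE, so a wide gap between the two suggests a few predictions are far off.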