# Project 4: Predicting a Continuous Target with Regression (Titanic)

**Name:** Saratchandra Golla    
**Date:** November 15, 2025 

**Objective:** Build and evaluate several regression models (Linear, Ridge, Elastic Net, Polynomial) to predict the continuous variable `fare` using features from the Titanic dataset.

## Section 1: Import and Inspect the Data

### Imports

We import all required libraries at the top of the notebook so that dependencies are visible in one place: those for data manipulation, visualization, model building, and evaluation.

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge, ElasticNet
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Set the random state for reproducibility
RANDOM_STATE = 123
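
As a minimal sketch of why a fixed seed matters, the snippet below splits a small toy array twice with the same `random_state` and checks that both splits select the same rows. The toy arrays `X` and `y` and the `test_size=0.2` value are illustrative assumptions, not the features or split ratio used in this project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_STATE = 123  # same constant as defined in the imports cell

# Toy data, purely illustrative (not the Titanic features)
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Two independent calls with the same seed
X_train_a, X_test_a, y_train_a, y_test_a = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE
)
X_train_b, X_test_b, y_train_b, y_test_b = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE
)

# True when both calls produced the same held-out rows
same_split = np.array_equal(X_test_a, X_test_b) and np.array_equal(y_test_a, y_test_b)
print(same_split)
```

Passing the same constant to every split and to every estimator that accepts `random_state` keeps reruns of the notebook comparable.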

### Inspect the Data

We load the Titanic dataset and perform an initial inspection to understand its structure and check for missing values.

In [9]:
# Load Titanic dataset from seaborn
titanic = sns.load_dataset("titanic")

print("--- Data Head (First 5 Rows) ---")
print(titanic.head())

print("\n--- Data Information and Missing Values ---")
titanic.info()

--- Data Head (First 5 Rows) ---
   survived  pclass     sex   age  sibsp  parch     fare embarked  class  \
0         0       3    male  22.0      1      0   7.2500        S  Third   
1         1       1  female  38.0      1      0  71.2833        C  First   
2         1       3  female  26.0      0      0   7.9250        S  Third   
3         1       1  female  35.0      1      0  53.1000        S  First   
4         0       3    male  35.0      0      0   8.0500        S  Third   

     who  adult_male deck  embark_town alive  alone  
0    man        True  NaN  Southampton    no  False  
1  woman       False    C    Cherbourg   yes  False  
2  woman       False  NaN  Southampton   yes   True  
3  woman       False    C  Southampton   yes  False  
4    man        True  NaN  Southampton    no   True  

--- Data Information and Missing Values ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 15 columns):
 #   Column       Non-Null Count  Dt
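
Because `info()` reports non-null counts rather than missing counts, a direct tally per column can be easier to read. The sketch below shows one way to do that; it uses a small hypothetical frame named `toy` in place of `titanic` so that it runs without downloading the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `titanic`
toy = pd.DataFrame({
    "age": [22.0, np.nan, 26.0, np.nan],
    "fare": [7.25, 71.2833, 7.925, 53.1],
    "deck": [None, "C", None, "C"],
})

# Number and share of missing values in each column
missing_counts = toy.isnull().sum()
missing_share = toy.isnull().mean()

# Show only the columns that actually have gaps
print(missing_counts[missing_counts > 0])
```

Replacing `toy` with `titanic` should give the same kind of summary for the real data.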

## Section 2: Data Exploration and Preparation

We prepare the data by imputing missing values in `age`, dropping any rows with a missing `fare`, engineering a new feature (`family_size`), and encoding the categorical feature `sex` as a numeric column for modeling.

In [10]:
# Impute missing values for 'age' using the median
titanic['age'] = titanic['age'].fillna(titanic['age'].median())
print(f"Missing 'age' values after imputation: {titanic['age'].isnull().sum()}")

# Drop rows with a missing 'fare', since it is the prediction target
# (the row count printed below shows whether any rows were removed)
titanic.dropna(subset=['fare'], inplace=True)
print(f"Total rows after dropping missing 'fare' values: {len(titanic)}")

# Create 'family_size' numeric variable
titanic['family_size'] = titanic['sibsp'] + titanic['parch'] + 1

# Convert categorical features to numeric (for use in Case 4 later)
# We will use 'pclass' (Ordinal, already numeric-like) and 'sex' (Binary)
# Convert 'sex' to a binary numeric feature (0 for female, 1 for male)
titanic['sex_numeric'] = (titanic['sex'] == 'male').astype(int)

print("\n--- Summary of Prepared Data ---")
print(titanic[['age', 'fare', 'family_size', 'sex_numeric']].describe())

Missing 'age' values after imputation: 0
Total rows after dropping missing 'fare' values: 891

--- Summary of Prepared Data ---
              age        fare  family_size  sex_numeric
count  891.000000  891.000000   891.000000   891.000000
mean    29.361582   32.204208     1.904602     0.647587
std     13.019697   49.693429     1.613459     0.477990
min      0.420000    0.000000     1.000000     0.000000
25%     22.000000    7.910400     1.000000     0.000000
50%     28.000000   14.454200     1.000000     1.000000
75%     35.000000   31.000000     2.000000     1.000000
max     80.000000  512.329200    11.000000     1.000000
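
To make the effect of each preparation step easier to verify by hand, the sketch below applies the same steps to a four-row hypothetical frame named `toy`. The values are invented for illustration and are not taken from the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing age and one missing fare
toy = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 30.0],
    "fare": [7.25, 71.2833, np.nan, 8.05],
    "sibsp": [1, 1, 0, 0],
    "parch": [0, 2, 0, 0],
    "sex": ["male", "female", "female", "male"],
})

# Median of the observed ages (20, 30, 40) is 30, which fills the gap
toy["age"] = toy["age"].fillna(toy["age"].median())

# Drop the row whose target value is missing
toy = toy.dropna(subset=["fare"]).copy()

# Passenger plus siblings/spouses plus parents/children
toy["family_size"] = toy["sibsp"] + toy["parch"] + 1

# 1 for male, 0 for female
toy["sex_numeric"] = (toy["sex"] == "male").astype(int)

print(toy[["age", "fare", "family_size", "sex_numeric"]])
```

Because `sex_numeric` is a 0/1 column, its mean is the share of male passengers, which is how the 0.647587 in the summary table above can be read.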
