# Support Vector Regression (SVR) Model

This notebook demonstrates the use of Support Vector Regression (SVR) for predicting California housing prices.  
SVR is a kernel-based method that can capture non-linear relationships and is relatively robust to outliers.  

Key strengths of SVR:
- Handles **non-linear patterns** in data effectively (with RBF kernel).  
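- More robust to **outliers** than squared-error methods: the loss grows only linearly for errors beyond `epsilon`, and only the support vectors (points on or outside the epsilon tube) determine the fitted function.  
- Balances **model complexity** and **error tolerance**: `C` sets how strongly training errors are penalized (larger `C` means weaker regularization and a more complex fit), and `epsilon` sets the margin within which errors are not penalized at all.  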

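Kernel SVR is computationally expensive on large datasets (its fit time grows faster than quadratically with the number of samples), but it provides a valuable comparison point to the tree-based models (Decision Tree, Random Forest) because it approaches regression from a different angle: it fits a function within an `epsilon`-tolerance tube in a kernel-induced feature space instead of recursively partitioning the feature space.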
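To make the roles of `epsilon` and `C` concrete, here is a minimal NumPy sketch of the epsilon-insensitive loss that SVR minimizes, compared with squared loss. In the SVR training objective, `C` multiplies the sum of these per-sample losses (alongside a `0.5 * ||w||^2` regularization term), so a larger `C` weights training errors more heavily. The function names and the toy residuals are illustrative assumptions, not part of this project.

```python
import numpy as np


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: zero inside the epsilon tube, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)


def squared_loss(y_true, y_pred):
    """Per-sample squared error, for comparison."""
    return (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2


y_true = np.array([1.0, 1.0, 1.0, 1.0])
# The last prediction is far off, mimicking an outlier-sized residual of 10
y_pred = np.array([1.05, 1.2, 0.5, 11.0])

eps_loss = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
sq_loss = squared_loss(y_true, y_pred)
print("epsilon-insensitive:", eps_loss)
print("squared:            ", sq_loss)
```

The residual of 0.05 falls inside the tube and costs nothing, while the residual of 10 costs about 9.9 under the epsilon-insensitive loss versus 100 under squared loss, which is the source of SVR's relative robustness to outliers.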


# Data Loading

We load the processed training dataset (24 columns: 23 features plus the `median_house_value` target) from the `/data/train` directory.  
Although the data is already preprocessed, any remaining missing values are filled with column medians as a safeguard before training.
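Because the RBF kernel depends on Euclidean distances between samples, SVR is sensitive to feature scale, and this notebook relies on the preprocessing step having standardized the numeric features (an assumption; the preprocessing code is not shown here). The sketch below is one way to spot columns that are not standardized. The helper name `unscaled_columns`, its tolerances, and the synthetic DataFrame are hypothetical; on the real data you would call it on `X`.

```python
import numpy as np
import pandas as pd


def unscaled_columns(df, mean_tol=0.1, std_tol=0.1):
    """Hypothetical helper: list numeric columns whose mean/std are far from 0/1."""
    numeric = df.select_dtypes(include="number")
    means = numeric.mean()
    stds = numeric.std(ddof=0)
    mask = (means.abs() > mean_tol) | ((stds - 1.0).abs() > std_tol)
    return [col for col, flagged in mask.items() if flagged]


rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "scaled_feature": rng.standard_normal(1000),
    "raw_income": rng.uniform(0.5, 15.0, 1000),   # deliberately left unscaled
    "one_hot_flag": rng.integers(0, 2, 1000),     # binary indicator column
})
# Standardize the first column exactly (mean 0, std 1)
col = demo["scaled_feature"]
demo["scaled_feature"] = (col - col.mean()) / col.std(ddof=0)

flagged = unscaled_columns(demo)
print("Columns that look unscaled:", flagged)
```

One-hot indicator columns (0/1) will always be flagged by a check like this; that is usually expected rather than a problem.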


In [23]:
# Imports
import pandas as pd
import numpy as np
from pathlib import Path
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score, GridSearchCV
from sklearn.metrics import mean_squared_error
import joblib

# Paths (machine-specific absolute path; adjust PROJECT_DIR for your setup)
PROJECT_DIR = Path("/Users/sukainaalkhalidy/Desktop/CMSE492/ca_housing_project")
TRAIN_PROCESSED_FP = PROJECT_DIR / "data" / "train" / "housing_train_processed.csv"
MODEL_FP = PROJECT_DIR / "models" / "svr_model.pkl"

# Load processed dataset
housing = pd.read_csv(TRAIN_PROCESSED_FP)
print("Processed train shape:", housing.shape)

X = housing.drop("median_house_value", axis=1)
y = housing["median_house_value"]

# Safeguard: fill any remaining missing values with each column's median
X = X.fillna(X.median(numeric_only=True))


Processed train shape: (16512, 24)


# Model Fitting

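We initialize and train a Support Vector Regression (SVR) model with scikit-learn's default hyperparameters (`kernel="rbf"`, `C=1.0`, `epsilon=0.1`, `gamma="scale"`).  
The training RMSE is reported as a baseline. Training error is an optimistic estimate of performance on unseen data, so cross-validation follows.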
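The RBF kernel used below computes `K(x, x') = exp(-gamma * ||x - x'||^2)`, and scikit-learn documents `gamma="scale"` as `1 / (n_features * X.var())`, with the variance taken over all entries of `X`. The sketch below reproduces that kernel by hand on a small synthetic matrix (a stand-in for the processed features) and checks it against `sklearn.metrics.pairwise.rbf_kernel`.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(42)
X_demo = rng.standard_normal((5, 3))  # synthetic stand-in for the feature matrix

# gamma="scale" in scikit-learn's SVR: 1 / (n_features * variance of all entries)
gamma_scale = 1.0 / (X_demo.shape[1] * X_demo.var())

# RBF kernel by hand via broadcasting: pairwise squared Euclidean distances
sq_dists = ((X_demo[:, None, :] - X_demo[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma_scale * sq_dists)

# The same matrix from scikit-learn's helper
K_sklearn = rbf_kernel(X_demo, gamma=gamma_scale)

print("gamma (scale):", gamma_scale)
print("Manual kernel matches sklearn:", np.allclose(K_manual, K_sklearn))
```

Each sample has kernel value 1 with itself, and values decay toward 0 as samples move apart; a larger `gamma` makes that decay faster and the fitted function more local.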


In [24]:
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
import numpy as np

# Initialize the SVR model
# Using the Radial Basis Function (RBF) kernel to capture non-linear relationships
svr_reg = SVR(kernel="rbf")

# Fit the model on the processed training data
svr_reg.fit(X, y)

# Generate predictions on the training set
predictions = svr_reg.predict(X)

# Evaluate performance using RMSE (Root Mean Squared Error)
mse = mean_squared_error(y, predictions)
rmse = np.sqrt(mse)

print("Training RMSE:", rmse)


Training RMSE: 117845.90056517681
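One way to judge whether a training RMSE of about 117,800 is informative is to compare it with a model that always predicts the mean target (`sklearn.dummy.DummyRegressor`). The sketch below uses synthetic data with a target on a dollar-like scale (an assumption standing in for `X` and `y`). With the default `C=1.0`, each SVR dual coefficient is bounded by `C`, so the kernel terms are tiny compared with a target spread of tens of thousands, and the SVR's predictions stay close to a constant.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((300, 4))
# Synthetic target on a large (dollar-like) scale, similar in spirit to median_house_value
y_demo = 200_000 + 50_000 * X_demo[:, 0] + 10_000 * rng.standard_normal(300)

# Baseline: always predict the training mean
dummy = DummyRegressor(strategy="mean").fit(X_demo, y_demo)
# Default SVR, as in the cell above
svr_default = SVR(kernel="rbf").fit(X_demo, y_demo)

rmse_dummy = np.sqrt(mean_squared_error(y_demo, dummy.predict(X_demo)))
rmse_svr = np.sqrt(mean_squared_error(y_demo, svr_default.predict(X_demo)))
print("Mean-predictor RMSE:", rmse_dummy)
print("Default SVR RMSE:   ", rmse_svr)
```

On the real data, fitting `DummyRegressor(strategy="mean")` on `X` and `y` in the same way gives the reference number to compare against.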


# Cross-Validation

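We evaluate the SVR model using 3-fold cross-validation instead of scikit-learn's default of 5 folds.  
(Kernel SVR is expensive to train, since its fit time grows faster than quadratically with the number of samples, so fewer folds keep the runtime manageable.)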
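For a regressor, `cross_val_score(..., cv=3)` splits the data with an unshuffled `KFold(n_splits=3)`, clones the estimator, and refits a fresh copy on each training split. The sketch below spells that loop out by hand on small synthetic data and checks that it reproduces `cross_val_score`; the data and variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X_demo = rng.standard_normal((150, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.standard_normal(150)

model = SVR(kernel="rbf")

# Manual 3-fold CV: clone (an unfitted copy) and refit on each training split
manual_mse = []
for train_idx, test_idx in KFold(n_splits=3).split(X_demo):
    fold_model = clone(model).fit(X_demo[train_idx], y_demo[train_idx])
    fold_pred = fold_model.predict(X_demo[test_idx])
    manual_mse.append(mean_squared_error(y_demo[test_idx], fold_pred))
manual_rmse = np.sqrt(manual_mse)

# cross_val_score returns negative MSE, so negate before taking the square root
scores = cross_val_score(model, X_demo, y_demo, scoring="neg_mean_squared_error", cv=3)
auto_rmse = np.sqrt(-scores)

print("Manual RMSE per fold:  ", manual_rmse)
print("cross_val_score RMSE:  ", auto_rmse)
```

Because each fold trains on only two thirds of the data, every fold's fit is cheaper than the full fit, but the total work is roughly three full-size trainings.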


In [19]:
from sklearn.model_selection import cross_val_score
import numpy as np

# cross_val_score clones svr_reg and refits an unfitted copy on each fold;
# the fit from the previous cell is not reused
scores = cross_val_score(svr_reg, X, y,
                         scoring="neg_mean_squared_error",
                         cv=3,
                         n_jobs=1)

rmse_scores = np.sqrt(-scores)
print("RMSE scores:", rmse_scores)
print("Mean RMSE:", rmse_scores.mean())
print("Std deviation:", rmse_scores.std())


RMSE scores: [119264.7294137  118900.17351597 115424.15386263]
Mean RMSE: 117863.01893076643
Std deviation: 1730.9481729371757


# Hyperparameter Tuning

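We perform hyperparameter tuning on a reduced search space to keep runtime manageable:  
`GridSearchCV` searches over `C`, `epsilon`, and `gamma` using 2-fold cross-validation on a random 20% subset of the training data. Note that `epsilon` is measured in the units of the target (dollars here), so `0.1` and `0.2` are both negligible tolerances for house prices.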
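The cost of a grid search is the number of parameter combinations times the number of folds, plus one final refit of the best combination (`refit=True` is the default). The short sketch below counts this for the same grid used in the next cell with `sklearn.model_selection.ParameterGrid`.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the tuning cell
param_grid = {
    "C": [1, 10],
    "epsilon": [0.1, 0.2],
    "gamma": ["scale", 0.1],
    "kernel": ["rbf"],
}

candidates = list(ParameterGrid(param_grid))
n_folds = 2
n_fits = len(candidates) * n_folds  # one fit per candidate per fold

print("Candidates:", len(candidates))
print("CV fits:", n_fits, "+ 1 refit of the best candidate")
print("First candidate:", candidates[0])
```

Here 2 x 2 x 2 x 1 = 8 candidates and 2 folds give 16 cross-validation fits; adding one more value for `C` would raise this to 24.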


In [20]:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Define small parameter grid
param_grid = {
    "C": [1, 10],
    "epsilon": [0.1, 0.2],
    "gamma": ["scale", 0.1],
    "kernel": ["rbf"]
}

# Initialize GridSearchCV
grid = GridSearchCV(
    SVR(),
    param_grid,
    cv=2,   # fewer folds for speed
    scoring="neg_mean_squared_error",
    n_jobs=1
)

# Use a subset (20% of data) for faster runtime
X_small = X.sample(frac=0.2, random_state=42)
y_small = y.loc[X_small.index]

grid.fit(X_small, y_small)

print("Best parameters:", grid.best_params_)

# Extract best model (note: GridSearchCV refit it on X_small only, not on the full training set)
best_svr = grid.best_estimator_


Best parameters: {'C': 10, 'epsilon': 0.1, 'gamma': 'scale', 'kernel': 'rbf'}
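Two observations follow from this cell. First, `grid.best_estimator_` was refit on `X_small`, so it has seen only 20% of the training rows. Second, `C=10` is the largest value in the grid, so a better value may lie beyond it. The sketch below mirrors the notebook's pattern on synthetic data (an assumption) and then refits the selected hyperparameters on all rows with `SVR(**grid.best_params_)`; `shape_fit_` records the shape of the data each fitted SVR was trained on.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X_full = rng.standard_normal((400, 3))
y_full = np.sin(X_full[:, 0]) + 0.5 * X_full[:, 1] + 0.1 * rng.standard_normal(400)

# Tune on a 20% subset, mirroring the notebook
subset = rng.choice(len(X_full), size=80, replace=False)
grid_demo = GridSearchCV(SVR(), {"C": [1, 10], "epsilon": [0.1, 0.2]},
                         cv=2, scoring="neg_mean_squared_error")
grid_demo.fit(X_full[subset], y_full[subset])

# best_estimator_ was refit on the 80-row subset only
print("Rows seen by best_estimator_:", grid_demo.best_estimator_.shape_fit_[0])

# Refit the chosen hyperparameters on every training row
final_svr = SVR(**grid_demo.best_params_).fit(X_full, y_full)
print("Rows seen by final_svr:", final_svr.shape_fit_[0])
```

On the full 16,512-row training set this refit is the expensive step, which is why tuning was done on a subset in the first place.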


# Model Saving

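We save the tuned SVR model to the `/models` directory so it can be reused later without retraining.  
If the tuning cell has not been run, the default `svr_reg` model is saved instead. Note that the tuned model was refit on the 20% subset used for the grid search, not on the full training set.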


In [22]:
import joblib
from pathlib import Path

# Define absolute project directory (adjust if needed)
PROJECT_DIR = Path("/Users/sukainaalkhalidy/Desktop/CMSE492/ca_housing_project")
MODELS_DIR = PROJECT_DIR / "models"

# Ensure the /models directory exists
MODELS_DIR.mkdir(parents=True, exist_ok=True)

MODEL_FP = MODELS_DIR / "svr_model.pkl"

# Save tuned model if available, else fallback to default
model_to_save = best_svr if "best_svr" in globals() else svr_reg
joblib.dump(model_to_save, MODEL_FP)

print(f"Model saved to {MODEL_FP}")


Model saved to /Users/sukainaalkhalidy/Desktop/CMSE492/ca_housing_project/models/svr_model.pkl
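A quick way to confirm that a saved model is usable is to reload it with `joblib.load` and check that it reproduces the original predictions. The sketch below does this round trip in a temporary directory with a small synthetic model; the temporary path stands in for `MODEL_FP`. scikit-learn's documentation recommends loading pickled models with the same scikit-learn version that saved them.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X_demo = rng.standard_normal((100, 3))
y_demo = X_demo[:, 0] - 2 * X_demo[:, 2]

model = SVR(kernel="rbf", C=10).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_fp = Path(tmp) / "svr_model.pkl"  # temporary stand-in for MODEL_FP
    joblib.dump(model, model_fp)
    reloaded = joblib.load(model_fp)

same_predictions = np.allclose(model.predict(X_demo), reloaded.predict(X_demo))
print("Reloaded model reproduces predictions:", same_predictions)
```

In this project, `joblib.load(MODEL_FP)` would return the saved SVR, ready to call `.predict` on data processed the same way as the training set.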
