# Breast Cancer Dataset Analysis Using Ridge Regression

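In this notebook, we fit a Ridge regression model to the Breast Cancer dataset. Ridge regression, also known as Tikhonov regularization, adds an L2 penalty on the coefficients to ordinary least squares, which stabilizes the estimates when features are highly correlated (multicollinearity). Note that the target here is a binary label, so the model treats the 0/1 class codes as a numeric response.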
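
For reference, the objective that scikit-learn's `Ridge` minimizes can be written as below. The symbols are notation chosen for this note rather than taken from the notebook: $\beta_0$ is the intercept (not penalized), $\beta$ is the coefficient vector, and $\alpha \ge 0$ is the regularization strength.

$$
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta} \; \lVert y - \beta_0 \mathbf{1} - X\beta \rVert_2^2 + \alpha \lVert \beta \rVert_2^2
$$

Setting $\alpha = 0$ recovers ordinary least squares, and larger values of $\alpha$ pull the coefficients toward zero.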

## Steps Covered

1. **Importing Libraries**:
   - We import necessary libraries from `scikit-learn` for regression modeling, data scaling, and dataset loading.

2. **Loading the Dataset**:
   - We load the Breast Cancer dataset bundled with scikit-learn: 569 samples with 30 numeric features describing breast tumors, plus a binary target (0 = malignant, 1 = benign).

3. **Data Preprocessing**:
   - We standardize the features using `StandardScaler` so that every feature has a mean of 0 and a standard deviation of 1. This matters for Ridge regression because the L2 penalty acts on coefficient magnitudes, and those depend on the scale of each feature.

4. **Fitting the Ridge Regression Model**:
   - We initialize and fit a Ridge regression model with a specific regularization parameter (`alpha = 0.5`) to the standardized features and target data.

5. **Evaluating Model Performance**:
   - We calculate the R² score to assess the model’s performance on the training data.

6. **Hyperparameter Tuning**:
   - We experiment with various `alpha` values to see how different levels of regularization impact the model’s performance. The `alpha` parameter controls the strength of the regularization.

## Analysis

The analysis compares how different `alpha` values affect the R² score and the coefficients of the Ridge regression model. This shows the trade-off between fitting the training data closely and shrinking the coefficients.
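
As a rough illustration of that trade-off, the sketch below fits Ridge models over a wide grid of `alpha` values and records the L2 norm of the coefficient vector. The grid and the names `alphas` and `coef_norms` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# Sanity check: each standardized column has mean ~0 and std ~1
print("means ~ 0:", np.allclose(X_std.mean(axis=0), 0.0))
print("stds  ~ 1:", np.allclose(X_std.std(axis=0), 1.0))

# Illustrative grid (an assumption); wider than the one used in the notebook
alphas = [0.01, 1.0, 100.0, 10000.0]
coef_norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_std, y)
    # L2 norm of the coefficient vector: smaller means stronger shrinkage
    coef_norms.append(float(np.linalg.norm(model.coef_)))
    print(f"alpha={alpha:>8}: ||coef||_2 = {coef_norms[-1]:.4f}, "
          f"train R^2 = {model.score(X_std, y):.4f}")
```

The coefficient norm should fall as `alpha` grows, while the training R² can only stay the same or drop.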

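The goal is to find a regularization strength that balances model fit and generalization. Because the scores in this notebook are computed on the training data, they show how the fit changes with `alpha` but cannot by themselves identify the value that generalizes best; that requires held-out data or cross-validation.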
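
One way to look at generalization is to hold out a test split and compare training and test R² for each `alpha`. The following is a minimal sketch under assumptions that are not in the original notebook: a 75/25 stratified split, `random_state=0`, and a pipeline so that the scaler is fitted on the training portion only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Assumed split settings, chosen only for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

alphas = [0.1, 0.3, 0.4, 0.5, 0.8, 1.0]  # same grid as the notebook
results = {}
for alpha in alphas:
    # The pipeline fits the scaler on the training data only,
    # so no information from the test split leaks into preprocessing
    pipe = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    pipe.fit(X_train, y_train)
    results[alpha] = (pipe.score(X_train, y_train), pipe.score(X_test, y_test))
    print(f"alpha={alpha}: train R^2 = {results[alpha][0]:.4f}, "
          f"test R^2 = {results[alpha][1]:.4f}")
```

With a grid this narrow the scores may differ only slightly; a wider, logarithmically spaced grid is often more informative.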


In [1]:
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import load_breast_cancer

In [2]:
dataset = load_breast_cancer()
X = dataset.data
y = dataset.target

In [3]:
scaler = StandardScaler()
X_Std = scaler.fit_transform(X)

In [4]:
regr = Ridge(alpha=0.5)
regr.fit(X_Std, y)


In [5]:
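# Ridge.score returns the coefficient of determination (R^2),
# computed here on the same data the model was fitted on
r2_train = regr.score(X_Std, y)
print("R^2 Score:", r2_train)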


R^2 Score: 0.7730918442067221
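
To make the meaning of this number concrete, the sketch below recomputes R² by hand as 1 - SS_res / SS_tot and compares it with `regr.score`. It also thresholds the predictions at 0.5 to obtain a classification accuracy, which is a different quantity from R²; the 0.5 cut-off and the names `r2_manual` and `train_accuracy` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_std = StandardScaler().fit_transform(X)
regr = Ridge(alpha=0.5).fit(X_std, y)

pred = regr.predict(X_std)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
print("manual R^2:", r2_manual)
print("score()   :", regr.score(X_std, y))

# Illustrative only: turn the continuous predictions into 0/1 labels
# with an assumed 0.5 threshold and measure training accuracy
train_accuracy = float(np.mean((pred >= 0.5).astype(int) == y))
print("training accuracy at threshold 0.5:", train_accuracy)
```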


In [6]:
for alpha in [0.1, 0.3, 0.4, 0.5, 0.8, 1.0]:
    model = Ridge(alpha=alpha)
    model.fit(X_Std, y)
    # Ridge.score returns the training R^2, not a classification accuracy,
    # despite the "Accuracy" label in the printed line
    r2 = model.score(X_Std, y)
    print(f'Alpha::{alpha}, Accuracy::{r2}, Coefficients:: {model.coef_}')

Alpha::0.1, Accuracy::0.7740173735099757, Coefficients:: [ 2.96738185e-01 -1.81834771e-02 -1.99458069e-01 -3.33365545e-02
  2.27125762e-03  1.98436491e-01 -1.23100111e-01 -8.20503058e-02
 -6.82486354e-04  2.94796526e-03 -1.35303179e-01  5.62690008e-03
  3.95915610e-02  5.55373657e-02 -4.65851993e-02 -2.38920964e-03
  1.10087339e-01 -6.01918858e-02 -1.33340898e-02  1.87842759e-02
 -7.49914626e-01 -4.71895845e-02  2.03061394e-02  4.80864628e-01
 -1.64141515e-02 -1.36892998e-03 -7.84148383e-02 -3.46964756e-02
 -3.69641232e-02 -8.10277885e-02]
Alpha::0.3, Accuracy::0.7734809450841977, Coefficients:: [ 0.08796755 -0.01657565 -0.09004858  0.03830639  0.00389263  0.18965527
 -0.12313995 -0.08162552  0.00128693  0.00535487 -0.15701716  0.00725522
  0.03920486  0.07272301 -0.0460952  -0.00244932  0.10830285 -0.0553671
 -0.01228172  0.01920661 -0.58591606 -0.05037482 -0.01494035  0.38813092
 -0.01892336  0.00367975 -0.07735731 -0.04095457 -0.03993007 -0.08418761]
Alpha::0.4, Accuracy::0.77327350