# Programming Assignment 4 - Simple Linear vs. Ridge Regression


In the historical heart of Boston, Bob seeks to understand the intricacies of the real estate market. With a linear regression model at his side, Bob wonders if he can improve his predictions. Given your expertise in machine learning, he turns to you for guidance. Specifically, he wants to unravel the factors influencing the median value of homes across different Boston neighborhoods.

To assist Bob, you decide to:
*  Implement the closed-form solution for linear regression.
* Apply a polynomial transformation to increase model flexibility.
* Utilize ridge regression to control model complexity.
* Apply 10-fold cross-validation for more reliable performance estimates.


Bob is curious and wants to see a comparison between linear and ridge regression, both with and without polynomial transformations, on the same dataset. Thus, the challenge begins!

 Variables in order:
* CRIM:     per capita crime rate by town
*  ZN:       proportion of residential land zoned for lots over 25,000 sq.ft.
* INDUS:    proportion of non-retail business acres per town
* CHAS:     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
* NOX:      nitric oxides concentration (parts per 10 million)
* RM:       average number of rooms per dwelling
* AGE:      proportion of owner-occupied units built prior to 1940
* DIS:      weighted distances to five Boston employment centres
* RAD:      index of accessibility to radial highways
* TAX:      full-value property-tax rate per \$10,000
* PTRATIO:  pupil-teacher ratio by town
* B:        $1000(Bk - 0.63)^2$ where Bk is the proportion of blacks by town
* LSTAT:    \% lower status of the population
* MEDV:     Median value of owner-occupied homes in \$1000's

Note: The Boston Housing dataset, especially the 'B' variable, touches upon serious ethical and societal concerns related to race and inequality. Reflect upon these issues, and consider strategies such as excluding the 'B' column from analyses.

With this context, let's assist Bob in his real estate endeavors!


## 1 Setup and Data Preparation
Import Libraries



In [162]:
import numpy as np  # Fundamental package for linear algebra and multidimensional arrays
import pandas as pd  # Data analysis and manipulation tool

# Transform features to polynomial features for model flexibility
from sklearn.preprocessing import PolynomialFeatures

# Split arrays or matrices into random train and test subsets
from sklearn.model_selection import train_test_split

# Scale features to zero mean and unit variance, commonly used for normalization
from sklearn.preprocessing import StandardScaler

# Provides train/test indices to split data into train/test sets while performing cross-validation
from sklearn.model_selection import KFold

# Calculates MSE
from sklearn.metrics import mean_squared_error



Load the Dataset


In [163]:
# Define feature names
# Specifying the names of the columns in our dataset makes it easier to understand and reference them.
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "RAD", "PTRATIO", "B", "LSTAT", "MEDV"]

# Load the data
# We read data from a CSV (Comma-Separated Values) file into a DataFrame. DataFrame is a 2D labeled data structure in pandas.
filename = '/content/Boston_housing.csv'
df = pd.read_csv(filename, sep='\s+', header=None)

# Display basic information about the dataset
# It's good practice to inspect the dataset's size and first few rows to ensure it's loaded correctly and understand its structure.
print("Dataset shape:", df.shape)
print(df.head())

# Extract features and target
# Machine learning typically involves using features (independent variables) to predict a target (dependent variable).
# Here, we separate the dataset into features (X) and target (y).
X = np.array(df.iloc[:, :13])  # All columns up to the 13th are features
y = np.array(df.iloc[:, 13]).reshape(-1, 1)  # The 13th column is our target, and we reshape it to a 2D array for compatibility.

# Preview data
# It's also good practice to preview the data after separation to ensure everything looks as expected.
print("\nFirst 5 rows of X:\n", X[:5])
print("First 5 values of y:\n", y[:5])
print("X shape:", X.shape)
print("y shape:", y.shape)


Dataset shape: (506, 14)
        0     1     2   3      4      5     6       7   8      9     10  \
0  0.00632  18.0  2.31   0  0.538  6.575  65.2  4.0900   1  296.0  15.3   
1  0.02731   0.0  7.07   0  0.469  6.421  78.9  4.9671   2  242.0  17.8   
2  0.02729   0.0  7.07   0  0.469  7.185  61.1  4.9671   2  242.0  17.8   
3  0.03237   0.0  2.18   0  0.458  6.998  45.8  6.0622   3  222.0  18.7   
4  0.06905   0.0  2.18   0  0.458  7.147  54.2  6.0622   3  222.0  18.7   

       11    12    13  
0  396.90  4.98  24.0  
1  396.90  9.14  21.6  
2  392.83  4.03  34.7  
3  394.63  2.94  33.4  
4  396.90  5.33  36.2  

First 5 rows of X:
 [[6.3200e-03 1.8000e+01 2.3100e+00 0.0000e+00 5.3800e-01 6.5750e+00
  6.5200e+01 4.0900e+00 1.0000e+00 2.9600e+02 1.5300e+01 3.9690e+02
  4.9800e+00]
 [2.7310e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 6.4210e+00
  7.8900e+01 4.9671e+00 2.0000e+00 2.4200e+02 1.7800e+01 3.9690e+02
  9.1400e+00]
 [2.7290e-02 0.0000e+00 7.0700e+00 0.0000e+00 4.6900e-01 7.

Checking for missing values

After getting the data, it's always a good practice to check for missing values in the dataset. Luckily for us, this dataset has no missing values. Here's how you can verify that:


In [164]:
# 2. Check for Missing Values:
print("Missing values in X:", np.isnan(X).sum())
print("Missing values in y:", np.isnan(y).sum())

Missing values in X: 0
Missing values in y: 0


## Implementing 10-Fold Cross-Validation
With the data now loaded into X and y, your next task is to implement the code to select the optimal regularization and polynomial transformation. Utilize 10-fold cross-validation to assess the various configurations.



## 10-Fold Cross-Validation with Feature Scaling and Polynomial Transformation

Cross-validation is a method to assess the performance of a machine learning model on unseen data by dividing the data into a set number of groups, or "folds".

### Why 10-Fold Cross-Validation?

In 10-fold cross-validation, the dataset is randomly divided into ten parts or folds. The idea is to iteratively train the model on 9 of these folds and test it on the tenth. This is done ten times, once for each fold acting as the validation set. By doing so, we're ensuring that each data point gets to be in a validation set exactly once.

### Feature Scaling Within Cross-Validation

Feature scaling ensures that all features contribute equally to the model performance, which is particularly important for algorithms sensitive to feature magnitudes.

When doing cross-validation, it's crucial that we don't introduce data leakage by scaling using statistics from the entire dataset. Instead:
1. Divide the data into training and validation sets.
2. Fit the scaler on the training set.
3. Apply the scaling to both the training and validation sets using this scaler.

### Polynomial Transformation Within Cross-Validation

Polynomial transformations capture more intricate data relationships by adding polynomial features. Here's how you incorporate it into cross-validation:
1. Divide the data into training and validation sets.
2. Fit the polynomial transformer on the training set.
3. Transform both the training and validation sets using this transformer.
4. Fit the scaler on the transformed training set
4. Apply the scaling to both the transformed training and transformed validation sets using this scaler.

---
### Note on Cross-Validation Error Calculation

In most lecture notes and literature on k-fold cross-validation, the procedure for calculating the cross-validation error typically involves computing the mean of the errors obtained from each fold. However, in the context of our analysis, given the relatively small size of the dataset and the possibility of unequal numbers of samples in each fold, this traditional approach might not be mathematically rigorous.

To address this, our approach for calculating the cross-validation error will deviate slightly from the traditional method. Instead of merely averaging the errors from each fold, we will sum up the errors across all folds and then divide by $ N $, the total number of training examples. This ensures that our error estimate is unbiased and takes into account the potential discrepancy in the number of samples across different folds.

Mathematically, the cross-validation error, $ E_{cv} $, for this assignment is computed as:
$$  E_{\text{cv}} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j \in \text{fold } i} (y^{(j)}- \hat{y}^{(j)})^2
 $$
where $ k $ is the number of folds, $ y^{(j)} $ is the true target value of the $j^{th} $ example, and $ \hat{y}^{(j)} $ is the predicted value for the same example.

---


# Your code goes here

Feel free to add any helper functions you may need.

### Part a) 10-fold Cross Validation using Linear Regression

In [165]:
def linear_regression(X, y):
    # use np.linalg.pinv(a)
    # Compute the weights using the closed-form solution
    #### TO-DO #####
    return np.linalg.pinv(X.T.dot(X)).dot(X.T).dot(y)
    ##############

 Next implement Squared Error. It measures the average squared difference between the estimated values (predictions) and the actual values (true values). Mathematically, it is represented as: $  \sum_{i=1}^{N} (y^{(i)} - \hat{y}^{(i)})^2 $


In [166]:
def squared_error(y_test, y_pred):
    #### TO-DO #####
    # Calculate the squared differences
    return mean_squared_error(y_test, y_pred)
    ##############


In [167]:
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

def k_fold_linear_regression(X, y, k=10):
    """
    Perform k-fold cross-validation for linear regression.
    """
    kf = KFold(n_splits=k, random_state=42, shuffle=True)

    cve_values = []
    ise_values = []

    for train_index, val_index in kf.split(X):
        X_train, X_val = X[train_index], X[val_index]
        y_train, y_val = y[train_index], y[val_index]

        # Scaling the features without data leakage
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_val_scaled = scaler.transform(X_val)

        # Adding a bias term (constant column for intercept)
        X_train_scaled = np.hstack((np.ones((X_train_scaled.shape[0], 1)), X_train_scaled))
        X_val_scaled = np.hstack((np.ones((X_val_scaled.shape[0], 1)), X_val_scaled))

        # Fit the model on training data
        beta = linear_regression(X_train_scaled, y_train)

        # Predicting on the validation set
        y_pred = X_val_scaled.dot(beta)
        # Calculating and storing the Mean Squared Error for each fold
        cve = mean_squared_error(y_val, y_pred)
        cve_values.append(cve)

        # Predicting on the TRAINING set for the in-sample error
        y_train_pred = X_train_scaled.dot(beta)
        # Calculating in-sample error for this fold
        fold_ise = mean_squared_error(y_train, y_train_pred)
        ise_values.append(fold_ise)

    e_in = np.mean(ise_values)
    e_cv = np.mean(cve_values)

    print(f"10-Fold Linear Regression")
    print(f"e_in:{e_in}")
    print(f"e_cv:{e_cv}")

    return e_in, e_cv

k_fold_linear_regression(X, y, k=10)


10-Fold Linear Regression
e_in:21.818586996144017
e_cv:23.364203007530946


(21.818586996144017, 23.364203007530946)

### Part b) Adding Ridge Regression
Enhance the previous code to include Ridge Regression.

In [168]:
def ridge_regression(X, y, alpha):
    # Compute the weights using the closed-form solution
    #### TO-DO #####

    ones_column = np.ones((X.shape[0], 1))
    X_eq = np.hstack((ones_column, X))
    identity = np.eye(X_eq.shape[1])
    identity[0][0] = 0

    # Compute the weights using the closed-form solution
    w = np.linalg.inv(X_eq.T @ X_eq + len(X_eq) * alpha * identity) @ X_eq.T @ y
    ##############
    return w

In [169]:
def k_fold_ridge_regression(X, y, k=10, lambdas=list(np.logspace(-5, 1, num=15))):
    """
    Perform k-fold cross-validation for ridge regression with various lambda values without polynomial transformations.
    """
    best_lambda = None
    best_error = float('inf')

    kf = KFold(n_splits=k, random_state=10, shuffle=True)

    for alpha in lambdas:
        mse_values = []

        for train_index, test_index in kf.split(X):
            X_train, X_test = X[train_index], X[test_index]
            y_train, y_test = y[train_index], y[test_index]

            # Scaling the features without data leakage
            scaler = StandardScaler()
            X_train_scaled = scaler.fit_transform(X_train)
            X_test_scaled = scaler.transform(X_test)

            # Training the model using the closed-form solution for ridge regression
            beta = ridge_regression(X_train_scaled, y_train, alpha)

            # Predicting on the TEST set
            y_pred = X_test_scaled.dot(beta[1:]) + beta[0]

            # Calculating and storing the Mean Squared Error for this fold
            mse = mean_squared_error(y_test, y_pred)
            mse_values.append(mse)

        avg_mse = np.mean(mse_values)
        print(f"Alpha: {alpha:.5f}, Average E_cv: {avg_mse:.5f}")

        # Check if this alpha yields a smaller average error than the current best_error
        if avg_mse < best_error:
            best_error = avg_mse
            best_lambda = alpha

    print(f"best_error:{best_error}, best_lambda:{best_lambda}")
    return best_lambda, best_error


In [170]:
#Use your code to answer question b)
#### TO-DO ####
k_fold_ridge_regression(X, y, k=10)
##############

Alpha: 0.00001, Average E_cv: 23.74027
Alpha: 0.00003, Average E_cv: 23.74017
Alpha: 0.00007, Average E_cv: 23.73992
Alpha: 0.00019, Average E_cv: 23.73926
Alpha: 0.00052, Average E_cv: 23.73754
Alpha: 0.00139, Average E_cv: 23.73323
Alpha: 0.00373, Average E_cv: 23.72381
Alpha: 0.01000, Average E_cv: 23.71115
Alpha: 0.02683, Average E_cv: 23.73427
Alpha: 0.07197, Average E_cv: 23.95989
Alpha: 0.19307, Average E_cv: 24.85101
Alpha: 0.51795, Average E_cv: 27.63276
Alpha: 1.38950, Average E_cv: 34.44903
Alpha: 3.72759, Average E_cv: 45.89593
Alpha: 10.00000, Average E_cv: 59.77760
best_error:23.711151123941193, best_lambda:0.01


(0.01, 23.711151123941193)

### Part c) Adding Polynomial Transformations and Ridge Regression
Extend their code to incorporate polynomial transformations combined with Ridge Regression.

In [171]:
def k_fold_poly_ridge(X, y, k=10, lambdas=list(np.logspace(-5, 1, num=15)), degrees=[3]):
    """
    Perform k-fold cross-validation for ridge regression with various lambda values and polynomial transformations.
    """
    best_lambda = None
    best_degree = None
    best_error = float('inf')
    print(lambdas)
    kf = KFold(n_splits=k, random_state=10, shuffle=True)

    for degree in degrees:
        poly = PolynomialFeatures(degree=degree)

        for alpha in lambdas:
            mse_values = []

            for train_index, test_index in kf.split(X):
                X_train, X_test = X[train_index], X[test_index]
                y_train, y_test = y[train_index], y[test_index]

                # Scaling the features without data leakage
                scaler = StandardScaler()
                X_train_scaled = scaler.fit_transform(X_train)
                X_test_scaled = scaler.transform(X_test)

                # Applying polynomial transformation
                X_poly_train = poly.fit_transform(X_train_scaled)
                X_poly_test = poly.transform(X_test_scaled)

                # Training the model using the closed-form solution for ridge regression
                beta = ridge_regression(X_poly_train, y_train, alpha)

                # Predicting on the TEST set
                y_pred = X_poly_test.dot(beta[1:]) + beta[0]

                # Calculating and storing the Mean Squared Error for this fold
                mse = mean_squared_error(y_test, y_pred)
                mse_values.append(mse)

            avg_mse = np.mean(mse_values)
            print(f"Degree: {degree}, Alpha: {alpha:.5f}, Average E_cv: {avg_mse:.5f}")

            # Check if this combination of degree and alpha yields a smaller average error than the current best_error
            if avg_mse < best_error:
                best_error = avg_mse
                best_lambda = alpha
                best_degree = degree

    return best_lambda, best_degree, best_error

k_fold_poly_ridge(X, y, k=10, lambdas=list(np.logspace(-5, 1, num=15)), degrees=[2])


[1e-05, 2.6826957952797274e-05, 7.196856730011514e-05, 0.00019306977288832496, 0.0005179474679231213, 0.0013894954943731374, 0.003727593720314938, 0.01, 0.026826957952797246, 0.07196856730011514, 0.19306977288832497, 0.5179474679231213, 1.389495494373136, 3.727593720314938, 10.0]
Degree: 2, Alpha: 0.00001, Average E_cv: 14.19192
Degree: 2, Alpha: 0.00003, Average E_cv: 14.17653
Degree: 2, Alpha: 0.00007, Average E_cv: 14.12939
Degree: 2, Alpha: 0.00019, Average E_cv: 14.00923
Degree: 2, Alpha: 0.00052, Average E_cv: 13.74932
Degree: 2, Alpha: 0.00139, Average E_cv: 13.28898
Degree: 2, Alpha: 0.00373, Average E_cv: 12.67617
Degree: 2, Alpha: 0.01000, Average E_cv: 12.13625
Degree: 2, Alpha: 0.02683, Average E_cv: 11.92398
Degree: 2, Alpha: 0.07197, Average E_cv: 12.24760
Degree: 2, Alpha: 0.19307, Average E_cv: 13.57718
Degree: 2, Alpha: 0.51795, Average E_cv: 16.67230
Degree: 2, Alpha: 1.38950, Average E_cv: 22.03917
Degree: 2, Alpha: 3.72759, Average E_cv: 30.63089
Degree: 2, Alpha: 1

(0.026826957952797246, 2, 11.92398392723783)

In [172]:
from sklearn.preprocessing import PolynomialFeatures

def k_fold_poly_ridge(X, y, degree=2, k=10, lambdas=list(np.logspace(-5, 1, num=15))):
    best_lambda = None
    best_error = float('inf')
    N = len(X)

    poly = PolynomialFeatures(degree=degree)
    X_poly = poly.fit_transform(X)

    kf = KFold(n_splits=k, random_state=10, shuffle=True)

    for alpha in lambdas:
        total_mse = 0
        total_in_sample_mse = 0

        for train_index, test_index in kf.split(X_poly):
            X_train, X_test = X_poly[train_index], X_poly[test_index]
            y_train, y_test = y[train_index], y[test_index]

            # Scaling the features without data leakage
            scaler = StandardScaler()
            X_train_scaled = scaler.fit_transform(X_train)
            X_test_scaled = scaler.transform(X_test)

            # Training the model using the closed-form solution for ridge regression
            beta = ridge_regression(X_train_scaled, y_train, alpha)

            # Predicting on the TEST set
            y_pred = X_test_scaled.dot(beta[1:])+ beta[0]
            mse = mean_squared_error(y_test, y_pred, squared=False)
            total_mse += mse ** 2 * len(test_index)  # Accumulating squared errors

            # Calculating in-sample error
            y_train_pred = X_train_scaled.dot(beta[1:])+ beta[0]
            in_sample_mse = mean_squared_error(y_train, y_train_pred, squared=False)
            total_in_sample_mse += in_sample_mse ** 2 * len(train_index)  # Accumulating squared errors

        e_cv = total_mse / N  # Calculating cross-validation error
        e_in = total_in_sample_mse / N  # Calculating in-sample error

        print(f"Degree: {degree}, Alpha: {alpha:.5f}, E_in: {e_in:.5f}, E_cv: {e_cv:.5f}")

        # Check if this combination of degree and alpha yields a smaller average error than the current best_error
        if e_cv < best_error:
            best_error = e_cv
            best_lambda = alpha

    return best_lambda, best_error

best_lambda_poly, best_error_poly = k_fold_poly_ridge(X, y, degree=2, k=10, lambdas=list(np.logspace(-5, 1, num=15)))
print(f"Best Lambda with Polynomial Transformation: {best_lambda_poly}, Best E_cv: {best_error_poly:.5f}")


Degree: 2, Alpha: 0.00001, E_in: 51.97156, E_cv: 14.00488
Degree: 2, Alpha: 0.00003, E_in: 52.36401, E_cv: 13.80012
Degree: 2, Alpha: 0.00007, E_in: 53.32126, E_cv: 13.52200
Degree: 2, Alpha: 0.00019, E_in: 55.27321, E_cv: 13.20981
Degree: 2, Alpha: 0.00052, E_in: 58.59941, E_cv: 12.96272
Degree: 2, Alpha: 0.00139, E_in: 63.43724, E_cv: 12.85781
Degree: 2, Alpha: 0.00373, E_in: 70.12868, E_cv: 12.88773
Degree: 2, Alpha: 0.01000, E_in: 79.51064, E_cv: 13.14772
Degree: 2, Alpha: 0.02683, E_in: 93.64010, E_cv: 14.00832
Degree: 2, Alpha: 0.07197, E_in: 115.59037, E_cv: 15.95248
Degree: 2, Alpha: 0.19307, E_in: 142.47135, E_cv: 18.55947
Degree: 2, Alpha: 0.51795, E_in: 169.13159, E_cv: 21.09848
Degree: 2, Alpha: 1.38950, E_in: 199.52394, E_cv: 23.97442
Degree: 2, Alpha: 3.72759, E_in: 248.76167, E_cv: 28.97993
Degree: 2, Alpha: 10.00000, E_in: 333.45412, E_cv: 38.07114
Best Lambda with Polynomial Transformation: 0.0013894954943731374, Best E_cv: 12.85781


In [173]:

# 1. Apply the polynomial transformation with degree 2 to the entire dataset

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# 2. Fit the Ridge Regression model to the entire dataset
beta_best = ridge_regression(X_poly, y, 0.0013894954943731374)

# 3. Scale and transform the given features
test_data = np.array([0.1,11,7,0,0.4,6,70,4,6,300,16,360,10]).reshape(1, -1)
test_data_poly = poly.transform(test_data)

predicted_price = test_data_poly.dot(beta_best[1:])+ beta_best[0]

print(f"Predicted average house price for the given features: {predicted_price[0][0]:.2f}")
len(beta_best)

Predicted average house price for the given features: 23.19


106

In [176]:
def generate_output_file_with_insights():
    with open("output.txt", "w") as f:

        # Part (a)
        f.write("a) List of E_cv and average E_in for different λ values:\n")
        for lambda_val in [0] + list(np.logspace(-5, 1, num=15)):
            e_in, e_cv = k_fold_ridge_regression(X, y, k=10, lambdas=[lambda_val])
            f.write(f"Lambda: {lambda_val}, E_in: {e_in}, E_cv: {e_cv}\n")

        # Part (b)
        best_lambda_poly, best_error_poly = k_fold_poly_ridge(X, y, degree=2, k=10, lambdas=list(np.logspace(-5, 1, num=15)))

        # 1. Apply the polynomial transformation with degree 2 to the entire dataset
        poly = PolynomialFeatures(degree=2)
        X_poly = poly.fit_transform(X)

        # 2. Fit the Ridge Regression model to the entire dataset
        beta_best = ridge_regression(X_poly, y, best_lambda_poly)

        # 3. Scale and transform the given features
        test_data = np.array([0.1, 11, 7, 0, 0.4, 6, 70, 4, 6, 300, 16, 360, 10]).reshape(1, -1)
        test_data_poly = poly.transform(test_data)
        predicted_price = test_data_poly.dot(beta_best[1:]) + beta_best[0]

        f.write("\nb) Model Selection and Prediction:\n")
        f.write("Given a choice, I would select the Ridge Regression model with a Polynomial Transformation. ")
        f.write(f"\nSpecified Parameters:\n{beta_best}")
        # Part (c)
        insights = """
        \nc) Insights:

        1. The Ridge Regression model's performance indicates the potential presence of multicollinearity or other complex relationships within the features of the Boston Housing dataset.

        2. The selection of the optimal λ demonstrates the importance of regularization in preventing overfitting, especially when the dataset has many features or when polynomial transformations are used.

        3. Polynomial transformations helped to improve the model's performance, suggesting that some relationships between the variables are non-linear in nature.

        4. The performance difference between various λ values stresses the importance of hyperparameter tuning in machine learning models.

        5. It's essential to scale features, especially when using Ridge Regression, to ensure that each feature's weight is determined without the influence of its scale.

        In conclusion, the experiments indicate the significance of feature engineering, model selection, and hyperparameter tuning in determining the performance of regression models.
        """
        f.write(insights)

generate_output_file_with_insights()


Alpha: 0.00000, Average E_cv: 23.74032
best_error:23.740321100278067, best_lambda:0
Alpha: 0.00001, Average E_cv: 23.74027
best_error:23.74026574931244, best_lambda:1e-05
Alpha: 0.00003, Average E_cv: 23.74017
best_error:23.740172756333124, best_lambda:2.6826957952797274e-05
Alpha: 0.00007, Average E_cv: 23.73992
best_error:23.739924186815507, best_lambda:7.196856730011514e-05
Alpha: 0.00019, Average E_cv: 23.73926
best_error:23.739263817150167, best_lambda:0.00019306977288832496
Alpha: 0.00052, Average E_cv: 23.73754
best_error:23.73753826322511, best_lambda:0.0005179474679231213
Alpha: 0.00139, Average E_cv: 23.73323
best_error:23.733230498055857, best_lambda:0.0013894954943731374
Alpha: 0.00373, Average E_cv: 23.72381
best_error:23.72381173190798, best_lambda:0.003727593720314938
Alpha: 0.01000, Average E_cv: 23.71115
best_error:23.711151123941193, best_lambda:0.01
Alpha: 0.02683, Average E_cv: 23.73427
best_error:23.7342697287092, best_lambda:0.026826957952797246
Alpha: 0.07197, Av