        For all the classes and methods in this assignment you will use the PyTorch library. You should use a double precision data type and the device is either "cpu" or "cuda".



        1. (5 points) Create your own PyTorch class that implements the method of SCAD regularization and variable selection (smoothly clipped absolute deviations) for linear models. Your development should be based on the following references:

https://andrewcharlesjones.github.io/journal/scad.html
https://www.jstor.org/stable/27640214?seq=1
            Test your method one a real data set, and determine a variable selection based on features importance according to SCAD.


In [None]:
#given beta star, do 500 sim of x and y.

In [94]:
import torch # we are going to use pytorch instead of numpy because it's much faster.
import torch.nn as nn
# from ignite.contrib.metrics.regression import R2Score
import torch.optim as optim
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn import linear_model
from scipy.optimize import minimize
from scipy.linalg import toeplitz
from sklearn.metrics import mean_absolute_error as mae, mean_squared_error as mse, r2_score as R2
from sklearn.model_selection import train_test_split as tts

In [2]:
device = torch.device("cpu")
dtype = torch.float64

## Question 1

In [3]:
#beta, lambda, alpha
#convert this to torch, it looks like all of the methods are there already
#initialize beta hat with a sparsity pattern
#what is lambda? regularization term. do we get from the dataset or do we initialize ourselves
def scad_penalty(beta_hat, lambda_val, a_val):
    is_linear = (torch.abs(beta_hat) <= lambda_val)
    is_quadratic = torch.logical_and(lambda_val < torch.abs(beta_hat), torch.abs(beta_hat) <= a_val * lambda_val)
    is_constant = (a_val * lambda_val) < torch.abs(beta_hat)

    linear_part = lambda_val * torch.abs(beta_hat) * is_linear
    quadratic_part = (2 * a_val * lambda_val * torch.abs(beta_hat) - beta_hat**2 - lambda_val**2) / (2 * (a_val - 1)) * is_quadratic
    constant_part = (lambda_val**2 * (a + 1)) / 2 * is_constant
    return linear_part + quadratic_part + constant_part

def scad_derivative(beta_hat, lambda_val, a_val):
    return lambda_val * ((beta_hat <= lambda_val) + (a_val * lambda_val - beta_hat)*((a_val * lambda_val - beta_hat) > 0) / ((a_val - 1) * lambda_val) * (beta_hat > lambda_val))

In [4]:
class StandardScaler:
    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, data):
        """
        Compute the minimum and maximum value of the data for scaling.

        Args:
        - data (torch.Tensor): Input data tensor.
        """
        self.mean = torch.mean(data, dim=0, keepdim=True)
        self.std = torch.std(data, dim=0, keepdim=True)+1e-10

    def transform(self, data):
        """
        Scale the data based on the computed minimum and maximum values.

        Args:
        - data (torch.Tensor): Input data tensor.

        Returns:
        - torch.Tensor: Scaled data tensor.
        """
        if self.mean is None or self.std is None:
            raise ValueError("Scaler has not been fitted yet. Please call 'fit' with appropriate data.")

        scaled_data = (data - self.mean) / (self.std)
        return scaled_data

    def fit_transform(self, data):
        """
        Fit to data, then transform it.

        Args:
        - data (torch.Tensor): Input data tensor.

        Returns:
        - torch.Tensor: Scaled data tensor.
        """
        self.fit(data)
        return self.transform(data)

In [5]:
class MinMaxScaler:
    def __init__(self):
        self.min = None
        self.max = None

    def fit(self, data):
        """
        Compute the minimum and maximum value of the data for scaling.

        Args:
        - data (torch.Tensor): Input data tensor.
        """
        self.min = torch.min(data, dim=0, keepdim=True).values
        self.max = torch.max(data, dim=0, keepdim=True).values

    def transform(self, data):
        """
        Scale the data based on the computed minimum and maximum values.

        Args:
        - data (torch.Tensor): Input data tensor.

        Returns:
        - torch.Tensor: Scaled data tensor.
        """
        if self.min is None or self.max is None:
            raise ValueError("Scaler has not been fitted yet. Please call 'fit' with appropriate data.")

        scaled_data = (data - self.min) / (self.max - self.min)
        return scaled_data

    def fit_transform(self, data):
        """
        Fit to data, then transform it.

        Args:
        - data (torch.Tensor): Input data tensor.

        Returns:
        - torch.Tensor: Scaled data tensor.
        """
        self.fit(data)
        return self.transform(data)

In [151]:
class ElasticNet(nn.Module):
    def __init__(self, input_size, alpha=1.0, l1_ratio=0.5):
        """
        Initialize the ElasticNet regression model.

        Args:
            input_size (int): Number of input features.
            alpha (float): Regularization strength. Higher values of alpha
                emphasize L1 regularization, while lower values emphasize L2 regularization.
            l1_ratio (float): The ratio of L1 regularization to the total
                regularization (L1 + L2). It should be between 0 and 1.

        """
        super(ElasticNet, self).__init__()
        self.input_size = input_size
        self.alpha = alpha
        self.l1_ratio = l1_ratio

        # Define the linear regression layer
        self.linear = nn.Linear(input_size, 1).double()

    def forward(self, x):
        """
        Forward pass of the ElasticNet model.

        Args:
            x (Tensor): Input data with shape (batch_size, input_size).

        Returns:
            Tensor: Predicted values with shape (batch_size, 1).

        """
        return self.linear(x)

    def loss(self, y_pred, y_true):
        """
        Compute the ElasticNet loss function.

        Args:
            y_pred (Tensor): Predicted values with shape (batch_size, 1).
            y_true (Tensor): True target values with shape (batch_size, 1).

        Returns:
            Tensor: The ElasticNet loss.

        """
        mse_loss = nn.MSELoss()(y_pred, y_true)
        l1_reg = torch.norm(self.linear.weight, p=1)
        l2_reg = torch.norm(self.linear.weight, p=2)

        penalty_term = self.alpha * (
                    self.l1_ratio * l1_reg + (1 - self.l1_ratio) * l2_reg
                )
        loss = (1/2)*mse_loss + penalty_term
        return loss,penalty_term

    def fit(self, X, y, num_epochs=100, learning_rate=0.01):
        """
        Fit the ElasticNet model to the training data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).
            y (Tensor): Target values with shape (num_samples, 1).
            num_epochs (int): Number of training epochs.
            learning_rate (float): Learning rate for optimization.

        """
        optimizer = optim.SGD(self.parameters(), lr=learning_rate)

        for epoch in range(num_epochs):
            self.train()
            optimizer.zero_grad()
            y_pred = self(X)
            loss, penalty_term = self.loss(y_pred, y)
            loss.backward()
            optimizer.step()

            if (epoch + 1) % 10 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item(), penalty_term}")

    def predict(self, X):
        """
        Predict target values for input data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).

        Returns:
            Tensor: Predicted values with shape (num_samples, 1).

        """
        self.eval()
        with torch.no_grad():
            y_pred = self(X)
        return y_pred
    def get_coefficients(self):
        """
        Get the coefficients (weights) of the linear regression layer.

        Returns:
            Tensor: Coefficients with shape (output_size, input_size).

        """
        return self.linear.weight


In [7]:
class sqrtLasso(nn.Module):
    def __init__(self, input_size, alpha=0.1):
        """
        Initialize the  regression model.


        """
        super(sqrtLasso, self).__init__()
        self.input_size = input_size
        self.alpha = alpha


        # Define the linear regression layer
        self.linear = nn.Linear(input_size, 1).double()

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (Tensor): Input data with shape (batch_size, input_size).

        Returns:
            Tensor: Predicted values with shape (batch_size, 1).

        """
        return self.linear(x)

    def loss(self, y_pred, y_true):
        """
        Compute the loss function.

        Args:
            y_pred (Tensor): Predicted values with shape (batch_size, 1).
            y_true (Tensor): True target values with shape (batch_size, 1).

        Returns:
            Tensor: The loss.

        """
        mse_loss = nn.MSELoss()(y_pred, y_true)
        l1_reg = torch.norm(self.linear.weight, p=1,dtype=torch.float64)
        # l2_reg = torch.norm(self.linear.weight, p=2,dtype=torch.float64)

        loss = (len(y_true)*mse_loss)**(1/2) + self.alpha * (l1_reg)

        return loss

    def fit(self, X, y, num_epochs=200, learning_rate=0.01):
        """
        Fit the model to the training data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).
            y (Tensor): Target values with shape (num_samples, 1).
            num_epochs (int): Number of training epochs.
            learning_rate (float): Learning rate for optimization.

        """
        optimizer = optim.Adam(self.parameters(), lr=learning_rate)

        for epoch in range(num_epochs):
            self.train()
            optimizer.zero_grad()
            y_pred = self(X)
            loss = self.loss(y_pred, y)
            loss.backward()
            optimizer.step()

            if (epoch + 1) % 100 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}")

    def predict(self, X):
        """
        Predict target values for input data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).

        Returns:
            Tensor: Predicted values with shape (num_samples, 1).

        """
        self.eval()
        with torch.no_grad():
            y_pred = self(X)
        return y_pred
    def get_coefficients(self):
        """
        Get the coefficients (weights) of the linear regression layer.

        Returns:
            Tensor: Coefficients with shape (output_size, input_size).

        """
        return self.linear.weight

In [8]:
def scad_penalty(beta_hat, lambda_val, a_val):
    is_linear = (torch.abs(beta_hat) <= lambda_val)
    is_quadratic = torch.logical_and(lambda_val < torch.abs(beta_hat), torch.abs(beta_hat) <= a_val * lambda_val)
    is_constant = (a_val * lambda_val) < torch.abs(beta_hat)

    linear_part = lambda_val * torch.abs(beta_hat) * is_linear
    quadratic_part = (2 * a_val * lambda_val * torch.abs(beta_hat) - beta_hat**2 - lambda_val**2) / (2 * (a_val - 1)) * is_quadratic
    constant_part = (lambda_val**2 * (a_val + 1)) / 2 * is_constant
    return linear_part + quadratic_part + constant_part

In [11]:
from google.colab import files
files = files.upload()

Saving concrete.csv to concrete.csv


In [12]:
conc = pd.read_csv('concrete.csv')
conc.shape

(1030, 9)

In [72]:
y = conc.iloc[:,-1].values
x = conc.iloc[:, :-1].values

In [93]:
ss = StandardScaler()

In [95]:
xtrain, xtest, ytrain, ytest = tts(x,y,test_size=0.2,shuffle=True,random_state=123)
xtrain = ss.fit_transform(xtrain)
xtest = ss.transform(xtest)

In [96]:
xtrain = torch.tensor(xtrain)
xtest = torch.tensor(xtest)
ytrain = torch.tensor(ytrain)
ytest = torch.tensor(ytest)

  xtrain = torch.tensor(xtrain)
  xtest = torch.tensor(xtest)
  ytrain = torch.tensor(ytrain)
  ytest = torch.tensor(ytest)


In [25]:
class SCAD(nn.Module):
    def __init__(self, input_size, alpha=0.1, lambda_val = .95):
        """
        Initialize the  regression model.


        """
        super(SCAD, self).__init__()
        self.input_size = input_size
        self.alpha = alpha
        self.lambda_val = lambda_val

        # Define the linear regression layer
        self.linear = nn.Linear(input_size, 1).double()

    def scad_penalty(self, beta_hat, lambda_val, a_val):
        is_linear = (torch.abs(beta_hat) <= lambda_val)
        is_quadratic = torch.logical_and(lambda_val < torch.abs(beta_hat), torch.abs(beta_hat) <= a_val * lambda_val)
        is_constant = (a_val * lambda_val) < torch.abs(beta_hat)

        linear_part = lambda_val * torch.abs(beta_hat) * is_linear
        quadratic_part = (2 * a_val * lambda_val * torch.abs(beta_hat) - beta_hat**2 - lambda_val**2) / (2 * (a_val - 1)) * is_quadratic
        constant_part = (lambda_val**2 * (a_val + 1)) / 2 * is_constant
        return linear_part + quadratic_part + constant_part

    def forward(self, x):
        """
        Forward pass of the model.

        Args:
            x (Tensor): Input data with shape (batch_size, input_size).

        Returns:
            Tensor: Predicted values with shape (batch_size, 1).

        """
        return self.linear(x)

    def loss(self, y_pred, y_true):
        """
        Compute the loss function.

        Args:
            y_pred (Tensor): Predicted values with shape (batch_size, 1).
            y_true (Tensor): True target values with shape (batch_size, 1).

        Returns:
            Tensor: The loss.

        """
        mse_loss = nn.MSELoss()(y_pred, y_true)

        beta =torch.tensor([-1,2,2,0,0,0,2,-1])
        beta = beta.reshape(-1,1)

        num_zeros = self.input_size - len(beta)

        # Create a tensor of zeros with the required shape
        additional_zeros = torch.zeros((num_zeros, 1))
        #betas = torch.concat([beta,torch.repeat(0,self.input_size-len(beta)).reshape(-1,1)],axis=0)
        betas = torch.cat((beta, additional_zeros), dim=0)

        scad_pen_term = scad_penalty(betas,self.lambda_val,self.alpha)

        loss = (1/2)*mse_loss + scad_pen_term

        return loss

    def fit(self, X, y, num_epochs=200, learning_rate=0.01):
        """
        Fit the model to the training data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).
            y (Tensor): Target values with shape (num_samples, 1).
            num_epochs (int): Number of training epochs.
            learning_rate (float): Learning rate for optimization.

        """
        optimizer = optim.Adam(self.parameters(), lr=learning_rate)

        for epoch in range(num_epochs):
            self.train()
            optimizer.zero_grad()
            y_pred = self(X)
            loss = self.loss(y_pred, y)
            loss.backward()
            optimizer.step()

            if (epoch + 1) % 100 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item()}")

    def predict(self, X):
        """
        Predict target values for input data.

        Args:
            X (Tensor): Input data with shape (num_samples, input_size).

        Returns:
            Tensor: Predicted values with shape (num_samples, 1).

        """
        self.eval()
        with torch.no_grad():
            y_pred = self(X)
        return y_pred
    def get_coefficients(self):
        """
        Get the coefficients (weights) of the linear regression layer.

        Returns:
            Tensor: Coefficients with shape (output_size, input_size).

        """
        return self.linear.weight

In [174]:
class SCAD(nn.Module):
    def __init__(self, input_size, alpha=0.01, lambda_val=0.78):
        super(SCAD, self).__init__()
        self.input_size = input_size
        self.a_val = alpha
        self.lambda_val = lambda_val
        self.linear = nn.Linear(input_size, 1).double()

    def scad_penalty(self, beta_hat, lambda_val, a_val):
        is_linear = (torch.abs(beta_hat) <= lambda_val)
        is_quadratic = torch.logical_and(lambda_val < torch.abs(beta_hat), torch.abs(beta_hat) <= a_val * lambda_val)
        is_constant = (a_val * lambda_val) < torch.abs(beta_hat)

        linear_part = lambda_val * torch.abs(beta_hat) * is_linear
        quadratic_part = (2 * a_val * lambda_val * torch.abs(beta_hat) - beta_hat**2 - lambda_val**2) / (2 * (a_val - 1)) * is_quadratic
        constant_part = (lambda_val**2 * (a_val + 1)) / 2 * is_constant
        return linear_part + quadratic_part + constant_part

    def forward(self, x):
        return self.linear(x)

    def loss(self, y_pred, y_true):
        mse_loss = nn.MSELoss()(y_pred, y_true)
        beta_hat = self.linear.weight
        scad_pen_term = self.scad_penalty(beta_hat,self.lambda_val,self.a_val)
        scad_pen_term = torch.mean(scad_pen_term)
        loss = (1 / 2) * mse_loss + scad_pen_term
        return loss,scad_pen_term

    def fit(self, X, y, num_epochs=200, learning_rate=0.01):
        optimizer = optim.Adam(self.parameters(), lr=learning_rate)
        for epoch in range(num_epochs):
            self.train()
            optimizer.zero_grad()
            y_pred = self(X)
            y_true = y.view(-1, 1)
            loss,scad_pen_term = self.loss(y_pred, y_true)
            loss.backward()
            optimizer.step()
            if (epoch + 1) % 100 == 0:
                print(f"Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item(),scad_pen_term}")

    def predict(self, X):
        self.eval()
        with torch.no_grad():
            y_pred = self(X)
        return y_pred

    def get_coefficients(self):
        return self.linear.weight

In [108]:
model = SCAD(input_size=xtrain.shape[1])

In [109]:
model.fit(xtrain,ytrain,num_epochs=20000,learning_rate=0.01)

Epoch [100/20000], Loss: (718.5927987275438, tensor(0.4762, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [200/20000], Loss: (665.4216087842962, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [300/20000], Loss: (620.321765121233, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [400/20000], Loss: (581.000314521797, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [500/20000], Loss: (546.0279416489645, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [600/20000], Loss: (514.3266978674164, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [700/20000], Loss: (485.10980125515744, tensor(0.3232, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [800/20000], Loss: (457.90884473267056, tensor(0.3958, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [900/20000], Loss: (432.1893236412573, tensor(0.3683, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [1000/20000], Loss: (407.846488

In [110]:
mse(model.predict(xtest),ytest)

104.2929170816051

In [111]:
model.get_coefficients()

Parameter containing:
tensor([[11.5211,  7.9807,  4.7786, -3.6877,  1.9179,  0.6645,  0.6535,  7.1890]],
       dtype=torch.float64, requires_grad=True)

In [179]:
model1 = sqrtLasso(input_size=xtrain.shape[1])

In [180]:
model1.fit(xtrain,ytrain, num_epochs = 20000,learning_rate = .01)

  return F.mse_loss(input, target, reduction=self.reduction)


Epoch [100/20000], Loss: 1101.1778712341327
Epoch [200/20000], Loss: 1075.5073496126747
Epoch [300/20000], Loss: 1050.061087838795
Epoch [400/20000], Loss: 1024.8547351912694
Epoch [500/20000], Loss: 999.9046351470631
Epoch [600/20000], Loss: 975.227477592403
Epoch [700/20000], Loss: 950.8425891913314
Epoch [800/20000], Loss: 926.7707714303557
Epoch [900/20000], Loss: 903.0333785883208
Epoch [1000/20000], Loss: 879.6546560832107
Epoch [1100/20000], Loss: 856.6606169196449
Epoch [1200/20000], Loss: 834.0780221710646
Epoch [1300/20000], Loss: 811.9368169185057
Epoch [1400/20000], Loss: 790.2688108696594
Epoch [1500/20000], Loss: 769.1066081038941
Epoch [1600/20000], Loss: 748.4858760286678
Epoch [1700/20000], Loss: 728.4438321742853
Epoch [1800/20000], Loss: 709.0179905831
Epoch [1900/20000], Loss: 690.2481981232718
Epoch [2000/20000], Loss: 672.1747941328301
Epoch [2100/20000], Loss: 654.8369557044236
Epoch [2200/20000], Loss: 638.2743479930638
Epoch [2300/20000], Loss: 622.524811915448

In [114]:
mse(model1.predict(xtest),ytest)

275.09446915315567

In [116]:
model1.get_coefficients()

Parameter containing:
tensor([[-0.0012, -0.0012,  0.0020, -0.0012,  0.0007, -0.0017, -0.0008, -0.0007]],
       dtype=torch.float64, requires_grad=True)

In [183]:
model2 = ElasticNet(input_size=xtrain.shape[1],alpha=0.01,l1_ratio=0.5)

In [184]:
model2.fit(xtrain,ytrain, num_epochs = 20000,learning_rate = .01)

Epoch [10/20000], Loss: (660.640437261402, tensor(0.0063, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [20/20000], Loss: (565.809773785003, tensor(0.0061, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [30/20000], Loss: (488.24991689396484, tensor(0.0059, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [40/20000], Loss: (424.81483359827274, tensor(0.0057, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [50/20000], Loss: (372.9318436775245, tensor(0.0056, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [60/20000], Loss: (330.4969942689609, tensor(0.0054, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [70/20000], Loss: (295.7895942836346, tensor(0.0053, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [80/20000], Loss: (267.40237134594105, tensor(0.0052, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [90/20000], Loss: (244.18436635023937, tensor(0.0050, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [100/20000], Loss: (225.1942390072803, tensor(0

In [185]:
mse(model2.predict(xtest),ytest)

275.0824991880445

In [90]:
conc.iloc[:, :-1]

Unnamed: 0,cement,slag,ash,water,superplastic,coarseagg,fineagg,age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360
...,...,...,...,...,...,...,...,...
1025,276.4,116.0,90.3,179.6,8.9,870.1,768.3,28
1026,322.2,0.0,115.6,196.0,10.4,817.9,813.4,28
1027,148.5,139.4,108.6,192.7,6.1,892.4,780.0,28
1028,159.1,186.7,0.0,175.6,11.3,989.6,788.9,28


##By looking at the loss above, we can see that SCAD is providing much better results than sqrtLasso and elastic net. In this example I scaled the data using standard scaler and did a tts

# By looking at the results from getting the weights of the model, we can also see which features are likely to be impactful to the output, which are: 1,2,3,4,5, and 8. This is because they are further away from zero.

## q2
        2. (4 points) Based on the simulation design explained in class, generate 500 data sets where the input features have a strong correlation structure (you may consider a 0.9) and apply ElasticNet, SqrtLasso and SCAD to check which method produces the best approximation of an ideal solution, such as a "betastar" you design with a sparsity pattern.



         3. (1 point) Host your project in your GitHub space.

In [119]:
def make_correlated_features(num_samples,p,rho):
  vcor = []
  for i in range(p):
    vcor.append(rho**i)
  r = toeplitz(vcor)
  mu = np.repeat(0,p)
  x = np.random.multivariate_normal(mu, r, size=num_samples)
  return x

In [120]:
rho =0.9 # closer to one = higher corr
p = 20 #features
n = 500
vcor = []
for i in range(p):
  vcor.append(rho**i)

In [121]:
x = make_correlated_features(n,p,rho)

In [132]:
x = torch.tensor(x)

In [122]:
beta =np.array([-1,2,3,0,0,0,0,2,-1,4])
beta = beta.reshape(-1,1)
betastar = np.concatenate([beta,np.repeat(0,p-len(beta)).reshape(-1,1)],axis=0)

In [133]:
y = x@betastar + 1.5*np.random.normal(size=(n,1))

In [143]:
x = torch.tensor(x,device=device)
y = torch.tensor(y,device=device)

  x = torch.tensor(x,device=device)
  y = torch.tensor(y,device=device)


In [175]:
model_scad = SCAD(input_size=x.shape[1])

In [176]:
model_scad.fit(x,y,num_epochs=20000,learning_rate=0.01)

Epoch [100/20000], Loss: (3.970210735046212, tensor(0.5940, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [200/20000], Loss: (3.0097428616869437, tensor(0.5631, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [300/20000], Loss: (2.713854042146505, tensor(0.4829, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [400/20000], Loss: (2.498473498005278, tensor(0.4579, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [500/20000], Loss: (2.3261197343149154, tensor(0.4484, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [600/20000], Loss: (2.12067892129961, tensor(0.3818, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [700/20000], Loss: (1.9878870600610798, tensor(0.3644, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [800/20000], Loss: (1.8876137933064547, tensor(0.3544, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [900/20000], Loss: (1.8317304156186325, tensor(0.3764, dtype=torch.float64, grad_fn=<MeanBackward0>))
Epoch [1000/20000], Loss: (1.7655

In [152]:
model_en = ElasticNet(input_size=x.shape[1],alpha=0.01,l1_ratio=0.5)

In [154]:
model_en.fit(x,y,num_epochs=20000,learning_rate=0.01)

Epoch [10/20000], Loss: (1.2036475423759185, tensor(0.0859, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [20/20000], Loss: (1.203112475424171, tensor(0.0860, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [30/20000], Loss: (1.2025848792632357, tensor(0.0860, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [40/20000], Loss: (1.202064636856501, tensor(0.0861, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [50/20000], Loss: (1.201551455084848, tensor(0.0862, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [60/20000], Loss: (1.201045579623426, tensor(0.0863, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [70/20000], Loss: (1.200546724573865, tensor(0.0864, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [80/20000], Loss: (1.2000547817022766, tensor(0.0865, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [90/20000], Loss: (1.1995694677752309, tensor(0.0866, dtype=torch.float64, grad_fn=<MulBackward0>))
Epoch [100/20000], Loss: (1.199091029975385, tensor

In [141]:
model_sl = sqrtLasso(input_size=x.shape[1])

In [142]:
model_sl.fit(x,y, num_epochs = 20000,learning_rate = .009 )

Epoch [100/20000], Loss: 51.80284647549593
Epoch [200/20000], Loss: 46.83016926410336
Epoch [300/20000], Loss: 43.69513091237008
Epoch [400/20000], Loss: 41.07820357037198
Epoch [500/20000], Loss: 39.058964776290175
Epoch [600/20000], Loss: 37.51248298943701
Epoch [700/20000], Loss: 36.344010497352805
Epoch [800/20000], Loss: 35.52069845687445
Epoch [900/20000], Loss: 34.981071348522775
Epoch [1000/20000], Loss: 34.61557839427149
Epoch [1100/20000], Loss: 34.375098434251115
Epoch [1200/20000], Loss: 34.221865309711355
Epoch [1300/20000], Loss: 34.12757798379472
Epoch [1400/20000], Loss: 34.07195825371007
Epoch [1500/20000], Loss: 34.04307266099307
Epoch [1600/20000], Loss: 34.02874772868155
Epoch [1700/20000], Loss: 34.021387128422845
Epoch [1800/20000], Loss: 34.01775889830015
Epoch [1900/20000], Loss: 34.01604424561523
Epoch [2000/20000], Loss: 34.01531030815597
Epoch [2100/20000], Loss: 34.01497777824369
Epoch [2200/20000], Loss: 34.01484889650445
Epoch [2300/20000], Loss: 34.014811

## it looks like in this case that the elastic net performs slightly better than the SCAD, with the sqrtLasso coming in a distance third. However, the difference in the size of the penalty term is where elastic net and scad differ, with the penalty term of SCAD being twice as large as that of Elastic Net, which may lead to SCAD being less likely to be overfit when looking at training and testing data. This is backed up when looking at the results from part one, although more testing is needed


