In [1]:
import numpy as np
training_size = 100000
validation_size = 50000
test_size = 50000

In [2]:
def f(x):
  return 100*x**3 - 17*x**2 + 3*x + 11

**Data Generation:**
We generate the data according to the following function:
> $f(x) = 100x^3 - 17x^2 + 3x + 11$

But real-world data rarely follow such a clean polynomial exactly; the measurements contain noise. Let's say that the noise follows a normal distribution with mean 0 and standard deviation 0.1.
So, we will generate our data using the following function:
> $g(x) = 100x^3 - 17x^2 + 3x + 11 + ϵ $

Here, $ϵ \sim \mathcal{N}(0, 0.1^2)$, i.e. a normal distribution with mean 0 and standard deviation 0.1. Let the training data size be $n$. We draw $x_{1}, x_{2}, \cdots, x_{n}$ from the uniform distribution on $[0, 1]$ and, for each $x_{i}$, compute $y_{i} = g(x_{i})$ with an independent noise draw.

Now, let our hypothesis be a cubic polynomial of $x$. Hence, $h(x) = ax^3 + bx^2 + cx + d$. From the training dataset, we can create $n$ linear equations with 4 unknowns $(a, b, c, d)$. 
 >$ax_{1}^3 + bx_{1}^2 + cx_{1} + d = y_{1}$

 > $ax_{2}^3 + bx_{2}^2 + cx_{2} + d = y_{2}$

 > ...

 > $ax_{n}^3 + bx_{n}^2 + cx_{n} + d = y_{n}$
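
In matrix form, these $n$ equations read $Xw \approx y$, where row $i$ of $X$ is $(x_i^3, x_i^2, x_i, 1)$ and $w = (a, b, c, d)$. With more equations than unknowns and noisy $y_i$, there is generally no exact solution, so the system is solved in the least-squares sense. A minimal sketch, assuming the same polynomial and noise level as above (the seed and the sample size of 1000 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Sample x uniformly on [0, 1] and evaluate the noisy cubic g(x).
x = rng.uniform(0.0, 1.0, 1000)
y = 100 * x**3 - 17 * x**2 + 3 * x + 11 + rng.normal(0.0, 0.1, 1000)

# Design matrix: one row (x^3, x^2, x, 1) per equation.
X = np.column_stack([x**3, x**2, x, np.ones_like(x)])

# Least-squares solution of the overdetermined system X w = y.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # estimates of (a, b, c, d)
```

The recovered weights should land close to $(100, -17, 3, 11)$, with the remaining gap caused by the noise.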

Notice that these equations are linear in the unknowns $(a, b, c, d)$, even though they are not linear in $x$. So, we can solve this polynomial regression problem using just linear regression. Linear regression itself can be solved in several ways, such as the closed-form least-squares solution (obtained by setting the gradient of the error to zero) or gradient descent. The data generation part is shown below.

In [3]:
def generate(a, b, mean, std, size):
  # Draw x uniformly from [a, b] and add Gaussian noise to f(x).
  x = np.random.uniform(a, b, size)
  noise = np.random.normal(mean, std, size)
  y = f(x) + noise  # f works element-wise on NumPy arrays
  return x, y

In [4]:
train_x, train_y = generate(0, 1, 0, 0.1, training_size)
validation_x, validation_y = generate(0, 1, 0, 0.1, validation_size)
test_x, test_y = generate(0, 1, 0, 0.1, test_size)
print(train_x)
print(train_y)

[0.20093004 0.71121649 0.4852239  ... 0.5260509  0.00817612 0.60059367]
[11.85437583 40.50269755 19.85436883 ... 22.34172921 11.17225145
 28.33371072]


**Regularization and Weight Decay**
In linear regression, our objective function is the mean squared error, which we want to minimize.
>> $E = \frac{\sum_{i = 1}^{n} (y_{i} - h(x_{i}))^{2}}{n}$

But a more complex hypothesis can achieve a lower empirical (training) error simply because it has more freedom to fit the noise, so minimizing this error alone may lead us to pick it over a simpler hypothesis. To penalize more complex hypotheses, we can instead minimize the following objective function.

>> $\frac{\sum_{i = 1}^{n} (y_{i} - h(x_{i}))^{2}}{n} + \alpha \, c(h)$

Here $c(h)$ measures the complexity of the hypothesis $h$, and $\alpha \geq 0$ controls how strongly complexity is penalized. There are many ways of defining the complexity of a hypothesis. Two of the most popular are:

$c(h) = |w_{1}|^{2} + |w_{2}|^{2} + \cdots + |w_{m}|^{2}$, or

$c(h) = |w_{1}| + |w_{2}| + \cdots + |w_{m}|$

where $w_{1}, \cdots, w_{m}$ are the $m$ parameters of the hypothesis. If we use the first formula as the complexity, the regression problem is called ridge regression; if we use the second, it is called lasso. This sort of regularization is often called 'weight decay', a term most commonly used for the squared penalty.
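
To make the two penalties concrete, the sketch below evaluates the regularized objective $E + \alpha \, c(h)$ for a given weight vector. The helper name `penalized_mse` and the toy numbers are illustrative assumptions, not part of the assignment. Note also that scikit-learn's `Ridge` and `Lasso` scale the data term differently, so their `alpha` is not numerically identical to the $\alpha$ used here.

```python
import numpy as np

def penalized_mse(X, y, w, alpha, penalty="l2"):
    """Mean squared error plus alpha * c(h) (hypothetical helper)."""
    mse = np.mean((y - X @ w) ** 2)
    if penalty == "l2":      # ridge: sum of squared weights
        complexity = np.sum(w ** 2)
    elif penalty == "l1":    # lasso: sum of absolute weights
        complexity = np.sum(np.abs(w))
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return mse + alpha * complexity

# Toy data: three points that the weights (2, 1) fit exactly, so the MSE is 0.
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w = np.array([2.0, 1.0])
y = X @ w

ridge_objective = penalized_mse(X, y, w, alpha=0.1, penalty="l2")  # 0 + 0.1 * (4 + 1)
lasso_objective = penalized_mse(X, y, w, alpha=0.1, penalty="l1")  # 0 + 0.1 * (2 + 1)
print(ridge_objective, lasso_objective)
```

For the same weights, the squared penalty punishes the large weight more heavily than the absolute-value penalty does.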


**Validation**: Our model involves several choices: the degree of the polynomial, the coefficients of the polynomial, the regularization parameter $\alpha$, the learning rate $\lambda$, and so on. How do we know which combination of these choices is going to yield the best result? We cannot rely on the empirical error to decide, because the models were fit to the training data, so that error is optimistically biased. But what if we had access to some spare data? Let's call this the validation data. In that case, we could evaluate our different models on the validation data and choose the best-performing model. This process is called validation.
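
As a small illustration of this selection procedure, the sketch below fits polynomials of several degrees on a training sample and picks the degree with the lowest validation error. It uses `np.polyfit` instead of the scikit-learn models from the task, and the sample sizes, seed, and candidate degrees are arbitrary assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for reproducibility

def sample(size):
    """Draw noisy samples of the cubic used in this notebook."""
    x = rng.uniform(0.0, 1.0, size)
    y = 100 * x**3 - 17 * x**2 + 3 * x + 11 + rng.normal(0.0, 0.1, size)
    return x, y

x_tr, y_tr = sample(2000)   # used to fit each candidate
x_va, y_va = sample(1000)   # held out, used only to compare candidates

val_error = {}
for degree in (1, 2, 3, 4):
    coeffs = np.polyfit(x_tr, y_tr, degree)       # least-squares polynomial fit
    pred = np.polyval(coeffs, x_va)
    val_error[degree] = np.mean((y_va - pred) ** 2)

best_degree = min(val_error, key=val_error.get)
print(val_error)
print("chosen degree:", best_degree)
```

Degrees 1 and 2 cannot represent the cubic term, so their validation error is much larger; degrees 3 and 4 end up close to the noise variance.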

**Cross Validation**: Sometimes we do not have access to spare data, so we need to treat part of our training data as the validation data. In $k$-fold cross validation, the training data is split into $k$ parts. Each part in turn is treated as the validation data while the remaining $k-1$ parts are used for training. The final validation error is the average of the validation errors over these $k$ parts. Every candidate model goes through this $k$-fold cross validation, and the model that performs best is chosen for the testing phase.
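
The $k$-fold procedure can be sketched in a few lines of NumPy. The function name `k_fold_mse`, the use of `np.polyfit` as the model, and the data sizes are assumptions for illustration; the same loop would apply to any of the regressors in the task.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)  # k disjoint index sets
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((y[val_idx] - pred) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.uniform(0.0, 1.0, 1000)
y = 100 * x**3 - 17 * x**2 + 3 * x + 11 + rng.normal(0.0, 0.1, 1000)

for degree in (2, 3, 4):
    print(degree, k_fold_mse(x, y, degree))
```

Every point is used for validation exactly once, so no data has to be set aside permanently.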

**The Task**

***Part 1:***
1.   You have 6 models. 
> * linear polynomial regressor of degree 4
> * linear polynomial regressor of degree 3
> * lasso polynomial regressor of degree 4 with $\alpha$ = 0.01
> * lasso polynomial regressor of degree 3 with $\alpha$ = 0.01
> * ridge polynomial regressor of degree 4 with $\alpha$ = 0.1
> * ridge polynomial regressor of degree 3 with $\alpha$ = 0.1


2.   Train on the given training data.
3.   Test on the given validation data.
4.   Choose the model that performs the best.
5.   Test your chosen model on the testing data.

***Part 2:***
1. Suppose we don't have the validation data. Use the training data to perform 5-fold cross validation and then choose the best model.
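
One possible way to organize Part 1 is sketched below with scikit-learn pipelines, as an alternative to building the design matrices by hand. The names `poly_model`, `models`, and `scores` are illustrative, and the block generates its own smaller data set (sizes and seed are arbitrary assumptions) so that it runs on its own.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # arbitrary seed

def sample(size):
    x = rng.uniform(0.0, 1.0, size)
    y = 100 * x**3 - 17 * x**2 + 3 * x + 11 + rng.normal(0.0, 0.1, size)
    return x.reshape(-1, 1), y  # scikit-learn expects a 2-D feature array

X_tr, y_tr = sample(5000)
X_va, y_va = sample(2000)

def poly_model(degree, regressor):
    # include_bias=False: the regressor fits the intercept itself.
    return make_pipeline(PolynomialFeatures(degree, include_bias=False), regressor)

models = {
    "linear, degree 4": poly_model(4, LinearRegression()),
    "linear, degree 3": poly_model(3, LinearRegression()),
    "lasso, degree 4": poly_model(4, Lasso(alpha=0.01)),
    "lasso, degree 3": poly_model(3, Lasso(alpha=0.01)),
    "ridge, degree 4": poly_model(4, Ridge(alpha=0.1)),
    "ridge, degree 3": poly_model(3, Ridge(alpha=0.1)),
}

# Train each model, then score it on the validation data.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = mean_squared_error(y_va, model.predict(X_va))
    print(name, scores[name])

best = min(scores, key=scores.get)
print("best on validation:", best)
```

Only the model selected this way would then be evaluated once on the test data.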



In [5]:
# ######### PART 1 #########
# Linear polynomial regressor of degree 3

from sklearn.linear_model import LinearRegression

# Design matrix with columns x^3, x^2, x, 1 (one row per training point).
X3 = np.column_stack([train_x**3, train_x**2, train_x, np.ones_like(train_x)])
reg = LinearRegression().fit(X3, train_y)
# The coefficient of the constant column is 0 because LinearRegression
# fits the intercept separately; it is available as reg.intercept_.
print(reg.coef_)


[ 99.9876187  -16.98375718   2.99315254   0.        ]




In [6]:
# Build the same degree-3 design matrix for the validation data.
validationX3 = np.array([validation_x**3, validation_x**2, validation_x,
                         np.ones_like(validation_x)])
print(validationX3)   # one row per feature

validationX3 = validationX3.transpose()
print(validationX3)   # one row per validation point

prediction = reg.predict(validationX3)


[[0.78225062 0.17926129 0.43832732 ... 0.13310757 0.44515229 0.02224487]
 [0.84897964 0.31792475 0.57703084 ... 0.26069648 0.58300518 0.07909577]
 [0.92140091 0.56384816 0.75962546 ... 0.51058445 0.76354776 0.28123971]
 [1.         1.         1.         ... 1.         1.         1.        ]]
[[0.78225062 0.84897964 0.92140091 1.        ]
 [0.17926129 0.31792475 0.56384816 1.        ]
 [0.43832732 0.57703084 0.75962546 1.        ]
 ...
 [0.13310757 0.26069648 0.51058445 1.        ]
 [0.44515229 0.58300518 0.76354776 1.        ]
 [0.02224487 0.07909577 0.28123971 1.        ]]




In [7]:
error = (np.sum((validation_y-prediction)**2))/len(validation_y)
print(error)

0.0100394505224592


In [8]:
# ####### Linear polynomial regressor of degree 4

from sklearn.linear_model import LinearRegression

# Design matrix with columns x^4, x^3, x^2, x, 1.
X4 = np.column_stack([train_x**4, train_x**3, train_x**2, train_x,
                      np.ones_like(train_x)])
reg = LinearRegression().fit(X4, train_y)



In [9]:
# Degree-4 design matrix for the validation data.
validationX4 = np.column_stack([validation_x**4, validation_x**3,
                                validation_x**2, validation_x,
                                np.ones_like(validation_x)])

prediction = reg.predict(validationX4)



In [10]:
error = (sum((validation_y-prediction)**2))/len(validation_y)
print(error)

0.01003973512829331


In [11]:
# Is the degree-3 validation error larger than the degree-4 one?
print(0.0100394505224592 > 0.01003973512829331)

False


In [12]:
# ####### Lasso polynomial regressor of degree 4 with α = 0.01
from sklearn import linear_model
clf = linear_model.Lasso(alpha=0.01)

clf.fit(X4,train_y)
pred = clf.predict(validationX4)




In [13]:
error = (sum((validation_y-pred)**2))/len(validation_y)
print(error)

0.05045280483847748


In [14]:
# ####### Lasso polynomial regressor of degree 3 with α = 0.01
from sklearn import linear_model
clf = linear_model.Lasso(alpha=0.01)

clf.fit(X3,train_y)
pred = clf.predict(validationX3)



In [15]:
error = (sum((validation_y-pred)**2))/len(validation_y)
print(error)


0.08532104473042325


In [16]:
# ####### Ridge polynomial regressor of degree 4 with α = 0.1

from sklearn.linear_model import Ridge
clf = Ridge(alpha=.1)
clf.fit(X4, train_y)
prediction = clf.predict(validationX4)



In [17]:
error = (sum((validation_y-prediction)**2))/len(validation_y)
print(error)

0.011308480513397183


In [18]:
# ####### Ridge polynomial regressor of degree 3 with α = 0.1

clf = Ridge(alpha=.1)
clf.fit(X3, train_y)
prediction = clf.predict(validationX3)



In [19]:
error = (sum((validation_y-prediction)**2))/len(validation_y)
print(error)

0.01009264716627077


In [20]:
# ######### PART 2: 5-fold cross validation (linear regressor, degree 3) #########
from sklearn.linear_model import LinearRegression
from sklearn.utils import shuffle

# Shuffle once, then split the training data into 5 equal folds.
train_x, train_y = shuffle(train_x, train_y)
n_folds = 5
foldsize = len(train_x) // n_folds

error = 0.0
for i in range(n_folds):
    # Fold i is held out for validation; the remaining folds are used for training.
    held_out = np.arange(i * foldsize, (i + 1) * foldsize)
    testx, testy = train_x[held_out], train_y[held_out]
    fold_x = np.delete(train_x, held_out)
    fold_y = np.delete(train_y, held_out)

    # Fit on the 4 training folds.
    X3 = np.column_stack([fold_x**3, fold_x**2, fold_x, np.ones_like(fold_x)])
    model = LinearRegression().fit(X3, fold_y)

    # Evaluate on the held-out fold.
    Xnew = np.column_stack([testx**3, testx**2, testx, np.ones_like(testx)])
    pred = model.predict(Xnew)
    err = np.mean((testy - pred)**2)
    print(err)
    error += err

# Cross-validation error: the average over the 5 folds.
print(error / n_folds)

In [None]:


# X_train, y_train, X_test and y_test are not defined anywhere above.
# ASSUMPTION: they stand for the training and test sets generated earlier.
X_train, y_train = train_x, train_y
X_test, y_test = test_x, test_y

# Fit the degree-3 linear regressor on the training data.
X3 = np.column_stack([X_train**3, X_train**2, X_train, np.ones_like(X_train)])
model = LinearRegression().fit(X3, y_train)

# Predict on the test data.
Xnew = np.column_stack([X_test**3, X_test**2, X_test, np.ones_like(X_test)])
pred = model.predict(Xnew)

In [None]:
err = sum((y_test-pred)**2)/len(y_test)
print(err)