# Polynomial Regression and Regularization

In [1]:
import turicreate as tc

In [2]:
# 1) The table below is a dataset to be used for polynomial regression.
# There are two features, x1 and x2; the label is not shown here.
x = {
    'x1': [1, -1, 2, -2, 2],
    'x2': [10, -10, -10, 10, 1]
}
# a) Hardcode the data into an SFrame and display it.
data = tc.SFrame(x)
print(data)

+----+-----+
| x1 |  x2 |
+----+-----+
| 1  |  10 |
| -1 | -10 |
| 2  | -10 |
| -2 |  10 |
| 2  |  1  |
+----+-----+
[5 rows x 2 columns]



b) Write the most general 4th order polynomial that can be created from these two features:

**4th Order Polynomial:**

a*x1^4 + b*x1^3*x2 + c*x1^2*x2^2 + d*x1*x2^3 + e*x2^4 + f*x1^3 + g*x1^2*x2 + h*x1*x2^2 + i*x2^3 + j*x1^2 + k*x1*x2 + l*x2^2 + m*x1 + n*x2 + p
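
The terms of a general polynomial in two features can be checked by listing every exponent pair (p, q) with p + q <= 4, one pair per monomial x1^p * x2^q. The sketch below is plain Python and does not use turicreate; the helper names `monomials` and `expand` are hypothetical, chosen only for illustration.

```python
def monomials(max_degree):
    """Return (p, q) exponent pairs for x1^p * x2^q with p + q <= max_degree."""
    return [(p, q)
            for p in range(max_degree + 1)
            for q in range(max_degree + 1 - p)]


def expand(x1, x2, max_degree=4):
    """Evaluate every non-constant monomial for one sample (hypothetical helper)."""
    return {f"x1^{p}*x2^{q}": x1 ** p * x2 ** q
            for p, q in monomials(max_degree)
            if (p, q) != (0, 0)}


terms = monomials(4)
print(len(terms))           # 15 terms, counting the constant (0, 0)

row = expand(2, -10)        # third sample of the dataset: x1 = 2, x2 = -10
print(len(row))             # 14 non-constant features
print(row["x1^2*x2^1"])     # -40
```

The 14 non-constant features correspond to the two original columns plus twelve derived columns; the fifteenth term is the constant.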

In [3]:
# c) Add columns to the SFrame to hold all
# combinations of the features up to the 4th order and display it.
data['x1^2'] = data['x1'] ** 2
data['x1^3'] = data['x1'] ** 3
data['x1^4'] = data['x1'] ** 4
data['x2^2'] = data['x2'] ** 2
data['x2^3'] = data['x2'] ** 3
data['x2^4'] = data['x2'] ** 4
data['x1*x2'] = data['x1'] * data['x2']
data['x1^2*x2'] = data['x1'] ** 2 * data['x2']
data['x1*x2^2'] = data['x1'] * data['x2'] ** 2
data['x1^3*x2'] = data['x1'] ** 3 * data['x2']
data['x1^2*x2^2'] = data['x1'] ** 2 * data['x2'] ** 2
data['x1*x2^3'] = data['x1'] * data['x2'] ** 3

print(data)

+----+-----+------+------+------+-------+---------+---------+-------+---------+
| x1 |  x2 | x1^2 | x1^3 | x1^4 |  x2^2 |   x2^3  |   x2^4  | x1*x2 | x1^2*x2 |
+----+-----+------+------+------+-------+---------+---------+-------+---------+
| 1  |  10 | 1.0  | 1.0  | 1.0  | 100.0 |  1000.0 | 10000.0 |   10  |   10.0  |
| -1 | -10 | 1.0  | -1.0 | 1.0  | 100.0 | -1000.0 | 10000.0 |   10  |  -10.0  |
| 2  | -10 | 4.0  | 8.0  | 16.0 | 100.0 | -1000.0 | 10000.0 |  -20  |  -40.0  |
| -2 |  10 | 4.0  | -8.0 | 16.0 | 100.0 |  1000.0 | 10000.0 |  -20  |   40.0  |
| 2  |  1  | 4.0  | 8.0  | 16.0 |  1.0  |   1.0   |   1.0   |   2   |   4.0   |
+----+-----+------+------+------+-------+---------+---------+-------+---------+
+---------+---------+-----------+---------+
| x1*x2^2 | x1^3*x2 | x1^2*x2^2 | x1*x2^3 |
+---------+---------+-----------+---------+
|  100.0  |   10.0  |   100.0   |  1000.0 |
|  -100.0 |   10.0  |   100.0   |  1000.0 |
|  200.0  |  -80.0  |   400.0   | -2000.0 |
|  -200.0 |  -80

In [4]:
# 2) Consider the following table showing a feature 𝑥 and label 𝑦 of five samples.
xy = {
    'x': [1, 2, 3, 4, 5],
    'y': [2, 2.5, 6, 14.5, 34]
}
# a) First, hardcode this data into an SFrame.
data2 = tc.SFrame(xy)
print(data2)

# Now, consider that you have trained a 2nd order polynomial regression model: ŷ = 2x^2 - 5x + 4.
def predict(x):
    return 2 * x**2 - 5 * x + 4

# b) Add a column to the SFrame with the predictions
data2['predictions'] = data2['x'].apply(predict)
print(data2)

# c) Calculate and display the mean absolute error (MAE) using code
mean_abs_error = sum( abs(data2['y']-data2['predictions']) ) / len(data2)
print("mean absolute error =", mean_abs_error)


+---+------+
| x |  y   |
+---+------+
| 1 | 2.0  |
| 2 | 2.5  |
| 3 | 6.0  |
| 4 | 14.5 |
| 5 | 34.0 |
+---+------+
[5 rows x 2 columns]

+---+------+-------------+
| x |  y   | predictions |
+---+------+-------------+
| 1 | 2.0  |      1      |
| 2 | 2.5  |      2      |
| 3 | 6.0  |      7      |
| 4 | 14.5 |      16     |
| 5 | 34.0 |      29     |
+---+------+-------------+
[5 rows x 3 columns]

mean absolute error = 1.8
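
As a cross-check of the value above, the same MAE can be reproduced with plain Python lists, which makes the per-sample absolute errors visible. This is a minimal sketch that does not depend on turicreate; the variable names are illustrative only.

```python
x = [1, 2, 3, 4, 5]
y = [2, 2.5, 6, 14.5, 34]

# Predictions from the model y_hat = 2x^2 - 5x + 4
predictions = [2 * v ** 2 - 5 * v + 4 for v in x]

# Absolute error per sample, then the mean over all samples
abs_errors = [abs(actual - pred) for actual, pred in zip(y, predictions)]
mae = sum(abs_errors) / len(abs_errors)

print(abs_errors)  # [1, 0.5, 1, 1.5, 5]
print(mae)         # 1.8
```

The last sample contributes 5 of the 9 total units of absolute error, so it dominates the MAE.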


In [5]:
# The regularization parameter is 𝜆 = 0.1; write code to calculate and display the following:
# d) The total lasso regression error of our model (using the L1 norm and MAE)

𝜆 = 0.1
coefficients = [2, -5, 4]

l1 = sum(abs(num) for num in coefficients)
lasso_regression_error = mean_abs_error + 𝜆 * l1
print("Total Lasso Regression Error: ", lasso_regression_error)

Total Lasso Regression Error:  2.9000000000000004


In [6]:
# e) The total ridge regression error of our model (using the squared L2 norm, i.e. the sum of squared coefficients, and MAE)
l2 = sum(pow(num, 2) for num in coefficients)
ridge_regression_error = mean_abs_error + 𝜆 * l2
print("Total Ridge Regression Error: ", ridge_regression_error)

Total Ridge Regression Error:  6.3
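
The two regularized errors differ only in the penalty term, so they can be folded into a single function. The sketch below assumes the same conventions as the cells above: the base error is the MAE of 1.8, the penalty covers all three coefficients including the constant term, and the ridge penalty is the sum of squared coefficients. The function name `regularized_error` is hypothetical.

```python
def regularized_error(base_error, coefficients, lam, penalty):
    """Return base_error + lam * penalty term (hypothetical helper)."""
    if penalty == "l1":
        norm = sum(abs(c) for c in coefficients)   # L1 norm
    elif penalty == "l2":
        norm = sum(c ** 2 for c in coefficients)   # squared L2 norm
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return base_error + lam * norm


coefficients = [2, -5, 4]
lasso = regularized_error(1.8, coefficients, 0.1, "l1")
ridge = regularized_error(1.8, coefficients, 0.1, "l2")

# Rounding hides floating-point noise such as 2.9000000000000004
print(round(lasso, 10), round(ridge, 10))  # 2.9 6.3
```

Many formulations leave the constant term out of the penalty; with that convention the coefficient list would be `[2, -5]` and both totals would be smaller.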
