# Polynomial Regression

In this exercise you use polynomial regression to estimate the height reached by a ball thrown into the air. The ball moves with uniform acceleration (here, gravity), so its height is a quadratic function of time. You need to estimate the initial height of the ball ($h$), the speed at which it was launched ($v$) and the gravitational acceleration ($g$). The equation of motion is: $y = h + vt + \frac{1}{2} gt^2$ (with $y$ measured upward, $g$ is negative).
The file height.csv contains the measured heights (affected by noise), and the file time.csv contains the corresponding time instants.
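Writing the model for all $N$ measurements at once shows why this is still a *linear* regression problem even though the fitted curve is a parabola: the unknowns enter the model linearly. In the sketch below, $t_i$ and $y_i$ (the $i$-th time and height) and the names $X$, $\boldsymbol{\theta}$ are notation introduced for illustration. Note that the third unknown is $\frac{1}{2}g$, not $g$.

$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}}_{\mathbf{y}}
\approx
\underbrace{\begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ \vdots & \vdots & \vdots \\ 1 & t_N & t_N^2 \end{bmatrix}}_{X}
\underbrace{\begin{bmatrix} h \\ v \\ \tfrac{1}{2} g \end{bmatrix}}_{\boldsymbol{\theta}},
\qquad
\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \lVert \mathbf{y} - X\boldsymbol{\theta} \rVert^2 = (X^\top X)^{-1} X^\top \mathbf{y}
$$

The closed form assumes $X^\top X$ is invertible, which holds as soon as there are at least three distinct time instants. The estimate of $g$ is then twice the third component of $\hat{\boldsymbol{\theta}}$.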

In [1]:
#import the required packages
%matplotlib nbagg
import matplotlib.pyplot as plt
import csv
from scipy import stats
import numpy as np
import sklearn as sl
from sklearn import linear_model

In [2]:
# load the data from the time.csv (features) and height.csv (measured values) files
with open('data/time.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    # get all the rows as a list
    x = list(reader)
    # transform x into numpy array
    x = np.array(x).astype(float)
    
with open('data/height.csv', 'r') as f2:
    reader2 = csv.reader(f2, delimiter=',')
    # get all the rows as a list
    y = list(reader2)
    # transform data into numpy array
    y = np.array(y).astype(float)
    
print(x.shape)
print(y.shape)

(201, 1)
(201, 1)
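The csv-module loop above can also be written with `np.loadtxt`. This is a sketch assuming each file holds one number per line, as the `(201, 1)` shapes suggest; `load_column` is a hypothetical helper name, and an in-memory string stands in for `data/time.csv`.

```python
import io

import numpy as np


def load_column(source):
    """Read a single-column CSV (path or file-like object) into an (N, 1) float array."""
    # ndmin=2 keeps the (N, 1) column shape that scikit-learn expects for X and y
    return np.loadtxt(source, delimiter=",", ndmin=2)


# In-memory text standing in for data/time.csv (the real file is not reproduced here)
fake_time_csv = io.StringIO("0.00\n0.01\n0.02\n")
t = load_column(fake_time_csv)
print(t.shape)
```

With the real files, `load_column('data/time.csv')` would replace the first `with` block.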


In [3]:
# try to perform a linear regression (it does not work properly: why?)
# you can use stats.linregress or linear_model.LinearRegression

reg_lin = linear_model.LinearRegression()
reg_lin.fit(x, y)

# coef_ has shape (1, 1) and intercept_ has shape (1,) because y is a column vector
print('slope (sl.linear_model): ', reg_lin.coef_[0, 0], '  intercept (sl.linear_model):', reg_lin.intercept_[0])
# for a straight-line fit, sqrt(R^2) equals the absolute Pearson correlation coefficient
print('correlation coefficient:', np.sqrt(reg_lin.score(x, y)))

slope (sl.linear_model):  0.2242393093296862   intercept (sl.linear_model): 4.215375108703984
correlation coefficient: 0.08764757852933465


**Answer**
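It does not work properly (the correlation coefficient is only about 0.09) because we are fitting a straight line to a quadratic phenomenon: the height first increases and then decreases, which no straight line can follow.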
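The low correlation is not caused by the noise. A minimal sketch with hypothetical, noise-free parameters (not the values behind height.csv): on a time window centered on the apex the parabola is symmetric, so `stats.linregress` finds an essentially flat line and a correlation close to zero. The notebook's own time window need not be symmetric; this only isolates the effect.

```python
import numpy as np
from scipy import stats

# Hypothetical noise-free trajectory (values chosen for illustration)
h, v, g = 1.0, 10.0, -9.8
t_apex = -v / g                        # time at which the ball stops rising
t = np.linspace(0.0, 2 * t_apex, 201)  # window symmetric around the apex
y = h + v * t + 0.5 * g * t**2

res = stats.linregress(t, y)
print(f"slope = {res.slope:.2e}, r = {res.rvalue:.2e}")
```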

In [4]:
# use polynomial regression (the feature vectors have three components:
# they contain all 1s (for bias), the input data $x$ and their squared values $x^2$
# for the regression you can use linear_model.LinearRegression

n_samples = x.shape[0]
dataX = np.zeros([n_samples, 3])

dataX[:,0] = 1 # bias
dataX[:,1] = x.ravel() # input data (time)
dataX[:,2] = x.ravel()**2 # squared input data

reg = linear_model.LinearRegression()
reg.fit(dataX, y)

# LinearRegression also fits its own intercept, which makes the column of ones redundant:
# its coefficient comes out close to zero and the constant term h is read from intercept_
h = reg.intercept_[0]
v = reg.coef_[0, 1]
g = reg.coef_[0, 2] # coefficient of t^2, i.e. g/2 (the acceleration itself is 2*g)
sc = reg.score(dataX, y)

print('initial position: ', h,'  initial speed:', v, ' gravity acceleration / 2:', g)
# reg.score returns R^2, the square of the (multiple) correlation coefficient
print('correlation coefficient:', np.sqrt(sc))

initial position:  0.9649949791503953   initial speed: 10.024380403461107  gravity acceleration / 2: -4.900070547065708
correlation coefficient: 0.9977953125219581
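The same estimate can be obtained without scikit-learn by solving the least-squares problem directly. This sketch uses synthetic data because height.csv and time.csv are not reproduced here; the true parameters, time range and noise level are assumptions. It builds the same three features as `dataX` and doubles the $t^2$ coefficient to recover $g$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for time.csv / height.csv (assumed parameters and noise level)
h_true, v_true, g_true = 1.0, 10.0, -9.81
t = np.linspace(0.0, 2.0, 201)
y = h_true + v_true * t + 0.5 * g_true * t**2 + rng.normal(scale=0.1, size=t.size)

# Design matrix with columns 1, t, t^2 (the same features as dataX)
X = np.column_stack([np.ones_like(t), t, t**2])

# Ordinary least squares: theta = (h, v, g/2)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
h_est, v_est, half_g_est = theta
g_est = 2 * half_g_est  # the t^2 coefficient is g/2, so double it

print(f"h = {h_est:.3f}, v = {v_est:.3f}, g = {g_est:.3f}")
```

Because the column of ones is part of `X` here, no separate intercept is needed, which avoids the redundancy noted in the scikit-learn cell.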


# Question

Explain what you conclude from comparing the linear and the polynomial fits.

In [5]:
# plot the input data and the estimated models

plt.figure()
plt.plot(x, y, 'o', label='Original data')
plt.plot(x, reg_lin.intercept_[0] + reg_lin.coef_[0, 0]*x, 'g-.', label='Fitted linear model')
# g holds the fitted coefficient of t^2, i.e. g/2
plt.plot(x, h + v*x + g*(x**2), 'r-', label='Fitted quadratic model')
plt.legend()
plt.xlabel('Time t')
plt.ylabel('Height y')
plt.show()



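**Answer**
As one might expect, the linear model is not suited to this dataset: it underfits the phenomenon and therefore leads to a greater true error. The polynomial (parabolic) model, on the other hand, is the right hypothesis class, because the data follow the quadratic equation of motion; its correlation coefficient is about 0.998. The fit gives $h \approx 0.96$ and $v \approx 10.02$, and since the fitted coefficient of $t^2$ is $\frac{1}{2}g \approx -4.90$, the estimated gravitational acceleration is $g \approx -9.80$, close in magnitude to standard gravity ($9.81\,\text{m/s}^2$) if the data are in metres and seconds.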
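The claim about the true error can be checked by estimating it on points that were not used for fitting. A sketch on synthetic data (the trajectory parameters, noise level and 200/100 split are assumptions), using `np.polyfit` instead of `LinearRegression` for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy trajectory (parameters and noise level are assumptions)
t = rng.uniform(0.0, 2.0, size=300)
y = 1.0 + 10.0 * t - 4.905 * t**2 + rng.normal(scale=0.1, size=t.size)

# Fit on the first 200 samples, evaluate on the remaining 100 (held-out estimate of the true error)
t_train, y_train = t[:200], y[:200]
t_test, y_test = t[200:], y[200:]

test_mse = {}
for degree in (1, 2):
    coeffs = np.polyfit(t_train, y_train, deg=degree)
    pred = np.polyval(coeffs, t_test)
    test_mse[degree] = np.mean((y_test - pred) ** 2)
    print(f"degree {degree}: held-out MSE = {test_mse[degree]:.4f}")
```

The degree-2 error should land near the noise variance ($0.1^2$), while the straight line keeps a large systematic error no matter how many samples it sees.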