## Import required packages

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model

## Load the data

In [None]:
data = pd.read_csv("AIML_DS_REGR01_SIMPLEPENDULUMOSCILLATIONDATA.txt", sep=" ", header=None, names=['l', 't'])

To understand the structure of the data, let us print the first five and the last five rows of the dataset.

In [None]:
# First five rows from the dataset
data.head()

In [None]:
# Last five rows from the dataset
data.tail()

In [None]:
data.shape

Extract the column values as arrays for visualization.

In [None]:
# Store the l and t column values in two variables
l = data['l'].values
t = data['t'].values

In [None]:
l.shape, t.shape

In [None]:
# Plot l vs t
plt.figure(figsize=(12,10))
plt.plot(l, t)
plt.show()

The graph above, which was obtained by connecting the points in the order of their occurrence, does not look like a straight line. The pattern may be easier to see if we plot only the points.
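
`plt.plot` joins points in the order they appear in the arrays, so rows that are not ordered by length produce a zigzag. If that is the cause here (an assumption; inspect `data` first), sorting both arrays by length before drawing gives a cleaner line. A minimal sketch with made-up values, where the `_demo` names are hypothetical:

```python
import numpy as np

# Hypothetical, unsorted measurements (illustrative values, not the dataset).
l_demo = np.array([0.5, 0.1, 0.9, 0.3])
t_demo = np.array([1.42, 0.63, 1.90, 1.10])

# argsort returns the indices that would sort l_demo in ascending order.
order = np.argsort(l_demo)

# Apply the same ordering to both arrays so each (l, t) pair stays together.
l_sorted = l_demo[order]
t_sorted = t_demo[order]

print(l_sorted)
print(t_sorted)
```

The same indexing could be applied to `l` and `t` before calling `plt.plot(l, t)`.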

In [None]:
# Plot only the points, without connecting lines
plt.figure(figsize=(12, 10))
plt.plot(l, t, '.', color='black')
plt.show()

The graph above is still not a straight line. From domain knowledge, we know that $l \propto t^2$.
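
For reference, this proportionality follows from the small-angle formula for the period of a simple pendulum. The sketch below assumes that $t$ is the period (or a fixed multiple of it) and introduces $g$ for the acceleration due to gravity, a symbol that does not appear elsewhere in this notebook:

$$
t = 2\pi\sqrt{\frac{l}{g}} \quad\Longrightarrow\quad t^2 = \frac{4\pi^2}{g}\, l
$$

Under that assumption, $t^2$ is a linear function of $l$ with slope $4\pi^2/g$ and zero intercept, which is why a plot of $l$ against $t^2$ should fall on a straight line.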

Let us plot $l$ vs $t^2$ instead of $l$ vs $t$:

In [None]:
# Square each time value element-wise
tsq = t * t

In [None]:
# Plot l vs t^2 as individual points
plt.figure(figsize=(16, 10))
plt.plot(l, tsq, '.', color='black')
plt.show()

**Note**: Our dataset has shape (90, 2); the feature array and the label array each have shape (90,). scikit-learn expects the features as a 2-D array of shape (n_samples, n_features), and a 1-D array of shape (90,) does not say whether it holds 90 samples with one feature each or one sample with 90 features. To overcome this, we reshape the features into two dimensions, (90, 1), so that each value is treated as one sample with a single feature.

The label is a single value per sample, so it does not need to be reshaped. However, in later experiments where we use train_test_split, the features and labels must have the same number of samples.

Hence we reshape both $l$ and $t^2$ into 2-D arrays.

In this notebook, you can try fitting with and without reshaping the label.

To learn more about reshaping, refer to the link below:
https://numpy.org/doc/stable/reference/generated/numpy.reshape.html
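
As a quick illustration of what `reshape(-1, 1)` does, here is a minimal sketch on a small made-up array (`x` and `x_2d` are hypothetical names, not variables from this notebook). The `-1` asks NumPy to infer that dimension from the array's length:

```python
import numpy as np

# A 1-D array with three values: shape (3,)
x = np.array([0.1, 0.2, 0.3])

# -1 lets NumPy infer the number of rows; 1 fixes a single column.
x_2d = x.reshape(-1, 1)

print(x.shape)     # 1-D: three values in a flat array
print(x_2d.shape)  # 2-D: three rows, one column

# ravel() flattens the column back into a 1-D array.
x_back = x_2d.ravel()
```

The values are unchanged; only the way they are arranged into rows and columns differs.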

In [None]:
l.shape, t.shape

In [None]:
# Reshape l into a single-column 2-D array
length = l.reshape(-1, 1)
length.shape

In [None]:
# Reshape t^2 into a single-column 2-D array
tsq1 = tsq.reshape(-1, 1)
tsq1.shape

In [None]:
# Create Linear Regression object
regr = linear_model.LinearRegression()

# Fit the model on the full dataset (length as feature, t^2 as label)
regr.fit(length, tsq1)

# Predict t^2 for each length using the fitted line
pred_tsq1 = regr.predict(length)

In [None]:
# Plot a scatter representing l vs tsq
plt.scatter(length, tsq1,  color='black')

# Plot the line predicted using linear regression model
plt.plot(length, pred_tsq1, color='blue', linewidth=3)

plt.show()
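
Once a model is fitted, the line it found can be read from its `coef_` (slope) and `intercept_` attributes, and `score` returns the coefficient of determination $R^2$. The following is a minimal sketch on synthetic stand-in data; the `_demo` names and values are made up for illustration and are not this notebook's dataset:

```python
import numpy as np
from sklearn import linear_model

# Synthetic stand-in data (NOT the notebook's dataset): t^2 = 4 * l exactly.
length_demo = np.linspace(0.1, 1.0, 10).reshape(-1, 1)  # features, shape (10, 1)
tsq_demo = 4.0 * length_demo.ravel()                    # labels, shape (10,)

regr_demo = linear_model.LinearRegression()
regr_demo.fit(length_demo, tsq_demo)

# With a 1-D label, coef_ has one entry per feature and intercept_ is a scalar.
slope = float(regr_demo.coef_[0])
intercept = float(regr_demo.intercept_)

# R^2 on the same data the model was fitted on.
r2 = regr_demo.score(length_demo, tsq_demo)

print(slope, intercept, r2)
```

In this notebook the label `tsq1` is 2-D, so the corresponding values would be `regr.coef_[0][0]` and `regr.intercept_[0]`.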