### CSV Course Specifications

This notebook demonstrates the course specifications linear regression algorithm using a CSV file as input, which lets students work with larger data sets.

This can also be viewed as Python scripts in [5.export_import.py](5.export_import.py) and [5.test_import.py](5.test_import.py).

#### Step 1

Load the required dependencies, including [pickle](https://docs.python.org/3/library/pickle.html), the native Python library for serialising data objects:

> [!Caution]
> The pickle module is not secure. Only unpickle data you trust.

In [4]:
# Import frameworks
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('../../style_Matplotlib_charts.mplstyle')

from sklearn.linear_model import LinearRegression
import pickle
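
The caution above can be put into practice with a simple integrity check. The sketch below is one possible approach, not part of the course material: it records a SHA-256 digest when an object is pickled and refuses to unpickle if the bytes have changed. The helper names `dump_with_digest` and `load_if_unchanged` are hypothetical, and a plain dictionary stands in for a trained model.

```python
import hashlib
import pickle


def dump_with_digest(obj):
    """Pickle obj and return (payload_bytes, sha256_hex_digest)."""
    payload = pickle.dumps(obj)
    return payload, hashlib.sha256(payload).hexdigest()


def load_if_unchanged(payload, expected_digest):
    """Unpickle payload only if its SHA-256 digest matches the recorded one."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_digest:
        raise ValueError("Pickle payload does not match the recorded digest")
    return pickle.loads(payload)


# Hypothetical stand-in for a trained model
payload, digest = dump_with_digest({"slope": 2.0, "intercept": 1.0})
restored = load_if_unchanged(payload, digest)
print(restored)
```

This only detects changes made after the file was written. It does not make a pickle from an untrusted author safe to load.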

#### Step 2

Open and parse the CSV file, then store the feature and target columns as NumPy arrays.

In [5]:
training_data = pd.read_csv('course_specifications_data.csv', delimiter=',')
x = np.array(training_data.iloc[:,1]).reshape(-1, 1)
y = np.array(training_data.iloc[:,0])

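
To see what the two array assignments produce without needing the CSV file on disk, the sketch below parses a small in-memory CSV. The column names `Target` and `Feature` and the values are made up for illustration; the real file's headings and data may differ. When reading the real file, a `FileNotFoundError` from `pd.read_csv` means the relative path does not resolve from the current working directory.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: target in the first column, feature in the second
csv_text = "Target,Feature\n3,1\n5,2\n7,3\n"
training_data = pd.read_csv(io.StringIO(csv_text))

# Feature as a 2-D column vector, the shape scikit-learn estimators expect
x = np.array(training_data.iloc[:, 1]).reshape(-1, 1)
# Target as a 1-D array
y = np.array(training_data.iloc[:, 0])

print(x.shape, y.shape)  # prints (3, 1) (3,)
```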

#### Optional Step

Apply your Python skills to query the data set: count the training examples and inspect a sample of the data.

In [None]:
m = len(x)
print(f"Number of training examples is: {m}")
table = pd.DataFrame({
    training_data.columns[1]: x.flatten(),  # Feature: flatten x for easy display
    training_data.columns[0]: y             # Target
})
print(table.head())

#### Optional Step

Plot the feature `x` and target `y` data on a graph using the column headings as the graph labels.

In [None]:
# Plot the data points
plt.scatter(x, y, marker='x', c='r')
# Set the title
plt.title("NESA Course Specifications Data")
# Set the y-axis label
plt.ylabel(f'Training {training_data.columns[0]}')
# Set the x-axis label
plt.xlabel(f'Training {training_data.columns[1]}')
plt.show()

#### Step 3

Use the [sklearn](https://scikit-learn.org/stable/) library to fit the model to the training data. `LinearRegression` uses ordinary least squares, so the result is the line of best fit with the lowest squared-error cost.

In [None]:
# Create the model
my_model = LinearRegression()
# Fit the model to the data
my_model.fit(x, y)
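
For a single feature, the fitted line is fully described by one slope (`coef_`) and one intercept (`intercept_`). The sketch below uses made-up data that lies exactly on y = 2x + 1 and cross-checks the result against NumPy's `polyfit`. The data values are illustrative only and are not taken from the course CSV.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data lying exactly on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y = 2.0 * x.flatten() + 1.0

model = LinearRegression()
model.fit(x, y)

# One coefficient per feature, plus a single intercept
slope = model.coef_[0]
intercept = model.intercept_

# Degree-1 least-squares fit with NumPy for comparison
np_slope, np_intercept = np.polyfit(x.flatten(), y, 1)

print(f"sklearn: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"numpy:   slope={np_slope:.3f}, intercept={np_intercept:.3f}")
```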

#### Step 4

Save the model to file.

In [None]:
# Save the model to disk
filename = 'my_saved_model.sav'
with open(filename, 'wb') as model_file:
    pickle.dump(my_model, model_file)

#### Step 5

In a separate Python script, load the saved file and make a prediction.

In [None]:
filename = 'my_saved_model.sav'
# Only load pickle files you created or trust
with open(filename, 'rb') as model_file:
    loaded_model = pickle.load(model_file)
predict = np.array([4]).reshape(1, -1)
result = loaded_model.predict(predict)
print(result[0])
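
Steps 4 and 5 together form a save and load round trip. The sketch below uses made-up data and a temporary directory rather than the course files, and checks that the reloaded model gives the same prediction as the original. The `with` blocks in the sketch close each file once it has been written or read.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data on the line y = 2x + 1
x = np.array([1.0, 2.0, 3.0]).reshape(-1, 1)
y = np.array([3.0, 5.0, 7.0])
my_model = LinearRegression().fit(x, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    filename = os.path.join(tmp_dir, "my_saved_model.sav")

    # Save the fitted model to disk
    with open(filename, "wb") as model_file:
        pickle.dump(my_model, model_file)

    # Load it back, as a separate script would
    with open(filename, "rb") as model_file:
        loaded_model = pickle.load(model_file)

# Compare predictions from the original and the reloaded model
query = np.array([4.0]).reshape(1, -1)
original = my_model.predict(query)[0]
reloaded = loaded_model.predict(query)[0]
print(original, reloaded)
```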

#### Optional Step

Plot the features, targets and model (linear regression).

In [None]:
y_pred = my_model.predict(x)
plt.plot(x, y_pred)
plt.scatter(x, y, marker='x', c='r')
plt.title("NESA Course Specifications Data")
plt.ylabel(f'Training {training_data.columns[0]}')
plt.xlabel(f'Training {training_data.columns[1]}')
plt.show()

#### Optional Step

Use the model to make a prediction and plot it on the visualisation, which is saved to the file `graph.png` for use in a Python Flask UI API endpoint.

In [None]:
predict = np.array([4]).reshape(1, -1)
y_prediction = my_model.predict(predict)

y_pred = my_model.predict(x)
plt.plot(x, y_pred)
plt.scatter(x, y, marker='x', c='r')
plt.scatter(predict, y_prediction, marker='D', c='r', zorder=10, s=100)
# Label the predicted point; plt.text takes the x position first, then the y position
plt.text(predict[0, 0], y_prediction[0], f"Target {y_prediction[0]} is prediction from {predict[0,0]} input")
plt.title("NESA Course Specifications Data")
plt.ylabel(f'Training {training_data.columns[0]}')
plt.xlabel(f'Training {training_data.columns[1]}')
plt.savefig('graph.png')   # save the figure to file
plt.close()    # close the figure window
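
As a rough idea of how the saved image might be served, the sketch below defines a minimal Flask app with one route that returns a PNG file. This is an assumption about how the endpoint could look, not the course's implementation: the `create_app` factory, the `/graph.png` route and the `image_path` parameter are all hypothetical names.

```python
import os

from flask import Flask, send_file


def create_app(image_path="graph.png"):
    """Build a Flask app that serves the image at image_path (hypothetical layout)."""
    app = Flask(__name__)
    # Resolve the path once so the route does not depend on a later working directory
    absolute_path = os.path.abspath(image_path)

    @app.route("/graph.png")
    def graph():
        # Send the saved figure with an explicit PNG content type
        return send_file(absolute_path, mimetype="image/png")

    return app
```

Calling `create_app().run()` from the folder that contains `graph.png` would start Flask's development server, after which the image should be available at the `/graph.png` path.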
