# ML101 Lab: ML vs traditional software

We have been talking about the difference between ML (sometimes referred to as Software 2.0) and traditional software development. Here is a simplified example to drive home how the process differs between the two.

Our use case is converting temperatures from Celsius to Fahrenheit.

## Traditional Software vs ML:

We mentioned that in traditional software the algorithm or formula is explicitly coded, whereas in ML the pattern/algorithm/formula, which is a mapping from the input to the desired output, is learnt by the model and NOT explicitly coded up.

Let's look at a simple example of converting temperatures from Celsius to Fahrenheit.

## Traditional Software:

We know that there is an equation that converts temperatures from C to F, defined as:   
**Temperature_F = (1.8 x Temperature_C) + 32**.  
So we can simply code that up and deploy our system.

In [None]:
def temp_c2f(temp_c):
  return round((1.8 * temp_c) + 32, 2)

# ask user for an input temperature in Celsius
temp_c = float(input("Enter the Temperature in Celsius :\n"))

# convert temperature from Celsius to Fahrenheit
temp_f = temp_c2f(temp_c)

# return the Fahrenheit temperature to the user:
print("Temperature in Fahrenheit :", temp_f, "F")

Once you develop your algorithm and test it, you deploy it and monitor it. Every time there is a change in the algorithm, you would edit your code and redeploy.
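Testing the coded formula can be as simple as checking it against a few well-known reference points. Here is a minimal sketch; the reference points are chosen for illustration, and `temp_c2f` is repeated so the block runs on its own:

```python
def temp_c2f(temp_c):
    """Convert Celsius to Fahrenheit with the explicitly coded formula."""
    return round((1.8 * temp_c) + 32, 2)

# Well-known reference points: (Celsius, expected Fahrenheit)
reference_points = [(0, 32.0), (100, 212.0), (-40, -40.0), (37, 98.6)]

for temp_c, expected_f in reference_points:
    # Fail loudly if the coded formula disagrees with a known conversion
    assert temp_c2f(temp_c) == expected_f, (temp_c, temp_c2f(temp_c))

print("All reference checks passed")
```

If the formula ever changes, the same checks would be updated and rerun before redeploying.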

## Machine Learning system

Let's imagine we do not have the formula for the Celsius to Fahrenheit conversion. What we do have is **Data**. In your company, you find an Excel sheet with one column labeled Temperature_C and another labeled Temperature_F. These values were collected from thermometers in the field. However, the Fahrenheit thermometer broke recently, and a material shortage makes it too expensive to fix, yet some of your processes need the temperature in Fahrenheit to function properly. Your system is still collecting temperatures in Celsius, so you wonder whether you can derive or **predict** the temperature in Fahrenheit from the temperature in Celsius instead of repairing the Fahrenheit data collectors.

In [None]:
# download the data file (the URL is quoted so the shell does not treat "&" as a command separator):

!wget -O temperatures.csv "https://drive.usercontent.google.com/uc?id=1bcl86iqr3XxZ-2pdp_UOoV6_hsxvYM3_&authuser=0&export=download"

In [None]:
import pandas as pd

data_file = "temperatures.csv"
data = pd.read_csv(data_file, header=None)
data.columns = ["temp_c", "temp_f"]


In [None]:
# explore the dataset:
print("File contains: ", data.shape)
data.describe()

So the file we found contains 2 columns and 300 data points, with the summary statistics shown above.
Since there are only 2 columns, we can plot them to see if there is any relationship between the 2 variables. **Remember** that this is a simple case; most ML problems have many features, so visualization becomes difficult.

In [None]:
data.plot(kind="scatter", x="temp_c", y="temp_f");

It looks like there is a linear relationship between temperature in Celsius and temperature in Fahrenheit. So we can try to extract that relationship and decide not to buy Fahrenheit thermometers anymore.
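A scatter plot gives a visual impression; the Pearson correlation coefficient puts a number on how linear the relationship is. The sketch below uses synthetic readings as a stand-in for `temperatures.csv`. The value range, noise level, and seed are assumptions made for illustration, not properties of the lab's file:

```python
import numpy as np

# Hypothetical stand-in for temperatures.csv: synthetic readings, NOT the lab's data
rng = np.random.default_rng(seed=0)
temp_c = rng.uniform(-50, 150, size=300)
temp_f = 1.8 * temp_c + 32 + rng.normal(0, 10, size=300)

# Pearson correlation: values near +1 or -1 indicate a strong linear relationship
corr = np.corrcoef(temp_c, temp_f)[0, 1]
print("Pearson correlation:", round(corr, 3))
```

With the lab's DataFrame, the equivalent call would be `data["temp_c"].corr(data["temp_f"])`.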

In [None]:
# Model the relationship between temperature in C (labeled x) and temperature in F (labeled y)

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X = np.array(data["temp_c"]).reshape((-1, 1))
y = np.array(data["temp_f"])

#1- Choose a model: we will go with linear regression since the target (temperature) is a continuous variable
# and linear regression is one of the simplest regression models:
model = LinearRegression()

#2- split data into test and train to later evaluate how well the model does on unseen data:
# We will use 70% of the data for training the model,
# and hide 30% of the data to test how well the model generalizes:

x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

#3- train the model on the training data
model.fit(x_train, y_train)

# print model R2 to see how well the model did on the training set:
r2_train = model.score(x_train, y_train)
print("Score on Training set: ", r2_train)

#4- evaluate the model on unseen data
r2_test = model.score(x_test, y_test)
print("Score on Test set: ", r2_test)


Since the model scored on the test set (unseen data) very closely to how it scored on the training set, the model has picked up a pattern that generalizes well within the current dataset. If the model had instead memorized noise in the training data (overfitting, or high variance), we would see a much lower score on the unseen data than on the training set. If there were no pattern, or the model were too simple to capture it (underfitting, or high bias), both scores would be low.
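R2 is unitless, so it can help to also compare the training and test error in degrees. The following is a minimal sketch using scikit-learn's `mean_absolute_error`; the synthetic data, noise level, and `random_state` values are assumptions that stand in for the lab's dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data standing in for temperatures.csv
rng = np.random.default_rng(seed=0)
X = rng.uniform(-50, 150, size=(300, 1))
y = 1.8 * X[:, 0] + 32 + rng.normal(0, 10, size=300)

# Same 70/30 split as in the lab, with a fixed seed for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LinearRegression().fit(x_train, y_train)

# Average absolute error, in degrees Fahrenheit, on seen and unseen data
mae_train = mean_absolute_error(y_train, model.predict(x_train))
mae_test = mean_absolute_error(y_test, model.predict(x_test))
print("MAE on Training set:", round(mae_train, 2), "F")
print("MAE on Test set:", round(mae_test, 2), "F")
```

Similar errors on both sets tell the same story as similar R2 scores, but in units that are easier to reason about.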

In [None]:
# Model summary:

print("coefficient of determination:", r2_test)
print("intercept:", model.intercept_)
print("slope:", model.coef_)

In [None]:
print("This ML system thinks that:\nTemperature in F = {} x Temperature in C + {}".format(round(model.coef_[0], 2), round(model.intercept_, 2)))

This is fairly close to the known formula of:  Temperature_F= (1.8 x Temperature_C) + 32.   

Of course, the generated data contains a certain level of noise (+/- 30) and the dataset is not large, so the learnt coefficients will not match the formula exactly. The point is to illustrate how the process of traditional software differs from that of ML.
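To get a feel for how noise affects the learnt formula, one can fit a line to synthetic data at several noise levels. This sketch uses `np.polyfit` rather than scikit-learn to stay short. The Gaussian noise, the value range, and the seed are assumptions, since the lab does not state how its data was generated:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
temp_c = rng.uniform(-50, 150, size=300)  # assumed range, for illustration only

results = {}
for noise_std in (0.0, 5.0, 30.0):
    # True relationship plus Gaussian noise of the given standard deviation
    temp_f = 1.8 * temp_c + 32 + rng.normal(0, noise_std, size=300)
    # Least-squares fit of a degree-1 polynomial: returns (slope, intercept)
    slope, intercept = np.polyfit(temp_c, temp_f, deg=1)
    results[noise_std] = (slope, intercept)
    print(f"noise std={noise_std:>4}: slope={slope:.3f}, intercept={intercept:.2f}")
```

With no noise the fit recovers 1.8 and 32 up to floating-point error; as the noise grows, the estimates drift further from the true values.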

In [None]:
# Use the model to predict a response vs the traditional system:


temp_c = float(input("Enter the Temperature in Celsius :\n"))

# Traditional system response:
print("Software: Temperature in Fahrenheit :", temp_c2f(temp_c), "F")

# ML model response:
print("ML: Temperature in Fahrenheit :", round(model.predict(np.array([[temp_c]]))[0], 2), "F")


Now that you have a model that does fairly well, you can take that model and deploy it. Ideally the model performance would be monitored to detect when the underlying patterns change and the model needs to be retrained.
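One simple way to monitor a deployed model is to score it on a fresh batch of labeled readings and flag it when the score drops. The sketch below assumes that some labeled Fahrenheit readings are still available from time to time. The helper names (`r2_score_np`, `needs_retraining`), the `min_r2` threshold, and the drifted data are all hypothetical:

```python
import numpy as np

def r2_score_np(y_true, y_pred):
    """Coefficient of determination, computed with NumPy only."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

def needs_retraining(slope, intercept, temp_c_batch, temp_f_batch, min_r2=0.9):
    """Hypothetical check: True if the learnt line no longer fits the batch."""
    predictions = slope * temp_c_batch + intercept
    return bool(r2_score_np(temp_f_batch, predictions) < min_r2)

rng = np.random.default_rng(seed=0)
temp_c_batch = rng.uniform(-50, 150, size=100)

# Batch that still follows the learnt pattern
stable_f = 1.8 * temp_c_batch + 32 + rng.normal(0, 5, size=100)
# Batch where the relationship shifted (e.g. a hypothetical miscalibrated sensor)
drifted_f = 1.2 * temp_c_batch + 80 + rng.normal(0, 5, size=100)

stable_flag = needs_retraining(1.8, 32.0, temp_c_batch, stable_f)
drifted_flag = needs_retraining(1.8, 32.0, temp_c_batch, drifted_f)
print("Retrain on stable batch? ", stable_flag)
print("Retrain on drifted batch?", drifted_flag)
```

Real monitoring setups usually track several metrics over time; a single threshold is only the simplest possible starting point.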

### Note:
Keep in mind that this is a very simple scenario to illustrate the difference in *coded formula/algorithm/pattern* vs *learnt formula/algorithm/pattern*. The data is small and simple and has no issues. It was not processed in any way, which is not the norm.
As I mentioned, there is no need for ML in simple systems. ML shines when there is complexity. Hopefully this gives you an idea of how ML works: **we are trying to define a mapping between input/features and a desired outcome**.