## Supervised learning

**Supervised learning trains a model on examples where the correct answer is known. Unsupervised learning, by contrast, is typically used for problems where there isn't one correct answer, but instead, better and worse solutions.**

### Problem statement

Your family has managed Washington State's longest-running elk farm for several generations, but the health of your herd has slowly worsened for decades. It's well known that your farm's breed of elk should not be fed grain when nightly temperatures average above freezing (32°F or 0°C). For that reason, you've always followed your grandfather's farming calendar and switched from grain feed after January 31.

You've recently read about climate change affecting farming practices. Could this explain the poorer health of elk in recent years? With some historical weather data at your side, you seek to determine whether local temperatures have changed from your grandfather's day, and whether your farming calendar needs to be updated.

In [None]:
# Download the helper modules and the weather data
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/graphing.py
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/m0b_optimizer.py
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/Data/seattleWeather_1948-2017.csv

In [None]:
import pandas as pd

In [None]:
dataset = pd.read_csv('seattleWeather_1948-2017.csv', parse_dates=['date'])

# We only need January temperatures
dataset = dataset[dataset["date"].dt.month == 1].copy()

dataset

Unnamed: 0,date,amount_of_precipitation,max_temperature,min_temperature,rain
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True
...,...,...,...,...,...
25229,2017-01-27,0.00,54,37,False
25230,2017-01-28,0.00,52,37,False
25231,2017-01-29,0.03,48,37,True
25232,2017-01-30,0.02,45,40,True


In [None]:
import graphing

graphing.scatter_2D(dataset, label_x="date", label_y="min_temperature", title="January Temperatures (°F)")


In [None]:
dataset.head()

Unnamed: 0,date,amount_of_precipitation,max_temperature,min_temperature,rain
0,1948-01-01,0.47,51,42,True
1,1948-01-02,0.59,45,36,True
2,1948-01-03,0.42,45,35,True
3,1948-01-04,0.31,45,34,True
4,1948-01-05,0.17,45,32,True


In [None]:
import numpy as np

# Offset date into number of years since 1982
dataset["years_since_1982"] = [(d.year + d.timetuple().tm_yday / 365.25) - 1982 for d in dataset.date]

# Scale and offset temperature so that it has a smaller range of values
dataset["normalised_temperature"] = (dataset["min_temperature"] - np.mean(dataset["min_temperature"])) / np.std(dataset["min_temperature"])

# Graph
graphing.scatter_2D(dataset, label_x="years_since_1982", label_y="normalised_temperature", title="January Temperatures (Normalised)")
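
The scaling step above is a standard z-score: subtract the mean, then divide by the standard deviation. A minimal sketch on made-up temperatures (the `toy_temps` values are illustrative and not taken from the dataset) suggests why this is convenient: the result has a mean of 0 and a standard deviation of 1.

```python
import numpy as np

# Hypothetical minimum temperatures in °F (illustrative values only)
toy_temps = np.array([30.0, 35.0, 40.0, 45.0, 50.0])

# Same formula as above: subtract the mean, divide by the standard deviation
toy_normalised = (toy_temps - np.mean(toy_temps)) / np.std(toy_temps)

print(toy_normalised)
print("mean:", round(float(np.mean(toy_normalised)), 6))
print("std:", round(float(np.std(toy_normalised)), 6))
```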


In [None]:
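class MyModel:
    """A straight-line model: prediction = data * slope + intercept."""

    def __init__(self) -> None:
        # Straight lines are described by two parameters
        self.slope = 0
        self.intercept = 0

    def predict(self, data):
        # Apply the line equation to one or more inputs
        return data * self.slope + self.intercept


model = MyModel()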


In [None]:
print(f"Model parameters before training: {model.intercept}, {model.slope}")

# Look at how well the model does before training
print("Model visualised before training:")
graphing.scatter_2D(dataset, "years_since_1982", "normalised_temperature", trendline=model.predict)   

Model parameters before training: 0, 0
Model visualised before training:


In [None]:
def cost_function(actual_temperatures, estimated_temperatures):
    '''
    Returns the per-sample differences and the sum of squared differences.
    '''

    # Calculate the difference between actual temperatures and those
    # estimated by the model
    difference = estimated_temperatures - actual_temperatures

    # Convert to a single number that tells us how well the model did
    # (smaller numbers are better)
    cost = sum(difference ** 2)

    return difference, cost
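
To see how the cost responds to the model's parameters, here is a small self-contained sketch. It restates the same sum-of-squared-differences calculation under a hypothetical name (`toy_cost`) and evaluates straight lines with several candidate slopes against made-up data that follows a slope of exactly 2.

```python
import numpy as np

def toy_cost(actual, estimated):
    # Same calculation as cost_function above: sum of squared differences
    difference = estimated - actual
    return np.sum(difference ** 2)

# Hypothetical data generated from the line y = 2x (illustrative only)
toy_x = np.array([0.0, 1.0, 2.0, 3.0])
toy_y = 2.0 * toy_x

# Try a few slopes with the intercept fixed at 0
toy_costs = {slope: float(toy_cost(toy_y, slope * toy_x))
             for slope in [0.0, 1.0, 2.0, 3.0]}

for slope, cost_value in toy_costs.items():
    print(f"slope={slope}: cost={cost_value}")
```

The cost drops to 0 at the slope that generated the data and rises again on either side, which is the signal an optimizer uses to decide which way to move the parameters.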

### We use Microsoft's pre-built optimizer here; all it does is guess new values for the model's parameters

In [None]:
from m0b_optimizer import MyOptimizer

optimizer = MyOptimizer()
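
The source of `MyOptimizer` isn't shown in this notebook. One common way to produce parameter updates is gradient descent, sketched below under that assumption. The class name `ToyGradientOptimizer` and its `learning_rate` are hypothetical, and the real `m0b_optimizer.py` may work differently; only the argument list and the order of the returned values mirror how `get_parameter_updates` is called in this notebook.

```python
import numpy as np

class ToyGradientOptimizer:
    """Hypothetical stand-in: plain gradient descent for a straight-line model."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def get_parameter_updates(self, model_inputs, cost, difference):
        # difference = estimate - actual. For cost = sum(difference ** 2):
        #   d(cost)/d(intercept) = 2 * sum(difference)
        #   d(cost)/d(slope)     = 2 * sum(difference * inputs)
        # We average over the data so the step size doesn't depend on its length.
        # `cost` is accepted only to match the call signature; it is unused here.
        n = len(model_inputs)
        intercept_gradient = 2 * np.sum(difference) / n
        slope_gradient = 2 * np.sum(difference * model_inputs) / n
        # Step against the gradient to reduce the cost
        return (-self.learning_rate * intercept_gradient,
                -self.learning_rate * slope_gradient)

# Hypothetical data from the line y = 0.5x + 1 (illustrative only)
toy_inputs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
toy_targets = 0.5 * toy_inputs + 1.0

toy_slope, toy_intercept = 0.0, 0.0
toy_optimizer = ToyGradientOptimizer(learning_rate=0.1)
for _ in range(500):
    toy_difference = (toy_inputs * toy_slope + toy_intercept) - toy_targets
    toy_cost_value = np.sum(toy_difference ** 2)
    intercept_step, slope_step = toy_optimizer.get_parameter_updates(
        toy_inputs, toy_cost_value, toy_difference)
    toy_intercept += intercept_step
    toy_slope += slope_step

print(f"slope={toy_slope:.4f}, intercept={toy_intercept:.4f}")
```

On this toy data the parameters settle at the slope and intercept that generated it. Each individual step is small, which is why many iterations are usually needed.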

In [None]:
def train_one_iteration(model_inputs, true_temperatures, last_cost:float):
    '''
    Runs a single iteration of training.

    model_inputs: One or more dates to provide to the model
    true_temperatures: Corresponding temperatures known to occur on those dates
    last_cost: The cost from the previous iteration (math.inf on the first call)

    Returns:
        A Boolean, as to whether training should continue
        The cost calculated (small numbers are better)
    '''

    # === USE THE MODEL ===
    # Estimate temperatures for all data that we have
    estimated_temperatures = model.predict(model_inputs)

    # === OBJECTIVE FUNCTION ===
    # Calculate how well the model is working
    # Smaller numbers are better 
    difference, cost = cost_function(true_temperatures, estimated_temperatures)

    # Decide whether to keep training
    # We'll stop if the training is no longer improving the model effectively
    if cost >= last_cost:
        # Stop training
        return False, cost
    else:
        # === OPTIMIZER ===
        # Calculate updates to parameters
        intercept_update, slope_update = optimizer.get_parameter_updates(model_inputs, cost, difference)

        # Change the model parameters
        model.slope += slope_update
        model.intercept += intercept_update

        return True, cost


In [None]:
# Running the iterations manually
# Run this cell a few times to see how the parameters change

import math

print(f"Model parameters before training:\t\t{model.intercept:.8f},\t{model.slope:.8f}")

continue_loop, cost = train_one_iteration(model_inputs = dataset["years_since_1982"],
                                                    true_temperatures = dataset["normalised_temperature"],
                                                    last_cost = math.inf)

print(f"Model parameters after 1 iteration of training:\t{model.intercept:.8f},\t{model.slope:.8f}")


Model parameters before training:		-0.00013035,	0.01192479
Model parameters after 1 iteration of training:	-0.00014348,	0.01192481


### It would take thousands of iterations to get good parameters.

In [None]:
print("Training beginning...")

last_cost = math.inf
i = 0
continue_loop = True
while continue_loop:

    # Run one iteration of training
    # This will tell us whether to stop training, and also what
    # the cost was for this iteration
    continue_loop, last_cost = train_one_iteration(model_inputs = dataset["years_since_1982"],
                                                    true_temperatures = dataset["normalised_temperature"],
                                                    last_cost = last_cost)
   
    # Print the status
    if i % 400 == 0:
        print("Iteration:", i)

    i += 1

    
print("Training complete!")
print(f"Model parameters after training:\t{model.intercept:.8f},\t{model.slope:.8f}")
graphing.scatter_2D(dataset, "years_since_1982", "normalised_temperature", trendline=model.predict)    

Training beginning...
Iteration: 0
Training complete!
Model parameters after training:	-0.00648859,	0.01193327
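
For a straight line and a sum-of-squares cost, the best parameters can also be computed directly, which gives a way to sanity-check an iterative fit. A minimal sketch using NumPy's `np.polyfit` on made-up data follows. The `check_x` and `check_y` values are illustrative; on the real data you would pass `dataset["years_since_1982"]` and `dataset["normalised_temperature"]` instead.

```python
import numpy as np

# Hypothetical data: a line with slope 0.01 and intercept -0.005 plus seeded noise
rng = np.random.default_rng(0)
check_x = np.linspace(-34, 35, 200)
check_y = 0.01 * check_x - 0.005 + rng.normal(0, 1, size=check_x.size)

# A degree-1 polynomial fit minimises the sum of squared differences directly
check_slope, check_intercept = np.polyfit(check_x, check_y, deg=1)

print(f"Closed-form fit: intercept={check_intercept:.8f}, slope={check_slope:.8f}")
```

If training has converged, a fit like this on the real columns should land close to the trained `model.intercept` and `model.slope`.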


## Supervised learning using another cost function

In [None]:
!wget https://raw.githubusercontent.com/MicrosoftDocs/mslearn-introduction-to-machine-learning/main/microsoft_custom_linear_regressor.py

In [None]:
from datetime import datetime

# Load a file that contains our weather data
dataset = pd.read_csv('seattleWeather_1948-2017.csv', parse_dates=['date'])

# Convert the dates into numbers so we can use them in our models
# We make a year column that can contain fractions. For example,
# 1948.5 is halfway through the year 1948
dataset["year"] = [(d.year + d.timetuple().tm_yday / 365.25) for d in dataset.date]


# For the sake of this exercise, let's look at February 1 for the following years:
desired_dates = [
    datetime(1950,2,1),
    datetime(1960,2,1),
    datetime(1970,2,1),
    datetime(1980,2,1),
    datetime(1990,2,1),
    datetime(2000,2,1),
    datetime(2010,2,1),
    datetime(2017,2,1),
]

dataset = dataset[dataset.date.isin(desired_dates)].copy()

# Print the dataset
dataset

Unnamed: 0,date,amount_of_precipitation,max_temperature,min_temperature,rain,year
762,1950-02-01,0.0,27,1,False,1950.087611
4414,1960-02-01,0.15,52,44,True,1960.087611
8067,1970-02-01,0.0,50,42,False,1970.087611
11719,1980-02-01,0.37,54,36,True,1980.087611
15372,1990-02-01,0.08,45,37,True,1990.087611
19024,2000-02-01,1.34,49,41,True,2000.087611
22677,2010-02-01,0.08,49,40,True,2010.087611
25234,2017-02-01,0.0,43,29,False,2017.087611
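
The fractional `year` values in this table follow directly from the formula in the cell above. A quick check for 1 February 1950, the 32nd day of the year (the variable names here are chosen for illustration):

```python
from datetime import datetime

check_date = datetime(1950, 2, 1)
day_of_year = check_date.timetuple().tm_yday   # 31 days of January + 1
fractional_year = check_date.year + day_of_year / 365.25

print(day_of_year)                 # 32
print(round(fractional_year, 6))   # 1950.087611
```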


### Comparing two cost functions

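- Sum of squared differences (SSD) squares the difference between each estimate and the actual value, then sums the results.

- Sum of absolute differences (SAD) takes the absolute value of each difference, then sums the results.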
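
Written out, with $\hat{y}_i$ as the model's estimate and $y_i$ as the actual value for data point $i$ (notation introduced here for illustration), the two cost functions are:

```latex
\mathrm{SSD} = \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2
\qquad
\mathrm{SAD} = \sum_{i=1}^{n} \left\lvert \hat{y}_i - y_i \right\rvert
```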

In [None]:
import numpy as np

def sum_of_square_differences(estimate, actual):
    # With NumPy, to square each value we use **
    return np.sum((estimate - actual)**2)

def sum_of_absolute_differences(estimate, actual):
    return np.sum(np.abs(estimate - actual))

In [None]:
actual_label = np.array([1, 3])
model_estimate = np.array([2, 2])

print("SSD:", sum_of_square_differences(model_estimate, actual_label))
print("SAD:", sum_of_absolute_differences(model_estimate, actual_label))

SSD: 2
SAD: 2


In [None]:
actual_label = np.array([1, 3])
model_estimate = np.array([1, 1])

print("SSD:", sum_of_square_differences(model_estimate, actual_label))
print("SAD:", sum_of_absolute_differences(model_estimate, actual_label))

SSD: 4
SAD: 2


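**When we use SSD, we encourage models to be both accurate and consistent in their accuracy.** Both sets of estimates above have the same total absolute error, but SSD gives the second a higher cost, because squaring penalises one large error more than several small ones.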
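
Another way to see the difference: if a model could only output a single constant, SSD is minimised by the mean of the data and SAD by the median, so one extreme value pulls an SSD-based fit much harder. A minimal sketch with made-up numbers (the `toy_values` array and the brute-force search are illustrative only):

```python
import numpy as np

# Hypothetical temperatures with one extreme value
toy_values = np.array([40.0, 41.0, 42.0, 44.0, 1.0])

# Try every constant estimate from 0 to 50 in steps of 0.1
candidates = np.round(np.arange(0, 50.01, 0.1), 1)
ssd_costs = [np.sum((c - toy_values) ** 2) for c in candidates]
sad_costs = [np.sum(np.abs(c - toy_values)) for c in candidates]

best_ssd = candidates[np.argmin(ssd_costs)]
best_sad = candidates[np.argmin(sad_costs)]

print("Best constant under SSD:", best_ssd, "| mean:", np.mean(toy_values))
print("Best constant under SAD:", best_sad, "| median:", np.median(toy_values))
```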

### Using Microsoft's custom linear regression code

In [None]:
from microsoft_custom_linear_regressor import MicrosoftCustomLinearRegressor
import graphing

# Create and fit the model
# We use a custom object that we've hidden from this notebook, because
# you don't need to understand its details. This fits a linear model
# by using a provided cost function

# Fit a model by using sum of square differences
model = MicrosoftCustomLinearRegressor().fit(X = dataset.year, 
                                             y = dataset.min_temperature, 
                                             cost_function = sum_of_square_differences)

# Graph the model
graphing.scatter_2D(dataset, 
                    label_x="year", 
                    label_y="min_temperature", 
                    trendline=model.predict)
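
The regressor takes the cost function as an argument, so the natural next step is to compare an SSD fit with a SAD fit. Because the internals of `MicrosoftCustomLinearRegressor` aren't shown, the sketch below does not use it. Instead it fits both lines with plain NumPy on the eight February 1 rows from the table above: `np.polyfit` for SSD, and for SAD a brute-force search that relies on the fact that a SAD-optimal line passes through at least two data points. The helper name `fit_line_sad` is hypothetical.

```python
import numpy as np
from itertools import combinations

# The eight February 1 rows shown earlier (year, min_temperature in °F)
years = np.array([1950, 1960, 1970, 1980, 1990, 2000, 2010, 2017]) + 0.087611
min_temps = np.array([1, 44, 42, 36, 37, 41, 40, 29], dtype=float)

def fit_line_sad(x, y):
    """Brute-force SAD fit: try the line through every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        cost = np.sum(np.abs(slope * x + intercept - y))
        if best is None or cost < best[0]:
            best = (cost, slope, intercept)
    return best[1], best[2]

# Least-squares (SSD) line, computed directly
ssd_slope, ssd_intercept = np.polyfit(years, min_temps, deg=1)
# Least-absolute-differences (SAD) line
sad_slope, sad_intercept = fit_line_sad(years, min_temps)

print(f"SSD fit: slope={ssd_slope:.4f} °F per year")
print(f"SAD fit: slope={sad_slope:.4f} °F per year")
```

With only eight points, the 1950 reading of 1°F sits far from the rest, so the two cost functions can be expected to disagree noticeably about the trend.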