# what is machine learning?

- linear regression example
    - plot the data
    - simple regression
 
- training, loss functions, learning rate, batch size, etc.
- logistic regression example
    - difference between classification and regression


In [None]:
%matplotlib inline
from sklearn import linear_model
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Next, let's read in the first dataset that we will use for the exercise: {explanation}.

In [None]:
data = pd.read_csv('data/armaghdata.csv')

In [None]:
# scatter plot of maximum temperature against hours of sun
data.plot.scatter(x='sun', y='tmax')

## linear regression

Remember that a linear model with a single variable has the form:

$$ y = \beta + \alpha x $$

where $\beta$ is the *intercept* of the line and $\alpha$ is the *slope* of the line. 

The terminology often used in machine learning is a little bit different. The equation for a linear model is often written as:

$$ \hat{y} = b + wx $$

where $\hat{y}$ is the predicted value/label, $b$ is the *bias*, and $w$ is the *weight* of the feature $x$. Extending this to multiple features (variables), the form looks like:

$$ \hat{y} = b + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n = b + \sum_i w_i x_i $$

where each feature $x_i$ has a corresponding weight $w_i$.
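As a quick illustration of the multi-feature form, the sketch below evaluates $\hat{y}$ for a single sample in two equivalent ways: as an explicit sum and as a dot product. The bias, weights, and feature values are made up for illustration and are not fitted to any data.

```python
import numpy as np

# made-up values, for illustration only
b = 1.0                           # bias
w = np.array([0.5, -2.0, 3.0])    # one weight per feature
x = np.array([4.0, 1.0, 2.0])     # a single sample with three features

# explicit sum: b + w_1 x_1 + w_2 x_2 + w_3 x_3
y_hat_sum = b + sum(w_i * x_i for w_i, x_i in zip(w, x))

# the same calculation written as a dot product
y_hat_dot = b + np.dot(w, x)

print(y_hat_sum, y_hat_dot)
```

The dot-product form is the one that scales to many samples at once: with a feature matrix `X` of shape `(n_samples, n_features)`, the predictions are `X @ w + b`.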

For this first example, we will look at a model with a single feature (variable): the relationship between hours of sun (`sun`) and monthly maximum temperature (`tmax`).

To begin, we first have to create a **LinearRegression** object:

In [None]:
# instantiate a LinearRegression object
model = linear_model.LinearRegression()

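In `scikit-learn`, the model is trained by calling its `fit()` method with the input features and the target values. `fit()` expects the features as a two-dimensional array of shape `(n_samples, n_features)` and cannot handle missing values, so we first drop any rows with missing data and reshape each column.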

In [None]:
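# drop rows where either variable is missing, since fit() cannot handle NaN values
data = data.dropna(subset=['sun', 'tmax'], how='any')

# reshape each column into a 2D array of shape (n_samples, 1)
sun = data.sun.values.reshape(-1, 1)
tmax = data.tmax.values.reshape(-1, 1)

# train the model on our input data
model.fit(sun, tmax)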

In [None]:
model.coef_, model.intercept_

In [None]:
xx = np.arange(0, 301, 50)

fig, ax = plt.subplots(1, 1)

ax.plot(sun, tmax, 'k.', label='data')
ax.plot(xx, model.predict(xx.reshape(-1, 1)), 'r--', label='linear fit')

ax.legend()

ax.set_xlabel('hours of sun')
ax.set_ylabel('monthly maximum temperature (degC)')

## loss functions

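A *loss function* measures how far the model's predictions are from the observed values. A common choice for regression is the *mean squared error* (MSE):

$$ MSE = \frac{1}{N} \sum_i (y_i - \hat{y}_i)^2 $$

where $N$ is the number of samples, $y_i$ is the observed value, and $\hat{y}_i$ is the predicted value for sample $i$.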
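To make the formula concrete, here is a minimal sketch that computes the MSE by hand with `numpy`. The observed and predicted values are made up for illustration; they are not taken from the dataset used in this notebook.

```python
import numpy as np

# made-up observed and predicted values, for illustration only
y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([2.0, 5.0, 9.0])

# the residuals are 1, 0, and -2, so the squared errors are 1, 0, and 4
squared_errors = (y - y_hat) ** 2

# the MSE is the average of the squared errors: (1 + 0 + 4) / 3
mse = squared_errors.mean()

print(mse)
```

Because the errors are squared, the large miss on the last sample contributes four times as much to the loss as the small miss on the first.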

In [None]:
# use the fitted parameters to get the predicted values at the input x data
predicted = model.predict(sun)

Now, let's plot the per-sample error (the residual, i.e. the observed value minus the predicted value) as a function of the predicted value. Squaring these residuals and averaging them gives the MSE.

In [None]:
# calculate the residuals (observed minus predicted); the MSE is the mean of their squares
residuals = tmax - predicted

fig, ax = plt.subplots(1, 1) # create a new figure and axis

# plot a horizontal line at residual = 0
# (axhline's xmin/xmax are axes fractions between 0 and 1, not data values, so we omit them)
ax.axhline(0, color='k', linestyle='--')
ax.plot(predicted, residuals, 'o') # plot the residuals as a function of the predicted value

ax.set_xlabel('predicted value')
ax.set_ylabel('residual (degC)')

## gradient descent and learning


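Gradient descent minimizes the loss $l$ (here, the MSE) by repeatedly nudging each parameter in the direction that reduces the loss. For the single-feature model $\hat{y} = wx + b$, the partial derivatives of $l$ with respect to $w$ and $b$ are:

$$ \frac{\partial l}{\partial w} = \frac{1}{N} \sum_i -2x_i (y_i - (wx_i + b)) $$

$$ \frac{\partial l}{\partial b} = \frac{1}{N} \sum_i -2(y_i - (wx_i + b)) $$

At each step, we subtract the partial derivative, multiplied by the *learning rate*, from the current value of each parameter.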
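One way to sanity-check these expressions is to compare them with a numerical (finite-difference) estimate of the gradient. The sketch below does this on a small made-up dataset, taking $l$ to be the MSE defined earlier. The helper names `mse_loss` and `gradients`, the data, and the starting values of `w` and `b` are all assumptions made for illustration.

```python
import numpy as np


def mse_loss(x, y, w, b):
    """Mean squared error of the model y_hat = w * x + b."""
    return np.mean((y - (w * x + b)) ** 2)


def gradients(x, y, w, b):
    """Analytic partial derivatives of the MSE with respect to w and b."""
    err = y - (w * x + b)
    return np.mean(-2 * x * err), np.mean(-2 * err)


# made-up data and starting parameters, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b = 0.5, 0.0

dl_dw, dl_db = gradients(x, y, w, b)

# central finite differences: perturb one parameter at a time by a small amount
eps = 1e-6
num_dw = (mse_loss(x, y, w + eps, b) - mse_loss(x, y, w - eps, b)) / (2 * eps)
num_db = (mse_loss(x, y, w, b + eps) - mse_loss(x, y, w, b - eps)) / (2 * eps)

print(dl_dw, num_dw)
print(dl_db, num_db)
```

If the analytic and numerical values disagree by more than a small rounding error, the derivative (or its implementation) is likely wrong.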

In [None]:
def update_parameters(xdata, ydata, w, b, learning_rate):
    dl_dw = (-2 * xdata * (ydata - (w * xdata + b))).mean() # calculate the partial derivative of l wrt w
    dl_db = (-2 * (ydata - (w * xdata + b))).mean() # calculate the partial derivative of l wrt b

    w -= dl_dw * learning_rate # subtract dl/dw * learning_rate from w
    b -= dl_db * learning_rate # subtract dl/db * learning_rate from b

    return w, b # return the updated values of w and b


def avg_loss(xdata, ydata, w, b):
    loss = (ydata - (w * xdata + b))**2 
    return loss.mean()    


def train(xdata, ydata, w, b, learning_rate, epochs, verbose=False, plot=True):
    # record the parameters and the average loss as training progresses
    df = pd.DataFrame()

    for ee in range(epochs):
        w, b = update_parameters(xdata, ydata, w, b, learning_rate)

        # log every 10th epoch, plus the final epoch
        if ee % 10 == 0 or ee == epochs - 1:
            if verbose: # set verbose=True to print the loss at each logged epoch
                print(f"epoch: {ee}, loss: {avg_loss(xdata, ydata, w, b):.2f}")
            df.loc[ee, 'weight'] = w
            df.loc[ee, 'bias'] = b
            df.loc[ee, 'avg_loss'] = avg_loss(xdata, ydata, w, b)

    # turn the epoch index into a regular column
    df = df.reset_index(names='epoch')

    if plot:
        ax = df.plot('epoch', 'avg_loss', legend=False)
        ax.set_ylabel('average loss')

    return df

In [None]:
# train the model
test = train(sun, tmax, w=0, b=0, learning_rate=1e-7, epochs=1000)

test.tail(n=1)

In [None]:
test = train(sun, tmax, w=0, b=0, learning_rate=1e-6, epochs=10000)

test.tail(n=1)

In [None]:
test = train(sun, tmax, w=0, b=0, learning_rate=1e-5, epochs=10000)

test.tail(n=1)

In [None]:
test = train(sun, tmax, w=0, b=0, learning_rate=1e-4, epochs=100, plot=True)
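The runs above vary only the learning rate and the number of epochs. As a rough illustration of why that choice matters, the sketch below applies the same update rule to a tiny made-up dataset (not the data used above) with two learning rates: one small enough for the loss to shrink, and one large enough that each step overshoots and the loss grows. The helper `final_loss` and the specific learning rates are assumptions chosen for this toy data; suitable values depend on the scale of the features.

```python
import numpy as np

# made-up data lying exactly on the line y = 2x + 1, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1


def final_loss(learning_rate, epochs=50):
    """Run gradient descent from w = b = 0 and return the final average loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        err = y - (w * x + b)
        # compute both partial derivatives before updating either parameter
        dl_dw = np.mean(-2 * x * err)
        dl_db = np.mean(-2 * err)
        w -= learning_rate * dl_dw
        b -= learning_rate * dl_db
    return np.mean((y - (w * x + b)) ** 2)


initial_loss = np.mean(y ** 2)    # average loss at the starting point w = b = 0
small_step = final_loss(0.01)     # small steps: the loss decreases
large_step = final_loss(0.2)      # steps too large for this data: the loss grows

print(f"initial: {initial_loss:.2f}, lr=0.01: {small_step:.4f}, lr=0.2: {large_step:.3e}")
```

A growing or wildly oscillating loss curve is the usual sign that the learning rate is too large for the data at hand.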