# Linear and logistic regression
This week we will look at how linear regression may be used as a binary classifier. We will then consider the logistic regression classifier and compare the two.

Then we will familiarize ourselves with scikit-learn and make some comparisons to what we have done so far.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn

## Data
We will use simple synthetic data, similar to week_05, but we will make the sets somewhat larger to get more reliable results.

In [None]:
from sklearn.datasets import make_blobs
X_train, y_train = make_blobs(n_samples=500, centers=[[0,0],[1,2]], 
                  n_features=2, random_state=2019)
X_test, y_test = make_blobs(n_samples=500, centers=[[0,0],[1,2]], 
                  n_features=2, random_state=2020)

In [None]:
def show(X, y, marker='.'):
    labels = set(y)
    for lab in labels:
        # First feature on the horizontal axis, second feature on the vertical axis
        plt.plot(X[y == lab][:, 0], X[y == lab][:, 1],
                 marker, label="class {}".format(lab))
    plt.legend()

In [None]:
show(X_train, y_train)

## Linear regression classifier
When a smoothing (regularization) term is added, this is known as ridge regression in the literature. We will consider the simple unsmoothed version here and return to smoothing and regularization in a later lecture.

We saw last week how we could use gradient descent to implement a simple linear regressor, which we repeat here:

In [None]:
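def add_bias(X):
    """Prepend a column of ones to X.

    Assumed to match the helper from last week; putting the bias
    in column 0 is an assumption."""
    X = np.asarray(X)
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


class NumpyLinReg():

    def fit(self, X_train, t_train, gamma=0.1, epochs=10):
        """X_train is an N x m matrix: N data points, m features.
        t_train holds the target values for the training data."""

        (N, m) = X_train.shape
        X_train = add_bias(X_train)

        # theta is an alias for self.theta, so the in-place update below changes both
        self.theta = theta = np.zeros(m + 1)

        for e in range(epochs):
            # One gradient descent step on the mean squared error
            theta -= gamma / N * X_train.T @ (X_train @ theta - t_train)

    def predict(self, x):
        """Return the real-valued scores for the rows of x."""
        z = add_bias(x)
        score = z @ self.theta
        return score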

### Code
Make a linear regression classifier.

In [None]:
# Your code goes here
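
One possible approach, given here only as a hedged sketch: train the regressor on the 0/1 class labels as if they were numeric targets, and predict class 1 whenever the score exceeds a threshold. The class name NumpyLinRegClass, the threshold of 0.5, and the add_bias helper below are assumptions made for illustration, not part of the course code.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones (assumed behaviour of last week's helper)."""
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


class NumpyLinRegClass:
    """Linear regression used as a binary classifier (illustrative sketch)."""

    def fit(self, X_train, t_train, gamma=0.1, epochs=10):
        N, m = X_train.shape
        X = add_bias(X_train)
        self.theta = np.zeros(m + 1)
        for _ in range(epochs):
            # Same mean-squared-error gradient step as in NumpyLinReg
            self.theta -= gamma / N * X.T @ (X @ self.theta - t_train)
        return self

    def predict(self, X, threshold=0.5):
        # A score above the threshold is read as class 1, otherwise class 0
        return (add_bias(X) @ self.theta > threshold).astype(int)
```

The threshold 0.5 lies halfway between the two target values 0 and 1, which is why it is a natural first choice here.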

### Experiment
Train the classifier on X_train, y_train and calculate the accuracy on X_test, y_test.

In [None]:
# Your code goes here
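
Accuracy is the fraction of test items whose predicted class equals the gold class. A minimal sketch, assuming the predictions and the gold labels are sequences of equal length; the function name accuracy is chosen here for illustration:

```python
import numpy as np


def accuracy(predicted, gold):
    """Return the fraction of predictions that equal the gold labels."""
    predicted = np.asarray(predicted)
    gold = np.asarray(gold)
    return np.mean(predicted == gold)
```

With a trained classifier cl that has a predict method, this could be called as accuracy(cl.predict(X_test), y_test).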

## Logistic regression

### Logistic function
First write code for the logistic function (sometimes called the sigmoid).

In [None]:
def logistic(x):
    pass # fill in the rest
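
For reference, the logistic function maps a real number x to 1 / (1 + exp(-x)), a value strictly between 0 and 1. One way to fill in the stub above, sketched with NumPy so that it also works elementwise on arrays:

```python
import numpy as np


def logistic(x):
    """Logistic (sigmoid) function, applied elementwise to scalars or arrays."""
    return 1 / (1 + np.exp(-x))
```

For very large negative inputs np.exp may emit an overflow warning, but the result still correctly approaches 0.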

### Code for the classifier
Write code for the logistic regression classifier. Compared to the linear regression classifier, you have to adapt both fit and predict to take the logistic function into account.

In [None]:
# Your classifier goes here
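
The following is one possible sketch, not the only solution. It assumes the cross-entropy loss, whose gradient has the same form as in the linear case except that the score is passed through the logistic function first. The names NumpyLogReg and predict_proba, the 0.5 threshold, and the add_bias helper are assumptions made for illustration.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones (assumed behaviour of last week's helper)."""
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


def logistic(x):
    return 1 / (1 + np.exp(-x))


class NumpyLogReg:
    """Logistic regression trained with batch gradient descent (sketch)."""

    def fit(self, X_train, t_train, gamma=0.1, epochs=10):
        N, m = X_train.shape
        X = add_bias(X_train)
        self.theta = np.zeros(m + 1)
        for _ in range(epochs):
            # Gradient of the mean cross-entropy loss
            self.theta -= gamma / N * X.T @ (logistic(X @ self.theta) - t_train)
        return self

    def predict_proba(self, X):
        # Estimated probability of class 1 for each row of X
        return logistic(add_bias(X) @ self.theta)

    def predict(self, X, threshold=0.5):
        return (self.predict_proba(X) > threshold).astype(int)
```

Compared to the linear regression classifier, fit changes in one place only (the logistic applied to the score), and predict thresholds a probability instead of a raw score.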

### First experiment
Train the classifier on X_train, y_train, and calculate the accuracy on X_test, y_test.

In [None]:
# Your code goes here

### Parameters
Did you get better results than with the linear regression classifier? That does not necessarily have to be the case for this data set. But, if your result is much inferior to the linear regression classifier, the reason might be the parameter settings. Experiment with the parameter values for the learning rate and the number of epochs to get an optimal result.
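
A simple way to organize such an experiment is a small grid over both parameters. The sketch below is self-contained and uses a compact function-based logistic regression; the helper names (train_logreg, accuracy) and the particular grid values are assumptions chosen for illustration. It reports test accuracy, as in the experiments above; for a careful comparison one would normally select parameters on a separate held-out set.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Same synthetic data as in the Data section
X_train, y_train = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                              n_features=2, random_state=2019)
X_test, y_test = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                            n_features=2, random_state=2020)


def add_bias(X):
    """Prepend a column of ones (assumed behaviour of last week's helper)."""
    return np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)


def train_logreg(X, t, gamma, epochs):
    """Batch gradient descent for logistic regression; returns the weights."""
    Xb = add_bias(X)
    theta = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        probs = 1 / (1 + np.exp(-(Xb @ theta)))
        theta -= gamma / Xb.shape[0] * Xb.T @ (probs - t)
    return theta


def accuracy(theta, X, t):
    # A score above 0 corresponds to a probability above 0.5
    predicted = (add_bias(X) @ theta > 0).astype(int)
    return np.mean(predicted == t)


results = {}
for gamma in (0.01, 0.1, 1.0):
    for epochs in (10, 100, 1000):
        theta = train_logreg(X_train, y_train, gamma, epochs)
        results[(gamma, epochs)] = accuracy(theta, X_test, y_test)
        print(f"gamma={gamma:<5} epochs={epochs:<5} "
              f"accuracy={results[(gamma, epochs)]:.3f}")
```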

## Confusion matrix
Implement a procedure for calculating a confusion matrix for a classifier and try it on one of the runs above.

In [None]:
# Your code goes here
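
A confusion matrix counts, for each pair of gold class and predicted class, how many items fall into that cell. A minimal sketch, assuming the classes are the integers 0, ..., n_classes - 1 and adopting the convention that rows are gold classes and columns are predicted classes (other sources use the transpose):

```python
import numpy as np


def confusion_matrix(gold, predicted, n_classes=2):
    """Return a table where entry [g, p] counts items with gold class g
    that were predicted as class p."""
    table = np.zeros((n_classes, n_classes), dtype=int)
    for g, p in zip(np.asarray(gold, dtype=int), np.asarray(predicted, dtype=int)):
        table[g, p] += 1
    return table
```

The diagonal holds the correctly classified items, so the accuracy equals the sum of the diagonal divided by the sum of the whole table.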

## scikit-learn
In this course, we implement many machine learning (ML) algorithms ourselves, to get a better understanding of how the ML algorithms work. After completing the course, you will have a stronger background for selecting tools and interpreting the results of your experiments.

After the course, when you come to work on real ML tasks, you will normally not implement everything yourself from scratch, but instead rely on some toolboxes with ML algorithms. scikit-learn is one such toolbox. It contains all the algorithms we have considered so far. It also contains many other tools, like scalers and tools for preparing data. We have already made use of some of these tools, e.g. for making synthetic data sets.

You can find a first introduction to scikit-learn at https://scikit-learn.org/stable/getting_started.html, and you are advised to work through it. Let us now consider how learners are represented in scikit-learn. We first import the linear regression classifier, which scikit-learn calls RidgeClassifier.

In [None]:
from sklearn.linear_model import RidgeClassifier
ridge_cl = RidgeClassifier()
ridge_cl.fit(X_train, y_train)

You may calculate the accuracy using the score method.

In [None]:
ridge_cl.score(X_test, y_test)

### LogisticRegression
Find out how to import logistic regression from scikit-learn and train and test it on the same data.

In [None]:
# Your code goes here
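
A minimal sketch of how this might look, following the same fit/score pattern as for RidgeClassifier above. The data is regenerated so that the example is self-contained, and the variable names logreg_cl and logreg_acc are chosen here for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Same synthetic data as in the Data section
X_train, y_train = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                              n_features=2, random_state=2019)
X_test, y_test = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                            n_features=2, random_state=2020)

# Default settings, which include L2 regularization
logreg_cl = LogisticRegression()
logreg_cl.fit(X_train, y_train)

logreg_acc = logreg_cl.score(X_test, y_test)
print("LogisticRegression accuracy:", logreg_acc)
```

Because the scikit-learn default is regularized and uses a different optimizer than plain gradient descent, small differences from a hand-written implementation are to be expected.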

How do the results compare to the Ridge-classifier and to your own solution?

### More classifiers
Find out how you can import a *k*-nearest-neighbor classifier from sklearn, and train and test it on the same data for various values of *k*.

In [None]:
# Your code goes here
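
As a starting point, the sketch below trains scikit-learn's KNeighborsClassifier for a few values of *k* and records the test accuracy for each. The particular values of *k* and the names knn_cl and knn_scores are assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Same synthetic data as in the Data section
X_train, y_train = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                              n_features=2, random_state=2019)
X_test, y_test = make_blobs(n_samples=500, centers=[[0, 0], [1, 2]],
                            n_features=2, random_state=2020)

knn_scores = {}
for k in (1, 5, 15, 51):
    knn_cl = KNeighborsClassifier(n_neighbors=k)
    knn_cl.fit(X_train, y_train)
    # score returns the accuracy on the given data
    knn_scores[k] = knn_cl.score(X_test, y_test)
    print("k = {:2d}: accuracy = {:.3f}".format(k, knn_scores[k]))
```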