# Week 6 - Logistic Regression

### Aims

The main concepts covered in this notebook are:

>* logistic regression
>* performance metrics for classification
>* dealing with imbalanced data 
>* multi-class logistic regression

1. [Setup](#setup)

2. [Binary Logistic Regression](#RBH)

3. [Regularization](#SKV)

4. [Imbalanced Data](#Imbal)

5. [Multi-class Example](#mclog)


This week we will be implementing logistic regression for a classification task. 

- We will mainly focus on the data set stored in `Default.csv`.
- For the multi-class example, we consider the `iris data` at the end of the notebook.

During workshops, you will complete the worksheets together in teams of 2-3, using **pair programming**. You should aim to switch roles between driver and navigator approximately every 15 minutes. When completing worksheets:

>- You will have tasks tagged by (CORE) and (EXTRA). 
>- Your primary aim is to complete the (CORE) components during the WS session, afterwards you can try to complete the (EXTRA) tasks for your self-learning process. 
>- In some Exercises, you will see some beneficial hints at the bottom of questions.

Instructions for submitting your workshops can be found at the end of the worksheet. As a reminder, you must submit a PDF of your notebook on Learn by 16:00 on the Friday of the week the workshop was given.

---

# Setup <a id='setup'></a>

## Packages

Let's load the packages we need for this workshop.

In [93]:
# Display plots inline
%matplotlib inline

# Data libraries
import pandas as pd
import numpy as np

# Plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns

# sklearn modules
from sklearn.linear_model import LogisticRegression

# Other necessary packages
from sklearn.preprocessing import StandardScaler # scaling features
from sklearn.pipeline import make_pipeline           # combining classifier steps
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

from sklearn.model_selection import GridSearchCV, KFold, StratifiedKFold 
from sklearn.model_selection import train_test_split

In [94]:
# Plotting defaults, feel free to adjust if you need
plt.rcParams['figure.figsize'] = (8,6)
plt.rcParams['figure.dpi'] = 80

## Data

The dataset collects information on **10000** individuals, recording whether they defaulted on their credit card or not as well as other characteristics. Specifically, the included columns in the data are:

* `default` - Whether the individual has defaulted

* `student` - Whether the individual is a student

* `balance` - The balance in the individual's account

* `income` - Income of an individual

Our aim is to build a model using logistic regression to predict whether a person will default.

In [None]:
df_default = pd.read_csv("Default.csv", index_col=0)

df_default.head()

## Exploratory Data Analysis

We will start with exploratory data analysis (EDA) to gain more insight into the data.

### 🚩 Exercise 1 (CORE)

Examine the structure of the data. Consider the following questions:

a. What are the types of each variable? Based on the descriptive statistics, do you anticipate any feature engineering steps that may be needed?

b. Are there any missing values in the data? 

c. Visualize the features (balance, income, student) and comment on any differences that you observe between individuals that have defaulted and those that have not.

In [None]:
# Part a: info and descriptive statistics


In [None]:
# Part b: missing data check


In [None]:
# Part c: Visualize the data


## Splitting and Preprocessing the Data

Next, let's create our feature matrix and response vector. We use `LabelEncoder` to encode our categorical output to a binary vector. 

In [None]:
from sklearn.preprocessing import LabelEncoder

# Feature matrix and response vector
X, y = df_default.drop(['default'], axis=1), df_default['default']

# Convert to numpy array
X = X.values

# Encode default
y = LabelEncoder().fit_transform(y)

print(X.shape)
print(y.shape)

# Print the class distribution before splitting the data set 
print(pd.Series(y).value_counts(normalize=True)*100)

Notice the high class imbalance: 96.67% of individuals have not defaulted on their credit card. Consider the following naive train/test split.

In [None]:
# Naively split the data into train and test sets 
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle= True,
                                                    test_size = 0.1, random_state=1112)


# Check the proportion of defaults in both train and test data sets 
print(pd.Series(y_train).value_counts(normalize=True)*100)
print(pd.Series(y_test).value_counts(normalize=True)*100)

### 🚩 Exercise 2 (CORE)

a. Why might you NOT want to use the train/test split above?

b. Modify the code to have similar class proportions in the train and test sets. Print the class proportions to check if they are similar. 

**Hint:** consider the additional argument `stratify=` inside the [`train_test_split`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function to stratify in terms of the response when splitting the data.

# Logistic Regression <a id='RBH'></a>

Recall from our notes that for a binary output $y \in \lbrace 0, 1 \rbrace$, **logistic regression** is a classifier that can be seen as a simple generalization of linear regression, obtained by making two changes. 

- First, we replace the **Gaussian** distribution of the output $y$ with a **Bernoulli** distribution. 
- Second, we pass the linear function of the inputs, $\mathbf{w}^T\mathbf{x}$, through a **link function** $g: \mathbb{R} \rightarrow [0,1]$. 

That is, we assume $y \sim \text{Bern}( g(\mathbf{w}^T\mathbf{x}))$.

The link function takes values in the unit interval to ensure that the conditional probability of a success, 

$$p(y =1 \mid \mathbf{x}) = E[y | \mathbf{x}] = g(\mathbf{w}^T\mathbf{x})$$

is between zero and one. Specifically, in logistic regression, we select the **logistic** link function (S-shaped), defined as 

$$g(\mathbf{w}^T\mathbf{x}) = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})}$$

Putting these steps together, the logistic regression model is:

$$y \sim \text{Bern}\left( \left[1 + \exp(-\mathbf{w}^T\mathbf{x})\right]^{-1} \right)$$
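To make the model above concrete, here is a minimal sketch of the logistic function and the resulting success probability. The weights `w` and input `x` are made-up values for illustration only (they are not fitted to the `Default` data), and the first entry of `x` is assumed to be a constant 1 for the intercept.

```python
import numpy as np

def sigmoid(a):
    """Logistic function g(a) = 1 / (1 + exp(-a))."""
    a = np.asarray(a, dtype=float)
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical weights and a single input vector (illustrative values only);
# the first entry of x is a constant 1 so that w[0] acts as the intercept
w = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.3, -0.8])

# Probability of success p(y = 1 | x) under the logistic regression model
p = sigmoid(w @ x)
print(p)
```

Whatever the value of $\mathbf{w}^T\mathbf{x}$, the output lies strictly between zero and one, and $\mathbf{w}^T\mathbf{x} = 0$ corresponds to a probability of exactly 0.5.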

In `sklearn`, we can fit a logistic regression model using [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html). 

**Creating a logistic regression model:** some important options include
- `penalty` specifies the type of regularization. Options are `l1`, `l2`, `elasticnet`, `None`. 
- `C` the inverse strength of the penalty parameter.
- `l1_ratio` the additional penalty parameter when using `elasticnet`.

**CAUTION:** by default `LogisticRegression` uses an l2 regularization with penalty parameter `C=1`. This default should never be used! Instead, you should either set `penalty=None` or tune the value of `C` with cross-validation. 

**Fitting a logistic regression model:**
- use the usual `.fit(X,y)` to fit the model. The fitted object stores the intercept and coefficients in the usual attributes `.intercept_` and `.coef_`.

**Prediction:** methods include
- `predict` which predicts the class label (either 0 or 1), 
- `predict_proba` which predicts the class probabilities, and 
- `predict_log_proba` which predicts the log probabilities of each class.

Let's start by creating a pipeline for our logistic regression model (with no penalty).

In [103]:
# Pipeline
log_pipe = make_pipeline(
    ColumnTransformer(
        [("cat", OneHotEncoder(drop=["No"]), [0]),
         ("num", StandardScaler(), [1,2])]
    ),
    LogisticRegression(random_state=42, penalty=None)
)

### 🚩 Exercise 3 (CORE)

- Fit the logistic regression model to the training data
- Run the code to plot the coefficients and comment on their interpretation.
- Compute the accuracy score of the model on the testing data

<details><summary><b><u>Hint</u></b></summary>
    
You can use the `score` method of `LogisticRegression` to compute the accuracy.
    
</details>


In [None]:
# Fit the model using the training data


# Compute accuracy on the test data


In [None]:
# Create dataframe with coefficients
coefs = pd.DataFrame(
    np.copy(log_pipe[1].coef_).T,
    columns=["Coefficients"],
    index=df_default.columns[[1,2,3]],
)

# Plot the coefficients
coefs.plot.barh(figsize=(9, 7))
plt.title("Logistic regression")
plt.axvline(x=0, color=".5")
plt.xlabel("Coefficient values")
plt.subplots_adjust(left=0.3)
plt.show()

## Evaluating Classification Models: Beyond Accuracy

Because of the imbalanced nature of the data, looking at accuracy alone is misleading. Indeed, a naive classifier that predicts no one defaults would achieve a high accuracy of 96.7%.  
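To see why accuracy alone can mislead, here is a small sketch on made-up labels with roughly the same imbalance (33 positives out of 1000, chosen for illustration and not taken from `Default.csv`). The naive classifier predicts the negative class for everyone.

```python
import numpy as np

# Toy labels with about 3.3% positives (illustrative, not the Default data)
y_toy = np.zeros(1000, dtype=int)
y_toy[:33] = 1

# Naive classifier: predict "no default" for every individual
y_naive = np.zeros_like(y_toy)

# Accuracy is high simply because the negative class dominates
accuracy = np.mean(y_naive == y_toy)

# Recall is zero: none of the true positives is detected
recall = np.sum((y_toy == 1) & (y_naive == 1)) / np.sum(y_toy == 1)

print(f"Accuracy: {accuracy:.3f}, Recall: {recall:.3f}")
```

The accuracy is 0.967 even though the classifier never identifies a single positive case.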

A binary classifier can make two types of errors:

- Incorrectly assigning an individual __who defaults__ to the __no default__ category (FN)
- Incorrectly assigning an individual who __does not default__ to the __default__ category (FP)

To better understand the FN vs FP tradeoff, we can compute the confusion matrix. Recall from our notes, the quantities reported in the confusion matrix are:


$$\text{TP} = \sum_{n=1}^N \mathbb{I}(y_n=1)\mathbb{I}(\widehat{y}_n=1),\quad \text{FP} = \sum_{n=1}^N \mathbb{I}(y_n=0)\mathbb{I}(\widehat{y}_n=1)$$
$$\text{FN} = \sum_{n=1}^N \mathbb{I}(y_n=1)\mathbb{I}(\widehat{y}_n=0), \quad \text{TN} = \sum_{n=1}^N \mathbb{I}(y_n=0)\mathbb{I}(\widehat{y}_n=0)$$


where $y_n$ is the true class and $\widehat{y}_n$ is the estimated class. 
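The indicator sums above translate directly into boolean operations on arrays. The following is a minimal sketch on a made-up pair of label vectors (`y_true` and `y_pred` are illustrative, not model output).

```python
import numpy as np

# Toy true and predicted labels (illustrative values only)
y_true = np.array([1, 0, 0, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0])

# Each count is a sum of products of indicators, written here with boolean masks
TP = np.sum((y_true == 1) & (y_pred == 1))
FP = np.sum((y_true == 0) & (y_pred == 1))
FN = np.sum((y_true == 1) & (y_pred == 0))
TN = np.sum((y_true == 0) & (y_pred == 0))

print(TP, FP, FN, TN)  # 2 1 1 4
```

The four counts always add up to the number of observations $N$.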

### 🚩 Exercise 4 (CORE)

1. Use [ConfusionMatrixDisplay.from_estimator](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.ConfusionMatrixDisplay.html#sklearn.metrics.ConfusionMatrixDisplay.from_estimator) to compute and visualize the confusion matrix. 
2. Comment on the importance of FNs compared to FPs to the credit card company.

In [None]:
from sklearn.metrics import ConfusionMatrixDisplay



Recall from our notes, the different evaluation measures can be defined from the confusion matrix, such as:

$$
\text{Accuracy} = \frac{\text{TP + TN}}{\text{TP + TN + FP + FN}}
$$

$$
\text{FPR} = \frac{\text{FP}}{\text{FP}+ \text{TN}}, \hspace{0.5cm} \text{Recall (TPR)} = \frac{\text{TP}}{\text{TP}+ \text{FN}} \hspace{0.5cm}
\text{Precision} = \frac{\text{TP}}{\text{TP}+ \text{FP}}
$$


$$
\text{F1-Score} = 2\left(\frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}\right) = \frac{\text{2TP}}{\text{2TP + FP + FN}}
$$
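As a worked example of these formulas, the sketch below plugs in a set of hypothetical confusion matrix counts (made up for illustration, not the result of any model in this notebook).

```python
# Hypothetical confusion matrix counts (illustrative values only)
TP, FP, FN, TN = 30, 10, 20, 940

accuracy = (TP + TN) / (TP + TN + FP + FN)   # 970 / 1000 = 0.97
fpr = FP / (FP + TN)                         # 10 / 950
recall = TP / (TP + FN)                      # 30 / 50 = 0.6
precision = TP / (TP + FP)                   # 30 / 40 = 0.75

# Two equivalent expressions for the F1-score
f1 = 2 * (precision * recall) / (precision + recall)
f1_alt = 2 * TP / (2 * TP + FP + FN)         # 60 / 90

print(f"Accuracy={accuracy:.3f}, FPR={fpr:.4f}, Recall={recall:.2f}, "
      f"Precision={precision:.2f}, F1={f1:.4f}")
```

Note how the accuracy of 0.97 hides the fact that only 60% of the positive cases are detected.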

To compute these quantities and more, functions are available in `sklearn.metrics`, including:

1. [Recall](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) (True Positive Rate)
2. [Precision](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.precision_score.html)
3. [F1-score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)
4. [AUC](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html)

For a detailed list, see https://scikit-learn.org/1.5/modules/model_evaluation.html#classification-metrics.

### 🚩 Exercise 5 (CORE)

Compute the accuracy, recall, precision, and F1-score on the test data for the fitted model (using the functions in `sklearn.metrics`) and comment on the model's performance.

**Hint** you first need to use `.predict()` to predict the class labels.

In [None]:
from sklearn.metrics import accuracy_score, precision_score, f1_score, recall_score



### 🚩 Exercise 6 (EXTRA)

Based on the confusion matrix computed below, use the equations above to derive:

1. False Positive Rate 
2. Recall
3. Precision
4. F1-score

without using any additional built-in function from any module. If computed correctly, the numbers should match the previous exercise.

In [None]:
from sklearn.metrics import confusion_matrix

# Predict class labels
y_test_pred = log_pipe.predict(X_test)

# Compute confusion matrix
confmat = confusion_matrix(y_true = y_test, y_pred=y_test_pred)
confmat

### 🚩 Exercise 7 (CORE)

The handy function `RocCurveDisplay.from_estimator` is also available in `sklearn` for plotting the ROC curve for the fitted model. Use this function to plot the ROC curve and compute the AUC using `sklearn.metrics.roc_auc_score` on the test data. Comment on the model's performance based on these quantities.

**Hint** you will need to use `predict_proba()` to compute the class probabilities for each data point, in order to compute the AUC.

In [None]:
from sklearn.metrics import RocCurveDisplay, roc_auc_score

# Plot the ROC curve


# Compute the AUC


### 🚩 Exercise 8 (CORE)

Now, use the function `PrecisionRecallDisplay.from_estimator` to plot the Precision-Recall curve for the fitted model on the test data. Comment on the model's performance based on this figure.

In [None]:
from sklearn.metrics import PrecisionRecallDisplay



# Regularization <a id='SKV'></a>

Regularization can be useful in logistic regression to deal with high-dimensional data and for variable selection. As previously stated, `LogisticRegression` offers four options through the `penalty` parameter:
- `penalty=None` corresponds to no regularization,
- `penalty='l2'` is the default and corresponds to l2 regularization (ridge),
- `penalty='l1'` corresponds to l1 regularization (lasso), and
- `penalty='elasticnet'` corresponds to a combination of l1 and l2 regularization.

For the three regularization methods, the parameter `C` represents the inverse strength of the penalty parameter, so smaller values of `C` correspond to stronger regularization. Additionally, when using `elasticnet`, `l1_ratio` is an additional penalty parameter, controlling the balance between l1 and l2 regularization.
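Since `C` is an inverse strength, decreasing it should shrink the coefficients towards zero. The sketch below checks this on synthetic data (generated with `make_classification`, not the `Default` data), using the default l2 penalty and three arbitrary values of `C` chosen purely for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data, standardized so the penalty treats all features equally
X_toy, y_toy = make_classification(n_samples=300, n_features=5, n_informative=3,
                                   n_redundant=0, random_state=0)
X_toy = StandardScaler().fit_transform(X_toy)

# Fit l2-penalized models (the default penalty) for a few illustrative C values
norms = {}
for C in [0.01, 1.0, 100.0]:
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_toy, y_toy)
    norms[C] = np.linalg.norm(clf.coef_)
    print(f"C = {C:>6}: ||w|| = {norms[C]:.3f}")
```

The coefficient norm grows as `C` increases, approaching the unpenalized fit for large `C`.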

Note: the choice of the algorithm (aka solver) depends on the penalty chosen (see the [documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) for more details.)

As we saw last week, it is important to tune the penalty parameter `C` (and `l1_ratio` if using elastic net) when including regularization. Before we start searching over hyperparameters, it's worth noting that with a plain K-fold split, some of the folds may not have the same distribution of classes. This is particularly important for imbalanced data, because the validation score could then be a poor estimate of performance (for example, a fold may contain very few positive cases, or more than usual, which can cause large differences on imbalanced data). To address this, when doing our grid search, we will use `StratifiedKFold` to ensure the distribution of classes in our folds reflects the distribution in the full data.

Run the code below to investigate the difference in class distributions across folds when using `KFold` vs `StratifiedKFold`.


In [None]:
# Consider 5 folds
KF = KFold(n_splits=5)
SKF = StratifiedKFold(n_splits=5)

fold_names = ["KFold", "StratifiedKFold"]
for i, K in enumerate([KF, SKF]):
    # Initialize an empty DataFrame to store counts for the current fold type
    fold_nos = pd.DataFrame()
    for j, (train_i, test_i) in enumerate(K.split(X_train, y_train)):
        # Compute value counts for the current fold and ensure it's a DataFrame with appropriate columns
        fold_no = pd.DataFrame(pd.Series(y_train[test_i]).value_counts()).T
        fold_no.index = ["Fold " + str(j)]  # Rename the index to reflect the fold number
        # Concatenate with the fold_nos DataFrame
        fold_nos = pd.concat([fold_nos, fold_no], axis=0)
    
    fold_nos.fillna(0, inplace=True)  # Fill missing values with 0 if any class was not present in a fold
    print(f"{fold_names[i]} counts per fold:\n", fold_nos)

Next, let's perform a grid search to tune the regularization parameter with l2 regularization. Notice that we have listed multiple metrics to compute with the option `scoring=`. Because several metrics are given, `refit=` specifies which of them is used to select the best model, which is then refit on the full training data.

In [None]:
# This code snippet sets up and executes a grid search for tuning the penalty parameter
# of a logistic regression model with l2 regularization using cross-validation. 

# Pipeline 
log_pipe_l2 = make_pipeline(
    ColumnTransformer(
        [("cat", OneHotEncoder(drop=["No"]), [0]),
         ("num", StandardScaler(), [1,2])]
    ),
    LogisticRegression(random_state=42, penalty='l2')
)

# Uncomment this line to find how the penalty parameter is called in the pipeline
#log_pipe_l2.get_params()

# Possible C values: 
C_list = np.linspace(0.01, 15, num=151)


# Grid search CV:
log_rs = GridSearchCV(log_pipe_l2, 
                      param_grid={'logisticregression__C': C_list},
                      scoring = ["accuracy", "f1","recall","precision"], #Evaluation metrics to compute on validation sets
                      cv = StratifiedKFold(n_splits=5, shuffle=True),
                      refit = "accuracy", # Refits the best model on the entire dataset using the accuracy metric 
                      return_train_score = True)

# Tune the model with grid search:
log_rs.fit(X_train, y_train)

### 🚩 Exercise 9 (CORE)

a. Run the following code to plot the mean accuracy averaged across the validation folds (in black). Comment on the suggested value of `C`.

b. Choose a metric other than accuracy (i.e. f1, recall, or precision) and redraw the figure to plot the mean across the validation folds. Does your suggested value of `C` change?

In [None]:
# Extract only mean and split scores
cv_accuracy = pd.DataFrame(
    data = log_rs.cv_results_
).filter(
    # Extract the split#_test_accuracy and mean_test_accuracy columns
    regex = '(split[0-4]+|mean)_test_accuracy'
).assign(
    # Add the C values as a column
    C = C_list
)

# Reshape the data frame for plotting
d = cv_accuracy.melt(
    id_vars=('C','mean_test_accuracy'),
    var_name='fold',
    value_name='Accuracy'
)

# Plot the validation scores across folds
plt.figure(figsize=(10,7))
sns.lineplot(x='C', y='Accuracy', color='black', errorbar=None, data = d)  # Plot the mean score in black.
sns.lineplot(x='C', y='Accuracy', hue='fold', data = d) # Plot the curves for each fold in different colors
plt.show()

In [None]:
# Redraw figure for a different metric



### 🚩 Exercise 10 (EXTRA)

a. Rebuild your model with `l1` regularization and perform a grid search to tune the penalty parameter. Which metric have you chosen for refitting the model and why?

b. Plot the coefficients. Which variables are included? How does the performance compare to the model with no regularization?

In [None]:
# Model with l1 regularization

In [None]:
# Plot the coefficients


In [None]:
# Compute metrics on the test data


# Dealing with Imbalanced Data <a id='Imbal'></a>

We have already seen that when dealing with imbalanced data, we need to:
1. Consider a suitable performance metric both for testing and validation.
2. Use a suitable splitting strategy (stratified splitting) both in creating our test and validation sets.

But the majority class can still overwhelm the minority class when fitting the model. To investigate, let's use the function [`classification_report`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html), which reports a summary of the precision, recall, and F1-score for each class. Note that:
- the recall of the positive class is also known as *sensitivity*; recall of the negative class is *specificity*.
- it also includes the *macro average* (averaging the unweighted mean per label) as well as the weighted average (averaging the support-weighted mean per label). 
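The difference between the two averages is easiest to see with numbers. The sketch below uses hypothetical per-class recall values and supports (made up for illustration, not taken from the report for our model).

```python
import numpy as np

# Hypothetical per-class recall and support (number of true cases per class)
recall_per_class = np.array([0.99, 0.30])   # [negative class, positive class]
support = np.array([967, 33])

# Macro average: each class counts equally, regardless of its size
macro_avg = recall_per_class.mean()

# Weighted average: each class is weighted by its support
weighted_avg = np.average(recall_per_class, weights=support)

print(f"Macro average:    {macro_avg:.3f}")
print(f"Weighted average: {weighted_avg:.3f}")
```

On imbalanced data the weighted average is dominated by the majority class, so the macro average is usually the more revealing of the two when the minority class performs poorly.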

In [None]:
# Compute and print the classification report for the model with no penalty
from sklearn.metrics import classification_report

print(classification_report(y_test, 
                            log_pipe.predict(X_test), 
                            target_names = ['No Default','Default']))

Notice how the metrics are much higher for the majority class (individuals who have not defaulted). In this case, we may want to alter the training algorithm or data by:

1. Weighting the classes during training.
2. Resampling the data.

`LogisticRegression` offers the option to include class weights, and for example `class_weight="balanced"` will weight the classes by:
$$ \frac{N}{2 N_c}, \text{ where } N_c \text{ for } c = 0, 1 \text{ counts the number of observations in each class.}$$ 
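To get a feel for the balanced weights, here is a minimal sketch that evaluates the formula on a made-up label vector with 95 negatives and 5 positives (the names `y_toy` and `class_weight` are illustrative).

```python
import numpy as np

# Toy imbalanced labels: 95 negatives and 5 positives (illustrative only)
y_toy = np.array([0] * 95 + [1] * 5)

N = len(y_toy)
counts = np.bincount(y_toy)    # N_c for c = 0, 1

# Balanced weights N / (2 * N_c): the rarer class receives the larger weight
weights = N / (2 * counts)
print(weights)

# The same weights written as a dictionary mapping class label to weight
class_weight = {c: float(w) for c, w in enumerate(weights)}
print(class_weight)
```

With these weights, each class contributes the same total weight ($N/2$) to the training objective. A dictionary of this form can also be passed to the `class_weight` argument of `LogisticRegression` if you want to set the weights yourself.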

In the following, the main focus is on suitable **resampling** to change the distribution of the classes in our training data.

## Resampling

To alter the distribution of the classes in our training data, there are two main approaches:

- Under-sampling the majority class
- Over-sampling the minority class

We will be using Imbalanced-learn (imported as `imblearn`), an open source, MIT-licensed library relying on scikit-learn that provides tools for dealing with classification with imbalanced classes (see [here](https://imbalanced-learn.org/stable/introduction.html) for an introduction). Let's start by installing the `imblearn` package if necessary.

In [122]:
# Install imbalanced-learn (imported as imblearn) if necessary 
#!pip install imbalanced-learn

In the previous notebooks, we learned about _transformers_, which allow us to __alter the features__ and not the number of observations in our data (i.e. the columns of $\mathbf{X}$). Instead, _resamplers_ provide a preprocessing step in our `Pipeline` to __alter the number of observations__ and not the features (i.e. the rows of $\mathbf{X}$ and $\mathbf{y}$).

__Resamplers__

Resamplers are classes that follow the scikit-learn API and provide their sampling functionality through the `.fit_resample()` method. Within an `imblearn` pipeline, resampling is only applied during training (i.e. when calling `.fit()` on the pipeline) and never when predicting. This means that if we want to create our own resampler from scratch, the key method to implement is `.fit_resample()`, which takes the data and targets and returns resampled versions of both.

Therefore, to resample a dataset, each sampler implements:

```
data_resampled, targets_resampled = obj.fit_resample(data, targets)
```

**Remember to include your sampler within your model pipeline to prevent data leakage!**
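To illustrate what a resampler does to the rows of the data, below is a rough from-scratch sketch of random under-sampling using only `numpy`. The class `SimpleRandomUnderSampler` is hypothetical and written for this example: it is not the `imblearn` implementation, and a sampler intended for use inside an `imblearn` pipeline would normally build on the library's own base classes.

```python
import numpy as np

class SimpleRandomUnderSampler:
    """Hypothetical minimal under-sampler (illustration only, not imblearn's)."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)

        # Size of the smallest class: every class is reduced to this size
        classes, counts = np.unique(y, return_counts=True)
        n_min = counts.min()

        # Randomly keep n_min row indices from each class, without replacement
        keep = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
            for c in classes
        ])
        keep.sort()
        return X[keep], y[keep]

# Toy data: 16 observations in class 0 and 4 in class 1
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 16 + [1] * 4)

X_res, y_res = SimpleRandomUnderSampler(random_state=0).fit_resample(X_toy, y_toy)
print(np.bincount(y_res))  # [4 4]
```

Note that the number of rows changes while the columns are untouched, which is exactly the opposite of what a transformer does.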


### Under-Sampling

Under-sampling involves removing observations from the majority class to prevent its signal from dominating during training. We will focus on the simplest strategy:

- [`RandomUnderSampler`](https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.RandomUnderSampler.html), which randomly removes observations from the majority class.

For an overview of other under-samplers, see: https://imbalanced-learn.org/stable/under_sampling.html

In [None]:
from imblearn.pipeline import Pipeline as ImPipeline
from imblearn.under_sampling import RandomUnderSampler

# Pipeline with under sampling
log_pipe_us = ImPipeline([
    ("fe",ColumnTransformer(
        [("cat", OneHotEncoder(drop=["No"]), [0]),
         ("num", StandardScaler(), [1,2])]
    )),
   ("sampler", RandomUnderSampler(random_state=42)),
   ("model", LogisticRegression(random_state=42, penalty=None))])

# Fit model     
log_pipe_us.fit(X_train, y_train)

### 🚩 Exercise 11 (CORE)

Compute and print the classification report and visualize the confusion matrix for the model above with undersampling. How have the results changed?

### Over-Sampling

Over-sampling involves generating additional observations from the minority class. We will focus on the simplest strategy:

- [`RandomOverSampler`](https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.RandomOverSampler.html), which randomly samples with replacement observations from the minority class.

For an overview of other over-samplers, see: https://imbalanced-learn.org/stable/over_sampling.html

### 🚩 Exercise 12 (CORE)

a. Create a new pipeline for a logistic regression model with oversampling (by copying and editing the pipeline above).

b. How do the performance metrics compare with under-sampling?

In [None]:
from imblearn.over_sampling import RandomOverSampler



---
# Multi-class Logistic Regression <a id='mclog'></a>

Finally, to gain some experience with multi-class logistic regression, let's now look at the `iris` data set, which we have seen already in lectures.  

This is an example where the response has **3 classes** corresponding to the three types of iris species and $D=4$ features (petal length, petal width, sepal length, sepal width).

Let's start by loading the data and doing some basic EDA.

In [None]:
# First load the data 
from sklearn.datasets import load_iris

# Loading data
iris = pd.DataFrame(sns.load_dataset('iris'))

# Print information about the data set
print(iris.info())
print(iris.describe())
print(iris['species'].value_counts())

# Pairplot
sns.pairplot(data = iris, hue = 'species',corner=True)
plt.show()

From the initial EDA, we observe:
- The features have similar, but slightly different scales.
- Visually, the species appear to be fairly well separated.
- The target variable `species` is a string, so we will need a label encoding.
- The species types are equally distributed, and thus no methods for class imbalance are needed.

Now, let's separate the features and target, encode the target, and split into training and test sets.

In [128]:
# Feature matrix and response vector
X, y = iris.drop(['species'], axis=1), iris['species']

# Convert to numpy array
X = X.values

# Encode target
y = LabelEncoder().fit_transform(y)

# Split into training and testing 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, shuffle=True) 

### 🚩 Exercise 13 (EXTRA)

We can use the same `LogisticRegression` for the multi-class setting. In this case, it will fit the **multinomial logistic regression** model. Fit a multinomial logistic regression model and visualize the confusion matrix.

### 🚩 Exercise 14 (EXTRA)

 Some machine learning classification algorithms are only suited to binary classification. The **One-vs-Rest** scheme is a simple strategy to extend any binary classification method to the multi-class setting, by simply:
 - one-hot encoding the target variable, and
 - fitting a binary classification model for each class against all other classes. 
 
This can be easily implemented using `sklearn`'s [OneVsRestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html).
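To show what the scheme does under the hood, here is a rough by-hand sketch of the two steps on synthetic data (three well-separated blobs from `make_blobs`, not the iris data). The value `C=1.0` is an arbitrary, untuned choice for illustration, and predicting the class with the highest positive-class probability is one common way of combining the binary models.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data with hand-picked, well-separated centers
X_toy, y_toy = make_blobs(n_samples=150, centers=[[-4, 0], [0, 4], [4, 0]],
                          cluster_std=1.0, random_state=0)
classes = np.unique(y_toy)

# Step 1: one-hot encode the target (column k is 1 for class k, 0 otherwise)
Y_onehot = (y_toy[:, None] == classes[None, :]).astype(int)

# Step 2: fit one binary classifier per class (class k against the rest)
models = [
    LogisticRegression(C=1.0, max_iter=1000).fit(X_toy, Y_onehot[:, k])
    for k in range(len(classes))
]

# Combine: predict the class whose binary model gives the highest probability
scores = np.column_stack([m.predict_proba(X_toy)[:, 1] for m in models])
y_pred = classes[scores.argmax(axis=1)]

print("Training accuracy:", np.mean(y_pred == y_toy))
```

Unlike the multinomial model, the per-class probabilities in `scores` come from separate models and do not need to sum to one across classes.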

Fit a one-vs-rest logistic regression model and visualize the confusion matrix. How do the results compare with the multinomial model?

In [None]:
from sklearn.multiclass import OneVsRestClassifier 



# Completing the Worksheet

At this point you have hopefully been able to complete all the CORE exercises and attempted the EXTRA ones. Now 
is a good time to check the reproducibility of this document by restarting the notebook's
kernel and rerunning all cells in order.

Before generating the PDF, please go to Edit -> Edit Notebook Metadata and change 'Student 1' and 'Student 2' in the **name** attribute to include your name. If you are unable to edit the Notebook Metadata, please add a Markdown cell at the top of the notebook with your name(s).

Once that is done and you are happy with everything, you can then run the following cell 
to generate your PDF. Once generated, please submit this PDF on Learn by 16:00 on the Friday of the week the workshop was given. 

In [None]:
!jupyter nbconvert --to pdf mlp_week06_key.ipynb 