**<div align="center"><span style="font-size:4em">Exercise</span></div>**

# Mushrooms

In this notebook you will practice setting up and training a simple neural network with tensorflow. If you're running this in Colab, you will not need to install any packages. If you run it on your own machine, you may need to install tensorflow, scikit-learn, numpy or matplotlib.

First we do the necessary imports.

In [None]:
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
plt.style.use("seaborn-v0_8")
print("Tensorflow version: "+tf.version.VERSION)

import sklearn.datasets
import sklearn.preprocessing
import sklearn.utils

Next we download a simple dataset on mushrooms. The task consists of predicting whether a given mushroom is edible or poisonous. The dataset itself is hosted on [openml](https://www.openml.org/), a repository for a number of datasets used in machine learning. To download the data, we use a convenient method of scikit-learn (and that is also the reason why we're importing scikit-learn at all).

In [None]:
data,target=sklearn.datasets.fetch_openml('mushroom',return_X_y=True)

Okay, a warning that in the future something will change. That shouldn't bother us. Let's look at the data.

In [None]:
data

Okay, apparently each mushroom is described by a number of characteristics that will be familiar to every mushroom lover. What matters for us: the features are not encoded numerically -- we'll need to change that. But first let's look at the target.

In [None]:
target

Aha, two values: 'e' (as in *edible*) and 'p' (as in *rather not*). Here, too, we'll need to change that to a numerical value. Let's start with the target. We simply put a '1' whenever the mushroom in question is poisonous. The code looks a bit more complicated because I want the target to be encoded as a *float* vector, i.e., with values 1.0 and 0.0. (If we don't specify that, we'll get a vector of <code>True</code> and <code>False</code> values.)

In [None]:
yy=np.array(target=='p',dtype='float')
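
To see the difference between the boolean and the float encoding, here is a minimal sketch on a handful of made-up labels. The array `toy_target` is only a stand-in for the real target column and is not part of the dataset.

```python
import numpy as np

# Stand-in for the real target column: 'e' = edible, 'p' = poisonous (made-up labels)
toy_target = np.array(['e', 'p', 'p', 'e'])

# Plain comparison: yields a vector of True/False values
as_bool = toy_target == 'p'

# Same pattern as in the cell above: force a float vector of 1.0 and 0.0
as_float = np.array(toy_target == 'p', dtype='float')

print(as_bool.dtype, as_bool)
print(as_float.dtype, as_float)
```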

Next, let's transform the data. *scikit-learn* provides a convenient class to do that, the <code>OneHotEncoder</code>. *tensorflow* offers similar functionality but it's a little bit more complicated to use so we'll go with *scikit-learn*. Let's look at a small toy example to figure out how that works.

In [None]:
sample_data=[['cat','light saber'],['dog','stick'],['rat','light saber']]

one_hot_encoder=sklearn.preprocessing.OneHotEncoder(sparse_output=False)
one_hot_encoder.fit_transform(sample_data)

The first three entries in each row encode *cat*, *dog*, *rat*, the last two encode *light saber* and *stick*.

Let's now apply that to the mushroom data.

In [None]:
one_hot_encoder=sklearn.preprocessing.OneHotEncoder(sparse_output=False)
XX=one_hot_encoder.fit_transform(data)

Let's check how the data has changed. Before the encoding we had (compare the table above) 8124 datapoints with 22 features.

In [None]:
XX.shape

We split the data into three sets: a training set, a validation set that we will use to monitor the error during training, and a test set that we will use to estimate the final error.

In [None]:
X,y=sklearn.utils.shuffle(XX,yy) # let's make sure the data is in random order
train_size=6000
val_size=1000
X_train,X_val,X_test=X[:train_size],X[train_size:train_size+val_size],X[train_size+val_size:]
y_train,y_val,y_test=y[:train_size],y[train_size:train_size+val_size],y[train_size+val_size:]
X_train.shape,X_val.shape,X_test.shape
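
As a sanity check on the slicing, the following sketch repeats the split on dummy arrays with 8124 rows. The arrays `X_demo` and `y_demo` are placeholders, not the mushroom data, and `random_state=0` is an optional argument added here only to make the shuffle reproducible.

```python
import numpy as np
import sklearn.utils

n_samples = 8124  # number of datapoints in the mushroom dataset
X_demo = np.arange(n_samples).reshape(-1, 1)  # dummy features: one column holding the row index
y_demo = np.arange(n_samples)                 # dummy labels: again the row index

# shuffle permutes both arrays in the same way, so rows and labels stay aligned
X_s, y_s = sklearn.utils.shuffle(X_demo, y_demo, random_state=0)
assert (X_s[:, 0] == y_s).all()

train_size = 6000
val_size = 1000
X_tr, X_va, X_te = X_s[:train_size], X_s[train_size:train_size+val_size], X_s[train_size+val_size:]

# The three parts are disjoint and together cover all rows
print(len(X_tr), len(X_va), len(X_te))  # 6000 1000 1124
```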

Now it's your turn!

Set up a neural network, train it and plot training loss and loss on the validation set. A good idea would be to consult the notebook [tfintro](https://colab.research.google.com/github/henningbruhn/math_of_ml_course/blob/main/neural_networks/tfintro.ipynb) for some pointers. For help on plotting, also look at [plt_intro](https://colab.research.google.com/github/henningbruhn/math_of_ml_course/blob/main/python_intro/plt_intro.ipynb).

More concretely:
* Define a neural network with a single output neuron. Experiment with the size of the network. That is, see how many neurons and layers you actually need. Since we have only one output, the activation in the last layer should be <code>activation='sigmoid'</code>, which is nothing other than the logistic function.
* Take <code>BinaryCrossentropy</code> as loss function. That is cross entropy adapted to binary classification, i.e., when there is a single output and classes are encoded as 0 and 1.
* Train the network for a suitable number of rounds (<code>epochs</code>). When calling <code>fit</code>, also use the parameter <code>validation_data=(X_val,y_val)</code> to record loss and accuracy on the validation set.
* Collect the result of training in a variable <code>history</code> and plot training loss and validation loss in one plot, and training accuracy and validation accuracy in another (side by side, preferably).
* Compute the accuracy on the test set.
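
To make the first two bullet points more concrete, here is a small NumPy sketch of the logistic function and of binary cross entropy. It is not a solution to the exercise and not how tensorflow implements the loss internally. The helper names `sigmoid` and `binary_cross_entropy`, the clipping constant `eps` and the numbers in `z` and `y_true` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross entropy for labels in {0, 1} and predicted probabilities."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # avoid log(0); eps is an illustrative choice
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

z = np.array([-2.0, 1.0, 3.0])       # made-up raw outputs of the last neuron
p = sigmoid(z)                       # predicted probability of class 1
y_true = np.array([0.0, 1.0, 1.0])   # made-up labels

loss = binary_cross_entropy(y_true, p)
# Predict class 1 whenever the probability exceeds 0.5
accuracy = float(np.mean((p > 0.5) == (y_true == 1.0)))
print(p, loss, accuracy)
```

The loss is small when the predicted probability is close to the true label and grows without bound as a confident prediction turns out to be wrong, which is why it is a sensible training target for a sigmoid output.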

In [None]:
### put your code here and below ###