# Classifying penguins with a neural network

<br><br><br>

## Setting up to classify penguins

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn.linear_model
import sklearn.neural_network

In [None]:
penguins = pd.read_csv("data/penguins.csv")

<br><br><br>

As in the classification tasks you just worked on, let's classify penguin species by the measurements of their bills.

<img src="img/culmen_depth.png" width="400">

In [None]:
fig, ax = plt.subplots()

penguins[penguins["species"] == "Adelie"].plot.scatter("bill_length_mm", "bill_depth_mm", color="blue", ax=ax)
penguins[penguins["species"] == "Gentoo"].plot.scatter("bill_length_mm", "bill_depth_mm", color="orange", ax=ax)
penguins[penguins["species"] == "Chinstrap"].plot.scatter("bill_length_mm", "bill_depth_mm", color="green", ax=ax)

None

<br><br><br>

First complication: `"Adelie"`, `"Gentoo"`, and `"Chinstrap"` are strings, but neural networks return numbers.

But all we care about are distinctions between strings, such as

```python
"Adelie" == "Adelie"   # and
"Adelie" != "Gentoo"
```

So we'll replace the strings with numbers—a distinct number for each distinct string.

In Pandas, this is the `pd.Categorical` data type.
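Here is a small sketch of how `pd.Categorical` assigns codes, using a made-up list of species names rather than the real data. Unless `categories` is given, the categories are the sorted unique values, so the codes come out in alphabetical order (Adelie=0, Chinstrap=1, Gentoo=2). Passing `categories` explicitly fixes the order, and any value not in that list gets the code `-1`.

```python
import pandas as pd

# A few made-up species labels (not the real dataset)
species = pd.Series(["Gentoo", "Adelie", "Chinstrap", "Adelie"])

# Default: categories are the sorted unique values, so codes are alphabetical
default = pd.Categorical(species)
print(list(default.categories))  # ['Adelie', 'Chinstrap', 'Gentoo']
print(default.codes.tolist())    # [2, 0, 1, 0]

# Explicit order: each code is the position in the categories list
explicit = pd.Categorical(species, categories=["Adelie", "Gentoo", "Chinstrap"])
print(explicit.codes.tolist())   # [1, 0, 2, 0]

# Codes map back to strings by indexing the categories
print(list(explicit.categories[explicit.codes]))  # ['Gentoo', 'Adelie', 'Chinstrap', 'Adelie']

# A value outside the given categories gets code -1
print(pd.Categorical(["Emperor"], categories=["Adelie", "Gentoo", "Chinstrap"]).codes.tolist())  # [-1]
```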

In [None]:
pd.Categorical(penguins["species"]).codes

In [None]:
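# By default, pd.Categorical sorts the categories alphabetically (Adelie=0, Chinstrap=1, Gentoo=2).
# Give the order explicitly so that Adelie=0, Gentoo=1, Chinstrap=2, matching the colors and titles below.
penguins["species_code"] = pd.Categorical(penguins["species"], categories=["Adelie", "Gentoo", "Chinstrap"]).codes
penguins[["species", "species_code"]]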

<br><br><br>

In [None]:
input_data = penguins.dropna()[["bill_length_mm", "bill_depth_mm"]].values
desired_output = penguins.dropna()["species_code"].values

In [None]:
input_data

In [None]:
desired_output

<br><br><br>

Second complication: neural networks train slowly when the input values are far outside the range -1 to 1.

So we'll scale them (subtract and multiply by constants) to put them in that range.
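The scaling is a per-column linear map. For a column whose minimum and maximum in the training data are $x_{\min}$ and $x_{\max}$, each value becomes $x' = -1 + 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}}$. Below is a minimal NumPy sketch of that formula. It assumes every column has at least two distinct values, since otherwise the denominator is zero. The bill measurements are made-up illustrative numbers, and `minmax_fit` and `minmax_transform` are hypothetical helper names. The sketch mirrors the fit/transform split of scikit-learn's `MinMaxScaler`: the minimum and maximum are learned once and then reused, so new data can land outside [-1, 1].

```python
import numpy as np

def minmax_fit(data):
    """Learn the per-column minimum and maximum (hypothetical helper)."""
    return data.min(axis=0), data.max(axis=0)

def minmax_transform(data, data_min, data_max, low=-1.0, high=1.0):
    """Map each column linearly so that data_min -> low and data_max -> high."""
    unit = (data - data_min) / (data_max - data_min)  # 0 to 1 on the training data
    return low + unit * (high - low)                  # low to high

# Made-up (bill length, bill depth) pairs in mm
train = np.array([[39.1, 18.7],
                  [46.5, 17.9],
                  [50.0, 15.0]])
data_min, data_max = minmax_fit(train)
scaled = minmax_transform(train, data_min, data_max)
print(scaled.min(axis=0), scaled.max(axis=0))  # [-1. -1.] [1. 1.]

# New data reuses the training min/max, so it can fall outside [-1, 1]
print(minmax_transform(np.array([[55.0, 14.0]]), data_min, data_max))
```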

In [None]:
import sklearn.preprocessing

In [None]:
# Learn each column's min and max so that they map to -1 and 1
scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1)).fit(input_data)

In [None]:
scaled_input_data = scaler.transform(input_data)

In [None]:
plt.scatter(scaled_input_data[:, 0], scaled_input_data[:, 1])

<br><br><br>

## No hidden layers: (mostly) linear

First, we'll train a neural network with no hidden layers: the inputs connect directly to the outputs, which makes it a linear model.

It's called "logistic regression" (even though we're using it to classify) because the output of the linear fit has to be transformed into probabilities between 0 and 1:

$$P_0 = \mbox{probability of classifying as Adelie}$$

$$P_1 = \mbox{probability of classifying as Gentoo}$$

$$P_2 = \mbox{probability of classifying as Chinstrap}$$

with $P_0 + P_1 + P_2 = 1$. The output of the linear fit has to be passed through a function called [softmax](https://en.wikipedia.org/wiki/Softmax_function).
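For three classes with linear scores $z_k = b_k + w_{1k} x_1 + w_{2k} x_2$, where $x_1$ and $x_2$ are the scaled bill length and depth, softmax is $P_k = e^{z_k} / (e^{z_0} + e^{z_1} + e^{z_2})$. Below is a NumPy sketch that uses random, made-up weights (not fitted ones). The probabilities are positive and sum to 1. Softmax also preserves the order of the scores, so the most probable class is simply the one with the largest linear score. This is one way to read the heading's "(mostly) linear": the probabilities are nonlinear, but the borders where one class overtakes another are straight lines.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax; subtracting each row's max avoids overflow without changing the result."""
    shifted = z - z.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))          # made-up weights: 2 inputs -> 3 classes
b = rng.normal(size=3)               # made-up biases, one per class
X = rng.uniform(-1, 1, size=(5, 2))  # five points in the scaled input range

z = X @ W + b          # linear scores, shape (5, 3)
P = softmax(z)         # probabilities, shape (5, 3)
print(P.sum(axis=1))   # each row sums to 1 (up to rounding)
print(P.argmax(axis=1), z.argmax(axis=1))  # same winning class either way
```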

In [None]:
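# hidden_layer_sizes=() means no hidden layers: the 2 inputs feed straight into the 3 softmax outputs.
# activation="logistic" only applies to hidden layers, so it has no effect here.
logistic_regression = sklearn.neural_network.MLPClassifier(solver="lbfgs", activation="logistic", hidden_layer_sizes=())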

In [None]:
logistic_regression.fit(scaled_input_data, desired_output)
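This sketch checks that nothing is hidden in this model. It uses made-up, well-separated 2D data (three blobs, not the penguins). With `hidden_layer_sizes=()`, the fitted `MLPClassifier` has a single weight matrix in its `coefs_` attribute (shape 2 × 3) and a single bias vector in `intercepts_`. Its `predict_proba` should then match softmax applied to the linear scores. The blob positions and `random_state` are arbitrary choices.

```python
import numpy as np
import sklearn.neural_network

rng = np.random.default_rng(0)
centers = np.array([[-0.5, 0.5], [0.5, -0.5], [0.5, 0.5]])   # made-up blob centers
X = np.concatenate([c + 0.2 * rng.normal(size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

model = sklearn.neural_network.MLPClassifier(
    solver="lbfgs", hidden_layer_sizes=(), max_iter=1000, random_state=0
).fit(X, y)

W, b = model.coefs_[0], model.intercepts_[0]   # the only layer of weights
z = X @ W + b                                  # linear scores, one column per class
manual = np.exp(z - z.max(axis=1, keepdims=True))
manual /= manual.sum(axis=1, keepdims=True)    # softmax

print(len(model.coefs_), W.shape)                    # 1 (2, 3)
print(np.allclose(manual, model.predict_proba(X)))   # True
```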

<br><br><br>

In [None]:
fig, (ax0, ax1, ax2) = plt.subplots(1, 3, figsize=(13, 4))

xmin, xmax = -1, 1
ymin, ymax = -1, 1

background_x, background_y = np.meshgrid(np.linspace(xmin, xmax, 100), np.linspace(ymin, ymax, 100))

probabilities = logistic_regression.predict_proba(np.column_stack([background_x.ravel(), background_y.ravel()]))

ax0.contourf(background_x, background_y, probabilities[:, 0].reshape(background_x.shape))
ax1.contourf(background_x, background_y, probabilities[:, 1].reshape(background_x.shape))
ax2.contourf(background_x, background_y, probabilities[:, 2].reshape(background_x.shape))

for ax in [ax0, ax1, ax2]:
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    ax.set_xlabel("scaled bill length")

ax0.set_ylabel("scaled bill depth")

ax0.set_title("probability of adelie")
ax1.set_title("probability of gentoo")
ax2.set_title("probability of chinstrap")

None
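The plotting cell above relies on `ravel` and `reshape` undoing each other. `np.meshgrid` builds two 2D arrays, `ravel` flattens both in the same (row-major) order, and `column_stack` pairs them into one (x, y) row per grid point. Reshaping the model's output back to `background_x.shape` therefore puts each probability at the grid point it was computed from. Here is a tiny sketch with a 4 × 3 grid (sizes chosen arbitrarily) that checks this bookkeeping:

```python
import numpy as np

# A small grid: 3 x-values and 4 y-values (sizes chosen arbitrarily)
grid_x, grid_y = np.meshgrid(np.linspace(-1, 1, 3), np.linspace(-1, 1, 4))
print(grid_x.shape)   # (4, 3): rows follow y, columns follow x

# One (x, y) row per grid point, in row-major order
points = np.column_stack([grid_x.ravel(), grid_y.ravel()])
print(points.shape)   # (12, 2)

# Stand-in for one column of predict_proba: any per-point function works
values = points[:, 0] + 10 * points[:, 1]

# Reshaping puts each value back at the grid point it was computed from
values_grid = values.reshape(grid_x.shape)
print(np.array_equal(values_grid, grid_x + 10 * grid_y))   # True
```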

<br><br><br>

In [None]:
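def draw_everything(model):
    """Draw where each class's predicted probability is 0.5, over the scaled training data
    (uses the global scaled_input_data and desired_output)."""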
    fig, ax = plt.subplots(figsize=(6, 6))
    
    xmin, xmax = -1, 1
    ymin, ymax = -1, 1
    
    background_x, background_y = np.meshgrid(np.linspace(xmin, xmax, 100), np.linspace(ymin, ymax, 100))
    
    probabilities = model.predict_proba(np.column_stack([background_x.ravel(), background_y.ravel()]))
    
    ax.contour(background_x, background_y, probabilities[:, 0].reshape(background_x.shape), [0.5])
    ax.contour(background_x, background_y, probabilities[:, 1].reshape(background_x.shape), [0.5])
    ax.contour(background_x, background_y, probabilities[:, 2].reshape(background_x.shape), [0.5])
    
    ax.scatter(*scaled_input_data[desired_output == 0].T, color="blue")
    ax.scatter(*scaled_input_data[desired_output == 1].T, color="orange")
    ax.scatter(*scaled_input_data[desired_output == 2].T, color="green")
    
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    ax.set_xlabel("scaled bill length")
    ax.set_ylabel("scaled bill depth")

draw_everything(logistic_regression)
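`draw_everything` draws the 0.5 level of each class's probability. With three classes, wherever one class's probability is above 0.5, that class is certainly the prediction. A class can also win with less than 0.5, though, so the 0.5 curves can leave gaps where no class reaches 0.5. The predicted class is the column with the highest probability, which is what `predict` returns for a multiclass `MLPClassifier`. The sketch below uses made-up probability rows. It also defines a hypothetical helper, `predicted_class_grid`, that evaluates the argmax on the same kind of grid; you could shade its result with `ax.contourf` if you want filled regions instead of lines.

```python
import numpy as np

# Made-up predicted probabilities for three points (each row sums to 1)
probs = np.array([[0.70, 0.20, 0.10],
                  [0.40, 0.35, 0.25],
                  [0.30, 0.30, 0.40]])

predicted = probs.argmax(axis=1)        # most probable class per point
above_half = probs.max(axis=1) > 0.5    # inside some class's 0.5 contour?
print(predicted.tolist())    # [0, 0, 2]
print(above_half.tolist())   # [True, False, False]

def predicted_class_grid(predict_proba, xmin=-1, xmax=1, ymin=-1, ymax=1, n=100):
    """Hypothetical helper: most probable class at each point of an n x n grid."""
    grid_x, grid_y = np.meshgrid(np.linspace(xmin, xmax, n), np.linspace(ymin, ymax, n))
    p = predict_proba(np.column_stack([grid_x.ravel(), grid_y.ravel()]))
    return grid_x, grid_y, p.argmax(axis=1).reshape(grid_x.shape)

# Usage with a fitted model:
# grid_x, grid_y, classes = predicted_class_grid(logistic_regression.predict_proba)
```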

<br><br><br>

## Now with hidden layers

Already pretty good. Let's add some hidden layers to make this a real neural network!

In [None]:
neural_network1 = sklearn.neural_network.MLPClassifier(solver="lbfgs", activation="logistic", hidden_layer_sizes=(10, 10), max_iter=1000)

In [None]:
neural_network1.fit(scaled_input_data, desired_output)

In [None]:
draw_everything(neural_network1)

<br><br><br>

That's too loose! It's overfitting like crazy!

Let's add regularization to force the model to be simpler.
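scikit-learn's documentation describes `alpha` as the strength of an L2 regularization term, which is divided by the sample size when added to the loss. Below is a minimal NumPy sketch of that idea, not scikit-learn's actual code. In this sketch, the training loss is the cross-entropy (the average of $-\log$ of the probability given to the correct class) plus $\frac{\alpha}{2n}$ times the sum of all squared weights. Roughly speaking, large weights are what let the boundaries bend sharply, so penalizing them pushes the fit toward smoother boundaries. The probabilities and weights are made up, and `penalized_loss` is a hypothetical name.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Average of -log(probability assigned to the correct class)."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def l2_penalty(weight_matrices, alpha, n_samples):
    """alpha / (2 n) times the sum of all squared weights (biases not included)."""
    total = sum(np.sum(w ** 2) for w in weight_matrices)
    return 0.5 * alpha * total / n_samples

def penalized_loss(probs, labels, weight_matrices, alpha):
    return cross_entropy(probs, labels) + l2_penalty(weight_matrices, alpha, len(labels))

# Made-up predictions for 4 points and made-up weights for a 2 -> 10 -> 10 -> 3 network
probs = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],
                  [0.6, 0.3, 0.1]])
labels = np.array([0, 1, 2, 0])
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 10)), rng.normal(size=(10, 10)), rng.normal(size=(10, 3))]

# The same fit costs more as alpha grows, because its weights are penalized more
for alpha in [0.0, 0.0001, 0.03]:
    print(alpha, penalized_loss(probs, labels, weights, alpha))
```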

In [None]:
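# alpha is the strength of the L2 penalty on the weights (default 0.0001); larger values tend to give smoother boundaries
neural_network2 = sklearn.neural_network.MLPClassifier(solver="lbfgs", activation="logistic", hidden_layer_sizes=(10, 10), max_iter=1000, alpha=0.03)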

neural_network2.fit(scaled_input_data, desired_output)

draw_everything(neural_network2)

<br><br><br>

Side-note: neural networks and other advanced machine learning models aren't always necessary!

For this problem, I would consider a logistic regression good enough. Adding hidden layers only introduces problems with overfitting and non-determinism.
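The non-determinism comes from the random initial weights, which the `random_state` parameter seeds. The sketch below uses made-up data (not the penguins). It shows that fixing `random_state` makes `MLPClassifier` reproducible, while a different seed may give a different fit. For comparison, the `sklearn.linear_model` module imported at the top also provides `LogisticRegression`, which, with its default lbfgs solver, does not depend on a random initialization. The labeling rule and seeds are arbitrary assumptions.

```python
import numpy as np
import sklearn.linear_model
import sklearn.neural_network

# Made-up 3-class data in the scaled range (arbitrary labeling rule)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(150, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 0] > 0.5).astype(int)

def fit_network(seed):
    return sklearn.neural_network.MLPClassifier(
        solver="lbfgs", hidden_layer_sizes=(10, 10), max_iter=1000, random_state=seed
    ).fit(X, y)

same_a = fit_network(0).predict_proba(X)
same_b = fit_network(0).predict_proba(X)
other = fit_network(1).predict_proba(X)
print(np.allclose(same_a, same_b))   # True: same seed, same fit
print(np.abs(same_a - other).max())  # different seed: usually not 0

linear_a = sklearn.linear_model.LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
linear_b = sklearn.linear_model.LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
print(np.allclose(linear_a, linear_b))  # True
```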

<br><br><br>

## Conclusion for day 1

How regular _should_ it be?

If the model is too strict—too few parameters, too strongly regularized, underfit—then it doesn't describe or predict well.

If the model is too loose—too many parameters, too weakly regularized, overfit—then it is a restatement of the training data and makes wacky predictions in regions with no training data.

<br><br><br>

**Tomorrow:** language models!

For fun, here's sample output from [my first attempt](https://github.com/jpivarski/rnn-oz) at a language model 6 years ago:

> Then manderunt thee. I's anf leus, for and as to mope not thal se the Caid will the wale. "I trop iclusers and Willy age and preed geach duppeny.
>
> "I doble and the primman forsed the Ellarke coup?" Realk.
>
> "There lookna'u cere them chimed was neerid.
>
> "Younway the arous afrithy Stonsad. "Ws?"
>
> "On. But'm dee poas the gad now ulterwoth the lorked the where were if Dorothy, "untle lecking the hes got wook to care wors.
>
> "Ic intameed him godlyed," is dich buttly. As pigle.
>
> "Me?" a dexn hander upen this slieve "angolst," facked a more srough copenting his then the tuvp Is strome it it, and likned to Ozquere mane so nol hud of to suf awe grissand, hom hal as for thingn wish witley, and wondents ucherthy brome byinged inknest's clawirs, she bot the briced to peinon" "I dpor. out the motlond honky what hen with frow thin't dyole thes teen then man he sugnit.
>
> "Num, he chan she dices. I waykel of caply Lood abreaklulres frisk and elewal, Igredled sto liow je a shirut, a do no her at was al ouly a wessed to falded anknes you, and a thister blecel of she lady. Suppen hows indiedat of his, who fien. "Thing, ald greps a dotsores.

Yours will be better!