## Loading the Data

In [13]:
#!pip install tensorflow
from tensorflow import keras 
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Use scikit-learn to split the data into training and test sets
from sklearn.model_selection import train_test_split

In [4]:
#  Preparing and cleaning the data
df = pd.read_csv("palmer_penguins.csv")
df = df.dropna()
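# Drop the leftover index column; the positional axis argument no longer works in recent pandas
df = df.drop(columns='Unnamed: 0')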

Now Julia wants to get her hands dirty with neural networks. She has data on all the penguins and wants to see whether a multi-layer neural network can work out which type of penguin a data point belongs to, given a fairly rich feature vector.

Given a feature vector of penguin characteristics, can we construct a model that guesses the species (Adelie, Gentoo or Chinstrap) from that vector alone? Let's test it out.

Before we do that we have to prepare our feature vector and our target vector.

## Preparing Target Vector

We can prepare our target vector in two ways. The first is categorical form, where we map each species to an integer: <br>
Adelie ---> 0 <br>
Chinstrap ---> 1 <br>
Gentoo ---> 2 <br>
<br>
The second is one-hot vector form, where each species gets its own position in a vector: <br>
Adelie ---> [1,0,0] <br>
Chinstrap ---> [0,1,0] <br>
Gentoo ---> [0,0,1] <br>

We prepare the target in both forms, though only the categorical form is used to train the model below.

In [5]:
# Type 1 Categorical
cls = {"Adelie": 0,
       "Chinstrap": 1,
       "Gentoo": 2,}
df["species_index"] = df["species"].apply(lambda x: cls[x])
y_total = df['species_index'].values

# Type 2 Vectorized
onehot_train_y = []
for y in y_total:   
    temp_vec = np.zeros((3, 1))
    temp_vec[y][0] = 1.0
    onehot_train_y.append(temp_vec)
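
The loop above can also be written without an explicit Python loop. The following is a minimal sketch, assuming the labels are integers in the range 0 to 2; the `labels` array holds made-up illustrative values, not rows from the penguin data.

```python
import numpy as np

# Illustrative integer class labels (assumed values, not taken from the dataset)
labels = np.array([0, 2, 1, 0])

# Row i of the 3x3 identity matrix is the one-hot vector for class i,
# so indexing the identity matrix with the labels one-hot encodes them all at once
onehot = np.eye(3)[labels]                  # shape (4, 3)

# Add a trailing axis to match the (3, 1) column vectors built by the loop
onehot_columns = onehot[:, :, np.newaxis]   # shape (4, 3, 1)

print(onehot)
print(onehot_columns.shape)
```

Indexing the identity matrix keeps the encoding in one line and avoids building a Python list of small arrays.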

## Preparing Feature Vector

Next, we prepare the feature vector. So far we haven't used any of the qualitative variables (such as sex and island) in our models, even though both contain highly useful information, as we've seen previously. <br>

We use one-hot encoding to convert the qualitative variable <b> island </b> into three boolean variables: isTorgersen, isBiscoe and isDream. Furthermore, since the qualitative variable <b> sex </b> has only two unique values, 'Male' and 'Female', we convert it into a single boolean called isFemale. <br>

In [6]:
# Data processing using one-hot-encoding
df["isFemale"] = np.where(df["sex"]=="Female",1,0)
df["isTorgersen"] = np.where(df["island"]=="Torgersen",1,0)
df["isBiscoe"] = np.where(df["island"]=="Biscoe",1,0)
df["isDream"] = np.where(df["island"]=="Dream",1,0)

# Feature vector containing all the variables
X =  df[[ "bill_length_mm","bill_depth_mm","flipper_length_mm","body_mass_g",
          "isFemale","isTorgersen","isBiscoe","isDream"]].values 

# Rescale the four measurements so that all features have comparable magnitudes
X[:, 0] = X[:, 0] / 10    # bill length: mm -> cm
X[:, 1] = X[:, 1] / 10    # bill depth: mm -> cm
X[:, 2] = X[:, 2] / 100   # flipper length: mm -> dm
X[:, 3] = X[:, 3] / 1000  # body mass: g -> kg
np.shape(X[0])

# Also store each feature vector as an (8, 1) column vector
flat_train_X = []
for x in X:   
    flat_train_X.append(x.flatten().reshape(8, 1))
    
print(np.shape(flat_train_X))

(333, 8, 1)
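
The cell above rescales each measurement by a fixed power of ten. A common alternative, not used in this notebook, is z-score standardization, which subtracts each column's mean and divides by its standard deviation. The sketch below illustrates the idea; the rows in `raw` are made-up values chosen to look like penguin measurements.

```python
import numpy as np

# Made-up rows: bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g
raw = np.array([
    [39.1, 18.7, 181.0, 3750.0],
    [46.5, 17.9, 192.0, 3500.0],
    [50.0, 15.3, 220.0, 5550.0],
])

# Per-column statistics
mean = raw.mean(axis=0)
std = raw.std(axis=0)

# Each column now has mean 0 and standard deviation 1
standardized = (raw - mean) / std

print(standardized.round(3))
```

When this approach is used with a train/test split, the mean and standard deviation are normally computed on the training rows only and then reused for the test rows.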


## Train/Test Split

In [7]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y_total, test_size=0.3, random_state=42)
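
To make the split sizes concrete, here is a minimal NumPy sketch of a shuffled 70/30 split over 333 rows. It is not scikit-learn's implementation and will not reproduce the same shuffle; it assumes the test size is rounded up, which matches my understanding of how `train_test_split` treats a fractional `test_size`.

```python
import numpy as np

n_samples = 333        # rows left after dropna(), as in the shape printed earlier
test_fraction = 0.3

# Seeded generator for repeatability (a different shuffle from scikit-learn's)
rng = np.random.default_rng(42)
indices = rng.permutation(n_samples)

# Assumption: the test set size is rounded up
n_test = int(np.ceil(test_fraction * n_samples))
test_idx, train_idx = indices[:n_test], indices[n_test:]

print(len(train_idx), len(test_idx))
```

Under that assumption the split gives 233 training rows and 100 test rows, so each test sample accounts for one percentage point of test accuracy.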

## Initializing a Multi-Layer Perceptron

In [8]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.8.0


In [9]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(8,)),
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dropout(0.2),
  tf.keras.layers.Dense(3)
])
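
To show what this architecture computes, the sketch below reproduces the forward pass in plain NumPy with randomly initialised weights of the same shapes. It is an illustration, not Keras's implementation: the weights are random rather than trained, `Flatten` is a no-op on an input that is already a flat vector of 8 values, and `Dropout` is skipped because it is only active during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random weights with the same shapes as the two Dense layers (not trained values)
W1, b1 = rng.normal(size=(8, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 3)), np.zeros(3)

def forward(x):
    """Inference-time forward pass: Dense(128, relu) followed by Dense(3)."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU activation
    return hidden @ W2 + b2                 # raw logits, no softmax applied

batch = rng.normal(size=(5, 8))             # 5 made-up feature vectors
logits = forward(batch)

# Trainable parameters: (8*128 + 128) + (128*3 + 3)
n_params = W1.size + b1.size + W2.size + b2.size
print(logits.shape, n_params)
```

The network is small: 1,152 parameters in the hidden layer and 387 in the output layer, 1,539 in total.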

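We compile the model with the Adam optimizer and a sparse categorical cross-entropy loss, then train it for 20 epochs. The loss is created with from_logits=True because the final Dense layer has no softmax and outputs raw scores. The sparse variant takes the integer labels (0, 1, 2) directly, so the one-hot targets are not needed here.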

In [11]:
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam',
              loss=loss_fn,
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x7fa115b75d60>
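
To make the loss used in training concrete, here is a minimal NumPy sketch of sparse categorical cross-entropy computed from raw logits. It is an illustration of the formula, not Keras's code; the function name and the example `logits` and `labels` are made up for this sketch.

```python
import numpy as np

def sparse_cce_from_logits(labels, logits):
    """Mean negative log-probability of the true class, computed from raw logits."""
    # Subtract the row maximum for numerical stability (does not change the result)
    shifted = logits - logits.max(axis=1, keepdims=True)
    # Log-softmax: log of each class probability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class in each row and average
    return -log_probs[np.arange(len(labels)), labels].mean()

# Made-up logits for two samples and their integer labels
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([0, 2])

loss = sparse_cce_from_logits(labels, logits)
print(loss)
```

A useful reference point: a model that assigns equal scores to all three species has a loss of ln(3), roughly 1.10, so values well below that indicate the model has learned something.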

We've reached about 93% accuracy on the training data after just 20 epochs. Let's see how this model performs on the test data!

In [12]:
model.evaluate(X_test, y_test, verbose=1)



[0.2033604383468628, 0.9700000286102295]

It performs very well: about 97% accuracy on the test set, with a loss of about 0.20!
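
Because the model outputs raw logits, turning a prediction into a species name takes one extra step. The sketch below shows that step in NumPy; the `logits` array holds made-up values standing in for what `model.predict(X_test)` would return, and the species order follows the mapping defined earlier (Adelie 0, Chinstrap 1, Gentoo 2).

```python
import numpy as np

def softmax(logits):
    """Convert each row of raw logits into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

species = np.array(["Adelie", "Chinstrap", "Gentoo"])

# Made-up logits for two penguins (stand-ins for real model output)
logits = np.array([[3.1, 0.2, -1.0],
                   [-0.5, 0.1, 2.4]])

probs = softmax(logits)
pred = probs.argmax(axis=1)       # index of the most probable class per row

print(probs.round(3))
print(species[pred])
```

Softmax does not change which class scores highest, so `argmax` on the logits alone gives the same predicted class; the softmax is only needed when the probabilities themselves are of interest.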