# Hello TensorFlow 

In this notebook, you will create your first neural network with Tensorflow. We will be using our familiar penguins dataset for this assignment. This notebook is loosely based off of [Tensorflow's beginner notebook](https://www.tensorflow.org/tutorials/quickstart/beginner).

## 1. Set up TensorFlow

1. Make sure that tensorflow is properly installed via pip in a venv before running this cell. 
2. Activate your venv: Select kernal -> select another kernal -> python environments -> dl-env (or whatever you named your environment)
2. Run this cell to import TensorFlow in python and ensure it works properly

In [25]:
# Run this cell to import libraries and check that tensorflow is properly installed
import pandas as pd
import numpy as np
import math
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import plotly.express as px
import tensorflow as tf

# Load your data

Load your penguins dataset. You will want to make sure you hand NA values, encode strings as numbers, and split your data. In this lab we will be predicting body_mass_g rather than species, so make sure to set y to be body_mass_g.

In [26]:
# TODO Load penguins.csv
data = pd.read_csv("classifcation_and_seqs_aln.csv")

#TODO Handle NA Values

# TODO encode string data using LabelEncoder
encoder = LabelEncoder()
encoded = []

data["species"] = encoder.fit_transform(data["species"])

for sequence in data["sequence"].tolist():
    mini_encoded = []
    for char in sequence:
        if char == "-":
            mini_encoded.append(0)
        if char == "A":
            mini_encoded.append(1)
        if char == "T":
            mini_encoded.append(2)
        if char == "C":
            mini_encoded.append(3)
        if char == "G":
            mini_encoded.append(4)
    encoded.append(mini_encoded)



    


    
#TODO Select your features. Select body_mass_g as your "target" (y) and everything else as X
y = data["species"]
X = np.array(encoded)

print(np.unique(data['species']))

# TODO : Split the data into testing and training data. Use a 20% split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.8, random_state = 42)


[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33]


In [27]:
# run this to see if you implemented the above block correctly
assert math.isclose(len(X_train), .8*len(data), rel_tol=1), f"\033[91mExpected {.8*len(data)} but got {len(X_train)}\033[0m"
assert math.isclose(len(X_test), .2*len(data), rel_tol=1), f"\033[91mExpected {.2*len(data)} but got {len(X_test)}\033[0m"

## Build a machine learning model

Let's now built our first neural network with tensorflow. Using tensorflow, we can define each layer of our neural network, specifying what types of activation function we want to use and how many neurons there should be in each layer. 

In [None]:
# TODO create a neural network with tensorflow
model = tf.keras.models.Sequential([
    tf.keras.layers.Input(shape=(4795,)),
    tf.keras.layers.Dense(272, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(136, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(68, activation='relu'),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.Dense(34, activation='softmax')
])

Before you start training, configure and compile the model using Keras `Model.compile`. Set the [`optimizer`](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers) class to `adam`, and set the `loss` to the `mse`.

In [52]:
# TODO set your learning rate
lr = 0.0004

#TODO Compile your model with a selected optimizer and loss function
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
    metrics=['accuracy']
)

## Train and evaluate your model

Use the `Model.fit` method to adjust your model parameters and minimize the loss. Creating the variable history to store the training output will allow us to graph our loss later on. 

In [53]:
# TODO: fit your model with X_train and Y_train
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=200)

Epoch 1/200


[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 137ms/step - accuracy: 0.0729 - loss: 4.0789 - val_accuracy: 0.4766 - val_loss: 3.2057
Epoch 2/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 44ms/step - accuracy: 0.3854 - loss: 2.3670 - val_accuracy: 0.5391 - val_loss: 3.3107
Epoch 3/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 93ms/step - accuracy: 0.6979 - loss: 1.6551 - val_accuracy: 0.5417 - val_loss: 3.5040
Epoch 4/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 40ms/step - accuracy: 0.9062 - loss: 1.2074 - val_accuracy: 0.5443 - val_loss: 3.7602
Epoch 5/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 57ms/step - accuracy: 0.9688 - loss: 0.9429 - val_accuracy: 0.5495 - val_loss: 3.9705
Epoch 6/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 44ms/step - accuracy: 0.9896 - loss: 0.7697 - val_accuracy: 0.5521 - val_loss: 4.0659
Epoch 7/200
[1m3/3[0m [32m━━━━━━━━━━━━━━━━━━━━[

In [36]:
#Run this cell to graph your loss
df = pd.DataFrame(history.history)['loss']
px.scatter(df).show()

# Make predictions
Now run the below code block to see your average error. Your *minimum* goal is to get within an average error of 100.

In [42]:
# TODO generate some predictions using Y_test
predictions = model.predict(X_test)

[1m12/12[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step 


Now run the below code block to see your average error.

In [None]:
# Run this cell to calcuate your mean average error based on Y_test
# error = y_test.squeeze() - predictions.ravel()
# print("Your average error is: ", error.mean())

In [54]:
#if abs(error.mean()) > 100:
#    print("\033[91mYour model should be a bit more accurate\033[0m")
#else:
#    print("\033[92mYour model is accurate enough!\033[0m")

In [None]:
#What different settings did you experiment with, and how did each one affect your model’s performance?
#Describe which choices ultimately worked best, which did not, and provide reasoning for why you think those outcomes occurred.
# I tweaked the learning rate, the activation functions, the amount of layers and nodes, and even looked up ways to try and counter over-fitting. Learning rate affected the speed at which I reached the plateu, but ot also led to a resugence in loss, caused by the the model under fitting. But when I used a smaller l rate, it took really long and over fiteed. but it also 
#Increasing the amount of nodes and decreasing them worked well to me, atleast conceptually, as I was passing a LOT of features in and thus I needed a lot of nodes to weed through all of that. I also add another layer so that the drop in nodes wasn't so high. l rate didn't doo too much for me since all of my tweaks were around the same amount, but eary on, my l rate was really small and caused my model to get stuck.