## ProjF3 - Baseline Model

Use this document as a template to provide the evaluation of your baseline model. You are welcome to go in as much depth as needed.

Make sure you keep the sections specified in this template, but you are welcome to add more cells with your code or explanation as needed.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import os
import tensorflow as tf
import pandas as pd

from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Flatten, Dense
from sklearn.model_selection import train_test_split



### 1. Load and Prepare Data

This should illustrate your code for loading the dataset and the split into training, validation and testing. You can add steps like pre-processing if needed.

In [3]:
df = pd.read_csv('dataset/dataset.csv')

In [4]:
df.head()

Unnamed: 0,filename,label
0,dataset/train/snail/snail (317).jpg,snail
1,dataset/train/wasp/wasp (879).jpg,wasp
2,dataset/train/bees/bees (383).jpg,bees
3,dataset/train/grasshopper/grasshopper (364).jpg,grasshopper
4,dataset/train/weevil/Weevil (161).jpg,weevil


In [5]:
"""
Load the image at `image_path`,
decode it as a 3-channel JPEG,
resize it to (224, 224, 3), the input shape expected by VGG16,
and normalize the pixel values to [0, 1].
"""
def preprocess_image(image_path):
    img = tf.io.read_file(image_path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))
    img = tf.cast(img, tf.float32) / 255.0
    return img
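
A note on the scaling choice: `preprocess_image` maps pixels to [0, 1], whereas the ImageNet weights shipped with Keras' VGG16 are normally paired with `tf.keras.applications.vgg16.preprocess_input`, which reorders RGB to BGR and subtracts per-channel ImageNet means without rescaling. The NumPy sketch below reproduces that convention for illustration only; `vgg16_style_preprocess` is a hypothetical helper name, and whether switching would change this baseline's results has not been tested here.

```python
import numpy as np

# Per-channel ImageNet means in BGR order ("caffe"-style preprocessing).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_image):
    """Hypothetical helper: expects RGB pixel values in [0, 255], shape (..., 3)."""
    # Reverse the channel axis: RGB -> BGR.
    bgr = np.asarray(rgb_image, dtype=np.float32)[..., ::-1]
    # Zero-center each channel; no division by 255.
    return bgr - IMAGENET_BGR_MEANS

# A pure-red 2x2 image: R=255, G=0, B=0.
red = np.zeros((2, 2, 3), dtype=np.float32)
red[..., 0] = 255.0
print(vgg16_style_preprocess(red)[0, 0])
```

If this convention were adopted, the division by 255 in `preprocess_image` would have to be dropped, since the means above assume the [0, 255] range.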

In [6]:
# Load the image paths and labels
image_paths = df['filename']
labels = df['label']

In [11]:
# Split dataset into training and testing sets
train_image_paths, test_image_paths, train_labels, test_labels = train_test_split(image_paths, labels, test_size=0.2, random_state=42)
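
The template asks for training, validation and testing splits, while the cell above produces only two. One possible way to get all three is to call `train_test_split` twice with `stratify` so each split keeps roughly the same class proportions. This is a sketch, not what the notebook ran: `three_way_split` is a hypothetical helper, and the 10% validation share is an assumed value.

```python
from sklearn.model_selection import train_test_split

def three_way_split(paths, labels, val_size=0.1, test_size=0.2, seed=42):
    """Hypothetical helper: stratified train/validation/test split."""
    # Step 1: hold out the test set.
    trainval_p, test_p, trainval_y, test_y = train_test_split(
        paths, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # Step 2: carve the validation set out of the remainder.
    # val_size is a fraction of the full dataset, so rescale it.
    relative_val = val_size / (1.0 - test_size)
    train_p, val_p, train_y, val_y = train_test_split(
        trainval_p, trainval_y, test_size=relative_val,
        random_state=seed, stratify=trainval_y
    )
    return (train_p, train_y), (val_p, val_y), (test_p, test_y)

# Toy usage with made-up file names and two balanced classes.
toy_paths = [f"img_{i}.jpg" for i in range(100)]
toy_labels = ["ants", "bees"] * 50
train, val, test = three_way_split(toy_paths, toy_labels)
print(len(train[0]), len(val[0]), len(test[0]))
```

With the notebook's data this would be called as `three_way_split(image_paths, labels)`.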

In [12]:
train_image_paths

2886                 dataset/train/snail/snail (145).jpg
2845     dataset/train/grasshopper/grasshopper (432).jpg
957                dataset/train/earwig/earwig (359).jpg
2569                dataset/train/weevil/Weevil (57).jpg
4447    dataset/test/catterpillar/catterpillar (447).jpg
                              ...                       
3772                   dataset/train/moth/moth (325).jpg
5191                    dataset/test/ants/ants (195).jpg
5226                dataset/test/earwig/earwig (138).jpg
5390                dataset/test/weevil/Weevil (195).jpg
860      dataset/train/grasshopper/grasshopper (332).jpg
Name: filename, Length: 4395, dtype: object

In [8]:
# Preprocess images for training set
train_images = [preprocess_image(image_path) for image_path in train_image_paths]

# Preprocess images for testing set
test_images = [preprocess_image(image_path) for image_path in test_image_paths]

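# Convert labels to numerical format
# (sorted so that the label-to-index mapping is identical on every run;
# iterating over a bare set of strings can change order between sessions)
label_to_index = {label: i for i, label in enumerate(sorted(set(labels)))}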
train_labels = [label_to_index[label] for label in train_labels]
test_labels = [label_to_index[label] for label in test_labels]

# Create TensorFlow datasets for training and testing
train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))
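
The cell above decodes every image up front and keeps the full list in memory. An alternative, sketched below under the assumption that the files are JPEGs readable from disk, is to build the dataset from paths and decode lazily with `Dataset.map`. `load_example` and `make_dataset` are hypothetical names, and the default batch size of 32 is an assumed value rather than the notebook's setting.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)

def load_example(path, label):
    """Hypothetical helper: read and preprocess one image when it is requested."""
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    return tf.cast(img, tf.float32) / 255.0, label

def make_dataset(paths, labels, batch_size=32, shuffle=False):
    """Hypothetical helper: dataset of (image, label) batches decoded on the fly."""
    ds = tf.data.Dataset.from_tensor_slices((list(paths), list(labels)))
    if shuffle:
        # Shuffle the cheap (path, label) pairs before any decoding happens.
        ds = ds.shuffle(buffer_size=len(paths))
    ds = ds.map(load_example, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

In the notebook this might be used as `make_dataset(train_image_paths, train_labels, shuffle=True)`, with `train_labels` already converted to integer indices.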

In [10]:
train_dataset

<_TensorSliceDataset element_spec=(TensorSpec(shape=(224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(), dtype=tf.int32, name=None))>

In [24]:
# Shuffle and batch the training dataset
train_dataset = train_dataset.shuffle(buffer_size=len(train_image_paths)).batch(512)

# Batch the testing dataset (no need to shuffle)
test_dataset = test_dataset.batch(512)
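
To make the effect of these settings concrete, the arithmetic below works out how many batches one epoch contains and roughly how much memory the preloaded training images occupy. The training-set size of 4395 is taken from the `train_image_paths` output above; the rest follows from the 224x224x3 float32 image shape.

```python
import math

n_train = 4395          # length reported for train_image_paths
batch_size = 512

# Number of batches per epoch; the last one is smaller than the rest.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size

# Memory held by the decoded training images as float32 tensors.
bytes_per_image = 224 * 224 * 3 * 4
train_gib = n_train * bytes_per_image / 2**30
batch_mib = batch_size * bytes_per_image / 2**20

print(f"steps per epoch: {steps_per_epoch}, last batch size: {last_batch}")
print(f"training images in memory: {train_gib:.2f} GiB")
print(f"one full input batch: {batch_mib:.0f} MiB")
```

A batch of 512 images is large for VGG16 on most hardware; a smaller batch size would give more, cheaper steps per epoch.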

### 2. Prepare your Baseline Model

Here you can have your code to either train (e.g., if you are building it from scratch) or load (e.g., if you are loading a pre-trained model) your model. These steps may require you to use other packages or Python files. You can just call them here; you don't have to include them in your submission. Remember that we will be looking at the saved outputs in the notebook and will not run the entire notebook.

In [25]:
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze all layers of the base model
for layer in base_model.layers:
    layer.trainable = False

In [26]:
x = tf.keras.layers.Flatten()(base_model.output)
x = tf.keras.layers.Dense(128, activation='relu')(x)
predictions = tf.keras.layers.Dense(12, activation='softmax')(x)
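
As a sanity check on the size of this head, the parameter counts can be worked out by hand. The sketch below assumes the standard VGG16 layout (thirteen 3x3 convolutions, five 2x2 poolings, so a 224x224 input leaves a 7x7x512 feature map); the helper names are hypothetical.

```python
def conv3x3_params(c_in, c_out):
    # 3x3 kernel weights plus one bias per output channel.
    return 3 * 3 * c_in * c_out + c_out

def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out

# (input channels, output channels) for the 13 VGG16 convolution layers.
vgg16_convs = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]
frozen_params = sum(conv3x3_params(i, o) for i, o in vgg16_convs)

flat_features = 7 * 7 * 512                      # output of Flatten
head_params = dense_params(flat_features, 128) + dense_params(128, 12)

print("frozen (base):", frozen_params)
print("trainable (head):", head_params)
```

Nearly all trainable weights sit in the first `Dense` layer because of the 25088-wide flattened input; the numbers printed here can be compared against `model.summary()`.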

In [27]:
model = tf.keras.models.Model(inputs=base_model.input, outputs=predictions)

In [28]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### 3. Baseline Performance

Make sure to include the following:
- Performance on the training set
- Performance on the test set
- Provide some screenshots of your output (e.g., pictures, text output, or a histogram of predicted values in the case of tabular data). Any visualization of the predictions is welcome.
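
One way to produce the items listed above, once a trained model is available, is to turn its predicted class probabilities into an accuracy figure, a histogram of predicted classes and a confusion matrix. The helper below is a NumPy-only sketch; `summarize_predictions` is a hypothetical name and the toy inputs are made up for illustration.

```python
import numpy as np

def summarize_predictions(y_true, probs, num_classes):
    """Hypothetical helper: accuracy, predicted-class histogram, confusion matrix."""
    y_true = np.asarray(y_true)
    y_pred = np.argmax(probs, axis=1)            # most likely class per sample
    accuracy = float(np.mean(y_pred == y_true))
    pred_hist = np.bincount(y_pred, minlength=num_classes)
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(confusion, (y_true, y_pred), 1)    # rows: true, columns: predicted
    return accuracy, pred_hist, confusion

# Toy example with 3 classes and 4 samples.
toy_true = [0, 1, 2, 1]
toy_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.4, 0.3],
])
acc, hist, cm = summarize_predictions(toy_true, toy_probs, num_classes=3)
print(acc, hist, cm, sep="\n")
```

In this notebook the probabilities could come from `model.predict(test_dataset)` with `test_labels` as the ground truth, since the test dataset is batched without shuffling; the same call on an unshuffled copy of the training data would give the training-set figures.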

In [29]:
model.fit(train_dataset, epochs=10)

Epoch 1/10
3/9 ━━━━━━━━━━━━━━━━━━━━ 8:13 82s/step - accuracy: 0.0821 - loss: 4.3906
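
For context on these first-epoch numbers, it helps to compare them with what a classifier that guesses uniformly over the 12 output classes would score. The short calculation below assumes roughly balanced classes, which the notebook does not verify.

```python
import math

num_classes = 12                     # size of the softmax output layer

# A uniform guesser is right 1 time in 12 on balanced data.
chance_accuracy = 1 / num_classes

# Cross-entropy of a uniform prediction: -log(1/12) = log(12).
uniform_loss = math.log(num_classes)

print(f"chance accuracy: {chance_accuracy:.4f}")
print(f"loss of a uniform prediction: {uniform_loss:.4f}")
```

The logged accuracy of 0.0821 is close to chance, and the logged loss of 4.3906 is above the uniform-prediction value, which is not unusual for a freshly initialized head a few steps into training. The saved output stops at step 3 of 9, so it says nothing about where the model ends up.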

In [1]:
train_dataset

NameError: name 'train_dataset' is not defined