# Homework 3 (Convolutional Neural Networks)

Choose a dataset that you're interested in from among these options (or choose your own data set as long as it's large enough and **you check with me** in advance to make sure it'll work):

- [Grape Disease Detection Data](https://www.kaggle.com/datasets/rm1000/augmented-grape-disease-detection-dataset)
- [Indian Bird Data](https://www.kaggle.com/datasets/arjunbasandrai/25-indian-bird-species-with-226k-images)
- [Skin Cancer Classification](https://www.kaggle.com/datasets/kylegraupe/skin-cancer-binary-classification-dataset)
- [Fruit and Veg Detection Data](https://www.kaggle.com/datasets/kritikseth/fruit-and-vegetable-image-recognition)
- [Large Scale Fish Data](https://www.kaggle.com/datasets/crowww/a-large-scale-fish-dataset)
- [Berkeley Segmentation Data](https://www.kaggle.com/datasets/balraj98/berkeley-segmentation-dataset-500-bsds500)

Then build a deep CONVOLUTIONAL neural network using Keras/TensorFlow (at least 5 convolutional layers and at least 3 pooling layers; no recurrent layers, no generative models, and no transfer learning unless approved by Dr. P ahead of time) to do one of the following tasks:

- Classify Images (e.g. Hot Dog vs. Not a Hot Dog)
- Compress Images (e.g. with a Denoising Convolutional AutoEncoder)
- Detect/Segment Objects (e.g. which pixels in the image contain a cat?)

Make sure that:

- your NN has some sort of regularization (or multiple types if needed)
- you've properly formatted your data and fed it into the network
- your model architecture and loss function are appropriate for the problem
- you print out at least 2 metrics for both train and test data to examine
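Keras reports `accuracy` during training, and a second metric can be computed afterwards from the predicted class probabilities. The sketch below is one way to do that with NumPy; it assumes `y_true` holds integer class indices and `probs` is an `(n_samples, n_classes)` array of softmax outputs. The helper name `report_metrics` is hypothetical, and top-k accuracy and macro F1 are just two reasonable choices of metric.

```python
import numpy as np

def report_metrics(y_true, probs, k=5):
    """Top-1 accuracy, top-k accuracy, and macro F1 from softmax outputs.

    Hypothetical helper: y_true is an int array of shape (n,),
    probs is an array of shape (n, n_classes) whose rows are class probabilities.
    """
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    y_pred = probs.argmax(axis=1)

    accuracy = float(np.mean(y_pred == y_true))

    # Top-k: is the true class among the k highest-probability classes?
    top_k = np.argsort(probs, axis=1)[:, -k:]
    top_k_accuracy = float(np.mean(np.any(top_k == y_true[:, None], axis=1)))

    # Macro F1: unweighted mean of per-class F1 over classes seen in y_true or y_pred
    f1_scores = []
    for c in np.union1d(y_true, y_pred):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    macro_f1 = float(np.mean(f1_scores))

    return {"accuracy": accuracy, f"top_{k}_accuracy": top_k_accuracy, "macro_f1": macro_f1}
```

To apply this to a Keras directory iterator, the iterator would need `shuffle=False` so that the rows of `model.predict(...)` line up with the iterator's `classes` attribute; a test-set iterator (hypothetically `test_gen`) built from `test_dir` would be the natural input for the test-data numbers.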


Then create a **technical report** discussing your model building process, the results, and your reflection on it. The report should follow the format in the example, including Introduction, Analysis, Methods, Results, and Reflection sections.

```python
import os
import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd

import tensorflow as tf
from tensorflow import keras as kb
from tensorflow.keras import backend
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D
from tensorflow.keras.layers import Activation, Dropout, Flatten, Dense

# Local paths to the dataset splits, one subfolder per class
train_dir = 'C:/Users/isaac/Documents/datasets/fruitveggie/train'
test_dir = 'C:/Users/isaac/Documents/datasets/fruitveggie/test'  # not loaded in this cell
validation_dir = 'C:/Users/isaac/Documents/datasets/fruitveggie/validation'

batch_size = 32
image_height = 160
image_width = 160

# Training images: scale pixels to [0, 1] and apply random augmentation
train_datagen = ImageDataGenerator(
    rotation_range = 40,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    rescale = 1./255,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = 'nearest'
)

# Validation/test images: rescale only, so evaluation sees unaltered images
test_datagen = ImageDataGenerator(rescale = 1./255)

# class_mode='categorical' yields one-hot labels, matching categorical_crossentropy
train_gen = train_datagen.flow_from_directory(
    train_dir,
    target_size = (image_height, image_width),
    batch_size = batch_size,
    class_mode = 'categorical'
)

validation_gen = test_datagen.flow_from_directory(
    validation_dir,
    target_size = (image_height, image_width),
    batch_size = batch_size,
    class_mode = 'categorical'
)
```

```
Found 3115 files belonging to 36 classes.
Found 351 files belonging to 36 classes.
['apple', 'banana', 'beetroot', 'bell pepper', 'cabbage', 'capsicum', 'carrot', 'cauliflower', 'chilli pepper', 'corn', 'cucumber', 'eggplant', 'garlic', 'ginger', 'grapes', 'jalepeno', 'kiwi', 'lemon', 'lettuce', 'mango', 'onion', 'orange', 'paprika', 'pear', 'peas', 'pineapple', 'pomegranate', 'potato', 'raddish', 'soy beans', 'spinach', 'sweetcorn', 'sweetpotato', 'tomato', 'turnip', 'watermelon']
```
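With `class_mode='categorical'`, `flow_from_directory` infers the class list from the subfolder names, assigns label indices in alphanumeric (sorted) order, and yields each label as a one-hot vector of length 36, which is why the output layer needs 36 softmax units. The NumPy sketch below mimics that mapping for illustration only; it is not Keras's own code, and the helper name `one_hot_labels` is hypothetical.

```python
import numpy as np

def one_hot_labels(folder_names, labels):
    """Map class-folder names to sorted indices and one-hot encode a list of labels."""
    classes = sorted(folder_names)
    class_indices = {name: i for i, name in enumerate(classes)}
    idx = np.array([class_indices[name] for name in labels])
    one_hot = np.zeros((len(labels), len(classes)), dtype=np.float32)
    one_hot[np.arange(len(labels)), idx] = 1.0  # one 1.0 per row, at the label's index
    return class_indices, one_hot

# Three of the folder names above, given out of order on purpose
class_indices, y = one_hot_labels(["banana", "apple", "carrot"], ["carrot", "apple"])
```

Here `class_indices` comes out as `{'apple': 0, 'banana': 1, 'carrot': 2}` regardless of the input order, and each row of `y` has a single 1 at the label's index.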


```python
# Five convolution blocks: 3x3 'valid' convolutions with ReLU, each followed by 2x2 max pooling
model = kb.Sequential([
    tf.keras.layers.Conv2D(32, (3,3), activation = 'relu', input_shape=(image_height, image_width, 3)),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(64, (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(128, (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(256, (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    tf.keras.layers.Conv2D(512, (3,3), activation = 'relu'),
    tf.keras.layers.MaxPooling2D(2,2),
    # Classifier head; dropout before and after the hidden layer acts as regularization
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.3),
    # One softmax probability per class (36 classes)
    tf.keras.layers.Dense(36, activation = 'softmax')
])

# categorical_crossentropy matches the one-hot labels produced by class_mode='categorical'
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model.summary()
```

```
Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_10 (Conv2D)          (None, 158, 158, 32)      896       
                                                                 
 max_pooling2d_10 (MaxPoolin  (None, 79, 79, 32)       0         
 g2D)                                                            
                                                                 
 conv2d_11 (Conv2D)          (None, 77, 77, 64)        18496     
                                                                 
 max_pooling2d_11 (MaxPoolin  (None, 38, 38, 64)       0         
 g2D)                                                            
                                                                 
 conv2d_12 (Conv2D)          (None, 36, 36, 128)       73856     
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 18, 18, 128)     
```
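The shapes and parameter counts in the summary follow from two rules: a 3×3 convolution with the default `padding='valid'` and stride 1 shrinks each spatial side by 2 and has (3·3·C_in + 1)·C_out parameters (weights plus one bias per filter), and a 2×2 max pool with stride 2 halves each side, rounding down. The pure-Python sketch below (the helper `conv_pool_shapes` is hypothetical) reproduces the rows printed above and continues the same arithmetic through all five blocks.

```python
def conv_pool_shapes(size, in_channels, filters, kernel=3, pool=2):
    """Trace (layer kind, side length, channels, params) through conv + max-pool blocks."""
    rows = []
    side, c_in = size, in_channels
    for c_out in filters:
        side = side - kernel + 1                       # 'valid' convolution, stride 1
        params = (kernel * kernel * c_in + 1) * c_out  # kernel weights + one bias per filter
        rows.append(("conv", side, c_out, params))
        side = side // pool                            # 2x2 max pool, stride 2, floor
        rows.append(("pool", side, c_out, 0))
        c_in = c_out
    return rows

rows = conv_pool_shapes(160, 3, [32, 64, 128, 256, 512])
for kind, side, channels, params in rows:
    print(f"{kind:4s} ({side}, {side}, {channels})  params={params}")
```

The last pooling layer leaves a 3×3×512 feature map, so `Flatten` passes 4,608 values to the `Dense(128)` layer.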

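```python
# steps_per_epoch and validation_steps count batches, not images:
# len(iterator) == ceil(number_of_images / batch_size)
history = model.fit(
    train_gen,
    steps_per_epoch = len(train_gen),
    epochs = 100,
    validation_data = validation_gen,
    validation_steps = len(validation_gen))
```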
```
Epoch 1/100
  20/3115 [..............................] - ETA: 32:18 - loss: 3.5882 - accuracy: 0.0275

KeyboardInterrupt: 
```
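The progress bar above shows `20/3115` and a 32-minute ETA for one epoch because `steps_per_epoch` was set to the number of images (`train_gen.n`), while Keras interprets it as a number of batches. With 3,115 training images, 351 validation images, and `batch_size = 32`, the step counts that cover each set exactly once per epoch are computed below (the helper name `steps_for` is hypothetical).

```python
import math

def steps_for(n_images, batch_size):
    """Batches needed to see every image once; the last batch may be partial."""
    return math.ceil(n_images / batch_size)

train_steps = steps_for(3115, 32)       # 98 batches per training epoch
validation_steps = steps_for(351, 32)   # 11 batches per validation pass
```

Asking for 3,115 steps of 32 images instead means each "epoch" draws 99,680 augmented images, about 32 passes over the training set, which explains the long ETA.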

# Introduction
An introduction should introduce the problem you're working on, give some background and relevant detail for the reader, and explain why it is important. 

# Analysis 
Any exploratory analysis of your data and general summary of it (e.g. summary statistics, correlation heatmaps, graphs, information about the data). This can also include any cleaning and joining you did.
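For an image dataset, one simple piece of exploratory analysis is counting the images in each class to check for imbalance. The sketch below assumes the `split/<class name>/<image file>` layout that `flow_from_directory` reads; the helper name `count_images_per_class` and the set of image extensions are assumptions.

```python
from pathlib import Path

# Assumed image file types; adjust to whatever the dataset actually contains
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}

def count_images_per_class(split_dir):
    """Return {class_name: number_of_image_files} for each subfolder of split_dir."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
            )
    return counts
```

For example, `count_images_per_class(train_dir)` could be loaded into a pandas Series and plotted as a bar chart, and comparing the smallest and largest counts gives a quick sense of how balanced the 36 classes are.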

# Methods
Explain the structure of your model and your approach to building it. This can also include changes you made to your model in the process of building it. Someone reading your methods section should *generally* be able to tell exactly what architecture you used.

# Results
A detailed presentation of how your model performed, along with your discussion of those results.
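A chance baseline is a useful reference point when discussing results. Categorical cross-entropy for one sample is -log(p_true), the negative log of the probability the model assigns to the correct class, so a model that spreads probability evenly over 36 classes has a loss of ln 36 ≈ 3.58 and an expected accuracy of 1/36 ≈ 0.028. The interrupted first epoch above (loss 3.5882, accuracy 0.0275) sits at roughly that level. The NumPy re-implementation below is for illustration only and is not Keras's source; the clipping constant `eps` is an assumption.

```python
import numpy as np

def categorical_crossentropy(y_true_one_hot, probs, eps=1e-7):
    """Mean over samples of -sum(y * log(p)); probs are clipped to avoid log(0)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    per_sample = -np.sum(np.asarray(y_true_one_hot) * np.log(probs), axis=1)
    return float(np.mean(per_sample))

n_classes = 36
uniform = np.full((4, n_classes), 1.0 / n_classes)  # a "know-nothing" softmax output
labels = np.eye(n_classes)[[0, 5, 12, 35]]          # four arbitrary one-hot labels

baseline_loss = categorical_crossentropy(labels, uniform)  # ln(36), about 3.584
baseline_accuracy = 1.0 / n_classes                          # about 0.028
```

Training loss and accuracy that stay near these values after several epochs would suggest the network is not learning beyond chance.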

# Reflection
Reflections on what you learned/discovered in the process of doing the assignment. Things you would do differently in the future, ways you'll approach similar problems in the future, etc.


# What to Turn In

- PDF of your technical report
- your code as a .py or .ipynb file, or a link to GitHub (you must turn it in either as a file or as a link to something that shows timestamps of when the file was last edited)
- a README file as a .txt or .md