
## Problem Definition


This project is a supervised classification problem, as it involves training a model on labeled data (images of cats and dogs) to predict the category of new, unseen images.

The task highlights the use of deep learning to automate feature extraction and solve a traditionally challenging computer vision problem.

```
Resources
Feature Set: pickled dataset of images (X.pickle), containing raw image data.
Target Set: pickled dataset of labels (y.pickle), indicating whether an image is a cat or dog.
Test Image: single image (dog.jpg) to validate trained model.
```

# Images – To Do List



Before starting this problem, be sure to enable the GPU runtime in your Jupyter notebook.



```
Keras is a Python library for building and fitting neural networks.

Common steps when fitting a neural network with Keras:
Load data
Define keras model
Compile model
Fit model
Evaluate model
Use model for prediction
```

Reference example notebook: 11_23_24_2c-Deep.Learning.Example.ipynb


* Write a concise problem definition for the project. Put it in a text field at the top of your Jupyter notebook.

* Load necessary packages.



In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle

import tensorflow as tf
from tensorflow import keras  # Keras API bundled with TensorFlow

from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from keras.utils import plot_model

from sklearn.model_selection import train_test_split  # only scikit-learn tool used below


## Data Collection



* Load pickled data from X.pickle and y.pickle from the AWS S3 bucket.



In [None]:
url_X = 'https://ddc-datascience.s3.amazonaws.com/Projects/Project.6-Images/Data/X.pickle'

In [None]:
!curl -O {url_X}

```
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                               Dload   Upload  Total   Spent    Left  Speed
100  237M  100  237M    0     0  24.5M      0  0:00:09  0:00:09 --:--:-- 27.3M
```

In [None]:
data_X = pd.read_pickle('X.pickle')  # read the local copy saved by curl above instead of downloading 237 MB again
data_X


In [None]:
data_X.shape

In [None]:
type(data_X)

In [None]:
url_y = 'https://ddc-datascience.s3.amazonaws.com/Projects/Project.6-Images/Data/y.pickle'

In [None]:
!curl -O {url_y}

```
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 49948  100 49948    0     0   115k      0 --:--:-- --:--:-- --:--:--  115k
```

In [None]:
data_y = pd.read_pickle('y.pickle')  # read the local copy saved by curl above
data_y

In [None]:
type(data_y)

In [None]:
len(data_y)

## Data Cleaning



* Scale the values in X so that they fall between 0 and 1 by dividing by 255.



In [None]:
data_Xscaled = data_X / 255
data_Xscaled
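A small sketch of what the division does, using a hypothetical 2x2 `uint8` array rather than the real data: pixel values 0 to 255 map onto 0.0 to 1.0. Casting to `float32` first (an optional choice, not made in the cell above) keeps the result at 4 bytes per value instead of the 8 bytes of `float64` that plain division produces, which matters for an array of about 250 million values.

```python
import numpy as np

# Hypothetical 2x2 grayscale image covering the uint8 range of pixel values
pixels = np.array([[0, 64], [128, 255]], dtype=np.uint8)

# Plain division (as in the cell above) promotes uint8 to float64
scaled64 = pixels / 255

# Casting first keeps the result in float32, halving memory use
scaled32 = pixels.astype(np.float32) / 255

print(scaled64.dtype, scaled32.dtype)   # float64 float32
print(scaled32.min(), scaled32.max())   # 0.0 1.0

# Rough memory estimate for an array shaped like X: (24946, 100, 100, 1)
n_values = 24946 * 100 * 100 * 1
print(f"float64: {n_values * 8 / 1e9:.2f} GB, float32: {n_values * 4 / 1e9:.2f} GB")
```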

## Exploratory Data Analysis



* Look at the shape of X and y. Ensure that X is 4 dimensional.
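Keras `Conv2D` layers (with the default channels-last format) expect 4-dimensional input shaped (samples, height, width, channels). The hypothetical helper `ensure_4d` below is a sketch of how a 3D stack of grayscale images (samples, height, width) could be given the missing trailing channel axis; it is only needed when data arrives without one.

```python
import numpy as np

def ensure_4d(X):
    """Return X shaped (samples, height, width, channels). Hypothetical helper.

    A 3D array (samples, height, width) is treated as grayscale and gets a
    trailing channel axis of size 1. A 4D array is returned unchanged.
    """
    X = np.asarray(X)
    if X.ndim == 3:
        return X[..., np.newaxis]
    if X.ndim == 4:
        return X
    raise ValueError(f"Expected a 3D or 4D array, got shape {X.shape}")

# Example: 5 hypothetical 100x100 grayscale images without a channel axis
stack = np.zeros((5, 100, 100))
print(ensure_4d(stack).shape)  # (5, 100, 100, 1)
```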



In [None]:
data_Xscaled.shape

In [None]:
data_X.shape

```
Both data_X and data_Xscaled are 4-dimensional: (24946, 100, 100, 1)
i.e. 24946 grayscale images, each 100 x 100 pixels with 1 channel
```

In [None]:
len(data_y)

```
data_y is a Python list, which has no .shape attribute, so len() is used instead:
24946 labels, one per image in data_X
```

* Plot a few (more than 5) of the images in X using plt.imshow().

In [None]:
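plt.imshow(data_Xscaled[11555].squeeze(), cmap='gray')  # drop the channel axis and show in grayscale

In [None]:
plt.imshow(data_Xscaled[8277].squeeze(), cmap='gray')

In [None]:
plt.imshow(data_Xscaled[2071].squeeze(), cmap='gray')

In [None]:
plt.imshow(data_Xscaled[23488].squeeze(), cmap='gray')

In [None]:
plt.imshow(data_Xscaled[927].squeeze(), cmap='gray')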


* Look at the response values in y for those images.

In [None]:
print(data_y[927])
print(data_y[2071])
print(data_y[11555])
print(data_y[23488])
print(data_y[8277])

```
Labels (data_y) for the plotted images
Visual inspection suggests dog == 0 and cat == 1

Index    Label
927      1
2071     1
11555    0
23488    0
8277     0
```

* Hint: you may want to start with a random subset to get familiar with the process of building a NN.  Then go through the process again with the full set.

```
Use a for loop to plot the images of data_Xscaled at indices 444 through 484, in steps of 10
```

In [None]:
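for c in range(444, 485, 10):
  plt.figure()  # new figure per index; otherwise each imshow draws over the last and only one image appears
  plt.imshow(data_Xscaled[c].squeeze(), cmap='gray')
  plt.show()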

In [None]:
# Subset Selection for Demonstration
subset_size = 1000
random_indices = np.random.choice(data_Xscaled.shape[0], size=subset_size, replace=False)
data_Xscaledsub = data_Xscaled[random_indices]
data_ysub = np.array(data_y)[random_indices]

## Data Processing



* Split X and y into training and testing sets.

*  Build a convolutional neural network with the following:
  * Sequential layers
  * At least two 2D convolutional layers using the 'relu' activation function and a (3,3) kernel size.
  * A MaxPooling2D layer after each 2D convolutional layer that has a pool size of (2,2).
  * A dense output layer using the 'sigmoid' activation function.
  Note: you can play around with the number of layers and nodes to try to get better performance.

* Compile your model. Use the 'adam' optimizer. Determine which loss function and metric is most appropriate for this problem.

* Fit your model using the training set.

* Evaluate your model using the testing set.

* Plot the distribution of probabilities for the testing set.

* Define a function that will read in a new image and convert it to a 4 dimensional array of pixels (ask the instructor for help with this). Hint: [numpy.reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)

* Use the function defined above to read in the dog.jpg image that is saved in the AWS S3 bucket.

* Use the neural network you created to predict whether the image is a dog or a cat.



* Split X and y into training and testing sets.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(data_Xscaledsub, data_ysub, test_size = 0.25, random_state = 42)
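Because the split is random, it can be worth confirming that both sets keep a similar cat/dog mix. A minimal NumPy sketch, with hypothetical labels standing in for `y_train` and `y_test` (0 = dog, 1 = cat, as inferred above). If equal proportions are wanted, scikit-learn's `train_test_split` also accepts a `stratify=` argument (e.g. `stratify=data_ysub`).

```python
import numpy as np

def class_proportions(labels):
    """Fraction of samples in each class (index 0 = dog, 1 = cat)."""
    counts = np.bincount(np.asarray(labels, dtype=int), minlength=2)
    return counts / counts.sum()

# Hypothetical labels standing in for y_train and y_test
y_train_demo = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_test_demo = np.array([1, 1, 0, 1])

print(class_proportions(y_train_demo))  # [0.5 0.5]
print(class_proportions(y_test_demo))   # [0.25 0.75]
```

Applied to the notebook's data, this would be `class_proportions(y_train)` and `class_proportions(y_test)`.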

* Build a convolutional neural network with the following:
  * Sequential layers

In [None]:
model = Sequential()

* At least two 2D convolutional layers using the 'relu' activation function and a (3,3) kernel size.
* A MaxPooling2D layer after each 2D convolutional layer that has a pool size of (2,2).
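With Keras defaults (`padding='valid'`, stride 1), each 3x3 convolution trims one pixel from every border, and each 2x2 max pooling (strides default to the pool size) halves the size, rounding down. The sketch below traces one 100x100x1 image through two conv/pool pairs with 32 and 64 filters, as in the cells that follow, and counts each `Conv2D` layer's trainable parameters as (kernel height x kernel width x input channels + 1 bias) x filters.

```python
def conv2d_valid(size, kernel=3, stride=1):
    """Spatial output size of a Conv2D layer with padding='valid'."""
    return (size - kernel) // stride + 1

def maxpool(size, pool=2):
    """Spatial output size of MaxPooling2D with strides equal to pool_size."""
    return (size - pool) // pool + 1

def conv2d_params(kernel, in_channels, filters):
    """Trainable weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters

size, channels = 100, 1
layer_sizes = []
total_conv_params = 0
for filters in (32, 64):                      # the two Conv2D layers
    total_conv_params += conv2d_params(3, channels, filters)
    size = conv2d_valid(size)                 # 3x3 kernel
    layer_sizes.append(size)
    size = maxpool(size)                      # 2x2 pool
    layer_sizes.append(size)
    channels = filters

flat_features = size * size * channels        # length of the Flatten() output
print(layer_sizes)        # [98, 49, 47, 23]
print(flat_features)      # 33856
print(total_conv_params)  # 18816
```

A one-unit sigmoid output after `Flatten()` would add 33856 weights plus 1 bias; comparing these numbers with `model.summary()` is a quick sanity check on the architecture.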



In [None]:
# First convolutional layer; input_shape = X_train.shape[1:] = (100, 100, 1), the size of one image
model.add(
  Conv2D(
    name = "conv_input",
    input_shape = X_train.shape[1:],
    filters = 32,
    kernel_size = (3,3),
    activation = 'relu',
  )
)

model.add(
    MaxPooling2D(pool_size=(2,2))
)


In [None]:
# Second convolutional layer (64 filters), followed by 2x2 max pooling
model.add(
  Conv2D(
    name = "hidden1",
    filters = 64,
    kernel_size = (3,3),
    activation = 'relu',
  )
)

model.add(
  MaxPooling2D(pool_size=(2,2))
)


* A dense output layer using the 'sigmoid' activation function.

  Note: you can play around with the number of layers and nodes to try to get better performance.

In [None]:
# Flatten the 2D feature maps into a 1D vector for the Dense output layer
model.add(
    Flatten()
)

# Define output layer: a single sigmoid unit = predicted probability that the image is a cat (label 1)
model.add(
  Dense(
    name = "output",
    units = 1,
    activation = 'sigmoid'
  )
)

* Compile your model. Use the 'adam' optimizer. Determine which loss function and metric is most appropriate for this problem.

In [None]:
# Compile model
model.compile(
  optimizer = 'adam',            # Adam: the algorithm that updates the network's weights during training
  loss = 'binary_crossentropy',  # Binary cross-entropy: penalizes the gap between predicted probabilities and the true 0/1 (dog/cat) labels
  metrics = ['accuracy'],        # Accuracy: share of images classified correctly
)
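For reference, binary cross-entropy for a true label y in {0, 1} and a predicted probability p is -(y log p + (1 - y) log(1 - p)), averaged over the batch. The NumPy version below is a sketch for illustration, not Keras's own implementation (though Keras also clips probabilities by a small epsilon); it shows why confident wrong predictions are penalized far more than confident correct ones.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; eps keeps log() away from 0."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))

labels = [1, 0]                                             # cat, dog
print(round(binary_crossentropy(labels, [0.9, 0.1]), 4))    # 0.1054 (confident, correct)
print(round(binary_crossentropy(labels, [0.1, 0.9]), 4))    # 2.3026 (confident, wrong)
```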

* Fit your model using the training set.

In [None]:
# Train the network: the model learns to distinguish cats from dogs using the training data
model.fit(X_train, y_train, epochs=8)

In [None]:
# Alternative run (not executed): more epochs with an explicit batch size of 32, which is also the Keras default
# model.fit(X_train, y_train, epochs=10, batch_size=32)

In [None]:
model.summary()

* Evaluate your model using the testing set.

In [None]:
# Evaluate the Model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy:.4f}")
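With `binary_crossentropy` as the loss, Keras reports the `'accuracy'` metric as binary accuracy: a probability above 0.5 counts as label 1 (cat), anything else as label 0 (dog), and the score is the fraction that match the true labels. A NumPy sketch of that calculation on hypothetical probabilities; applied to `model.predict(X_test).ravel()` and `y_test`, it could cross-check the printed test accuracy.

```python
import numpy as np

def binary_accuracy(y_true, p_pred, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    y_hat = (np.asarray(p_pred) > threshold).astype(int)   # 1 = cat, 0 = dog
    return float(np.mean(y_hat == np.asarray(y_true)))

# Hypothetical labels and predicted probabilities for five test images
y_demo = np.array([1, 0, 1, 0, 1])
p_demo = np.array([0.92, 0.30, 0.45, 0.51, 0.80])
print(binary_accuracy(y_demo, p_demo))  # 0.6
```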

* Plot the distribution of probabilities for the testing set.


In [None]:
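y_prob = model.predict(X_test).ravel()  # predicted probability of label 1 (cat) for each test image
plt.hist(y_prob, bins=20)               # distribution of the test-set probabilities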

* Define a function that will read in a new image and convert it to a 4 dimensional array of pixels (ask the instructor for help with this). Hint: numpy.reshape

* Use the function defined above to read in the dog.jpg image that is saved in the AWS S3 bucket.

In [None]:
url_dog = 'https://ddc-datascience.s3.amazonaws.com/Projects/Project.6-Images/Data/dog.jpg'
!curl -O {url_dog}
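One possible version of the image-reading function, sketched with Pillow (not imported elsewhere in this notebook; assumed to be installed). The hypothetical `load_image_4d` converts a file to grayscale, resizes it to the 100x100 training size, scales it to [0, 1] like the training data, and reshapes it to (1, 100, 100, 1), assuming `dog.jpg` has already been downloaded to the working directory. The resize interpolation may differ from whatever was used to build `X.pickle`.

```python
import numpy as np
from PIL import Image

def load_image_4d(path, size=(100, 100)):
    """Read an image file and return a (1, height, width, 1) float32 array in [0, 1].

    Hypothetical helper: grayscale + resize to match the 100x100x1 training images.
    """
    with Image.open(path) as img:
        gray = img.convert('L').resize(size)   # 'L' = 8-bit grayscale; size is (width, height)
        pixels = np.asarray(gray, dtype=np.float32) / 255
    return pixels.reshape(1, size[1], size[0], 1)  # (samples, height, width, channels)

# Usage, assuming dog.jpg is in the current directory:
# dog_img = load_image_4d('dog.jpg')
# dog_img.shape  -> (1, 100, 100, 1)
```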

* Use the neural network you created to predict whether the image is a dog or a cat.
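The sigmoid output is the predicted probability of label 1, and the labels above appeared to encode dogs as 0 and cats as 1, so turning a prediction into a word is a threshold check. A sketch with a hypothetical helper; the 0.5 cutoff matches the accuracy metric, and the commented usage lines assume a one-unit output layer and an image array shaped (1, 100, 100, 1).

```python
def label_from_probability(p_cat, threshold=0.5):
    """Map a sigmoid output (probability of label 1) to 'cat' or 'dog'."""
    return 'cat' if p_cat > threshold else 'dog'

# Usage with the trained model (assumes dog_img has shape (1, 100, 100, 1)):
# p = float(model.predict(dog_img)[0, 0])
# print(f"P(cat) = {p:.3f} -> {label_from_probability(p)}")

print(label_from_probability(0.12))  # dog
print(label_from_probability(0.87))  # cat
```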

## Communication of Results



* Communicate the results of your analysis.



## **BONUS** (optional)



* Upload an image of your (or your friend's or family's) dog or cat and use your model to predict whether the image is a dog or cat.
* Hint: you'll probably need to convert the image from color to grayscale.  OpenCV, pillow, and other libraries are your friend.
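The training images have a single channel, so a color photo has to be reduced to one. Pillow's `convert('L')` uses the ITU-R 601-2 luma weights (L = 0.299 R + 0.587 G + 0.114 B); the sketch below applies the same weights in plain NumPy to an RGB or RGBA array (alpha dropped), for when the image is already loaded as an array. Note that OpenCV's `cv2.imread` returns channels in BGR order, so they would need reversing first.

```python
import numpy as np

# ITU-R 601-2 luma weights, the same ones Pillow uses for convert('L')
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def rgb_to_gray(image):
    """Convert an (H, W, 3) or (H, W, 4) RGB(A) array to (H, W) grayscale on a 0-255 scale."""
    rgb = np.asarray(image, dtype=np.float64)[..., :3]   # drop alpha if present
    return rgb @ LUMA_WEIGHTS                            # weighted sum over the channel axis

# Hypothetical 1x3 image: pure red, pure green and white pixels
pixels = np.array([[[255, 0, 0], [0, 255, 0], [255, 255, 255]]], dtype=np.uint8)
print(rgb_to_gray(pixels))
```

The result would still need resizing to 100x100, dividing by 255 and reshaping to (1, 100, 100, 1) before it matches the model's input.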