# Respiratory Disease Classification Model
## 1. Data Gathering
For this activity, the provided dataset is split into three classes: Covid, Viral Pneumonia, and Normal.

## 2. Preprocessing Data
For data preprocessing, TensorFlow's built-in *preprocess_input* function will be used, specifically the VGG16 variant (`tf.keras.applications.vgg16.preprocess_input`).
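To make the preprocessing step concrete, the sketch below reproduces in NumPy what the VGG16 `preprocess_input` is documented to do in its default mode: reorder the channels from RGB to BGR and subtract the ImageNet per-channel means, without rescaling. The function name `vgg16_style_preprocess` is illustrative and not part of TensorFlow, and the mean values are the ones commonly cited for this mode.

```python
import numpy as np

# ImageNet channel means in BGR order, as commonly cited for VGG16 ("caffe" mode).
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_style_preprocess(rgb_image):
    """Illustrative re-implementation: RGB -> BGR, then subtract channel means."""
    bgr = rgb_image[..., ::-1].astype(np.float32)   # reverse the channel axis
    return bgr - IMAGENET_BGR_MEANS                 # zero-centre, no scaling

# A single pure-red pixel as a 1x1 RGB image.
red_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)
processed = vgg16_style_preprocess(red_pixel)
print(processed)
```

The result is no longer in the 0 to 255 range, so images preprocessed this way are meant for the network rather than for display.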

In [30]:
import tensorflow as tf
from tensorflow.keras.models import Sequential                                                                     #used to define the model type.
from tensorflow.keras.layers import Activation, Dense, Flatten, BatchNormalization, Conv2D, MaxPool2D, Dropout     #used for defining each layer of the model
from tensorflow.keras.optimizers import Adam                                                                       #used for defining what optimizer the model will use
from tensorflow.keras.metrics import categorical_crossentropy                                                      #used for defining what metrics the model will use
from sklearn.metrics import confusion_matrix                                                                       #used for model evaluation
from tensorflow.keras.preprocessing.image import ImageDataGenerator                                                #used for importing the data from the dataset
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint                                              #implemented for model training
import matplotlib.pyplot as plt                                                                                    #used for plotting, e.g. a confusion matrix
%matplotlib inline
import pandas as pd

In [2]:
#Defining the directories of every image.
train_dir = "C:\\Users\\Dingus-Elite\\Desktop\\lung_dataset\\train"
test_dir = "C:\\Users\\Dingus-Elite\\Desktop\\lung_dataset\\test"

In [58]:
#Uses the imported ImageDataGenerator to preprocess every image with VGG16's preprocess_input.
#Image size is set to 224x224.
#Three classes are defined, in this order: Viral Pneumonia (index 0), Normal (index 1), and Covid (index 2).
#Each batch holds 10 images.
#For valid_batch, I created a subfolder and moved the images there so the ImageDataGenerator works as intended.
#shuffle=False keeps the predictions in the same order as the files.

train_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input).flow_from_directory(directory=train_dir, target_size=(224,224), classes=['Viral Pneumonia', 'Normal', 'Covid'], batch_size=10)
test_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input).flow_from_directory(directory=test_dir, target_size=(224,224), classes=['Viral Pneumonia', 'Normal', 'Covid'], batch_size=10)
valid_batch = ImageDataGenerator(preprocessing_function=tf.keras.applications.vgg16.preprocess_input).flow_from_directory("C:\\Users\\Dingus-Elite\\Desktop\\lung_dataset\\validation", target_size=(224,224), shuffle=False)

Found 249 images belonging to 3 classes.
Found 65 images belonging to 3 classes.
Found 10 images belonging to 1 classes.
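Because the `classes` list fixes which output column belongs to which label, it helps to write the mapping down. The sketch below rebuilds it in plain Python, assuming Keras numbers the classes in the order of the list (printing the generator's `class_indices` attribute would confirm this). The helper `one_hot` is hypothetical and only illustrates the categorical label format.

```python
classes = ['Viral Pneumonia', 'Normal', 'Covid']

# Assumed to mirror train_batch.class_indices: list position -> class index.
class_indices = {name: index for index, name in enumerate(classes)}

def one_hot(name):
    """Hypothetical helper: categorical label vector for a class name."""
    vector = [0] * len(classes)
    vector[class_indices[name]] = 1
    return vector

print(class_indices)
print(one_hot('Covid'))
```

Under this assumption, column 0 of the model output corresponds to Viral Pneumonia, column 1 to Normal, and column 2 to Covid.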


## 3. Choosing a Model
As part of the constraints, the model must use Conv2D layers as its basis.

In [59]:
#Defining the Model

model = Sequential()

model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', padding = 'same', input_shape=(224,224,3)))          #First 2D convolution of the model; input size is 224x224 with 3 channels
model.add(MaxPool2D(pool_size=(2, 2), strides=2))                                                                        #Used to reduce the image's size
model.add(Dropout(0.2))                                                                                                  #Added to prevent overfitting

model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu', padding = 'same'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Dropout(0.2)) 

model.add(Conv2D(filters=128, kernel_size=(2, 2), activation='relu', padding = 'same'))
model.add(MaxPool2D(pool_size=(2, 2), strides=2))
model.add(Dropout(0.2)) 

model.add(Flatten())                                                                                                     #Flattens the multidimensional output of the previous
                                                                                                                         #layer to 1D.
model.add(Dense(units=3, activation='softmax'))                                                                          #Units are set to 3 as there are three categories

model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy', metrics=['accuracy'])               #compiles the model with categorical crossentropy for the loss
                                                                                                                         #Adam for optimizer with a learning rate of 0.0001, and 'accuracy' for metrics  
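As a sanity check on the architecture, the layer sizes can be worked out by hand. The sketch below is plain arithmetic, not a Keras call: it assumes `'same'` padding keeps the spatial size, each 2x2 max-pool with stride 2 halves it, and Dropout and pooling layers add no trainable parameters. The helper name `conv2d_params` is illustrative.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

side, channels = 224, 3
total = 0
# (filters, kernel size) for the three Conv2D layers defined above.
for filters, kernel in [(32, 3), (64, 3), (128, 2)]:
    total += conv2d_params(kernel, kernel, channels, filters)
    channels = filters       # 'same' padding keeps the spatial size
    side //= 2               # each 2x2 max-pool with stride 2 halves it

flattened = side * side * channels
dense_params = flattened * 3 + 3     # 3 output units, one bias each
total += dense_params

print(side, flattened, total)
```

Under these assumptions the feature map entering `Flatten` is 28x28x128, and most of the parameters sit in the final Dense layer. Calling `model.summary()` in the notebook would be the way to confirm the figures.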

## 4. Training
The generators prepared earlier are passed to the model, which is then trained for 20 epochs.
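The `steps_per_epoch` and `validation_steps` arguments used for training are derived by floor division from the image counts reported earlier (249 training, 65 test) and the batch size of 10. The short calculation below shows what those expressions evaluate to; the `math.ceil` variant is included only for comparison and is not what the notebook uses.

```python
import math

train_samples, test_samples, batch_size = 249, 65, 10

steps_floor = train_samples // batch_size            # expression used for steps_per_epoch
steps_ceil = math.ceil(train_samples / batch_size)   # would also count the final partial batch
images_per_epoch = steps_floor * batch_size

validation_steps = test_samples // batch_size
print(steps_floor, steps_ceil, images_per_epoch, validation_steps)
```

With floor division, each epoch draws 24 full batches (240 images) for training and 6 batches for validation.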

In [60]:
batch_size = 10                                                                                                         #The generators above use batches of 10, so batch_size is also 10.
model.fit(                                                                                                               #Fitting the model to the training data.
    x = train_batch,                                                                                                     #test_batch (passed below) is used as the validation data.
    steps_per_epoch=train_batch.samples // batch_size, 
    epochs=20, 
    validation_data=test_batch, 
    validation_steps=test_batch.samples // batch_size,
    verbose=1)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


<keras.callbacks.History at 0x1bb87055220>

## 5. Evaluation
To evaluate the model, I used the `.evaluate` function on the test set. It returns the loss and the accuracy; the accuracy is shown below.

In [61]:
_, val = model.evaluate(test_batch)
val



0.8769230842590332
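The reported accuracy is consistent with 57 of the 65 test images being classified correctly (57 / 65 ≈ 0.8769). Accuracy alone does not show which classes are confused with each other, which is what a confusion matrix is for. The sketch below builds one in plain Python from made-up labels (not results from this notebook); the function name `confusion` is illustrative, and `sklearn.metrics.confusion_matrix` imported earlier could be used for the same purpose.

```python
def confusion(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

# Made-up class indices for illustration only (0=Viral Pneumonia, 1=Normal, 2=Covid).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

matrix = confusion(y_true, y_pred, 3)
accuracy = sum(matrix[i][i] for i in range(3)) / len(y_true)
print(matrix, accuracy)
```

The diagonal counts the correct predictions, so accuracy is the diagonal sum divided by the number of samples.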

## 6. Model Tuning
Several changes were made to the model over the course of testing:
- Added an additional Conv2D and MaxPool2D layer
- Added dropout layers
- Adjusted the kernel size

## 7. Prediction
We then ask the model to predict the classes of the images in `valid_batch`, defined earlier:

In [132]:
acquired_values = model.predict(valid_batch)



In [133]:
acquired_values.astype(int)

array([[0, 0, 1],
       [0, 0, 0],
       [0, 0, 1],
       [1, 0, 0],
       [0, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 1, 0]])
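`model.predict` returns softmax probabilities, and `astype(int)` truncates every value below 1 to 0, which is why some rows above are all zeros. A common alternative is to take the index of the largest probability with `argmax`. The sketch below uses made-up probabilities, not values from this notebook.

```python
import numpy as np

# Made-up softmax outputs for three images (each row sums to 1).
probabilities = np.array([
    [0.00, 0.00, 1.00],
    [0.55, 0.40, 0.05],
    [0.10, 0.70, 0.20],
], dtype=np.float32)

truncated = probabilities.astype(int)            # only an exact 1.0 survives
predicted_index = probabilities.argmax(axis=1)   # most likely class per row

print(truncated)
print(predicted_index)
```

With `argmax`, every row receives a class index, including rows where no single probability reaches 1.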

In [134]:
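#Column order follows the classes list given to flow_from_directory:
#0 = Viral Pneumonia ("V"), 1 = Normal ("N"), 2 = Covid ("C").
#A row is labelled only when one softmax output equals exactly 1.
prediction = []
for i in range(0,10):
    if acquired_values[i][0] == 1:
        prediction.append("V")
    elif acquired_values[i][1] == 1:
        prediction.append("N")
    elif acquired_values[i][2] == 1:
        prediction.append("C")
    else:
        prediction.append("Indeterminate")
prediction

['C', 'Indeterminate', 'C', 'V', 'Indeterminate', 'C', 'N', 'V', 'N', 'N']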
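If an explicit "Indeterminate" label is wanted, one option is to reserve it for low-confidence rows instead of requiring a probability of exactly 1. The sketch below is one way to do that in plain Python. The threshold of 0.5, the function name `decode`, and the sample probabilities are assumptions for illustration; the label order follows the `classes` list passed to the training generator.

```python
CLASS_NAMES = ['Viral Pneumonia', 'Normal', 'Covid']  # same order as the classes list

def decode(row, threshold=0.5):
    """Return the most likely class name, or 'Indeterminate' below the threshold."""
    best = max(range(len(row)), key=lambda i: row[i])
    return CLASS_NAMES[best] if row[best] >= threshold else 'Indeterminate'

# Made-up probabilities, not output from this notebook.
sample = [[0.02, 0.03, 0.95], [0.40, 0.35, 0.25], [0.80, 0.15, 0.05]]
labels = [decode(row) for row in sample]
print(labels)
```

A suitable threshold depends on how costly a wrong label is compared with an undecided one, so the value here should be treated as a placeholder.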

In [135]:
picnos = [""]
for i in range(1,11):
    picnos.append("pic" + str(i))
print(picnos)

['', 'pic1', 'pic2', 'pic3', 'pic4', 'pic5', 'pic6', 'pic7', 'pic8', 'pic9', 'pic10']


In [136]:
#Write the picture names and the predicted diagnoses to a single CSV file.
#picnos[0] is an empty placeholder, so it is skipped; the names become the index column.
prediction_test_out = pd.DataFrame({"diagnosis": prediction}, index=picnos[1:])
prediction_test_out.to_csv("C:\\Users\\Dingus-Elite\\Desktop\\billones_cnn_output_ex.csv")

