# **Deep Learning Project, Group 12**




Business case: Predicting fracture types

Group:


*   Anderson
*   Gabriel
*   Mariana

Notebook structure:


```
|--Deep Learning Project, Group 12
  |-- General environment setup
      |-- Libraries
      |-- Helper functions
  |-- Environment setup for the multiclass model
      |-- Importing images from Drive
      |-- Image preprocessing
  |-- Multiclass baseline model
  |-- Multiclass ResNet50 model
  |-- Environment setup for the binary model
      |-- Importing images from Drive
      |-- Image preprocessing
  |-- Binary baseline model
  |-- Binary ResNet50 model
```



Instructions for running the notebook:


1.   Run the general environment setup.
2.   Run the setup for the multiclass environment (check the file locations).
3.   Run the multiclass baseline and ResNet50 models.
4.   Run the setup for the binary model environment (check the file locations).
5.   Run the binary baseline and ResNet50 models.



## General environment setup

### Libraries

Run this command only if you need to reset the whole session and clear saved variables.

In [None]:
#exit()

In [None]:
!pip install keras-visualizer
!pip install tensorflow==2.12.0

In [None]:
# Standard library
import glob
import time

# Data handling and plotting
import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from PIL import Image
from tqdm import tqdm

# scikit-learn utilities
from sklearn.metrics import ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Keras / TensorFlow (public API only; the private tensorflow.python.keras package is avoided)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Flatten, Dropout, Conv2D, MaxPooling2D, AveragePooling2D
from tensorflow.keras.models import Sequential
from keras.regularizers import L1, L2, L1L2
from keras.utils import img_to_array, load_img, to_categorical
from keras_visualizer import visualizer

In [None]:
from google.colab import drive
drive.mount('/content/drive')

### Helper functions

In [None]:
def plot_img(img_array, plot_axis=False):
    img_pil = Image.fromarray(img_array.astype('uint8'))
    if plot_axis:
        return plt.imshow(img_pil)
    return img_pil

def plot_history(history):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history.history['loss'], 'r-', label='train loss')
    ax1.plot(history.history['val_loss'], 'b--', label='test loss')
    ax1.set_title('Model Loss')
    ax1.set_xlabel('Epochs')
    ax1.set_ylabel('Loss')
    ax1.legend()
    ax2.plot(history.history['accuracy'], 'r-', label='train acc')
    ax2.plot(history.history['val_accuracy'], 'b--', label='test acc')
    ax2.set_title('Model Accuracy')
    ax2.set_xlabel('Epochs')
    ax2.set_ylabel('Accuracy')
    ax2.legend()
    plt.tight_layout()

def plot_random_imgs(df, rows=2, columns=5, figsize=(8, 4), show_predictions=False):
    fig, axs = plt.subplots(rows, columns, figsize=figsize)

    # Draw positional indices (they are used with .iloc below) and cap the sample
    # at len(df) so a dataframe with fewer rows than axes does not raise
    n_imgs = min(rows * columns, len(df))
    idx_img = list(np.random.choice(len(df), n_imgs, replace=False))
    print(idx_img)
    for i, ax in enumerate(axs.flat):
        ax.axis('off')
        if i >= n_imgs:
            continue  # leave unused axes empty
        title = f'{df.target.iloc[idx_img[i]]}'

        if show_predictions and 'predicted_class' in df.columns:
            title = title + f' | Pred: {df.predicted_class.iloc[idx_img[i]]} ({df.target_proba.iloc[idx_img[i]]:.3f})'

        ax.imshow(plt.imread(df.full_path.iloc[idx_img[i]]))
        ax.set_title(title)
    plt.tight_layout()

def resize_convert_to_array(full_path, img_size=(32, 32)):
    # Load the image, resize it to img_size and scale pixel values to [0, 1]
    try:
        return img_to_array(load_img(full_path).resize(img_size)) / 255
    except Exception as e:
        print(e)
        return np.array([])

def save_obj(obj, full_path):
    try:
        joblib.dump(obj, full_path)
    except Exception as e:
        print(e)

def load_obj(full_path):
    try:
        obj = joblib.load(full_path)
        return obj
    except Exception as e:
        print(e)

## Environment setup for the multiclass model

### Importing images from Drive


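The images on Drive must be organized in the directory structure shown here. The path parsing in the next code cell reads the dataset split from each image's parent folder and the class from the folder above it, so each class folder is expected to contain `train` and `test` subfolders.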


```
|-- content/
    |--drive/
        |--folder/
            |--folder-2/
                |--folder-3/
                    |--final-folder/
                        |--class-01
                        |--class-02
                        |--class-03
                        |--class-04
                              .
                              .
                              .
                        |--class-n
  
```



Define the directories and build the dataframe of train and test images. Update the path so it points to the new augmented dataset.

In [None]:
### CHANGE THE FILE PATH ###
# Starting from balanced classes: data augmentation was used to generate more images and balance the classes
BASE_PATH ='/content/drive/MyDrive/dataset-aug/'
image_files = glob.glob(f'{BASE_PATH}/**/*.jpg', recursive=True)
df = pd.DataFrame(image_files, columns=['full_path'])
# Path layout read from the end: .../<target>/<type_dataset>/<filename>
df['type_dataset'] = df.full_path.apply(lambda x: x.split('/')[-2])
df['target'] = df.full_path.apply(lambda x: x.split('/')[-3])
df['filename'] = df.full_path.apply(lambda x: x.split('/')[-1])
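
As a rough illustration of the path parsing above, the sketch below splits one made-up path into the same three fields. The example path and the helper name `parse_image_path` are hypothetical; only the index positions come from the cell above.

```python
# Hypothetical example path; the real files live under BASE_PATH
example_path = '/content/drive/MyDrive/dataset-aug/class-01/train/img_0001.jpg'


def parse_image_path(full_path):
    """Split a path into target, type_dataset and filename using the same indices as the dataframe cell."""
    parts = full_path.split('/')
    return {
        'target': parts[-3],        # class folder
        'type_dataset': parts[-2],  # 'train' or 'test'
        'filename': parts[-1],      # image file name
    }


parsed = parse_image_path(example_path)
print(parsed)
```

Because the indices count from the end of the path, the parsing does not depend on how deep the dataset folder sits inside Drive.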

Image counts, split by train and test

In [None]:
df.value_counts(['type_dataset', 'target']).to_frame()

### Image preprocessing

Building the train and test arrays and the prediction classes

Images are resized to 224 x 224 for better compatibility with the CNN. The data is also normalized (divided by 255) inside the helper function `resize_convert_to_array`.
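
The normalization step can be sketched on a synthetic array. This is only an illustration with random pixel values standing in for a loaded image; the resize itself is handled by PIL inside the helper and is not reproduced here.

```python
import numpy as np

# Synthetic 224 x 224 RGB image with pixel values in [0, 255] (stand-in for a loaded image)
rng = np.random.default_rng(0)
fake_img = rng.integers(0, 256, size=(224, 224, 3)).astype('float32')

# Same scaling as the helper: divide by 255 so every value lies in [0, 1]
normalized = fake_img / 255

print(normalized.shape, normalized.min(), normalized.max())
```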

In [None]:
new_imgs_size = (224, 224)
imgs_train = []
targets_train = []
imgs_test = []
targets_test = []

for line in tqdm(df.itertuples(), total=df.shape[0]):
    if line.type_dataset == 'train':
        aux_array = resize_convert_to_array(line.full_path, new_imgs_size)
        if len(aux_array) == 0:
            df.loc[line.Index, 'img_processada'] = False
        else:
            imgs_train.append(aux_array)
            targets_train.append(line.target)
            df.loc[line.Index, 'img_processada'] = True
    elif line.type_dataset == 'test':
        aux_array = resize_convert_to_array(line.full_path, new_imgs_size)
        if len(aux_array) == 0:
            df.loc[line.Index, 'img_processada'] = False
        else:
            imgs_test.append(aux_array)
            targets_test.append(line.target)
            df.loc[line.Index, 'img_processada'] = True

Building the train and test arrays

In [None]:
X_train = np.array(imgs_train)
y_train = np.array(targets_train)
X_test = np.array(imgs_test)
y_test = np.array(targets_test)
X_train.shape, y_train.shape, X_test.shape, y_test.shape

In [None]:
!mkdir -p /content/drive/MyDrive/datasets-processed/fractures

In [None]:
### save arrays
save_obj(X_train, '/content/drive/MyDrive/datasets-processed/fractures/X_train.joblib')
save_obj(X_test, '/content/drive/MyDrive/datasets-processed/fractures/X_test.joblib')
save_obj(y_train, '/content/drive/MyDrive/datasets-processed/fractures/y_train.joblib')
save_obj(y_test, '/content/drive/MyDrive/datasets-processed/fractures/y_test.joblib')

In [None]:
### load if necessary
X_train = load_obj('/content/drive/MyDrive/datasets-processed/fractures/X_train.joblib')
X_test = load_obj('/content/drive/MyDrive/datasets-processed/fractures/X_test.joblib')
y_train = load_obj('/content/drive/MyDrive/datasets-processed/fractures/y_train.joblib')
y_test = load_obj('/content/drive/MyDrive/datasets-processed/fractures/y_test.joblib')

Building the classes for prediction

In [None]:
num_classes = np.unique(y_train).size
print(num_classes)

In [None]:
le = LabelEncoder()  # maps the class names to integer ids (10 categories)
y_train_encoder = le.fit_transform(y_train)
y_test_encoder = le.transform(y_test)
y_train_categorical = to_categorical(y_train_encoder, num_classes)
y_test_categorical = to_categorical(y_test_encoder, num_classes)
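
To make the encoding step concrete, here is a NumPy-only sketch of what `LabelEncoder` followed by `to_categorical` produces. The class names are placeholders for illustration; the notebook itself uses the scikit-learn and Keras utilities above.

```python
import numpy as np

# Placeholder string labels standing in for y_train
labels = np.array(['class-03', 'class-02', 'class-03', 'class-01'])

# Sorted unique classes plus the integer id of each sample
# (LabelEncoder also assigns ids in sorted class order)
classes, encoded = np.unique(labels, return_inverse=True)

# One-hot rows: pick row `id` of the identity matrix for each sample
one_hot = np.eye(classes.size, dtype='float32')[encoded]

print(classes)
print(encoded)
print(one_hot)
```

Each row of `one_hot` has a single 1 in the column of its class, which is the target format `categorical_crossentropy` expects.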

In [None]:
y_train_categorical[:3]

In [None]:
X_train.shape

In [None]:
X_test.shape

In [None]:
y_train_categorical.shape

In [None]:
input_shape = X_train[0].shape
input_shape, num_classes

## Multiclass baseline model

Building the model architecture

In [None]:
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=input_shape,
                   strides=(1,1), padding='same',kernel_regularizer=L1()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',strides=(1,1), padding='same', kernel_regularizer=L1()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
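
A back-of-the-envelope parameter count for this architecture, assuming 224 x 224 RGB inputs and 10 classes as used elsewhere in this notebook. It is a sketch for checking against `model.summary()`, not a replacement for it.

```python
# Assumed input: 224 x 224 RGB images and 10 classes
height, width, channels = 224, 224, 3
num_classes = 10

conv1_params = 3 * 3 * channels * 32 + 32   # 3x3 kernels, 32 filters, plus biases
conv2_params = 3 * 3 * 32 * 64 + 64         # 3x3 kernels over 32 channels, 64 filters

# 'same' padding keeps the spatial size; each 2x2 max pooling halves it
side = height // 2 // 2                     # 224 -> 112 -> 56
flat_features = side * side * 64            # length of the flattened vector

dense_params = flat_features * num_classes + num_classes
total_params = conv1_params + conv2_params + dense_params
print(flat_features, total_params)
```

Under these assumptions almost all of the weights sit in the final dense layer, because the flattened feature map is large.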

Model summary

In [None]:
model.summary()

Compiling the model

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Training the model on the generated training data, using:

*   150 epochs
*   batch size of 32 samples
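
As a quick piece of arithmetic on these settings: with a batch size of 32, each epoch performs one weight update per batch. The training set size below is a hypothetical placeholder; in the notebook it would be `X_train.shape[0]`.

```python
import math

n_train = 1000   # hypothetical; use X_train.shape[0] in the notebook
batch_size = 32
epochs = 150

steps_per_epoch = math.ceil(n_train / batch_size)  # the last batch may be smaller
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```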




In [None]:
history1 = model.fit(X_train, y_train_categorical, validation_data=(X_test, y_test_categorical), batch_size= 32, epochs=150, verbose = 1)

Baseline model metrics

In [None]:
plot_history(history1)

In [None]:
model.evaluate(X_test, y_test_categorical)

Making predictions

In [None]:
predictions = model.predict(X_test)

In [None]:
predicted_classes = np.argmax(predictions, axis=1)
predicted_classes

Predicted vs. actual dataframe

In [None]:
df_test = df[df['type_dataset'] == 'test'].copy().reset_index(drop=True)

In [None]:
df_test['predicted_class'] = le.inverse_transform(predicted_classes)

In [None]:
df_test['target_proba'] = np.max(predictions, axis=1)
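
The `predicted_class` and `target_proba` columns both come from the same softmax output. A small sketch with made-up probabilities shows how `np.argmax` and `np.max` relate:

```python
import numpy as np

# Hypothetical softmax output for 3 samples and 4 classes (each row sums to 1)
probs = np.array([
    [0.10, 0.70, 0.15, 0.05],
    [0.25, 0.25, 0.40, 0.10],
    [0.90, 0.05, 0.03, 0.02],
])

pred_idx = np.argmax(probs, axis=1)   # index of the most likely class per row
pred_proba = np.max(probs, axis=1)    # probability assigned to that class

print(pred_idx)
print(pred_proba)
```

In the notebook, the class indices are then mapped back to class names with `le.inverse_transform`.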

In [None]:
df_test.head(3)

Confusion matrix

In [None]:
ConfusionMatrixDisplay.from_predictions(df_test['target'], df_test['predicted_class'], xticks_rotation='vertical')
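
For reference, the counts behind a confusion matrix can be sketched in a few lines of NumPy. The labels below are made up; rows are true classes and columns are predicted classes, the same orientation scikit-learn uses.

```python
import numpy as np

# Hypothetical true and predicted class ids for 6 samples and 3 classes
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
# Add 1 to cell (true, predicted) for every sample
np.add.at(cm, (y_true, y_pred), 1)

accuracy = np.trace(cm) / cm.sum()  # correct predictions lie on the diagonal
print(cm)
print(accuracy)
```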

Checking which fracture types the model got wrong most often

In [None]:
df_diff = df_test[df_test['target'] != df_test['predicted_class']].copy().reset_index(drop=True)
df_diff.head(13)

In [None]:
plot_random_imgs(df_diff, 3, 4, show_predictions=True, figsize=(14, 14))

## Multiclass ResNet50 model

In [None]:
num_classes = 10
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

for layer in base_model.layers:
    layer.trainable = False

inputs = base_model.input

x = base_model.output
x = layers.Dropout(0.5)(x)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Flatten()(x)
outputs = layers.Dense(num_classes, activation='softmax')(x)

model = Model(inputs=inputs, outputs=outputs)
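
Since every ResNet50 layer is frozen, only the final dense layer is trained. A rough count of its parameters, assuming the standard ResNet50 output of 2048 feature channels (dropout, pooling and flatten add no weights):

```python
feature_channels = 2048  # channels of the ResNet50 output without the top (assumed standard architecture)
num_classes = 10         # as set in the cell above

# Dense layer: one weight per (feature, class) pair plus one bias per class
trainable_params = feature_channels * num_classes + num_classes
print(trainable_params)
```

This figure can be compared with the trainable parameter count reported by `model.summary()`.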

Model summary

In [None]:
model.summary()

Compiling the model

In [None]:
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Training the model on the generated training data, using:

*   150 epochs
*   batch size of 32 samples

In [None]:
history2 = model.fit(X_train, y_train_categorical, validation_data=(X_test, y_test_categorical), batch_size= 32, epochs=150, verbose = 1)

Model metrics

In [None]:
plot_history(history2)

In [None]:
model.evaluate(X_test, y_test_categorical)

In [None]:
predictions = model.predict(X_test)

In [None]:
predicted_classes = np.argmax(predictions, axis=1) # extract the predictions indexes
predicted_classes

Predicted vs. actual dataframe

In [None]:
df_test = df[df['type_dataset'] == 'test'].copy().reset_index(drop=True)

In [None]:
df_test['target_proba'] = np.max(predictions, axis=1)

In [None]:
df_test['predicted_class'] = le.inverse_transform(predicted_classes)

In [None]:
df_test.head(3) # show expected and predicted

Confusion matrix

In [None]:
ConfusionMatrixDisplay.from_predictions(df_test['target'], df_test['predicted_class'], xticks_rotation='vertical')

Checking which fracture types the model got wrong most often

In [None]:
df_diff = df_test[df_test['target'] != df_test['predicted_class']].copy().reset_index(drop=True)
df_diff.head(13)

In [None]:
plot_random_imgs(df_diff, 3, 4, show_predictions=True, figsize=(14, 14))

## Environment setup for the binary model

### Importing images from Drive

In [None]:
### CHANGE THE FILE PATH ###
IMGS_DIR = '/content/drive/MyDrive/gravidade'  # original images
image_files = glob.glob(f"{IMGS_DIR}/**/*.*", recursive=True)
len(image_files)

In [None]:
df = pd.DataFrame(image_files, columns=['full_path'])
df.head()

Building the dataframe with the target and the image location

In [None]:
df['target'] = df.full_path.apply(lambda path_complete: path_complete.split('/')[-2])
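# File name without directory or extension (splitting the full path on '.' would keep the directories)
df['filename'] = df.full_path.apply(lambda path_complete: path_complete.split('/')[-1].rsplit('.', 1)[0])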
df['extension'] = df.full_path.apply(lambda path_complete: path_complete.split('.')[-1])
df.head(5)

### Image preprocessing

Train/test split

In [None]:
df_train, df_test = train_test_split(df, test_size=0.2, stratify=df.target, random_state=42)
df_train.target.value_counts(1)
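
The idea behind `stratify=df.target` is that each class keeps roughly the same share in both parts. The NumPy sketch below illustrates that idea on made-up labels; `stratified_split` is a hypothetical helper and is not how scikit-learn implements `train_test_split`.

```python
import numpy as np


def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with about the same class ratio in both parts."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # positions of this class
        rng.shuffle(idx)
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)


# Hypothetical imbalanced labels: 80 samples of one class, 20 of the other
labels = np.array(['class-01'] * 80 + ['class-02'] * 20)
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(np.unique(labels[test_idx], return_counts=True))
```

With these labels the test part holds 16 and 4 samples of the two classes, the same 80/20 ratio as the full set.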

In [None]:
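# assign returns new dataframes, which avoids pandas' SettingWithCopyWarning on the split results
df_train = df_train.assign(type_dataset='train')
df_test = df_test.assign(type_dataset='test')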

df = pd.concat([df_train, df_test])
df.head(10)

In [None]:
df.value_counts(['type_dataset', 'target']).to_frame() # distribution for each Target class

Resizing the images

In [None]:
new_imgs_size = (224, 224)
imgs_train = []
targets_train = []

imgs_test = []
targets_test = []

for line in tqdm(df.itertuples(), total=df.shape[0]):
    if line.type_dataset == 'train':
        aux_array = resize_convert_to_array(line.full_path, new_imgs_size)
        if len(aux_array) == 0:
            df.loc[line.Index, 'img_processada'] = False
        else:
            imgs_train.append(aux_array)
            targets_train.append(line.target)
            df.loc[line.Index, 'img_processada'] = True
    elif line.type_dataset == 'test':
        aux_array = resize_convert_to_array(line.full_path, new_imgs_size)
        if len(aux_array) == 0:
            df.loc[line.Index, 'img_processada'] = False
        else:
            imgs_test.append(aux_array)
            targets_test.append(line.target)
            df.loc[line.Index, 'img_processada'] = True


Building the train and test arrays

In [None]:
X_train = np.array(imgs_train)
y_train = np.array(targets_train)
X_test = np.array(imgs_test)
y_test = np.array(targets_test)

X_train.shape, y_train.shape, X_test.shape, y_test.shape

In [None]:
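# Create the output folder first; save_obj only prints the error if the folder is missing
!mkdir -p /content/drive/MyDrive/datasets-processed/fractures-binary
save_obj(X_train, '/content/drive/MyDrive/datasets-processed/fractures-binary/X_train.joblib')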
save_obj(X_test, '/content/drive/MyDrive/datasets-processed/fractures-binary/X_test.joblib')
save_obj(y_train, '/content/drive/MyDrive/datasets-processed/fractures-binary/y_train.joblib')
save_obj(y_test, '/content/drive/MyDrive/datasets-processed/fractures-binary/y_test.joblib')

In [None]:
X_train = load_obj('/content/drive/MyDrive/datasets-processed/fractures-binary/X_train.joblib')
X_test = load_obj('/content/drive/MyDrive/datasets-processed/fractures-binary/X_test.joblib')
y_train = load_obj('/content/drive/MyDrive/datasets-processed/fractures-binary/y_train.joblib')
y_test = load_obj('/content/drive/MyDrive/datasets-processed/fractures-binary/y_test.joblib')

In [None]:
num_classes = np.unique(y_train).size ## number of classes independent of format
print(num_classes)

In [None]:
le = LabelEncoder()

y_train_encoder = le.fit_transform(y_train)
y_test_encoder = le.transform(y_test)

y_train_categorical = to_categorical(y_train_encoder, num_classes)
y_test_categorical = to_categorical(y_test_encoder, num_classes)

In [None]:
input_shape = X_train[0].shape
input_shape, num_classes

## Binary baseline model

In [None]:
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3, 3), activation='relu', input_shape=input_shape,
                   strides=(1,1), padding='same',kernel_regularizer=L1()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=64, kernel_size=(3, 3), activation='relu',strides=(1,1), padding='same', kernel_regularizer=L1()))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

Model summary

In [None]:
model.summary()

Compiling the model

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
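
This model pairs `binary_crossentropy` with a two-unit softmax output and one-hot targets. The NumPy sketch below, on made-up probabilities, suggests why that gives the same per-sample loss as categorical cross-entropy in the two-class case. It ignores the small epsilon clipping Keras applies to probabilities.

```python
import numpy as np

# Hypothetical two-class softmax outputs and matching one-hot targets
p = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])
y = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])

# Binary cross-entropy applied per output unit, averaged over the two units
bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p), axis=1)

# Categorical cross-entropy: minus the log-probability of the true class
cce = -np.sum(y * np.log(p), axis=1)

print(bce)
print(cce)
```

The two agree because the softmax outputs sum to 1, so both units contribute the same term to the binary loss.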

Training the model

In [None]:
history3 = model.fit(X_train, y_train_categorical, validation_data=(X_test, y_test_categorical), batch_size= 32, epochs=150, verbose = 1)

In [None]:
plot_history(history3)

In [None]:
model.evaluate(X_test, y_test_categorical)

In [None]:
predictions = model.predict(X_test)

In [None]:
predicted_classes = np.argmax(predictions, axis=1)
predicted_classes

In [None]:
df_test = df[df['type_dataset'] == 'test'].copy().reset_index(drop=True)
df_test['predicted_class'] = le.inverse_transform(predicted_classes)
df_test['target_proba'] = np.max(predictions, axis=1)
df_test.head(3)

In [None]:
ConfusionMatrixDisplay.from_predictions(df_test['target'], df_test['predicted_class'])

Checking which fracture types the model got wrong most often

In [None]:
df_diff = df_test[df_test['target'] != df_test['predicted_class']].copy().reset_index(drop=True)
df_diff.head(13)

In [None]:
plot_random_imgs(df_diff, 3, 4, show_predictions=True, figsize=(14, 14))