To replicate ImageNet benchmarking on your own image dataset stored in Amazon S3 and run transfer learning with VGG16, you need to follow several steps. The following is a general guide to the process.


In [1]:
# Standard library
import os
import re
from io import BytesIO

# Numerical and plotting libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# TensorFlow and Keras
import tensorflow as tf
from tensorflow.keras.applications import VGG16
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import (Conv2D, Dense, Dropout, Flatten, GlobalAveragePooling2D,
                                     MaxPooling2D, Reshape)
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Other libraries
import boto3
from PIL import Image
from sklearn.metrics import (accuracy_score, auc, average_precision_score, classification_report,
                             confusion_matrix, precision_score, recall_score)
from sklearn.model_selection import train_test_split



2023-11-11 16:54:11.026158: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2, in other operations, rebuild TensorFlow with the appropriate compiler flags.


# Training: Animal vs. Non-Animal

In [5]:
# Load the training and test datasets
train = pd.read_pickle('ArchivosUtiles/trainingAnimal.pkl')
test = pd.read_pickle('ArchivosUtiles/testingAnimal.pkl')

In [69]:
X_train = train['Imagen']
y_train = train['Animal']
X_test = test['Imagen']
y_test = test['Animal']

In [70]:
# Convert the training and test images to TensorFlow tensors
X_train_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_train]))
X_test_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_test]))

## Downloading the pre-trained VGG16 model:
Download the VGG16 model pre-trained with ImageNet weights. You can do this with TensorFlow or Keras.

In [71]:
input_shape = (150, 150, 3)

In [72]:
# Load VGG16 pre-trained on ImageNet, without its top dense layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)
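The notebook feeds the image arrays to VGG16 as they are. Keras also ships a preprocessing step for this architecture, `tf.keras.applications.vgg16.preprocess_input`, which reorders channels from RGB to BGR and subtracts the ImageNet channel means. The sketch below reproduces that transformation in NumPy so the arithmetic is visible; `vgg16_preprocess` is a hypothetical helper name, and whether applying it would change the results reported in this notebook is not something the source states.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used with the original VGG16 weights
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg16_preprocess(images_rgb):
    """Convert RGB images with values in [0, 255] to the input VGG16 was trained on."""
    images = np.asarray(images_rgb, dtype=np.float32)
    images_bgr = images[..., ::-1]          # reorder channels RGB -> BGR
    return images_bgr - IMAGENET_BGR_MEAN   # zero-centre each channel, no rescaling

batch = np.full((2, 150, 150, 3), 255.0, dtype=np.float32)  # two all-white dummy images
out = vgg16_preprocess(batch)
print(out.shape)
```

If used, the same transformation would have to be applied to both the training and the test images.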

## Data generator:

A Keras data generator can load and preprocess your images from S3, given the S3 path to the images and their corresponding labels. This notebook instead reads images that are already stored in pickled DataFrames, so the cells below only set the number of classes and the batch size.

In [73]:
# Number of classes in the dataset
num_classes = len(y_train.unique())

In [74]:
# Batch size to use during training
batch_size = 32
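For readers who do want to stream images instead of holding them all in memory, the following is a minimal sketch of a batch generator. Everything in it is an assumption rather than part of the original notebook: `batch_generator`, `load_image_from_s3`, the bucket name `my-bucket` and the key layout are hypothetical, and the demonstration uses a stand-in loader so it runs without AWS access.

```python
import numpy as np

def batch_generator(keys, labels, load_image, batch_size=32, shuffle=True, seed=0):
    """Yield (images, labels) batches indefinitely, loading each image on demand.

    `load_image` is a hypothetical callable that maps a key to an (H, W, 3) array.
    """
    keys = np.asarray(keys)
    labels = np.asarray(labels, dtype=np.float32)
    rng = np.random.default_rng(seed)
    indices = np.arange(len(keys))
    while True:
        if shuffle:
            rng.shuffle(indices)  # new order on every pass over the data
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            images = np.stack([load_image(keys[i]) for i in batch_idx]).astype(np.float32)
            yield images, labels[batch_idx]

def load_image_from_s3(key, bucket="my-bucket", size=(150, 150)):
    """Hypothetical S3 loader; the bucket name and key layout are placeholders."""
    import boto3              # imported lazily so the generator can be tried offline
    from io import BytesIO
    from PIL import Image
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return np.asarray(Image.open(BytesIO(body)).convert("RGB").resize(size))

# Offline demonstration with a stand-in loader that returns blank images
def fake_loader(key):
    return np.zeros((150, 150, 3), dtype=np.uint8)

demo_keys = [f"images/{i}.jpg" for i in range(70)]
demo_labels = [i % 2 for i in range(70)]
gen = batch_generator(demo_keys, demo_labels, fake_loader, batch_size=32)
first_images, first_labels = next(gen)
print(first_images.shape, first_labels.shape)
```

A generator like this could be passed to `model.fit` together with `steps_per_epoch`; note that `validation_split` is not available for generator inputs, so a separate validation generator would be needed.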

## Training the model:

Add custom layers on top of the VGG16 model and train the model on your data (here, the in-memory tensors built above). Make sure to freeze the VGG16 base layers so that their weights are not updated during training.

In [75]:
# Add custom layers on top of the base model
x = Flatten()(base_model.output)  # Flatten the convolutional feature maps to one dimension
x = Dense(1024, activation='relu')(x)  # Fully connected layer with ReLU activation
x = Dropout(0.2)(x)  # Dropout with rate 0.2
predictions = Dense(1, activation='sigmoid')(x)  # Single sigmoid unit for binary classification

In [76]:
# Build the final model
model = Model(inputs=base_model.input, outputs=predictions)

In [77]:
# Freeze the base model layers for transfer learning
for layer in base_model.layers:
    layer.trainable = False
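To get a feel for what freezing leaves trainable, the sketch below works out the sizes by hand for a 150x150x3 input, assuming the standard VGG16 layout (five blocks of 3x3 convolutions, each followed by 2x2 max pooling) and the Flatten, Dense(1024), Dense(1) head defined above. The helper names are hypothetical; `model.summary()` is the authoritative source for the real model.

```python
# VGG16 convolutional base: (number of 3x3 conv layers, output channels) per block,
# each block ending in a 2x2 max-pooling layer with stride 2.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

def conv_base_summary(height, width, in_channels=3):
    """Return (output_shape, parameter_count) of the VGG16 convolutional base."""
    params = 0
    channels = in_channels
    for n_convs, out_channels in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 3x3 kernel weights plus one bias per output channel
            params += 3 * 3 * channels * out_channels + out_channels
            channels = out_channels
        # 'same' padding keeps the size inside a block; pooling halves it (rounding down)
        height, width = height // 2, width // 2
    return (height, width, channels), params

def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

feature_shape, frozen_params = conv_base_summary(150, 150)
flat_features = feature_shape[0] * feature_shape[1] * feature_shape[2]
head_params = dense_params(flat_features, 1024) + dense_params(1024, 1)

print(feature_shape)   # (4, 4, 512)
print(flat_features)   # 8192
print(frozen_params)   # 14714688
print(head_params)     # 8390657
```

Under these assumptions, roughly 8.4 million head parameters are trained while about 14.7 million convolutional parameters stay fixed.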

In [78]:
# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='binary_crossentropy', metrics=['accuracy'])

In [79]:
# Define the EarlyStopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',  # Monitor validation loss
    patience=5,           # Number of epochs with no improvement after which training will be stopped
    restore_best_weights=True  # Restore model weights from the epoch with the best value of the monitored quantity
)
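The patience rule can be illustrated without Keras. The function below is a simplified model of the callback's behaviour (it ignores options such as `min_delta` and `baseline`), and the validation losses are made up purely for illustration.

```python
def simulate_early_stopping(val_losses, patience=5):
    """Return (stopped_epoch, best_epoch), both 1-indexed, for a list of validation losses.

    Simplified model of patience-based stopping: training halts once `patience`
    consecutive epochs pass without a new minimum of the monitored loss.
    """
    best_loss = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch

# Made-up validation losses, for illustration only
losses = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.45, 0.50, 0.41, 0.42, 0.43, 0.30]
stopped, best = simulate_early_stopping(losses, patience=5)
print(stopped, best)  # 11 6
```

With `restore_best_weights=True`, the weights kept after stopping would be those of the best epoch (epoch 6 in this toy run), not those of the last epoch trained.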

In [None]:
# Train the model on the training data
model.fit(X_train_tf, y_train, epochs=30, batch_size=32, validation_split=0.2, callbacks=[early_stopping])

In [None]:
# Save the trained model
model.save('ModelosFinales/modeloAnimalVGG16.h5')

In [None]:
y_proba = model.predict(X_test_tf)
y_pred = (y_proba >= 0.5).astype(int)



In [None]:
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Compute recall (true positive rate)
recall = recall_score(y_test, y_pred, pos_label=1, average='binary')
print(f"Recall: {recall}")

# Compute specificity (true negative rate) from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
specificity = tn / (tn + fp)
print(f"Specificity: {specificity}")

Accuracy: 0.9696714406065712
Recall: 0.9854838709677419
Specificity: 0.9523809523809523


In [None]:
# Compute average precision
ap = average_precision_score(y_test, y_proba)

print("Average Precision (AP):", ap)

Average Precision (AP): 0.9867896712891505
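Average precision summarises the precision-recall curve as the mean of the precision values at the rank of each positive example. The NumPy sketch below shows that calculation on a four-sample toy input; it assumes no tied scores and is not a replacement for `average_precision_score`, which handles ties through thresholds.

```python
import numpy as np

def average_precision(y_true, y_score):
    """Average precision for binary labels, assuming no tied scores."""
    y_true = np.asarray(y_true).astype(bool).ravel()
    y_score = np.asarray(y_score, dtype=float).ravel()
    order = np.argsort(-y_score)               # rank predictions from most to least confident
    hits = y_true[order]
    cumulative_tp = np.cumsum(hits)
    precision_at_k = cumulative_tp / np.arange(1, len(hits) + 1)
    # Recall rises by 1/n_positives at each positive, so AP is the mean precision at those ranks
    return precision_at_k[hits].mean()

toy_ap = average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_ap)  # (1/1 + 2/3) / 2 = 0.8333...
```

The `ravel()` calls let the function accept the `(N, 1)` array that `model.predict` returns for a single sigmoid output.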


In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.98      0.95      0.97       567
           1       0.96      0.99      0.97       620

    accuracy                           0.97      1187
   macro avg       0.97      0.97      0.97      1187
weighted avg       0.97      0.97      0.97      1187
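The reported metrics can be cross-checked by hand. The counts below are not printed anywhere in the notebook; they are inferred from the class supports in the report (567 negatives, 620 positives) together with the recall and specificity shown earlier, so treat them as a consistency check rather than recorded output. `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(tn, fp, fn, tp):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),        # share of positives that were found
        "specificity": tn / (tn + fp),   # share of negatives that were rejected
        "precision": tp / (tp + fp),     # share of positive predictions that were right
    }

# Counts inferred from the reported supports, recall and specificity (an assumption)
metrics = binary_metrics(tn=540, fp=27, fn=9, tp=611)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

These counts reproduce the accuracy, recall and specificity printed above and agree with the rounded precision values in the classification report.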



# Training: Guanaco vs. Non-Guanaco

In [69]:
# Load the training and test datasets
train = pd.read_pickle('ArchivosUtiles/trainingGuanaco.pkl')
test = pd.read_pickle('ArchivosUtiles/testingGuanaco.pkl')

In [70]:
X_train = train['Imagen']
y_train = train['Guanaco']
X_test = test['Imagen']
y_test = test['Guanaco']

In [71]:
# Convert the training and test images to TensorFlow tensors
X_train_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_train]))
X_test_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_test]))

## Balancing

In [72]:
y_train.value_counts()

Guanaco
True     1187
False     539
Name: count, dtype: int64

The dataset is imbalanced: it contains many more guanacos (1187) than other animals (539). To address this, additional non-guanaco examples are generated with *data augmentation*.

Data augmentation enlarges the training set by applying transformations to existing images, giving the model more examples to learn from. The transformations used here are:
- Brightness and Contrast adjustments
- Rotations
- Gaussian noise
- Mirroring

In [73]:
def data_augmentation(image_tensor):
    """Apply random brightness/contrast, rotation, noise and mirroring to one image."""

    # Convert the input image to a float32 TensorFlow tensor
    image = tf.convert_to_tensor(image_tensor, dtype=tf.float32)

    # Brightness and contrast adjustments (fixed amounts, applied 80% of the time)
    if np.random.rand() < 0.8:
        image = tf.image.adjust_brightness(image, delta=0.2)  # Add 0.2 to every pixel value
        image = tf.image.adjust_contrast(image, contrast_factor=1.2)  # Increase contrast by 20%

    # Rotations
    if np.random.rand() < 0.7:
        degrees = np.random.uniform(-10, 10)  # Random angle between -10 and 10 degrees
        degrees = int(round(degrees))  # Round the angle to an integer
        # NOTE: rot90 only rotates in 90-degree steps. degrees // 90 is 0 for angles
        # 0..10 and -1 for angles -10..-1, so this is either a no-op or a quarter turn.
        image = tf.image.rot90(image, k=degrees // 90)

    # Gaussian noise
    if np.random.rand() < 0.2:
        noise = tf.random.normal(shape=tf.shape(image), mean=0.0, stddev=0.1)
        image = image + noise

    # Mirroring (horizontal flip)
    if np.random.rand() < 0.5:
        image = tf.image.flip_left_right(image)

    # Return the augmented image as a float32 tensor
    augmented_image_tensor = tf.convert_to_tensor(image.numpy(), dtype=tf.float32)

    return augmented_image_tensor
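The rotation step deserves a closer look, because `tf.image.rot90` takes a number of quarter turns rather than an angle. The short check below enumerates what `degrees // 90` evaluates to for every integer angle the function can draw; it uses plain Python integer division, so no TensorFlow is needed.

```python
# Quarter-turn count produced by `degrees // 90` for each integer angle in [-10, 10]
k_by_angle = {degrees: degrees // 90 for degrees in range(-10, 11)}

print(sorted(set(k_by_angle.values())))  # [-1, 0]
print(k_by_angle[-3], k_by_angle[7])     # -1 0
print(-1 % 4)                            # 3
```

Floor division maps every negative angle to `k = -1` and every non-negative angle to `k = 0`. As far as I can tell, `rot90` reduces `k` modulo 4, so `k = -1` would amount to three quarter turns, not a small tilt. If small-angle rotation is the goal, one option (an assumption, not what the notebook does) is `tf.keras.layers.RandomRotation(factor=10/360)`, whose `factor` is expressed as a fraction of a full turn.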

In [74]:
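# List that collects the augmented image tensors
imagen_tensor_aumentada = []

# Iterate over the DataFrame rows and augment each one with probability 0.38
# NOTE: this loop draws from every row, guanacos included, while the next cell labels
# all augmented images as Guanaco=False; filtering to train[~train['Guanaco']] first
# would restrict augmentation to non-guanaco images.
for index, row in train.iterrows():
    if np.random.rand() < 0.38:
        imagen_tensor = row['Imagen']
        imagen_aumentada = data_augmentation(imagen_tensor)
        imagen_tensor_aumentada.append(imagen_aumentada)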

In [75]:
train_augmentation = pd.DataFrame(columns=['Imagen', 'Guanaco'])
train_augmentation['Imagen'] = imagen_tensor_aumentada
train_augmentation['Guanaco'] = False

In [76]:
train_augmentation = pd.concat([train, train_augmentation], ignore_index=True)

In [77]:
train_augmentation['Guanaco'].value_counts()

Guanaco
False    1203
True     1187
Name: count, dtype: int64
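The balancing loop samples from all training rows, yet every augmented image is labelled `Guanaco = False`. A sketch of an alternative that draws only from the minority class is shown below. It is an assumption about what was intended, not the method that produced the counts above, and `minority_oversample_indices` is a hypothetical helper that works on a plain label vector.

```python
import numpy as np

def minority_oversample_indices(labels, minority_label=False, seed=0):
    """Pick minority-class row positions (with replacement) needed to match the majority count."""
    labels = np.asarray(labels)
    minority_idx = np.flatnonzero(labels == minority_label)
    n_majority = len(labels) - len(minority_idx)
    n_needed = max(n_majority - len(minority_idx), 0)
    rng = np.random.default_rng(seed)
    return rng.choice(minority_idx, size=n_needed, replace=True)

# Label vector with the class counts reported for the original training set
labels = np.array([True] * 1187 + [False] * 539)
extra = minority_oversample_indices(labels)
print(len(extra))           # 648
print(labels[extra].any())  # False: every selected row is a non-guanaco
```

Each selected position could then be passed through `data_augmentation` and appended with the label `False`, which would bring both classes to 1187 examples without mislabelling any guanaco image.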

In [78]:
X_train = train_augmentation['Imagen']
y_train = train_augmentation['Guanaco']

In [79]:
X_train_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_train]))

## Downloading the pre-trained VGG16 model:
Download the VGG16 model pre-trained with ImageNet weights. You can do this with TensorFlow or Keras.

In [103]:
input_shape = (150, 150, 3)

In [104]:
# Load VGG16 pre-trained on ImageNet, without its top dense layers
base_model = VGG16(weights='imagenet', include_top=False, input_shape=input_shape)

## Training the model:

Add custom layers on top of the VGG16 model and train the model on your data (here, the augmented in-memory tensors). Make sure to freeze the VGG16 base layers so that their weights are not updated during training.

In [105]:
# Add custom layers on top of the base model
x = Flatten()(base_model.output)  # Flatten the feature maps to one dimension

# First dense block
x = Dense(1024, activation='relu')(x)  # Fully connected layer with ReLU activation
x = Dropout(0.2)(x)  # Dropout with rate 0.2

# Second dense block
x = Dense(512, activation='relu')(x)
x = Dropout(0.3)(x)

# Third dense block
x = Dense(256, activation='relu')(x)
x = Dropout(0.2)(x)

# Fourth dense block
x = Dense(128, activation='relu')(x)
x = Dropout(0.2)(x)

# Output layer: single sigmoid unit for binary classification
predictions = Dense(1, activation='sigmoid')(x)

In [106]:
# Build the final model
model = Model(inputs=base_model.input, outputs=predictions)

In [107]:
# Freeze the base model layers for transfer learning
for layer in base_model.layers:
    layer.trainable = False

In [108]:
# Define the EarlyStopping callback
early_stopping = EarlyStopping(
    monitor='val_loss',  # Monitor validation loss
    patience=5,           # Number of epochs with no improvement after which training will be stopped
    restore_best_weights=True  # Restore model weights from the epoch with the best value of the monitored quantity
)

In [109]:
# Compile the model
model.compile(optimizer=Adam(learning_rate=0.01), loss='binary_crossentropy', metrics=['accuracy'])

In [110]:
# Train the model on the training data
model.fit(X_train_tf, y_train, epochs=30, batch_size=32, validation_split=0.2, callbacks=[early_stopping])

Epoch 1/30
Epoch 2/30
Epoch 3/30
Epoch 4/30
Epoch 5/30
Epoch 6/30
Epoch 7/30
Epoch 8/30
Epoch 9/30
Epoch 10/30
Epoch 11/30


<keras.callbacks.History at 0x7f936150e850>

In [None]:
# Save the trained model
model.save('ModelosFinales/modeloGuanacoVGG16.h5')

In [111]:
y_proba = model.predict(X_test_tf)
y_pred = (y_proba >= 0.5).astype(int)



In [102]:
# Compute accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Compute recall (true positive rate)
recall = recall_score(y_test, y_pred, pos_label=1, average='binary')
print(f"Recall: {recall}")

# Compute precision
precision = precision_score(y_test, y_pred, pos_label=1, average='binary')
print(f"Precision: {precision}")

# Compute specificity (true negative rate) from the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
specificity = tn / (tn + fp)
print(f"Specificity: {specificity}")

# Values recorded from another run:
# Accuracy: 0.8283752860411899
# Recall: 0.903010033444816
# Precision: 0.8544303797468354
# Specificity: 0.6666666666666666

Accuracy: 0.7413394919168591
Recall: 0.7181208053691275
Precision: 0.8842975206611571
Specificity: 0.7925925925925926


In [55]:
# Compute average precision
ap = average_precision_score(y_test, y_proba)

print("Average Precision (AP):", ap)

# Value recorded from another run:
# Average Precision (AP): 0.9239772259039202

Average Precision (AP): 0.8610577537065705


In [None]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

       False       0.39      0.96      0.56       138
        True       0.95      0.31      0.46       299

    accuracy                           0.51       437
   macro avg       0.67      0.64      0.51       437
weighted avg       0.77      0.51      0.49       437



In [None]:
# Fraction of test images predicted as class 0 (non-guanaco)
(y_pred == 0).sum() / len(y_pred)

0.7780320366132724

# Training: Species Category

In [None]:
# Load the training and test datasets
train = pd.read_pickle('ArchivosUtiles/trainingCategoria.pkl')
test = pd.read_pickle('ArchivosUtiles/testingCategoria.pkl')

In [None]:
X_train = train['Imagen']
y_train = train['Categoria']
X_test = test['Imagen']
y_test = test['Categoria']

In [None]:
# Convert the training and test images to TensorFlow tensors
X_train_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_train]))
X_test_tf = tf.convert_to_tensor(np.array([img_to_array(img) for img in X_test]))