<img src="https://drive.google.com/uc?export=view&id=14reVO1X6LsjqJ3cFgoeHxxddZVGfZn3t" width="100%">

# Modeling
---

This notebook focuses on improving model performance in two stages. First, we run a hyperparameter search that systematically explores different configurations (such as the learning rate, the batch size, and the layer architecture) to find the combination that maximizes the accuracy of our CNN on the pest classification task. Second, we carry out the main training run to obtain a fully fitted model.

Let's import the required packages.

In [2]:
!pip install keras_tuner

Collecting keras_tuner
  Downloading keras_tuner-1.4.7-py3-none-any.whl.metadata (5.4 kB)
Collecting kt-legacy (from keras_tuner)
  Downloading kt_legacy-1.0.5-py3-none-any.whl.metadata (221 bytes)
Downloading keras_tuner-1.4.7-py3-none-any.whl (129 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 129.1/129.1 kB 1.6 MB/s eta 0:00:00
Downloading kt_legacy-1.0.5-py3-none-any.whl (9.6 kB)
Installing collected packages: kt-legacy, keras_tuner
Successfully installed keras_tuner-1.4.7 kt-legacy-1.0.5


In [3]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import os
import sys
import kagglehub

from sklearn.model_selection import train_test_split
import keras
import keras_tuner
import gdown
import zipfile

Now let's load the dataset using the Kaggle API.

In [4]:
path = kagglehub.dataset_download("vencerlanz09/agricultural-pests-image-dataset")

Downloading from https://www.kaggle.com/api/v1/datasets/download/vencerlanz09/agricultural-pests-image-dataset?dataset_version_number=1...


100%|██████████| 102M/102M [00:04<00:00, 25.7MB/s]

Extracting files...





# 1. Dataset partitioning

In [5]:
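image_path = []
image_class = []
labels = []

# Each subfolder of `path` is one class; its position in the listing becomes its integer label.
# Note that os.listdir does not guarantee any particular order.
classes = os.listdir(path)
for label, class_name in enumerate(classes):
    images_names = os.listdir(f'{path}/{class_name}')
    image_path += [f'{path}/{class_name}/{name}' for name in images_names]
    image_class += len(images_names) * [class_name]
    labels += len(images_names) * [label]

# Stacking the three lists into one NumPy array stores every column, including `label`, as strings.
metadata = pd.DataFrame(np.array([image_path, image_class, labels]).T, columns=['path','class name','label'])
metadata.head()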

| | path | class name | label |
|---|---|---|---|
| 0 | /root/.cache/kagglehub/datasets/vencerlanz09/a... | catterpillar | 0 |
| 1 | /root/.cache/kagglehub/datasets/vencerlanz09/a... | catterpillar | 0 |
| 2 | /root/.cache/kagglehub/datasets/vencerlanz09/a... | catterpillar | 0 |
| 3 | /root/.cache/kagglehub/datasets/vencerlanz09/a... | catterpillar | 0 |
| 4 | /root/.cache/kagglehub/datasets/vencerlanz09/a... | catterpillar | 0 |


In [6]:
# Hold out 15% of the records for testing, then 17.5% of the remainder for tuning
# (about 15% of the full dataset), which leaves roughly 70% for training.
metadata_train, metadata_test = train_test_split(metadata, test_size=0.15, random_state=42)
metadata_train, metadata_tune = train_test_split(metadata_train, test_size=0.175, random_state=42)

print(f'Number of training records: {metadata_train.shape[0]}')
print(f'Number of tuning (validation) records: {metadata_tune.shape[0]}')
print(f'Number of test records: {metadata_test.shape[0]}')

Number of training records: 3851
Number of tuning (validation) records: 818
Number of test records: 825
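
The partition sizes printed above can be checked by hand. The sketch below assumes that `train_test_split` rounds the held-out fraction up to a whole number of records and assigns the remainder to the other partition. `split_sizes` is a hypothetical helper introduced only for this illustration, and the total of 5494 records is simply the sum of the three counts reported above.

```python
import math

def split_sizes(n_total, test_frac=0.15, tune_frac=0.175):
    """Hypothetical helper: expected sizes of the two chained splits."""
    # First split: hold out the test partition (assumed to be rounded up).
    n_test = math.ceil(n_total * test_frac)
    n_rest = n_total - n_test
    # Second split: hold out the tuning partition from what is left.
    n_tune = math.ceil(n_rest * tune_frac)
    n_train = n_rest - n_tune
    return n_train, n_tune, n_test

n_train, n_tune, n_test = split_sizes(5494)
print(n_train, n_tune, n_test)  # 3851 818 825
```

Because the second split is applied to the remaining 85% of the data, a fraction of 0.175 corresponds to roughly 15% of the full dataset, which gives an approximate 70/15/15 partition.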


# 2. Data generators

In [7]:
scripts_url = "https://drive.google.com/uc?id=1Ua3O6uh45uNOfcIbPZch2uDHeGUVmpL8"
zip_path = "/content/scripts.zip"
gdown.download(scripts_url, zip_path)

if os.path.exists(zip_path):
    # Extract into the current working directory (expected to be /content on Colab).
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        zip_ref.extractall()
    print("Extraction complete.")
else:
    print(f"File not found: {zip_path}.")

script_folder = "/content/scripts"

Downloading...
From (original): https://drive.google.com/uc?id=1Ua3O6uh45uNOfcIbPZch2uDHeGUVmpL8
From (redirected): https://drive.google.com/uc?id=1Ua3O6uh45uNOfcIbPZch2uDHeGUVmpL8&confirm=t&uuid=2e8b5c7c-ccec-4652-b89f-77cb88554e4f
To: /content/scripts.zip
100%|██████████| 377M/377M [00:08<00:00, 46.1MB/s]


Extraction complete.


In [8]:
from scripts.preprocessing import DataGenerator as gd

In [9]:
train_generator = gd.DataGenerator(metadata_train, batch_size=32, dim=(128,128,3), shuffle=True)
test_generator = gd.DataGenerator(metadata_test, batch_size=32, dim=(128,128,3), shuffle=True)
tune_generator = gd.DataGenerator(metadata_tune, batch_size=32, dim=(128,128,3), shuffle=True)

# Fetch the first batch to check the tensor shapes.
X, Y = train_generator[0]
print(f'Input tensor shape: {X.shape}')
print(f'Output tensor shape: {Y.shape}')

Input tensor shape: (32, 128, 128, 3)
Output tensor shape: (32, 1)
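
The `DataGenerator` class comes from the downloaded `scripts` package, and its source is not shown in this notebook. As a rough illustration of how a Keras-style generator typically maps a batch index to a slice of samples, the following sketch uses a hypothetical `IndexBatcher` class that works on plain indices instead of images. It assumes that an incomplete final batch is dropped, which is consistent with the 120 steps per epoch reported during the search for 3851 training records and a batch size of 32.

```python
import random

class IndexBatcher:
    """Hypothetical stand-in for the batching logic of a Keras-style generator."""

    def __init__(self, n_samples, batch_size=32, shuffle=True, seed=0):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)
        self.indices = list(range(n_samples))
        self.on_epoch_end()

    def __len__(self):
        # Number of full batches per epoch (assumption: the remainder is dropped).
        return self.n_samples // self.batch_size

    def __getitem__(self, batch_idx):
        if not 0 <= batch_idx < len(self):
            raise IndexError(batch_idx)
        start = batch_idx * self.batch_size
        # A real generator would load and preprocess the images at these indices.
        return self.indices[start:start + self.batch_size]

    def on_epoch_end(self):
        # Reshuffle the sample order so each epoch sees different batches.
        if self.shuffle:
            self._rng.shuffle(self.indices)

batcher = IndexBatcher(n_samples=3851, batch_size=32)
print(len(batcher), len(batcher[0]))  # 120 32
```

Shuffling matters mostly for the training generator; for evaluation data the order of the samples does not affect the computed metrics.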


# 3. Hyperparameter search

In [10]:
def build_model(hp):
    # Clear state left over from previously built models between trials.
    keras.backend.clear_session()

    # Pretrained VGG19 backbone, frozen so that only the new head is trained.
    bb = keras.applications.VGG19(include_top=False, weights="imagenet", input_shape=(128,128,3))
    for layer in bb.layers:
        layer.trainable=False

    model = keras.Sequential()
    model.add(keras.layers.Input((128,128,3)))
    model.add(bb)
    model.add(keras.layers.GlobalAveragePooling2D())
    # Tunable hidden layer width, followed by a 12-way softmax output.
    model.add(keras.layers.Dense(hp.Choice('units', [64, 128, 256]), activation='relu'))
    model.add(keras.layers.Dense(12, activation='softmax'))

    # Tunable learning rate, sampled on a log scale between 1e-4 and 1e-2.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=hp.Float("lr", min_value=1e-4, max_value=1e-2, sampling="log")),
                  loss=keras.losses.SparseCategoricalCrossentropy(),
                  metrics=['accuracy'])

    return model

In [11]:
hp = keras_tuner.HyperParameters()
model = build_model(hp)
model.summary()

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80134624/80134624 ━━━━━━━━━━━━━━━━━━━━ 3s 0us/step

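
Since the VGG19 backbone is frozen, only the two dense layers of the head are trained. Their parameter count can be worked out by hand as a sanity check on the model summary. The sketch below assumes that global average pooling over the VGG19 output yields a 512-dimensional vector (the channel count of its last convolutional block); `dense_params` and `head_params` are hypothetical helpers used only for this calculation.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def head_params(units, n_features=512, n_classes=12):
    """Trainable parameters of the head: Dense(units) followed by Dense(n_classes)."""
    return dense_params(n_features, units) + dense_params(units, n_classes)

for units in (64, 128, 256):
    print(units, head_params(units))
# 64 33612
# 128 67212
# 256 134412
```

Even the largest option adds only about 134 thousand trainable parameters, so the `units` choice changes the capacity of the head while leaving the pretrained backbone untouched.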

In [12]:
tuner = keras_tuner.RandomSearch(
    hypermodel=build_model,
    objective="val_loss",
    max_trials=3,
    executions_per_trial=3,
    overwrite=True,
    directory="",
    project_name="plagas",
)

In [13]:
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Choice)
{'default': 64, 'conditions': [], 'values': [64, 128, 256], 'ordered': True}
lr (Float)
{'default': 0.0001, 'conditions': [], 'min_value': 0.0001, 'max_value': 0.01, 'step': None, 'sampling': 'log'}
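
The `lr` hyperparameter uses `sampling="log"`, which spreads candidate values evenly across orders of magnitude instead of evenly across the raw interval. The following sketch illustrates the idea with the standard library only. It is not KerasTuner's internal implementation, and `sample_log_uniform` is a hypothetical helper.

```python
import math
import random

def sample_log_uniform(min_value, max_value, rng):
    """Draw a value whose base-10 logarithm is uniform between the two bounds."""
    log_min, log_max = math.log10(min_value), math.log10(max_value)
    return 10 ** rng.uniform(log_min, log_max)

rng = random.Random(0)
samples = [sample_log_uniform(1e-4, 1e-2, rng) for _ in range(10_000)]

# On a log scale, 1e-3 is the midpoint between 1e-4 and 1e-2,
# so about half of the samples should fall below it.
share_below_1e3 = sum(s < 1e-3 for s in samples) / len(samples)
print(f"Share of samples below 1e-3: {share_below_1e3:.2f}")
```

With linear sampling over the same interval, only about 9% of the candidates would fall below 1e-3, so small learning rates would rarely be tried.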


In [None]:
tuner.search(train_generator, epochs=15, validation_data=tune_generator)


Search: Running Trial #1

Value             |Best Value So Far |Hyperparameter
256               |256               |units
0.0003215         |0.0003215         |lr





Epoch 1/15
 11/120 ━━━━━━━━━━━━━━━━━━━━ 12:58 7s/step - accuracy: 0.0910 - loss: 2.5914

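
The tuner settings imply a sizeable compute budget, because every trial is trained `executions_per_trial` times. The back-of-the-envelope estimate below uses the values configured in this notebook together with the 7 s/step figure from the first progress line. That step time is an assumption: it comes from the very first batches, is likely to change as training proceeds, and excludes the validation passes.

```python
max_trials = 3             # hyperparameter combinations to try
executions_per_trial = 3   # independent training runs per combination
epochs = 15                # epochs per training run
steps_per_epoch = 120      # from the progress bar (11/120)
seconds_per_step = 7       # assumption: taken from the first progress line

total_runs = max_trials * executions_per_trial
total_steps = total_runs * epochs * steps_per_epoch
estimated_hours = total_steps * seconds_per_step / 3600

print(total_runs, total_steps, estimated_hours)  # 9 16200 31.5
```

If that estimate is too high for the available hardware, reducing `executions_per_trial` or `epochs` shrinks the budget proportionally, at the cost of noisier comparisons between trials.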
## Credits
---
* **Instructors:**
  - [Jorge E. Camargo, PhD](https://dis.unal.edu.co/~jecamargom/)
* **Teaching assistants:**
  - [Juan Sebastián Malagón Torres](https://co.linkedin.com/in/juan-sebastian-malag%C3%B3n-torres-86039a164)
* **Image design:**
  - [Sebastián Daniel Moreno Martinez](http://www.linkedin.com/in/sm-xwx)
* **Virtualization coordinator:**
  - [Edder Hernández Forero](https://www.linkedin.com/in/edder-hernandez-forero-28aa8b207/)
    