# Supplementary 1 - Dataset Preparation and Initial Modeling
**"BoutScout: A Deep Learning Framework for Automatic Detection of Incubation Events in Avian Nests Using Temperature Time Series"**

Author: Jorge Lizarazo

We detail here the final steps in preparing the incubation dataset and the construction of an initial exploratory model using an 80/20 train-test split. These procedures mark the transition between data curation and early modeling, allowing us to test model structure, depth, and core hyperparameters before advancing to more rigorous cross-validation. This phase is not meant to produce a production-ready model, but rather to explore architectural decisions such as the number of BiLSTM layers, hidden dimensions, and learning rate. Results from this stage inform decisions used in Supplementary 2, where cross-validation and full retraining are performed.

**Year:** 2025
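
As a rough illustration of the 80/20 train-test split mentioned above, the sketch below splits at the level of whole blocks rather than individual readings, so that all readings from one block stay on the same side of the split. Whether the original split was done per block is an assumption here, and the synthetic `X_blocks` and `y_blocks` are hypothetical stand-ins for the arrays built later in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 blocks of varying length, 3 features each.
rng = np.random.default_rng(0)
lengths = rng.integers(100, 200, size=10)
X_blocks = [rng.normal(size=(n, 3)) for n in lengths]
y_blocks = [rng.integers(0, 4, size=n) for n in lengths]

# Split block indices (not individual readings) into 80% train / 20% test.
idx_train, idx_test = train_test_split(
    np.arange(len(X_blocks)), test_size=0.2, random_state=42
)

X_train = [X_blocks[i] for i in idx_train]
X_test = [X_blocks[i] for i in idx_test]
y_train = [y_blocks[i] for i in idx_train]
y_test = [y_blocks[i] for i in idx_test]

print(len(X_train), len(X_test))
```

Fixing `random_state` keeps the split reproducible between runs, which helps when comparing architectures on the same held-out blocks.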

## Final Dataset Assembly and Cleaning

In [1]:
import numpy as np
import glob
import os
import pandas as pd
import json
import ast
import matplotlib.pyplot as plt
import random
from sklearn.preprocessing import LabelEncoder

**File loading:** The files annotated by three team members (Pamela López, Mariana Torres, and Lizarazo) are loaded with pandas and then concatenated into a single DataFrame.

In [1]:
file_names = [
    "G:/Thesis/pre_processing/Etiquetados_JL.csv",
    "G:/Thesis/pre_processing/Etiquetados_mariana.csv",
    "G:/Thesis/pre_processing/Etiquetados_pame.csv"
]

In [None]:
list_of_dfs = []
for file_name in file_names:
    try:
        df = pd.read_csv(file_name, index_col=0, parse_dates=['date'])
        list_of_dfs.append(df)
    except FileNotFoundError:
        print(f"Warning: file not found: {file_name}")
    except Exception as e:
        print(f"Error reading {file_name}: {e}")

In [None]:
if list_of_dfs:
    df_bloques_etiquetados = pd.concat(list_of_dfs, ignore_index=True)

    # Show basic information about the combined DataFrame
    print("--- Combined DataFrame ---")
    print(f"Shape: {df_bloques_etiquetados.shape}")
    print("\nFirst rows:")
    print(df_bloques_etiquetados.head())
else:
    print("No DataFrames were loaded, so there is nothing to combine.")

In [None]:
print(df_bloques_etiquetados)

In [None]:
X_list = []
y_list = []

# Encoder that maps string labels to integer codes
le = LabelEncoder()
le.fit(df_bloques_etiquetados['label'])
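
To make the encoding step concrete, here is a small self-contained sketch of how `LabelEncoder` assigns integer codes. It uses the four label names that appear in this notebook; `le_demo` is a hypothetical name chosen so it does not clash with the `le` fitted above.

```python
from sklearn.preprocessing import LabelEncoder

# Fit on a small list containing the four labels used in this notebook.
le_demo = LabelEncoder()
le_demo.fit(["On", "Off", "Nocturnal", "Error", "On"])

# classes_ is sorted, so the integer codes follow alphabetical order.
mapping = {label: code for code, label in enumerate(le_demo.classes_)}
print(mapping)

# Round trip: strings -> integer codes -> strings.
codes = le_demo.transform(["Off", "On", "Nocturnal"])
decoded = le_demo.inverse_transform(codes)
print(codes, decoded)
```

Because the codes follow sorted order rather than order of appearance, the mapping stays stable as long as the same set of labels is present when the encoder is fitted.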

In [None]:
for (archivo, fecha), bloque in df_bloques_etiquetados.groupby(['archivo_origen', 'fecha']):
    bloque = bloque.sort_values('date')  # Ensure chronological order

    # Input: numeric features
    X = bloque[['tempe', 'ambient', 'hour_sin']].values

    # Output: labels encoded as integers
    y = le.transform(bloque['label'].values)

    X_list.append(X)
    y_list.append(y)

# Convert to the final arrays
X_array = np.array(X_list, dtype=object)  # each element is one block (2D array)
y_array = np.array(y_list, dtype=object)  # each element is an array of labels
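
One caveat with `np.array(..., dtype=object)`: if every block happens to have the same length, NumPy stacks the blocks into a single multi-dimensional array instead of a 1-D array of blocks. The sketch below shows the difference and one possible workaround. The helper `to_object_array` is hypothetical and is not part of the original pipeline.

```python
import numpy as np

def to_object_array(blocks):
    """Pack a list of arrays into a 1-D object array, one block per element."""
    out = np.empty(len(blocks), dtype=object)
    for i, block in enumerate(blocks):
        out[i] = block
    return out

# Two blocks with identical shape (5 readings x 3 features).
same_len = [np.zeros((5, 3)), np.ones((5, 3))]

stacked = np.array(same_len, dtype=object)  # blocks get stacked together
packed = to_object_array(same_len)          # one element per block

print(stacked.shape)
print(packed.shape, packed[1].shape)
```

With the helper, each element keeps its original numeric dtype, so later steps such as `le.inverse_transform` receive plain integer arrays whether or not the blocks share a length.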

In [None]:
print(X_array.shape, y_array.shape)

In [None]:
print(X_array[0].shape)  # e.g. (1440, 3) if the day has one reading per minute
print(y_array[0])        # integer labels for the first day
print(le.classes_)       # ['Error' 'Nocturnal' 'Off' 'On']
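
Beyond inspecting a single block, a quick summary of block lengths and per-class counts can reveal short days or class imbalance before modeling. The following is a minimal sketch under stated assumptions: `summarize_blocks` is a hypothetical helper, and `y_demo` is toy data rather than the real labels.

```python
import numpy as np

def summarize_blocks(y_blocks, class_names):
    """Return the length of each block and the total count of each class."""
    lengths = np.array([len(y) for y in y_blocks])
    all_codes = np.concatenate([np.asarray(y, dtype=int) for y in y_blocks])
    counts = np.bincount(all_codes, minlength=len(class_names))
    return lengths, dict(zip(class_names, counts.tolist()))

# Toy example: two short blocks using the encoded label values 0..3.
y_demo = [np.array([3, 3, 2, 1]), np.array([0, 3, 2])]
lengths, class_counts = summarize_blocks(
    y_demo, ["Error", "Nocturnal", "Off", "On"]
)
print(lengths, class_counts)
```

On the real data, the same call would be `summarize_blocks(y_array, le.classes_)`.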

In [None]:
import torch
print(torch.__version__)
print(torch.cuda.is_available())  # Should print False when running on CPU

In [None]:
i = random.randint(0, len(X_array) - 1)
X = X_array[i]
y = y_array[i]
labels = le.inverse_transform(y)  # original string labels

# Create the figure
plt.figure(figsize=(14, 5))
colors = {
    'On': '#E28342',
    'Off': '#535AA6',
    'Nocturnal': '#333E48',
    'Error': 'red'
}

# Plot the points, colored by label
for label in np.unique(labels):
    mask = labels == label
    plt.scatter(np.arange(len(X))[mask], X[mask, 0], label=label, color=colors[label], s=10)

#plt.title(f"Day {i}", fontsize=10)
plt.xlabel("Minute of the Day", fontsize=16)
plt.ylabel("Temperature (°C)", fontsize=16)
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.legend(title="Labels", loc='lower left', fontsize=12, title_fontsize=15)
plt.tight_layout()
plt.show()