# TOOL WEAR PREDICTION

> **Note:** This notebook contains explanatory text in both English and Basque (Euskara). Each explanation appears first in English, followed by the original Basque version.

---

**[EN]** Hello, we are Gorka Dabó and Mikel Oscoz. This notebook contains the implementation of the Tool Wear Prediction project with detailed explanations. Let's start by importing the required packages.

**[EU]** Kaixo, Gorka Dabó eta Mikel Oscoz gara eta koaderno honetan Tool Wear Prediction proiektuaren inplementazioa azalpenekin batera dago. Hasteko pakete batzuk inportatuko ditugu.

In [1]:
from collections import Counter

import numpy as np
import pandas as pd
import tensorflow as tf
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential
from torch.utils.data import DataLoader, Dataset, TensorDataset

In [2]:
import gdown
url = "https://drive.google.com/drive/folders/1GMP4ocr_x4ULEgRogHooFvrl81WoQju-?usp=sharing"
gdown.download_folder(url, quiet=True)

['/content/Dataset-IANS/Kaggle Dataset/experiment_01.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_02.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_03.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_04.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_05.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_06.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_07.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_08.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_09.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_10.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_11.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_12.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_13.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_14.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_15.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_16.csv',
 '/content/Dataset-IANS/Kaggle Dataset/experiment_17.csv

**[EN]** The cell above downloads the Kaggle dataset. To make it accessible, we have placed the data in a public Google Drive folder.

**[EU]** Orain kaggle-ko dataseta deskargatuko dugu, atzigarria izateko drive-ko karpeta publiko batean jarri ditugu datuak.

In [3]:
def load_data():
    # train.csv holds one row per experiment, including its tool condition.
    results = pd.read_csv("/content/Dataset-IANS/Kaggle Dataset/train.csv")
    worn_experiments = []
    unworn_experiments = []
    experiment_condition = {}  # experiment number -> 1 (worn) or 0 (unworn)

    for i in range(1, 19):
        # Experiment files are zero-padded: experiment_01.csv ... experiment_18.csv
        frame = pd.read_csv(f"/content/Dataset-IANS/Kaggle Dataset/experiment_{i:02d}.csv")
        condition = results.loc[results['No'] == i, 'tool_condition'].iloc[0]
        experiment_condition[i] = 1 if condition == 'worn' else 0

        if condition == 'worn':
            worn_experiments.append((i, frame))
        else:
            unworn_experiments.append((i, frame))

    return worn_experiments, unworn_experiments, experiment_condition

In [4]:
def split_experiments():
    worn_experiments, unworn_experiments, experiment_condition = load_data()

    test_worn = worn_experiments[:2]
    test_unworn = unworn_experiments[:2]

    remaining_worn = worn_experiments[2:]
    remaining_unworn = unworn_experiments[2:]

    val_worn = remaining_worn[:2]
    val_unworn = remaining_unworn[:1]

    train_worn = remaining_worn[2:]
    train_unworn = remaining_unworn[1:]

    return (train_worn + train_unworn,
            val_worn + val_unworn,
            test_worn + test_unworn,
            experiment_condition)

**[EN]** The `load_data` function loads the data needed for the tool wear analysis. First, it reads `train.csv`, which records each experiment's tool condition (worn or unworn).

Then, it loads the data for each of the 18 experiments and returns three elements: the worn experiments, the unworn experiments, and a dictionary mapping each experiment ID to its condition (1 for worn, 0 for unworn).

**[EU]** Load_data funtzioak erreminten higadura aztertzeko beharrezko datuak kargatzen ditu. Lehenik eta behin, irakurri CSV fitxategi bat, esperimentu bakoitzerako tresnaren egoerari buruzko informazioa duena (higatua edo ez).

Ondoren, esperimentu bakoitzaren datuak kargatzen ditu eta azkenik, funtzioak hiru elementu itzultzen ditu: higatutako esperimentuak, higatu gabekoak eta esperimenturen kondizioak mapeatzen dituen hiztegia.

In [5]:
def prepare_data(experiments):
    all_data = []
    experiment_indices = []

    for exp_id, frame in experiments:
        frame = frame.drop('Machining_Process', axis=1)
        n_rows = len(frame)
        all_data.append(frame)
        experiment_indices.extend([exp_id] * n_rows)

    return pd.concat(all_data, ignore_index=True), experiment_indices

In [6]:
def create_sliding_windows_by_experiment(data, experiment_indices, window_size, step_size):
    windows = []
    window_exp_ids = []

    experiment_indices = np.asarray(experiment_indices)

    for exp_id in np.unique(experiment_indices):
        # Window each experiment separately so no window spans two experiments.
        exp_data = data[experiment_indices == exp_id]

        # Start positions 0, step_size, 2*step_size, ... while a full window still fits.
        for start in range(0, len(exp_data) - window_size + 1, step_size):
            windows.append(exp_data[start:start + window_size])
            window_exp_ids.append(exp_id)

    return np.array(windows), window_exp_ids

**[EN]** The `split_experiments` function splits all experiments into three sets: train, dev (validation), and test. It is built so that every set contains experiments from both classes. It returns the three lists together with the `experiment_condition` dictionary.

**[EU]** Funtzio honek, datu guztiak 3 multzotan banatzen ditu: train, dev eta test multzoak. Multzo bakoitzean bi klasetako esperimentuak egoteko eginda dago. Azkenik hiru listak bueltatzen ditu.
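
The class-balanced split can be sketched on toy data as follows. The helper name `split_by_class` and the (ID, label) pairs are hypothetical and only for illustration; the slice sizes mirror those used in `split_experiments` (two worn and two unworn experiments for test, two worn and one unworn for validation).

```python
def split_by_class(worn, unworn, n_test=2, n_val_worn=2, n_val_unworn=1):
    """Split two per-class lists so that every subset contains both classes."""
    test = worn[:n_test] + unworn[:n_test]
    rest_worn, rest_unworn = worn[n_test:], unworn[n_test:]
    val = rest_worn[:n_val_worn] + rest_unworn[:n_val_unworn]
    train = rest_worn[n_val_worn:] + rest_unworn[n_val_unworn:]
    return train, val, test


# Hypothetical (experiment_id, label) pairs; 1 = worn, 0 = unworn.
worn = [(i, 1) for i in (6, 7, 8, 9, 10, 13, 14, 15, 16, 18)]
unworn = [(i, 0) for i in (1, 2, 3, 4, 5, 11, 12, 17)]

train, val, test = split_by_class(worn, unworn)
for name, subset in (("train", train), ("val", val), ("test", test)):
    classes = sorted({label for _, label in subset})
    print(f"{name}: {len(subset)} experiments, classes {classes}")
```

Splitting by experiment rather than by row keeps all windows from one experiment in the same subset, so windows from a single machining run are never shared between training and evaluation.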

In [7]:
def create_test_windows_by_experiment(data, experiment_indices, window_size):
    windows = []
    window_exp_ids = []

    unique_experiments = np.unique(experiment_indices)

    for exp_id in unique_experiments:
        exp_mask = np.array(experiment_indices) == exp_id
        exp_data = data[exp_mask]

        current_pos = 0
        while current_pos + window_size <= len(exp_data):
            windows.append(exp_data[current_pos:current_pos + window_size])
            window_exp_ids.append(exp_id)
            current_pos += window_size

    return np.array(windows), window_exp_ids

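**[EN]** The `HybridCNNRNN` class defines a hybrid architecture that combines convolutional neural network (CNN) layers with recurrent neural network (RNN) layers, specifically LSTM (Long Short-Term Memory). This kind of model is well suited to sequential data that also has spatial features, such as tool wear data.

The model has the following components:

* **CNN layers:** extract local features and spatial patterns from the input data.
* **LSTM layers:** capture temporal dependencies and long-term patterns in the data sequences.
* **Fully connected layers:** perform the final classification, mapping the features extracted by the previous layers to an output that represents the model's prediction.

**[EU]** `HybridCNNRNN` klaseak Sare Neuronal Konboluzionalen (CNN) eta Sare Neuronal Errekurrenteen (RNN) geruzak konbinatzen dituen eredu hibrido baten arkitektura definitzen du, zehazki LSTM (Long Short-Term Memory). Eredu mota hau oso erabilgarria da ezaugarri espazialak dituzten datu sekuentzialak prozesatzeko, hala nola tresnen higadura-datuak.

Ereduak osagai hauek ditu:

* **CNN geruzak:** sarrerako datuetatik ezaugarri lokalak eta patroi espazialak ateratzeko erabiltzen dira.
* **LSTM geruzak:** Denbora-mendekotasunak eta epe luzerako patroiak atzematen dituzte datu-sekuentzietan.
* **Fully Connected geruzak:** Azken sailkapenaz arduratzen dira, ereduaren iragarpena irudikatzen duen irteera baten aurreko geruzek ateratako ezaugarriak mapeatuz.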
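
To see how one window flows through these components, the sketch below traces the sequence length using the standard output-length formula for 1-D convolution and pooling. It assumes a window of 128 time steps (the `window_size` set in `main`) and the layer settings of the class that follows; the helper `out_len` is hypothetical and not part of the notebook.

```python
def out_len(length, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling layer (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


seq_len = 128                                      # time steps in one window

seq_len = out_len(seq_len, kernel=5, padding=2)    # conv1 keeps the length
seq_len = out_len(seq_len, kernel=2, stride=2)     # first max-pool halves it
seq_len = out_len(seq_len, kernel=3, padding=1)    # conv2 keeps the length
seq_len = out_len(seq_len, kernel=2, stride=2)     # second max-pool halves it
channels = 64                                      # conv2 output channels

hidden_size, directions = 128, 2                   # bidirectional LSTM
lstm_features = hidden_size * directions           # size of the vector fed to fc1

print(f"LSTM input: {seq_len} steps x {channels} channels")
print(f"fc1 input: {lstm_features} features")
```

The model keeps only the LSTM output at the last time step. With a bidirectional LSTM, that vector holds the forward direction's summary of the whole sequence, while the backward half has processed only the final step.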


In [8]:
class HybridCNNRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN layers
        self.conv1 = nn.Conv1d(in_channels=47, out_channels=32, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(32)
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(64)
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)
        self.dropout1 = nn.Dropout(0.3)

        # LSTM layers
        self.rnn = nn.LSTM(input_size=64, hidden_size=128,
                          num_layers=2, batch_first=True,
                          bidirectional=True, dropout=0.3)

        # Fully connected layers
        self.fc1 = nn.Linear(256, 64)  # 256 because the LSTM is bidirectional (128*2)
        self.dropout2 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(64, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # CNN
        x = x.permute(0, 2, 1)
        x = self.conv1(x)
        x = self.bn1(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = self.dropout1(x)

        # LSTM
        x = x.permute(0, 2, 1)
        x, _ = self.rnn(x)
        x = x[:, -1, :]

        # Fully connected
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

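**[EN]** The `prepare_data` function preprocesses the experiment data: it drops the `Machining_Process` column from each experiment, concatenates all experiments into a single DataFrame, and returns it together with a list that gives the experiment ID of every row.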

**[EU]** `prepare_data` funtzioak esperimentuen datuak preprozesatzen ditu, hauek konbinatzen gero erabili ahal izateko.
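
As a small illustration of this step, the sketch below applies the same idea to two toy experiments. The helper name `combine_experiments` and the sample values are hypothetical; only the `Machining_Process` column name is taken from the notebook.

```python
import pandas as pd


def combine_experiments(experiments, drop_column="Machining_Process"):
    """Concatenate per-experiment frames and record each row's experiment ID."""
    frames, row_ids = [], []
    for exp_id, frame in experiments:
        frame = frame.drop(columns=drop_column)    # non-numeric column, not fed to the model
        frames.append(frame)
        row_ids.extend([exp_id] * len(frame))
    return pd.concat(frames, ignore_index=True), row_ids


# Two toy experiments with one illustrative sensor column each.
exp_a = pd.DataFrame({"sensor": [0.1, 0.2, 0.3],
                      "Machining_Process": ["Prep", "Layer 1 Up", "Layer 1 Up"]})
exp_b = pd.DataFrame({"sensor": [0.4, 0.5],
                      "Machining_Process": ["Prep", "End"]})

data, row_ids = combine_experiments([(1, exp_a), (2, exp_b)])
print(data.shape)
print(row_ids)
```

Keeping the per-row experiment IDs alongside the combined frame is what later allows windows to be cut per experiment.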

In [9]:
def train_loop(model, train_loader, val_loader, device, num_epochs=10, loocv=False):
    loss_fn = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-5)

    best_val_f1 = -1
    best_model = None

    for epoch in range(num_epochs):
        print(f'\nEpoch {epoch+1}:')
        model.train()
        train_loss = 0
        train_preds = []
        train_targets = []

        size = len(train_loader.dataset)

        for batch, (X_batch, y_batch) in enumerate(train_loader):
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)

            # squeeze(1) removes only the output dimension, so a final batch
            # holding a single window keeps its batch dimension.
            pred = model(X_batch).squeeze(1)
            loss = loss_fn(pred, y_batch.float())

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            train_loss += loss.item()
            train_preds.extend((pred > 0.5).int().cpu().numpy())
            train_targets.extend(y_batch.cpu().numpy())

            if not loocv and batch % 20 == 0:
                current = batch * len(X_batch)
                print(f"loss: {loss.item():>7f}  [{current:>5d}/{size:>5d}]")


        # Validation
        model.eval()
        val_preds = []
        val_targets = []

        with torch.no_grad():
            for X_batch, y_batch in val_loader:
                X_batch = X_batch.to(device)
                y_batch = y_batch.to(device)

                pred = model(X_batch).squeeze(1)
                val_preds.extend((pred > 0.5).int().cpu().numpy())
                val_targets.extend(y_batch.cpu().numpy())

        # Compute metrics
        train_f1 = f1_score(train_targets, train_preds)
        val_f1 = f1_score(val_targets, val_preds)

        print(f'Train Loss: {train_loss/len(train_loader):.4f}, Train F1: {train_f1:.4f}')
        print(f'Val F1: {val_f1:.4f}')


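        if val_f1 > best_val_f1:
            best_val_f1 = val_f1
            # Clone the tensors: state_dict() returns references to the live
            # parameters, so a shallow .copy() would be overwritten by later epochs.
            best_model = {k: v.detach().clone() for k, v in model.state_dict().items()}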

    return best_model

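**[EN]** The `evaluate_model` function evaluates the trained model on the test set. The goal is to obtain metrics that show how well the model predicts tool wear on data it did not see during training.

To do so, the function first makes predictions on the test set. It then computes accuracy, precision, recall, and F1-score at the window level and at the experiment level, where each experiment's prediction is the majority vote of its windows. Finally, it returns a dictionary containing the computed metrics, the window-level and experiment-level predictions, and the true labels.

**[EU]** `evaluate_model` funtzioak test multzoan entrenatutako ereduaren errendimendua ebaluatzen du. Helburua da ereduak entrenamenduan ikusi gabeko datuetan erreminten higadura aurreikusteko duen gaitasuna adierazten duten metrikak lortzea.

Horretarako, lehenengo funtzioak test multzoari buruzko iragarpenak egiten ditu. Ondoren, leiho-mailan eta esperimentu-mailan metrikak kalkulatzen ditu. Metrika kalkulatuek zehaztasuna, recall eta F1-score dira. Azkenik, funtzioak kalkulatutako metrikak, leiho eta esperimentu mailako iragarpenak eta esperimentuen benetako etiketak biltzen dituen hiztegia itzultzen du.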
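
The step from window-level to experiment-level predictions is a majority vote. A minimal sketch, using hypothetical window predictions and a hypothetical helper name `majority_vote`:

```python
from collections import Counter


def majority_vote(window_predictions):
    """Return the most frequent label among a sequence of window predictions."""
    return Counter(window_predictions).most_common(1)[0][0]


# Hypothetical window-level predictions grouped by experiment ID.
predictions_by_experiment = {
    6: [1, 1, 0, 1, 1],
    1: [0, 0, 1, 0],
}

experiment_predictions = {
    exp_id: majority_vote(preds)
    for exp_id, preds in predictions_by_experiment.items()
}
print(experiment_predictions)
```

When two labels are tied, `Counter.most_common` returns the one encountered first, so a tie is decided by the order of the windows rather than by any confidence score.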

In [10]:
def evaluate_model(model, test_loader, test_experiment_indices, device):
    model.eval()
    predictions_by_experiment = {}
    true_labels_by_experiment = {}

    # Window-level metrics
    all_window_predictions = []
    all_window_true_labels = []

    with torch.no_grad():
        batch_start = 0
        for X_batch, y_batch in test_loader:
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)

            pred = model(X_batch)
            # squeeze(1) keeps the batch dimension even for a batch of one window.
            predictions = (pred.squeeze(1) > 0.5).int().cpu().numpy()

            all_window_predictions.extend(predictions)
            all_window_true_labels.extend(y_batch.cpu().numpy())

            # Group window predictions by experiment; this relies on the loader not shuffling.
            for i, window_pred in enumerate(predictions):
                exp_id = test_experiment_indices[batch_start + i]
                if exp_id not in predictions_by_experiment:
                    predictions_by_experiment[exp_id] = []
                    true_labels_by_experiment[exp_id] = y_batch[i].item()
                predictions_by_experiment[exp_id].append(int(window_pred))

            batch_start += len(y_batch)

    window_metrics = {
        'accuracy': accuracy_score(all_window_true_labels, all_window_predictions),
        'precision': precision_score(all_window_true_labels, all_window_predictions),
        'recall': recall_score(all_window_true_labels, all_window_predictions),
        'f1': f1_score(all_window_true_labels, all_window_predictions)
    }

    # Majority voting for each experiment
    experiment_predictions = []
    experiment_true_labels = []

    for exp_id in predictions_by_experiment:
        exp_predictions = predictions_by_experiment[exp_id]
        # Majority voting
        majority_vote = Counter(exp_predictions).most_common(1)[0][0]
        experiment_predictions.append(majority_vote)
        experiment_true_labels.append(true_labels_by_experiment[exp_id])

    experiment_metrics = {
        'accuracy': accuracy_score(experiment_true_labels, experiment_predictions),
        'precision': precision_score(experiment_true_labels, experiment_predictions),
        'recall': recall_score(experiment_true_labels, experiment_predictions),
        'f1': f1_score(experiment_true_labels, experiment_predictions)
    }

    return {
        'window_metrics': window_metrics,
        'experiment_metrics': experiment_metrics,
        'window_predictions': all_window_predictions,
        'window_true_labels': all_window_true_labels,
        'experiment_predictions': experiment_predictions,
        'experiment_true_labels': experiment_true_labels
    }

**[EN]** The `create_sliding_windows_by_experiment` function creates sliding windows for the training and validation sets. It cuts each experiment's data into fixed-size windows, advancing by a fixed step (`step_size`) between consecutive windows, so the windows overlap whenever the step is smaller than the window size. Windows are built per experiment and never span two experiments. These windows are used to train the model, allowing it to detect temporal patterns in the data. The function returns an array with the generated windows and a list with the experiment ID of each window.

**[EU]** `create_sliding_windows_by_experiment` Funtzio honek leiho irristakorrak sortzen ditu entrenamendu eta balidazio multzoetarako. Esperimentu bakoitzaren datuak tamaina finkoko leihoetan banatzen ditu, eta urrats jakin batekin mugitzen da. Leiho horiek modeloa entrenatzeko erabiliko dira, datuetan aldi baterako patroiak atzemateko aukera emanez. Funtzioak array bat itzultzen du sortutako leihoekin eta beste bat leihoak dagozkien esperimentuen IDekin.
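
A minimal sketch of the windowing for a single experiment, assuming NumPy input of shape (time steps, features); the helper name `sliding_windows` and the toy sizes are illustrative only:

```python
import numpy as np


def sliding_windows(series, window_size, step_size):
    """Return overlapping windows of shape (n_windows, window_size, n_features)."""
    starts = range(0, len(series) - window_size + 1, step_size)
    return np.array([series[s:s + window_size] for s in starts])


# Toy experiment: 20 time steps, 3 features.
series = np.arange(60, dtype=float).reshape(20, 3)
windows = sliding_windows(series, window_size=8, step_size=4)

# Number of windows follows (n - window_size) // step_size + 1.
expected = (len(series) - 8) // 4 + 1
print(windows.shape, expected)
```

A smaller step produces more, more heavily overlapping training windows from the same experiment, at the cost of stronger correlation between neighbouring samples.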

In [11]:
def main():
    # Hyperparameters
    window_size = 128
    step_size = 10
    batch_size = 32

    train_experiments, val_experiments, test_experiments, experiment_condition = split_experiments()

    # Prepare the datasets
    train_data, train_exp_indices = prepare_data(train_experiments)
    val_data, val_exp_indices = prepare_data(val_experiments)
    test_data, test_exp_indices = prepare_data(test_experiments)

    # Normalize the data (the scaler is fitted on the training set only)
    scaler = StandardScaler()
    train_data_normalized = scaler.fit_transform(train_data)
    val_data_normalized = scaler.transform(val_data)
    test_data_normalized = scaler.transform(test_data)

    # Create the windows
    X_train, train_window_exp_ids = create_sliding_windows_by_experiment(train_data_normalized, train_exp_indices, window_size, step_size)
    X_val, val_window_exp_ids = create_sliding_windows_by_experiment(val_data_normalized, val_exp_indices, window_size, step_size)
    X_test, test_window_exp_ids = create_test_windows_by_experiment(test_data_normalized, test_exp_indices, window_size)

    y_train = np.array([experiment_condition[exp_id] for exp_id in train_window_exp_ids])
    y_val = np.array([experiment_condition[exp_id] for exp_id in val_window_exp_ids])
    y_test = np.array([experiment_condition[exp_id] for exp_id in test_window_exp_ids])

    # Create the data loaders
    train_dataset = TensorDataset(torch.FloatTensor(X_train), torch.FloatTensor(y_train))
    val_dataset = TensorDataset(torch.FloatTensor(X_val), torch.FloatTensor(y_val))
    test_dataset = TensorDataset(torch.FloatTensor(X_test), torch.FloatTensor(y_test))

    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=batch_size)
    test_loader = DataLoader(test_dataset, batch_size=batch_size)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = HybridCNNRNN().to(device)

    # Train the model
    print("Training model...")
    best_model_state = train_loop(model, train_loader, val_loader, device)
    model.load_state_dict(best_model_state)

    print("\nTesting model...")
    results = evaluate_model(model, test_loader, test_window_exp_ids, device)

    print("Window-level Results:")
    print(f"Accuracy: {results['window_metrics']['accuracy']:.4f}")
    print(f"Precision: {results['window_metrics']['precision']:.4f}")
    print(f"Recall: {results['window_metrics']['recall']:.4f}")
    print(f"F1-score: {results['window_metrics']['f1']:.4f}")

    print("\nExperiment-level Results (after majority voting):")
    print(f"Accuracy: {results['experiment_metrics']['accuracy']:.4f}")
    print(f"Precision: {results['experiment_metrics']['precision']:.4f}")
    print(f"Recall: {results['experiment_metrics']['recall']:.4f}")
    print(f"F1-score: {results['experiment_metrics']['f1']:.4f}")

    print("\nPredictions for each experiment:", results['experiment_predictions'])
    print("True labels for experiments:", results['experiment_true_labels'])



if __name__ == "__main__":
    main()

Training model...

Epoch 1:
loss: 0.683587  [    0/ 1649]
loss: 0.675646  [  640/ 1649]
loss: 0.554105  [ 1280/ 1649]
Train Loss: 0.6585, Train F1: 0.7167
Val F1: 0.6414

Epoch 2:
loss: 0.581713  [    0/ 1649]
loss: 0.595820  [  640/ 1649]
loss: 0.745361  [ 1280/ 1649]
Train Loss: 0.5417, Train F1: 0.7389
Val F1: 0.6111

Epoch 3:
loss: 0.408757  [    0/ 1649]
loss: 0.374869  [  640/ 1649]
loss: 0.627431  [ 1280/ 1649]
Train Loss: 0.3997, Train F1: 0.8314
Val F1: 0.6232

Epoch 4:
loss: 0.380280  [    0/ 1649]
loss: 0.432512  [  640/ 1649]
loss: 0.338392  [ 1280/ 1649]
Train Loss: 0.2678, Train F1: 0.9084
Val F1: 0.6193

Epoch 5:
loss: 0.249256  [    0/ 1649]
loss: 0.282770  [  640/ 1649]
loss: 0.780990  [ 1280/ 1649]
Train Loss: 0.2553, Train F1: 0.9173
Val F1: 0.6162

Epoch 6:
loss: 0.340773  [    0/ 1649]
loss: 0.162852  [  640/ 1649]
loss: 0.288304  [ 1280/ 1649]
Train Loss: 0.1694, Train F1: 0.9439
Val F1: 0.6111

Epoch 7:
loss: 0.046628  [    0/ 1649]
loss: 0.034947  [  640/ 1649]


# LEAVE-ONE-OUT CROSS VALIDATION

**[EN]** The `create_test_windows_by_experiment` function is similar to `create_sliding_windows_by_experiment`, but it is designed for the test set. The main difference is that it does not take a step size for sliding the window; instead, it advances by a full window each time, producing consecutive windows with no overlap. Each test row therefore appears in at most one window, which avoids counting the same samples several times in the evaluation. Trailing rows that do not fill a complete window are left out. Like the sliding-window function, it returns an array with the windows and a list with their corresponding experiment IDs.

**[EU]** `create_test_windows_by_experiment` Aurrekoaren antzekoa da, funtzio honek datu leihoak sortzen ditu, baina test multzorako diseinatuta dago. Alde nagusia da ez duela "urrats" bat erabiltzen leihoa irristatzeko, baizik eta elkarren segidako leihoak sortzen dituela gainjartzerik gabe. Horrek proban esperimentuaren informazio guztia erabiliko dela ziurtatzen du, ebaluazioan alborapenak saihestuz. Aurrekoak bezala, array bat itzultzen du leihoekin eta beste bat dagozkion esperimentuen IDekin.
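
The non-overlapping case can be sketched for one experiment as below, again assuming NumPy input of shape (time steps, features). The helper name `non_overlapping_windows` is hypothetical, and it uses a reshape rather than the notebook's `while` loop, which yields the same windows.

```python
import numpy as np


def non_overlapping_windows(series, window_size):
    """Cut a (time, features) array into consecutive windows without overlap."""
    n_windows = len(series) // window_size
    trimmed = series[:n_windows * window_size]     # drop rows that do not fill a window
    return trimmed.reshape(n_windows, window_size, series.shape[1])


# Toy experiment: 20 time steps, 3 features.
series = np.arange(60, dtype=float).reshape(20, 3)
windows = non_overlapping_windows(series, window_size=8)

leftover = len(series) - windows.shape[0] * 8
print(windows.shape, "unused rows:", leftover)
```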

In [None]:
class HybridCNNRNNNoWindow(nn.Module):
    def __init__(self):
        super().__init__()
        # CNN layers
        self.conv1 = nn.Conv1d(in_channels=1, out_channels=32, kernel_size=5, padding=2)
        self.bn1 = nn.BatchNorm1d(32)
        self.conv2 = nn.Conv1d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm1d(64)
        self.pool = nn.MaxPool1d(kernel_size=2, stride=2)
        self.dropout1 = nn.Dropout(0.3)

        # LSTM layers
        self.rnn = nn.LSTM(input_size=64, hidden_size=128,
                          num_layers=2, batch_first=True,
                          bidirectional=True, dropout=0.3)

        # Fully connected layers
        self.fc1 = nn.Linear(256, 64)  # 256 because the LSTM is bidirectional (128*2)
        self.dropout2 = nn.Dropout(0.3)
        self.fc2 = nn.Linear(64, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = x.unsqueeze(1)

        # CNN
        x = self.conv1(x)
        x = self.bn1(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = nn.functional.relu(x)
        x = self.pool(x)
        x = self.dropout1(x)

        # LSTM
        x = x.permute(0, 2, 1)
        x, _ = self.rnn(x)
        x = x[:, -1, :]

        # Fully connected
        x = self.fc1(x)
        x = nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        x = self.sigmoid(x)
        return x

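**[EN]** The `loocv_evaluation` function implements Leave-One-Out Cross-Validation (LOOCV) to evaluate the `HybridCNNRNNNoWindow` model. The technique consists of training and evaluating the model repeatedly, using each individual experiment once as the test set and the remaining experiments as the training set.

In each iteration, the function holds out one experiment for testing, prepares the training and test data, normalizes them, and builds datasets without windows. It then initializes and trains the model on the training data and evaluates it on the held-out experiment.

**[EU]** `loocv_evaluation` funtzioak "Leave-One-Out" (LOOCV) balidazio gurutzatuaren teknika inplementatzen du `HybridCNNRNNNoWindow` eredua ebaluatzeko. Teknika hau eredua behin eta berriz entrenatzean eta ebaluatzean datza, esperimentu indibidual bakoitza test multzo gisa behin erabiliz eta gainerakoak entrenamendu multzo gisa erabiliz.

Iterazio bakoitzean, funtzioak probarako esperimentu bat bereizten du, entrenamendu eta test datuak prestatzen ditu, datuak normalizatzen ditu eta leihorik gabeko datu multzoak sortzen ditu. Gero, eredua hasieratu eta entrenatzen du entrenamendu-datuak erabiliz, eta probako esperimentuan ebaluatzen du.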
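
The fold structure on its own can be sketched as follows. The generator name `leave_one_out_folds` and the experiment IDs are hypothetical; the notebook builds the same folds by index inside `loocv_evaluation`.

```python
def leave_one_out_folds(experiment_ids):
    """Yield (train_ids, test_id) pairs, holding out each experiment exactly once."""
    for held_out in experiment_ids:
        train_ids = [e for e in experiment_ids if e != held_out]
        yield train_ids, held_out


experiment_ids = [1, 2, 3, 4]          # hypothetical experiment IDs
folds = list(leave_one_out_folds(experiment_ids))

for train_ids, test_id in folds:
    print(f"train on {train_ids}, test on {test_id}")
```

Because each held-out fold contains a single experiment, all of its labels belong to one class. For an unworn experiment there are no positive samples, so the per-fold F1 is 0 regardless of the predictions, and accuracy is likely the easier figure to interpret per fold.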

In [None]:
def loocv_evaluation():
    # Load all experiments
    worn_experiments, unworn_experiments, experiment_condition = load_data()

    # Combine all experiments
    all_experiments = worn_experiments + unworn_experiments

    f1_scores = []
    accuracies = []

    # LOOCV
    for test_exp_idx in range(len(all_experiments)):
        print(f"\nTesting with experiment {all_experiments[test_exp_idx][0]}")

        # Split into train and test
        test_exp = [all_experiments[test_exp_idx]]
        train_exp = [exp for i, exp in enumerate(all_experiments) if i != test_exp_idx]

        train_data, train_exp_indices = prepare_data(train_exp)
        test_data, test_exp_indices = prepare_data(test_exp)

        # Normalize the data (the scaler is re-fitted on each fold's training set)
        scaler = StandardScaler()
        train_data_normalized = scaler.fit_transform(train_data)
        test_data_normalized = scaler.transform(test_data)

        X_train = torch.FloatTensor(train_data_normalized)
        y_train = torch.FloatTensor([experiment_condition[exp_id] for exp_id in train_exp_indices])

        X_test = torch.FloatTensor(test_data_normalized)
        y_test = torch.FloatTensor([experiment_condition[exp_id] for exp_id in test_exp_indices])

        train_dataset = TensorDataset(X_train, y_train)
        test_dataset = TensorDataset(X_test, y_test)

        train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
        test_loader = DataLoader(test_dataset, batch_size=32)

        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = HybridCNNRNNNoWindow().to(device)

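        # Train the model. The held-out experiment is also passed as the validation
        # loader, so the best epoch is chosen using the test fold itself.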
        try:
            best_model_state = train_loop(model, train_loader, test_loader, device, num_epochs=10, loocv=True)
            model.load_state_dict(best_model_state)
        except Exception as e:
            print(f"Error during training: {e}")
            continue

        # Evaluation
        model.eval()
        all_preds = []
        all_labels = []

        with torch.no_grad():
            for X_batch, y_batch in test_loader:
                X_batch = X_batch.to(device)
                pred = model(X_batch)
                # squeeze(1) keeps the batch dimension even for a batch of one row.
                predictions = (pred.squeeze(1) > 0.5).int().cpu().numpy()
                all_preds.extend(predictions)
                all_labels.extend(y_batch.numpy())

        fold_f1 = f1_score(all_labels, all_preds)
        fold_accuracy = accuracy_score(all_labels, all_preds)

        print(f"Experiment {all_experiments[test_exp_idx][0]} - F1: {fold_f1:.4f}, Accuracy: {fold_accuracy:.4f}")

        f1_scores.append(fold_f1)
        accuracies.append(fold_accuracy)

    avg_f1 = sum(f1_scores) / len(f1_scores)
    avg_accuracy = sum(accuracies) / len(accuracies)

    print("\nOverall LOOCV Results:")
    print(f"Average F1-Score: {avg_f1:.4f}")
    print(f"Average Accuracy: {avg_accuracy:.4f}")

    print("\nIndividual Experiment Scores:")
    for i, (f1, acc) in enumerate(zip(f1_scores, accuracies)):
        print(f"Experiment {all_experiments[i][0]}: F1 = {f1:.4f}, Accuracy = {acc:.4f}")

    return f1_scores, accuracies

if __name__ == "__main__":
    print("Starting Leave-One-Out Cross-Validation with Hybrid CNN-RNN...")
    f1_scores, accuracies = loocv_evaluation()

Starting Leave-One-Out Cross-Validation with Hybrid CNN-RNN...

Testing with experiment 6

Epoch 1:
Train Loss: 0.6362, Train F1: 0.5927
Val F1: 0.7490

Epoch 2:
Train Loss: 0.5923, Train F1: 0.6355
Val F1: 0.5672

Epoch 3:
Train Loss: 0.5580, Train F1: 0.6933
Val F1: 0.5314

Epoch 4:
Train Loss: 0.5295, Train F1: 0.6972
Val F1: 0.6019

Epoch 5:
Train Loss: 0.5055, Train F1: 0.7171
Val F1: 0.4357

Epoch 6:
Train Loss: 0.4853, Train F1: 0.7290
Val F1: 0.5875

Epoch 7:
Train Loss: 0.4696, Train F1: 0.7394
Val F1: 0.5078

Epoch 8:
Train Loss: 0.4567, Train F1: 0.7492
Val F1: 0.3026

Epoch 9:
Train Loss: 0.4473, Train F1: 0.7552
Val F1: 0.3114

Epoch 10:
Train Loss: 0.4310, Train F1: 0.7675
Val F1: 0.3147
Experiment 6 - F1: 0.3147, Accuracy: 0.1867

Testing with experiment 7

Epoch 1:
Train Loss: 0.6377, Train F1: 0.6109
Val F1: 0.8597

Epoch 2:
Train Loss: 0.5969, Train F1: 0.5729
Val F1: 0.9289

Epoch 3:
Train Loss: 0.5732, Train F1: 0.6052
Val F1: 0.8677

Epoch 4:
Train Loss: 0.5529, Tr