### FrugalAI
#### Page
<ul>
<li>https://frugalaichallenge.org/tasks/</li>
<li>https://huggingface.co/collections/frugal-ai-challenge/frugal-ai-challenge-tasks-673dd5ee724c6659a5b42443</li>
<li>https://www.notion.so/Tips-journal-de-bord-1826269aa8b38066ae20fd7418db8dfc</li>
</ul>

### Functions that could be useful

In [22]:
# How to load audio from path
# Import the 'wavfile' module from scipy.io to read WAV files
from scipy.io import wavfile

# Define a function to load audio from a given file path
def load_audio(path):
    # 'wavfile.read' reads a WAV file and returns two values:
    # - samplerate: the sample rate of the audio (number of samples per second)
    # - data: the actual audio data (a numpy array with the audio samples)
    samplerate, data = wavfile.read(path)
    
    # Return both the sample rate and the audio data
    return samplerate, data

### Questions
<ul>
<li>Do the records need to be shuffled when training on only 1000 of them rather than on the full set?</li>
<li>Does the spectrogram class need another look, in particular its `__len__` method?</li>
<li>Could this be done differently, using the path and a map function instead of a spectrogram class?</li>
</ul>
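
On the shuffling question: a streamed dataset cannot be shuffled globally, so a common workaround is a shuffle buffer, which keeps a fixed number of samples in memory and emits one at random each time a new sample arrives. The sketch below is a plain-Python illustration of that idea, not the notebook's code; `buffer_shuffle`, `buffer_size` and the toy `range` stream are names made up for the example. The `datasets` library offers the same mechanism through `IterableDataset.shuffle(seed=..., buffer_size=...)`, and a subset can be taken with `.take(n)`. Whether it helps on a 1000-record subset probably depends on how the records are ordered in the shards.

```python
import random
from itertools import islice

def buffer_shuffle(stream, buffer_size, seed=0):
    """Approximately shuffle an iterable using a fixed-size buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new one in its place
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Stream exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

# Toy stream standing in for the streamed records (assumption for the example)
stream = range(10_000)
subset = list(islice(buffer_shuffle(stream, buffer_size=2_000), 1_000))
print(len(subset), subset[:5])
```

A larger buffer gives a better shuffle at the cost of more memory, since every buffered record holds a full audio array.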

## Advice

<ul>
<li>Avoid repeated conversions: where possible, store the audio data in a format already optimized for PyTorch (tensors) or in a lightweight compressed format.</li>
<li>Preload the data: if the dataset is static, use a cache (such as lru_cache or another technique) to avoid reading the same data several times.</li>
<li>Minimal target size: reduce target_size to the lowest resolution the model needs. A smaller resolution means less computation.</li>
<li>Replace ReLU with LeakyReLU, which handles the dead-neuron problem better.</li>
</ul>
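
The caching advice can be sketched with `functools.lru_cache`. This is a minimal illustration under assumptions: `load_features` and its placeholder computation are hypothetical stand-ins for whatever reads and preprocesses one clip, and the cache key has to be hashable (a path or an index, not an array).

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive body really runs

@lru_cache(maxsize=1024)
def load_features(clip_id: str) -> tuple:
    """Hypothetical loader: stands in for reading and preprocessing one clip."""
    calls["count"] += 1
    # Placeholder computation; a real loader would read audio and build a spectrogram
    return tuple(ord(c) % 7 for c in clip_id)

# Two "epochs" over the same three clips
for _ in range(2):
    for clip_id in ("clip_a", "clip_b", "clip_c"):
        features = load_features(clip_id)

info = load_features.cache_info()
print(calls["count"], info.hits, info.misses)
```

The second pass is served entirely from the cache, so the loader body runs only once per clip. The gain only appears when the same items are read more than once (for example across epochs), and the cached results stay in RAM.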

# Detection of illegal deforestation

## Acquisition

In [23]:
# installs
!pip install librosa soundfile datasets

# sign in to Hugging Face to access the datasets
# (the token is read from the HF_TOKEN environment variable instead of being hard-coded)
import os
from huggingface_hub import login
login(token=os.environ["HF_TOKEN"])

# train dataset
from datasets import load_dataset
dataset = load_dataset("rfcx/frugalai", streaming=True)
print(next(iter(dataset['train'])))

{'audio': {'path': 'pooks_6ebcaf77-aa92-4f10-984e-ecc5a919bcbb_41-44.wav', 'array': array([-0.00915527,  0.01025391, -0.01452637, ..., -0.00628662,
        0.00064087,  0.00137329]), 'sampling_rate': 12000}, 'label': 1}


In [3]:
# dataset size of audio
print('length of audio : ' + str(len(next(iter(dataset['train']))['audio']['array'])))

length of audio : 36000


In [24]:
# imports
import tensorflow
import torchaudio
import pandas
import numpy
import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch
import torch.optim as optim
"""
Imports explanation:
- `torch`: The core PyTorch library for creating tensors, defining models, and performing computations.
- `torch.nn`: A module containing neural network layers, such as convolutional, linear, and dropout layers.
- `torch.nn.functional`: Provides functions for operations like activation functions, pooling, and loss functions.
- `torch.optim`: Contains optimization algorithms like SGD and Adam for training neural networks.
- `torch.utils.data.DataLoader`: A utility to load data from a dataset and manage batching, shuffling, and parallel loading.
"""


In [5]:
# example of record
next(iter(dataset['train']))['audio']['array']

array([-0.00915527,  0.01025391, -0.01452637, ..., -0.00628662,
        0.00064087,  0.00137329])

In [6]:
# dataset format
dataset

IterableDatasetDict({
    train: IterableDataset({
        features: ['audio', 'label'],
        num_shards: 6
    })
    test: IterableDataset({
        features: ['audio', 'label'],
        num_shards: 3
    })
})

## Spectrogram class

#### Normal script

In [14]:
# script for transforming audio_iterable to spectrogram
'''import librosa
import librosa.display
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

def audio_to_spectrogram(audio_iterable, save_dir=None, n_fft=2048, hop_length=512, n_mels=128):
    """
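    Converts one dataset sample to a log-scaled Mel spectrogram and optionally saves it as an image.

    Args:
        audio_iterable (dict): One dataset sample, with the waveform in ['audio']['array'] and the sampling rate in ['audio']['sampling_rate'].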
        save_dir (str): Directory to save the spectrogram image (optional).
        n_fft (int): Number of FFT components.
        hop_length (int): Hop length for the STFT.
        n_mels (int): Number of Mel bands.
    
    Returns:
        np.ndarray: The generated Mel spectrogram (log-scaled).
    """
    # Read the waveform and sampling rate from the sample
    y, sr = audio_iterable['audio']['array'], audio_iterable['audio']['sampling_rate']
    
    # Generate the Mel spectrogram
    mel_spectrogram = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    
    # Convert to log scale (dB)
    log_mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)

    # Plot and save the spectrogram as an image if save_dir is specified
    if save_dir:
        Path(save_dir).mkdir(parents=True, exist_ok=True)
        save_path = Path(save_dir) / f"{Path('example').stem}_spectrogram.png" # modify the example part 
        
        plt.figure(figsize=(10, 4))
        librosa.display.specshow(log_mel_spectrogram, sr=sr, hop_length=hop_length,
                                 x_axis='time', y_axis='mel', cmap='viridis')
        plt.colorbar(format='%+2.0f dB')
        plt.title('Mel Spectrogram')
        plt.tight_layout()
        plt.savefig(save_path)
        plt.close()
        print(f"Spectrogram saved to {save_path}")
    
    return log_mel_spectrogram

# Example usage
audio_iterable = next(iter(dataset['train'])) # First sample of the streamed train split
output_dir = "spectrograms"  # Replace with your desired output directory
spectrogram = audio_to_spectrogram(audio_iterable, save_dir=output_dir)'''

Spectrogram saved to spectrograms/example_spectrogram.png


In [18]:
# Spectrogram with __iter__
class SpectrogramIterableDataset(torch.utils.data.IterableDataset):
    def __init__(self, iterable_dataset, n_fft=2048, hop_length=512, n_mels=128, target_size=(128, 128)):
        """
        Wraps an IterableDataset to preprocess audio into spectrograms.
        
        Args:
            iterable_dataset (IterableDataset): The input dataset.
            n_fft (int): Number of FFT components.
            hop_length (int): Hop length for the STFT.
            n_mels (int): Number of Mel bands.
            target_size (tuple): Desired size for spectrograms (height, width).
        """
        self.dataset = iterable_dataset
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.n_mels = n_mels
        self.target_size = target_size

    def process_audio(self, audio_array, sampling_rate):
        # Generate Mel spectrogram
        mel_spectrogram = librosa.feature.melspectrogram(
            y=audio_array, sr=sampling_rate, n_fft=self.n_fft, 
            hop_length=self.hop_length, n_mels=self.n_mels
        )
        # Convert to log scale (dB)
        log_mel_spectrogram = librosa.power_to_db(mel_spectrogram, ref=np.max)
        
        # Normalize to [0, 1]
        log_mel_spectrogram = (log_mel_spectrogram - np.min(log_mel_spectrogram)) / (
            np.max(log_mel_spectrogram) - np.min(log_mel_spectrogram)
        )
        
        # Pad with zeros or truncate to the target size (fix_length does not rescale)
        log_mel_spectrogram = librosa.util.fix_length(log_mel_spectrogram, size=self.target_size[1], axis=1)
        log_mel_spectrogram = librosa.util.fix_length(log_mel_spectrogram, size=self.target_size[0], axis=0)
        
        return torch.tensor(log_mel_spectrogram, dtype=torch.float32).unsqueeze(0)  # Add channel dimension

    def __iter__(self):
        for sample in iter(self.dataset):  # Iterate over the base IterableDataset
            audio_array = sample['audio']['array']
            sampling_rate = sample['audio']['sampling_rate']
            label = sample['label']
            
            # Process audio to spectrogram
            spectrogram = self.process_audio(audio_array, sampling_rate)
            
            yield spectrogram, label
    def __len__(self):
        # Count items manually; this is a full pass over the stream, so it is slow on a streaming dataset
        return sum(1 for _ in iter(self.dataset))
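
A quick check of the sizes involved may help when choosing `target_size`. The sketch below assumes librosa's default centered framing, where the frame count is `1 + n_samples // hop_length`, and takes its numbers from the outputs above (36000 samples at 12000 Hz) and from the class defaults. `n_frames` is a helper name made up for the example, and other clips may not all have this length.

```python
def n_frames(n_samples: int, hop_length: int) -> int:
    """Frame count of a centered STFT (assumption: librosa's default center=True)."""
    return 1 + n_samples // hop_length

n_samples, sampling_rate = 36000, 12000   # from the dataset outputs above
hop_length, target_width = 512, 128       # defaults of SpectrogramIterableDataset

duration_s = n_samples / sampling_rate          # clip duration in seconds
frames = n_frames(n_samples, hop_length)        # time frames before padding
padded_columns = target_width - frames          # columns added by fix_length

print(duration_s, frames, padded_columns)
```

Under these assumptions a 3 second clip yields 71 time frames, so 57 of the 128 columns are zero padding. That is consistent with the trailing zeros in the batch printed in the Training section, and it suggests that a narrower `target_size` would lose no information.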


#### Optimized script

In [25]:
# SpectrogramIterableDataset with torchaudio
import torch
import torchaudio.transforms as T

class SpectrogramIterableDataset(torch.utils.data.IterableDataset):
    def __init__(self, iterable_dataset, n_fft=2048, hop_length=512, n_mels=128, target_size=(128, 128)):
        """
        Wraps an IterableDataset to preprocess audio into spectrograms.
        
        Args:
            iterable_dataset (IterableDataset): The input dataset.
            n_fft (int): Number of FFT components.
            hop_length (int): Hop length for the STFT.
            n_mels (int): Number of Mel bands.
            target_size (tuple): Desired size for spectrograms (height, width).
        """
        self.dataset = iterable_dataset
        self.n_fft = n_fft
        self.hop_length = hop_length
        self.n_mels = n_mels
        self.target_size = target_size

        # Pre-compute length if possible
        try:
            self._length = len(iterable_dataset)
        except TypeError:
            self._length = None

    def process_audio(self, audio_array, sampling_rate):
        """
        Convert audio data into a log Mel spectrogram tensor with normalization.
        """
        waveform = torch.tensor(audio_array, dtype=torch.float32).unsqueeze(0)  # float32 tensor with a channel dimension
        mel_transform = T.MelSpectrogram(
            sample_rate=sampling_rate, n_fft=self.n_fft, 
            hop_length=self.hop_length, n_mels=self.n_mels
        )
        mel_spectrogram = mel_transform(waveform)
        log_mel_spectrogram = T.AmplitudeToDB()(mel_spectrogram)
        
        # Normalize to [0, 1]
        log_mel_spectrogram = (log_mel_spectrogram - log_mel_spectrogram.min()) / (
            log_mel_spectrogram.max() - log_mel_spectrogram.min()
        )
        
        # Resize to the target size (a costly operation that could be removed)
        log_mel_spectrogram = torch.nn.functional.interpolate(
            log_mel_spectrogram.unsqueeze(0), size=self.target_size
        ).squeeze(0)
    
        return log_mel_spectrogram


    def __iter__(self):
        """
        Iterate over the base dataset and yield processed spectrograms with labels.
        """
        for sample in iter(self.dataset):
            audio_array = sample['audio']['array']
            sampling_rate = sample['audio']['sampling_rate']
            label = sample['label']
            
            spectrogram = self.process_audio(audio_array, sampling_rate)
            yield spectrogram, label

    def __len__(self):
        """
        Return the cached length if it is known; otherwise count the items once and cache the result.
        """
        if self._length is None:
            # Full pass over the stream: slow, so it is done at most once
            self._length = sum(1 for _ in iter(self.dataset))
        return self._length


## Load data

#### Normal script

In [26]:
# Data Loading
from torch.utils.data import DataLoader
batch_size = 32  # Adjust based on your system's memory

from codecarbon import track_emissions

@track_emissions(offline=True, country_iso_code="FRA")
def WrapTrainDataset(train_dataset):
    # Wrap the train IterableDataset
    wrapped_train_dataset = SpectrogramIterableDataset(train_dataset)

    # Create DataLoader
    train_loader = DataLoader(
        wrapped_train_dataset,
        batch_size=batch_size, 
        shuffle=False, # Shuffling is not allowed for IterableDataset
        num_workers=0 # This could be 8 as well, performance depends on available RAM. Ensure your system has enough RAM to handle multiple workers without swapping to disk.
    )
    return train_loader

'''# Iterate through batches
for batch_idx, (spectrograms, labels) in enumerate(train_loader):
    print(f"Batch {batch_idx}")
    print("Spectrograms shape:", spectrograms.shape)  # (batch_size, 1, height, width)
    print("Labels shape:", labels.shape)
    break'''


train_loader = WrapTrainDataset(dataset['train'])

[codecarbon INFO @ 17:09:00] offline tracker init
[codecarbon INFO @ 17:09:00] [setup] RAM Tracking...
[codecarbon INFO @ 17:09:00] [setup] CPU Tracking...
 Mac OS detected: Please install Intel Power Gadget or enable PowerMetrics sudo to measure CPU

[codecarbon INFO @ 17:09:07] CPU Model on constant consumption mode: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
[codecarbon INFO @ 17:09:07] [setup] GPU Tracking...
[codecarbon INFO @ 17:09:07] No GPU found.
[codecarbon INFO @ 17:09:07] >>> Tracker's metadata:
[codecarbon INFO @ 17:09:07]   Platform system: macOS-10.16-x86_64-i386-64bit
[codecarbon INFO @ 17:09:07]   Python version: 3.11.7
[codecarbon INFO @ 17:09:07]   CodeCarbon version: 2.8.3
[codecarbon INFO @ 17:09:07]   Available RAM : 8.000 GB
[codecarbon INFO @ 17:09:07]   CPU count: 8
[codecarbon INFO @ 17:09:07]   CPU model: Intel(R) Core(TM) i5-8257U CPU @ 1.40GHz
[codecarbon INFO @ 17:09:07]   GPU count: None
[codecarbon INFO @ 17:09:07]   GPU model: None
[codecarbon INFO @ 17:0

#### Optimized script

In [None]:
# Install dependencies
!pip install librosa soundfile datasets

# Hugging Face login
# (the token is read from the HF_TOKEN environment variable instead of being hard-coded)
import os
from huggingface_hub import login
login(token=os.environ["HF_TOKEN"])

# Load dataset with streaming
from datasets import load_dataset
from torch.utils.data import DataLoader, IterableDataset
import librosa
import torch

class StreamingAudioDataset(torch.utils.data.IterableDataset):
    """
    A PyTorch IterableDataset for streaming and preprocessing audio data.
    """

    def __init__(self, dataset_split):
        """
        Initializes the dataset with a specific split (e.g., 'train').
        
        Args:
            dataset_split (iterable): Streaming dataset split (e.g., dataset['train']).
        """
        self.dataset_split = dataset_split

    def __iter__(self):
        """
        Iterator that streams and preprocesses audio data on-the-fly.
        
        Yields:
            Tuple[torch.Tensor, torch.Tensor]: Preprocessed spectrogram and label tensor.
        """
        for sample in self.dataset_split:
            # In streaming mode 'path' is only a file name, so use the decoded samples directly
            waveform = sample['audio']['array']
            sr = sample['audio']['sampling_rate']
            label = sample['label']

            # Convert the waveform to a Mel spectrogram
            spectrogram = librosa.feature.melspectrogram(y=waveform, sr=sr)
            spectrogram = torch.tensor(spectrogram, dtype=torch.float32)

            # Return spectrogram and label
            yield spectrogram, torch.tensor(label, dtype=torch.long)
    def __len__(self):
        """
        Return the length of the dataset if it can be pre-calculated; otherwise, calculate it dynamically.
        """
        # Dynamically compute if not available
        self._length = sum(1 for _ in iter(self.dataset_split))
        return self._length

# Streaming the dataset
dataset = load_dataset("rfcx/frugalai", streaming=True)

# Wrap the training data in the streaming IterableDataset
train_dataset = StreamingAudioDataset(dataset['train'])

# Create DataLoader for batch processing
batch_size = 32
train_loader = DataLoader(
    train_dataset,
    batch_size=batch_size,
    num_workers=0,  # Adjust based on system resources
    # prefetch_factor and persistent_workers are only accepted when num_workers > 0,
    # so they are left at their defaults here
)



In [None]:
'''# Alternative without creating the spectrogram class
from torch.utils.data import DataLoader

# Create DataLoader for the train set
batch_size = 32  # Adjust as needed
train_loader = DataLoader(
    dataset['train'], 
    batch_size=batch_size, 
    shuffle=False,  # Shuffling is not allowed for IterableDataset
    num_workers=4
)

# Iterate through batches
for batch_idx, batch in enumerate(train_loader):
    audio_arrays = batch['audio']['array']  # Access audio data
    labels = batch['label']  # Access labels
    print(f"Batch {batch_idx}")
    print("Audio arrays shape:", audio_arrays.shape)
    print("Labels shape:", labels.shape)
    break
'''

## CNN

#### Normal Script

In [2]:
import torch.nn as nn
import torch.nn.functional as F

class CNNModel(nn.Module):
    """
    CNNModel is a Convolutional Neural Network (CNN) for binary classification.

    The model consists of two convolutional layers followed by a set of fully connected layers.
    It uses ReLU activation after each convolutional layer and after the first fully connected layer.
    The final output layer produces a single raw score (logit) for the positive class;
    the sigmoid is applied by the loss (nn.BCEWithLogitsLoss) or at inference time.

    Attributes:
        conv1 (nn.Conv2d): First convolutional layer.
        conv2 (nn.Conv2d): Second convolutional layer.
        pool (nn.MaxPool2d): Max pooling layer for downsampling.
        fc1 (nn.Linear): First fully connected layer.
        fc2 (nn.Linear): Second fully connected layer (output layer).
    """
    
    def __init__(self):
        """
        Initializes the CNNModel by defining the layers (convolutional and fully connected).
        The model follows a standard architecture with convolutional layers for feature extraction
        and fully connected layers for classification.

        Args:
            None: The model architecture is predefined.
        """
        super(CNNModel, self).__init__()

        # Define the convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        
        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

        # Fully connected layers
        self.fc1 = nn.Linear(64 * 32 * 32, 128)  # Adjust based on input size
        self.fc2 = nn.Linear(128, 1)  # Output layer for binary classification

    def forward(self, x):
        """
        Defines the forward pass of the CNN model.

        Args:
            x (torch.Tensor): The input tensor with shape (batch_size, channels, height, width).
        
        Returns:
            torch.Tensor: The output tensor of shape (batch_size, 1), holding the logit of the positive class.
        """
        # Apply conv1, ReLU activation, and max pooling
        x = self.pool(F.relu(self.conv1(x)))
        # Apply conv2, ReLU activation, and max pooling
        x = self.pool(F.relu(self.conv2(x)))

        # Flatten the output for the fully connected layers
        x = x.view(x.size(0), -1)

        # Apply the first fully connected layer with ReLU activation
        x = F.relu(self.fc1(x))

        # Return the raw logit: BCEWithLogitsLoss applies the sigmoid itself,
        # so call torch.sigmoid on the output only when probabilities are needed
        return self.fc2(x)

# Example of using BCEWithLogitsLoss (handles the sigmoid internally)
import torch.optim as optim

# Initialize the model, loss function, and optimizer
model = CNNModel()  # Create an instance of the CNNModel
criterion = nn.BCEWithLogitsLoss()  # Binary Cross-Entropy loss for binary classification
optimizer = optim.Adam(model.parameters(), lr=0.001)  # Adam optimizer with a learning rate of 0.001
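
`BCEWithLogitsLoss` expects raw scores (logits) because it applies the sigmoid itself. The plain-Python sketch below uses hand-written formulas rather than PyTorch calls, and the helper names are made up for the example. It shows that the loss computed from a logit equals binary cross-entropy on its sigmoid, and that feeding an already-sigmoided value in as if it were a logit gives a different number.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def bce(p: float, y: float) -> float:
    """Binary cross-entropy on a probability p for a target y in {0, 1}."""
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def bce_with_logits(z: float, y: float) -> float:
    """Same loss computed from the logit z: softplus(z) - y * z."""
    return math.log1p(math.exp(z)) - y * z

logit, target = 2.0, 1.0
loss_from_logit = bce_with_logits(logit, target)               # intended use
loss_from_prob = bce(sigmoid(logit), target)                   # equivalent formulation
loss_double_sigmoid = bce_with_logits(sigmoid(logit), target)  # sigmoid applied twice

print(loss_from_logit, loss_from_prob, loss_double_sigmoid)
```

When the sigmoid is applied twice, the value reaching the loss is confined to the interval (0, 1), so for a positive target the loss can never drop below `log(1 + exp(-1))`, roughly 0.313, however confident the model is.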

#### Optimized script

In [27]:
# Optimized and improved CNNModel
import torch.nn as nn
import torch.nn.functional as F

class CNNModel(nn.Module):
    """
    CNNModel is a Convolutional Neural Network (CNN) for binary classification.

    The model consists of two convolutional layers with Batch Normalization, 
    followed by a set of fully connected layers with dropout for regularization.
    It uses LeakyReLU activation after the convolutional layers for better gradient flow
    and ReLU after the first fully connected layer. The final output layer produces a
    single raw score (logit) for the positive class; the sigmoid is applied by the loss
    (nn.BCEWithLogitsLoss) or at inference time.

    Attributes:
        conv1 (nn.Conv2d): First convolutional layer with 32 filters.
        bn1 (nn.BatchNorm2d): Batch normalization for the first convolutional layer.
        conv2 (nn.Conv2d): Second convolutional layer with 64 filters.
        bn2 (nn.BatchNorm2d): Batch normalization for the second convolutional layer.
        pool (nn.MaxPool2d): Max pooling layer for downsampling.
        fc1 (nn.Linear): First fully connected layer with 128 neurons.
        dropout (nn.Dropout): Dropout layer for regularization.
        fc2 (nn.Linear): Second fully connected layer (output layer).
    """
    
    def __init__(self):
        """
        Initializes the CNNModel by defining the layers (convolutional, normalization,
        pooling, and fully connected). This model includes dropout and Batch Normalization
        to improve generalization and training stability.

        Args:
            None: The model architecture is predefined.
        """
        super(CNNModel, self).__init__()
        
        # Define the first convolutional layer and BatchNorm
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        
        # Define the second convolutional layer and BatchNorm
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        
        # Define the MaxPooling layer for downsampling
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # Define the fully connected layers
        self.fc1 = nn.Linear(64 * 32 * 32, 128)  # Adjust based on input size
        self.dropout = nn.Dropout(0.5)  # Dropout for regularization
        self.fc2 = nn.Linear(128, 1)  # Output layer for binary classification

    def forward(self, x):
        """
        Defines the forward pass of the CNN model.

        The forward pass includes convolutional layers followed by Batch Normalization
        and LeakyReLU activation, max pooling for downsampling, and fully connected layers
        with dropout. The final layer returns a raw logit without applying a sigmoid.

        Args:
            x (torch.Tensor): The input tensor with shape (batch_size, channels, height, width).
        
        Returns:
            torch.Tensor: The output tensor of shape (batch_size, 1), holding the logit of the positive class.
        """
        # Apply first convolutional layer, BatchNorm, LeakyReLU, and max pooling
        x = self.pool(F.leaky_relu(self.bn1(self.conv1(x))))
        
        # Apply second convolutional layer, BatchNorm, LeakyReLU, and max pooling
        x = self.pool(F.leaky_relu(self.bn2(self.conv2(x))))
        
        # Flatten the output for the fully connected layers
        x = x.view(x.size(0), -1)
        
        # Apply the first fully connected layer with dropout and ReLU activation
        x = self.dropout(F.relu(self.fc1(x)))
        
        # Return the raw logit: BCEWithLogitsLoss applies the sigmoid itself,
        # so call torch.sigmoid on the output only when probabilities are needed
        return self.fc2(x)
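
The `64 * 32 * 32` in `fc1` follows from the input size: each 2x2 max pooling halves the height and width, while the 3x3 convolutions with padding 1 leave them unchanged. The small helpers below (hypothetical, named `flattened_size` and `linear_params` for this example) make the "adjust based on input size" comment concrete and show how a smaller `target_size` shrinks the first fully connected layer.

```python
def flattened_size(target_size, channels=64, n_pools=2):
    """Number of features entering fc1 after n_pools 2x2 max-pooling steps."""
    height, width = target_size
    for _ in range(n_pools):
        height, width = height // 2, width // 2
    return channels * height * width

def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

for size in [(128, 128), (64, 64)]:
    flat = flattened_size(size)
    print(size, flat, linear_params(flat, 128))
```

With a 128x128 input, `fc1` alone holds 8,388,736 parameters; at 64x64 it holds 2,097,280, about a quarter. This is one way the advice about a minimal `target_size` translates into less computation, provided the `nn.Linear` input size is updated to match.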


## Training

In [10]:
# See whether cuda is available for GPU
import torch
print(torch.__version__)
print(torch.cuda.is_available()) # False means not available

2.2.2
False


In [11]:
next(iter(train_loader))

[tensor([[[[0.3280, 0.4255, 0.4397,  ..., 0.0000, 0.0000, 0.0000],
           [0.1475, 0.3320, 0.3144,  ..., 0.0000, 0.0000, 0.0000],
           [0.3071, 0.3802, 0.3775,  ..., 0.0000, 0.0000, 0.0000],
           ...,
           [0.2711, 0.2894, 0.2162,  ..., 0.0000, 0.0000, 0.0000],
           [0.1515, 0.1842, 0.1277,  ..., 0.0000, 0.0000, 0.0000],
           [0.1413, 0.1480, 0.0774,  ..., 0.0000, 0.0000, 0.0000]]],
 
 
         [[[0.5005, 0.5371, 0.4653,  ..., 0.0000, 0.0000, 0.0000],
           [0.5819, 0.5546, 0.4063,  ..., 0.0000, 0.0000, 0.0000],
           [0.6304, 0.6595, 0.5979,  ..., 0.0000, 0.0000, 0.0000],
           ...,
           [0.1368, 0.1523, 0.2050,  ..., 0.0000, 0.0000, 0.0000],
           [0.1529, 0.1337, 0.1328,  ..., 0.0000, 0.0000, 0.0000],
           [0.0491, 0.0251, 0.0325,  ..., 0.0000, 0.0000, 0.0000]]],
 
 
         [[[0.3942, 0.4402, 0.4222,  ..., 0.0000, 0.0000, 0.0000],
           [0.3958, 0.3910, 0.3807,  ..., 0.0000, 0.0000, 0.0000],
           [0.3908

#### Normal script

In [None]:
import torch
import torch.optim as optim

# Initialize model, loss function, and optimizer
model = CNNModel()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 2

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    max_batches = 100  # Set the maximum number of batches to process

    for batch_idx, (spectrograms, labels) in enumerate(train_loader):

        if batch_idx >= max_batches:  # Stop after max_batches batches
            break
        '''# Move data to GPU if available
        spectrograms, labels = spectrograms.to('cuda'), labels.to('cuda')
        model = model.to('cuda')'''
        
        labels = labels.unsqueeze(1).float()  # This reshapes labels to (batch_size, 1)

        # Forward pass
        outputs = model(spectrograms)
        loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
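        # Log loss
        running_loss += loss.item()
        if (batch_idx + 1) % 10 == 0:  # Log every 10 batches
            # max_batches is used as the total: len(train_loader) would trigger a full pass over the stream
            print(f"Epoch [{epoch+1}/{num_epochs}], Step [{batch_idx+1}/{max_batches}], Loss: {loss.item():.4f}")
    
    # Average over the batches actually processed in this epoch
    batches_done = min(batch_idx + 1, max_batches)
    print(f"Epoch [{epoch+1}/{num_epochs}], Average Loss: {running_loss/batches_done:.4f}")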


In [None]:
# 6h48 - 7h20

#### Optimized script

In [28]:
# Optimized code
import torch
import torch.optim as optim
from tqdm import tqdm  # For progress visualization
from torch.cuda.amp import GradScaler, autocast  # For mixed precision training

# Initialize model, loss function, and optimizer
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNNModel().to(device)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Mixed precision training (only active when a CUDA GPU is available)
scaler = GradScaler(enabled=(device.type == "cuda"))

# Training loop
num_epochs = 2
max_batches = 100  # Maximum number of batches to process per epoch

for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    
    # Wrap DataLoader with tqdm for progress bar
    # (total is max_batches: len(train_loader) would trigger a full pass over the streamed dataset)
    progress_bar = tqdm(enumerate(train_loader), total=max_batches, desc=f"Epoch {epoch+1}/{num_epochs}")
    
    for batch_idx, (spectrograms, labels) in progress_bar:
        if batch_idx >= max_batches:  # Stop after max_batches
            break
        
        # Move data to the same device as the model
        spectrograms, labels = spectrograms.to(device), labels.to(device)
        labels = labels.unsqueeze(1).float()  # Reshape labels to (batch_size, 1)
        
        # Forward pass with mixed precision (disabled automatically on CPU)
        with autocast(enabled=(device.type == "cuda")):
            outputs = model(spectrograms)
            loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        
        # Update running loss
        running_loss += loss.item()
        
        # Update progress bar with current loss
        progress_bar.set_postfix({"Batch Loss": loss.item()})
    
    # Log average loss over the batches actually processed in this epoch
    batches_done = min(batch_idx + 1, max_batches)
    avg_loss = running_loss / batches_done
    print(f"Epoch [{epoch+1}/{num_epochs}], Average Loss: {avg_loss:.4f}")


'HTTPSConnectionPool(host='cdn-lfs-us-1.hf.co', port=443): Read timed out.' thrown while requesting GET https://huggingface.co/datasets/rfcx/frugalai/resolve/a14fd5b7a22d5c03781db9e270162d946a49a99e/data/train-00000-of-00006.parquet
Retrying in 1s [Retry 1/5].
'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 425f3228-80c3-48dd-85e5-582e276ed1fc)')' thrown while requesting GET https://huggingface.co/datasets/rfcx/frugalai/resolve/a14fd5b7a22d5c03781db9e270162d946a49a99e/data/train-00000-of-00006.parquet
Retrying in 2s [Retry 2/5].


ChunkedEncodingError: ('Connection broken: IncompleteRead(13451264 bytes read, 16575744 more expected)', IncompleteRead(13451264 bytes read, 16575744 more expected))

In [None]:
# to do: change this to measure codecarbon over loading and training together, and over training alone