# CNN classifier

In this notebook, we implement a Convolutional Neural Network (CNN) that processes images (here, spectrograms of audio clips) through convolutional and pooling layers. These layers reduce the spatial dimensions of the data while increasing the number of feature channels, thereby extracting meaningful patterns from the input. The resulting feature maps are then collapsed into a vector and passed to dense layers, and the network is trained to predict the correct class.

### Imports

All the necessary imports for this notebook.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tensorflow.keras import layers, models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from glob import glob

import librosa
import librosa.display
import IPython.display as ipd

from itertools import cycle

sns.set_theme(style="white", palette=None)
color_pal = plt.rcParams["axes.prop_cycle"].by_key()["color"]
color_cycle = cycle(plt.rcParams["axes.prop_cycle"].by_key()["color"])

# Terms to know for Audio in Digital Form:

## Frequency (Hz)
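- Frequency is the number of wave cycles per second, measured in hertz (Hz). Shorter wavelengths correspond to higher frequencies.
- We perceive frequency as pitch: high frequencies sound high-pitched and low frequencies sound low-pitched.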

<img src="https://uploads-cdn.omnicalculator.com/images/britannica-wave-frequency.jpg" width="400"/>

## Intensity (dB / power)
- Intensity describes the amplitude (height) of the wave.

<img src="https://ars.els-cdn.com/content/image/3-s2.0-B9780124722804500162-f13-15-9780124722804.gif" width="400"/>

## Sample Rate
- Sample rate is the number of samples per second that the computer uses to represent the audio signal.
- Think of it as the "resolution" of the audio.

<img src="https://www.headphonesty.com/wp-content/uploads/2019/07/Sample-Rate-Bit-Depth-and-Bit-Rate.jpeg" width="400"/>
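
The three terms above can be tied together with a small synthetic example. The sketch below is illustrative only: the helper `sine_wave` and the chosen values (440 Hz, amplitude 0.5, sample rates of 22050 and 8000) are assumptions, not part of the dataset.

```python
import numpy as np

def sine_wave(frequency_hz, amplitude, duration_s, sample_rate):
    """Return a sampled sine wave as a 1-D float array."""
    n_samples = int(duration_s * sample_rate)
    t = np.arange(n_samples) / sample_rate  # time stamp of each sample, in seconds
    return amplitude * np.sin(2 * np.pi * frequency_hz * t)

# The same one-second 440 Hz tone at two different sample rates
tone_hi = sine_wave(440, 0.5, 1.0, sample_rate=22050)
tone_lo = sine_wave(440, 0.5, 1.0, sample_rate=8000)

# The number of samples equals duration * sample rate
print(len(tone_hi), len(tone_lo))
# The peak of the sampled wave approaches the chosen amplitude
print(float(np.abs(tone_hi).max()))
```

Frequency sets how fast the wave oscillates, amplitude sets its height, and the sample rate sets how many points are stored per second of audio.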


# Reading in Audio Files
There are many types of audio files: `mp3`, `wav`, `m4a`, `flac`, `ogg`

In [None]:
audio_files = glob('UrbanSound8K/audio/fold1/*.wav')

In [None]:
# Play audio file
ipd.Audio(audio_files[0])

In [None]:
y, sr = librosa.load(audio_files[0])
print(f'y: {y[:10]}')
print(f'shape y: {y.shape}')
print(f'sr: {sr}')

In [None]:
pd.Series(y).plot(figsize=(10, 5),
                  lw=1,
                  title='Raw Audio Example',
                 color=color_pal[0])
plt.show()

In [None]:
# Trim leading/trailing silence
y_trimmed, _ = librosa.effects.trim(y, top_db=20)
pd.Series(y_trimmed).plot(figsize=(10, 5),
                  lw=1,
                  title='Raw Audio Trimmed Example',
                 color=color_pal[1])
plt.show()

In [None]:
pd.Series(y[30000:30500]).plot(figsize=(10, 5),
                  lw=1,
                  title='Raw Audio Zoomed In Example',
                 color=color_pal[2])
plt.show()

# Spectrogram

In [None]:
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
S_db.shape
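
The shape printed above follows from the STFT parameters. As a rough sketch, assuming librosa's defaults (`n_fft=2048`, `hop_length=n_fft // 4`, centered frames), the expected shape can be computed without running the transform. The helper name `stft_shape` is hypothetical.

```python
def stft_shape(n_samples, n_fft=2048, hop_length=None):
    """Expected (frequency_bins, frames) of a centered STFT."""
    if hop_length is None:
        hop_length = n_fft // 4
    n_bins = 1 + n_fft // 2                 # non-negative frequencies only
    n_frames = 1 + n_samples // hop_length  # centered framing pads both ends
    return n_bins, n_frames

# A 4-second clip at 22050 Hz with the default parameters
print(stft_shape(4 * 22050))                             # (1025, 173)
# A 4-second clip at 16000 Hz with n_fft=1024 and hop_length=128
print(stft_shape(4 * 16000, n_fft=1024, hop_length=128))  # (513, 501)
```

A smaller hop length gives more time frames, and a larger `n_fft` gives more frequency bins.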

In [None]:
# Plot the transformed audio data
fig, ax = plt.subplots(figsize=(10, 5))
img = librosa.display.specshow(S_db,
                              x_axis='time',
                              y_axis='log',
                              ax=ax)
ax.set_title('Spectrogram Example', fontsize=20)
fig.colorbar(img, ax=ax, format='%0.2f')
plt.show()

# Mel Spectrogram

In [None]:
S = librosa.feature.melspectrogram(y=y,
                                   sr=sr,
                                   n_mels=512,)
# melspectrogram returns a power spectrogram by default, so convert with power_to_db
S_db_mel = librosa.power_to_db(S, ref=np.max)

In [None]:
fig, ax = plt.subplots(figsize=(10, 5))
# Plot the mel spectrogram on a mel-scaled frequency axis
img = librosa.display.specshow(S_db_mel,
                              sr=sr,
                              x_axis='time',
                              y_axis='mel',
                              ax=ax)
ax.set_title('Mel Spectrogram Example', fontsize=20)
fig.colorbar(img, ax=ax, format='%0.2f')
plt.show()
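
The mel scale spaces frequency bands the way pitch is perceived: finely at low frequencies and coarsely at high ones. The sketch below uses the HTK-style formula as an illustration. This is an assumption for clarity; librosa's default mel filter bank uses the Slaney variant (`htk=False`), which differs in detail. The helper names are hypothetical.

```python
import numpy as np

def hz_to_mel_htk(freq_hz):
    """Convert Hz to mel using the HTK formula."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=float) / 700.0)

def mel_to_hz_htk(mel):
    """Inverse of hz_to_mel_htk."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

# Six points equally spaced on the mel axis between 0 Hz and 8000 Hz
mel_points = np.linspace(hz_to_mel_htk(0.0), hz_to_mel_htk(8000.0), 6)
centers_hz = mel_to_hz_htk(mel_points)

print(np.round(centers_hz, 1))
# The gaps between neighbouring points grow with frequency
print(np.round(np.diff(centers_hz), 1))
```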

# Data preparation

This code processes audio data from the UrbanSound8K dataset to prepare features and labels for a machine learning model. The key steps include:

1. Feature Extraction:

    Each WAV file is converted into a log Mel spectrogram using librosa, representing its frequency content.
    Audio is standardized to 4 seconds by either truncating longer files or padding shorter ones by repeating the signal.

2. Label Processing:

    Labels are extracted from the filenames and converted into integers.
    Labels are then one-hot encoded for compatibility with classification models.

3. Data Saving:

    Features and one-hot encoded labels are saved as .npy files for efficient loading and use in subsequent steps.

4. Structure:

    The process is applied to each fold of the UrbanSound8K dataset, and results are saved in a specified output directory.
    This ensures the audio data is preprocessed and organized for easy integration into deep learning pipelines.
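
The length standardization and one-hot encoding steps above can be sketched on toy data with NumPy alone. The helper names `fix_length_by_repeating` and `one_hot` are hypothetical, and tiling is used here as an equivalent of repeatedly doubling the signal and then truncating it.

```python
import numpy as np

def fix_length_by_repeating(signal, target_len):
    """Truncate a long signal, or repeat a short one, to exactly target_len samples."""
    signal = np.asarray(signal)
    if len(signal) == 0:
        raise ValueError("cannot pad an empty signal")
    repeats = -(-target_len // len(signal))  # ceiling division
    return np.tile(signal, repeats)[:target_len]

def one_hot(labels, num_classes):
    """Map integer class IDs to rows of an identity matrix."""
    return np.eye(num_classes)[np.asarray(labels, dtype=int)]

short = np.array([1.0, 2.0, 3.0])
print(fix_length_by_repeating(short, 8))         # [1. 2. 3. 1. 2. 3. 1. 2.]
print(fix_length_by_repeating(np.arange(10), 4))  # [0 1 2 3]
print(one_hot([0, 2], num_classes=3))
```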

In [None]:
import os
from glob import glob  # Same binding as in the imports cell, so earlier cells keep working

import librosa
import numpy as np
from librosa.feature import melspectrogram

def LOG_MEL_SPEC(parent_dir, sub_dir):
    """Return log-mel spectrograms and integer class labels for one fold."""
    labels = []
    log_mel_spectrogram = []
    exten = "*.wav"

    for filename in glob(os.path.join(parent_dir, sub_dir, exten)):
        # UrbanSound8K filenames follow [fsID]-[classID]-[occurrenceID]-[sliceID].wav,
        # so the class ID is the second dash-separated field of the base name
        label = os.path.basename(filename).split('-')[1]
        labels.append(int(label))  # Convert label to integer
        f, sr = librosa.load(filename, sr=16000)  # Resample to 16 kHz on load

        # Standardize every clip to 4 seconds: repeat short clips until they are
        # long enough, then keep exactly the first 4 seconds
        four_sec_samples = 4 * sr
        while len(f) < four_sec_samples:
            f = np.concatenate((f, f))
        log_mel_spec = librosa.power_to_db(
            melspectrogram(y=f[:four_sec_samples], sr=sr, n_fft=1024, hop_length=128)
        )

        log_mel_spectrogram.append(log_mel_spec)

    return np.array(log_mel_spectrogram), np.array(labels, dtype=int)

def encode(labels, num_classes=10):
    # One-hot encode integer labels. The width is fixed at num_classes
    # (10 in UrbanSound8K) so every fold yields the same label shape,
    # even if a class were absent from that fold.
    labels = np.asarray(labels, dtype=int)
    one_hot_encoded = np.zeros((len(labels), num_classes))
    one_hot_encoded[np.arange(len(labels)), labels] = 1
    return one_hot_encoded

def file_creator(final_path, filename):
    new_path = os.path.join(os.getcwd(), final_path)
    if not os.path.exists(new_path):
        os.makedirs(new_path)
    return os.path.join(new_path, filename)

# Set the parent directory where UrbanSound8K data is stored
parent_directory = 'UrbanSound8K/audio'  # Adjust this path as needed
final_dir = "UrbanSound8K/UrbanSound8K_Processed"

# Process each fold and save the features and labels
sub_dirs = ['fold1', 'fold2', 'fold3', 'fold4', 'fold5', 'fold6', 'fold7', 'fold8', 'fold9', 'fold10']

for sub_dir in sub_dirs:
    print(f"Processing {sub_dir}...")
    features, labels = LOG_MEL_SPEC(parent_directory, sub_dir)
    
    # One hot encode the labels
    labels_encoded = encode(labels)
    
    # Create filenames for saving features and labels
    feature_file = file_creator(final_dir, f'{sub_dir}_features.npy')
    labels_file = file_creator(final_dir, f'{sub_dir}_labels.npy')
    
    # Save the extracted features and labels
    np.save(feature_file, features)
    print(f"Saved features for {sub_dir} at {feature_file}")
    np.save(labels_file, labels_encoded)
    print(f"Saved labels for {sub_dir} at {labels_file}")


## 1. Fold Assignment and Data Preparation
Assigns specific folds for training, validation, and testing, ensuring proper data splits. Also loads preprocessed features and labels.

In [None]:
# Load the features and labels for each fold
processed_dir = 'UrbanSound8K/UrbanSound8K_Processed'

# Specify folds
folds = [f'fold{i}' for i in range(1, 11)]  # Folds 1 to 10

# Assign test, validation, and training folds dynamically
test_fold = folds[9]  # Use fold 10 for testing (index 9)
validation_fold = folds[8]  # Use fold 9 for validation (index 8)
train_folds = [fold for fold in folds if fold not in [test_fold, validation_fold]]  # Remaining for training

# Debugging: Print the fold assignments
print(f"Training folds: {train_folds}")
print(f"Validation fold: {validation_fold}")
print(f"Test fold: {test_fold}")
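
The cell above fixes a single split. As a hedged sketch of how the same assignment could be rotated over all ten predefined folds for cross-validation, the hypothetical generator below yields one (train, validation, test) triple per fold. Choosing the fold just before the test fold for validation is an assumption, picked so that the last rotation matches the split used in this notebook.

```python
def fold_rotations(n_folds=10):
    """Yield (train_folds, validation_fold, test_fold) for every choice of test fold."""
    folds = [f"fold{i}" for i in range(1, n_folds + 1)]
    for i, test_fold in enumerate(folds):
        validation_fold = folds[i - 1]  # fold before the test fold (wraps around)
        train_folds = [f for f in folds if f not in (test_fold, validation_fold)]
        yield train_folds, validation_fold, test_fold

rotations = list(fold_rotations())
for train_folds, validation_fold, test_fold in rotations:
    print(f"test={test_fold}, val={validation_fold}, train={len(train_folds)} folds")
```

Keeping whole folds together in every rotation avoids splitting slices of the same recording between training and test data.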

## 2. Data Loading Function
Defines a function to load features and labels for specified folds and concatenates them.

In [None]:
def load_data(folds, processed_dir):
    features_list, labels_list = [], []
    for fold in folds:
        features_file = f"{processed_dir}/{fold}_features.npy"
        labels_file = f"{processed_dir}/{fold}_labels.npy"
        features_list.append(np.load(features_file))
        labels_list.append(np.load(labels_file))
    return np.concatenate(features_list), np.concatenate(labels_list)
# Load training, validation, and test data
X_train, y_train = load_data(train_folds, processed_dir)
X_val, y_val = load_data([validation_fold], processed_dir)
X_test, y_test = load_data([test_fold], processed_dir)

# Reshape features for CNN input: (samples, height, width, channels)
X_train = X_train[..., np.newaxis]
X_val = X_val[..., np.newaxis]
X_test = X_test[..., np.newaxis]

# Ensure labels are one-hot encoded
num_classes = 10
y_train = np.reshape(y_train, (-1, num_classes))
y_val = np.reshape(y_val, (-1, num_classes))
y_test = np.reshape(y_test, (-1, num_classes))

print(f"X_train shape: {X_train.shape}, y_train shape: {y_train.shape}")
print(f"X_val shape: {X_val.shape}, y_val shape: {y_val.shape}")
print(f"X_test shape: {X_test.shape}, y_test shape: {y_test.shape}")


## 3. CNN Model Architecture
Defines a sequential CNN model for classification, including convolutional layers, pooling, and dense layers.

In [None]:
model = models.Sequential([
    # Take the input shape from the data: (n_mels, frames, channels).
    # With sr=16000, 4-second clips, and hop_length=128 this is (128, 501, 1).
    layers.Input(shape=X_train.shape[1:]),

    # Conv2D defaults to 'valid' padding, so each 3x3 convolution trims 2 from height and width
    layers.Conv2D(32, (3, 3), activation='tanh'),
    layers.MaxPooling2D(pool_size=(2, 2)),  # Pooling halves height and width

    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),  # Pooling further reduces dimensions

    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),

    # Global average pooling collapses each feature map to one value (used here instead of Flatten)
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')  # One output per UrbanSound8K class
])

model.summary()
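
The spatial sizes reported by `model.summary()` can be checked by hand. The sketch below traces height and width through three blocks of a 3x3 convolution with 'valid' padding followed by 2x2 max pooling. The helper names are hypothetical, and the 128 x 501 input is the size produced by 4-second clips at 16 kHz with `hop_length=128`.

```python
def conv_valid(size, kernel=3):
    """Output length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1

def max_pool(size, pool=2):
    """Output length of non-overlapping max pooling with 'valid' padding."""
    return size // pool

def trace_shapes(height, width, n_blocks=3):
    """Return (height, width) at the input and after each conv + pool block."""
    shapes = [(height, width)]
    for _ in range(n_blocks):
        height, width = conv_valid(height), conv_valid(width)
        height, width = max_pool(height), max_pool(width)
        shapes.append((height, width))
    return shapes

print(trace_shapes(128, 501))
# [(128, 501), (63, 249), (30, 123), (14, 60)]
```

Global average pooling then reduces the final feature maps to one value per channel, so the dense layers receive a vector whose length equals the number of filters in the last convolution.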

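## 4. Generating Noisy Examples
Generates noisy copies of a small fraction of the misclassified validation samples, which the training loop then appends to the training data. Because these samples originate from the validation fold, validation metrics measured after injection are optimistically biased.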

In [None]:
def generate_noisy_examples(X, y, model, noise_factor=0.1, ratio=0.05):
    """
    Generate a limited number of noisy examples for misclassified samples.
    Args:
        X: Input validation data.
        y: True labels for validation data.
        model: Trained model for prediction.
        noise_factor: The magnitude of noise to add.
        ratio: Proportion of noisy samples to generate relative to misclassifications (e.g., 0.1 for 1:10).
    Returns:
        X_noisy: Generated noisy samples.
        y_noisy: Corresponding labels for noisy samples.
    """
    # Predict on the validation data
    y_pred = model.predict(X)
    y_pred_labels = np.argmax(y_pred, axis=1)
    y_true_labels = np.argmax(y, axis=1)
    
    # Identify misclassified samples
    misclassified_indices = np.where(y_pred_labels != y_true_labels)[0]
    if len(misclassified_indices) == 0:
        # Nothing to augment: return empty arrays with matching trailing shapes
        return np.empty((0,) + X.shape[1:]), np.empty((0,) + y.shape[1:])

    # Determine the number of noisy samples to generate (at least 1)
    num_noisy_samples = max(1, int(len(misclassified_indices) * ratio))
    print(f"Total misclassified: {len(misclassified_indices)}, Ratio: {ratio}, Noisy samples to generate: {num_noisy_samples}")
    selected_indices = np.random.choice(misclassified_indices, size=num_noisy_samples, replace=False)

    # Log-mel features are in dB rather than in [0, 1], so clip to the observed data range
    x_min, x_max = X.min(), X.max()

    X_noisy = []
    y_noisy = []

    for idx in selected_indices:
        # Add Gaussian noise to the misclassified sample
        noise = noise_factor * np.random.normal(size=X[idx].shape)
        noisy_sample = np.clip(X[idx] + noise, x_min, x_max)  # Keep values within the data range
        X_noisy.append(noisy_sample)
        y_noisy.append(y[idx])  # Use the correct label
    
    return np.array(X_noisy), np.array(y_noisy)
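
The two building blocks of this function can be tried in isolation on toy arrays, without a trained model. The sketch below is illustrative: the helper names are hypothetical, the probabilities are made up, and clipping to the array's own minimum and maximum is an assumption suited to dB-scaled features.

```python
import numpy as np

def misclassified_indices(y_true_one_hot, y_prob):
    """Indices where the predicted class differs from the true class."""
    return np.where(np.argmax(y_prob, axis=1) != np.argmax(y_true_one_hot, axis=1))[0]

def add_gaussian_noise(x, noise_factor, rng):
    """Add zero-mean Gaussian noise and clip to the original value range."""
    noisy = x + noise_factor * rng.normal(size=x.shape)
    return np.clip(noisy, x.min(), x.max())

y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]])
wrong = misclassified_indices(y_true, y_prob)
print(wrong)  # [1]: the second sample is predicted as class 0 but belongs to class 1

rng = np.random.default_rng(0)           # seeded for repeatability
x = np.linspace(-80.0, 0.0, 12).reshape(3, 4)  # stand-in for a dB-scaled spectrogram
noisy = add_gaussian_noise(x, 0.05, rng)
print(noisy.shape)
```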

## 5. Training the Model
Trains the model with early stopping, dynamically adds noisy examples, and saves the best weights.

In [None]:
# Compile the model
model.compile(optimizer=Adam(learning_rate=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

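# Callbacks. Keras resets callback state at the start of every fit() call, so with
# epochs=1 per call below their patience counters never accumulate; the manual
# counter in the loop is what actually stops training.
early_stopping = EarlyStopping(monitor='val_loss', patience=6, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6)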

# Training with noisy examples between epochs

history = []  # To store history across all epochs
max_epochs = 50
patience_limit = 6  # To check for manual early stopping (optional)

# Initialize counters and flags
early_stop_counter = 0
best_val_loss = float('inf')  # Start with a very high value

best_weights = None  # Initialize variable to store the best weights

for epoch in range(1, max_epochs + 1):
    print(f"Epoch {epoch}/{max_epochs}")

    # Train model for one epoch
    hist = model.fit(
        X_train,
        y_train,
        batch_size=64,
        epochs=1,
        validation_data=(X_val, y_val),
        verbose=1,
        callbacks=[early_stopping, reduce_lr]
    )

    # Store history
    history.append(hist.history)

    # Extract validation loss for manual early stopping check
    val_loss = hist.history.get('val_loss', [None])[-1]
    if val_loss is not None:
        print(f"Validation Loss: {val_loss:.4f}")
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_weights = model.get_weights()  # Save best weights
            early_stop_counter = 0  # Reset counter if there's improvement
        else:
            early_stop_counter += 1  # Increment counter if no improvement

    # Check for early stopping (manual or callback-based)
    if early_stop_counter >= patience_limit:
        print(f"Early stopping triggered after {epoch} epochs. Best val_loss: {best_val_loss:.4f}")
        model.set_weights(best_weights)  # Restore the best weights
        break

    # Generate noisy examples based on validation performance
    X_noisy, y_noisy = generate_noisy_examples(X_val, y_val, model, noise_factor=0.05, ratio=0.05)

    # Inject noisy examples back into training
    if len(X_noisy) > 0:
        X_train = np.concatenate([X_train, X_noisy])
        y_train = np.concatenate([y_train, y_noisy])
        print(f"Added {len(X_noisy)} noisy examples to the training set.")

# If training ran for all epochs without early stopping, still restore the best weights
if best_weights is not None:
    model.set_weights(best_weights)
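
The manual patience logic in the loop above can be isolated and checked on a plain list of validation losses. This is a minimal sketch with a hypothetical helper and made-up loss values; it mirrors the counter used in the loop but involves no model.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a sequence of validation losses."""
    best = float("inf")
    best_epoch = None
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch  # patience exhausted
    return len(val_losses), best_epoch  # ran to the end without stopping early

losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.72, 0.71]
print(early_stopping_epoch(losses, patience=3))  # (8, 5)
print(early_stopping_epoch(losses, patience=2))  # (4, 2)
```

The weights worth restoring are those from `best_epoch`, not from the epoch at which training stops.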





## 6. Model Evaluation
Evaluates the model on the test dataset.

In [None]:
# Evaluate on the test set
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss}, Test Accuracy: {test_accuracy}")
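
Overall accuracy can hide classes that the model handles poorly. As a hedged sketch, the NumPy helpers below (hypothetical names, toy labels) build a confusion matrix and per-class accuracy from integer class IDs.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class predicted correctly (NaN for classes with no samples)."""
    totals = cm.sum(axis=1)
    return np.divide(np.diag(cm), totals, out=np.full(len(cm), np.nan), where=totals > 0)

y_true_ids = [0, 0, 1, 1, 2]
y_pred_ids = [0, 1, 1, 1, 0]
cm = confusion_matrix(y_true_ids, y_pred_ids, num_classes=3)
print(cm)
print(per_class_accuracy(cm))
```

In this notebook, the integer IDs would come from `np.argmax(y_test, axis=1)` and `np.argmax(model.predict(X_test), axis=1)`, with `num_classes=10`.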
