In [None]:
from google.colab import drive
drive.mount('/content/drive/')

Mounted at /content/drive/


This code defines a function called `load_data` that loads the audio files from a directory and converts them into arrays suitable for machine learning. The files are organized into one subfolder per label, `clean_audio`, `gaussian_noise`, and `impulse_noise`, which are mapped to the integer classes 0, 1, and 2. Each `.wav` file is read with `librosa` at its native sample rate (`sr=None`), and 13 Mel-frequency cepstral coefficients (MFCCs) are extracted per frame, giving a matrix of shape `(13, n_frames)`. Because clips differ in length, the function forces every matrix to exactly `max_length` frames (100 by default), truncating longer ones and zero-padding shorter ones. It returns two arrays: `X`, the MFCC matrices with shape `(n_files, 13, max_length)`, and `y`, the matching integer labels.
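Since `sr=None` keeps each file's native sample rate, how much audio `max_length=100` frames actually covers depends on the recording. A rough sketch of that arithmetic, assuming librosa's defaults for `librosa.feature.mfcc` (`hop_length=512`, `center=True`), which `load_data` does not override; both helper names are hypothetical:

```python
# Hypothetical helpers (not part of the notebook). They assume librosa's
# defaults for librosa.feature.mfcc: hop_length=512 and center=True.

def mfcc_frame_count(n_samples, hop_length=512):
    """Frames produced for a clip of n_samples samples when center=True."""
    return 1 + n_samples // hop_length


def max_untruncated_seconds(max_length, sr, hop_length=512):
    """Clips strictly shorter than this (in seconds) keep all their frames;
    longer clips are truncated to max_length frames."""
    return max_length * hop_length / sr


print(mfcc_frame_count(22050))              # 1 s at 22.05 kHz -> 44 frames
print(max_untruncated_seconds(100, 22050))  # about 2.32 s
print(max_untruncated_seconds(100, 16000))  # 3.2 s
```

Under these assumptions, a 16 kHz recording longer than about 3.2 s loses its tail, while shorter clips are padded with zero frames.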

In [None]:
import os
import numpy as np
import librosa
import tensorflow as tf

def load_data(data_dir, max_length=100):  # max_length = number of MFCC frames kept per clip
    X, y = [], []
    labels = {'clean_audio': 0, 'gaussian_noise': 1, 'impulse_noise': 2}

    for label, index in labels.items():
        folder_path = os.path.join(data_dir, label)
        for filename in os.listdir(folder_path):
            if filename.endswith('.wav'):
                file_path = os.path.join(folder_path, filename)
                audio, sr = librosa.load(file_path, sr=None)
                mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)

                # Pad or truncate the MFCCs
                if mfccs.shape[1] > max_length:
                    mfccs = mfccs[:, :max_length]  # Truncate
                elif mfccs.shape[1] < max_length:
                    pad_width = max_length - mfccs.shape[1]
                    mfccs = np.pad(mfccs, ((0, 0), (0, pad_width)), mode='constant')  # Pad

                X.append(mfccs)
                y.append(index)

    return np.array(X), np.array(y)
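`load_data` and `predict_audio` repeat the same pad-or-truncate block. A minimal sketch of factoring it into one shared helper, so that training and inference cannot drift apart (the name `pad_or_truncate` is hypothetical, not part of the original notebook):

```python
import numpy as np

def pad_or_truncate(mfccs, max_length=100):
    """Return mfccs (shape (n_mfcc, n_frames)) with exactly max_length frames."""
    n_frames = mfccs.shape[1]
    if n_frames >= max_length:
        return mfccs[:, :max_length]  # keep the first max_length frames
    pad_width = max_length - n_frames
    # Zero-pad at the end of the time axis, leaving the coefficient axis untouched
    return np.pad(mfccs, ((0, 0), (0, pad_width)), mode='constant')


short_clip = np.ones((13, 40))
long_clip = np.ones((13, 250))
print(pad_or_truncate(short_clip).shape, pad_or_truncate(long_clip).shape)  # (13, 100) (13, 100)
```

Inside `load_data` the whole if/elif block would become `mfccs = pad_or_truncate(mfccs, max_length)`, and the same call could replace the duplicate block in `predict_audio`. librosa also provides `librosa.util.fix_length`, which pads or trims an array along a chosen axis and could serve the same purpose.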

This code defines the neural network with TensorFlow's Keras API and trains it on the audio features. The `build_model` function stacks the following layers in a `Sequential` model: a 1D convolution (32 filters, kernel size 3) that extracts local patterns across neighboring frames, a max-pooling layer that halves the time dimension, an LSTM (64 units, returning the full sequence) that models temporal structure, a `Flatten` layer, `Dropout(0.5)` for regularization, a 32-unit dense layer, and a final 3-unit dense layer with softmax activation that outputs one probability per class. The data is then loaded with `load_data` and reshaped so that each sample has shape `(max_length, 13)`, the `(timesteps, features)` layout that `Conv1D` and `LSTM` expect. `train_test_split` holds out 20% of the samples as a validation set (`random_state=42` makes the split reproducible). Finally, the model is compiled with sparse categorical cross-entropy loss (the labels are integers rather than one-hot vectors) and the Adam optimizer, and trained for 30 epochs while reporting accuracy on the validation set after each epoch.
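As a worked check on the layers listed above, the following sketch traces the output shape and trainable-parameter count through `build_model` for `max_length=100` and 13 MFCCs. It assumes Keras defaults (`padding='valid'` and stride 1 for `Conv1D`, stride equal to `pool_size` for `MaxPooling1D`); `describe_model` is a hypothetical helper, and `model.summary()` on the real model is the authoritative source:

```python
# Hypothetical helper that reproduces the shape/parameter bookkeeping for build_model.
# Assumes Keras defaults: Conv1D padding='valid', stride 1; MaxPooling1D stride = pool_size.

def describe_model(timesteps=100, n_features=13, filters=32, kernel=3,
                   pool=2, lstm_units=64, dense_units=32, n_classes=3):
    rows = []
    # Conv1D with 'valid' padding: the time axis shrinks by kernel - 1
    t = timesteps - kernel + 1
    rows.append(("Conv1D", (t, filters), kernel * n_features * filters + filters))
    # MaxPooling1D: floor division of the time axis, no parameters
    t = t // pool
    rows.append(("MaxPooling1D", (t, filters), 0))
    # LSTM: 4 gates, each with an input kernel, a recurrent kernel and a bias
    rows.append(("LSTM", (t, lstm_units),
                 4 * (lstm_units * (filters + lstm_units) + lstm_units)))
    flat = t * lstm_units
    rows.append(("Flatten", (flat,), 0))
    rows.append(("Dropout", (flat,), 0))
    rows.append(("Dense", (dense_units,), flat * dense_units + dense_units))
    rows.append(("Dense", (n_classes,), dense_units * n_classes + n_classes))
    return rows


rows = describe_model()
for name, shape, params in rows:
    print(f"{name:<13}{str(shape):<12}{params}")
print("Total params:", sum(p for _, _, p in rows))  # 126595
```

Most of the parameters (100,384 of 126,595, about 79%) sit in the 32-unit dense layer, because `Flatten` feeds all 49 × 64 LSTM outputs into it.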

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Flatten, Conv1D, MaxPooling1D, Dropout

def build_model(input_shape):
    model = Sequential()
    model.add(Conv1D(32, kernel_size=3, activation='relu', input_shape=input_shape))
    model.add(MaxPooling1D(pool_size=2))
    model.add(LSTM(64, return_sequences=True))
    model.add(Flatten())
    model.add(Dropout(0.5))  # Regularization
    model.add(Dense(32, activation='relu'))
    model.add(Dense(3, activation='softmax'))  # 3 classes
    return model

X, y = load_data('/content/drive/MyDrive/Data Directory/')

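# Reshape each (13, max_length) MFCC matrix to (max_length, 13), the (timesteps, features) layout Conv1D/LSTM expect.
# Note: reshape() refills values in row-major order; unlike X.transpose(0, 2, 1), it does not swap the axes.
X = np.array([x.reshape(-1, 13) for x in X])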

# Split your data into training and validation sets
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

model = build_model(X_train.shape[1:])
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=30)


Epoch 1/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 4s 57ms/step - accuracy: 0.4066 - loss: 1.1030 - val_accuracy: 0.5724 - val_loss: 0.8561
Epoch 2/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 2s 53ms/step - accuracy: 0.6263 - loss: 0.8023 - val_accuracy: 0.6103 - val_loss: 0.7793
Epoch 3/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 3s 67ms/step - accuracy: 0.6769 - loss: 0.7185 - val_accuracy: 0.6414 - val_loss: 0.7552
Epoch 4/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 2s 57ms/step - accuracy: 0.7497 - loss: 0.5860 - val_accuracy: 0.6552 - val_loss: 0.7548
Epoch 5/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 2s 60ms/step - accuracy: 0.7377 - loss: 0.5791 - val_accuracy: 0.6690 - val_loss: 0.6886
Epoch 6/30
37/37 ━━━━━━━━━━━━━━━━━━━━ 2s 35ms/step - accuracy: 0.8283 - loss: 0.4573 - val_accuracy: 0.7103 - val_loss: 0.6621
Epoch 7/30
37/37 ━━━━

<keras.src.callbacks.history.History at 0x7c6b8fb5aa10>
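`x.reshape(-1, 13)` produces the expected `(max_length, 13)` shape, but `reshape` reads values in row-major order instead of swapping axes, so each resulting row mixes values from different coefficients and frames rather than holding the 13 MFCCs of one frame. A small NumPy check with a toy 2-coefficient, 3-frame matrix makes the difference visible; `np.transpose(X, (0, 2, 1))` gives one row per time step. Switching layouts would require the same change in `predict_audio` and retraining, since the saved model was fitted on the reshaped layout:

```python
import numpy as np

# Toy MFCC matrix in librosa's layout (n_mfcc, n_frames) = (2, 3):
# row i holds coefficient i over time.
mfccs = np.array([[1, 2, 3],
                  [10, 20, 30]])

reshaped = mfccs.reshape(-1, 2)   # what the notebook does (with 13 -> 2)
transposed = mfccs.T              # (n_frames, n_mfcc): one row per frame

print(reshaped)
# [[ 1  2]
#  [ 3 10]
#  [20 30]]
print(transposed)
# [[ 1 10]
#  [ 2 20]
#  [ 3 30]]

# Batched version for an array shaped (N, n_mfcc, n_frames), like X from load_data
batch = np.stack([mfccs, mfccs + 100])          # (2, 2, 3)
frames_first = np.transpose(batch, (0, 2, 1))   # (N, n_frames, n_mfcc)
print(frames_first.shape)  # (2, 3, 2)
```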

In [None]:
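# Save the model (HDF5 '.h5' is treated as a legacy format in Keras 3, which recommends the native '.keras' extension)
model.save('/content/drive/MyDrive/Data Directory/audio_classification_model.h5')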

# Load the model
loaded_model = tf.keras.models.load_model('/content/drive/MyDrive/Data Directory/audio_classification_model.h5')



The `predict_audio` function takes the path to an audio file, extracts features the same way `load_data` does, and returns the class predicted by a trained model. It loads the audio with `librosa`, computes 13 MFCCs per frame, and truncates or zero-pads the result to `max_length` frames so that it matches the training input. The MFCC matrix is then reshaped into a batch of one sample with shape `(1, max_length, 13)`. `model.predict` returns one softmax probability per class, and `np.argmax` picks the index of the most probable class, which the function returns. In the example usage, the function is applied to a single `.wav` file and the predicted class index is printed.

In [None]:
def predict_audio(file_path, model, max_length=100):
    audio, sr = librosa.load(file_path, sr=None)
    mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)

    # Pad or truncate the MFCCs
    if mfccs.shape[1] > max_length:
        mfccs = mfccs[:, :max_length]  # Truncate
    elif mfccs.shape[1] < max_length:
        pad_width = max_length - mfccs.shape[1]
        mfccs = np.pad(mfccs, ((0, 0), (0, pad_width)), mode='constant')  # Pad

    # Batch of one sample with shape (1, max_length, 13), built with the same row-major reshape used in training
    mfccs_reshaped = mfccs.reshape(1, max_length, 13)
    prediction = model.predict(mfccs_reshaped)
    class_index = np.argmax(prediction)
    return class_index

# Example usage
result = predict_audio('/content/drive/MyDrive/Data Directory/noisy_audio_impulse_clean_audio2.wav', loaded_model)
print(f'Predicted class: {result}')


[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 200ms/step
Predicted class: 2


Class Description:

`labels = {'clean_audio': 0, 'gaussian_noise': 1, 'impulse_noise': 2}`, so the predicted class 2 above corresponds to `impulse_noise`.
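`predict_audio` returns only the class index. A minimal sketch for turning the model's softmax output into a readable label plus its probability, by inverting the `labels` dictionary used in `load_data`; `decode_prediction` is a hypothetical helper meant to be called on the output of `model.predict(...)`, not on the index that `predict_audio` returns:

```python
import numpy as np

labels = {'clean_audio': 0, 'gaussian_noise': 1, 'impulse_noise': 2}
index_to_label = {index: name for name, index in labels.items()}

def decode_prediction(probs):
    """Map a softmax output of shape (1, 3) or (3,) to (label name, probability)."""
    probs = np.asarray(probs).reshape(-1)
    index = int(np.argmax(probs))
    return index_to_label[index], float(probs[index])


# Illustrative softmax vector (made-up numbers, not real model output)
print(decode_prediction([[0.1, 0.2, 0.7]]))  # ('impulse_noise', 0.7)
```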