## Transfer Learning using Kaggle Models

In this notebook, I demonstrate audio classification by transfer learning from [YAMNet](https://www.kaggle.com/models/google/yamnet), a pre-trained model available on [Kaggle Models](https://www.kaggle.com/models).

## Imports

In [46]:
!pip install tensorflow_io==0.23.1
!pip install tensorflow==2.7.1

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [47]:
!pip install soundfile

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [48]:
import numpy as np 
import pandas as pd 
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio
import os, random
import shutil
from pydub import AudioSegment
from glob import glob  # list the files in a directory
from pathlib import Path
from IPython.display import display, Audio
import soundfile as sf

## Load and Pre-process the dataset
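The dataset used here has two classes: `0` (files under `unknown/`) and `1` (files under `yes_drone/`). For time and memory management, the conversion step further down copies the same number of files from each class. The audio has to be WAV because the loading function below decodes files with `tf.audio.decode_wav`; audio in another format such as OGG would need converting first.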
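Before building the pipeline, it can help to confirm that a file really is PCM WAV and to check its sample rate and channel count. Here is a minimal sketch using only Python's standard `wave` module. The helper name `describe_wav` and the synthetic demo file are my own illustrative assumptions, not part of the dataset.

```python
import os
import struct
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate_hz, n_channels, bits_per_sample, duration_s) for a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8
        duration = wf.getnframes() / rate
    return rate, channels, bits, duration


# Write a one-second silent 16-bit mono clip (hypothetical stand-in for a dataset file).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
    wf.setframerate(44100)
    wf.writeframes(struct.pack("<h", 0) * 44100)

info = describe_wav(demo_path)
print(info)
```

`wave.open` raises `wave.Error` for files it cannot parse, so a failure here is a hint that a file needs converting before it reaches the loader.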


In [49]:
ROOT = "/home/gridsan/clast/hackathon-april"
train_metadata = pd.read_csv(os.path.join(ROOT, 'metadata.csv'))[['primary_label', 'filename']]
train_metadata['filepath'] = 'data/Binary_Drone_Audio/' + train_metadata['filename']
train_metadata

Unnamed: 0,primary_label,filename,filepath
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav
...,...,...,...
11699,1,yes_drone/mixed_membo_9-membo_000_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11700,1,yes_drone/mixed_membo_9-membo_001_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11701,1,yes_drone/mixed_membo_9-membo_002_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11702,1,yes_drone/mixed_membo_9-membo_003_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...


In [50]:
# Sample the classes to keep (this dataset has only two, so both are selected)
classes = set(random.sample(train_metadata['primary_label'].unique().tolist(), 2)) 
print(classes)

{0, 1}


In [51]:
train_metadata = train_metadata[train_metadata.primary_label.apply(lambda x: x in classes)].reset_index(drop=True)
keys = set(train_metadata.primary_label)
values = np.arange(0, len(keys))
code_dict = dict(zip(sorted(keys), values))
train_metadata['label'] = train_metadata['primary_label'].apply(lambda x: code_dict[x])
train_metadata.head()

Unnamed: 0,primary_label,filename,filepath,label
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav,0
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav,0
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav,0
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav,0
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav,0
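The cell above maps each distinct `primary_label` to a consecutive integer in sorted order. Since the labels here are already `0` and `1`, the mapping is the identity, so the effect is easier to see with string labels. This is a small plain-Python sketch of the same idea; the function name and the example labels are hypothetical.

```python
def build_label_codes(labels):
    """Map each distinct label to a consecutive integer, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Hypothetical string labels; the notebook's own labels are already 0 and 1.
example_labels = ["yes_drone", "unknown", "unknown", "yes_drone"]
codes = build_label_codes(example_labels)
encoded = [codes[label] for label in example_labels]
print(codes, encoded)
```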


In [52]:
# One row per class: original label and its integer code
classes_df = train_metadata[['primary_label', 'label']].drop_duplicates().reset_index(drop=True)
classes_df

Unnamed: 0,primary_label,label
0,0,0
1,1,1


In [53]:
train_list = []

for x in classes_df['label']:
    train_sng_temp = train_metadata[train_metadata['label'] == x]
    train_list.append(train_sng_temp)
print(train_list[0])

       primary_label                    filename  \
0                  0   unknown/1-100032-A-00.wav   
1                  0   unknown/1-100032-A-01.wav   
2                  0   unknown/1-100032-A-02.wav   
3                  0   unknown/1-100032-A-03.wav   
4                  0   unknown/1-100032-A-04.wav   
...              ...                         ...   
10367              0  unknown/white_noise007.wav   
10368              0  unknown/white_noise008.wav   
10369              0  unknown/white_noise009.wav   
10370              0  unknown/white_noise010.wav   
10371              0  unknown/white_noise011.wav   

                                                filepath  label  
0      data/Binary_Drone_Audio/unknown/1-100032-A-00.wav      0  
1      data/Binary_Drone_Audio/unknown/1-100032-A-01.wav      0  
2      data/Binary_Drone_Audio/unknown/1-100032-A-02.wav      0  
3      data/Binary_Drone_Audio/unknown/1-100032-A-03.wav      0  
4      data/Binary_Drone_Audio/unknown/1-1000

In [54]:
DATASET_ROOT = os.path.join("")
DATASET_AUDIO_PATH = os.path.join('./Data_Train/')

In [55]:
# Copy the same number of clips from each class into DATASET_AUDIO_PATH as 16-bit PCM WAV.
# Sources are read from the 'filepath' column built above rather than the BirdCLEF input path.
n_per_class = train_metadata.groupby('primary_label').size().min()
for class_df in train_list:
    for z in range(n_per_class):
        src = class_df['filepath'].iat[z]
        dst = os.path.join(DATASET_AUDIO_PATH, str(class_df['filename'].iat[z])[:-4] + ".wav")
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # creates unknown/ and yes_drone/
        data, samplerate = sf.read(src)
        sf.write(dst, data, samplerate, subtype='PCM_16')
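The intent of the conversion cell is to take the same number of files from every class, namely the size of the smallest class. Stripped of the file I/O, the selection logic can be sketched as follows. The function name and the file names are hypothetical and only serve the illustration.

```python
from collections import defaultdict


def balance_by_class(rows):
    """Keep the first n files of every class, where n is the smallest class size.

    `rows` is a list of (label, filename) pairs.
    """
    by_class = defaultdict(list)
    for label, filename in rows:
        by_class[label].append(filename)
    n = min(len(files) for files in by_class.values())
    return {label: files[:n] for label, files in by_class.items()}


# Hypothetical file names, for illustration only.
rows = [(0, "a.wav"), (0, "b.wav"), (0, "c.wav"), (1, "d.wav"), (1, "e.wav")]
balanced = balance_by_class(rows)
print(balanced)
```

Taking the first `n` rows mirrors what the notebook does. Shuffling within each class first would be a reasonable variation if the file order carries structure.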

In [None]:
train_metadata.head()

Unnamed: 0,primary_label,filename,filepath,label
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav,0
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav,0
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav,0
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav,0
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav,0


In [None]:
# Build the converted-file path for every row (vectorized, one path per row)
train_metadata['new_filepath'] = DATASET_AUDIO_PATH + train_metadata['filename'].str[:-4] + ".wav"
train_metadata.head()

Unnamed: 0,primary_label,filename,filepath,label,new_filepath
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav,0,./Data_Train/unknown/1-100032-A-00.wav
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav,0,./Data_Train/unknown/1-100032-A-01.wav
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav,0,./Data_Train/unknown/1-100032-A-02.wav
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav,0,./Data_Train/unknown/1-100032-A-03.wav
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav,0,./Data_Train/unknown/1-100032-A-04.wav


In [None]:
filenames = train_metadata['filepath']
targets = train_metadata['label']

main_ds = tf.data.Dataset.from_tensor_slices((filenames, targets))
main_ds.element_spec

2024-04-20 12:14:19.204052: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-04-20 12:14:19.204921: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2024-04-20 12:14:19.204948: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (login-2): /proc/driver/nvidia/version does not exist
2024-04-20 12:14:19.275306: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


(TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

In [None]:
filenames, targets

(0        data/Binary_Drone_Audio/unknown/1-100032-A-00.wav
 1        data/Binary_Drone_Audio/unknown/1-100032-A-01.wav
 2        data/Binary_Drone_Audio/unknown/1-100032-A-02.wav
 3        data/Binary_Drone_Audio/unknown/1-100032-A-03.wav
 4        data/Binary_Drone_Audio/unknown/1-100032-A-04.wav
                                ...                        
 11699    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11700    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11701    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11702    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11703    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 Name: filepath, Length: 11704, dtype: object,
 0        0
 1        0
 2        0
 3        0
 4        0
         ..
 11699    1
 11700    1
 11701    1
 11702    1
 11703    1
 Name: label, Length: 11704, dtype: int64)

## Utility functions for loading audio files

In [None]:
@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
          file_contents,
          desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav

In [None]:
def load_wav_for_map(filename, label):
    return load_wav_16k_mono(filename), label
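`tf.audio.decode_wav` returns float samples scaled to the range -1.0 to 1.0 rather than raw 16-bit integers. To make that step concrete, here is a plain-Python sketch of decoding 16-bit PCM into floats. The divisor 32768 and the channel averaging are my assumptions: averaging is one common way to get mono and is not necessarily what `decode_wav` does with `desired_channels=1`.

```python
import struct


def pcm16_to_float_mono(raw_bytes, n_channels):
    """Decode little-endian 16-bit PCM bytes to mono floats in [-1.0, 1.0)."""
    n_samples = len(raw_bytes) // 2
    ints = struct.unpack("<%dh" % n_samples, raw_bytes[: n_samples * 2])
    floats = [v / 32768.0 for v in ints]
    # Average the interleaved channels of each frame to get one channel.
    return [
        sum(floats[i : i + n_channels]) / n_channels
        for i in range(0, len(floats) - n_channels + 1, n_channels)
    ]


# Two stereo frames: (16384, -16384) and (32767, 32767).
raw = struct.pack("<4h", 16384, -16384, 32767, 32767)
mono = pcm16_to_float_mono(raw, n_channels=2)
print(mono)
```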

In [None]:
main_ds

<TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.int64)>

In [None]:
main_ds = main_ds.map(load_wav_for_map)



In [None]:
main_ds.element_spec

(TensorSpec(shape=<unknown>, dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

## Loading the Model

In [None]:
yamnet_model_handle = 'https://kaggle.com/models/google/yamnet/frameworks/TensorFlow2/variations/yamnet/versions/1'
yamnet_model = hub.load(yamnet_model_handle)

In [None]:
# Apply the embedding extraction model to WAV data
def extract_embedding(wav_data, label):
    """Run YAMNet to extract embeddings from the WAV data."""
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    num_embeddings = tf.shape(embeddings)[0]
    return embeddings, tf.repeat(label, num_embeddings)

# extract embedding
main_ds = main_ds.map(extract_embedding).unbatch()
main_ds.element_spec

(TensorSpec(shape=(1024,), dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))
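YAMNet returns one 1024-dimensional embedding per audio frame, so a single clip produces several embeddings. `extract_embedding` repeats the clip's label once per frame, and `unbatch()` then flattens everything into individual (embedding, label) pairs. A plain-Python sketch of that flattening, with short strings standing in for embedding vectors (all values are hypothetical):

```python
def frames_with_labels(clips):
    """Flatten (frame_list, label) pairs into one (frame, label) pair per frame."""
    flat = []
    for frames, label in clips:
        # Same idea as tf.repeat(label, num_embeddings) followed by unbatch().
        flat.extend((frame, label) for frame in frames)
    return flat


# Hypothetical clips: strings stand in for 1024-dimensional embedding vectors.
clips = [(["a0", "a1", "a2"], 0), (["b0", "b1"], 1)]
pairs = frames_with_labels(clips)
print(pairs)
```

After this step the training examples are frames, not files, which matters when counting steps per epoch and when interpreting predictions.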

In [None]:
cached_ds = main_ds.cache()

In [None]:
# cached_ds is already cached, so a second .cache() is not needed
train_ds = cached_ds.shuffle(1000).batch(32).repeat().prefetch(tf.data.AUTOTUNE)


In [None]:
train_ds

<PrefetchDataset shapes: ((None, 1024), (None,)), types: (tf.float32, tf.int64)>

In [None]:
my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32,
                          name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(classes))
], name='my_model')

my_model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense (Dense)               (None, 512)               524800    
                                                                 
 dense_1 (Dense)             (None, 2)                 1026      
                                                                 
Total params: 525,826
Trainable params: 525,826
Non-trainable params: 0
_________________________________________________________________
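The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. A quick sketch of that arithmetic (the helper name is my own):

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


hidden = dense_params(1024, 512)  # embedding -> hidden layer
output = dense_params(512, 2)     # hidden layer -> 2 class logits
total = hidden + output
print(hidden, output, total)
```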


In [None]:
my_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 optimizer="adam",
                 metrics=['accuracy'])

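# Early stopping on the training loss; it only takes effect if passed to fit() via callbacks=[callback]
callback = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                            patience=3,
                                            restore_best_weights=True)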
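The final `Dense` layer has no activation, so the model outputs raw logits, and `from_logits=True` tells the loss to apply the softmax itself. For a single example with an integer label, the quantity being computed is `-log(softmax(logits)[label])`. Here is a plain-Python sketch of that formula (the function name is my own; Keras additionally reduces the per-example losses over the batch):

```python
import math


def sparse_ce_from_logits(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


loss_uncertain = sparse_ce_from_logits([0.0, 0.0], label=1)   # equal logits
loss_confident = sparse_ce_from_logits([5.0, -5.0], label=0)  # confident and correct
print(loss_uncertain, loss_confident)
```

With equal logits the loss equals `log(2)`, and it approaches zero as the logit of the true class dominates.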

In [None]:
STEPS_PER_EPOCH = train_metadata.shape[0] // 32

In [None]:
train_metadata.shape[0]

11704

## Training the Model

In [None]:
STEPS_PER_EPOCH

365
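`STEPS_PER_EPOCH` is derived from the number of files, but after `unbatch()` the dataset elements are embedding frames. If each clip yields more than one frame, 365 steps of 32 examples cover only part of the data per epoch (the dataset repeats, so training still runs). The sketch below shows the arithmetic; the value of `frames_per_clip` is purely an assumption for illustration, since the real count depends on clip length.

```python
n_files = 11704
batch_size = 32

# What the notebook computes: steps based on the number of files.
steps_by_files = n_files // batch_size
print(steps_by_files)

# Hypothetical: if every clip yielded 2 embedding frames, a full pass would need more steps.
frames_per_clip = 2  # assumption for illustration only
steps_by_frames = (n_files * frames_per_clip) // batch_size
print(steps_by_frames)
```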

In [None]:
train_ds

<PrefetchDataset shapes: ((None, 1024), (None,)), types: (tf.float32, tf.int64)>

In [None]:
history = my_model.fit(train_ds,
                       steps_per_epoch = STEPS_PER_EPOCH,
                       epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [56]:
import os

def list_files_in_directory(directory):
    """Returns a list of full file paths from a specified directory."""
    file_paths = [os.path.join(directory, file) for file in os.listdir(directory)
                  if os.path.isfile(os.path.join(directory, file))]
    return file_paths

# Specify the directory
directory = 'data/Binary_Drone_Audio/yes_drone'

# Get the list of files
files = list_files_in_directory(directory)
print(files)


['data/Binary_Drone_Audio/yes_drone/B_S2_D1_089-bebop_003_.wav', 'data/Binary_Drone_Audio/yes_drone/Membo_2_016-membo_004_.wav', 'data/Binary_Drone_Audio/yes_drone/Membo_2_020-membo_004_.wav', 'data/Binary_Drone_Audio/yes_drone/B_S2_D1_086-bebop_001_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_membo_32-membo_003_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_membo_15-membo_000_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_membo_37-membo_001_.wav', 'data/Binary_Drone_Audio/yes_drone/B_S2_D1_082-bebop_001_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_membo_33-membo_004_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_membo_1-membo_001_.wav', 'data/Binary_Drone_Audio/yes_drone/Membo_0_040-membo_004_.wav', 'data/Binary_Drone_Audio/yes_drone/B_S2_D1_112-bebop_004_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_51-bebop_001_.wav', 'data/Binary_Drone_Audio/yes_drone/mixed_34-bebop_000_.wav', 'data/Binary_Drone_Audio/yes_drone/B_S2_D1_129-bebop_002_.wav', 'data/Binary_Drone_Audio/yes_dr

In [57]:
# List of file paths

# Create a TensorFlow dataset from the file paths
test_ds = tf.data.Dataset.from_tensor_slices(files[3:100])

# Apply the function to load and preprocess the audio
test_ds = test_ds.map(load_wav_16k_mono)

In [58]:
test_ds

<MapDataset shapes: <unknown>, types: tf.float32>

In [59]:
# Modified extract_embedding function for prediction (no labels needed)
def extract_embedding_for_prediction(wav_data):
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    return embeddings

# Apply the function to extract embeddings
test_ds = test_ds.map(extract_embedding_for_prediction).unbatch()


In [60]:
test_ds

<_UnbatchDataset shapes: (1024,), types: tf.float32>

In [61]:
# Batch the dataset
batch_size = 32  # You can adjust the batch size according to your system's capability
test_ds = test_ds.batch(batch_size)


In [62]:
test_ds

<BatchDataset shapes: (None, 1024), types: tf.float32>

In [63]:
# Make predictions
predictions = my_model.predict(test_ds)


In [65]:
import numpy as np

def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)

# Applying softmax to the predictions array
probabilities = softmax(predictions)


In [66]:
predicted_classes = np.argmax(probabilities, axis=1)


In [None]:

# Print results
print("Probabilities:\n", probabilities)
print("Predicted Classes:", predicted_classes)
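These predictions are per frame, not per file: the test pipeline unbatches the embeddings, so the boundaries between clips are lost. One way to get a single prediction per clip is to average the frame probabilities within each clip. The sketch below assumes the number of frames per clip was recorded separately, which the notebook does not do; the function name and all numbers are hypothetical.

```python
def aggregate_clip_probabilities(frame_probs, frames_per_clip):
    """Average per-frame class probabilities within each clip and pick the argmax."""
    results, start = [], 0
    for n in frames_per_clip:
        chunk = frame_probs[start:start + n]
        n_classes = len(chunk[0])
        mean = [sum(p[c] for p in chunk) / n for c in range(n_classes)]
        results.append((mean, max(range(n_classes), key=mean.__getitem__)))
        start += n
    return results


# Hypothetical values: 3 frames from clip A, then 2 frames from clip B.
frame_probs = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7], [0.9, 0.1], [0.7, 0.3]]
clip_results = aggregate_clip_probabilities(frame_probs, frames_per_clip=[3, 2])
print(clip_results)
```

Majority voting over the per-frame classes would be an alternative; averaging probabilities keeps more information when frames disagree.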

## Conclusion

In this notebook, I've illustrated how a pre-trained model from [Kaggle Models](https://www.kaggle.com/models), [YAMNet](https://www.kaggle.com/models/google/yamnet), can be used for audio classification through transfer learning, with a reported accuracy of more than 95%. The training log above does not print per-epoch metrics and no held-out set is evaluated, so that figure is not verified by the outputs shown here.

Now, it's your turn to create some amazing transfer learning notebooks using [Kaggle Models](https://www.kaggle.com/models).

## Useful resources that helped
* https://www.kaggle.com/models/google/yamnet
* https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/audio/transfer_learning_audio.ipynb
* https://www.kaggle.com/code/asisheriberto/convert-ogg-to-wav-and-predict/notebook