## Transfer Learning using Kaggle Models

In this notebook, I've demonstrated how to perform audio classification using a pre-trained model from [Kaggle models](https://www.kaggle.com/models), called [yamnet](https://www.kaggle.com/models/google/yamnet).

## Imports

In [46]:
!pip install tensorflow_io==0.23.1
!pip install tensorflow==2.7.1

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [47]:
!pip install soundfile

Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com


In [2]:
import numpy as np 
import pandas as pd 
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_io as tfio
import os, random
import shutil
from pydub import AudioSegment
from glob import glob #2 List the files in a directory
from pathlib import Path
from IPython.display import display, Audio
import soundfile as sf

2024-04-20 15:30:08.667301: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2024-04-20 15:30:08.667326: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


## Load and Pre-process the dataset
For time and memory management, we'll be taking random sample of 15 birds, we'll also convert the audio files from ogg to wav because only wav can be used as input to the yamnet model.


In [3]:
ROOT = "/home/gridsan/clast/hackathon-april"
train_metadata = pd.read_csv(os.path.join(ROOT, 'metadata.csv'))[['primary_label', 'filename']]
train_metadata['filepath'] = 'data/Binary_Drone_Audio/' + train_metadata['filename']
train_metadata

Unnamed: 0,primary_label,filename,filepath
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav
...,...,...,...
11699,1,yes_drone/mixed_membo_9-membo_000_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11700,1,yes_drone/mixed_membo_9-membo_001_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11701,1,yes_drone/mixed_membo_9-membo_002_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...
11702,1,yes_drone/mixed_membo_9-membo_003_.wav,data/Binary_Drone_Audio/yes_drone/mixed_membo_...


In [4]:
#Random sample of 2 classes
classes = set(random.sample(train_metadata['primary_label'].unique().tolist(), 2)) 
print(classes)

{0, 1}


In [5]:
train_metadata = train_metadata[train_metadata.primary_label.apply(lambda x: x in classes)].reset_index(drop=True)
keys = set(train_metadata.primary_label)
values = np.arange(0, len(keys))
code_dict = dict(zip(sorted(keys), values))
train_metadata['label'] = train_metadata['primary_label'].apply(lambda x: code_dict[x])
train_metadata.head()

Unnamed: 0,primary_label,filename,filepath,label
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav,0
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav,0
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav,0
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav,0
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav,0


In [6]:
classes_df = pd.DataFrame()
classes_df = train_metadata.filter(['primary_label','label'],axis=1)
classes_df = classes_df.drop_duplicates()
classes_df.reset_index(drop=True, inplace=True)
classes_df

Unnamed: 0,primary_label,label
0,0,0
1,1,1


In [7]:
train_list = []

for x in classes_df['label']:
    train_sng_temp = train_metadata[train_metadata['label'] == x]
    train_list.append(train_sng_temp)
print(train_list[0])

       primary_label                    filename  \
0                  0   unknown/1-100032-A-00.wav   
1                  0   unknown/1-100032-A-01.wav   
2                  0   unknown/1-100032-A-02.wav   
3                  0   unknown/1-100032-A-03.wav   
4                  0   unknown/1-100032-A-04.wav   
...              ...                         ...   
10367              0  unknown/white_noise007.wav   
10368              0  unknown/white_noise008.wav   
10369              0  unknown/white_noise009.wav   
10370              0  unknown/white_noise010.wav   
10371              0  unknown/white_noise011.wav   

                                                filepath  label  
0      data/Binary_Drone_Audio/unknown/1-100032-A-00.wav      0  
1      data/Binary_Drone_Audio/unknown/1-100032-A-01.wav      0  
2      data/Binary_Drone_Audio/unknown/1-100032-A-02.wav      0  
3      data/Binary_Drone_Audio/unknown/1-100032-A-03.wav      0  
4      data/Binary_Drone_Audio/unknown/1-1000

In [8]:
DATASET_ROOT = os.path.join("")
DATASET_AUDIO_PATH = os.path.join('./Data_Train/')

In [9]:
train_metadata.head()

Unnamed: 0,primary_label,filename,filepath,label
0,0,unknown/1-100032-A-00.wav,data/Binary_Drone_Audio/unknown/1-100032-A-00.wav,0
1,0,unknown/1-100032-A-01.wav,data/Binary_Drone_Audio/unknown/1-100032-A-01.wav,0
2,0,unknown/1-100032-A-02.wav,data/Binary_Drone_Audio/unknown/1-100032-A-02.wav,0
3,0,unknown/1-100032-A-03.wav,data/Binary_Drone_Audio/unknown/1-100032-A-03.wav,0
4,0,unknown/1-100032-A-04.wav,data/Binary_Drone_Audio/unknown/1-100032-A-04.wav,0


In [10]:
filenames = train_metadata['filepath']
targets = train_metadata['label']

main_ds = tf.data.Dataset.from_tensor_slices((filenames, targets))
main_ds.element_spec

2024-04-20 15:31:55.297308: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2024-04-20 15:31:55.297342: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2024-04-20 15:31:55.297379: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (login-3): /proc/driver/nvidia/version does not exist
2024-04-20 15:31:55.297807: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.


(TensorSpec(shape=(), dtype=tf.string, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

In [11]:
filenames, targets

(0        data/Binary_Drone_Audio/unknown/1-100032-A-00.wav
 1        data/Binary_Drone_Audio/unknown/1-100032-A-01.wav
 2        data/Binary_Drone_Audio/unknown/1-100032-A-02.wav
 3        data/Binary_Drone_Audio/unknown/1-100032-A-03.wav
 4        data/Binary_Drone_Audio/unknown/1-100032-A-04.wav
                                ...                        
 11699    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11700    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11701    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11702    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 11703    data/Binary_Drone_Audio/yes_drone/mixed_membo_...
 Name: filepath, Length: 11704, dtype: object,
 0        0
 1        0
 2        0
 3        0
 4        0
         ..
 11699    1
 11700    1
 11701    1
 11702    1
 11703    1
 Name: label, Length: 11704, dtype: int64)

## Utility functions for loading audio files

In [12]:
@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
          file_contents,
          desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav

In [13]:
def load_wav_for_map(filename, label):
    return load_wav_16k_mono(filename), label

In [14]:
main_ds

<TensorSliceDataset shapes: ((), ()), types: (tf.string, tf.int64)>

In [15]:
main_ds = main_ds.map(load_wav_for_map)



In [16]:
main_ds.element_spec

(TensorSpec(shape=<unknown>, dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

## Loading the Model

In [17]:
yamnet_model_handle = 'https://kaggle.com/models/google/yamnet/frameworks/TensorFlow2/variations/yamnet/versions/1'
yamnet_model = hub.load(yamnet_model_handle)

In [18]:
# applies the embedding extraction model to a wav data
def extract_embedding(wav_data, label):
  ''' run YAMNet to extract embedding from the wav data '''
  scores, embeddings, spectrogram = yamnet_model(wav_data)
  num_embeddings = tf.shape(embeddings)[0]
  return (embeddings,
            tf.repeat(label, num_embeddings))

# extract embedding
main_ds = main_ds.map(extract_embedding).unbatch()
main_ds.element_spec

(TensorSpec(shape=(1024,), dtype=tf.float32, name=None),
 TensorSpec(shape=(), dtype=tf.int64, name=None))

In [54]:
# Specify the path where you want to save the model weights
weights_path = 'weights/my_model_weights.h5'

In [55]:
ROOT = "/home/gridsan/clast/hackathon-april"
train_metadata = pd.read_csv(os.path.join(ROOT, 'metadata.csv'))[['primary_label', 'filename']]
train_metadata['filepath'] = 'data/Binary_Drone_Audio/' + train_metadata['filename']
classes = set(random.sample(train_metadata['primary_label'].unique().tolist(), 2)) 


In [58]:
cached_ds = main_ds.cache()

In [59]:
train_ds = cached_ds.cache().shuffle(1000).batch(32).repeat().prefetch(tf.data.AUTOTUNE)


In [60]:
train_ds

<PrefetchDataset shapes: ((None, 1024), (None,)), types: (tf.float32, tf.int64)>

In [41]:
my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024), dtype=tf.float32,
                          name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(classes))
], name='my_model')

my_model.summary()

Model: "my_model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_2 (Dense)             (None, 512)               524800    
                                                                 
 dense_3 (Dense)             (None, 2)                 1026      
                                                                 
Total params: 525,826
Trainable params: 525,826
Non-trainable params: 0
_________________________________________________________________


In [42]:
my_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                 optimizer="adam",
                 metrics=['accuracy'])

callback = tf.keras.callbacks.EarlyStopping(monitor='loss',
                                            patience=3,
                                            restore_best_weights=True)

In [38]:
STEPS_PER_EPOCH = train_metadata.shape[0] // 32

In [37]:
train_metadata.shape[0]

11704

## Training the Model

In [39]:
STEPS_PER_EPOCH

365

In [35]:
train_ds

<PrefetchDataset shapes: ((None, 1024), (None,)), types: (tf.float32, tf.int64)>

In [43]:
history = my_model.fit(train_ds,
                       steps_per_epoch = STEPS_PER_EPOCH,
                       epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

In [44]:
def list_files_in_directory(directory):
    """Returns a list of full file paths from a specified directory."""
    file_paths = [os.path.join(directory, file) for file in os.listdir(directory)
                  if os.path.isfile(os.path.join(directory, file))]
    return file_paths

# Specify the directory
drone_directory = 'data/test_data'
unknown_directory = 'data/Binary_Drone_Audio/unknown'

# Get the list of files
drone_files = list_files_in_directory(drone_directory)
unknown_files = list_files_in_directory(unknown_directory)

files = [] 
files.extend(drone_files)
files.extend(unknown_files)
print(files)


['data/test_data/DRONE_018.wav', 'data/test_data/DRONE_007.wav', 'data/test_data/DRONE_016.wav', 'data/test_data/DRONE_009.wav', 'data/test_data/DRONE_002.wav', 'data/test_data/DRONE_013.wav', 'data/test_data/DRONE_003.wav', 'data/test_data/DRONE_014.wav', 'data/test_data/DRONE_015.wav', 'data/test_data/DRONE_025.wav', 'data/test_data/DRONE_022.wav', 'data/test_data/DRONE_005.wav', 'data/test_data/DRONE_030.wav', 'data/test_data/DRONE_023.wav', 'data/test_data/DRONE_021.wav', 'data/test_data/DRONE_020.wav', 'data/test_data/DRONE_017.wav', 'data/test_data/DRONE_008.wav', 'data/test_data/DRONE_024.wav', 'data/test_data/DRONE_027.wav', 'data/test_data/DRONE_012.wav', 'data/test_data/DRONE_006.wav', 'data/test_data/DRONE_028.wav', 'data/test_data/DRONE_029.wav', 'data/test_data/DRONE_026.wav', 'data/test_data/DRONE_004.wav', 'data/test_data/DRONE_019.wav', 'data/test_data/DRONE_011.wav', 'data/test_data/DRONE_010.wav', 'data/test_data/DRONE_001.wav', 'data/Binary_Drone_Audio/unknown/2-1093

In [45]:
print(len(drone_files))
print(len(unknown_files))

30
10372


In [46]:
# List of file paths

# Create a TensorFlow dataset from the file paths
test_ds = tf.data.Dataset.from_tensor_slices(files[0:100])

# Apply the function to load and preprocess the audio
test_ds = test_ds.map(load_wav_16k_mono)

In [88]:
test_ds

<MapDataset shapes: <unknown>, types: tf.float32>

In [47]:
# Modified extract_embedding function for prediction (no labels needed)
def extract_embedding_for_prediction(wav_data):
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    return embeddings

# Apply the function to extract embeddings
test_ds = test_ds.map(extract_embedding_for_prediction).unbatch()


In [106]:
test_ds

<_UnbatchDataset shapes: (1024,), types: tf.float32>

In [48]:
# Batch the dataset
batch_size = 32  # You can adjust the batch size according to your system's capability
test_ds = test_ds.batch(batch_size)


In [28]:
test_ds

<BatchDataset shapes: (None, 1024), types: tf.float32>

In [49]:
# Make predictions
predictions = my_model.predict(test_ds)


In [50]:
print(len(predictions))


745


In [51]:

def softmax(x):
    e_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return e_x / e_x.sum(axis=1, keepdims=True)

# Applying softmax to the predictions array
probabilities = softmax(predictions)


In [52]:
predicted_classes = np.argmax(probabilities, axis=1)


In [53]:

# Print results
print("Probabilities:\n", probabilities)
print("Predicted Classes:", predicted_classes)

Probabilities:
 [[3.2867727e-13 1.0000000e+00]
 [6.0381335e-09 1.0000000e+00]
 [2.5595642e-14 1.0000000e+00]
 ...
 [1.0000000e+00 3.4318128e-09]
 [1.0000000e+00 3.2546690e-10]
 [9.9993706e-01 6.2904553e-05]]
Predicted Classes: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
 0 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0
 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 0
 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
 1 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 0 0 0 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 

In [97]:
# Specify the path where you want to save the model weights
weights_path = 'weights/my_model_weights.h5'

# Save the weights
my_model.save_weights(weights_path)


In [98]:
# Rebuild the model architecture
model_reloaded = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024), dtype=tf.float32, name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(len(classes))  # Make sure `classes` is defined or replace `len(classes)` with the actual number
], name='my_reloaded_model')

# Load the weights
model_reloaded.load_weights(weights_path)

# If your model requires compiling to make predictions, compile the model
model_reloaded.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                       optimizer="adam",
                       metrics=['accuracy'])

# Now the model is ready to be used for predictions or evaluation


In [99]:
model_reloaded.predict(test_ds)

array([[-1.04249220e+01,  1.11071348e+01],
       [-7.76161528e+00,  8.42612934e+00],
       [-2.97999096e+01,  3.12642365e+01],
       [-1.42362499e+01,  1.50345612e+01],
       [-1.37223015e+01,  1.44489946e+01],
       [-3.17082329e+01,  3.24270935e+01],
       [-1.13672628e+01,  1.22905121e+01],
       [-1.02168550e+01,  1.18112144e+01],
       [-3.37094078e+01,  3.35786285e+01],
       [-2.30033302e+01,  2.33618908e+01],
       [-1.85081654e+01,  1.89501743e+01],
       [-1.15651894e+01,  1.18442163e+01],
       [-4.61016321e+00,  5.39763594e+00],
       [-3.53773155e+01,  3.80335159e+01],
       [-2.33605404e+01,  2.41827259e+01],
       [-1.32813873e+01,  1.42397614e+01],
       [-9.52165222e+00,  1.04242039e+01],
       [-1.31337137e+01,  1.37023087e+01],
       [-3.53678894e+01,  3.79014549e+01],
       [-1.71434898e+01,  1.82271042e+01],
       [-1.26559658e+01,  1.32701159e+01],
       [-3.49149590e+01,  3.69632378e+01],
       [-2.79181042e+01,  2.89041939e+01],
       [-2.

## Conclusion

Here in this notebook, I've illustrated how [Kaggle models](https://www.kaggle.com/models) can be used to perform audio classification using a pre-trained model, called [yamnet](https://www.kaggle.com/models/google/yamnet), with an accuracy of more than 95%.

Now, it's your turn to create some amazing transfer learning notebooks using [Kaggle Models](https://www.kaggle.com/models)

## Useful resources which helped 
* https://www.kaggle.com/models/google/yamnet
* https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/audio/transfer_learning_audio.ipynb
* https://www.kaggle.com/code/asisheriberto/convert-ogg-to-wav-and-predict/notebook