# <img src="https://img.icons8.com/bubbles/50/000000/mind-map.png" style="height:50px;display:inline"> ECE 046211 - Technion - Deep Learning
---

## Project
---

This notebook runs the best fine-tuned AST model's genre estimation on every music file in the directory `music_dir` defined below.
`jamendo_music_samples` is provided as an example directory.
The estimation algorithm:
 * Randomly sample chunks of the song.
 * Compute the model's most likely class for each chunk.
 * Take the majority class across the chunks as the final estimate.
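
The sampling and voting steps can be sketched with NumPy alone. This is a minimal illustration, not the notebook's implementation: the helper names `sample_chunks` and `majority_vote`, the toy sampling rate of 1000 Hz, and the hard-coded chunk predictions are all assumptions made for the example, and the model call is replaced by a fixed array of class ids.

```python
import numpy as np

def sample_chunks(signal, chunk_len, n_chunks, rng):
    """Return n_chunks random windows of chunk_len samples from a 1-D signal."""
    if signal.shape[0] < chunk_len:
        raise ValueError("signal is shorter than one chunk")
    # Draw start indices uniformly over every position where a full window fits.
    starts = rng.integers(0, signal.shape[0] - chunk_len + 1, size=n_chunks)
    return np.stack([signal[s:s + chunk_len] for s in starts])

def majority_vote(class_ids):
    """Return the most frequent class id (the lowest id wins a tie)."""
    return int(np.argmax(np.bincount(class_ids)))

rng = np.random.default_rng(0)
toy_rate = 1000                                # hypothetical sampling rate, in Hz
signal = rng.standard_normal(toy_rate * 40)    # 40 s of noise standing in for a song
chunks = sample_chunks(signal, chunk_len=toy_rate * 30, n_chunks=20, rng=rng)

# Stand-in for the per-chunk model predictions (hypothetical class ids).
chunk_predictions = np.array([2, 2, 5, 2, 7, 2, 5, 2])
winner = majority_vote(chunk_predictions)
```

Because the chunks are 30 s long and drawn independently, they overlap heavily on short songs, so the votes are not independent observations.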

In [60]:
music_dir="jamendo_music_samples"

Install packages that are not part of the basic virtual environment defined in the README.

Import relevant packages

In [62]:
from datasets import Dataset,Audio
import numpy as np
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
import torch
import os
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

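Load the best fine-tuned AST model from the local `Best_Model` directory, together with the feature extractor of the base AST checkpoint.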

In [63]:
model_id = "MIT/ast-finetuned-audioset-10-10-0.4593"
model = AutoModelForAudioClassification.from_pretrained('Best_Model').to(device)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
sampling_rate = feature_extractor.sampling_rate

Estimate the music genre of each file using the loaded model.

In [64]:
# Create a list of the files in the music_dir directory.
song_files = [os.path.join(music_dir, music_file) for music_file in os.listdir(music_dir)]
# Chunk-sampling settings.
max_duration = 30.0  # chunk length in seconds
samples = 20         # number of random chunks per song
length = int(sampling_rate * max_duration)
for song_file in song_files:
    # Load the song and resample it to the sampling rate the feature extractor expects.
    audio = Dataset.from_dict({"audio": [song_file]}).cast_column("audio", Audio(sampling_rate=sampling_rate))
    waveform = audio[0]["audio"]["array"]
    # Skip files that are too short to yield a full chunk.
    if waveform.shape[0] < length:
        print(f"Skipping {song_file}: the music file must be at least {max_duration:.0f} sec long.")
        continue
    # Sample random chunks of the song; `end` may reach the last sample (randint's upper bound is exclusive).
    dataset = []
    for _ in range(samples):
        end = np.random.randint(length, waveform.shape[0] + 1)
        dataset.append(waveform[(end - length):end])
    # Compute the spectrogram of each chunk and preprocess it.
    inputs = feature_extractor(dataset, sampling_rate=sampling_rate, return_tensors="pt").to(device)
    # Compute the model's response to the sampled chunks.
    with torch.no_grad():
        logits = model(**inputs).logits
    # Compute each chunk's most likely class.
    probs = torch.softmax(logits, dim=1).cpu().numpy()
    predicted_classes_ids = probs.argmax(axis=1)
    # Compute the majority class.
    majority_vote_ids = int(np.argmax(np.bincount(predicted_classes_ids)))
    majority_vote_class = model.config.id2label[majority_vote_ids]
    # Average the majority-class probability over the chunks that voted for the majority class.
    avg_confidence = probs[predicted_classes_ids == majority_vote_ids, majority_vote_ids].mean() * 100
    print(f"The estimated genre of the song {os.path.splitext(os.path.basename(song_file))[0]} is {majority_vote_class} with average confidence {avg_confidence:.0f}%")

The estimated genre of the song Blues_Burning_You is blues with average confidence 28%
The estimated genre of the song Classical_Clair_De_Lune is classical with average confidence 39%
The estimated genre of the song Country_Golden_Standard is country with average confidence 57%
The estimated genre of the song Disco_Magenta_Six is pop with average confidence 48%
The estimated genre of the song Hiphop_Royalty is pop with average confidence 30%
The estimated genre of the song Jazz_For_the_Fifth is jazz with average confidence 25%
The estimated genre of the song Metal_After_Us is metal with average confidence 48%
The estimated genre of the song Pop_Love_You_Anymore is pop with average confidence 81%
The estimated genre of the song Reggae_The_River is pop with average confidence 44%
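
The confidence figure printed above is the mean softmax probability of the majority class, taken only over the chunks that voted for it, so dissenting chunks do not lower it. The sketch below reproduces that calculation in NumPy on made-up logits; the function name `vote_with_confidence`, the toy logits, and the extra returned count of agreeing chunks are assumptions added for illustration and are not part of the notebook.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def vote_with_confidence(logits):
    """Return (majority class id, mean probability among agreeing chunks, number of agreeing chunks)."""
    probs = softmax(logits)
    predicted = probs.argmax(axis=1)
    winner = int(np.argmax(np.bincount(predicted)))
    agreeing = predicted == winner
    confidence = float(probs[agreeing, winner].mean())
    return winner, confidence, int(agreeing.sum())

# Three hypothetical chunks over three classes: two vote for class 0, one for class 1.
toy_logits = np.array([[2.0, 0.0, 0.0],
                       [1.0, 0.0, 0.0],
                       [0.0, 3.0, 0.0]])
winner, confidence, n_agree = vote_with_confidence(toy_logits)
print(f"class {winner}, confidence {confidence * 100:.0f}%, {n_agree} of {len(toy_logits)} chunks agree")
```

Reporting the number of agreeing chunks next to the confidence makes it possible to tell a narrow plurality from a near-unanimous vote, which the averaged probability alone does not show.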
