
In [4]:
import librosa
import os
import math
import json

dataset_path = "genres"
json_path = "data_json"

We first define the sample rate and the number of samples in each track. Every file is resampled to this rate when it is loaded, so each 30-second track contains the same number of samples.

In [5]:
sample_rate = 22050
samples_per_track = sample_rate*30
#Because 30 seconds is the length of each track
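
As a quick sanity check on the arithmetic, the sketch below computes how many samples fall into one segment and how many MFCC frames that should yield. It assumes the `hop_length` of 512 that `preprocess` uses by default; the helper name `segment_shape` is illustrative and not part of the notebook.

```python
import math

SAMPLE_RATE = 22050      # samples per second
TRACK_DURATION = 30      # seconds per track

def segment_shape(num_segments, hop_length=512):
    """Return (samples_per_segment, expected_mfcc_frames) for one segment."""
    samples_per_track = SAMPLE_RATE * TRACK_DURATION
    samples_per_segment = samples_per_track // num_segments
    # one MFCC vector per hop, rounded up to cover the last partial hop
    frames = math.ceil(samples_per_segment / hop_length)
    return samples_per_segment, frames

print(segment_shape(5))    # (132300, 259)
print(segment_shape(10))   # (66150, 130)
```

With 10 segments, each segment should therefore produce a 130 x 13 feature matrix when 13 MFCCs are requested.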

We then loop over every song file in each genre folder, split it into 10 segments, and extract the MFCC features of each segment.

[Link to learn about MFCC](http://practicalcryptography.com/miscellaneous/machine-learning/guide-mel-frequency-cepstral-coefficients-mfccs/)

In [39]:
def preprocess(dataset_path, json_path, num_mfcc=13, n_fft=2048, hop_length=512, num_segment=5):
  #Dictionary that collects the genre names ("mapping"), the genre index of each segment ("labels") and the MFCC features of each segment ("mfcc").
  data = {
      "mapping": [],
      "labels": [],
      "mfcc": []
  }

  #samples_per_segment = (sample_rate*30) / number of segments
  samples_per_segment = int(samples_per_track / num_segment) 
  
  #expected number of MFCC vectors per segment, rounded up
  num_mfcc_vectors_per_segment = math.ceil(samples_per_segment / hop_length)

  #Walk through the dataset folder: the first entry is the root, followed by one entry per genre folder
  for i,(dirpath,dirnames,filenames) in enumerate(os.walk(dataset_path)):

    #skip the dataset root itself; only the genre sub-folders contain tracks
    if dirpath != dataset_path:

      #the genre label is the folder name (works on any OS)
      label = os.path.basename(dirpath)
      data["mapping"].append(label)
      print("\nInside", label)

      #Going through each track within the current genre folder
      for f in filenames:
        
        #build the file path in an OS-independent way
        file_path = os.path.join(dirpath, f)

        #load the audio, resampled to sample_rate
        y, sr = librosa.load(file_path, sr = sample_rate) 

        #Cutting each song into num_segment segments
        for n in range(num_segment):

          #segment n starts at n*samples_per_segment
          start = samples_per_segment*n 

          #and ends samples_per_segment samples later
          finish  = start + samples_per_segment

          #compute the MFCCs of this segment (recent librosa versions require keyword arguments for y and sr)
          mfcc = librosa.feature.mfcc(y=y[start:finish], sr=sample_rate, n_mfcc=num_mfcc, n_fft = n_fft, hop_length = hop_length)
          mfcc = mfcc.T #shape: (frames, num_mfcc), e.g. 259*13 with 5 segments or 130*13 with 10

          #Keep the segment only if it has the expected number of MFCC vectors
          if len(mfcc) == num_mfcc_vectors_per_segment:

            #append the features and the genre index (i-1, because i=0 is the dataset root)
            data["mfcc"].append(mfcc.tolist())
            data["labels"].append(i-1)
            
            #Print out the track name and segment number
            print("Track Name :", file_path, n+1)
        
  #Dump all the processed data into the json file, opened in write mode ("w")
  with open(json_path, "w") as fp:
    json.dump(data, fp, indent = 4)

#The driving/ main function
if __name__ == "__main__":
  preprocess(dataset_path, json_path, num_segment=10)

The above script splits every track into segments, extracts the MFCC features of each segment, and dumps them into the data_json file.
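
Before training, it can help to confirm that the JSON file has the expected structure. The following is a minimal sketch, not part of the original notebook: the helper `summarize_dataset` is hypothetical, and the toy file with two placeholder genre names and all-zero features only stands in for the real output of `preprocess`.

```python
import json
import os
import tempfile

def summarize_dataset(path):
    """Load a JSON file with "mapping", "labels" and "mfcc" keys and report its shape."""
    with open(path, "r") as fp:
        data = json.load(fp)
    n_segments = len(data["mfcc"])
    # every stored segment needs exactly one label
    if n_segments != len(data["labels"]):
        raise ValueError("mfcc and labels have different lengths")
    frames = len(data["mfcc"][0]) if n_segments else 0
    coeffs = len(data["mfcc"][0][0]) if frames else 0
    return {"genres": len(data["mapping"]), "segments": n_segments,
            "frames": frames, "coefficients": coeffs}

# Toy data (made-up values): 2 genres, 1 segment each, 130 frames of 13 coefficients
toy = {"mapping": ["genre_a", "genre_b"],
       "labels": [0, 1],
       "mfcc": [[[0.0] * 13] * 130, [[0.0] * 13] * 130]}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "data_json")
    with open(toy_path, "w") as fp:
        json.dump(toy, fp)
    summary = summarize_dataset(toy_path)

print(summary)
# {'genres': 2, 'segments': 2, 'frames': 130, 'coefficients': 13}
```

Calling the same helper on the real file should report the number of genre folders found and one consistent frame count for all segments.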

**Training the model**
We use an LSTM recurrent neural network [Link to Learn about LSTM model](https://courses.cognitiveclass.ai/courses/course-v1:BigDataUniversity+ML0120EN+v2/courseware/bd64ccdf56ad4ea1afe870e26d583038/8f960392dacc48bebb8230b9efad3f8b/?activate_block_id=block-v1%3ABigDataUniversity%2BML0120EN%2Bv2%2Btype%40sequential%2Bblock%408f960392dacc48bebb8230b9efad3f8b). But before we build the model, we have to load the data into our program and split it into training and testing sets.

In [40]:
#import required modules
import tensorflow as tf
from sklearn.model_selection import train_test_split 
import numpy as np
import matplotlib.pyplot as plt


data_path = "data_json"

#Load the data
def load_data(data_path):
  print("Data Loading\n ")
  with open(data_path, "r") as fp:
    data = json.load(fp)

  #define the input features and the targets
  #the MFCC values are the inputs (independent variable):
  x = np.array(data["mfcc"])

  #the genre labels are the targets (dependent variable):
  y = np.array(data["labels"])

  #i.e. we classify the genre labels from their MFCC values
  print("Loaded data:")
  return x,y

Now that the data is loaded and labeled, we split it into training, validation and test sets.

In [41]:
def prepare_datasets(test_size, val_size):

  #load
  x, y = load_data(data_path)
  #split
  x_train, x_test, y_train, y_test = train_test_split(x,y, test_size= test_size, random_state=42)

  #split the training set further into training and validation sets:
  x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size = val_size, random_state = 42)

  #return the split dataset
  return x_train, y_train, x_test, y_test , x_val, y_val
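
Because the validation set is carved out of what remains after the test split, `val_size` is a fraction of the remaining data, not of the whole dataset. The sketch below works out the resulting set sizes. It assumes the held-out share is rounded up to a whole sample at each split, and the helper `split_sizes` and the sample count of 1000 are illustrative only.

```python
import math

def split_sizes(n_samples, test_size, val_size):
    """Sizes of the train/validation/test sets after two successive splits."""
    # first split: hold out the test set
    n_test = math.ceil(n_samples * test_size)
    n_remaining = n_samples - n_test
    # second split: carve the validation set out of what is left
    n_val = math.ceil(n_remaining * val_size)
    n_train = n_remaining - n_val
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(1000, test_size=0.25, val_size=0.2)
print(n_train, n_val, n_test)   # 600 150 250
```

With `test_size = 0.25` and `val_size = 0.2`, the data therefore ends up divided roughly 60% / 15% / 25% between training, validation and testing.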


Now we get to the model-building part. The LSTM network is created using TensorFlow. It has 4 layers: two stacked LSTM layers, one fully connected hidden layer, and a softmax output layer with one unit per genre. The following code snippet shows the network creation.

In [42]:
def build_model(input_shape):
  # A Sequential model is a plain stack of layers; the LSTM layers are what handle the sequential nature of sound.
  model = tf.keras.Sequential()

  #Two stacked LSTM layers; the first returns the full sequence so the second can consume it:
  model.add(tf.keras.layers.LSTM(64, input_shape = input_shape, return_sequences=True))
  model.add(tf.keras.layers.LSTM(64))

  #Fully connected hidden layer with ReLU activation:
  model.add(tf.keras.layers.Dense(64, activation="relu"))

  #Output layer: softmax over the 10 genres, which gives the class probabilities.
  model.add(tf.keras.layers.Dense(10, activation="softmax"))
  return model
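
To get a feel for the size of this network, the trainable parameters can be counted by hand and compared with what `model.summary()` reports. This is a rough sketch assuming 13 MFCCs per time step (the `num_mfcc` default in `preprocess`) and the standard LSTM formulation with four gates and one bias vector per gate; the helper names are illustrative.

```python
def lstm_params(input_dim, units):
    """Weights in one LSTM layer: 4 gates, each with input, recurrent and bias terms."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weights in one fully connected layer: kernel plus bias."""
    return input_dim * units + units

num_mfcc = 13  # features per time step (assumed, matches the preprocess default)
layers = [
    ("lstm_1", lstm_params(num_mfcc, 64)),   # 19968
    ("lstm_2", lstm_params(64, 64)),         # 33024
    ("dense", dense_params(64, 64)),         # 4160
    ("output", dense_params(64, 10)),        # 650
]
total = sum(count for _, count in layers)

for name, count in layers:
    print(name, count)
print("total", total)   # total 57802
```

Note that the number of time steps per segment does not appear in these formulas: an LSTM reuses the same weights at every step.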

Let us now write the driving function or the main function

In [43]:
if __name__ == "__main__":
  #call the prepare_datasets function with test_size = 0.25 and val_size = 0.2
  #(the order of the names must match the order of the returned values):
  x_train, y_train, x_test, y_test, x_val, y_val = prepare_datasets(0.25, 0.2)

  print(x_train.shape[0])
  input_shape = (x_train.shape[1], x_train.shape[2])
  model = build_model(input_shape)

  #use the Adam optimiser with a learning rate of 0.001
  optimiser = tf.keras.optimizers.Adam(learning_rate=0.001)

  #use sparse categorical cross-entropy as the loss (the labels are integer genre indices) and accuracy as the metric
  model.compile(optimizer = optimiser, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
   
  #summarize
  model.summary()

  #train the model (uncomment to run) and save it to disk
  #model.fit(x_train, y_train, validation_data=(x_val, y_val), batch_size=32, epochs=50)
  #model.save("model_RNN_LSTM.h5")
  #print("Saved model to disk")

Data Loading
 
Loaded data:


ValueError: ignored