# LSTM Model for three languages
This part covers my attempts at using LSTMs on three languages. The data set is the same one I used for 5 languages, except that I removed two languages from the CSV file where everything was saved, leaving only 3. I tried several models with varying effectiveness; the details are given later.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import librosa
import os
import csv
# Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from keras.utils import np_utils

import tensorflow as tf
import keras
# Import from the public tensorflow.keras API rather than the internal tensorflow_core package
from tensorflow.keras import Model
from tensorflow.keras.layers import Input, Dense, GRU, LSTM, Dropout, Bidirectional
from tensorflow.keras import optimizers
from tensorflow.keras.models import load_model

Using TensorFlow backend.


The following code snippets read the data set and drop the filename column, leaving only the feature columns and the label for later use.

In [17]:
data = pd.read_csv('NEWts3lang.csv')
data.head()

Unnamed: 0,filename,seq1,seq2,seq3,seq4,seq5,seq6,seq7,seq8,seq9,...,seq79,seq80,seq81,seq82,seq83,seq84,seq85,seq86,seq87,label
0,common_voice_tr_17341269.mp3,-28.671259,-28.671259,-28.671259,-28.594355,-23.302088,-21.683058,-22.529316,-20.512342,-19.944502,...,-8.218327,-7.162225,-7.152306,-7.921572,-8.872492,-10.029106,-8.932535,-8.830759,-8.814616,Turkish
1,common_voice_tr_17341270.mp3,-18.675406,-12.77542,-10.191453,-9.995229,-9.822378,-10.787068,-11.464541,-11.791995,-12.880933,...,-5.941443,-5.437989,-4.313529,-3.59451,-4.578447,-4.841236,-6.277249,-7.31436,-6.984949,Turkish
2,common_voice_tr_17341271.mp3,-29.168598,-28.44541,-25.697535,-21.021667,-18.727261,-19.127209,-20.424198,-18.682182,-16.259388,...,-6.50146,-4.253098,-3.48328,-3.627386,-5.621589,-7.772541,-10.60848,-10.832165,-9.111898,Turkish
3,common_voice_tr_17341278.mp3,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-34.657932,-32.157036,...,-35.465096,-34.565926,-34.064175,-25.434107,-18.991602,-16.503279,-16.020468,-17.432213,-17.578808,Turkish
4,common_voice_tr_17341279.mp3,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,...,-21.73634,-19.915949,-13.989322,-12.898703,-16.812244,-17.894398,-15.678299,-14.883287,-15.698987,Turkish


In [18]:
# Dropping unnecessary columns
data = data.drop(['filename'], axis=1)

In [19]:
data.head()

Unnamed: 0,seq1,seq2,seq3,seq4,seq5,seq6,seq7,seq8,seq9,seq10,...,seq79,seq80,seq81,seq82,seq83,seq84,seq85,seq86,seq87,label
0,-28.671259,-28.671259,-28.671259,-28.594355,-23.302088,-21.683058,-22.529316,-20.512342,-19.944502,-19.473763,...,-8.218327,-7.162225,-7.152306,-7.921572,-8.872492,-10.029106,-8.932535,-8.830759,-8.814616,Turkish
1,-18.675406,-12.77542,-10.191453,-9.995229,-9.822378,-10.787068,-11.464541,-11.791995,-12.880933,-12.484132,...,-5.941443,-5.437989,-4.313529,-3.59451,-4.578447,-4.841236,-6.277249,-7.31436,-6.984949,Turkish
2,-29.168598,-28.44541,-25.697535,-21.021667,-18.727261,-19.127209,-20.424198,-18.682182,-16.259388,-15.557657,...,-6.50146,-4.253098,-3.48328,-3.627386,-5.621589,-7.772541,-10.60848,-10.832165,-9.111898,Turkish
3,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-35.198715,-34.657932,-32.157036,-28.223465,...,-35.465096,-34.565926,-34.064175,-25.434107,-18.991602,-16.503279,-16.020468,-17.432213,-17.578808,Turkish
4,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,-32.834057,...,-21.73634,-19.915949,-13.989322,-12.898703,-16.812244,-17.894398,-15.678299,-14.883287,-15.698987,Turkish


The code below encodes the labels as integers, one per language, so that the classification algorithm can work with them. The encoder assigns the integers in alphabetical order of the language names.
After this, the integer labels are one-hot encoded, which is the form expected by the categorical cross-entropy loss function: each label becomes a vector of three values, with 1 at the position of the sample's language and 0 elsewhere.

In [20]:
language_list = data.iloc[:, -1]
language_list

0       Turkish
1       Turkish
2       Turkish
3       Turkish
4       Turkish
         ...   
8995    Swedish
8996    Swedish
8997    Swedish
8998    Swedish
8999    Swedish
Name: label, Length: 9000, dtype: object

In [21]:
encoder = LabelEncoder()
y = encoder.fit_transform(language_list)
print(y)

[2 2 2 ... 1 1 1]


In [22]:
# y is already integer-encoded, so it can be one-hot encoded directly.
# Fitting a second LabelEncoder on y would change nothing and would
# overwrite the encoder that still holds the language names.
dummy_y = np_utils.to_categorical(y)
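
The encoding described above can be reproduced in plain NumPy, which makes the alphabetical ordering and the one-hot layout easy to inspect. This is only a minimal sketch, not the code used in this notebook: `np.unique` and `np.eye` stand in for `LabelEncoder` and `to_categorical`, and "OtherLanguage" is a placeholder, since the third language does not appear in the output shown here.

```python
import numpy as np

# "OtherLanguage" is a placeholder for the third language (an assumption).
labels = np.array(["Turkish", "Turkish", "OtherLanguage", "Swedish", "Swedish"])

# np.unique returns the sorted class names and, for each sample,
# the index of its class in that sorted list.
classes, y = np.unique(labels, return_inverse=True)

# One-hot encode: row i of the identity matrix is the vector for class i.
one_hot = np.eye(len(classes), dtype="float32")[y]

# argmax undoes the one-hot encoding and recovers the language names.
decoded = classes[one_hot.argmax(axis=1)]

print(classes)
print(y)
print(one_hot)
```

Because the classes are sorted, Turkish is mapped to 2 and Swedish to 1, which agrees with the printed `y` in the cell above.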

The next code standardizes the features in X and then splits the features and labels into training and test sets. A test fraction of 0.2 (an 80/20 split) is used, as it is a common choice.

In [24]:
scaler = StandardScaler()
X = scaler.fit_transform(np.array(data.iloc[:, :-1], dtype = float))

In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, dummy_y, test_size=0.2)
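
One variation worth considering is to split first and compute the scaling statistics on the training rows only, so that no information from the test set influences the preprocessing. The sketch below illustrates the idea with NumPy alone, on synthetic data. It is a hand-rolled stand-in for `train_test_split` and `StandardScaler`, not the code used in this notebook, and the array sizes and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the split is reproducible

# Synthetic stand-ins for the 87 feature columns and the integer labels.
X = rng.normal(loc=-20.0, scale=8.0, size=(100, 87))
y = rng.integers(0, 3, size=100)

# Shuffle the row indices, then hold out 20% of them as the test set.
idx = rng.permutation(len(X))
n_test = int(0.2 * len(X))
test_idx, train_idx = idx[:n_test], idx[n_test:]
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Standardize every column using the mean and standard deviation
# of the training rows only, then apply the same shift and scale to the test rows.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

print(X_train.shape, X_test.shape)
```

With scikit-learn the same effect comes from calling `fit` on the training split and `transform` on both splits.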

The next step reshapes the data so that it can be passed through the LSTM, which expects input of shape (samples, time steps, features). The first number is the number of samples, the second is the number of time steps, and the third is 1, because each sample is a single feature sequence with 87 time steps.

In [27]:
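# Add a trailing feature axis: (samples, 87 time steps, 1 feature).
# Using -1 lets NumPy infer the number of samples instead of hard-coding it.
X_train = X_train.reshape(-1, 87, 1)
X_test = X_test.reshape(-1, 87, 1)
# The one-hot labels already have shape (samples, 3); these reshapes only confirm it.
y_train = y_train.reshape(-1, 3)
y_test = y_test.reshape(-1, 3)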
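
To see what this reshape does to the values, here is a small sketch on a toy array (4 samples instead of the real data, an arbitrary choice for illustration). It only adds a trailing axis; no value moves relative to its sample or time step.

```python
import numpy as np

# Toy stand-in: 4 samples, 87 time steps each.
X = np.arange(4 * 87, dtype=float).reshape(4, 87)

# -1 asks NumPy to infer the number of samples from the array size.
X_seq = X.reshape(-1, 87, 1)

# np.expand_dims is an equivalent way to add the trailing feature axis.
X_alt = np.expand_dims(X, axis=-1)

print(X.shape, X_seq.shape)
```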

Now, the model.
The final model, with the parameters I settled on, is given below; it reached around 77% accuracy on the three-language test data. Before that, I summarize the other models I tried.

## Attempt 1: With mean of MFCCs
This attempt used data with the wrong shape. It still appeared to reach high accuracy, even on the test data, but it failed on further test data.

| LSTM layers | Dense layers | Dropout values | Train accuracy | Test accuracy |
| :- | :- | :- | :- | :- |
| Two LSTM layers, 20 and 10 units | Two Dense layers, 100 units and final layer with 3 units | 0.2 | 0.82 | 0.79 (but only 0.32 on further test data) |

## Attempt 2: With data size 50; taken in parts from samples
This attempt used correctly shaped data, but may not have drawn enough samples.
(Wherever not mentioned, tanh activation is used.)

| LSTM layers | Dense layers | Dropout values | Train accuracy | Test accuracy |
| :- | :- | :- | :- | :- |
| One bidirectional layer, 100 units | Two Dense layers, 100 units and final layer with 3 units | 0.3 | 0.75 | 0.66 |
| One bidirectional layer, 50 units | Two Dense layers, 100 units and final layer with 3 units | 0.3 | 0.69 | 0.66 |
| One bidirectional layer, 50 units; one bidirectional layer, 25 units | Two Dense layers, 100 units (relu) and final layer with 3 units | 0.3 | 0.73 | 0.7 |
| One bidirectional layer, 50 units; one bidirectional layer, 25 units | Two Dense layers, 200 units and final layer with 3 units | 0.3 | 0.72 | 0.69 |

At this point I decided to switch to the other data set, as I suspected that the samples were too short and/or that too few audio files had been sampled.

## Attempt 3: With data size 87; taken in parts from samples

| LSTM layers | Dense layers | Dropout values | Train accuracy | Test accuracy |
| :- | :- | :- | :- | :- |
| One bidirectional layer, 87 units; one bidirectional layer, 43 units | Two Dense layers, 100 units (relu) and final layer with 3 units | 0.3 | 0.77 | 0.73 |
| One bidirectional layer, 87 units; one bidirectional layer, 43 units | Two Dense layers, 100 units (relu) and final layer with 3 units | 0.4 | 0.79 | 0.76 |
| One bidirectional layer, 87 units; one bidirectional layer, 43 units | Two Dense layers, 100 units (relu) and final layer with 3 units | 0.4 (recurrent dropout 0.2) | 0.77 | 0.78 |

These are all the models I attempted, with their results. The code below builds the final model, which corresponds to the last row of Attempt 3.

In [35]:
# Adam optimizer with learning-rate decay
optimizer = optimizers.Adam(decay=1e-4)
# Input: 87 time steps with 1 feature each
main_input = Input(shape=(87,1), name='main_input')
# First bidirectional LSTM returns the full sequence so that a second LSTM can be stacked on it
layer1 = Bidirectional(LSTM(87, return_sequences=True, name='layer1', recurrent_dropout=0.2))(main_input)
layer2 = Dropout(0.4)(layer1)
# Second bidirectional LSTM returns only its final output
layer3 = Bidirectional(LSTM(43, return_sequences=False, name='layer2', recurrent_dropout=0.2))(layer2)
layer4 = Dropout(0.4)(layer3)
layer5 = Dense(100, activation='relu', name='layer3')(layer4)
layer6 = Dropout(0.4)(layer5)
# Softmax over the 3 languages
rnn_output = Dense(3, activation='softmax', name='rnn_output')(layer6)

model = Model(inputs=main_input, outputs=rnn_output)
print('\nCompiling model...')
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=['acc'])
model.summary()
# Note: the test set doubles as the validation set during training
history = model.fit(X_train, y_train, batch_size=32, epochs=70,
                    validation_data=(X_test, y_test), shuffle=True, verbose=1)



Compiling model...
Model: "model_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
main_input (InputLayer)      [(None, 87, 1)]           0         
_________________________________________________________________
bidirectional_16 (Bidirectio (None, 87, 174)           61944     
_________________________________________________________________
dropout_24 (Dropout)         (None, 87, 174)           0         
_________________________________________________________________
bidirectional_17 (Bidirectio (None, 86)                74992     
_________________________________________________________________
dropout_25 (Dropout)         (None, 86)                0         
_________________________________________________________________
layer3 (Dense)               (None, 100)               8700      
_________________________________________________________________
dropout_26 (Dropout)         (None, 100

KeyboardInterrupt: 
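
The parameter counts visible in the truncated summary above can be checked by hand. The sketch below assumes the standard LSTM layout of four gates, each with input weights, recurrent weights and a bias; the helper names are my own and are not part of Keras.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with (input_dim + units) weights per unit plus one bias per unit
    return 4 * units * (input_dim + units + 1)


def bidirectional_lstm_params(input_dim, units):
    # A bidirectional wrapper holds two independent LSTMs (forward and backward)
    return 2 * lstm_params(input_dim, units)


def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output unit
    return input_dim * units + units


# First bidirectional layer: 1 input feature, 87 units per direction
print(bidirectional_lstm_params(1, 87))     # 61944, matches bidirectional_16

# Second bidirectional layer: 174 inputs (2 x 87), 43 units per direction
print(bidirectional_lstm_params(174, 43))   # 74992, matches bidirectional_17

# Dense layer: 86 inputs (2 x 43), 100 units
print(dense_params(86, 100))                # 8700, matches layer3
```

The output widths in the summary follow the same logic: each bidirectional layer concatenates both directions, giving 2 x 87 = 174 and 2 x 43 = 86.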

In [52]:
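# Save the architecture as JSON and the weights as HDF5
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

model.save_weights("model.h5")
print("Saved model to disk")

# Also save the complete model in a single directory
model.save('LSTM.model')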

Saved model to disk


INFO:tensorflow:Assets written to: LSTM.model\assets


I used this final code snippet to plot the training history and judge from the resulting curves how well the model is training.

In [2]:
print(history.history.keys())
plt.figure(1)
# Training and validation curves for both loss and accuracy share one set of axes
plt.plot(history.history['loss'])
plt.plot(history.history['acc'])
plt.plot(history.history['val_loss'])
plt.plot(history.history['val_acc'])
plt.title('model loss and accuracy')
plt.ylabel('value')
plt.xlabel('epoch')
plt.legend(['loss', 'acc', 'val_loss', 'val_acc'], loc='upper left')

NameError: name 'history' is not defined