# Gender Classification Task

Starting from a dataset that contains the speaker ids and the related audio tracks, we want to predict the gender of each speaker. We proceed in three steps:

1. Load all the audio files of the dataset and extract their Mel-Frequency Cepstral Coefficients (MFCCs)
2. Apply different Machine Learning algorithms to predict the gender of each speaker
3. Evaluate the performance of each algorithm

In [118]:
#Import Libraries

import os
import numpy as np
import pandas as pd
import librosa
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import svm
from sklearn.linear_model import Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Activation, Dense, Dropout, Conv2D, Flatten, MaxPooling2D

In [119]:
#Function to make all MFCCs have the same number of columns: zeros are appended to the end of each MFCC matrix until
#it has as many columns as the longest one
#This technique is also known as zero padding

def pad_to_length(x, m):
    return np.pad(x,((0, 0), (0, m - x.shape[1])), mode = 'constant')
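As a quick illustration of the zero padding described above, the sketch below pads two toy matrices with different numbers of frames to a common width. The array sizes are made up for the example, and `pad_to_length` is repeated only so that the snippet runs on its own.

```python
import numpy as np

def pad_to_length(x, m):
    # Append zero columns on the right until x has m columns (rows are untouched).
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')

# Two toy "MFCC" matrices: 3 coefficients each, with 4 and 6 frames (hypothetical sizes).
short_clip = np.ones((3, 4))
long_clip = np.ones((3, 6))

target = max(short_clip.shape[1], long_clip.shape[1])
padded = [pad_to_length(a, target) for a in (short_clip, long_clip)]

print([p.shape for p in padded])   # both matrices now have the same shape
print(padded[0][0])                # the first row of the shorter one ends in zeros
```

Padding on the right keeps the original frames in place, so the first columns of every matrix still correspond to the start of the recording.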

In [120]:
#Loading of audio files and extraction of MFCCs

dataset_path = 'LibriSpeech/dev-clean'    #definition of the path to find all the audio files

data1= {'ID SPEAKER' : [],                #initialization of the data structure that will hold the extracted data
        'MFCCs' : []
       }

max_length = 1021

for path, subdirs, filenames in os.walk(dataset_path):  #walk through all subdirectories and find audio files
    for names in filenames:
        if path != dataset_path:                        #compare the strings by value ('is not' compares identity)
            if names.endswith('.flac'):

                #the first folder below dataset_path is named after the speaker id
                speaker_id = os.path.relpath(path, dataset_path).split(os.sep)[0]
                audio_path = os.path.join(path,names)                     
                audio, sr = librosa.load(audio_path, sr=16000)            #loading audio files
                mfccs = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=40)   #MFCCs extraction from each audio file
                mfccs_pad = pad_to_length(mfccs, max_length)              #zero padding of MFCCs
                data1['ID SPEAKER'].append(speaker_id)                    #adding the found data to the data structure
                data1['MFCCs'].append(mfccs_pad)
                
#Definition of a dataframe that contains two columns: the first contains the speaker id of each audio file,
#the second contains the MFCCs extracted from that audio file
df1 = pd.DataFrame(data1, columns = ['ID SPEAKER', 'MFCCs'])    
df1

Unnamed: 0,ID SPEAKER,MFCCs
0,2412,"[[-635.7891, -635.67395, -635.6435, -635.66394..."
1,2412,"[[-637.5279, -637.41113, -637.30176, -637.3078..."
2,2412,"[[-664.281, -664.2972, -662.12994, -660.398, -..."
3,2412,"[[-649.1268, -648.9596, -648.921, -648.9577, -..."
4,2412,"[[-668.34106, -667.13873, -668.30634, -670.508..."
...,...,...
2698,2035,"[[-491.24396, -471.1516, -495.77585, -504.4165..."
2699,2035,"[[-480.8176, -482.12555, -483.3307, -483.97528..."
2700,2035,"[[-547.5311, -539.7334, -533.7997, -530.9519, ..."
2701,2035,"[[-468.6278, -405.07297, -371.62656, -407.3990..."


In [None]:
#The following lines only serve to show that max_length (the maximum number of frames among the MFCC matrices) 
#is 1021. To try them, remove max_length = 1021 from the cell above, append the unpadded mfccs instead of mfccs_pad
#(data1['MFCCs'].append(mfccs)), rerun that cell and then run this one.

frames = [df1['MFCCs'].iloc[i].shape[1] for i in range(len(df1))]   #number of frames (columns) of each MFCC matrix
max_length = max(frames)

In [121]:
#Load SPEAKERS.TXT 

#This file contains the information related to the speakers (id, gender, subset, duration of speech and name)
#The idea is to encode the gender as a binary variable, so we are going to create a second dataframe with two 
#columns: the first will contain the id of the speakers, the second the gender 
#(0 if male, 1 if female)

speakers_path = 'LibriSpeech/SPEAKERS.TXT'              #definition of the path in order to load SPEAKERS.TXT

data2 = {'ID SPEAKER': [],                              #initialization of the data structure
        'GENDER': []
        }

with open(speakers_path, 'r') as f:                     #open and read the document (the file is closed afterwards)
    document = f.readlines()

for line in document:                                   #go through each line of SPEAKERS.TXT 
    if 'dev-clean' in line:                             #we need only the lines related to the dev-clean corpus
        fields = line.split('|')                        #the fields of each line are separated by '|'
        speaker_id = fields[0].strip()                  #id of the speaker
        speaker_gender = fields[1].strip()              #gender of the speaker ('M' or 'F')
        
        data2['ID SPEAKER'].append(speaker_id)          #adding the found data to the data structure   
        data2['GENDER'].append(speaker_gender)
        
        
#Definition of the second dataframe: the gender is encoded as 0 if the speaker is male, 1 if the speaker is female
df2 = pd.DataFrame(data2, columns= ['ID SPEAKER','GENDER'])
df2['GENDER'] = df2['GENDER'].map({'M': 0, 'F': 1})
df2

Unnamed: 0,ID SPEAKER,GENDER
0,84,1
1,174,0
2,251,0
3,422,0
4,652,0
5,777,0
6,1272,0
7,1462,1
8,1673,1
9,1919,1


In [122]:
#Merge the two dataframes according to the column with the id of the speakers to find a unique dataframe with all the
#necessary data

df = pd.merge(df1, df2, on = 'ID SPEAKER')
df

Unnamed: 0,ID SPEAKER,MFCCs,GENDER
0,2412,"[[-635.7891, -635.67395, -635.6435, -635.66394...",1
1,2412,"[[-637.5279, -637.41113, -637.30176, -637.3078...",1
2,2412,"[[-664.281, -664.2972, -662.12994, -660.398, -...",1
3,2412,"[[-649.1268, -648.9596, -648.921, -648.9577, -...",1
4,2412,"[[-668.34106, -667.13873, -668.30634, -670.508...",1
...,...,...,...
2698,2035,"[[-491.24396, -471.1516, -495.77585, -504.4165...",1
2699,2035,"[[-480.8176, -482.12555, -483.3307, -483.97528...",1
2700,2035,"[[-547.5311, -539.7334, -533.7997, -530.9519, ...",1
2701,2035,"[[-468.6278, -405.07297, -371.62656, -407.3990...",1


Therefore, we have obtained a dataframe consisting of three columns in which:

* The first contains the ids of the speakers (each id is repeated once for every audio file of that speaker)
* The second contains the MFCCs extracted from each audio file
* The third contains the gender of each speaker

To build the train and test sets, we first split the dataframe by gender. Within each of the two resulting dataframes we group the rows by speaker id and randomly assign 80% of each speaker's audio files to the train set and the remaining 20% to the test set. Finally, we merge the male and female train dataframes into a single train set and do the same for the test dataframes.

NB: an audio file (with its MFCCs) that is in the train set is never in the test set. Because the split is made within each speaker, however, every speaker has files in both sets.
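The per-speaker 80/20 split can be illustrated on a toy example. The sketch below is a simplified numpy version, not the pandas code of this notebook: the speaker ids and the number of files are made up, and the seed is fixed only to make the toy split repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the toy split is repeatable

# Hypothetical speaker id for each of 20 audio files (two speakers, 10 files each).
speaker_ids = np.array(['a'] * 10 + ['b'] * 10)

train_idx, test_idx = [], []
for spk in np.unique(speaker_ids):
    rows = np.flatnonzero(speaker_ids == spk)   # row numbers of this speaker's files
    rows = rng.permutation(rows)                # shuffle them
    cut = int(round(0.8 * len(rows)))           # 80% of the files go to the train set
    train_idx.extend(rows[:cut])
    test_idx.extend(rows[cut:])

print(len(train_idx), len(test_idx))
print(set(train_idx) & set(test_idx))           # empty: no file is in both sets
```

Since the files are sampled within each speaker, both toy speakers end up with files in the train set and in the test set, while no single file is shared.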

In [123]:
df_male = df.loc[df['GENDER'] == 0]
df_female = df.loc[df['GENDER'] == 1]

#For each speaker, sample 80% of the audio files for the train set; the remaining 20% form the test set
train_male = df_male.groupby('ID SPEAKER').sample(frac=0.8)
test_male = df_male.drop(train_male.index)

train_female = df_female.groupby('ID SPEAKER').sample(frac=0.8)
test_female = df_female.drop(train_female.index)

#DataFrame.append was removed in pandas 2.0, so the parts are combined with pd.concat
train = pd.concat([train_male, train_female], ignore_index=True, sort=False)
test = pd.concat([test_male, test_female], ignore_index=True, sort=False)

Now we split the train and test sets into X, the independent variable (the MFCCs), and y, the dependent variable (the gender of the speaker), which is what we want to predict from the MFCCs.
Each MFCC matrix is a two-dimensional array (coefficients × frames), so before converting X and y into numpy arrays we average every matrix over its second dimension (the frames). Each audio file is thus represented by a vector of 40 values.
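To make the averaging step concrete, the sketch below applies `np.mean` with `axis=1` to a toy matrix. The values and the number of frames are arbitrary; only the 40 rows mirror the `n_mfcc=40` used in this notebook.

```python
import numpy as np

# Toy MFCC matrix: 40 coefficients (rows) by 5 frames (columns); values are arbitrary.
mfccs = np.arange(40 * 5, dtype=float).reshape(40, 5)

features = np.mean(mfccs, axis=1)   # average each coefficient over the frames

print(mfccs.shape, '->', features.shape)
```

Note that when the mean is taken on zero-padded matrices, as in this notebook, the padded columns are included in the average.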

In [138]:
X_train = [np.mean(train['MFCCs'].iloc[i], axis=1) for i in range(len(train))]
X_train = np.array(X_train)
X_test = [np.mean(test['MFCCs'].iloc[i], axis=1) for i in range(len(test))]
X_test = np.array(X_test)

y_train = np.array(train['GENDER'])
y_test = np.array(test['GENDER'])

## Machine Learning Algorithms

Now that we have all the necessary data, we can evaluate different machine learning algorithms to predict the gender of the speakers based on each audio file. In particular, we will evaluate the accuracy of the prediction for each algorithm and draw the necessary conclusions.

#### Naive Bayes (NB)

Naive Bayes is a simple algorithm that uses Bayes' theorem to perform classification. Its peculiarity is the assumption that each feature contributes independently to the probability of the target class; in other words, the features are treated as independent of each other given the class. It is simple and fast, which makes it a convenient choice when dealing with large amounts of data.
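The independence assumption can be seen in a small from-scratch version of Gaussian Naive Bayes. This is a sketch on toy data, not scikit-learn's implementation; the function names and the small constant added to the variance are choices made for this example.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    # For each class store its prior, per-feature mean and per-feature variance.
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (len(Xc) / len(X), Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
    return params

def predict_gaussian_nb(params, X):
    classes = sorted(params)
    scores = []
    for c in classes:
        prior, mean, var = params[c]
        # Independence assumption: the log-likelihood is a sum over the features.
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        scores.append(np.log(prior) + log_lik)
    return np.array(classes)[np.argmax(scores, axis=0)]

# Toy data: two well-separated clusters in two dimensions.
X_toy = np.array([[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0], [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = fit_gaussian_nb(X_toy, y_toy)
pred = predict_gaussian_nb(model, np.array([[0.1, 0.0], [3.1, 3.0]]))
print(pred)
```

Each class is scored by adding the log-prior to the per-feature log-likelihoods, and the class with the highest score is predicted.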

In [125]:
#Naive Bayes (NB)

nb = GaussianNB()                                              #create the NB
nb_fit = nb.fit(X_train, y_train)                              #fit the model according to X, y
y_pred_nb = nb.predict(X_test)                                 #predict the output


#Evaluation of the model
#The Confusion Matrix is a table that compares the actual labels (rows) with the labels predicted by the model (columns)
cm_nb = confusion_matrix(y_test, y_pred_nb)                              
accuracy_nb = accuracy_score(y_test, y_pred_nb)
report_nb = classification_report(y_test, y_pred_nb)

print("Naive Bayes\n")  
print("Confusion Matrix:\n", cm_nb) 
print("\nAccuracy on test set:", accuracy_nb) 
print("\nClassification Report:\n", report_nb)

Naive Bayes

Confusion Matrix:
 [[216  50]
 [  6 268]]

Accuracy on test set: 0.8962962962962963

Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.81      0.89       266
           1       0.84      0.98      0.91       274

    accuracy                           0.90       540
   macro avg       0.91      0.90      0.90       540
weighted avg       0.91      0.90      0.90       540
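The accuracy, precision and recall reported above can be recomputed from the confusion matrix alone. The sketch below copies the Naive Bayes matrix printed above and uses the scikit-learn convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix printed above: rows are true classes, columns are predicted classes.
cm = np.array([[216, 50],
               [6, 268]])

accuracy = np.trace(cm) / cm.sum()          # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)    # per class: correct / predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)       # per class: correct / actually that class

print(round(accuracy, 4))
print(np.round(precision, 2), np.round(recall, 2))
```

For example, the accuracy is (216 + 268) / 540, and the recall of class 0 is 216 / 266, which rounds to the 0.81 shown in the report.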



#### Decision Tree

The Decision Tree is an algorithm that learns a set of decision rules from the train set and uses them to predict the target class. Its name comes from the structure of these rules: a prediction starts at the root of the tree and follows one branch at each node, according to the value of a feature, until it reaches a leaf, which represents the predicted class.
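A single decision rule of such a tree can be sketched as the search for the best threshold on one feature. The example below uses Gini impurity, which is the default criterion of scikit-learn's `DecisionTreeClassifier`; the helper names and the toy data are assumptions of this sketch, and a real tree repeats this search recursively over all features.

```python
import numpy as np

def gini(labels):
    # Gini impurity of a set of labels: 0 means the set is pure.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Try a threshold between each pair of consecutive sorted values of one feature
    # and keep the one with the lowest weighted impurity of the two children.
    values = np.unique(x)
    best_thr, best_score = None, np.inf
    for lo, hi in zip(values[:-1], values[1:]):
        thr = (lo + hi) / 2
        left, right = y[x <= thr], y[x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

x_toy = np.array([1.0, 2.0, 3.0, 7.0, 8.0, 9.0])
y_toy = np.array([0, 0, 0, 1, 1, 1])
thr, score = best_split(x_toy, y_toy)
print(thr, score)
```

On this toy feature the threshold halfway between 3 and 7 separates the two classes perfectly, so both children are pure.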

In [126]:
#Decision Tree 

tree = DecisionTreeClassifier()                       #create the Decision Tree
tree_fit = tree.fit(X_train, y_train)                 #fit the model according to X, y
y_pred_tree = tree.predict(X_test)                    #predict the output


#Evaluation of the model
cm_tree = confusion_matrix(y_test, y_pred_tree)
accuracy_tree = accuracy_score(y_test, y_pred_tree)
report_tree = classification_report(y_test, y_pred_tree)

print("Decision Tree\n")
print("Confusion Matrix:\n", cm_tree) 
print("\nAccuracy on test set:", accuracy_tree) 
print("\nClassification Report:\n", report_tree)

Decision Tree

Confusion Matrix:
 [[241  25]
 [ 24 250]]

Accuracy on test set: 0.9092592592592592

Classification Report:
               precision    recall  f1-score   support

           0       0.91      0.91      0.91       266
           1       0.91      0.91      0.91       274

    accuracy                           0.91       540
   macro avg       0.91      0.91      0.91       540
weighted avg       0.91      0.91      0.91       540



#### Random Forest 

The Random Forest algorithm is an ensemble of Decision Trees that together form a forest. Each tree is trained on a random sample of the train set and makes its own prediction; the Random Forest then combines the predictions of all the trees to obtain a more accurate and more stable result than a single tree.
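The way the individual predictions are combined can be sketched with a simple majority vote. The votes below are hypothetical; note also that scikit-learn's `RandomForestClassifier` averages the class probabilities predicted by the trees rather than counting hard votes, so this is an illustration of the idea only.

```python
import numpy as np

# Hypothetical predictions of 5 trees (rows) for 4 samples (columns), classes 0/1.
tree_votes = np.array([
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 1],
])

# Fraction of trees voting for class 1; the forest predicts 1 where it exceeds one half.
share_of_ones = tree_votes.mean(axis=0)
forest_pred = (share_of_ones > 0.5).astype(int)
print(forest_pred)
```

Individual trees disagree on every sample here, yet the combined prediction follows the majority in each column.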

In [127]:
#Random Forest 

forest = RandomForestClassifier(n_estimators=20)                     #create the Random Forest
forest_fit = forest.fit(X_train, y_train)                            #fit the model according to X, y
y_pred_forest = forest.predict(X_test)                               #predict the output


#Evaluation of the model
cm_forest = confusion_matrix(y_test, y_pred_forest)
accuracy_forest = accuracy_score(y_test, y_pred_forest)
report_forest = classification_report(y_test, y_pred_forest)

print("Random Forest\n")
print("Confusion Matrix:\n", cm_forest) 
print("\nAccuracy on test set:", accuracy_forest) 
print("\nClassification Report:\n", report_forest)

Random Forest

Confusion Matrix:
 [[258   8]
 [  4 270]]

Accuracy on test set: 0.9777777777777777

Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.97      0.98       266
           1       0.97      0.99      0.98       274

    accuracy                           0.98       540
   macro avg       0.98      0.98      0.98       540
weighted avg       0.98      0.98      0.98       540



#### K-Nearest Neighbors (KNN)

The KNN algorithm stores the labelled data of the train set. To classify a new data point, it finds the K training points closest to it and assigns the class that is most common among them. K is the number of nearest neighbors taken into account in the prediction.
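A from-scratch version of this procedure fits in a few lines. The sketch below assumes Euclidean distance and integer class labels; the function name and the toy points are made up for the example.

```python
import numpy as np

def knn_predict(X_ref, y_ref, x, k=3):
    # Euclidean distance from x to every stored training point.
    dists = np.linalg.norm(X_ref - x, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest points
    votes = np.bincount(y_ref[nearest])      # count the labels among them
    return int(np.argmax(votes))             # the most frequent label wins

X_ref = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_ref = np.array([0, 0, 0, 1, 1, 1])

near_first_cluster = knn_predict(X_ref, y_ref, np.array([0.5, 0.5]))
near_second_cluster = knn_predict(X_ref, y_ref, np.array([5.5, 5.5]))
print(near_first_cluster, near_second_cluster)
```

There is no training step beyond storing the data: all the work happens when a new point is classified.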

In [128]:
#K-Nearest Neighbors (KNN)

knn = KNeighborsClassifier(n_neighbors=5)              #create the K-Nearest Neighbors
knn_fit = knn.fit(X_train, y_train)                    #fit the model according to X, y
y_pred_knn = knn.predict(X_test)                       #predict the output


#Evaluation of the model
cm_knn = confusion_matrix(y_test, y_pred_knn)
accuracy_knn = accuracy_score(y_test, y_pred_knn)
report_knn = classification_report(y_test, y_pred_knn)

print("K-Nearest Neighbour\n")
print("Confusion Matrix:\n", cm_knn) 
print("\nAccuracy on test set:", accuracy_knn) 
print("\nClassification Report:\n", report_knn)

K-Nearest Neighbour

Confusion Matrix:
 [[243  23]
 [ 22 252]]

Accuracy on test set: 0.9166666666666666

Classification Report:
               precision    recall  f1-score   support

           0       0.92      0.91      0.92       266
           1       0.92      0.92      0.92       274

    accuracy                           0.92       540
   macro avg       0.92      0.92      0.92       540
weighted avg       0.92      0.92      0.92       540



#### Logistic Regression 

Logistic Regression is used when the label is categorical, so it is well suited to gender prediction. It models the probability of a class as a logistic (sigmoid) function of a weighted sum of the independent variables, in this case the MFCCs, and predicts the class with the higher probability; the dependent variable is the gender of the speakers.
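The model can be sketched with plain gradient descent on a one-dimensional toy problem. This is not the solver scikit-learn uses; the learning rate, the number of steps and the data are assumptions chosen for the illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    # Plain gradient descent on the mean log-loss; w are the weights, b the intercept.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)              # predicted probability of class 1
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])
w, b = fit_logistic(X_toy, y_toy)
pred = (sigmoid(X_toy @ w + b) >= 0.5).astype(int)
print(pred)
```

The learned weight is positive, so the predicted probability of class 1 grows with the feature value and crosses 0.5 between the two groups of points.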

In [129]:
#Logistic Regression 

logr = LogisticRegression(max_iter=100)                #create the Logistic Regression
logr_fit = logr.fit(X_train, y_train)                  #fit the model according to X, y
y_pred_logr = logr.predict(X_test)                     #predict the output


#Evaluation of the model
cm_logr = confusion_matrix(y_test, y_pred_logr)
accuracy_logr = accuracy_score(y_test, y_pred_logr)
report_logr = classification_report(y_test, y_pred_logr)

print("Logistic Regression\n")
print("Confusion Matrix:\n", cm_logr) 
print("\nAccuracy on test set:", accuracy_logr) 
print("\nClassification Report:\n", report_logr)

Logistic Regression

Confusion Matrix:
 [[258   8]
 [  7 267]]

Accuracy on test set: 0.9722222222222222

Classification Report:
               precision    recall  f1-score   support

           0       0.97      0.97      0.97       266
           1       0.97      0.97      0.97       274

    accuracy                           0.97       540
   macro avg       0.97      0.97      0.97       540
weighted avg       0.97      0.97      0.97       540



STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


#### Support Vector Machine (SVM)

The SVM algorithm assigns each data point to one of two categories. It separates the categories with a hyperplane, chosen so that the margin, i.e. the distance between the hyperplane and the closest points of each category, is as large as possible.
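The roles of the hyperplane and of the margin can be shown with a small numeric example. The weights `w` and the offset `b` below are hypothetical values, not parameters learned by the `SVC` model of this notebook, and the sketch covers only the linear case.

```python
import numpy as np

# A hypothetical separating hyperplane w.x + b = 0 in two dimensions.
w = np.array([1.0, 1.0])
b = -3.0

points = np.array([[1.0, 1.0], [3.0, 2.0]])
side = np.sign(points @ w + b)             # -1 or +1: the side of the plane each point is on
margin_width = 2.0 / np.linalg.norm(w)     # distance between the planes w.x + b = -1 and +1

print(side, margin_width)
```

Training an SVM amounts to choosing `w` and `b` so that the points are on the correct side while this width is as large as possible.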

In [130]:
#Support Vector Machine (SVM)

sv = svm.SVC()                                         #create the Support Vector Machine
sv_fit = sv.fit(X_train, y_train)                      #fit the model according to X, y
y_pred_sv = sv.predict(X_test)                         #predict the output


#Evaluation of the model
cm_sv = confusion_matrix(y_test, y_pred_sv)
accuracy_sv = accuracy_score(y_test, y_pred_sv)
report_sv = classification_report(y_test, y_pred_sv)

print("Support Vector Machine\n")
print("Confusion Matrix:\n", cm_sv) 
print("\nAccuracy on test set:", accuracy_sv) 
print("\nClassification Report:\n", report_sv)

Support Vector Machine

Confusion Matrix:
 [[230  36]
 [ 17 257]]

Accuracy on test set: 0.9018518518518519

Classification Report:
               precision    recall  f1-score   support

           0       0.93      0.86      0.90       266
           1       0.88      0.94      0.91       274

    accuracy                           0.90       540
   macro avg       0.90      0.90      0.90       540
weighted avg       0.90      0.90      0.90       540



#### Perceptron 

A Perceptron is the building block of artificial neural networks. It is the simplest neural network because it consists of only one neuron: it computes a weighted sum of its inputs and applies a threshold to it to produce the output.
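The classic perceptron learning rule can be sketched on the logical AND function, which is linearly separable. The function name, the unit learning rate and the number of epochs are assumptions of this example.

```python
import numpy as np

def train_perceptron(X, y, epochs=10):
    # y holds 0/1 labels; the weights change only when a sample is misclassified.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0   # step function on the weighted sum
            w += (yi - pred) * xi
            b += (yi - pred)
    return w, b

X_and = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y_and = np.array([0, 0, 0, 1])                  # logical AND, linearly separable
w, b = train_perceptron(X_and, y_and)
pred = (X_and @ w + b > 0).astype(int)
print(pred)
```

Because a single neuron can only draw one linear boundary, the rule converges here but cannot fit data that is not linearly separable.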

In [142]:
#Perceptron 

pct = Perceptron()                                     #create the Perceptron
pct_fit = pct.fit(X_train, y_train)                    #fit the model according to X, y
y_pred_pct = pct.predict(X_test)                       #predict the output


#Evaluation of the model
cm_pct = confusion_matrix(y_test, y_pred_pct)
accuracy_pct = accuracy_score(y_test, y_pred_pct)
report_pct = classification_report(y_test, y_pred_pct)

print("Perceptron\n")
print("Confusion Matrix:\n", cm_pct) 
print("\nAccuracy on test set:", accuracy_pct) 
print("\nClassification Report:\n", report_pct)

Perceptron

Confusion Matrix:
 [[266   0]
 [115 159]]

Accuracy on test set: 0.7870370370370371

Classification Report:
               precision    recall  f1-score   support

           0       0.70      1.00      0.82       266
           1       1.00      0.58      0.73       274

    accuracy                           0.79       540
   macro avg       0.85      0.79      0.78       540
weighted avg       0.85      0.79      0.78       540



#### Multi-layer Perceptron (MLP)

The MLP is an extension of the Perceptron: instead of a single neuron, it consists of several layers of neurons (an input layer, one or more hidden layers and an output layer), where the outputs of one layer are the inputs of the next.
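The flow of data through the layers can be sketched as a forward pass with one hidden layer. The layer sizes, the random untrained weights and the softmax output are assumptions made for this illustration; they are not taken from the fitted `MLPClassifier`.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)      # subtract the row maximum for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical sizes: 40 input features, one hidden layer of 100 neurons, 2 output classes.
W1, b1 = rng.normal(size=(40, 100)) * 0.1, np.zeros(100)
W2, b2 = rng.normal(size=(100, 2)) * 0.1, np.zeros(2)

x = rng.normal(size=(3, 40))                  # three made-up input vectors
hidden = relu(x @ W1 + b1)                    # each hidden neuron: weighted sum + activation
probs = softmax(hidden @ W2 + b2)             # class probabilities, one row per input

print(probs.shape, probs.sum(axis=1))
```

Training would adjust `W1`, `b1`, `W2` and `b2` by backpropagation; the sketch only shows how an input is turned into class probabilities.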

In [132]:
#Multi-layer Perceptron (MLP)

mlp = MLPClassifier()                                   #create the Multi-layer Perceptron
mlp_fit = mlp.fit(X_train, y_train)                     #fit the model according to X, y
y_pred_mlp = mlp.predict(X_test)                        #predict the output


#Evaluation of the model
cm_mlp = confusion_matrix(y_test, y_pred_mlp)
accuracy_mlp = accuracy_score(y_test, y_pred_mlp)
report_mlp = classification_report(y_test, y_pred_mlp)

print("Multi-layer Perceptron\n")
print("Confusion Matrix:\n", cm_mlp) 
print("\nAccuracy on test set:", accuracy_mlp) 
print("\nClassification Report:\n", report_mlp)

Multi-layer Perceptron

Confusion Matrix:
 [[265   1]
 [  1 273]]

Accuracy on test set: 0.9962962962962963

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       266
           1       1.00      1.00      1.00       274

    accuracy                           1.00       540
   macro avg       1.00      1.00      1.00       540
weighted avg       1.00      1.00      1.00       540





### Deep Learning Model

After evaluating several Machine Learning algorithms, we want to consider a Deep Learning model. Deep Learning is a subset of Machine Learning based on neural networks with many layers, loosely inspired by the networks of neurons in the human brain, which learn their own features directly from the data.

#### Convolutional Neural Network (CNN)

The basic structure of a CNN consists of several convolutional and pooling layers followed by a fully connected layer that produces the final result. Each convolutional layer applies several filters to extract features from its input. Each pooling layer downsamples the features obtained from the convolution.

In [133]:
#Convolutional Neural Network (CNN)

X_train = [np.array(train['MFCCs'].iloc[i]) for i in range(len(train))]
X_train = np.array(X_train)
X_test = [np.array(test['MFCCs'].iloc[i]) for i in range(len(test))]
X_test = np.array(X_test)

y_train = np.array(train['GENDER'])
y_test = np.array(test['GENDER'])


#Reshape X_train and X_test to (samples, 40, 1021, 1), adding the channel dimension required by Conv2D
X_train = X_train.reshape(X_train.shape[0],40,1021,1)    
X_test = X_test.reshape(X_test.shape[0],40,1021,1)       


#Convert the class labels into one-hot encoded vectors (the categorical cross-entropy loss used below expects one
#column per class)
le = LabelEncoder()                                      
yy_train = to_categorical(le.fit_transform(y_train)) 
yy_test = to_categorical(le.transform(y_test))           #reuse the encoding fitted on the train labels
num_classes = yy_train.shape[1]


#Create the model using three convolutional layers with 32, 64 and 128 filters
#Dropout is used to reduce overfitting
model = Sequential()

model.add(Conv2D(32, 2, input_shape = (40,1021,1)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(64, 2))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))

model.add(Conv2D(128, 2))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Dropout(0.2))


#Flatten the feature maps and connect them to the fully connected output layer
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))


#Visualization of the layers created
model.summary()


#Compile the model 
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

batch_size = 32
epochs = 20

#Train the model for the given number of epochs (the test set is used here as validation data)
cnn_train = model.fit(X_train, yy_train, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(X_test, yy_test))

#Evaluate the CNN on the test set: model.evaluate returns [loss, accuracy]
accuracy_cnn = model.evaluate(X_test, yy_test, verbose=0)

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_3 (Conv2D)            (None, 39, 1020, 32)      160       
_________________________________________________________________
activation_3 (Activation)    (None, 39, 1020, 32)      0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 19, 510, 32)       0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 19, 510, 32)       0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 18, 509, 64)       8256      
_________________________________________________________________
activation_4 (Activation)    (None, 18, 509, 64)       0         
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 9, 254, 64)       
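The output shapes and parameter counts listed in the summary above can be checked by hand. The sketch below assumes the Keras defaults used by the model (no padding and stride 1 for the convolutions, pooling stride equal to the pool size); the helper functions are written for this example and are not part of Keras.

```python
def conv_out(size, kernel):
    # 'valid' convolution with stride 1 shrinks each side by kernel - 1.
    return size - kernel + 1

def pool_out(size, pool):
    # Non-overlapping max pooling keeps one value per window (floor division).
    return size // pool

def conv_params(kernel, in_channels, filters):
    # Each filter has kernel*kernel*in_channels weights plus one bias.
    return (kernel * kernel * in_channels + 1) * filters

h, w, channels = 40, 1021, 1          # input shape used in this notebook
layers = []
for filters in (32, 64, 128):
    params = conv_params(2, channels, filters)
    h, w = conv_out(h, 2), conv_out(w, 2)
    layers.append(("conv", (h, w, filters), params))
    h, w = pool_out(h, 2), pool_out(w, 2)
    layers.append(("pool", (h, w, filters), 0))
    channels = filters

for layer in layers:
    print(layer)
print("flattened size:", h * w * channels)
```

The first two convolutional layers reproduce the 160 and 8256 parameters and the shapes shown in the summary, and the same arithmetic gives the size of the vector passed to the final dense layer.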

In [143]:
print("Naive Bayes Accuracy: {:.3f}".format(accuracy_nb*100))
print("Decision Tree Accuracy: {:.3f}".format(accuracy_tree*100))
print("Random Forest Accuracy: {:.3f}".format(accuracy_forest*100))
print("K-Nearest Neighbor Accuracy: {:.3f}".format(accuracy_knn*100))
print("Logistic Regression Accuracy: {:.3f}".format(accuracy_logr*100))
print("Support Vector Machine Accuracy: {:.3f}".format(accuracy_sv*100))
print("Perceptron Accuracy: {:.3f}".format(accuracy_pct*100))
print("Multi-layer Perceptron Accuracy: {:.3f}".format(accuracy_mlp*100))
print("Convolutional Neural Network Accuracy: {:.3f}".format(accuracy_cnn[1]*100))

Naive Bayes Accuracy: 89.630
Decision Tree Accuracy: 90.926
Random Forest Accuracy: 97.778
K-Nearest Neighbor Accuracy: 91.667
Logistic Regression Accuracy: 97.222
Support Vector Machine Accuracy: 90.185
Perceptron Accuracy: 78.704
Multi-layer Perceptron Accuracy: 99.630
Convolutional Neural Network Accuracy: 99.074


## Evaluation of the performance

From the results obtained, the best accuracy is achieved by the Multi-layer Perceptron (99.630%) and by the Convolutional Neural Network (99.074%). Before discussing these two, we compare the accuracy of the other algorithms.
Starting from Naive Bayes, we observe an accuracy (89.630%) that is slightly lower than that of most other algorithms. This is probably because NB is an extremely simple classifier; its accuracy is still good, but a stronger algorithm is worth trying.
Comparing the Decision Tree and the Random Forest, the latter is more accurate (97.778% against 90.926%). This result was expected, because the Random Forest combines the predictions of multiple Decision Trees, as described above, to obtain a more precise result.
The accuracy of the Logistic Regression is very high (97.222%), as we expected, since it is well suited to categorical labels.
The K-Nearest Neighbor (91.667%) and the Support Vector Machine (90.185%) show a slightly higher accuracy than Naive Bayes, both being more complex algorithms than NB.
Comparing the Perceptron and the Multi-layer Perceptron, we notice a great improvement with the second (from 78.704% to 99.630%). This is because the MLP does not consider a single neuron, as the Perceptron does, but several layers of neurons, and is therefore able to provide a better result.
In conclusion, the Convolutional Neural Network provides an excellent prediction of the gender of the speakers, and it could probably be improved further by increasing the number of epochs or by adding more convolutional layers.