# SVM Classifier for Voice Classification

# Data Set: Voice Data Set

This dataset was created to identify a voice as male or female based on acoustic properties of the voice and speech. It consists of 3,168 recorded voice samples, pre-processed by acoustic analysis in R using the seewave and tuneR packages.

The following acoustic properties of each voice are measured and included within the CSV:

<ul>
<li><strong>meanfreq</strong>: mean frequency (in kHz)</li>
<li><strong>sd</strong>: standard deviation of frequency</li>
<li><strong>median</strong>: median frequency (in kHz)</li>
<li><strong>Q25</strong>: first quartile (in kHz)</li>
<li><strong>Q75</strong>: third quartile (in kHz)</li>
<li><strong>IQR</strong>: interquartile range (in kHz)</li>
<li><strong>skew</strong>: skewness (see note in specprop description)</li>
<li><strong>kurt</strong>: kurtosis (see note in specprop description)</li>
<li><strong>sp.ent</strong>: spectral entropy</li>
<li><strong>sfm</strong>: spectral flatness</li>
<li><strong>mode</strong>: mode frequency</li>
<li><strong>centroid</strong>: frequency centroid (see specprop)</li>
<li><strong>peakf</strong>: peak frequency (frequency with highest energy)</li>
<li><strong>meanfun</strong>: average of fundamental frequency measured across acoustic signal</li>
<li><strong>minfun</strong>: minimum fundamental frequency measured across acoustic signal</li>
<li><strong>maxfun</strong>: maximum fundamental frequency measured across acoustic signal</li>
<li><strong>meandom</strong>: average of dominant frequency measured across acoustic signal</li>
<li><strong>mindom</strong>: minimum of dominant frequency measured across acoustic signal</li>
<li><strong>maxdom</strong>: maximum of dominant frequency measured across acoustic signal</li>
<li><strong>dfrange</strong>: range of dominant frequency measured across acoustic signal</li>
<li><strong>modindx</strong>: modulation index. Calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range</li>
<li><strong>label</strong>: male or female</li>
</ul>
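
The modindx definition above is easier to follow with a small worked example. The sketch below is only an illustration in plain Python: the function name `modulation_index`, the sample values, and the reading of "frequency range" as the max minus min of the same fundamental-frequency series are all assumptions, not taken from the original R pre-processing.

```python
def modulation_index(fundamentals):
    """Accumulated absolute difference between adjacent values, divided by the range."""
    if len(fundamentals) < 2:
        return 0.0  # assumption: fewer than two measurements means no modulation
    freq_range = max(fundamentals) - min(fundamentals)
    if freq_range == 0:
        return 0.0  # assumption: a perfectly flat contour has no modulation
    total_change = sum(abs(b - a) for a, b in zip(fundamentals, fundamentals[1:]))
    return total_change / freq_range


# Hypothetical fundamental-frequency track in kHz (illustrative values only)
f0_track = [0.10, 0.12, 0.11, 0.15, 0.13]
modindx = modulation_index(f0_track)
print(round(modindx, 3))
```

Here the adjacent differences sum to 0.09 kHz and the range is 0.05 kHz, so the index is about 1.8.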

In [None]:
import pandas as pd

In [None]:
voice_df = pd.read_csv("voice-classification.csv")
voice_df.shape

In [None]:
voice_df.head(5)

In [None]:
voice_df.describe().T

In [None]:
voice_df["label"].value_counts()

In [None]:
voice_df.isnull().sum()

In [None]:
voice_df["label"]

In [None]:
# Separate the features from the target column
X = voice_df.drop("label", axis=1)
y = voice_df["label"]

In [None]:
X

In [None]:
# Label-encode the target column as integers
from sklearn.preprocessing import LabelEncoder

LE = LabelEncoder()
y = LE.fit_transform(voice_df["label"])

In [None]:
print(y)
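
`LabelEncoder` assigns integer codes in sorted order of the class names, so with the labels `female` and `male`, `female` becomes 0 and `male` becomes 1; the fitted encoder exposes that ordering as `LE.classes_`. The plain-Python sketch below mimics the mapping on a few made-up labels to show which integer corresponds to which class. The variable names are illustrative only.

```python
labels = ["male", "female", "female", "male"]  # illustrative values, not real rows

# Sorted unique class names play the role of LabelEncoder's classes_ attribute
classes = sorted(set(labels))
to_code = {name: code for code, name in enumerate(classes)}

encoded = [to_code[name] for name in labels]    # class names -> integer codes
decoded = [classes[code] for code in encoded]   # integer codes -> class names

print(classes)
print(encoded)
```

On the fitted encoder itself, `LE.inverse_transform(...)` performs the same decoding step.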

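# Split and scale the dataset

In [None]:
from sklearn.model_selection import train_test_split

# Split first so the scaler never sees the test rows
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=21)

In [None]:
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training rows only, then apply it to both sets
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)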
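
For reference, `StandardScaler` rescales each column to zero mean and unit variance using z = (x - mean) / std, with the population standard deviation. The sketch below reproduces that arithmetic for a single made-up column in plain Python. It also follows the common practice of computing the mean and standard deviation from the training rows and reusing them for the test rows, so that no test information enters the preprocessing. All values and names are illustrative assumptions.

```python
from statistics import fmean, pstdev

train_col = [0.10, 0.20, 0.30, 0.40]  # illustrative training values for one feature
test_col = [0.25, 0.50]               # illustrative test values for the same feature

mu = fmean(train_col)
sigma = pstdev(train_col)  # population standard deviation (ddof=0)

# The same training statistics are applied to both sets
train_scaled = [(x - mu) / sigma for x in train_col]
test_scaled = [(x - mu) / sigma for x in test_col]

print(train_scaled)
print(test_scaled)
```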

In [None]:
print("Shape of the training features is ", X_train.shape)
print("Shape of the training target is ", y_train.shape)
print("Shape of the testing features is ", X_test.shape)
print("Shape of the testing target is ", y_test.shape)

In [None]:
# Build the support vector classifier (scikit-learn defaults: RBF kernel, C=1.0)
from sklearn.svm import SVC

svc_model = SVC()
svc_model.fit(X_train, y_train)

# Predict on both sets so train and test accuracy can be compared
y_test_pred = svc_model.predict(X_test)
y_train_pred = svc_model.predict(X_train)


In [None]:
from sklearn.metrics import accuracy_score

print("The test data accuracy of the SVC model is", accuracy_score(y_test, y_test_pred))

In [None]:
print("The train data accuracy of the SVC model is", accuracy_score(y_train, y_train_pred))
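
A single train/test split gives only one accuracy estimate. One possible cross-check, sketched below, is to wrap the scaler and the classifier in a scikit-learn pipeline and score it with 5-fold cross-validation, so the scaler is refit on the training part of every fold. The synthetic data from `make_classification` is a stand-in assumption (20 numeric columns, 2 classes), not the voice data, and the names `X_demo`, `y_demo`, and `model` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the voice features (assumption, not the real CSV)
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=21)

# Scaling and classification are fitted together inside each fold
model = make_pipeline(StandardScaler(), SVC())
scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

With the notebook's own data, passing the unscaled `X` and the encoded `y` in place of `X_demo` and `y_demo` should work the same way.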

# End of SVM