## My first kernel

In this kernel I focus only on the classification task; visualization will be for another day. Since the data is already clean, I will cut straight to the task.

Let's import everything we need.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt  # not used yet; kept for later visualization
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV  # the old sklearn.grid_search module was removed in scikit-learn 0.20
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.svm import SVC

Let's read the data from the CSV file and display the first five rows.

In [None]:
df = pd.read_csv('../input/voice.csv')
df.head()

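Checking the info, all the feature columns share the same numeric dtype, so no conversions are needed. The only non-numeric column is `label`, the target, which SVC accepts directly as string class labels.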

In [None]:
df.info()
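Instead of reading the dtypes off `df.info()` by eye, the same check can be done in code. Below is a minimal sketch on a tiny made-up DataFrame: the column names are borrowed from voice.csv, but the values are invented for illustration, so run the last two computations on the real `df` to check the actual data.

```python
import pandas as pd

# Tiny made-up frame (invented values) using a few voice.csv-style column names
df = pd.DataFrame({
    'meanfreq': [0.06, 0.07, 0.18],
    'sd': [0.064, 0.067, 0.04],
    'label': ['male', 'male', 'female'],
})

# Columns that are not numeric; only the target should show up here
non_numeric = df.select_dtypes(exclude='number').columns.tolist()

# Total number of missing cells across the whole frame
missing = int(df.isnull().sum().sum())

print(non_numeric)  # ['label']
print(missing)      # 0
```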

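Splitting the data into train and test sets. 30% of the rows are held out for testing. Note that train_test_split shuffles the rows by default, so the test set is a random 30% sample rather than the last 30% of the dataset; random_state = 45 makes the split reproducible.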

In [None]:
X = df.drop('label', axis = 1)
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 45)
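To see that the test set is not simply the tail of the data, here is a small sketch on ten numbered toy rows (not the voice data), comparing the default shuffled split with `shuffle=False`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten toy rows numbered 0..9, so we can see which ones land in the test set
X = np.arange(10).reshape(-1, 1)
y = np.array([0, 1] * 5)

# Default behaviour: rows are shuffled before splitting; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=45)

# With shuffle=False the last 30% of rows really do become the test set
X_tr_ord, X_te_ord, _, _ = train_test_split(X, y, test_size=0.3, shuffle=False)

print(X_te.ravel())      # a random subset of 3 rows
print(X_te_ord.ravel())  # [7 8 9]
```

With `shuffle=False` the test rows are exactly 7, 8 and 9; with the default shuffle they are three rows picked at random, but the same three every time for a given `random_state`.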

Create the classifier with the default parameters: C=1.0 and kernel='rbf'. The default gamma was 'auto' in older scikit-learn versions and is 'scale' from version 0.22 onwards.

In [None]:
model = SVC()
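Since the default gamma depends on the scikit-learn version, it is worth checking what the installed version actually uses. A minimal sketch: `get_params()` reports the defaults, and the two gamma rules can be computed by hand, 'auto' as 1 / n_features and 'scale' as 1 / (n_features * X.var()). The small matrix `X` is made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# Defaults of the installed scikit-learn version
params = SVC().get_params()
print(params['C'], params['kernel'], params['gamma'])  # gamma is 'scale' on scikit-learn >= 0.22

# Made-up feature matrix: two features on very different scales
X = np.array([[0.1, 10.0],
              [0.2, 200.0],
              [0.3, 1300.0]])

n_features = X.shape[1]
gamma_auto = 1.0 / n_features               # rule used by gamma='auto'
gamma_scale = 1.0 / (n_features * X.var())  # rule used by gamma='scale' (variance over all entries of X)

print(gamma_auto, gamma_scale)
```

Because 'scale' divides by the variance of all feature values, the effective default gamma depends on the raw scale of the features, while 'auto' only looks at how many features there are.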

In [None]:
model.fit(X_train, y_train)

In [None]:
prediction = model.predict(X_test)

In [None]:
print(classification_report(y_test,prediction))
print(confusion_matrix(y_test, prediction))

Ouch! The average precision of the model is only around 73%, and the confusion matrix gives TruePositive = 371, TrueNegative = 314, FalsePositive = 172 and FalseNegative = 94.
It looks like the relatively low accuracy could be due to the default parameters. Let's try GridSearchCV from sklearn, which tries every combination from a given set of values for C, gamma and kernel and keeps the one that scores best, to improve our accuracy.
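The headline numbers can be recomputed from the four confusion-matrix counts above. This is a short worked sketch; which class counts as "positive" depends on how the counts were read off the matrix (confusion_matrix orders the classes alphabetically, so 'female' comes before 'male').

```python
# Counts reported above for the default SVC on the test set
tp, tn, fp, fn = 371, 314, 172, 94

total = tp + tn + fp + fn           # number of test samples
accuracy = (tp + tn) / total        # share of all predictions that were correct

precision_pos = tp / (tp + fp)      # of everything predicted positive, the share that was right
recall_pos = tp / (tp + fn)         # of all actual positives, the share that was found
precision_neg = tn / (tn + fn)      # the same two ratios for the negative class
recall_neg = tn / (tn + fp)

macro_precision = (precision_pos + precision_neg) / 2  # unweighted average over the two classes

print(f"accuracy        = {accuracy:.3f}")
print(f"precision (pos) = {precision_pos:.3f}, recall (pos) = {recall_pos:.3f}")
print(f"precision (neg) = {precision_neg:.3f}, recall (neg) = {recall_neg:.3f}")
print(f"macro precision = {macro_precision:.3f}")
```

This gives 951 test samples, an accuracy of about 0.720 and a macro-averaged precision of about 0.726, in line with the roughly 73% average precision quoted above.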

In [None]:
param_grid = {'C':[1,10,100,1000],'gamma':[1,0.1,0.001,0.0001], 'kernel':['linear','rbf']}
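Before running the search it helps to know how much work it implies. A sketch using ParameterGrid to count the candidates; the number of folds is an assumption, 5 being GridSearchCV's default from scikit-learn 0.22 onwards (it was 3 in older versions).

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [1, 10, 100, 1000], 'gamma': [1, 0.1, 0.001, 0.0001], 'kernel': ['linear', 'rbf']}

n_candidates = len(ParameterGrid(param_grid))  # 4 * 4 * 2 = 32 combinations
n_folds = 5                                    # assumed: default cv in scikit-learn >= 0.22
n_fits = n_candidates * n_folds + 1            # +1 for the final refit on the whole training set (refit=True)

print(n_candidates, n_fits)  # 32 161
```

Note that the linear kernel ignores gamma, so the 16 linear candidates contain only 4 distinct models; the other 12 are repeated work.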

In [None]:
# refit=True retrains the best combination on the whole training set; verbose=2 prints progress
grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=2)

In [None]:
grid.fit(X_train,y_train)

Fitting the grid can take quite a lot of time, depending on the size of the dataset, the number of parameter combinations and the specs of your computer; it took around 15.9 minutes on mine. Now let's see which combination from the grid came out best.

In [None]:
grid.best_params_
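To make clear what the fitted grid object holds, here is a small sketch on synthetic data from make_classification (a stand-in, not the voice data) with a reduced grid so it runs in seconds. `best_params_` is the winning combination, `best_score_` is its mean cross-validated score on the training folds, and `score()` evaluates the refitted best model on held-out data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the voice data (NOT the real dataset): 300 samples, 20 features
X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=45)

# A reduced version of the notebook's grid so the example finishes quickly
param_grid = {'C': [1, 10], 'gamma': [0.1, 0.001], 'kernel': ['linear', 'rbf']}
grid = GridSearchCV(SVC(), param_grid, refit=True, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)                # winning combination of C, gamma and kernel
print(grid.best_score_)                 # mean cross-validated accuracy of that combination on the training folds
print(grid.score(X_test, y_test))       # accuracy of the refitted best model on the held-out test set
print(len(grid.cv_results_['params']))  # one entry per candidate: 2 * 2 * 2 = 8
```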

In [None]:
predic = grid.predict(X_test)

In [None]:
print(classification_report(y_test,predic))
print(confusion_matrix(y_test, predic))

The classification report and confusion matrix of the tuned model show a large improvement in accuracy over the default SVC.

## To Conclude
I have only tried a few parameter values: C and gamma can take many other values, and I have omitted the polynomial kernel (kernel='poly' in SVC) entirely. I plan to extend this notebook with more details and analysis in a future version.
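One way to include the polynomial kernel without piling on redundant fits is to pass param_grid as a list of dicts, so each kernel only gets the parameters it actually uses. A sketch; the degree values [2, 3] are an illustrative assumption, not values tested in this notebook.

```python
from sklearn.model_selection import ParameterGrid

Cs = [1, 10, 100, 1000]
gammas = [1, 0.1, 0.001, 0.0001]

# Each dict is expanded separately, so parameters a kernel ignores are never crossed with it
param_grid = [
    {'kernel': ['linear'], 'C': Cs},                                   # linear ignores gamma
    {'kernel': ['rbf'], 'C': Cs, 'gamma': gammas},
    {'kernel': ['poly'], 'C': Cs, 'gamma': gammas, 'degree': [2, 3]},  # degree only matters for 'poly'
]

print(len(ParameterGrid(param_grid)))  # 4 + 16 + 32 = 52
```

This grid has 52 candidates (4 linear, 16 rbf, 32 poly). Crossing the same values over all three kernels in a single dict would give 4 × 4 × 3 × 2 = 96 candidates, many of them duplicates.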