
Since "The Star RJ" is a reality show, the time available to select candidates is very short. The show's success, and hence its profits, depends on quick and smooth execution.


```
Fields in Data
•meanfreq: mean frequency (in kHz)
•sd: standard deviation of the frequency
•median: median frequency (in kHz)
•Q25: first quartile (in kHz)
•Q75: third quartile (in kHz)
•IQR: interquartile range (in kHz)
•skew: skewness (see note in specprop description)
•kurt: kurtosis (see note in specprop description)
•sp.ent: spectral entropy
•sfm: spectral flatness
•mode: mode frequency
•centroid: frequency centroid (see specprop)
•peakf: peak frequency (frequency with the highest energy)
•meanfun: average of fundamental frequency measured across the acoustic signal
•minfun: minimum fundamental frequency measured across the acoustic signal
•maxfun: maximum fundamental frequency measured across the acoustic signal
•meandom: average of dominant frequency measured across the acoustic signal
•mindom: minimum of dominant frequency measured across the acoustic signal
•maxdom: maximum of dominant frequency measured across the acoustic signal
•dfrange: range of dominant frequency measured across the acoustic signal
•modindx: modulation index. Calculated as the accumulated absolute difference between adjacent measurements of fundamental frequencies divided by the frequency range
•label: male or female

```
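Two of the fields above are defined in terms of others: IQR is the spread between Q75 and Q25, and dfrange is the spread between maxdom and mindom. The sketch below shows one way to sanity-check those identities on a DataFrame. The helper name `check_derived_fields`, the tolerance, and the toy rows are illustrative assumptions; the source does not state whether the CSV satisfies these identities exactly.

```python
import numpy as np
import pandas as pd

def check_derived_fields(df, atol=1e-6):
    """Hypothetical helper: flag rows where the range columns match their definitions."""
    iqr_ok = np.isclose(df["IQR"], df["Q75"] - df["Q25"], atol=atol)
    dfrange_ok = np.isclose(df["dfrange"], df["maxdom"] - df["mindom"], atol=atol)
    return pd.DataFrame({"IQR_ok": iqr_ok, "dfrange_ok": dfrange_ok}, index=df.index)

# Made-up rows for illustration only; not taken from voice-classification.csv.
toy = pd.DataFrame({
    "Q25": [0.10, 0.12],
    "Q75": [0.25, 0.20],
    "IQR": [0.15, 0.10],      # second row is deliberately inconsistent
    "mindom": [0.5, 0.25],
    "maxdom": [4.5, 5.0],
    "dfrange": [4.0, 4.75],
})

checks = check_derived_fields(toy)
print(checks)
```

If the identities hold on the real data, IQR and dfrange carry no information beyond the columns they are derived from, which may be worth knowing before feature selection.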

In [2]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

df = pd.read_csv("/content/voice-classification.csv")

# Check for missing values
#print(df.isnull().sum())

label_encoder = LabelEncoder()
# LabelEncoder assigns codes in sorted class order: 'female' -> 0, 'male' -> 1
df['label'] = label_encoder.fit_transform(df['label'])
X = df.drop(columns=['label'])
y = df['label']

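# Note: the scaler is fit on all rows before the split, so test-set statistics
# influence the scaling. Fitting on the training rows only would avoid this leakage.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)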

X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train an SVM with an RBF kernel
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train, y_train)

y_pred = svm.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

report = classification_report(y_test, y_pred)
print("Classification Report:\n", report)

conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)


Accuracy: 98.26%
Classification Report:
               precision    recall  f1-score   support

           0       0.98      0.99      0.98       297
           1       0.99      0.98      0.98       337

    accuracy                           0.98       634
   macro avg       0.98      0.98      0.98       634
weighted avg       0.98      0.98      0.98       634

Confusion Matrix:
 [[293   4]
 [  7 330]]
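The figures in the classification report can be recovered directly from the confusion matrix printed above, which makes the relationship between the two outputs explicit. The sketch below copies the four counts from that output and recomputes accuracy, precision, recall, and F1 with NumPy. Rows are true labels and columns are predicted labels, which is scikit-learn's convention for `confusion_matrix`.

```python
import numpy as np

# Counts copied from the confusion matrix output above.
conf_matrix = np.array([[293, 4],
                        [7, 330]])

total = conf_matrix.sum()
accuracy = np.trace(conf_matrix) / total                     # correct predictions / all
precision = np.diag(conf_matrix) / conf_matrix.sum(axis=0)   # per predicted class (columns)
recall = np.diag(conf_matrix) / conf_matrix.sum(axis=1)      # per true class (rows)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy: {accuracy * 100:.2f}%")
for cls in range(2):
    print(f"class {cls}: precision={precision[cls]:.2f} "
          f"recall={recall[cls]:.2f} f1={f1[cls]:.2f}")
```

Of the 634 test samples, 623 are classified correctly, which gives the reported 98.26%.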


In [3]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [1, 0.1, 0.01, 0.001],
    'kernel': ['rbf']
}

grid_search = GridSearchCV(SVC(random_state=42), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score = grid_search.best_score_

print(f"Best Parameters: {best_params}")
print(f"Best Cross-Validation Accuracy: {best_score * 100:.2f}%")

best_model = grid_search.best_estimator_
y_pred_best = best_model.predict(X_test)
accuracy_best = accuracy_score(y_test, y_pred_best)
print(f"Test Accuracy with Best Parameters: {accuracy_best * 100:.2f}%")


Best Parameters: {'C': 10, 'gamma': 0.1, 'kernel': 'rbf'}
Best Cross-Validation Accuracy: 98.15%
Test Accuracy with Best Parameters: 98.42%
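The grid search above runs on features that were scaled before cross-validation, so each validation fold has already influenced the scaler. One common alternative, sketched below, wraps the scaler and the SVM in a scikit-learn `Pipeline` so that scaling is refit inside every fold. This is a sketch, not the notebook's method: it uses synthetic data from `make_classification` as a stand-in for the voice dataset, and the names `X_demo`, `pipe`, `pipe_grid`, and `search` are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption); replace with the real feature matrix and labels.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The scaler is a pipeline step, so it is fit only on each fold's training portion.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Step-name prefixes route each parameter to the SVC step.
pipe_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1, 0.1, 0.01, 0.001]}

search = GridSearchCV(pipe, pipe_grid, cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
print(f"Held-out accuracy: {search.score(X_te, y_te):.3f}")
```

With this layout the raw, unscaled features are passed to `fit` and `score`, and the fitted pipeline applies the training-set scaling to new data on its own.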
