**Q1. Problem Statement: Grid Search**

Load the ‘voice.csv’ dataset into a DataFrame and perform the following tasks:
1.	Considering the ‘label’ column as the target variable, rename it to ‘Gender_Identified’
2.	Using the LabelEncoder class from sklearn's preprocessing module, encode the target column as integers
3.	Separate the target variable and the feature vectors
4.	Build a RandomForestClassifier model and find the best parameters using a Grid search
5.	Print the best parameters and the best estimator


**Step-1:** Importing the required libraries.

In [3]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

**Step-2:** Loading the CSV data into a DataFrame.

In [4]:
data = pd.read_csv('voice.csv')
data.head()

Unnamed: 0,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,label
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,...,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0,male
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,...,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632,male
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,...,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512,male
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,...,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119,male
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,...,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274,male


**Step-3:** Treating the 'label' column as the target variable and renaming it to 'Gender_Identified'.

In [5]:
data.rename(columns={'label': 'Gender_Identified'}, inplace=True)
data.head()

Unnamed: 0,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,Gender_Identified
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,...,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0,male
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,...,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632,male
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,...,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512,male
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,...,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119,male
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,...,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274,male


**Step-4:** Identifying the null values.

In [6]:
data.isnull().sum()

meanfreq             0
sd                   0
median               0
Q25                  0
Q75                  0
IQR                  0
skew                 0
kurt                 0
sp.ent               0
sfm                  0
mode                 0
centroid             0
meanfun              0
minfun               0
maxfun               0
meandom              0
mindom               0
maxdom               0
dfrange              0
modindx              0
Gender_Identified    0
dtype: int64

**Step-5:** Encoding the target column with LabelEncoder from sklearn's preprocessing module.

In [7]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
# Classes are encoded in sorted order, so 'female' becomes 0 and 'male' becomes 1
data['Gender_Identified'] = le.fit_transform(data['Gender_Identified'])

In [8]:
data.sample(10)

Unnamed: 0,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,...,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx,Gender_Identified
1189,0.206442,0.060208,0.240289,0.137771,0.255687,0.117916,2.481726,9.252438,0.855626,0.265558,...,0.206442,0.12727,0.047904,0.27907,0.763871,0.023438,4.570312,4.546875,0.126172,1
2213,0.189496,0.043427,0.19701,0.172113,0.210361,0.038247,2.567235,10.710449,0.88383,0.363339,...,0.189496,0.150131,0.046967,0.27907,1.351004,0.023438,10.945312,10.921875,0.088684,0
1744,0.186594,0.038707,0.18757,0.168301,0.204731,0.03643,2.277118,7.80144,0.864232,0.2775,...,0.186594,0.170741,0.036782,0.205128,0.451823,0.164062,5.992188,5.828125,0.096981,0
891,0.193155,0.066995,0.209278,0.138196,0.252216,0.114021,2.613076,11.093426,0.906532,0.462571,...,0.193155,0.14349,0.050955,0.277457,0.778424,0.023438,5.71875,5.695312,0.108333,1
771,0.165863,0.058406,0.181561,0.10745,0.206565,0.099115,2.218335,9.604486,0.928958,0.472284,...,0.165863,0.098008,0.017957,0.225352,0.841064,0.101562,6.585938,6.484375,0.170643,1
1595,0.193004,0.038431,0.204637,0.170734,0.213307,0.042572,3.649283,19.036583,0.873564,0.27753,...,0.193004,0.157393,0.015826,0.246154,0.647629,0.007812,5.953125,5.945312,0.102606,0
1925,0.151283,0.110261,0.208732,0.000267,0.245567,0.2453,25.152536,680.485837,0.768258,0.288361,...,0.151283,0.182116,0.015984,0.275862,0.007812,0.007812,0.007812,0.0,0.0,0
332,0.158994,0.061619,0.177844,0.103906,0.208469,0.104563,3.12804,15.619118,0.908233,0.483818,...,0.158994,0.11035,0.011161,0.27027,0.415039,0.097656,0.756836,0.65918,0.384832,1
1519,0.142579,0.082545,0.126502,0.087578,0.22852,0.140942,4.051959,33.206868,0.939789,0.653992,...,0.142579,0.092487,0.015873,0.262295,0.058919,0.007812,0.257812,0.25,0.263587,1
842,0.174892,0.055032,0.183887,0.119373,0.207147,0.087774,2.040028,7.880566,0.905661,0.351681,...,0.174892,0.112162,0.046921,0.277457,1.005632,0.023438,7.03125,7.007812,0.087485,1
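
To see which integer stands for which class, the fitted encoder can be inspected. The sketch below uses a small hand-written label list (an assumption standing in for the `Gender_Identified` column) rather than `voice.csv`. `LabelEncoder` stores its classes in sorted order, so `female` maps to 0 and `male` to 1, which matches the 1s shown for the male rows.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels standing in for the 'Gender_Identified' column (illustrative only)
labels = ["male", "female", "male", "female", "female"]

le = LabelEncoder()
codes = le.fit_transform(labels)

# classes_ is sorted, and each class's position is its integer code
mapping = {str(cls): int(code) for code, cls in enumerate(le.classes_)}
print(mapping)

# inverse_transform recovers the original strings from the integer codes
decoded = [str(s) for s in le.inverse_transform(codes)]
print(decoded)
```

Keeping the fitted `le` object around is useful later, since `inverse_transform` turns model predictions back into readable labels.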


**Step-6:** Separating the feature vectors and the target variable.

In [9]:
# Features: every column except the target
X = data.drop(['Gender_Identified'], axis=1)
# Target: the encoded gender labels
y = data['Gender_Identified']

In [10]:
X.head()

Unnamed: 0,meanfreq,sd,median,Q25,Q75,IQR,skew,kurt,sp.ent,sfm,mode,centroid,meanfun,minfun,maxfun,meandom,mindom,maxdom,dfrange,modindx
0,0.059781,0.064241,0.032027,0.015071,0.090193,0.075122,12.863462,274.402906,0.893369,0.491918,0.0,0.059781,0.084279,0.015702,0.275862,0.007812,0.007812,0.007812,0.0,0.0
1,0.066009,0.06731,0.040229,0.019414,0.092666,0.073252,22.423285,634.613855,0.892193,0.513724,0.0,0.066009,0.107937,0.015826,0.25,0.009014,0.007812,0.054688,0.046875,0.052632
2,0.077316,0.083829,0.036718,0.008701,0.131908,0.123207,30.757155,1024.927705,0.846389,0.478905,0.0,0.077316,0.098706,0.015656,0.271186,0.00799,0.007812,0.015625,0.007812,0.046512
3,0.151228,0.072111,0.158011,0.096582,0.207955,0.111374,1.232831,4.177296,0.963322,0.727232,0.083878,0.151228,0.088965,0.017798,0.25,0.201497,0.007812,0.5625,0.554688,0.247119
4,0.13512,0.079146,0.124656,0.07872,0.206045,0.127325,1.101174,4.333713,0.971955,0.783568,0.104261,0.13512,0.106398,0.016931,0.266667,0.712812,0.007812,5.484375,5.476562,0.208274


In [11]:
y.head()

0    1
1    1
2    1
3    1
4    1
Name: Gender_Identified, dtype: int32
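
Step-1 imports `train_test_split` and `accuracy_score`, but the remaining steps fit on all of `X` and `y`. If an accuracy estimate on unseen rows is wanted, one option is to hold out a test set before searching. The following is a minimal sketch on synthetic data from `make_classification` (an assumption, not `voice.csv`); the names `X_demo`, `y_demo`, the 25% test size, and the fixed `random_state` are all illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 rows, 20 numeric features, 2 classes
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)

# Hold out 25% of the rows; stratify keeps the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0
)

# Fit on the training part only, then score on the untouched test part
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print("Held-out accuracy:", test_accuracy)
```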

**Step-7:** Building a RandomForestClassifier model and finding the best parameters using Grid search.

In [12]:
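# Candidate hyperparameters: 2 split criteria x 4 forest sizes = 8 combinations
params = {"criterion": ["gini", "entropy"], "n_estimators": [100, 150, 200, 300]}
# Evaluate every combination with 3-fold cross-validation, scored by accuracy
rf_gsv = GridSearchCV(estimator=RandomForestClassifier(), param_grid=params, cv=3, scoring='accuracy')
rf_gsv.fit(X, y)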
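
A grid search fits one model per parameter combination per fold, so its cost grows multiplicatively with each added value. One way to count the work before running it is scikit-learn's `ParameterGrid` helper, sketched here with the same grid as above (the names `cv_folds`, `n_candidates` and `n_fits` are illustrative):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as passed to GridSearchCV
params = {"criterion": ["gini", "entropy"], "n_estimators": [100, 150, 200, 300]}
cv_folds = 3

# ParameterGrid expands the dict into every combination of values
grid = list(ParameterGrid(params))
n_candidates = len(grid)
n_fits = n_candidates * cv_folds

print("candidates:", n_candidates)
print("cross-validation fits:", n_fits)
for candidate in grid:
    print(candidate)
```

With the default `refit=True`, `GridSearchCV` trains one further model on the full data using the best combination.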

In [13]:
pd.DataFrame(rf_gsv.cv_results_).sort_values('rank_test_score')

| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_criterion | param_n_estimators | params | split0_test_score | split1_test_score | split2_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 1.583361 | 0.084915 | 0.02603 | 0.007372 | entropy | 150 | {'criterion': 'entropy', 'n_estimators': 150} | 0.946023 | 0.982955 | 0.970644 | 0.96654 | 0.015354 | 1 |
| 4 | 1.043604 | 0.065685 | 0.015619 | 7e-06 | entropy | 100 | {'criterion': 'entropy', 'n_estimators': 100} | 0.94697 | 0.982008 | 0.969697 | 0.966225 | 0.014513 | 2 |
| 1 | 1.613823 | 0.090261 | 0.015624 | 3.3e-05 | gini | 150 | {'criterion': 'gini', 'n_estimators': 150} | 0.942235 | 0.982008 | 0.973485 | 0.965909 | 0.017098 | 3 |
| 6 | 2.14601 | 0.125844 | 0.032376 | 0.000845 | entropy | 200 | {'criterion': 'entropy', 'n_estimators': 200} | 0.946023 | 0.982955 | 0.96875 | 0.965909 | 0.015211 | 4 |
| 3 | 3.205979 | 0.152776 | 0.047052 | 0.000279 | gini | 300 | {'criterion': 'gini', 'n_estimators': 300} | 0.944129 | 0.981061 | 0.971591 | 0.965593 | 0.015662 | 5 |
| 7 | 3.197276 | 0.230546 | 0.047073 | 0.00031 | entropy | 300 | {'criterion': 'entropy', 'n_estimators': 300} | 0.942235 | 0.981061 | 0.970644 | 0.964646 | 0.016408 | 6 |
| 0 | 1.134776 | 0.075287 | 0.016037 | 0.000256 | gini | 100 | {'criterion': 'gini', 'n_estimators': 100} | 0.940341 | 0.982008 | 0.970644 | 0.964331 | 0.017586 | 7 |
| 2 | 2.196556 | 0.142879 | 0.031269 | 5.5e-05 | gini | 200 | {'criterion': 'gini', 'n_estimators': 200} | 0.939394 | 0.982008 | 0.96875 | 0.963384 | 0.017806 | 8 |
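
The ranking is tighter than the rank column suggests. A quick check, using the `mean_test_score` and `std_test_score` values copied from the results table above, compares the gap between the best and worst candidate with the fold-to-fold variation:

```python
# (criterion, n_estimators, mean_test_score, std_test_score) copied from the results table
results = [
    ("entropy", 150, 0.966540, 0.015354),
    ("entropy", 100, 0.966225, 0.014513),
    ("gini", 150, 0.965909, 0.017098),
    ("entropy", 200, 0.965909, 0.015211),
    ("gini", 300, 0.965593, 0.015662),
    ("entropy", 300, 0.964646, 0.016408),
    ("gini", 100, 0.964331, 0.017586),
    ("gini", 200, 0.963384, 0.017806),
]

means = [row[2] for row in results]
stds = [row[3] for row in results]

# Gap between the best and the worst mean accuracy across all candidates
spread = max(means) - min(means)
# Smallest standard deviation of accuracy across the 3 folds
smallest_std = min(stds)

print(f"spread of mean scores: {spread:.6f}")
print(f"smallest fold-to-fold std: {smallest_std:.6f}")
print("spread is within fold noise:", spread < smallest_std)
```

The gap of about 0.003 is well under the fold-to-fold standard deviation of about 0.015, so on this evidence the eight candidates are close to tied. Because `RandomForestClassifier()` is created without a `random_state`, a rerun may well reorder them.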


**Step-8:** Printing the best parameters.

In [14]:
print("The best parameters are:")
rf_gsv.best_params_

The best parameters are:

{'criterion': 'entropy', 'n_estimators': 150}

**Step-9:** Printing the best estimator.

In [15]:
print("The best estimator is:")
rf_gsv.best_estimator_

The best estimator is:
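
For readers who want to reproduce the last two steps without `voice.csv`, here is a hedged, self-contained sketch of the same workflow on synthetic data from `make_classification`. The smaller grid, the fixed `random_state` values, and the shuffled `StratifiedKFold` are assumptions added so the run is quick and repeatable; they are not part of the notebook above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the voice features: 300 rows, 20 numeric features, 2 classes
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=0)

# Deliberately small grid so the sketch runs in a few seconds
demo_params = {"criterion": ["gini", "entropy"], "n_estimators": [10, 20]}

# Shuffled, seeded folds and a seeded forest make the search repeatable
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=demo_params,
    cv=cv,
    scoring="accuracy",
)
search.fit(X_demo, y_demo)

# Explicit print calls also work outside a notebook, where a bare expression shows nothing
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
print("Best estimator:", search.best_estimator_)
```

`best_estimator_` is the model refit on all the data passed to `fit` using `best_params_`, so it can be used directly for prediction.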
