# Task-1 Classification
***
The dataset contains 97 features for 354 mobile phones. Column 1 is the PhoneId and column 2 is the phone's alias (`Also Known As`); neither is used as a model input.

### Part-1 Importing all required libraries
***
1. Matplotlib- To plot graphs <br>
2. NumPy- For numerical calculations <br>
3. Pandas- To load the data and reshape it into the required dataframe <br>
4. train_test_split- To split data into training and testing sets <br>
5. accuracy_score- To calculate the accuracy of the model <br>
6. OneHotEncoder/LabelEncoder- To encode text values as one-hot vectors or integer labels <br>
7. StandardScaler- To standardize features; many machine learning estimators behave badly if individual features do not look more or less like standard normally distributed data <br>
8. Imputer- To fill in missing values <br>
9. RandomForestClassifier- The classifier used <br>
10. GridSearchCV- To search over classifier hyperparameters with cross-validation <br>

In [1]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
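# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; SimpleImputer is its successor
from sklearn.impute import SimpleImputer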
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
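
StandardScaler is imported above but not applied later in this notebook. As a rough illustration of what standardization does, here is a minimal sketch on a toy matrix; the values and the names `X_toy` and `X_scaled` are made up for the example and do not come from `dataset.csv`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (hypothetical values, not taken from dataset.csv)
X_toy = np.array([[150.0, 70.0],
                  [180.0, 75.0],
                  [165.0, 72.0],
                  [200.0, 78.0]])

# Fit the scaler and transform in one step: each column is shifted to
# zero mean and rescaled to unit (population) standard deviation
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_toy)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

Tree-based models such as random forests are generally insensitive to feature scaling, which may be why the scaler goes unused here.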

### Part-2 Reading and displaying dataset for analysis
***
The dataset is loaded from `dataset.csv` and displayed; its `Rating` column is the classification target.

In [2]:
data=pd.read_csv("dataset.csv")
data

Unnamed: 0,PhoneId,Also Known As,Applications,Architecture,Aspect Ratio,Audio Features,Audio Jack,Autofocus,Bezel-less display,Bluetooth,...,Video Recording,Video Recording Features,VoLTE,Waterproof,Weight,Wi-Fi,Wi-Fi Features,Width,Wireless Charging,Rating
0,0,,,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v5.0,...,1920x1080 @ 30 fps,,yes,,182 grams Below Average ▾Weight compared to11 ...,"Wi-Fi 802.11, a/b/g/n","Wi-Fi Direct, Mobile Hotspot",76.4 mm,,4.5
1,1,,Oppo Browser,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,1920x1080 @ 30 fps,,yes,,168 grams Average ▾Weight compared to8 - 12 K ...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,75.6 mm,,4.5
2,2,,,64 bit,19.5:9,,3.5 mm,yes,yes,v4.2,...,,,yes,,168 grams Average ▾Weight compared to10 - 14 K...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,74 mm,,4.4
3,4,,,64 bit,18.5:9,Dolby Atmos,3.5 mm,No,yes,v5.0,...,1920x1080 @ 30 fps,,yes,,169 grams Average ▾Weight compared to15 - 23 K...,"Wi-Fi 802.11, a/ac/b/g/n","Wi-Fi Direct, Mobile Hotspot",76.8 mm,,4.3
4,5,,,64 bit,19.5:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,1920x1080 @ 60 fps,,yes,,175 grams Below Average ▾Weight compared to12 ...,"Wi-Fi 802.11, a/ac/b/g/n",Mobile Hotspot,76.6 mm,,4.4
5,6,,,,,,,,,,...,,,,,,,,,,4.4
6,7,Vivo V11 Pro,"Newspoint, Amazon, Amazon Prime Video, Phonepe...",64 bit,19.5:9,,3.5 mm,Phase Detection autofocus,yes,v5.0,...,3840x2160 @ 30 fps,,yes,,156 grams Average ▾Weight compared to18 - 28 K...,"Wi-Fi 802.11, b/g/n/n 5GHz","Wi-Fi Direct, Mobile Hotspot",75 mm,,4.5
7,8,,,64 bit,18:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,"1920x1080 @ 30 fps, 1280x720 @ 30 fps",,yes,,145 grams Good ▾Weight compared to5 - 7 K Phon...,"Wi-Fi 802.11, b/g/n","Wi-Fi Direct, Mobile Hotspot",71.5 mm,,4.3
8,9,,,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,"1920x1080 @ 30 fps, 1280x720 @ 120 fps",,yes,,178 grams Below Average ▾Weight compared to8 -...,"Wi-Fi 802.11, a/b/g/n","Wi-Fi Direct, Mobile Hotspot",71.6 mm,,4.1
9,10,,,64 bit,19:9,,3.5 mm,yes,yes,v4.2,...,1920x1080 @ 30 fps,,yes,,168 grams Average ▾Weight compared to7 - 11 K ...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,75.6 mm,,4.3


## Part-3 Replacing all NaNs with empty strings
***
Every NaN in every column is replaced with an empty string so that later steps (such as label encoding) do not raise errors on missing values.

In [3]:
data = data.replace(np.nan,'',regex=True)
data.head(10)

Unnamed: 0,PhoneId,Also Known As,Applications,Architecture,Aspect Ratio,Audio Features,Audio Jack,Autofocus,Bezel-less display,Bluetooth,...,Video Recording,Video Recording Features,VoLTE,Waterproof,Weight,Wi-Fi,Wi-Fi Features,Width,Wireless Charging,Rating
0,0,,,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v5.0,...,1920x1080 @ 30 fps,,yes,,182 grams Below Average ▾Weight compared to11 ...,"Wi-Fi 802.11, a/b/g/n","Wi-Fi Direct, Mobile Hotspot",76.4 mm,,4.5
1,1,,Oppo Browser,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,1920x1080 @ 30 fps,,yes,,168 grams Average ▾Weight compared to8 - 12 K ...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,75.6 mm,,4.5
2,2,,,64 bit,19.5:9,,3.5 mm,yes,yes,v4.2,...,,,yes,,168 grams Average ▾Weight compared to10 - 14 K...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,74 mm,,4.4
3,4,,,64 bit,18.5:9,Dolby Atmos,3.5 mm,No,yes,v5.0,...,1920x1080 @ 30 fps,,yes,,169 grams Average ▾Weight compared to15 - 23 K...,"Wi-Fi 802.11, a/ac/b/g/n","Wi-Fi Direct, Mobile Hotspot",76.8 mm,,4.3
4,5,,,64 bit,19.5:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,1920x1080 @ 60 fps,,yes,,175 grams Below Average ▾Weight compared to12 ...,"Wi-Fi 802.11, a/ac/b/g/n",Mobile Hotspot,76.6 mm,,4.4
5,6,,,,,,,,,,...,,,,,,,,,,4.4
6,7,Vivo V11 Pro,"Newspoint, Amazon, Amazon Prime Video, Phonepe...",64 bit,19.5:9,,3.5 mm,Phase Detection autofocus,yes,v5.0,...,3840x2160 @ 30 fps,,yes,,156 grams Average ▾Weight compared to18 - 28 K...,"Wi-Fi 802.11, b/g/n/n 5GHz","Wi-Fi Direct, Mobile Hotspot",75 mm,,4.5
7,8,,,64 bit,18:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,"1920x1080 @ 30 fps, 1280x720 @ 30 fps",,yes,,145 grams Good ▾Weight compared to5 - 7 K Phon...,"Wi-Fi 802.11, b/g/n","Wi-Fi Direct, Mobile Hotspot",71.5 mm,,4.3
8,9,,,64 bit,19:9,,3.5 mm,Phase Detection autofocus,yes,v4.2,...,"1920x1080 @ 30 fps, 1280x720 @ 120 fps",,yes,,178 grams Below Average ▾Weight compared to8 -...,"Wi-Fi 802.11, a/b/g/n","Wi-Fi Direct, Mobile Hotspot",71.6 mm,,4.1
9,10,,,64 bit,19:9,,3.5 mm,yes,yes,v4.2,...,1920x1080 @ 30 fps,,yes,,168 grams Average ▾Weight compared to7 - 11 K ...,"Wi-Fi 802.11, b/g/n",Mobile Hotspot,75.6 mm,,4.3
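
Before blanking out missing values it can help to see how many there are per column. The sketch below uses a small made-up frame (`toy` is hypothetical, with column names borrowed from the dataset) and shows `fillna("")`, which should give the same result as the `replace` call above for NaN entries.

```python
import numpy as np
import pandas as pd

# Small stand-in for the real data (hypothetical values)
toy = pd.DataFrame({
    "Audio Jack": ["3.5 mm", np.nan, "3.5 mm"],
    "Waterproof": [np.nan, "yes", np.nan],
    "Rating": [4.5, 4.4, 4.1],
})

# Count missing entries in each column before changing anything
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Replace every NaN with an empty string
filled = toy.fillna("")
print(filled)
```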


## Part-4 Changing last column into Binary
***
A Rating above 4 is taken to mean that customers like the phone. The Rating column is therefore converted to 1 when it is greater than 4 and to 0 otherwise.

This step currently fails with the error shown below, which I have not yet been able to fix.

In [4]:
def binary(x):
    # Map a single rating value to 1 (liked) or 0 (not liked)
    if x > 4:
        return 1
    else:
        return 0

In [5]:
data.apply(binary,axis=0)

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index PhoneId')
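
The ValueError arises because `DataFrame.apply` with `axis=0` passes each whole column (a Series) to the function, so `x > 4` produces a Series of booleans that `if` cannot reduce to a single True or False. One possible fix, sketched here on a made-up frame (`toy` and its values are hypothetical), is to work on the Rating column only, either element by element or with a vectorised comparison.

```python
import pandas as pd

# Hypothetical stand-in for the real data
toy = pd.DataFrame({"PhoneId": [0, 1, 2, 4],
                    "Rating": [4.5, 4.0, 3.8, 4.3]})

def binary(x):
    """Return 1 for a single rating above 4, else 0."""
    return 1 if x > 4 else 0

# Option 1: Series.apply calls binary once per value of the Rating column
liked_apply = toy["Rating"].apply(binary)

# Option 2: vectorised comparison, then cast the booleans to integers
liked_vec = (toy["Rating"] > 4).astype(int)

print(liked_apply.tolist())
print(liked_vec.tolist())
```

If a rating of exactly 4 should also count as liked, `>=` would be used in place of `>`.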

In [6]:
for i in range(5):
    print(data.iloc[i:i+1,98:99])

   Rating
0     4.5
   Rating
1     4.5
   Rating
2     4.4
   Rating
3     4.3
   Rating
4     4.4


## Part-5 Label encoding all columns
***
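Each column from Applications through Rating is label encoded independently (the loop covers column positions 2 to 98), so the Rating column is encoded as well.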

In [7]:
le = LabelEncoder()
for i in range(97): 
    data.iloc[:,i+2] = le.fit_transform(data.iloc[:,i+2])

In [8]:
data

Unnamed: 0,PhoneId,Also Known As,Applications,Architecture,Aspect Ratio,Audio Features,Audio Jack,Autofocus,Bezel-less display,Bluetooth,...,Video Recording,Video Recording Features,VoLTE,Waterproof,Weight,Wi-Fi,Wi-Fi Features,Width,Wireless Charging,Rating
0,0,,0,2,7,0,1,7,1,9,...,3,0,1,0,228,8,3,102,0,17
1,1,,18,2,7,0,1,7,1,8,...,3,0,1,0,179,12,1,94,0,17
2,2,,0,2,6,0,1,13,1,8,...,0,0,1,0,173,12,1,80,0,16
3,4,,0,2,2,1,1,6,1,9,...,3,0,1,0,183,2,3,106,0,15
4,5,,0,2,6,0,1,7,1,8,...,6,0,1,0,206,2,1,104,0,16
5,6,,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,16
6,7,Vivo V11 Pro,15,2,6,0,1,7,1,9,...,9,0,1,0,113,13,3,88,0,17
7,8,,0,2,4,0,1,7,1,8,...,5,0,1,0,61,12,3,56,0,15
8,9,,0,2,7,0,1,7,1,8,...,4,0,1,0,219,8,3,57,0,13
9,10,,0,2,7,0,1,13,1,8,...,3,0,1,0,178,12,1,94,0,15
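
Reusing one `LabelEncoder` for every column means only the mapping of the last column survives. A possible alternative, shown as a sketch on made-up data (`toy`, `feature_cols` and `encoders` are hypothetical names), keeps one encoder per feature column and leaves the target untouched.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the real data
toy = pd.DataFrame({
    "Bluetooth": ["v5.0", "v4.2", "v4.2", ""],
    "VoLTE": ["yes", "yes", "", "yes"],
    "Rating": [4.5, 4.5, 4.4, 4.3],
})

# Encode every column except the target
feature_cols = [c for c in toy.columns if c != "Rating"]

encoders = {}          # one fitted encoder per column, kept for later decoding
encoded = toy.copy()
for col in feature_cols:
    enc = LabelEncoder()
    encoded[col] = enc.fit_transform(toy[col])
    encoders[col] = enc

print(encoded)
# The stored encoder can map integer codes back to the original strings
print(encoders["Bluetooth"].inverse_transform([2]))
```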


## Part-6 Splitting Input and output data
***
Column positions 3 to 97 (Architecture through Wireless Charging) are used as input data, and the Rating column (position 98) is used as output.

In [9]:
Input= data.iloc[:,3:98]
Output= data.iloc[:,98:99]
Input.head(5),Output.head(5)

(   Architecture  Aspect Ratio  Audio Features  Audio Jack  Autofocus  \
 0             2             7               0           1          7   
 1             2             7               0           1          7   
 2             2             6               0           1         13   
 3             2             2               1           1          6   
 4             2             6               0           1          7   
 
    Bezel-less display  Bluetooth  Brand  Browser  Build Material  ...  \
 0                   1          9     45        0               0  ...   
 1                   1          8     37        0               0  ...   
 2                   1          8     37        0               0  ...   
 3                   1          9     39        0               0  ...   
 4                   1          8     10        0               0  ...   
 
    Video Player  Video Recording  Video Recording Features  VoLTE  Waterproof  \
 0             0                
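
Positional slices such as `3:98` are easy to get wrong by one column. A sketch of the same split by column name follows; the frame `toy` is made up, and the list `non_feature_cols` is an assumption that mirrors what the slice above leaves out.

```python
import pandas as pd

# Hypothetical stand-in with a few of the real column names
toy = pd.DataFrame({
    "PhoneId": [0, 1, 2],
    "Also Known As": ["", "", "Vivo V11 Pro"],
    "Applications": [0, 18, 15],
    "Architecture": [2, 2, 2],
    "Width": [102, 94, 88],
    "Rating": [17, 17, 16],
})

# Columns that the positional slice 3:98 excludes from the inputs
non_feature_cols = ["PhoneId", "Also Known As", "Applications", "Rating"]

X = toy.drop(columns=non_feature_cols)  # feature matrix
y = toy["Rating"]                       # 1-D target Series

print(list(X.columns))
print(y.shape)
```

Selecting `toy["Rating"]` gives a one-dimensional Series, whereas `iloc[:, 98:99]` gives a single-column DataFrame.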

## Part-7 Train-Test Splitting
***
The data is split into training (80%) and testing (20%) sets, with a fixed random state so that the split is reproducible.

In [10]:
Xtrain, Xtest, Ytrain, Ytest = train_test_split(Input, Output, test_size = 0.2, random_state = 9)
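
With a small dataset, a purely random split can leave the test set with a different class balance than the training set. As an illustration only, this sketch on toy arrays (`X_toy` and `y_toy` are made up) adds the `stratify` argument of `train_test_split` to keep class proportions equal in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 20 samples, 2 features, two balanced classes (hypothetical)
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0] * 10 + [1] * 10)

# Same test_size and random_state as above, plus stratification on the labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=9, stratify=y_toy
)

print(X_tr.shape, X_te.shape)
print(np.bincount(y_te))  # class counts in the test set
```

Stratification requires at least two samples of every class, so it may not work directly on a target with very rare classes.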

## Part-8 Making parameters for Random Forest classifier
***
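The hyperparameter grid for the random forest classifier is defined: nine values of `min_samples_split`, five values of `max_leaf_nodes`, and the entropy split criterion.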

In [11]:
parameters={
    'min_samples_split':[2,3,4,5,6,7,8,9,10],
    'max_leaf_nodes':[100,200,300,400,500],
    'criterion':['entropy']
}
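
To get a feel for how much work the grid implies, the number of combinations can be counted with scikit-learn's `ParameterGrid`. This sketch repeats the grid above; `n_candidates` and `n_fits` are names chosen for the example, and the factor 5 assumes 5-fold cross-validation.

```python
from sklearn.model_selection import ParameterGrid

parameters = {
    'min_samples_split': [2, 3, 4, 5, 6, 7, 8, 9, 10],
    'max_leaf_nodes': [100, 200, 300, 400, 500],
    'criterion': ['entropy'],
}

# Every combination of the listed values is one candidate model
n_candidates = len(ParameterGrid(parameters))

# Each candidate is fitted once per cross-validation fold (5 folds assumed)
n_fits = n_candidates * 5

print(n_candidates, n_fits)  # 45 225
```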

## Part-9 Initialising Classifier with parameters
***
GridSearchCV tries every parameter combination with RandomForestClassifier() under 5-fold cross-validation, scores each by accuracy, and refits the best combination on the full training data.

In [12]:
cf=GridSearchCV(
    estimator=RandomForestClassifier(),
    cv=5,
    scoring='accuracy',
    refit=True,
    param_grid=parameters
)
cf.fit(Xtrain,Ytrain)

  estimator.fit(X_train, y_train, **fit_params)
  self.best_estimator_.fit(X, y, **fit_params)


GridSearchCV(cv=5, error_score='raise-deprecating',
             estimator=RandomForestClassifier(bootstrap=True, class_weight=None,
                                              criterion='gini', max_depth=None,
                                              max_features='auto',
                                              max_leaf_nodes=None,
                                              min_impurity_decrease=0.0,
                                              min_impurity_split=None,
                                              min_samples_leaf=1,
                                              min_samples_split=2,
                                              min_weight_fraction_leaf=0.0,
                                              n_estimators='warn', n_jobs=None,
                                              oob_score=False,
                                              random_state=None, verbose=0,
                                              warm_start=False),
             iid
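
The repeated `estimator.fit(...)` lines in the output above look like fragments of scikit-learn's `DataConversionWarning`, which is typically raised when `y` is passed as a single-column DataFrame instead of a 1-D array. Assuming that is the cause, flattening the target before fitting should silence it. The sketch below uses synthetic data from `make_classification` and a deliberately tiny grid (`small_grid` is hypothetical) so that it runs quickly.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the phone data
X_arr, y_arr = make_classification(n_samples=60, n_features=5, random_state=0)
X_toy = pd.DataFrame(X_arr)
y_toy = pd.DataFrame({"Rating": y_arr})  # single-column frame, like Output

# Reduced grid, only for illustration
small_grid = {"min_samples_split": [2, 4], "criterion": ["entropy"]}

search = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid=small_grid,
    cv=3,
    scoring="accuracy",
    refit=True,
)

# Flatten the target to shape (n_samples,) before fitting
search.fit(X_toy, y_toy["Rating"].to_numpy().ravel())

# Best parameter combination and its mean cross-validated accuracy
print(search.best_params_)
print(search.best_score_)
```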

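## Part-10 Accuracy
***
The accuracy score of the best classifier found by the grid search is printed for the test set. The score should be read with the target in mind: Rating was label encoded in Part-5 instead of being converted to binary, so the classifier has to predict one of many rating classes.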

In [13]:
print(accuracy_score(Ytest,cf.predict(Xtest)))

0.2112676056338028
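
An accuracy figure is easier to judge next to a simple baseline, such as always predicting the most frequent class. Here is a minimal sketch with made-up test labels (`y_test_toy` is hypothetical and only loosely modelled on the encoded Rating values shown earlier).

```python
import pandas as pd

# Hypothetical encoded test labels
y_test_toy = pd.Series([17, 17, 16, 15, 17, 13, 16, 17])

# Number of distinct classes the model has to choose between
n_classes = y_test_toy.nunique()

# Accuracy obtained by always predicting the most frequent class
majority_class = y_test_toy.value_counts().idxmax()
baseline_accuracy = (y_test_toy == majority_class).mean()

print(n_classes, majority_class, baseline_accuracy)
```

On the real data, the same calculation on `Ytest` would show whether 0.21 is better or worse than this trivial strategy.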
