## Importing required libraries

The following commands import the required Python libraries.

In [2]:
import pandas as pd
from sklearn import model_selection
from sklearn.metrics import accuracy_score

# classifiers provided by the scikit-learn (sklearn) library
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

## Loading dataset

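We load the dataset from a CSV file using the read_csv() function of the pandas library. Because iris.data has no header row, we pass header=None, and pandas labels the columns 0 to 4.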

In [4]:
file = pd.read_csv("iris.data", header=None)
file.head(5)

     0    1    2    3            4
0  5.1  3.5  1.4  0.2  Iris-setosa
1  4.9  3.0  1.4  0.2  Iris-setosa
2  4.7  3.2  1.3  0.2  Iris-setosa
3  4.6  3.1  1.5  0.2  Iris-setosa
4  5.0  3.6  1.4  0.2  Iris-setosa
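
The integer column labels above are what pandas assigns when `header=None` is used. As a minimal sketch, the `names` parameter of `read_csv()` can attach readable labels instead. The column names below are illustrative choices (they are not stored in the file), and the three inline rows stand in for the real `iris.data` so the snippet runs on its own.

```python
import io
import pandas as pd

# Three rows copied from the output above stand in for the real "iris.data" file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Hypothetical column names; the file itself carries no header row.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

named = pd.read_csv(sample, header=None, names=columns)
print(named.head())
```

The rest of this walkthrough keeps the integer labels, so `file.groupby(4)` and `array[:,4]` still refer to the class column.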


## Exploring our dataset

A few things to check before building our model.

Run the following lines to print basic information about the dataset we are going to use.


1. Finding dimensions

In [5]:
print(file.shape)

(150, 5)


2. Describing the data with summary statistics

In [6]:
print(file.describe())

                0           1           2           3
count  150.000000  150.000000  150.000000  150.000000
mean     5.843333    3.054000    3.758667    1.198667
std      0.828066    0.433594    1.764420    0.763161
min      4.300000    2.000000    1.000000    0.100000
25%      5.100000    2.800000    1.600000    0.300000
50%      5.800000    3.000000    4.350000    1.300000
75%      6.400000    3.300000    5.100000    1.800000
max      7.900000    4.400000    6.900000    2.500000


3. Printing the class distribution

In [10]:
print(file.groupby(4).size())

4
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
dtype: int64
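
The counts above show three equally sized classes, which matters later because accuracy is easiest to interpret on balanced data. As a small sketch on a toy frame (the four rows below are made up for illustration and are not the iris data), `value_counts(normalize=True)` reports the same information as shares rather than counts.

```python
import pandas as pd

# Toy frame with the same layout as the dataset: column 4 holds the class label.
toy = pd.DataFrame({
    0: [5.1, 4.9, 7.0, 6.3],
    4: ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

counts = toy.groupby(4).size()                    # rows per class
shares = toy[4].value_counts(normalize=True)      # fraction of rows per class

print(counts)
print(shares)
```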


## Making our First Model

#### Splitting the dataset into training and validation sets
The following code converts the dataset to a 2D array, separates the features (X) from the target (Y), and sets a random seed. Finally, it splits the data into a training set and a validation set.

In [14]:
array = file.values     # convert the DataFrame to a 2D array
X = array[:,0:4]        # features: the first four columns
Y = array[:,4]          # target: the class label in the last column
validation_size = 0.30  # hold out 0.3, i.e. 30% of the rows, for validation
seed = 5                # fixed random seed so the same split is produced on every run
# finally, split the dataset into training and validation sets
X_train, X_validation, Y_train, Y_validation = model_selection.train_test_split(X, Y, test_size=validation_size, random_state=seed)

# check that the split worked
print(X_train[:3])

[[6.2 2.8 4.8 1.8]
 [5.9 3.0 4.2 1.5]
 [6.7 3.3 5.7 2.1]]


#### Now let's define our model

In [16]:
model = LogisticRegression()

# fitting our model
model.fit(X_train, Y_train)

# predicting outcomes
predictions = model.predict(X_validation)

print(predictions[:10])

['Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-setosa'
 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-virginica'
 'Iris-setosa' 'Iris-virginica']


#### Printing the accuracy of our model

In [17]:
print(accuracy_score(Y_validation, predictions))

0.9333333333333333
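
Accuracy is simply the fraction of predictions that match the true labels. With 45 validation rows (30% of 150), the score above corresponds to 42 correct predictions out of 45. The sketch below uses four made-up labels (not taken from the validation set) to show that a manual calculation agrees with `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels for illustration: three of the four predictions are correct.
y_true = np.array(["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica"])
y_pred = np.array(["Iris-setosa", "Iris-versicolor", "Iris-versicolor", "Iris-virginica"])

manual = np.mean(y_true == y_pred)        # share of matching positions
library = accuracy_score(y_true, y_pred)  # same quantity computed by scikit-learn

print(manual, library)
```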


## Testing other models

In [20]:
model = LogisticRegression()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("Logistic Regression: ", accuracy_score(Y_validation, predictions))

model = DecisionTreeClassifier()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("DecisionTreeClassifier: ", accuracy_score(Y_validation, predictions))

model = KNeighborsClassifier()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("KNeighborsClassifier: ", accuracy_score(Y_validation, predictions))

model = SVC()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("SVC: ", accuracy_score(Y_validation, predictions))

model = LinearDiscriminantAnalysis()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("LinearDiscriminantAnalysis: ", accuracy_score(Y_validation, predictions))

model = GaussianNB()
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)
print("GaussianNB: ", accuracy_score(Y_validation, predictions))

Logistic Regression:  0.9333333333333333
DecisionTreeClassifier:  0.9111111111111111
KNeighborsClassifier:  0.9555555555555556
SVC:  0.9777777777777777
LinearDiscriminantAnalysis:  0.9555555555555556
GaussianNB:  0.9333333333333333
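
The scores above come from a single 45-row validation set, so a difference of one or two predictions can reorder the models. One possible way to get a steadier comparison is k-fold cross-validation, sketched below as a loop over the same six classifiers. Several things here are assumptions rather than part of the walkthrough: the data comes from scikit-learn's bundled `load_iris()` copy (which may differ slightly from `iris.data`), the 5-fold setup is an arbitrary choice, and `max_iter=1000` is set only to give logistic regression room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled iris copy stands in for the iris.data file.
X_all, y_all = load_iris(return_X_y=True)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),  # max_iter is an assumed setting
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=5),
    "KNeighborsClassifier": KNeighborsClassifier(),
    "SVC": SVC(),
    "LinearDiscriminantAnalysis": LinearDiscriminantAnalysis(),
    "GaussianNB": GaussianNB(),
}

# Five stratified folds, shuffled with a fixed seed so the folds are reproducible.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)

results = {}
for name, estimator in models.items():
    # One accuracy value per fold; the mean summarises the model.
    scores = cross_val_score(estimator, X_all, y_all, cv=cv, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Keeping the models in a dictionary also removes the repeated fit/predict/print lines, so adding another classifier takes a single new entry.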
