# **Beginner's Iris Dataset for Machine Learning**
The Iris dataset is used to predict the species of an Iris flower from four features: sepal length, sepal width, petal length, and petal width (all in cm).

## Step 1 - Importing Library Files

In [1]:
import pandas as pd                                   #working with our data
from sklearn.linear_model import LogisticRegression   #Model Algorithm
from sklearn.model_selection import train_test_split  #Dividing the dataset
from sklearn.preprocessing import LabelEncoder        #Encoding categorical variables
from sklearn.metrics import accuracy_score            #Accuracy Score
from sklearn.metrics import classification_report     #Per-class precision, recall and F1

## Step 2 - Importing Dataset

In [2]:
data=pd.read_csv("data.csv")
data.head()

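|   | sepal length | sepal width | petal length | petal width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |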


In [3]:
data.columns 

Index(['sepal length', 'sepal width', 'petal length', 'petal width',
       'species'],
      dtype='object')

### Encoding the Categorical Target Variable using LabelEncoder

In [4]:
encode=LabelEncoder()

In [5]:
data['species']=encode.fit_transform(data['species'])

In [6]:
data.head()

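|   | sepal length | sepal width | petal length | petal width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |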
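
To see which integer stands for which species, the fitted encoder's `classes_` attribute can be inspected. The snippet below is a minimal sketch on a small stand-in Series (an assumption, used so the example runs without `data.csv`); the variable names `species`, `encoder` and `mapping` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for the 'species' column (assumption: same three labels as the dataset)
species = pd.Series(["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"])

encoder = LabelEncoder()
codes = encoder.fit_transform(species)

# classes_ stores the original labels in sorted order; each label's position is its code
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)          # {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
print(codes.tolist())   # [0, 2, 1, 0]
```

Because the codes follow alphabetical order of the labels, `0` in the table above corresponds to `Iris-setosa`.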


## Step 3 - Dividing the dataset into Train & Test Sets

In [7]:
train,test=train_test_split(data, test_size=.2,random_state=10)

In [8]:
print('Shape of Training set: ', train.shape)
print('Shape of Testing set: ', test.shape)

Shape of Training set:  (120, 5)
Shape of Testing set:  (30, 5)
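
A plain random split does not guarantee that each species is equally represented in the test set (the classification report at the end of this notebook shows supports of 10, 13 and 7). One possible variation, sketched here rather than taken from the notebook, is to pass `stratify=` so that class proportions are preserved. The sketch assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for `data.csv`, so the target column is named `target` rather than `species`.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for data.csv
frame = load_iris(as_frame=True).frame   # four feature columns plus 'target'

# stratify keeps the 50/50/50 class balance in both splits
train_s, test_s = train_test_split(
    frame, test_size=0.2, random_state=10, stratify=frame["target"]
)

print(train_s.shape, test_s.shape)
print(test_s["target"].value_counts().sort_index())
```

With 150 balanced rows and a 20% test size, each class contributes 10 rows to the test set.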


### Separating the Features & Target Variables for Training set

In [9]:
train_x=train.drop(columns=['species'])
train_y=train['species']

### Separating the Features & Target Variables for Test set

In [10]:
test_x=test.drop(columns=['species'])
test_y=test['species']
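
An equivalent and commonly used alternative is to separate features and target first and let `train_test_split` return all four pieces in one call. This is a sketch under the assumption that scikit-learn's bundled Iris data stands in for `data.csv`; the names `X` and `y` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for data.csv
iris = load_iris(as_frame=True)
X = iris.data      # DataFrame with the four feature columns
y = iris.target    # Series with the integer-encoded species

# Passing X and y together keeps rows and labels aligned in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
# (120, 4) (30, 4) (120,) (30,)
```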

## Step 4 - Training our Model

In [11]:
model=LogisticRegression()

In [12]:
model.fit(train_x,train_y)

LogisticRegression()

## Step 5 - Using our model to predict data

In [13]:
predict=model.predict(test_x)

In [14]:
print('Predicted the species on test data: ',encode.inverse_transform(predict))

Predicted the species on test data:  ['Iris-versicolor' 'Iris-virginica' 'Iris-setosa' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor' 'Iris-virginica'
 'Iris-versicolor' 'Iris-setosa' 'Iris-setosa' 'Iris-virginica'
 'Iris-versicolor' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-virginica' 'Iris-virginica' 'Iris-virginica' 'Iris-setosa'
 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor' 'Iris-versicolor'
 'Iris-versicolor' 'Iris-virginica']


## Step 6 - Evaluation Metrics to Determine the quality of our Model

In [15]:
accuracy_score(test_y,predict)

1.0
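
An accuracy of 1.0 on a 30-sample test set depends partly on which rows landed in that split. A k-fold cross-validation score gives a less split-dependent estimate. The following is a sketch, not part of the original workflow: it assumes the bundled `load_iris` data in place of `data.csv`, and sets `max_iter=1000` (also an assumption) to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Assumption: the bundled Iris data stands in for data.csv
X, y = load_iris(return_X_y=True)

# max_iter raised from the default as a precaution against convergence warnings
cv_model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold is used once as the held-out set
scores = cross_val_score(cv_model, X, y, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```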

In [16]:
print(classification_report(test_y,predict))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         7

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
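
The report above summarises precision, recall and F1 per class; a confusion matrix shows the raw counts behind those numbers. Here is a minimal sketch using `sklearn.metrics.confusion_matrix`, assuming the bundled `load_iris` data in place of `data.csv` and `max_iter=1000` for the model (both assumptions, so the counts may differ from the run above).

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris data stands in for data.csv
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true classes, columns are predicted classes (order fixed by labels=)
cm = confusion_matrix(y_test, clf.predict(X_test), labels=[0, 1, 2])
print(cm)
```

Entries on the diagonal count correct predictions; any off-diagonal entry marks a species that was mistaken for another.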

