# Iris Dataset Classification



## Import the libraries needed for the analysis

In [1]:
import numpy as np
import pandas as pd
import sklearn

## Load the dataset

In [2]:
data=pd.read_csv("iris.csv")


## Inspect the first few rows to understand the structure of the data


In [3]:
data.head()

|   | sepal length in cm | sepal width in cm | petal length in cm | petal width in cm | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


### The describe() function reports basic summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame.

In [4]:
data.describe()

|   | sepal length in cm | sepal width in cm | petal length in cm | petal width in cm |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |
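
As a rough illustration of what these summary rows mean, the sketch below recomputes them with NumPy for a small made-up list of values (not the Iris data). It assumes the pandas defaults: the sample standard deviation (`ddof=1`) and linearly interpolated percentiles.

```python
import numpy as np

# Hypothetical toy values, chosen only to keep the arithmetic easy to follow.
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

summary = {
    "count": float(values.size),
    "mean": float(values.mean()),
    # describe() reports the sample standard deviation, hence ddof=1.
    "std": float(values.std(ddof=1)),
    "min": float(values.min()),
    "25%": float(np.percentile(values, 25)),
    "50%": float(np.percentile(values, 50)),
    "75%": float(np.percentile(values, 75)),
    "max": float(values.max()),
}

for name, value in summary.items():
    print(f"{name:>5}: {value:.4f}")
```

The `50%` row is the median, which is why it can differ noticeably from the mean for skewed columns such as petal length.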


### The info() method prints a concise summary of a DataFrame, including the index dtype, the column dtypes, the non-null counts, and the memory usage.

In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   sepal length in cm   150 non-null    float64
 1   sepal width in cm    150 non-null    float64
 2    petal length in cm  150 non-null    float64
 3   petal width in cm    150 non-null    float64
 4   class                150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB
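
The extra space in front of `petal length in cm` in the summary above suggests that this column name carries leading whitespace in the CSV header, which would make `data["petal length in cm"]` fail with a `KeyError`. A minimal sketch of one way to normalize the names, using a made-up two-column frame rather than the real file:

```python
import pandas as pd

# Hypothetical frame that mimics a header with stray leading whitespace.
frame = pd.DataFrame({
    "sepal width in cm": [3.5, 3.0],
    " petal length in cm": [1.4, 1.4],
})

# Strip leading and trailing whitespace from every column name.
frame.columns = frame.columns.str.strip()

print(list(frame.columns))
```

The same one-line fix could be applied to `data` right after loading it, if the whitespace turns out to be present in the file.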


### The value_counts() method returns a Series containing the counts of unique values.


In [6]:
data["class"].value_counts()

Iris-versicolor    50
Iris-virginica     50
Iris-setosa        50
Name: class, dtype: int64

### We use the isnull() function to detect missing values in the Iris dataset. It returns a boolean object of the same size indicating whether each value is NA: missing values map to True and non-missing values map to False. Chaining sum() then counts the missing values in each column.

In [7]:
data.isnull().sum()

sepal length in cm     0
sepal width in cm      0
 petal length in cm    0
petal width in cm      0
class                  0
dtype: int64
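
Because the Iris data has no missing values, the effect of `isnull()` is easier to see on a frame that does. The sketch below uses a made-up toy frame (its column names and values are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with some missing entries.
toy = pd.DataFrame({
    "length": [5.1, np.nan, 4.7],
    "label": ["a", None, None],
})

# Boolean mask with the same shape as the frame: True where a value is missing.
mask = toy.isnull()

# True counts as 1, so summing the mask gives the missing count per column.
missing_per_column = mask.sum()

print(mask)
print(missing_per_column)
```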

### We use corr() to compute pairwise correlations. Correlation is a statistical measure of how strongly two variables are related to each other. Here the variables are the input features that are later used to classify the target variable.

In [8]:
data.corr(numeric_only=True)  # skip the non-numeric "class" column (needed in pandas 2.0 and later)

|   | sepal length in cm | sepal width in cm | petal length in cm | petal width in cm |
|---|---|---|---|---|
| sepal length in cm | 1.0 | -0.109369 | 0.871754 | 0.817954 |
| sepal width in cm | -0.109369 | 1.0 | -0.420516 | -0.356544 |
| petal length in cm | 0.871754 | -0.420516 | 1.0 | 0.962757 |
| petal width in cm | 0.817954 | -0.356544 | 0.962757 | 1.0 |
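
By default `corr()` computes the Pearson correlation coefficient for each pair of columns. A minimal pure-Python sketch of that formula, run on made-up lists rather than the Iris columns (the helper name `pearson` is an assumption for illustration):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (the covariance, up to a constant factor).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one perfectly increasing and one perfectly decreasing relation.
xs = [1.0, 2.0, 3.0, 4.0]
rising = pearson(xs, [2.0, 4.0, 6.0, 8.0])
falling = pearson(xs, [8.0, 6.0, 4.0, 2.0])
print(rising, falling)
```

Values near 1 or -1 indicate a strong linear relationship, as with petal length and petal width (0.962757) in the table above, and values near 0 indicate a weak one.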


### We import LabelEncoder from sklearn. It can be used to normalize labels and to transform non-numerical labels into numerical labels.

In [9]:
from sklearn.preprocessing import LabelEncoder


In [10]:
le=LabelEncoder()

In [11]:
data["class"]=le.fit_transform(data["class"])
data.head()

|   | sepal length in cm | sepal width in cm | petal length in cm | petal width in cm | class |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
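
`LabelEncoder` assigns integer codes to the unique labels in sorted order, which is why `Iris-setosa` becomes 0 above. The pure-Python sketch below mimics that mapping to make it explicit; the helper name `fit_label_mapping` is hypothetical and not part of scikit-learn.

```python
def fit_label_mapping(labels):
    """Map each unique label to an integer code, in sorted label order."""
    classes = sorted(set(labels))
    return {label: index for index, label in enumerate(classes)}


labels = ["Iris-virginica", "Iris-setosa", "Iris-versicolor", "Iris-setosa"]

mapping = fit_label_mapping(labels)
encoded = [mapping[label] for label in labels]

# Invert the mapping to recover the original strings from the codes.
inverse = {index: label for label, index in mapping.items()}
decoded = [inverse[code] for code in encoded]

print(mapping)
print(encoded)
print(decoded)
```

With the real encoder, `le.classes_` holds the sorted labels and `le.inverse_transform(...)` performs the decoding step.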


### The scikit-learn library implements the train-test split evaluation procedure in the train_test_split() function. Given the features x and the labels y, it returns a training and a test subset of each. With test_size=0.30, 30% of the rows (45 of 150) are held out for testing.


In [12]:
from sklearn.model_selection import train_test_split
x=data.drop(columns=['class'])
y=data['class']
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.30)

In [13]:
x_train.shape

(105, 4)

In [14]:
y_train.shape

(105,)

In [15]:
x_test.shape

(45, 4)

In [16]:
y_test.shape

(45,)
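
The split above is random, so the rows in each subset, and therefore the accuracy reported later, can change from run to run. A sketch of a reproducible variant follows. It assumes scikit-learn's bundled copy of Iris (`load_iris`) as a stand-in for `iris.csv`; `random_state=42` is an arbitrary seed, and `stratify` is an optional addition that is not part of the original notebook.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Stand-in for iris.csv: 150 rows, 4 features, 3 classes encoded as 0, 1, 2.
features, labels = load_iris(return_X_y=True)

x_train, x_test, y_train, y_test = train_test_split(
    features,
    labels,
    test_size=0.30,
    random_state=42,   # fixed seed so the same rows are selected every run
    stratify=labels,   # keep the class proportions equal in both subsets
)

test_counts = Counter(y_test.tolist())
print(x_train.shape, x_test.shape)
print(test_counts)
```

Stratifying is a design choice: with only 45 test rows, an unstratified split can leave one class under-represented in the test set.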

### The scikit-learn library provides logistic regression, a simple yet effective classification algorithm. It is most often introduced for binary classification, and scikit-learn's LogisticRegression also handles multiclass problems such as the three Iris classes. The basis of logistic regression is the logistic function, also called the sigmoid function.

In [17]:
from sklearn.linear_model import LogisticRegression
model=LogisticRegression()
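
For reference, the logistic (sigmoid) function maps any real number to a value between 0 and 1, which is what lets the model's output be read as a probability. A minimal sketch, written in a numerically stable form; this is an illustration, not scikit-learn's internal code:

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the algebraically equivalent form exp(z) / (1 + exp(z)),
    # so that exp() is never called with a large positive argument.
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 4))
```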

### Fitting a model means estimating its parameters from the training data. How well the fitted model generalizes to data similar to what it was trained on determines how accurate its predictions are.


In [18]:
model.fit(x_train,y_train)

LogisticRegression()

### Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally: Accuracy = (Number of correct predictions) / (Total number of predictions).



In [19]:
print("Accuracy: ",model.score(x_test,y_test)*100)

Accuracy:  95.55555555555556
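
For a classifier, `model.score` returns this same fraction. With 45 test rows, the value printed above is consistent with 43 correct predictions (43 / 45 ≈ 0.9556). A minimal sketch of the computation on made-up labels (the helper name `accuracy` is an assumption for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return correct / len(y_true)


# Hypothetical labels: three of the four predictions are correct.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]

print("Accuracy:", accuracy(y_true, y_pred) * 100)
```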


### Thank you 