# Project Name - Iris Flower Classification - lgmvip task1


### Project Summary

#### Project Overview:
The Iris Flower Classification project aims to develop a machine learning model that classifies iris flowers into their species from a small set of measurements. The three species (setosa, versicolor, and virginica) differ in their sepal and petal dimensions.

#### Objective:
The main objective is to train a model that reliably predicts the species of an iris flower from its measurements, so that classification can be automated rather than done by hand.

#### Key Project Details:
- The project deals with three iris species: setosa, versicolor, and virginica.
- The species are distinguished using four measurements: sepal length, sepal width, petal length, and petal width.
- A machine learning model will be trained using a dataset that includes these measurements along with their corresponding species.
- The trained model will then be used to classify iris flowers into one of the three species based on the given measurements.
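
The train-then-classify workflow described above can be sketched as follows. This is a minimal illustration, not the project's final model: the choice of `KNeighborsClassifier`, the 80/20 split, `random_state=42`, and the `sample` measurement are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the measurements (features) and the integer species codes (target)
iris = load_iris()

# Hold out 20% of the rows for evaluation; stratify keeps the species balanced
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Assumed model for illustration: k-nearest neighbours with k=5
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Fraction of held-out flowers classified correctly
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")

# Classify one flower given as
# [sepal length, sepal width, petal length, petal width] in cm
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_species = iris.target_names[model.predict(sample)[0]]
print("Predicted species:", predicted_species)
```

Any other scikit-learn classifier with `fit` and `predict` methods could be substituted for the k-nearest-neighbours model without changing the rest of the sketch.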

# 1. Know the data

In [1]:
# Import libraries
# NumPy and pandas for data processing and wrangling; scikit-learn for the bundled Iris dataset
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

In [2]:
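# Load the Iris dataset into a DataFrame with one column per measurement
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
# Add the target as integer codes: 0 = setosa, 1 = versicolor, 2 = virginica
df['species'] = iris.target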

In [3]:
# Inspect the dataset
print(df.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  \
0                5.1               3.5                1.4               0.2   
1                4.9               3.0                1.4               0.2   
2                4.7               3.2                1.3               0.2   
3                4.6               3.1                1.5               0.2   
4                5.0               3.6                1.4               0.2   

   species  
0        0  
1        0  
2        0  
3        0  
4        0  


In [7]:
# Count the rows and columns of the dataset using shape
print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])

Number of rows: 150
Number of columns: 5


In [4]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   species            150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
None


The output of `df.info()` shows 150 entries with 150 non-null values in every column, so nothing is missing. The dataset has 5 columns: the four measurements, sepal length (cm), sepal width (cm), petal length (cm), and petal width (cm), stored as float64, and species, stored as int64.

In [5]:
print(df.describe())

       sepal length (cm)  sepal width (cm)  petal length (cm)  \
count         150.000000        150.000000         150.000000   
mean            5.843333          3.057333           3.758000   
std             0.828066          0.435866           1.765298   
min             4.300000          2.000000           1.000000   
25%             5.100000          2.800000           1.600000   
50%             5.800000          3.000000           4.350000   
75%             6.400000          3.300000           5.100000   
max             7.900000          4.400000           6.900000   

       petal width (cm)     species  
count        150.000000  150.000000  
mean           1.199333    1.000000  
std            0.762238    0.819232  
min            0.100000    0.000000  
25%            0.300000    0.000000  
50%            1.300000    1.000000  
75%            1.800000    2.000000  
max            2.500000    2.000000  
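
Because `species` holds integer class codes, its mean and standard deviation in the table above are not meaningful. A compact summary restricted to the four measurement columns can be sketched like this; the variable name `summary` and the rounding to two decimals are choices made for the example.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame used in the cells above
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["species"] = iris.target

# Describe only the measurement columns and keep the rows of interest
summary = (
    df[iris.feature_names]
    .describe()
    .loc[["mean", "std", "min", "max"]]
    .round(2)
)
print(summary)
```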


In [6]:
# Check for missing values
print(df.isnull().sum())

sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
species              0
dtype: int64


The null count is zero for every column, which confirms that the dataset has no missing values. We can proceed to the next steps.
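
The same check can be reduced to a single number, which is convenient when the notebook is re-run on other data. The sketch below is one possible way to do it; the names `total_missing` and `columns_with_missing` are introduced here for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame used in the cells above
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["species"] = iris.target

# Total number of missing cells across the whole DataFrame
total_missing = int(df.isnull().sum().sum())

# Names of the columns that contain at least one missing value
columns_with_missing = df.columns[df.isnull().any()].tolist()

print("Missing cells:", total_missing)
print("Columns with missing values:", columns_with_missing)
```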

## Dataset Overview
### Structure of the Dataset:

The dataset contains 150 entries (rows) and 5 columns:

- sepal length (cm)
- sepal width (cm)
- petal length (cm)
- petal width (cm)
- species

### Data Types:

- The features (sepal length (cm), sepal width (cm), petal length (cm), petal width (cm)) are of type float64.
- The target variable (species) is of type int64.

### Missing Values:

There are no missing values in the dataset.
### Descriptive Statistics:
#### Sepal Length:
- Mean: 5.84 cm
- Standard Deviation: 0.83 cm
- Minimum: 4.3 cm
- Maximum: 7.9 cm

#### Sepal Width:
- Mean: 3.06 cm
- Standard Deviation: 0.44 cm
- Minimum: 2.0 cm
- Maximum: 4.4 cm

#### Petal Length:
- Mean: 3.76 cm
- Standard Deviation: 1.77 cm
- Minimum: 1.0 cm
- Maximum: 6.9 cm

#### Petal Width:
- Mean: 1.20 cm
- Standard Deviation: 0.76 cm
- Minimum: 0.1 cm
- Maximum: 2.5 cm

#### Species:
The target variable is encoded as integers representing the three species of iris flowers: 0 = setosa, 1 = versicolor, 2 = virginica.
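
For readability, the integer codes can be translated back to species names using `iris.target_names`, which scikit-learn ships with the dataset. The sketch below adds a column called `species_name`; that column name is an assumption made for this example and is not part of the original DataFrame.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame used in the cells above
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df["species"] = iris.target

# Build a lookup from integer code to species name
code_to_name = dict(enumerate(iris.target_names))

# Hypothetical extra column holding the readable species name
df["species_name"] = df["species"].map(code_to_name)

# Number of flowers per species
species_counts = df["species_name"].value_counts()
print(species_counts)
```

Keeping the integer `species` column alongside the names leaves the target in the numeric form that scikit-learn estimators accept directly.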