In [None]:
# Iris Dataset Analysis

In this notebook, we perform data loading, exploration, analysis, and visualization on the classic Iris dataset. The Iris dataset contains measurements of sepal and petal lengths and widths for three species of iris flowers: setosa, versicolor, and virginica.


In [None]:
## Step 1: Importing Libraries

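We import the libraries used throughout the notebook: pandas for data manipulation, matplotlib and seaborn for visualization, and scikit-learn's `load_iris` to load the Iris dataset. NumPy is imported as well, although no later cell calls it directly.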

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris


In [None]:
## Step 2: Loading and Exploring the Dataset

We load the Iris dataset using `sklearn.datasets.load_iris()` and convert it into a pandas DataFrame. Then, we inspect the first few rows and check for data types and missing values.

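try:
    # load_iris() reads the copy of the dataset bundled with scikit-learn.
    iris = load_iris()
    iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
    # Map the integer targets (0, 1, 2) to their species names.
    iris_df['species'] = iris.target
    iris_df['species'] = iris_df['species'].map(dict(zip(range(3), iris.target_names)))

    print("First 5 rows:")
    display(iris_df.head())

    print("\nData Types:")
    print(iris_df.dtypes)

    print("\nMissing Values:")
    print(iris_df.isnull().sum())

except FileNotFoundError:
    # Only expected if the scikit-learn installation is missing its bundled data.
    print("Dataset file not found.")
    raise
except Exception as e:
    print("An error occurred:", e)
    raise  # Re-raise so later cells do not fail with a confusing NameError.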
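
As a cross-check on the loading step, here is a minimal sketch of an alternative route, assuming scikit-learn 0.23 or newer, where `load_iris` accepts `as_frame=True`. The name `iris_alt` is hypothetical and is used only to avoid overwriting `iris_df`.

```python
from sklearn.datasets import load_iris

# as_frame=True (assumed: scikit-learn >= 0.23) returns pandas objects directly.
bunch = load_iris(as_frame=True)

# bunch.frame holds the four feature columns plus an integer 'target' column.
iris_alt = bunch.frame.rename(columns={"target": "species"})

# Replace the integer codes 0, 1, 2 with the species names.
iris_alt["species"] = iris_alt["species"].map(dict(enumerate(bunch.target_names)))

# Sanity checks: number of rows and columns, and rows per species.
print(iris_alt.shape)
print(iris_alt["species"].value_counts())
```

Checking the shape and the per-species counts right after loading is a cheap way to confirm that the label mapping did not introduce missing values.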


In [None]:
## Step 3: Basic Statistics and Grouped Analysis

We use `.describe()` to get summary statistics of the numerical columns. Then, we group the data by species to compute the mean values for each species and identify notable differences.

print("Summary statistics:")
display(iris_df.describe())

print("\nMean values per species:")
display(iris_df.groupby('species').mean())
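
One way to make "notable differences" concrete is to rank the features by how far apart the species means are. The sketch below rebuilds the DataFrame so that it runs on its own; `species_means` and `spread` are illustrative names, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell is self-contained.
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

# Mean of every measurement for each species.
species_means = iris_df.groupby("species").mean()

# Gap between the largest and smallest species mean, per feature.
# A larger gap suggests the feature differs more between species.
spread = (species_means.max() - species_means.min()).sort_values(ascending=False)
print(spread)
```

This range-of-means measure ignores the variation within each species, so treat it as a rough first look rather than a formal test.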


In [None]:
## Step 4: Line Chart – Average Petal Length per Species

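This line chart shows the average petal length for each iris species. It helps us understand how petal length varies across species. Because species is a categorical variable, the connecting line is only a visual guide and does not imply a trend between species.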

grouped_means = iris_df.groupby('species').mean()

plt.figure(figsize=(8,5))
sns.lineplot(data=grouped_means[['petal length (cm)']].reset_index(), x='species', y='petal length (cm)', marker='o')
plt.title('Average Petal Length per Species')
plt.xlabel('Species')
plt.ylabel('Petal Length (cm)')
plt.tight_layout()
plt.show()


In [None]:
## Step 5: Bar Chart – Average Sepal Width per Species

This bar chart visualizes the average sepal width for each iris species, giving a clear comparison across categories.

plt.figure(figsize=(8,5))
# errorbar=None hides the error bars (seaborn >= 0.12; older versions use ci=None).
sns.barplot(data=iris_df, x='species', y='sepal width (cm)', errorbar=None)
plt.title('Average Sepal Width per Species')
plt.xlabel('Species')
plt.ylabel('Sepal Width (cm)')
plt.tight_layout()
plt.show()
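
The bar heights above are per-species means. As a sketch, the same numbers can be computed explicitly and drawn with standard-deviation error bars using plain matplotlib. The `stats` name and the choice of standard deviation as the error measure are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell is self-contained.
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

# Mean and standard deviation of sepal width for each species.
stats = iris_df.groupby("species")["sepal width (cm)"].agg(["mean", "std"])
print(stats)

# One bar per species, with the standard deviation as the error bar.
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(stats.index, stats["mean"], yerr=stats["std"], capsize=4)
ax.set_title("Average Sepal Width per Species (error bars: 1 SD)")
ax.set_xlabel("Species")
ax.set_ylabel("Sepal Width (cm)")
fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, add `plt.show()` after the last line.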


In [None]:
## Step 6: Histogram – Distribution of Petal Length

The histogram shows how petal lengths are distributed across the whole dataset. It reveals the overall shape of the distribution: whether it is skewed, roughly symmetric, or has more than one peak.

plt.figure(figsize=(8,5))
plt.hist(iris_df['petal length (cm)'], bins=20, color='teal', edgecolor='black')
plt.title('Distribution of Petal Length')
plt.xlabel('Petal Length (cm)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()
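
The shape of the histogram can also be summarised numerically. This sketch computes the sample skewness and counts the values in 1 cm bins. The bin edges are an arbitrary choice for illustration, not the 20 bins used in the plot.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the petal length column so this cell is self-contained.
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
petal_length = iris_df["petal length (cm)"]

# Sample skewness: values near 0 indicate a roughly symmetric distribution.
print("Skewness:", petal_length.skew())

# Counts in 1 cm bins: [1, 2), [2, 3), ..., [6, 7].
counts, edges = np.histogram(petal_length, bins=np.arange(1, 8))
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} cm: {count}")
```

An empty bin between two populated ones is a sign that the distribution has more than one peak, which a single skewness value would not reveal.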


In [None]:
## Step 7: Scatter Plot – Sepal Length vs Petal Length

This scatter plot displays the relationship between sepal length and petal length. The use of different colors for each species helps visualize clustering and separation.

plt.figure(figsize=(8,5))
sns.scatterplot(data=iris_df, x='sepal length (cm)', y='petal length (cm)', hue='species', palette='deep')
plt.title('Sepal Length vs Petal Length')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')
plt.tight_layout()
plt.show()
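
To back up what the scatter plot suggests, the sketch below computes the correlation between the two measurements and tests a simple threshold rule on petal length. The 2.5 cm cutoff is an assumption read off the plot, not a value from the original analysis.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell is self-contained.
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

# Pearson correlation between sepal length and petal length.
r = iris_df["sepal length (cm)"].corr(iris_df["petal length (cm)"])
print("Correlation:", round(r, 3))

# Hypothetical rule: petal length below 2.5 cm means setosa.
is_setosa = iris_df["species"] == "setosa"
rule = iris_df["petal length (cm)"] < 2.5
print("Rule agrees with label on", (rule == is_setosa).mean() * 100, "% of rows")

# Check whether the petal length ranges of the other two species overlap.
petal = iris_df.groupby("species")["petal length (cm)"]
overlap = petal.max()["versicolor"] > petal.min()["virginica"]
print("versicolor and virginica overlap in petal length:", overlap)
```

A single threshold cannot separate versicolor from virginica when their ranges overlap, which is why the two clusters touch in the scatter plot.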


In [None]:
## Conclusion

From our analysis and visualizations:

- **virginica** has, on average, the longest petals and sepals.
- Petal length is a strong indicator for distinguishing species: setosa petals are much shorter than those of the other two species.
- The scatter plot shows setosa clearly separated from the other two species, while versicolor and virginica overlap somewhat in sepal and petal length.

This analysis demonstrates how simple exploratory data analysis (EDA) techniques can provide meaningful insights into a dataset.
