
# Iris Data Analysis with Pandas & Matplotlib

This notebook explores the famous Iris dataset (loaded from scikit-learn) using **Pandas** for data analysis and **Matplotlib** for visualization, with **seaborn** for styling and the scatter plot.

> I tried to follow the steps given in the assignment, adding some personal notes along the way.


In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets

# I decided to use the seaborn theme because it makes plots look nicer
# (set_theme is the current name; sns.set is an older alias for it)
sns.set_theme(style="whitegrid")

# Load Iris dataset
iris = datasets.load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Just to see what the data looks like
df.head()


In [None]:

# Checking info about the dataset
df.info()

# Check missing values
df.isnull().sum()


In [None]:

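# Quick statistics; display() is needed because a cell only auto-renders its last expression
display(df.describe())

# Group by species and calculate the mean of each numeric column
# (observed=True avoids the FutureWarning newer pandas raises for categorical keys)
df.groupby('species', observed=True).mean()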



From the grouped means above, I can see that:
- *setosa* has the smallest petals on average.
- *virginica* has the largest petals on average.
- *versicolor* falls in between.
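
One way to check this ordering without reading the table by eye is to sort the per-species means. The sketch below uses a small made-up frame (`toy`, with invented measurements, not the real Iris values) so that it runs on its own; with the notebook's `df`, the same `groupby` line should apply.

```python
import pandas as pd

# Hypothetical toy measurements, invented for illustration (not real Iris rows)
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "petal length (cm)": [1.4, 1.6, 4.0, 4.4, 5.5, 5.9],
})

# Mean petal length per species, sorted from smallest to largest
petal_means = toy.groupby("species")["petal length (cm)"].mean().sort_values()

# Species names in ascending order of mean petal length
ranking = list(petal_means.index)
print(ranking)
```

Sorting the means turns the "smallest / in between / largest" observation into something that can be read off directly from `ranking`.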


In [None]:

# 1. Line chart - just cumulative petal length for fun
# (rows are ordered by species, so the slope changes at each species boundary)
plt.figure(figsize=(8,4))
plt.plot(df['petal length (cm)'].cumsum(), label="Cumulative Petal Length")
plt.title("Line Chart - Cumulative Petal Length")
plt.xlabel("Index")
plt.ylabel("Cumulative Length (cm)")
plt.legend()
plt.show()

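# 2. Bar chart - average petal length per species
# (observed=True avoids the FutureWarning newer pandas raises for categorical keys)
mean_petal_length = df.groupby('species', observed=True)['petal length (cm)'].mean()
mean_petal_length.plot(kind='bar', color=['#ff9999', '#66b3ff', '#99ff99'])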
plt.title("Average Petal Length per Species")
plt.ylabel("Petal Length (cm)")
plt.show()

# 3. Histogram - sepal width distribution
plt.hist(df['sepal width (cm)'], bins=15, color='#66b3ff', edgecolor='black')
plt.title("Histogram - Sepal Width Distribution")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Frequency")
plt.show()

# 4. Scatter plot - sepal length vs petal length
sns.scatterplot(data=df, x='sepal length (cm)', y='petal length (cm)', hue='species', palette='Set1')
plt.title("Scatter Plot - Sepal vs Petal Length")
plt.show()



## Conclusion
This was a simple but interesting analysis.  
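From the scatter plot, **setosa** forms a clearly separate cluster, while **versicolor** and **virginica** sit next to each other with some overlap, so petal and sepal measurements separate the species well but not perfectly.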

I also realized that **virginica** usually has the largest petals, while **setosa** has the smallest.
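
To go beyond eyeballing the scatter plot, separability on a single measurement can be sketched by comparing each species' minimum and maximum: two species overlap on that measurement when their ranges intersect. The frame `toy` and the helper `ranges_overlap` below are hypothetical, and the numbers are invented for illustration rather than taken from the Iris data.

```python
import pandas as pd

# Hypothetical toy measurements, invented for illustration (not real Iris rows)
toy = pd.DataFrame({
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
    "petal length (cm)": [1.3, 1.5, 1.9, 3.5, 4.5, 5.0, 4.8, 5.6, 6.4],
})

# Per-species minimum and maximum of the measurement
ranges = toy.groupby("species")["petal length (cm)"].agg(["min", "max"])


def ranges_overlap(a, b):
    """Return True if the [min, max] intervals of species a and b intersect."""
    return bool(
        ranges.loc[a, "min"] <= ranges.loc[b, "max"]
        and ranges.loc[b, "min"] <= ranges.loc[a, "max"]
    )


print(ranges)
print(ranges_overlap("setosa", "versicolor"))
print(ranges_overlap("versicolor", "virginica"))
```

An overlap on one measurement does not mean two species cannot be told apart; it only shows that this measurement alone is not enough, which is why the scatter plot combines two of them.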
