
# Iris Dataset - Basic Data Exploration

This notebook performs initial exploration of the classic Iris dataset.  
It includes data loading, summary statistics, grouping, and early observations.


In [None]:
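import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled Iris data and build a labelled DataFrame
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# A notebook cell displays only its last expression,
# so print the intermediate results explicitly
print(df.head())
df.info()  # prints column dtypes and non-null counts itself
df.isnull().sum()  # missing values per column (shown as the cell result)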


## Descriptive Statistics


In [None]:
df.describe()


## Mean Measurements per Species


In [None]:
df.groupby("species", observed=True).mean()
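
A mean on its own says nothing about how much the samples within a species vary. As a minimal sketch (the name `summary` is illustrative and not part of the original notebook), the same grouping can report the standard deviation next to each mean:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Mean and standard deviation of every measurement, per species.
# The result has a two-level column index: (feature, statistic).
summary = df.groupby("species", observed=True).agg(["mean", "std"])

print(summary.round(2))
```

A single column can then be read with a tuple key, for example `summary[("petal length (cm)", "std")]`.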


## 🔍 Observations

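- *Setosa* has markedly smaller petals: its mean petal length (about 1.5 cm) and width (about 0.25 cm) are well below those of the other two species.
- *Virginica* has the largest mean for three of the four features; the exception is sepal width, where *Setosa* is largest.
- No missing values were found, and all four measurement columns are `float64`.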
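
These observations can be checked directly rather than read off the tables. The sketch below is one way to do that; the names `species_means`, `largest_by_feature`, `smallest_by_feature`, and `total_missing` are illustrative and do not appear elsewhere in the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the same DataFrame used throughout the notebook.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Per-species mean of each measurement.
species_means = df.groupby("species", observed=True).mean()

# For every feature, the species with the largest and smallest mean.
largest_by_feature = species_means.idxmax()
smallest_by_feature = species_means.idxmin()

# Total number of missing values across the whole frame.
total_missing = int(df.isnull().sum().sum())

print(species_means.round(2))
print(largest_by_feature)
print(smallest_by_feature)
print("Missing values:", total_missing)
```

`idxmax` and `idxmin` return one species label per feature, which makes exceptions such as sepal width easy to spot.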

This prepares the ground for visualizations in the next step.


In [None]:
# Imports for data loading and plotting
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

# Rebuild the DataFrame so this cell runs on its own
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Set seaborn style for nicer visuals
sns.set_theme(style="whitegrid")

# 1. Line chart of sepal measurements against sample index
#    (the index is row order, not time; rows are grouped by species)
plt.figure(figsize=(8, 4))
plt.plot(df.index, df['sepal length (cm)'], label='Sepal Length')
plt.plot(df.index, df['sepal width (cm)'], label='Sepal Width')
plt.title('Sepal Length and Width by Sample Index')
plt.xlabel('Sample Index')
plt.ylabel('Centimeters')
plt.legend()
plt.tight_layout()
plt.show()

# 2. Bar chart (average petal length by species)
mean_petal_length = df.groupby('species', observed=True)['petal length (cm)'].mean().reset_index()
plt.figure(figsize=(6, 4))
sns.barplot(data=mean_petal_length, x='species', y='petal length (cm)',
            hue='species', palette='pastel', legend=False)
plt.title('Average Petal Length by Species')
plt.xlabel('Species')
plt.ylabel('Average Petal Length (cm)')
plt.tight_layout()
plt.show()

# 3. Histogram (distribution of petal width)
plt.figure(figsize=(6, 4))
plt.hist(df['petal width (cm)'], bins=15, color='skyblue', edgecolor='black')
plt.title('Distribution of Petal Width')
plt.xlabel('Petal Width (cm)')
plt.ylabel('Frequency')
plt.tight_layout()
plt.show()

# 4. Scatter plot (sepal length vs petal length by species)
plt.figure(figsize=(6, 4))
sns.scatterplot(data=df, x='sepal length (cm)', y='petal length (cm)', hue='species', palette='deep')
plt.title('Sepal Length vs Petal Length by Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')
plt.tight_layout()
plt.show()
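
The line chart in the cell above plots measurements against the sample index, so any apparent trend depends on how the rows are ordered. As a hedged sketch (the names `rows_grouped_by_species` and `index_ranges` are illustrative), the following check shows that the rows come grouped by species, which means level shifts in that chart reflect species boundaries and not a progression over time:

```python
import pandas as pd
from sklearn.datasets import load_iris

# Rebuild the DataFrame so the example runs on its own.
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# True when the species codes never decrease from one row to the next,
# i.e. all rows of one species appear before the next species starts.
rows_grouped_by_species = bool(df["species"].cat.codes.is_monotonic_increasing)

# First and last row index occupied by each species.
index_ranges = (
    df.reset_index()
      .groupby("species", observed=True)["index"]
      .agg(["min", "max"])
)

print("Rows grouped by species:", rows_grouped_by_species)
print(index_ranges)
```

Shuffling the rows first, for example with `df.sample(frac=1, random_state=0)`, would remove the step pattern, which is a quick way to confirm it is an ordering artifact.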
