# Iris Dataset Analysis Assignment


## Task 1: Load and Explore the Dataset
- Load the dataset using pandas.
- Display the first few rows using `.head()`.
- Check data types and missing values.
- Clean the dataset by filling or dropping missing values.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load iris dataset
iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Display first few rows
print(df.head())

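# Check data types and missing values
df.info()  # info() prints its report directly and returns None, so no print() is needed
print(df.isnull().sum())

# Clean: the sklearn copy of iris has no missing values, so dropna() is a no-op safeguard here
df = df.dropna()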
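
The task asks for filling or dropping missing values, but the sklearn copy of iris contains none. The sketch below shows both options on a separate copy (`df_demo`, a hypothetical name) after injecting three NaNs purely for illustration. The choice of per-species median as the fill value is an assumption, not a requirement of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features = list(iris.feature_names)
df_demo = pd.DataFrame(iris.data, columns=features)
df_demo['species'] = iris.target_names[iris.target]

# Assumption: inject a few NaNs for illustration only (the real data has none)
df_demo.loc[[0, 60, 120], 'sepal width (cm)'] = np.nan

# Option A: fill each gap with the median of the same species
group_medians = df_demo.groupby('species')[features].transform('median')
filled = df_demo.copy()
filled[features] = filled[features].fillna(group_medians)

# Option B: drop every row that has at least one missing value
dropped = df_demo.dropna()

print("Missing after fill:", filled.isnull().sum().sum())
print("Rows after drop:", len(dropped))
```

Filling keeps all 150 rows at the cost of some made-up values; dropping keeps only observed data but shrinks the sample.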


## Task 2: Basic Data Analysis
- Compute basic statistics (mean, median, std) using `.describe()`.
- Group by species and compute mean for each group.
- Identify interesting patterns or findings.

In [None]:
# Task 2: Basic Data Analysis

# Basic statistics
print("Basic Statistics:")
print(df.describe())

# Grouping by species and computing mean
print("\nMean values grouped by species:")
print(df.groupby('species').mean())

# Optional: You can add a median too
print("\nMedian values grouped by species:")
print(df.groupby('species').median())
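
To help with the "identify interesting patterns" item, the following sketch puts mean, median and standard deviation side by side per species and ranks the features by how far apart the species means are. The separation score is an ad hoc heuristic introduced here (range of group means divided by the average within-group standard deviation), not a standard statistic, and `df_demo` is a hypothetical name.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
features = list(iris.feature_names)
df_demo = pd.DataFrame(iris.data, columns=features)
df_demo['species'] = iris.target_names[iris.target]

grouped = df_demo.groupby('species')[features]

# One table with mean, median and std for every feature and species
summary = grouped.agg(['mean', 'median', 'std'])
print(summary.round(2))

# Heuristic: spread of the species means relative to the typical within-species spread
group_means = grouped.mean()
group_stds = grouped.std()
separation = (group_means.max() - group_means.min()) / group_stds.mean()
print(separation.sort_values(ascending=False).round(2))
```

Features with a high score differ a lot between species compared with their variation inside a species, which makes them good candidates to mention in the findings.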


## Task 3: Data Visualization
- Line chart: trends over time (if applicable).
- Bar chart: compare average petal length per species.
- Histogram: visualize distribution of a numerical column.
- Scatter plot: show relationship between two numerical columns.

In [None]:
# Task 3: Data Visualization

# Set a seaborn theme
sns.set_theme(style="whitegrid")  # set_theme() is the current name; sns.set() is an alias

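# 1. Line chart – iris has no time column, so the row index stands in for time.
# Rows are ordered by species, so any apparent "trend" reflects that ordering.
# Passing df.index directly avoids adding a helper column that would leak into later statistics.
plt.figure(figsize=(10, 5))
sns.lineplot(x=df.index, y=df['sepal length (cm)'])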
plt.title('Simulated Trend of Sepal Length Over Index')
plt.xlabel('Index')
plt.ylabel('Sepal Length (cm)')
plt.show()

# 2. Bar chart – Average petal length per species (barplot's default estimator is the mean)
plt.figure(figsize=(8, 5))
sns.barplot(x='species', y='petal length (cm)', data=df)
plt.title('Average Petal Length per Species')
plt.xlabel('Species')
plt.ylabel('Petal Length (cm)')
plt.show()

# 3. Histogram – Distribution of Sepal Width
plt.figure(figsize=(8, 5))
sns.histplot(df['sepal width (cm)'], bins=15, kde=True)
plt.title('Distribution of Sepal Width')
plt.xlabel('Sepal Width (cm)')
plt.ylabel('Frequency')
plt.show()

# 4. Scatter plot – Sepal Length vs Petal Length
plt.figure(figsize=(8, 5))
sns.scatterplot(x='sepal length (cm)', y='petal length (cm)', hue='species', data=df)
plt.title('Sepal Length vs Petal Length by Species')
plt.xlabel('Sepal Length (cm)')
plt.ylabel('Petal Length (cm)')
plt.legend(title='Species')
plt.show()


## Findings and Conclusion
- Summarize insights from your analysis and visualizations.
- Mention any important patterns or observations.
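
A written summary is easier to defend when it quotes numbers. The sketch below computes a few that relate to the plots above: the correlation between sepal length and petal length, and the species with the longest and shortest average petals. `df_demo` is a hypothetical name and the selection of metrics is only a suggestion.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df_demo = pd.DataFrame(iris.data, columns=iris.feature_names)
df_demo['species'] = iris.target_names[iris.target]

# Pearson correlation across all 150 flowers
corr = df_demo['sepal length (cm)'].corr(df_demo['petal length (cm)'])

# Average petal length per species
mean_petal = df_demo.groupby('species')['petal length (cm)'].mean()

print(f"Correlation (sepal length vs petal length): {corr:.2f}")
print(f"Longest average petals: {mean_petal.idxmax()} ({mean_petal.max():.2f} cm)")
print(f"Shortest average petals: {mean_petal.idxmin()} ({mean_petal.min():.2f} cm)")
```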

## Error Handling (if applicable)
- Show how you handled missing files or bad data using `try-except`.

In [None]:
try:
    # 'iris.csv' is a placeholder path; loading into df_csv leaves the sklearn-based df untouched
    df_csv = pd.read_csv('iris.csv')
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print("Error: File not found.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
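
One way to make the error handling do useful work is to wrap loading in a function that falls back to the sklearn copy when a CSV cannot be read. This is a minimal sketch: `load_iris_frame` and the file name `no_such_iris_file.csv` are hypothetical, and the function assumes that a CSV, if present, already has the columns the rest of the notebook expects.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame(csv_path=None):
    """Return iris as a DataFrame, reading csv_path if possible, else using sklearn."""
    if csv_path is not None:
        try:
            return pd.read_csv(csv_path)
        except FileNotFoundError:
            print(f"File not found: {csv_path}. Falling back to sklearn.")
        except (pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
            print(f"Could not parse {csv_path}: {exc}. Falling back to sklearn.")

    # Fallback: build the frame from the bundled sklearn dataset
    iris = load_iris()
    frame = pd.DataFrame(iris.data, columns=iris.feature_names)
    frame['species'] = iris.target_names[iris.target]
    return frame


# Hypothetical path that is assumed not to exist, to exercise the fallback
df_loaded = load_iris_frame('no_such_iris_file.csv')
print(df_loaded.shape)
```

Catching specific exceptions keeps genuine bugs visible, whereas a bare `except Exception` would hide them behind a generic message.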
