
# Data Analysis and Visualization Assignment

**Objective:**  
- Load and analyze a dataset using the pandas library in Python.  
- Create simple plots and charts with the matplotlib library for visualizing the data.  


## Task 1: Load and Explore the Dataset

In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Load dataset with error handling
try:
    iris = load_iris(as_frame=True)
    df = iris.frame  # DataFrame with the four measurements plus a "target" column
    print("Dataset loaded successfully!\n")
except FileNotFoundError:
    print("Error: Dataset file not found.")
    raise  # stop here so the code below never runs against an undefined df
except Exception as e:
    print(f"An error occurred: {e}")
    raise

# Display first few rows
print("First 5 rows:")
print(df.head(), "\n")

# Dataset info (df.info() prints its report directly and returns None)
print("Data Types and Null Values:")
df.info()
print()

# Check for missing values
print("Missing Values:")
print(df.isnull().sum(), "\n")

# Clean dataset (if missing values existed)
df = df.dropna()
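
The Iris data has no missing values, so `dropna()` changes nothing here. As a minimal sketch of what the cleaning step does when gaps exist, the toy frame below compares dropping incomplete rows with filling gaps by the column median. The frame `toy`, its column names, and its values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values (hypothetical, not from the Iris dataset)
toy = pd.DataFrame({
    "length": [5.0, np.nan, 6.0, 7.0],
    "width": [3.0, 2.5, np.nan, 3.5],
})

# Count missing values per column, as in the check above
print(toy.isnull().sum())

# Option 1: drop every row that contains at least one missing value
dropped = toy.dropna()

# Option 2: keep all rows and fill each gap with that column's median
filled = toy.fillna(toy.median(numeric_only=True))

print(f"Rows after dropna: {len(dropped)}")
print(f"Rows after fillna: {len(filled)}")
```

Dropping is the simplest choice but shrinks the sample; filling keeps every row at the cost of slightly altering each column's distribution.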


## Task 2: Basic Data Analysis

In [None]:

# Basic statistics
print("Descriptive Statistics:")
print(df.describe(), "\n")

# Add species names for better readability
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Group by species and compute the mean of each measurement
print("Mean values per species:")
print(df.drop(columns="target").groupby("species").mean(), "\n")

print("Interesting Finding:")
print("Setosa has the shortest sepals and the smallest petals on average, but the widest sepals.\n")
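
One way to check a per-species finding like the one above is to ask pandas which species has the smallest and largest mean for each measurement. The sketch below reloads the data so it runs on its own; the names `species_means`, `smallest`, and `largest` are introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Mean of each measurement per species (the numeric target column is dropped first)
species_means = df.drop(columns="target").groupby("species").mean()

# For every measurement, which species has the smallest / largest mean?
smallest = species_means.idxmin()
largest = species_means.idxmax()

print(species_means.round(2))
print("\nSmallest mean per measurement:")
print(smallest)
print("\nLargest mean per measurement:")
print(largest)
```

Reading the two summaries side by side shows whether one species is smallest on every measurement or only on some of them.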


## Task 3: Data Visualization

In [None]:

# 1. Line chart: sepal length by row index (rows are ordered by species, so this is not a time trend)
plt.figure(figsize=(8,5))
plt.plot(df.index, df["sepal length (cm)"], label="Sepal Length")
plt.title("Line Chart: Sepal Length Trend")
plt.xlabel("Index")
plt.ylabel("Sepal Length (cm)")
plt.legend()
plt.show()

# 2. Bar chart: Average petal length per species
plt.figure(figsize=(8,5))
sns.barplot(x="species", y="petal length (cm)", data=df, errorbar=None)  # errorbar replaces the deprecated ci argument (seaborn >= 0.12)
plt.title("Bar Chart: Avg Petal Length per Species")
plt.xlabel("Species")
plt.ylabel("Petal Length (cm)")
plt.show()

# 3. Histogram: Distribution of sepal width
plt.figure(figsize=(8,5))
plt.hist(df["sepal width (cm)"], bins=15, edgecolor="black")
plt.title("Histogram: Sepal Width Distribution")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Frequency")
plt.show()

# 4. Scatter plot: Sepal length vs Petal length
plt.figure(figsize=(8,5))
sns.scatterplot(x="sepal length (cm)", y="petal length (cm)", hue="species", data=df)
plt.title("Scatter Plot: Sepal Length vs Petal Length")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.legend(title="Species")
plt.show()
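
The bar chart above lets seaborn compute the per-species average. Since the objective names matplotlib, here is a minimal sketch of the same chart built with pandas and matplotlib alone: the averaging is done explicitly with `groupby`, and the result is passed to `Axes.bar`. The variable name `mean_petal` is an assumption made for this example.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Average petal length per species, computed explicitly
mean_petal = df.groupby("species")["petal length (cm)"].mean()

# One bar per species; bar height is the mean petal length
fig, ax = plt.subplots(figsize=(8, 5))
ax.bar(mean_petal.index, mean_petal.values, edgecolor="black")
ax.set_title("Bar Chart: Avg Petal Length per Species")
ax.set_xlabel("Species")
ax.set_ylabel("Petal Length (cm)")
fig.tight_layout()
# In a notebook the figure renders at the end of the cell; in a script, call plt.show()
```

Computing the means first also leaves `mean_petal` available for printing or further analysis, which the seaborn call does not expose directly.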



## Observations
1. Sepal length rises across the dataset index, but only because the rows are ordered by species (setosa, then versicolor, then virginica); the index itself carries no meaning.  
2. Virginica has the largest average petal length.  
3. Most sepal widths fall between 2.5 and 3.5 cm.  
4. Sepal length and petal length are positively correlated; setosa forms a clearly separate cluster, while versicolor and virginica overlap somewhat.  
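
The observations can also be checked numerically rather than read off the plots. The sketch below is one possible set of checks, not the only one; the variable names (`rows_sorted_by_species`, `top_petal_species`, `share_in_band`, `corr`) are introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# 1. Are the rows ordered by species? If so, any index "trend" is an ordering effect.
rows_sorted_by_species = df["target"].is_monotonic_increasing

# 2. Species with the largest mean petal length
top_petal_species = df.groupby("species")["petal length (cm)"].mean().idxmax()

# 3. Share of sepal widths inside the 2.5 to 3.5 cm band (bounds included)
share_in_band = df["sepal width (cm)"].between(2.5, 3.5).mean()

# 4. Pearson correlation between sepal length and petal length
corr = df["sepal length (cm)"].corr(df["petal length (cm)"])

print(f"Rows ordered by species: {rows_sorted_by_species}")
print(f"Largest mean petal length: {top_petal_species}")
print(f"Share of sepal widths in [2.5, 3.5] cm: {share_in_band:.0%}")
print(f"Correlation (sepal length vs petal length): {corr:.2f}")
```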
