## Final Summary and Observations

From the analysis of the Iris dataset, we observed the following:

- **Missing Values**: The dataset had no missing values, making it clean and ready for analysis.  
- **Statistical Summary**: Descriptive statistics and per-species means revealed differences in petal and sepal dimensions across species.  
- **Visualizations**:  
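  - The **line chart** plotted sepal length against row index; because the rows are ordered by species, the upward steps reflect species differences rather than a trend over time.  
  - The **bar chart** showed that Setosa has the smallest average sepal length, followed by Versicolor and then Virginica.  
  - The **histogram** showed a bimodal distribution of petal lengths: one cluster of short petals and a broader cluster of longer ones.  
  - The **scatter plot** showed Setosa clearly separated from the other two species on sepal and petal length, with Versicolor and Virginica overlapping slightly.  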

Overall, the Iris dataset demonstrates distinct patterns that make it suitable for classification tasks in machine learning.
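
The missing-value and average-sepal-length observations above can be spot-checked numerically. The following is a minimal sketch, not part of the original analysis: it reloads the data with `load_iris(as_frame=True)` so it runs on its own, and the names `total_missing`, `species`, and `mean_sepal_length` are assumptions introduced here for readability.

```python
from sklearn.datasets import load_iris

# Reload the dataset so the snippet is self-contained.
iris = load_iris(as_frame=True)
df = iris.frame

# Total number of missing cells across all columns.
total_missing = int(df.isnull().sum().sum())

# Hypothetical helper: map the integer target codes to species names.
species = df["target"].map(dict(enumerate(iris.target_names)))

# Mean sepal length per species, smallest first.
mean_sepal_length = (
    df.groupby(species)["sepal length (cm)"].mean().sort_values()
)

print("Missing cells:", total_missing)
print(mean_sepal_length)
```

Sorting the means makes the ordering of the species on this feature easy to read off without a chart.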


### Bar Chart: Average Sepal Length per Species
This bar chart compares the **average Sepal Length** for each species of Iris.  
It clearly shows differences in average Sepal Length between *Setosa*, *Versicolor*, and *Virginica*.

### Histogram: Distribution of Petal Length
This histogram shows how the **Petal Length** values are distributed across the dataset.  
It helps us see the frequency of short vs. long petals.
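
The bin counts behind a histogram like this can also be inspected directly. The sketch below is illustrative and assumes the same 20-bin setting as the plotting cell; `counts`, `edges`, and `in_gap` are names introduced here. It uses `numpy.histogram` to compute the counts and then counts how many flowers fall in the 2 to 3 cm range, which indicates whether the short and long clusters are separated by an empty gap.

```python
import numpy as np
from sklearn.datasets import load_iris

# Load only the column needed for this check.
petal_length = load_iris(as_frame=True).frame["petal length (cm)"]

# counts[i] is the number of values between edges[i] and edges[i + 1].
counts, edges = np.histogram(petal_length, bins=20)

# Number of flowers with a petal length in the 2-3 cm range.
in_gap = int(((petal_length >= 2.0) & (petal_length < 3.0)).sum())

print(counts)
print("Flowers with 2.0 <= petal length < 3.0 cm:", in_gap)
```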

### Scatter Plot: Sepal Length vs Petal Length

The scatter plot below shows the relationship between **Sepal Length** and **Petal Length**.  
The points are color-coded by species through the `target` column, so the legend shows the codes 0 (Setosa), 1 (Versicolor), and 2 (Virginica).  

- Setosa is clearly separated from the other two species.  
- Versicolor and Virginica overlap slightly but still form distinguishable groups.  
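
The two observations above can be checked on a single feature without plotting. This is a rough sketch under stated assumptions: it looks only at petal length (the scatter plot uses two features), and `ranges`, `setosa_separated`, and `versicolor_virginica_overlap` are names introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame

# Min and max petal length per species code (0=setosa, 1=versicolor, 2=virginica).
ranges = df.groupby("target")["petal length (cm)"].agg(["min", "max"])

# Setosa is separable on this feature if its largest value is
# below the smallest value of both other species.
setosa_separated = bool(ranges.loc[0, "max"] < ranges.loc[[1, 2], "min"].min())

# Versicolor and Virginica overlap if Versicolor's largest value
# reaches Virginica's smallest.
versicolor_virginica_overlap = bool(ranges.loc[1, "max"] >= ranges.loc[2, "min"])

print(ranges)
print("Setosa separated:", setosa_separated)
print("Versicolor/Virginica overlap:", versicolor_virginica_overlap)
```

Comparing per-species ranges is a coarse test: it flags any overlap at all, so it says nothing about how many points actually fall in the shared region.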

In [None]:
plt.figure(figsize=(6,4))
sns.scatterplot(
    x="sepal length (cm)", 
    y="petal length (cm)", 
    hue="target", 
    palette="deep", 
    data=df
)
plt.title("Sepal Length vs Petal Length")
plt.xlabel("Sepal Length (cm)")
plt.ylabel("Petal Length (cm)")
plt.show()


In [None]:
plt.figure(figsize=(6,4))
plt.hist(df["petal length (cm)"], bins=20, color="skyblue", edgecolor="black")
plt.title("Distribution of Petal Length")
plt.xlabel("Petal Length (cm)")
plt.ylabel("Frequency")
plt.show()

In [None]:
plt.figure(figsize=(6,4))
grouped["sepal length (cm)"].plot(kind="bar", color="orange")
plt.title("Average Sepal Length per Species")
plt.xlabel("Species (0=setosa, 1=versicolor, 2=virginica)")
plt.ylabel("Average Sepal Length (cm)")
plt.show()

In [None]:
plt.figure(figsize=(8,5))
plt.plot(df.index, df["sepal length (cm)"], label="Sepal Length")
plt.title("Sepal Length by Row Index")
plt.xlabel("Row index (rows are ordered by species)")
plt.ylabel("Sepal Length (cm)")
plt.legend()
plt.show()

In [None]:
# Grouping by species (target column) and finding mean
grouped = df.groupby("target").mean()
print("Mean values grouped by species:")
grouped

In [None]:
print("Statistical summary of numerical columns:")
df.describe()

In [None]:
print("Dataset info:")
df.info()  # info() prints its report directly and returns None, so no print() wrapper

print("\nMissing values per column:")
print(df.isnull().sum())


In [None]:
# Load the Iris dataset
iris = load_iris(as_frame=True)
df = iris.frame

print("First 5 rows of the dataset:")
df.head()

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris