# Palmer Penguins Dataset

This dataset contains body measurements, island locations, and sex for three penguin species (Adelie, Chinstrap, and Gentoo) observed in the Palmer Archipelago, Antarctica.
It includes variables such as bill length, body mass, sex, and species, which makes it well suited for practicing data cleaning, preprocessing,
and visualization with `pandas`, `matplotlib`, and `seaborn`.

Let's start by installing the required Python packages.

In [None]:
# matplotlib is imported below, so install it explicitly alongside pandas and seaborn
%pip install pandas seaborn matplotlib

In [None]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
# Load the dataset
df = sns.load_dataset("penguins")

In [None]:
# Quick overview
df.head()

In [None]:
# Check column dtypes and non-null counts
df.info()

In [None]:
# Count missing values per column
print(df.isnull().sum())

In [None]:
# Drop rows with missing values (simplest option for this example)
df_clean = df.dropna()
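
Dropping rows is the simplest option, but it discards every measurement in a row that has even one gap. As a rough sketch of an alternative, the cell below compares `dropna()` with a simple fill strategy (median for a numeric column, most frequent value for a categorical one). The `toy` frame is hypothetical: it only borrows the penguins column names, and its values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the penguins column names;
# the values are invented for illustration, not taken from the dataset.
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.0, np.nan, 46.0, 48.0],
    "sex": ["Male", "Female", None, "Female"],
})

# Option 1: drop every row that contains at least one missing value
dropped = toy.dropna()

# Option 2: keep all rows and fill the gaps instead
filled = toy.copy()
filled["bill_length_mm"] = filled["bill_length_mm"].fillna(filled["bill_length_mm"].median())
filled["sex"] = filled["sex"].fillna(filled["sex"].mode()[0])

# Compare how many rows survive each strategy
print(len(toy), len(dropped), len(filled))
```

Whether filling is appropriate depends on the analysis; for the plots in this notebook, dropping the few incomplete rows is likely good enough.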

In [None]:
# Histogram of bill length
plt.hist(df_clean["bill_length_mm"], bins=20, color="skyblue", edgecolor="black")
plt.title("Distribution of Bill Length")
plt.xlabel("Bill Length (mm)")
plt.ylabel("Frequency")
plt.show()

In [None]:
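# Box plot of body mass by species
# seaborn 0.13+ warns when palette is passed without hue, so map hue to the same column
sns.boxplot(data=df_clean, x="species", y="body_mass_g", hue="species", palette="pastel", legend=False)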
plt.title("Body Mass Distribution by Species")
plt.ylabel("Body Mass (g)")
plt.show()
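
A box plot is easier to read next to the numbers behind it. The sketch below shows one way to get a per-species summary with `groupby` and `agg`. The `toy` frame and its values are hypothetical and invented for illustration.

```python
import pandas as pd

# Hypothetical toy values, invented for illustration (not real measurements)
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "body_mass_g": [3500.0, 3900.0, 5000.0, 5400.0],
})

# One row per species; the median corresponds to the center line of each box
summary = toy.groupby("species")["body_mass_g"].agg(["min", "median", "max"])
print(summary)
```

In the notebook, the same call should work on the real data by swapping `toy` for `df_clean`.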

In [None]:
# Scatterplot to explore relationships
sns.scatterplot(data=df_clean, x="bill_length_mm", y="bill_depth_mm", hue="species", style="sex")
plt.title("Bill Length vs. Depth by Species and Sex")
plt.xlabel("Bill Length (mm)")
plt.ylabel("Bill Depth (mm)")
# seaborn already draws the legend for hue and style, so no plt.legend() call is needed
plt.show()