### Iris Dataset — Assignment Notebook

- **Author**: Mridul M Kumar
- **Course**: LAB
- **Date**: 28-08-2025

This notebook loads the classic Iris dataset, inspects its features, classes, and shape, cleans the column names, and maps species names to integer labels.


### Assignment Brief

Complete the following tasks:
1. Load dataset
2. Print feature names
3. Print class names
4. Print shape of data
5. Correct/clean dataset
6. Map species names to labels 0/1/2
7. Display first five rows


### 1. Load Dataset


In [48]:
# Import libraries and load the Iris dataset
# The dataset contains 150 samples and 4 numeric features
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='species_label')

# Combine features and target into one DataFrame for convenience
df = pd.concat([X, y], axis=1)
print("Loaded Iris dataset with", df.shape[0], "rows and", df.shape[1], "columns.")


Loaded Iris dataset with 150 rows and 5 columns.
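
As an alternative to assembling the DataFrame by hand, scikit-learn 0.23 and later can return the data as pandas objects directly. The following is a minimal sketch, assuming such a version is installed; the names `bunch` and `frame` are illustrative and do not appear elsewhere in this notebook.

```python
from sklearn import datasets

# as_frame=True asks scikit-learn to return pandas objects instead of NumPy arrays
bunch = datasets.load_iris(as_frame=True)

# bunch.frame holds the four feature columns plus a 'target' column
frame = bunch.frame
print(frame.shape)
print(list(frame.columns))
```

In this variant the label column is called `target`, so it would need to be renamed (for example with `frame.rename(columns={"target": "species_label"})`) to match the column name used in the rest of the notebook.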


### 2. Print Feature Names


In [49]:
# Display the feature (column) names provided by scikit-learn
print("Feature names:")
for name in iris.feature_names:
    print("-", name)


Feature names:
- sepal length (cm)
- sepal width (cm)
- petal length (cm)
- petal width (cm)


### 3. Print Class Names


In [50]:
# Display the target class names (species names)
print("Class names:")
for name in iris.target_names:
    print("-", name)


Class names:
- setosa
- versicolor
- virginica


### 4. Print Shape of Data


In [51]:
# Show the dimensions of features, target, and combined DataFrame
print("Data shape (features X):", X.shape)
print("Target shape (y):", y.shape)
print("Combined DataFrame shape (df):", df.shape)


Data shape (features X): (150, 4)
Target shape (y): (150,)
Combined DataFrame shape (df): (150, 5)
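
The shapes alone do not show how the 150 rows are distributed across the three classes. A per-class count is one way to check this. The sketch below rebuilds `y` so that it runs on its own; `class_counts` is an illustrative name, not part of the original cells.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
y = pd.Series(iris.target, name="species_label")

# Count samples per label; sort_index orders the result by label 0, 1, 2
class_counts = y.value_counts().sort_index()
print(class_counts)
```

The Iris dataset is balanced, so each of the three labels should appear 50 times.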


### 5. Correct / Clean Dataset


In [52]:
# Validate data: check for missing values (column names are still the originals here)
missing_counts = df.isna().sum()
print("Missing values per column:")
print(missing_counts)

# Standardize column names: remove units and spaces for consistency
df.columns = [
    col.replace(" (cm)", "").replace(" ", "_") if isinstance(col, str) else col
    for col in df.columns
]

# Inspect data types after the columns have been renamed
print("\nData types after cleaning:")
print(df.dtypes)


Missing values per column:
sepal length (cm)    0
sepal width (cm)     0
petal length (cm)    0
petal width (cm)     0
species_label        0
dtype: int64

Data types after cleaning:
sepal_length     float64
sepal_width      float64
petal_length     float64
petal_width      float64
species_label      int64
dtype: object
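
The column renaming in this step can also be packaged as a small reusable function. This is a sketch under the assumption that units always appear as a trailing parenthesised suffix such as ` (cm)`; the helper name `clean_column_name` is hypothetical and not part of the original notebook.

```python
import re


def clean_column_name(name: str) -> str:
    """Drop a trailing parenthesised unit and replace spaces with underscores."""
    # Remove a trailing " (unit)" suffix such as " (cm)"
    without_unit = re.sub(r"\s*\([^)]*\)\s*$", "", name)
    # Replace each run of whitespace with a single underscore
    return re.sub(r"\s+", "_", without_unit.strip())


raw_names = ["sepal length (cm)", "petal width (cm)", "species_label"]
cleaned = [clean_column_name(n) for n in raw_names]
print(cleaned)
```

Because `DataFrame.rename` accepts a callable, such a helper could be applied with `df.rename(columns=clean_column_name)`, which returns a renamed copy instead of overwriting `df.columns` in place.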


### 6. Map Species Names to Labels (0/1/2)


In [53]:
# Map numeric labels to species names and back to labels 0/1/2
# scikit-learn iris.target is already 0/1/2, but we demonstrate mapping
label_to_species = {i: name for i, name in enumerate(iris.target_names)}
species_to_label = {name: i for i, name in label_to_species.items()}

# Add a readable species column
df['species_name'] = df['species_label'].map(label_to_species)

# Ensure mapping back to labels works as expected
df['species_label_mapped'] = df['species_name'].map(species_to_label)

# Convert NumPy scalars to Python types for cleaner display
unique_labels = sorted(map(int, df['species_label'].unique()))
unique_species = sorted(map(str, df['species_name'].unique()))

print("Unique labels:", unique_labels)
print("Unique species names:", unique_species)


Unique labels: [0, 1, 2]
Unique species names: ['setosa', 'versicolor', 'virginica']
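
The cell above builds the reverse mapping but does not explicitly compare it with the original labels. A minimal sketch of such a round-trip check follows. It uses the three class names from section 3 and a short illustrative label column in place of the full dataset; `labels`, `names`, `round_trip`, and `unknown` are illustrative names.

```python
import pandas as pd

# Class names in the order scikit-learn reports them (see the output in section 3)
target_names = ["setosa", "versicolor", "virginica"]
label_to_species = dict(enumerate(target_names))
species_to_label = {name: label for label, name in label_to_species.items()}

# A small illustrative label column, not the full dataset
labels = pd.Series([0, 1, 2, 1, 0], name="species_label")
names = labels.map(label_to_species)
round_trip = names.map(species_to_label)

# The round trip should reproduce the original labels exactly
assert (round_trip == labels).all()

# A name missing from the dictionary maps to NaN instead of raising an error
unknown = pd.Series(["setosa", "unknown"]).map(species_to_label)
print(unknown.isna().tolist())
```

Checking for NaN after `map` is a cheap way to catch misspelled or unexpected species names before they reach later steps.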


### 7. Display First Five Rows


In [54]:
# Show the first five rows of the processed DataFrame
df.head()


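| | sepal_length | sepal_width | petal_length | petal_width | species_label | species_name | species_label_mapped |
|---|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa | 0 |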
