# Machine Learning Basics: Iris Classification

## Overview
This notebook demonstrates a simple machine learning workflow using the famous Iris dataset. We'll build a model that can automatically identify iris flower species based on physical measurements.

## The Problem
Imagine you're a botanist who finds an iris flower and measures:
- Sepal length (cm)
- Sepal width (cm)
- Petal length (cm)
- Petal width (cm)

**Goal:** Predict which of the 3 iris species it belongs to:
- Iris Setosa
- Iris Versicolor
- Iris Virginica
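
The four measurements and three species above correspond directly to the arrays scikit-learn returns. This short sketch inspects that mapping. It assumes only `load_iris` from scikit-learn, and the variable names are illustrative:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Column order of every measurement vector (the 4 features)
print(iris.feature_names)

# Integer label -> species name (the 3 classes)
for label, name in enumerate(iris.target_names):
    print(label, name)

# One flower is a row of 4 numbers; its label is an integer 0, 1, or 2
first_flower = iris.data[0]
first_species = iris.target_names[iris.target[0]]
print(first_flower, first_species)
```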

## Why Machine Learning?
Instead of writing classification rules by hand, we'll let the model **learn patterns** from labeled examples (flowers whose species is already known), then use those patterns to classify new flowers automatically.
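
The "learn, then classify" idea can be sketched with the decision tree this notebook imports. This is a minimal sketch, not the notebook's own training step. The 70/30 split, `random_state=42`, and the new flower's measurements are illustrative assumptions:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()
X, y = iris.data, iris.target

# Hold out 30% of the flowers so the model is scored on examples it never saw
# (test_size and random_state are illustrative choices)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# "Learning patterns": the tree infers threshold rules from the training rows
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")

# Classify a new, hypothetical flower:
# [sepal length, sepal width, petal length, petal width] in cm
new_flower = [[5.0, 3.4, 1.5, 0.2]]
prediction = model.predict(new_flower)
print(iris.target_names[prediction[0]])
```

Holding out a test set matters because accuracy measured on the training rows would overstate how well the learned rules generalize to new flowers.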

## Dataset
- **Size:** 150 flowers (50 of each species)
- **Features:** 4 measurements per flower
- **Target:** Species classification (3 classes)
- **Source:** Built into scikit-learn (no external files needed)
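
The size and class balance listed above can be checked directly. This is a small sketch assuming NumPy and scikit-learn are installed:

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)  # (150, 4): 150 flowers, 4 measurements each

# Count samples per integer label 0, 1, 2
counts = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts):
    print(name, int(count))
```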

### Descriptive statistics on the dataset

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
iris = load_iris()
X = iris.data      # Features: the 4 measurements
y = iris.target    # Labels: species (0, 1, or 2)

# Create a DataFrame for easier inspection
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = iris.target_names[y]   # map integer labels to species names

# Display basic information
print(f"Dataset shape: {X.shape}")
print(f"Number of samples: {X.shape[0]}")
print(f"Number of features: {X.shape[1]}")
print(f"Species: {iris.target_names}")
print("\nSummary statistics:")
print(df.describe())
print("\nSamples per species:")
print(df.groupby('species').size())

# Histograms of all four measurements. df.hist() creates its own figure,
# so the size is passed to it directly instead of via plt.figure().
print("\nDistribution of measurement values")
df.hist(figsize=(12, 8))
plt.tight_layout()
plt.show()

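`df.describe()` pools all three species together. Grouping by species before summarizing shows which measurements actually separate the classes. This is a minimal sketch that rebuilds the same DataFrame so it runs on its own; `species_means` is an illustrative name:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Mean of each measurement within each species
species_means = df.groupby('species').mean()
print(species_means.round(2))
```

Petal length and petal width differ far more between species than the sepal measurements do, which is why simple threshold rules on petals work well for this dataset.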

### The first rows of the dataset

In [None]:
df.head(20)
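
The rows in this dataset are stored grouped by species (50 setosa, then 50 versicolor, then 50 virginica), so `df.head(20)` shows only setosa. A random sample gives a more representative look. This sketch rebuilds the DataFrame to stay self-contained, and `random_state=0` is an arbitrary choice:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# The first 20 rows all belong to one species
print(df.head(20)['species'].unique())

# A random sample of rows mixes the species
sample = df.sample(n=10, random_state=0)
print(sample)
```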
