# What is Classification?

**Classification** is a supervised machine learning task in which a model predicts the category (class) of a new observation from labeled examples. Each training example carries a known label, and the model learns the patterns that separate one label from another.

## Examples of Classification Problems

- **Email Spam Detection:** Classify emails as "spam" or "not spam".
- **Image Recognition:** Identify objects in images (e.g., cat, dog, car).
- **Medical Diagnosis:** Predict if a patient has a disease (e.g., "positive" or "negative").

## How Classification Works

1. **Training Data:** You provide the model with labeled examples.
2. **Learning:** The model finds patterns that distinguish the classes.
3. **Prediction:** For new, unlabeled data, the model predicts the class.
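The three steps above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the toy measurements (height in cm, weight in kg) and the "cat"/"dog" labels are made up for the example, and K-Nearest Neighbors is just one convenient choice of model.

```python
from sklearn.neighbors import KNeighborsClassifier

# 1. Training data: hypothetical [height_cm, weight_kg] measurements with labels
X_train = [[25, 4], [23, 5], [24, 4.5], [60, 25], [55, 30], [58, 28]]
y_train = ["cat", "cat", "cat", "dog", "dog", "dog"]

# 2. Learning: the model stores the examples and uses them to separate the classes
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# 3. Prediction: classify new, unlabeled observations
X_new = [[26, 5], [57, 27]]
predictions = model.predict(X_new)
print(predictions)  # ['cat' 'dog']
```

Each new point is assigned the majority label of its three nearest training examples, so the small animal lands in the "cat" class and the large one in the "dog" class.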

## Common Classification Algorithms

- **Logistic Regression**
- **Decision Trees**
- **Random Forests**
- **Support Vector Machines (SVM)**
- **K-Nearest Neighbors (KNN)**
- **Neural Networks**
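As a rough sketch of how these algorithms can be tried side by side, the snippet below fits scikit-learn's implementation of each one on the Iris dataset and reports 5-fold cross-validated accuracy. The hyperparameters (`max_iter`, `random_state`, the scaling step) are assumptions chosen for a quick comparison, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline that standardizes features first
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Neural Network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

# Mean accuracy across 5 cross-validation folds for each model
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

On a small, clean dataset like Iris the scores tend to be close together; differences between algorithms usually show up more clearly on larger or noisier data.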

## Example: Iris Flower Classification

Suppose you have measurements of iris flowers and want to classify them into species (Setosa, Versicolor, Virginica).

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load data
iris = load_iris()
X = iris.data
y = iris.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train classifier (fixed seed so the result is reproducible)
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Evaluate: score() predicts on the test set and returns the mean accuracy
accuracy = clf.score(X_test, y_test)
print("Accuracy:", accuracy)
```
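A single accuracy number hides which species get confused with each other. The sketch below repeats the same split and model, then prints a confusion matrix and classifies one new flower. The measurements in `new_flower` are illustrative values typical of Setosa, and the fixed `random_state` is an assumption added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# Rows are true species, columns are predicted species
y_pred = clf.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)

# Classify one new flower: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]  # illustrative values, typical of Setosa
predicted_species = iris.target_names[clf.predict(new_flower)[0]]
print("Predicted species:", predicted_species)
```

Off-diagonal entries in the matrix count misclassified flowers, so a perfect classifier produces non-zero values only on the diagonal.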

## Summary

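- **Classification** assigns a class label to each data point based on patterns learned from labeled examples.
- It is used in real-world applications such as spam filtering, image recognition, and medical diagnosis.
- Many algorithms are available, from logistic regression and decision trees to neural networks.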

A recorded run of the Iris example above printed `Accuracy: 1.0` on the held-out test set.
