In [1]:
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load dataset
wine = load_wine()
X = wine.data
y = wine.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train model
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Predict
y_pred = rf.predict(X_test)

# Accuracy
acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")

# Classification report
print(classification_report(y_test, y_pred, target_names=wine.target_names))


Accuracy: 1.00
              precision    recall  f1-score   support

     class_0       1.00      1.00      1.00        19
     class_1       1.00      1.00      1.00        21
     class_2       1.00      1.00      1.00        14

    accuracy                           1.00        54
   macro avg       1.00      1.00      1.00        54
weighted avg       1.00      1.00      1.00        54



### Topic 16 – Random Forest Classification

In this notebook, we use a **Random Forest** classifier to predict which of three cultivars a wine belongs to, based on the 13 chemical measurements in scikit-learn's Wine dataset (178 samples).

Steps:
1. Load the Wine dataset from `sklearn.datasets`.
2. Split the data into training (70%) and test (30%) sets with a fixed `random_state`.
3. Train a `RandomForestClassifier` with 100 trees.
4. Evaluate performance on the test set with accuracy and a classification report; here the model classifies all 54 test samples correctly.
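
A perfect score on a 54-sample test set may partly reflect one favourable split. As a rough check, the sketch below re-estimates accuracy with stratified 5-fold cross-validation on the full dataset. The fold count and the names `rf_cv`, `cv`, and `scores` are choices made for this illustration and are not part of the original cell.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_wine(return_X_y=True)

# Same hyperparameters as the model trained in the cell above
rf_cv = RandomForestClassifier(n_estimators=100, random_state=42)

# Assumption: 5 stratified, shuffled folds (an illustrative choice)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(rf_cv, X, y, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

If the fold accuracies vary noticeably, the mean and spread are a more cautious summary than the single-split figure.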

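Advantages of Random Forest:
- Handles high-dimensional data well, because each split considers only a random subset of the features,
- More robust to overfitting than a single decision tree, because it averages many decorrelated trees,
- Can be adapted to imbalanced data (for example with `class_weight="balanced"`) and, in implementations that support it, to missing values (otherwise these must be imputed first).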
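
To make the overfitting point concrete, the following sketch fits a single unpruned decision tree and a 100-tree forest on the same 70/30 split used in this notebook and prints training and test accuracy for each. The `models` and `results` dictionaries are names introduced for this illustration; the exact numbers depend on the scikit-learn version and the split, so none are quoted here.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical comparison set: one unpruned tree vs. a 100-tree forest
models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on data seen in training
    test_acc = model.score(X_test, y_test)     # accuracy on held-out data
    results[name] = (train_acc, test_acc)
    print(f"{name:>13}: train={train_acc:.3f}  test={test_acc:.3f}")
```

A large gap between training and test accuracy is the usual sign of overfitting; comparing that gap between the two models is more informative than either test score alone.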

Common applications:
- Credit scoring,
- Medical diagnosis,
- Feature ranking and selection.
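
As an illustration of feature ranking, the sketch below reads the impurity-based `feature_importances_` attribute of a fitted forest and sorts the Wine features by it. The model is refit here so the block is self-contained, and the `importances` name is an assumption made for this example.

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42
)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Impurity-based importances: one value per feature, normalised to sum to 1
importances = pd.Series(
    rf.feature_importances_, index=wine.feature_names
).sort_values(ascending=False)

print(importances.head(5).round(3))
```

Impurity-based importances are computed on the training data and can be biased toward features with many distinct values, so `sklearn.inspection.permutation_importance` on the test set is a reasonable cross-check before using the ranking for feature selection.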
