**Random Forest Classifier**

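A Random Forest classifier is a supervised machine learning algorithm for classification problems.

It builds many decision trees, each trained on a bootstrap sample of the training data and considering a random subset of features at each split, and combines their outputs into a more accurate and stable prediction.

Instead of relying on a single decision tree, which can overfit, Random Forest uses a “forest” of trees and aggregates their votes. In scikit-learn, the predicted class is the one with the highest average predicted probability across the trees.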
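
The idea of a forest of trees can be sketched by hand. The block below is a minimal illustration, not how scikit-learn implements `RandomForestClassifier` internally: it trains a few decision trees on bootstrap samples and combines them with a hard majority vote. The names `X_demo`, `y_demo`, `n_trees`, `votes`, and `majority` are illustrative, and 25 trees is an arbitrary choice.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 25  # illustrative value, not a recommendation
trees = []
for i in range(n_trees):
    # Bootstrap: draw len(X_demo) row indices with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt" considers a random subset of features at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    tree.fit(X_demo[idx], y_demo[idx])
    trees.append(tree)

# One row of predictions per tree: shape (n_trees, n_samples)
votes = np.stack([tree.predict(X_demo) for tree in trees])

# Hard majority vote: the most frequent label in each column
majority = np.array([np.bincount(col, minlength=3).argmax() for col in votes.T])

# Measured on the same data the trees were trained on, so this is not a test score
print("Fraction matching the true labels:", (majority == y_demo).mean())
```

Each tree sees a slightly different dataset, so individual trees disagree on some samples; the vote smooths out those disagreements.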

**Step 1: Import Required Libraries**

In [1]:
import pandas as pd
import numpy as np

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

**Step 2: Load the Dataset**

In [2]:
# Load dataset
iris = load_iris()

# Create DataFrame
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name="target")

print(X.head())
print(y.head())

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
0    0
1    0
2    0
3    0
4    0
Name: target, dtype: int64


**Step 3: Understand the Data (Optional but Good Practice)**

In [3]:
print("Features shape:", X.shape)
print("Target shape:", y.shape)
print("Target classes:", np.unique(y))

Features shape: (150, 4)
Target shape: (150,)
Target classes: [0 1 2]
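
Another quick check at this stage is the class balance, since a heavily imbalanced target would affect how the split and the accuracy score should be read. A minimal sketch, assuming the same Iris data; `species` and `class_counts` are illustrative names:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_check = load_iris()

# Map the integer labels to their species names, then count each class
species = pd.Series(iris_check.target_names[iris_check.target], name="species")
class_counts = species.value_counts()

print(class_counts)
```

The Iris dataset contains 50 samples of each of its three species, so the classes are balanced.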


**Step 4: Split Data into Training and Testing Sets**

In [4]:
# Hold out 20% of the samples for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training samples:", X_train.shape[0])
print("Testing samples:", X_test.shape[0])

Training samples: 120
Testing samples: 30
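
A plain random split does not guarantee that each class appears in the test set in the same proportion as in the full data. One possible variation, shown here only as a sketch, is to pass `stratify` to `train_test_split`. The `_s` suffixed names are illustrative and chosen so they do not overwrite the variables used in the rest of this lab.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)

# stratify=y_all keeps the class proportions equal in the train and test parts
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42, stratify=y_all
)

test_counts = Counter(y_test_s.tolist())
print("Test samples per class:", dict(test_counts))
```

With 50 samples per class and a 20% test split, stratification places 10 samples of each class in the test set.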


**Step 5: Create the Random Forest Classifier Model**

In [5]:
rf_model = RandomForestClassifier(
    n_estimators=100,   # number of trees
    random_state=42     # fixed seed for reproducible results
)
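
Because each tree is trained on a bootstrap sample, some rows are left out of every tree's training set. scikit-learn can use those out-of-bag rows to estimate accuracy without a separate test set when `oob_score=True` is set. A minimal sketch, assuming the full Iris data; `rf_oob`, `X_all`, and `y_all` are illustrative names:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X_all, y_all = load_iris(return_X_y=True)

# oob_score=True scores each sample using only the trees that did not train on it
rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_all, y_all)

print("Out-of-bag accuracy estimate:", rf_oob.oob_score_)
```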

**Step 6: Train (Fit) the Model**

In [6]:
rf_model.fit(X_train, y_train)
print("Model trained successfully!")

Model trained successfully!
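
Once a forest is fitted, its `feature_importances_` attribute gives an impurity-based estimate of how much each feature contributed to the splits. The sketch below refits an equivalent model so that it runs on its own; `rf_fi` and `importances` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris_fi = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris_fi.data, iris_fi.target, test_size=0.2, random_state=42
)

rf_fi = RandomForestClassifier(n_estimators=100, random_state=42)
rf_fi.fit(X_tr, y_tr)

# Importances are non-negative and sum to 1 across all features
importances = pd.Series(
    rf_fi.feature_importances_, index=iris_fi.feature_names
).sort_values(ascending=False)

print(importances)
```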


**Step 7: Make Predictions**

In [7]:
y_pred = rf_model.predict(X_test)
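
Besides hard labels, the model can report how confident it is through `predict_proba`, which returns one probability per class for every sample. A minimal sketch that refits an equivalent model so it runs on its own; `rf_p` and `proba` are illustrative names:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_all, y_all = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

rf_p = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Shape (n_samples, n_classes); each row sums to 1
proba = rf_p.predict_proba(X_te)

# predict() returns the class with the highest averaged probability
labels_from_proba = rf_p.classes_[np.argmax(proba, axis=1)]

print(proba[:5])
print("Matches predict():", np.array_equal(labels_from_proba, rf_p.predict(X_te)))
```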

**Step 8: Evaluate the Model**

In [8]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))

print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 1.0

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       1.00      1.00      1.00         9
           2       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
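
A perfect score on 30 test samples is encouraging but rests on a small sample. One way to get a more stable estimate, shown here as a sketch rather than part of the original lab, is k-fold cross-validation with `cross_val_score`. The choice of 5 folds and the names `cv_scores`, `X_all`, and `y_all` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X_all, y_all = load_iris(return_X_y=True)

rf_cv = RandomForestClassifier(n_estimators=100, random_state=42)

# cv=5 trains and evaluates the model on five different train/test partitions
cv_scores = cross_val_score(rf_cv, X_all, y_all, cv=5)

print("Fold accuracies:", cv_scores)
print("Mean accuracy:", cv_scores.mean())
```

The spread between fold accuracies gives a rough sense of how much the score depends on which samples land in the test set.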



**Step 9: Predict for a New Sample**

In [9]:
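# Use a DataFrame with the training column names; a plain list triggers a
# "X does not have valid feature names" warning because the model was fit on a DataFrame
new_sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=X.columns)  # example feature values

prediction = rf_model.predict(new_sample)
print("Predicted class:", prediction[0])
print("Predicted class name:", iris.target_names[prediction[0]])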

Predicted class: 0
Predicted class name: setosa


