# **Problem Statement**  
## **10. Build a random forest model using sklearn and explain the output.**

Build a Random Forest model using sklearn to solve a supervised learning problem and explain the model outputs, including:
- Predictions
- Accuracy / evaluation metrics
- Feature importance

### Constraints & Example Inputs/Outputs

### Constraints
- Use sklearn only (no deep learning)
- Dataset must be numeric or encoded
- Random Forest parameters should be configurable
- Must work for both classification and regression
- Train–test split required
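
The "configurable" and "both classification and regression" constraints can be met with one small factory function. The sketch below is only one possible shape: `make_forest` and its `task` argument are hypothetical names introduced here for illustration, not part of sklearn.

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def make_forest(task="classification", **params):
    """Return an unfitted forest for the given task (hypothetical helper)."""
    estimators = {
        "classification": RandomForestClassifier,
        "regression": RandomForestRegressor,
    }
    if task not in estimators:
        raise ValueError(f"task must be one of {sorted(estimators)}, got {task!r}")
    # Default to a fixed seed so runs are repeatable; callers can override it.
    params.setdefault("random_state", 42)
    return estimators[task](**params)


# Any keyword accepted by the sklearn estimator can be passed through.
clf = make_forest("classification", n_estimators=50, max_depth=3)
reg = make_forest("regression", n_estimators=50, min_samples_leaf=2)

print(type(clf).__name__, clf.get_params()["n_estimators"])
print(type(reg).__name__, reg.get_params()["min_samples_leaf"])
```

Passing parameters through as keywords keeps the helper thin: sklearn itself rejects unknown parameter names when the estimator is constructed.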

Example Input:
```python
X = [[50, 30], [60, 35], [70, 40], [80, 45]]
y = [0, 0, 1, 1]
```

Expected Output:
- Model predictions
- Accuracy score
- Feature importance values

### Solution Approach

### What is Random Forest?
Random Forest is an ensemble learning algorithm that:
- Builds multiple decision trees
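- Trains each tree on a bootstrapped sample of the training data and, depending on `max_features`, considers only a random subset of features at each split
- Aggregates the trees' predictions: majority voting for classification (sklearn averages the trees' predicted class probabilities and picks the most probable class) or averaging for regression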
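
To make the aggregation step concrete, the sketch below averages the per-tree class probabilities by hand and compares the result with the forest's own `predict_proba`. The synthetic dataset and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic dataset (illustrative only).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=4, n_informative=2, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X_demo, y_demo)

# Each fitted tree is available in forest.estimators_.
per_tree = np.stack([tree.predict_proba(X_demo[:5]) for tree in forest.estimators_])

# Average over the tree axis to reproduce the ensemble's probabilities.
manual_proba = per_tree.mean(axis=0)
forest_proba = forest.predict_proba(X_demo[:5])

print("Per-tree array shape:", per_tree.shape)  # (n_trees, n_samples, n_classes)
print("Manual average matches forest:", np.allclose(manual_proba, forest_proba))
```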

### Workflow
1. Prepare dataset
2. Split into train and test sets
3. Train Random Forest model
4. Make predictions
5. Evaluate model
6. Interpret feature importance

### Solution Code

In [1]:
```python
# Approach 1: Brute Force Solution (minimal tuning, default parameters, focus on understanding)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Dataset
X = np.array([
    [50, 30],
    [60, 35],
    [70, 40],
    [80, 45],
    [90, 50],
    [100, 55]
])

y = np.array([0, 0, 0, 1, 1, 1])

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Random Forest (basic)
rf = RandomForestClassifier(random_state=42)

# Train
rf.fit(X_train, y_train)

# Predict
y_pred = rf.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

```
Accuracy: 1.0
              precision    recall  f1-score   support

           0       1.00      1.00      1.00         2

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2
```



### Alternative Solution

In [2]:
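```python
# Approach 2: Optimized Approach (explicit, controlled parameters; reproducible via random_state)
# Reuses the imports and the train/test split from Approach 1.
# n_estimators=100, min_samples_split=2 and min_samples_leaf=1 match sklearn's defaults;
# max_depth=5 is the only setting that differs from Approach 1.
rf_optimized = RandomForestClassifier(
    n_estimators=100,
    max_depth=5,
    min_samples_split=2,
    min_samples_leaf=1,
    random_state=42
)

rf_optimized.fit(X_train, y_train)

y_pred_opt = rf_optimized.predict(X_test)

print("Optimized Accuracy:", accuracy_score(y_test, y_pred_opt))
```

```
Optimized Accuracy: 1.0
```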


### Explaining the Output

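### 1. Accuracy
```text
Accuracy = Correct Predictions / Total Predictions
```
This summarizes overall performance. In the run above, the test set holds only 2 samples, both of class 0 (see the support column), so an accuracy of 1.0 says little about how well the model generalizes.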
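
As a quick check of the formula, the sketch below computes accuracy by hand and compares it with `accuracy_score`. The label arrays are made up for illustration and are not taken from the model above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions (illustrative only).
y_true_demo = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred_demo = np.array([0, 1, 0, 0, 1, 1, 1, 1])

# Correct predictions divided by total predictions.
correct = int((y_true_demo == y_pred_demo).sum())
manual_accuracy = correct / len(y_true_demo)

print("Correct:", correct, "of", len(y_true_demo))       # 6 of 8
print("Manual accuracy:", manual_accuracy)                # 0.75
print("sklearn accuracy:", accuracy_score(y_true_demo, y_pred_demo))
```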

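### 2. Classification Report
- Precision: Of the samples predicted as a class, the fraction that truly belong to it, TP / (TP + FP)
- Recall: Of the samples that truly belong to a class, the fraction the model found, TP / (TP + FN)
- F1-score: Harmonic mean of precision and recall
- Support: Number of true samples of each class in the test set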
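
These metrics can be derived from the confusion matrix counts. The sketch below does so for the positive class of a made-up binary example and cross-checks against sklearn's metric functions; the label arrays are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical true labels and predictions (illustrative only).
y_true_demo = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred_demo = np.array([0, 1, 0, 0, 1, 1, 1, 1])

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # how many predicted positives are correct
recall = tp / (tp + fn)     # how many actual positives were found
f1 = 2 * precision * recall / (precision + recall)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Precision:", precision, "sklearn:", precision_score(y_true_demo, y_pred_demo))
print("Recall:", recall, "sklearn:", recall_score(y_true_demo, y_pred_demo))
print("F1:", f1, "sklearn:", f1_score(y_true_demo, y_pred_demo))
```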

### 3. Feature Importance
```python
for i, importance in enumerate(rf_optimized.feature_importances_):
    print(f"Feature {i} importance: {importance:.4f}")
```

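Interpretation:
- Higher value → more influence on predictions; sklearn's `feature_importances_` are impurity-based (mean decrease in impurity across trees) and normalized to sum to 1
- Helps with feature selection and explainability, though impurity-based scores can overstate high-cardinality features, so they are worth cross-checking (for example with permutation importance)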
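
A minimal sketch of reading these values in practice: it ranks features by impurity-based importance and, as a cross-check that is not part of the original walkthrough, computes permutation importance on held-out data. The dataset and the `feature_names` labels are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names for a synthetic dataset (illustrative only).
feature_names = [f"feature_{i}" for i in range(5)]
X_imp, y_imp = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    X_imp, y_imp, test_size=0.25, random_state=0
)

imp_forest = RandomForestClassifier(n_estimators=100, random_state=0)
imp_forest.fit(Xi_train, yi_train)

# Impurity-based importances come from training; they are normalized to sum to 1.
impurity = imp_forest.feature_importances_

# Permutation importance measures the score drop when a feature is shuffled.
perm = permutation_importance(
    imp_forest, Xi_test, yi_test, n_repeats=10, random_state=0
)

# Print features from most to least important by the impurity-based score.
for idx in np.argsort(impurity)[::-1]:
    print(
        f"{feature_names[idx]}: impurity={impurity[idx]:.3f}, "
        f"permutation={perm.importances_mean[idx]:.3f}"
    )
print("Sum of impurity-based importances:", impurity.sum())
```

If the two rankings disagree strongly, that is usually a signal to look more closely at the features involved before using either score for feature selection.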


### Alternative Approaches

#### Brute Force Alternatives
- Single Decision Tree
- Logistic Regression
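
One way to judge whether a forest is worth its extra cost is to compare it with these simpler baselines under cross-validation. The sketch below assumes a synthetic dataset and 5-fold accuracy; the scores it prints depend on the data and are not a general ranking of the models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset (illustrative only).
X_cmp, y_cmp = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=42
)

models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean 5-fold cross-validated accuracy for each model.
cv_means = {}
for name, model in models.items():
    scores = cross_val_score(model, X_cmp, y_cmp, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```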

#### Optimized Alternatives
- Gradient Boosting
- XGBoost / LightGBM
- Hyperparameter tuning with GridSearchCV
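
For the last item, a minimal `GridSearchCV` sketch might look like the following. The parameter grid is deliberately tiny and its values are assumptions chosen to keep the run fast, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic dataset (illustrative only).
X_gs, y_gs = make_classification(
    n_samples=200, n_features=5, n_informative=3, random_state=42
)
Xg_train, Xg_test, yg_train, yg_test = train_test_split(
    X_gs, y_gs, test_size=0.2, random_state=42
)

# Small illustrative grid: 2 x 2 x 2 = 8 candidate settings.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, None],
    "min_samples_leaf": [1, 3],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(Xg_train, yg_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
# The held-out test set is scored once, with the refitted best model.
print("Test accuracy:", search.score(Xg_test, yg_test))
```

Tuning on cross-validation folds of the training data and scoring the test set only once keeps the final accuracy estimate from being biased by the search.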

### Test Case

In [3]:
```python
# Test Case 1: New Prediction

new_sample = np.array([[75, 42]])
prediction = rf_optimized.predict(new_sample)

print("Predicted Class:", prediction)
```

```
Predicted Class: [0]
```


In [4]:
```python
# Test Case 2: Larger Random Forest

from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=200,
    n_features=5,
    n_informative=3,
    random_state=42
)

# No random_state is passed to train_test_split here, so the split, and
# therefore the accuracy below, varies between runs. Pass random_state=42
# for a repeatable split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, rf.predict(X_test)))
```

```
Accuracy: 0.95
```


## Expected Outputs

✔ Model trains successfully

✔ Predictions generated

✔ Accuracy is 1.0 for both the default and the tuned model on this toy dataset, so the tuning shows no measurable gain here

✔ Feature importance sums to 1

✔ Results reproducible with random_state

## Complexity Analysis

### Time Complexity
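Training: approximately O(n_estimators × max_features × n_samples × log(n_samples)), assuming reasonably balanced trees. Prediction: roughly O(n_estimators × tree_depth) per sample.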

### Space Complexity
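O(n_estimators × tree_size), where tree_size is the number of nodes per tree; fully grown trees can reach O(n_samples) nodes each.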
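
The space term can be inspected directly by counting nodes in the fitted trees. The sketch below assumes a synthetic dataset and two arbitrary forest sizes; the exact counts depend on the data and the sklearn version.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic dataset (illustrative only).
X_sz, y_sz = make_classification(
    n_samples=500, n_features=5, n_informative=3, random_state=0
)

# Total node count across all trees for two forest sizes.
node_totals = {}
for n_trees in (10, 50):
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    forest.fit(X_sz, y_sz)
    # tree_.node_count is the number of nodes (splits plus leaves) in one tree.
    node_totals[n_trees] = sum(tree.tree_.node_count for tree in forest.estimators_)
    print(f"{n_trees} trees -> {node_totals[n_trees]} nodes in total")
```

Limiting `max_depth` or raising `min_samples_leaf` shrinks each tree and therefore the memory footprint of the whole forest.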

