# Case Study: Bagging and Boosting Approach for Fault Detection in Electrical Systems

In this notebook, we explore two **Ensemble Learning** techniques: **Bagging** and **Boosting**. We use **Decision Trees** as base models and compare the two approaches on synthetic data. Both techniques combine many individual models into a single predictor that typically performs better than any one of them.

## Steps:
1. **Install/Load Required Libraries**
2. **Generate Synthetic Data**
3. **Bagging with Decision Tree**
4. **Boosting with Decision Tree**
5. **Compare Performance**

### Step 1: Install/load Required Libraries



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

### Step 2: Generate Synthetic Data

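We will generate synthetic data for classification, simulating a **fault detection** scenario in electrical systems. The dataset has 1,000 samples and 20 features, of which 10 are informative. With the defaults of `make_classification`, 2 more features are linear combinations of the informative ones and the remaining 8 are noise.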

In [None]:
# Generate synthetic classification data (simulating electrical system data)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10, n_classes=2, random_state=42)

# Create a DataFrame
feature_names = [f'Feature_{i+1}' for i in range(X.shape[1])]
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y  # Add target column

# Display first five rows
df.head()

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
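
Before training, it can help to confirm the split sizes and class balance. The sketch below repeats the notebook's data generation and split, then adds a stratified variant for comparison. The `stratify=y` option and the `Xs_*`/`ys_*` names are illustrative additions, not part of the original workflow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same data-generation call as in the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)

# Plain split, as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Stratified variant (optional addition): keeps class proportions
# nearly identical in the training and test sets
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

print("Train/test shapes:", X_train.shape, X_test.shape)
print("Class counts (all):             ", np.bincount(y))
print("Class counts (test, plain):     ", np.bincount(y_test))
print("Class counts (test, stratified):", np.bincount(ys_test))
```

Stratification matters most when one class is rare, which is common for real fault data. On a roughly balanced synthetic set like this one, the two splits should look similar.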

### Step 3: Bagging with Decision Tree

Here we apply **Bagging** (bootstrap aggregating) using **Decision Trees** as the base model. **Bagging** trains multiple models independently, each on a bootstrap sample of the training data (random sampling with replacement), and combines their predictions by voting or averaging.

In [None]:
# Train a Bagging Classifier with Decision Tree as the base model
# Note: the `estimator` keyword requires scikit-learn >= 1.2 (older versions use `base_estimator`)
bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=100, random_state=42)
bagging.fit(X_train, y_train)

# Predictions and evaluation
bagging_pred = bagging.predict(X_test)
print('Bagging (with Decision Tree) Accuracy:', accuracy_score(y_test, bagging_pred))
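
To make the mechanism concrete, the bootstrap-and-vote procedure described above can be sketched by hand. This is a simplified illustration, not how `BaggingClassifier` is implemented internally: the names `trees`, `votes`, and `manual_bagging_pred` are hypothetical, and 25 trees are used instead of 100 to keep it quick.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

rng = np.random.default_rng(42)
n_trees = 25  # odd number, so a majority vote can never tie
trees = []

for _ in range(n_trees):
    # Bootstrap sample: draw len(X_train) row indices with replacement
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=int(rng.integers(0, 2**31 - 1)))
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Majority vote over the trees' 0/1 predictions
votes = np.stack([t.predict(X_test) for t in trees])  # shape (n_trees, n_test)
manual_bagging_pred = (votes.mean(axis=0) > 0.5).astype(int)

print(f"Manual bagging accuracy: {accuracy_score(y_test, manual_bagging_pred):.4f}")
```

Each tree sees a slightly different dataset, so the trees make partly different mistakes and the vote smooths them out. The accuracy will not match the `BaggingClassifier` result exactly, since the number of trees and the random draws differ.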


### Step 4: Boosting with Decision Tree

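Now, we will use **Boosting** with **Decision Trees** as the base model. In **Boosting**, models are trained sequentially, and each new model is fit to correct the errors of the ensemble built so far. `GradientBoostingClassifier` does not take a base model as an argument: it builds shallow regression trees internally (`max_depth=3` by default) and fits each one to the gradient of the loss.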

In [None]:
# Train a Gradient Boosting Classifier (uses shallow regression trees internally)
boosting = GradientBoostingClassifier(n_estimators=100, random_state=42)
boosting.fit(X_train, y_train)

# Predictions and evaluation
boosting_pred = boosting.predict(X_test)
print('Boosting (with Decision Tree) Accuracy:', accuracy_score(y_test, boosting_pred))
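
The sequential error-correction idea can also be sketched by hand. The version below is a simplified gradient boosting loop for binary log loss, assuming plain gradient steps with a fixed learning rate. scikit-learn's implementation additionally optimizes the leaf values, so this is an illustration of the principle rather than a reimplementation. The names `sigmoid`, `binary_log_loss`, `F_train`, and `F_test` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same data and split as in the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_log_loss(y_true, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

learning_rate = 0.1
n_rounds = 100

# Start from a constant score: the log-odds of the positive class
p0 = y_train.mean()
F_train = np.full(len(y_train), np.log(p0 / (1 - p0)))
F_test = np.full(len(y_test), np.log(p0 / (1 - p0)))
initial_loss = binary_log_loss(y_train, sigmoid(F_train))

for _ in range(n_rounds):
    # Negative gradient of the log loss: label minus predicted probability
    residual = y_train - sigmoid(F_train)
    # Fit a shallow regression tree to the current errors
    tree = DecisionTreeRegressor(max_depth=3, random_state=42)
    tree.fit(X_train, residual)
    # Take a small step in the direction that reduces the errors
    F_train += learning_rate * tree.predict(X_train)
    F_test += learning_rate * tree.predict(X_test)

final_loss = binary_log_loss(y_train, sigmoid(F_train))
manual_boost_pred = (F_test > 0).astype(int)  # score > 0 means probability > 0.5

print(f"Training log loss: {initial_loss:.4f} -> {final_loss:.4f}")
print(f"Manual boosting accuracy: {accuracy_score(y_test, manual_boost_pred):.4f}")
```

Unlike bagging, the trees here cannot be trained in parallel, because each round depends on the scores accumulated in all previous rounds.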

### Step 5: Compare Performance

Let’s compare the performance of **Bagging** and **Boosting** based on accuracy.

In [None]:
# Output comparison
print(f'Bagging (with Decision Tree) Accuracy: {accuracy_score(y_test, bagging_pred):.4f}')
print(f'Boosting (with Decision Tree) Accuracy: {accuracy_score(y_test, boosting_pred):.4f}')
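
A single train/test split can make a small gap look larger or smaller than it really is. As a rough check, the sketch below compares both models with 5-fold cross-validation. The fold count, the reduced `n_estimators=50` for bagging, and the `models`/`cv_scores` names are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Same data-generation call as in the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)

models = {
    # Base estimator passed positionally, so this works across scikit-learn versions
    "Bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                 random_state=42),
    "Boosting": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

cv_scores = {}
for name, model in models.items():
    # Accuracy on each of the 5 held-out folds
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean={scores.mean():.4f}, std={scores.std():.4f}")
```

If the difference between the two means is smaller than the fold-to-fold standard deviation, the ranking of the two methods on this dataset is not reliable.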

### Conclusion

- **Bagging** reduces **variance** by averaging many models trained independently on different bootstrap samples, but it does not significantly reduce **bias**.
- **Boosting** reduces **bias** by fitting each new model to the errors of the previous ones, but it can be more prone to overfitting than **Bagging**, especially with noisy labels or too many boosting rounds.
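
The variance claim can be illustrated with a rough experiment: train the same kind of model on several different random subsets of the training data and measure how much its test predictions change. The helper `prediction_variance` and its parameters are hypothetical, and the measure is only a simple proxy for model variance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data and split as in the notebook
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

def prediction_variance(make_model, n_repeats=10, seed=0):
    """Average per-sample variance of test predictions across models
    trained on different random 80% subsets of the training data."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_repeats):
        idx = rng.choice(len(X_train), size=int(0.8 * len(X_train)), replace=False)
        model = make_model().fit(X_train[idx], y_train[idx])
        preds.append(model.predict(X_test))
    # Variance over repeats for each test sample, then averaged over samples
    return np.var(np.stack(preds), axis=0).mean()

tree_var = prediction_variance(lambda: DecisionTreeClassifier(random_state=0))
bag_var = prediction_variance(
    lambda: BaggingClassifier(DecisionTreeClassifier(), n_estimators=25,
                              random_state=0))

print(f"Single tree prediction variance:  {tree_var:.4f}")
print(f"Bagged trees prediction variance: {bag_var:.4f}")
```

A lower value means the predictions depend less on which training samples happened to be drawn, which is the sense in which bagging reduces variance.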

In our case study on synthetic electrical fault detection, we observed that **Boosting** achieved higher test accuracy than **Bagging**. Because this comes from a single train/test split on one synthetic dataset, the gap should be treated as indicative rather than conclusive.

---

This notebook demonstrates the implementation of **Bagging** and **Boosting** using **Decision Trees** and their performance comparison. You can easily modify the dataset or models to suit real-world electrical engineering problems or other domains as needed.