1. Can we use Bagging for regression problems?


Yes, **Bagging (Bootstrap Aggregating)** can be used for regression problems.

## What is Bagging?

Bagging is an ensemble learning technique where:

* Multiple models are trained on different random samples (with replacement) of the dataset.
* The final prediction is obtained by combining all model predictions.

## How Bagging Works in Regression

In regression problems:

* Each model predicts a continuous value.
* The final output is the **average** of all predictions.

$$
\text{Final Prediction} = \frac{1}{N} \sum_{i=1}^{N} \text{Prediction}_i
$$

This averaging reduces variance and improves model stability.
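The averaging step can be sketched by hand. The snippet below is a minimal illustration that assumes NumPy and scikit-learn are installed and uses a noisy sine curve invented for the example. It draws bootstrap samples, fits one fully grown `DecisionTreeRegressor` per sample, and applies the averaging formula above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy sine wave
X = np.sort(rng.uniform(0, 6, size=200)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_new = np.linspace(0, 6, 50).reshape(-1, 1)
truth = np.sin(X_new).ravel()  # noise-free target, used only for evaluation

n_models = 25
all_preds = []
for _ in range(n_models):
    # Bootstrap sample: draw row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor()  # fully grown tree = high-variance model
    tree.fit(X[idx], y[idx])
    all_preds.append(tree.predict(X_new))

all_preds = np.array(all_preds)  # shape: (n_models, n_points)

# Final prediction = (1/N) * sum of the individual predictions
final_pred = all_preds.mean(axis=0)

individual_mse = ((all_preds - truth) ** 2).mean(axis=1)
bagged_mse = ((final_pred - truth) ** 2).mean()
print(f"Average single-tree MSE: {individual_mse.mean():.4f}")
print(f"Bagged MSE:              {bagged_mse:.4f}")
```

Because squared error is convex, the averaged prediction can never have a higher MSE than the average single tree on the same points; with high-variance trees it is usually noticeably lower.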


## Advantages of Bagging in Regression

1. Reduces overfitting
2. Decreases variance
3. Improves prediction accuracy
4. Works well with high-variance models like Decision Trees










2. What is the difference between multiple model training and single model training?




##  1. Single Model Training

Single model training means training **only one machine learning model** on the dataset.

###  Characteristics:

* One algorithm is used.
* One model is fitted to the data.
* Final prediction comes from that single model.
* Simple and faster to train.

###  Example:

Training one Decision Tree using **scikit-learn**.

###  Advantages:

* Easy to implement
* Less computational cost
* Faster training

###  Disadvantages:

* Can overfit easily
* May have high bias or high variance
* Less robust

---

## 2. Multiple Model Training

Multiple model training (Ensemble Learning) means training **more than one model** and combining their predictions.

### Characteristics:

* Multiple models are trained.
* Each model learns from the data, often from a different random subset of it (as in bagging).
* Final prediction is combined (average or voting).

### Example:

**Random Forest** trains many decision trees and averages their predictions.

### Advantages:

* Better accuracy
* Reduces overfitting
* More stable predictions

### Disadvantages:

* More computational cost
* Slower training
* More complex

---

## 🔹 Key Differences

| Feature          | Single Model Training | Multiple Model Training |
| ---------------- | --------------------- | ----------------------- |
| Number of models | One                   | Many                    |
| Accuracy         | Moderate              | Usually Higher          |
| Overfitting risk | Higher                | Lower                   |
| Complexity       | Simple                | Complex                 |
| Example          | One Decision Tree     | Random Forest           |
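The comparison in the table can be checked empirically. This is a minimal sketch, assuming scikit-learn and reusing its built-in breast cancer dataset (the same one used in the coding questions later). It scores one Decision Tree and a 100-tree Random Forest with 5-fold cross-validation; the exact numbers depend on the data and library version.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "Single Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest (100 trees)": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy: mean shows accuracy, std shows stability
    cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv_scores
    print(f"{name}: mean={cv_scores.mean():.3f}, std={cv_scores.std():.3f}")
```

The ensemble takes roughly 100 times longer to fit, which is the "more computational cost" row of the table in practice.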




3. Explain the concept of feature randomness in Random Forest.

### Feature Randomness in Random Forest

Feature randomness is one of the **key ideas** behind **Random Forest** that makes it powerful and reduces overfitting.

---

## What is Feature Randomness?

In Random Forest, **not all features are considered at every split** of a decision tree.

Instead:

* At each split,
* A **random subset of features** is selected,
* And the best split is chosen **only from that subset**, not from all features.

This is called **feature randomness** (or feature bagging).

---

## Why is Feature Randomness Important?

If we use all features for every split:

* Many trees would look very similar.
* Strong features would dominate every tree.
* Trees become highly correlated.
* The model may overfit.

Feature randomness helps by:

* Making trees different from each other
* Reducing correlation between trees
* Improving generalization
* Reducing overfitting



## How It Works (Example)

Suppose your dataset has **10 features**.

Instead of checking all 10 features at each split:

* Random Forest might randomly select only **3 features**
* Then choose the best split among those 3

At the next split, it may select a **different random 3 features**.

This randomness ensures each tree learns different patterns.
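In scikit-learn, the size of the random feature subset is controlled by the `max_features` parameter of `RandomForestClassifier`. The sketch below assumes scikit-learn and NumPy. It first mimics the per-split draw from the example (3 of 10 features) with NumPy, then fits forests with different `max_features` values on a synthetic dataset from `make_classification`; the dataset and the settings tried are illustrative choices, not part of the original notes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_features, subset_size = 10, 3

# Mimic the per-split draw: a fresh random subset of 3 out of 10 features each time
draws = [rng.choice(n_features, size=subset_size, replace=False) for _ in range(3)]
for split, candidates in enumerate(draws, start=1):
    print(f"Split {split}: candidate features {sorted(candidates.tolist())}")

# Hypothetical synthetic dataset with 10 features
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=42)

results = {}
# None = all 10 features at every split (no feature randomness, i.e. plain bagged trees)
# 3    = the "3 out of 10" setting (also what the default "sqrt" gives for 10 features)
# 1    = maximum randomness: a single random candidate feature per split
for max_features in [None, 3, 1]:
    rf = RandomForestClassifier(n_estimators=100, max_features=max_features, random_state=42)
    results[max_features] = cross_val_score(rf, X, y, cv=5).mean()
    print(f"max_features={max_features}: mean CV accuracy = {results[max_features]:.3f}")
```

Smaller `max_features` values give less correlated trees but each individual tree gets weaker, so the best setting is usually found by tuning.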



4. What is OOB (Out-of-Bag) Score?

Out-of-Bag (OOB) Score is a validation technique used in Random Forest to evaluate model performance without using a separate validation dataset.

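In Random Forest, each tree is trained on a bootstrap sample (random sampling with replacement). The data points that are not selected for a given tree's sample are called the Out-of-Bag (OOB) samples of that tree. On average, about 37% of the training points are left out of each bootstrap sample.

To compute the OOB score, each training point is predicted using only the trees that did not see it during training, and these OOB predictions are compared with the true targets (scikit-learn reports accuracy for classifiers and R² for regressors). Because every point is scored only by trees that never trained on it, the OOB score behaves like a built-in validation estimate.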
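The OOB procedure can be written out by hand to show which trees score which points. This is a minimal sketch, assuming scikit-learn, NumPy and a binary target with labels 0 and 1 (true for the breast cancer dataset). Each tree uses `max_features="sqrt"` to imitate Random Forest's feature randomness; the result will be close to, but not identical to, `RandomForestClassifier(oob_score=True).oob_score_`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # labels are 0/1, used as column indices below
n_samples, n_trees = len(X), 100
rng = np.random.default_rng(42)

# votes[i, c] counts how many OOB trees predicted class c for sample i
votes = np.zeros((n_samples, 2), dtype=int)

for t in range(n_trees):
    boot_idx = rng.integers(0, n_samples, size=n_samples)  # bootstrap sample
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[boot_idx] = False  # rows never drawn are out-of-bag for this tree

    # max_features="sqrt" adds Random-Forest-style feature randomness at each split
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=t)
    tree.fit(X[boot_idx], y[boot_idx])
    votes[np.flatnonzero(oob_mask), tree.predict(X[oob_mask])] += 1

# Score each sample using only the trees that did not train on it
has_votes = votes.sum(axis=1) > 0
oob_pred = votes[has_votes].argmax(axis=1)
oob_score = np.mean(oob_pred == y[has_votes])
print(f"Manual OOB accuracy: {oob_score:.3f} on {has_votes.sum()} samples")
```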

5. How can you measure the importance of features in a Random Forest model?

In **Random Forest**, feature importance measures how much each feature contributes to making accurate predictions. It can be measured in the following ways:

---

### 1️⃣ Mean Decrease in Impurity (MDI)

Also called **Gini Importance** (for classification).

* Each time a feature is used to split a node, it reduces impurity (Gini index or variance).
* The total reduction in impurity caused by that feature across all trees is calculated.
* Features with higher total reduction are considered more important.

✔ Built into Random Forest (`feature_importances_` in scikit-learn)
✔ Fast to compute

---

### 2️⃣ Permutation Importance (Mean Decrease in Accuracy)

* Randomly shuffle the values of one feature, ideally on held-out data.
* Measure how much the model's score (accuracy for classification, R² or MSE for regression) gets worse.
* If the score drops significantly, the feature is important.

✔ Less biased toward features with many unique values than MDI
✔ Works for both classification and regression
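Both methods are available in scikit-learn. A minimal sketch, assuming scikit-learn 0.22 or later (where `sklearn.inspection.permutation_importance` exists) and the breast cancer dataset: MDI comes from the fitted forest's `feature_importances_`, while permutation importance is computed on the test split.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Method 1: MDI (impurity-based), computed during training; normalized to sum to 1
mdi = rf.feature_importances_

# Method 2: permutation importance, measured as the accuracy drop on held-out data
result = permutation_importance(
    rf, X_test, y_test, scoring="accuracy", n_repeats=10, random_state=42
)

# Show the 5 features with the largest mean accuracy drop
top = np.argsort(result.importances_mean)[::-1][:5]
for i in top:
    print(f"{data.feature_names[i]:<25} MDI={mdi[i]:.3f}  "
          f"perm={result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

The two rankings often disagree, especially when features are correlated: permuting one of two correlated features barely hurts the model because the other still carries the information.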



6. Explain the working principle of a Bagging Classifier.



A **Bagging Classifier** (Bootstrap Aggregating) is an ensemble learning method that improves model performance by combining multiple base classifiers.

The working principle is as follows:

1️⃣ **Bootstrap Sampling**
From the original dataset, multiple new training datasets are created by random sampling **with replacement**. Each of these datasets is called a bootstrap sample.

2️⃣ **Training Multiple Models**
A separate base classifier (usually a Decision Tree) is trained on each bootstrap sample independently.

3️⃣ **Aggregation (Voting)**
For classification problems, each model makes a prediction.
The final output is decided by **majority voting**: the class that receives the highest number of votes is selected as the final prediction. (scikit-learn's `BaggingClassifier` uses a soft variant of this: when the base estimator provides `predict_proba`, it averages the predicted class probabilities and picks the most probable class; otherwise it falls back to majority voting.)
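The three steps can be sketched without `BaggingClassifier`. This minimal version assumes scikit-learn and NumPy, integer class labels (as in the breast cancer dataset) and an illustrative choice of 25 trees; it uses plain hard majority voting, so its predictions may differ slightly from scikit-learn's probability-averaging implementation.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rng = np.random.default_rng(42)
n_models = 25  # odd number avoids ties in a binary vote
models = []

# Steps 1 and 2: bootstrap sampling + independent training of each base classifier
for _ in range(n_models):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    models.append(DecisionTreeClassifier().fit(X_train[idx], y_train[idx]))

# Step 3: majority (hard) voting over the models' class predictions
all_preds = np.array([m.predict(X_test) for m in models])  # shape (n_models, n_test)
majority = np.array([np.bincount(col).argmax() for col in all_preds.T])

accuracy = np.mean(majority == y_test)
print(f"Hand-rolled bagging accuracy: {accuracy:.3f}")
```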



7. How do you evaluate a Bagging Classifier's performance?



The performance of a **Bagging Classifier** is evaluated using standard classification evaluation metrics. The common methods are:

---

### 1️⃣ Accuracy

Measures the proportion of correctly predicted instances out of total instances.

$$
\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}
$$

---

### 2️⃣ Confusion Matrix

Shows the number of:

* True Positives (TP)
* True Negatives (TN)
* False Positives (FP)
* False Negatives (FN)

It helps understand model errors clearly.

---

### 3️⃣ Precision, Recall, and F1-Score

* **Precision** → TP / (TP + FP): correct positive predictions out of all predicted positives
* **Recall** → TP / (TP + FN): correct positive predictions out of all actual positives
* **F1-Score** → Harmonic mean of Precision and Recall

Useful when dealing with imbalanced datasets.

---

### 4️⃣ ROC-AUC Score

Measures the model's ability to distinguish between classes.
Higher AUC indicates better classification performance.

---

### 5️⃣ Out-of-Bag (OOB) Score

Because bagging uses bootstrap sampling, the OOB score can be used as an internal validation estimate without a separate test set (in scikit-learn, pass `oob_score=True` and read `oob_score_` after fitting).




8. How does a Bagging Regressor work?

A **Bagging Regressor** (Bootstrap Aggregating for regression) is an ensemble learning technique used to improve the accuracy and stability of regression models.

Its working principle is as follows:

---

### 1️⃣ Bootstrap Sampling

From the original dataset, multiple new training datasets are created using **random sampling with replacement**. Each dataset is called a bootstrap sample.

---

### 2️⃣ Training Multiple Base Regressors

A separate regression model (commonly a Decision Tree Regressor) is trained independently on each bootstrap sample.

---

### 3️⃣ Aggregation (Averaging)

For a new input, each trained regressor predicts an output value.
The final prediction is calculated as the **average of all individual predictions**.

$$
\text{Final Prediction} = \frac{\text{Sum of all model predictions}}{\text{Number of models}}
$$

---

###  Key Advantages:

* Reduces variance
* Decreases overfitting
* Improves prediction stability
* Works well with high-variance models




9. What is the main advantage of ensemble techniques?


The main advantage of **ensemble techniques** is that they **improve prediction accuracy and model reliability** by combining the outputs of multiple models instead of relying on a single model.

By aggregating predictions from different learners, ensemble methods can reduce the error caused by **variance** (as in bagging) or **bias** (as in boosting). This also makes them less sensitive to noise in the training data and helps prevent **overfitting**. As a result, ensemble models usually perform better and generalize more effectively than individual models.


10. What is the main challenge of ensemble methods?


The main challenge of **ensemble methods** is their **increased computational complexity and reduced interpretability** compared to single models.

Since ensemble techniques combine multiple models:

* They require **more training time and memory**.
* They are **computationally expensive**.
* The final model becomes harder to interpret and explain.

Because many models are combined, it is difficult to understand how individual features influence the final prediction.



11. Explain the key idea behind ensemble techniques.


The key idea behind **ensemble techniques** is to **combine multiple models to produce a better and more accurate prediction than a single model**.

Instead of relying on one model, ensemble methods train several base learners and combine their outputs using techniques such as **voting** (for classification) or **averaging** (for regression).

This approach works on the principle that a group of weak or moderate learners, when combined properly, can form a strong learner. It helps in reducing bias, variance, and overfitting, leading to improved generalization performance.
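A small worked calculation shows why combining moderate learners can help. It rests on a strong, idealized assumption: the classifiers make completely independent errors (real models are correlated, so the gain is smaller in practice). With five classifiers that are each 70% accurate, a majority vote is correct when at least 3 of the 5 are correct, which happens with probability C(5,3)·0.7³·0.3² + C(5,4)·0.7⁴·0.3 + 0.7⁵ = 0.3087 + 0.36015 + 0.16807 ≈ 0.837. The helper below is a hypothetical illustration of that binomial sum for any odd number of models.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a majority of n independent classifiers (each correct
    with probability p) is correct. Assumes n_models is odd, so there are no ties."""
    need = n_models // 2 + 1  # minimum number of correct votes for a majority
    return sum(comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
               for k in range(need, n_models + 1))


for n in [1, 5, 25]:
    print(f"{n:>2} models at 70% each -> majority accuracy {majority_vote_accuracy(n, 0.7):.3f}")
```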



12. What is a Random Forest Classifier?


A **Random Forest Classifier** is a supervised machine learning algorithm used for classification tasks. It is an ensemble method that builds multiple decision trees and combines their predictions to produce a final output.

In **Random Forest**, each tree is trained on a random subset of the training data (bootstrap sampling), and at each split, a random subset of features is considered. This randomness helps create diverse trees.

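For classification problems, each tree gives a class prediction, and the final output is determined by **majority voting**. (scikit-learn's `RandomForestClassifier` implements a soft version of this: it averages the class probabilities predicted by the trees and picks the most probable class.)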

###  Key Advantages:

* Reduces overfitting compared to a single decision tree
* Handles large datasets and high-dimensional data
* Provides feature importance
* High accuracy and robustness
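The aggregation step can be inspected directly. A minimal sketch, assuming scikit-learn and NumPy: it checks that `RandomForestClassifier.predict_proba` equals the mean of the individual trees' probabilities (the trees are available in the fitted `estimators_` list), then measures how often a hard majority vote would pick the same class.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Soft voting: average the class probabilities predicted by each tree
tree_probas = np.array([tree.predict_proba(X_test) for tree in rf.estimators_])
soft_vote_proba = tree_probas.mean(axis=0)
print("Matches rf.predict_proba:", np.allclose(soft_vote_proba, rf.predict_proba(X_test)))

# Hard voting: each tree casts one vote for its most probable class
tree_votes = tree_probas.argmax(axis=2)  # shape (n_trees, n_test), class indices
vote_counts = np.array([np.bincount(col, minlength=len(rf.classes_)) for col in tree_votes.T])
hard_vote = rf.classes_[vote_counts.argmax(axis=1)]

agreement = np.mean(hard_vote == rf.predict(X_test))
print(f"Hard and soft voting agree on {agreement:.1%} of test samples")
```

Fully grown trees usually end in pure leaves, so their probabilities are mostly 0 or 1 and the two voting schemes rarely disagree here; with shallower trees the difference can be larger.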





13. What are the main types of ensemble techniques?


The main types of ensemble techniques are:

---

### 1️⃣ Bagging (Bootstrap Aggregating)

In **Bagging**, multiple models are trained independently on different bootstrap samples of the dataset.
The final prediction is made by:

* **Majority voting** (classification)
* **Averaging** (regression)

Example: **Random Forest**

✔ Reduces variance
✔ Prevents overfitting

---

### 2️⃣ Boosting

In **Boosting**, models are trained sequentially, where each new model focuses on correcting the errors of the previous one.

Examples:

* **AdaBoost**
* **Gradient Boosting**
* **XGBoost**

✔ Reduces bias
✔ Improves accuracy

---

### 3️⃣ Stacking (Stacked Generalization)

In **Stacking**, multiple different models are trained, and their predictions are combined using another model called a **meta-learner**.

✔ Combines strengths of different algorithms
✔ Often gives very high performance




14. What is ensemble learning in machine learning?


**Ensemble learning** is a machine learning technique in which multiple models (called base learners) are combined to improve overall prediction performance.

Instead of relying on a single model, ensemble learning aggregates the predictions of several models using methods like:

* **Voting** (for classification)
* **Averaging** (for regression)

The main idea is that a group of models working together can produce more accurate and stable results than an individual model.

A popular example of ensemble learning is **Random Forest**, which combines multiple decision trees to make final predictions.

###  Advantages:

* Improves accuracy
* Reduces overfitting
* Enhances model stability
* Better generalization
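Voting over different kinds of base learners can be sketched with scikit-learn's `VotingClassifier`. This minimal example assumes scikit-learn; the three base learners (scaled logistic regression, scaled k-nearest neighbours, and a depth-4 tree) are illustrative picks. The ensemble is not guaranteed to beat its best member; it tends to help most when the members make different mistakes.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

base_learners = [
    ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("tree", DecisionTreeClassifier(max_depth=4, random_state=42)),
]

# voting="hard" -> majority vote on the predicted class labels
ensemble = VotingClassifier(estimators=base_learners, voting="hard")

scores = {}
for name, model in base_learners + [("ensemble", ensemble)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:<9} accuracy: {scores[name]:.3f}")
```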



15. When should we avoid using ensemble methods?


Although ensemble methods improve accuracy, they are not always the best choice. We should avoid using ensemble methods in the following situations:

---

### 1️⃣ When Model Interpretability is Important

Ensemble models combine multiple learners, making them complex and difficult to interpret.
If explainability is required (e.g., in healthcare or finance), simpler models may be preferred.

---

### 2️⃣ Limited Computational Resources

Ensemble methods require more memory, processing power, and training time compared to single models.

---

### 3️⃣ Small Datasets

With very small datasets, ensemble methods may not provide significant improvement and can sometimes lead to overfitting.

---

### 4️⃣ Real-Time or Low-Latency Systems

If predictions must be made very quickly, complex ensembles may slow down performance.

---

### 5️⃣ When a Simple Model Performs Well

If a single model already provides high accuracy, using an ensemble may add unnecessary complexity.



16. How does Bagging help in reducing overfitting?


Bagging (Bootstrap Aggregating) reduces overfitting by lowering the **variance** of a machine learning model. It works by generating multiple bootstrap samples (random samples with replacement) from the original dataset and training separate models on each sample.

Since each model is trained on slightly different data, they learn different patterns and make different errors. When their predictions are combined (averaging for regression, majority voting for classification), the individual errors tend to cancel out.

This aggregation makes the final model more stable and less sensitive to noise in the training data, thereby reducing overfitting and improving generalization performance on unseen data.
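The "errors cancel out" argument can be made precise with a standard variance identity. Assume, as an idealization, that at a given input each of the N models' predictions has variance σ² and every pair of models has correlation ρ. Summing the N variances and the N(N−1) pairwise covariances and dividing by N² gives:

$$
\begin{aligned}
\operatorname{Var}\left(\frac{1}{N}\sum_{i=1}^{N} \text{Prediction}_i\right)
&= \frac{1}{N^2}\left[N\sigma^2 + N(N-1)\rho\,\sigma^2\right] \\
&= \rho\,\sigma^2 + \frac{1-\rho}{N}\,\sigma^2
\end{aligned}
$$

With independent models (ρ = 0) the variance falls to σ²/N. Adding more models only shrinks the second term; the ρσ² term remains, which is why Random Forest adds feature randomness to make its trees less correlated.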


17. Why is Random Forest better than a single Decision Tree?


**Random Forest** is better than a single Decision Tree because it reduces overfitting and improves prediction accuracy.

---

### 1️⃣ Reduces Overfitting

A single Decision Tree can easily overfit the training data, especially if it is deep.
Random Forest builds **multiple decision trees** and combines their predictions, which reduces variance and makes the model more stable.

---

### 2️⃣ Better Generalization

Since Random Forest averages the results of many trees, it performs better on unseen (test) data compared to a single tree.

---

### 3️⃣ Feature Randomness

Random Forest selects a **random subset of features** at each split.
This makes trees less correlated with each other and improves overall performance.

---

### 4️⃣ Higher Accuracy

By combining predictions from multiple trees (majority voting for classification, averaging for regression), Random Forest usually achieves higher accuracy than a single Decision Tree.

---

### 5️⃣ Robust to Noise

If one tree makes an incorrect prediction because it fitted noise, the other trees can outvote it (classification) or average it out (regression) during aggregation.



18. What is the role of bootstrap sampling in Bagging?

### Role of Bootstrap Sampling in Bagging

Bootstrap sampling is a key component of **Bagging (Bootstrap Aggregating)**. It involves creating multiple training datasets by randomly sampling the original dataset **with replacement**.

---

### 1️⃣ Creates Data Diversity

Each bootstrap sample contains:

* Some original observations repeated
* Some observations left out

This ensures that every model is trained on slightly different data, creating diversity among models.

---

### 2️⃣ Reduces Model Correlation

Because each model sees different data, they learn different patterns.
This reduces correlation between models, which improves the effectiveness of averaging.

---

### 3️⃣ Reduces Variance

Since models are trained on varied samples, their errors differ.
When predictions are combined (averaging or majority voting), random errors cancel out, reducing variance and overfitting.

---

### 4️⃣ Enables Out-of-Bag (OOB) Evaluation

The data points not included in a bootstrap sample (called Out-of-Bag samples) can be used to estimate model performance without needing a separate validation set.
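How much data ends up out-of-bag can be checked with a quick simulation. For a dataset of n rows, the chance that a given row is never drawn in n draws with replacement is (1 − 1/n)ⁿ, which approaches 1/e ≈ 0.368 for large n, so each bootstrap sample contains roughly 63.2% of the distinct rows. A minimal NumPy sketch, assuming a hypothetical dataset size of 10,000 rows:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # hypothetical dataset size

fractions = []
for _ in range(20):
    sample = rng.integers(0, n, size=n)        # draw n row indices with replacement
    unique_frac = np.unique(sample).size / n   # share of distinct rows in the sample
    fractions.append(unique_frac)

in_bag = float(np.mean(fractions))
print(f"Average share of unique rows in a bootstrap sample: {in_bag:.3f}")
print(f"Average out-of-bag share:                           {1 - in_bag:.3f}")
print(f"Theoretical limit 1 - 1/e:                          {1 - np.exp(-1):.3f}")
```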



19. What are some real-world applications of ensemble techniques?


### Real-World Applications of Ensemble Techniques

Ensemble techniques (like Bagging, Boosting, and Random Forest) are widely used because they improve accuracy and reduce overfitting. Some important real-world applications are:

---

### 1️⃣ Finance – Fraud Detection

Banks use ensemble models to detect fraudulent transactions.
By combining multiple models, they can better identify unusual patterns and reduce false positives.

---

### 2️⃣ Healthcare – Disease Prediction

Ensemble methods help in diagnosing diseases like cancer or heart disease by analyzing medical data, improving prediction reliability.

---

### 3️⃣ E-Commerce – Recommendation Systems

Companies like Amazon use ensemble techniques to recommend products based on user behavior and purchase history.

---

### 4️⃣ Search Engines & Ranking

Search engines such as Google use ensemble-based algorithms to improve search result ranking and relevance.

---

### 5️⃣ Credit Scoring

Financial institutions use ensemble models to assess credit risk and decide whether to approve loans.

---

### 6️⃣ Image & Speech Recognition

Ensemble models can improve accuracy in applications like facial recognition and speech recognition for voice assistants.

---

### 7️⃣ Stock Market Prediction

Boosting and other ensemble techniques are used to analyze historical stock data and predict price movements.

---



20. What is the difference between Bagging and Boosting?

###  Difference Between Bagging and Boosting

Bagging and Boosting are both ensemble learning techniques, but they work in different ways.

| Basis                  | Bagging                                                        | Boosting                                                               |
| ---------------------- | -------------------------------------------------------------- | ---------------------------------------------------------------------- |
| **Full Form**          | Bootstrap Aggregating                                          | N/A (not an acronym)                                                   |
| **Training Method**    | Models are trained **independently**                           | Models are trained **sequentially**                                    |
| **Data Sampling**      | Uses **bootstrap sampling** (random sampling with replacement) | Gives **higher weight to misclassified samples** (AdaBoost) or fits each new model to the **residual errors** of the previous ones (Gradient Boosting) |
| **Goal**               | Reduces **variance**                                           | Mainly reduces **bias** (can also reduce variance)                     |
| **Overfitting**        | Less prone to overfitting                                      | Can overfit if not tuned properly                                      |
| **Model Combination**  | Averaging (regression) / Majority voting (classification)      | Weighted sum of models                                                 |
| **Example Algorithms** | Random Forest                                                  | AdaBoost, Gradient Boosting                                            |
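The sequential nature of boosting is easiest to see in a hand-written loop. This is a minimal sketch of gradient boosting for regression with squared error, assuming scikit-learn and NumPy; the learning rate, tree depth and number of rounds are illustrative values. Each new shallow tree is fitted to the residuals left by the models before it, whereas in bagging every tree is trained independently.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

learning_rate = 0.1   # shrinks each tree's contribution
n_rounds = 100

# Start from a constant model: the mean of the training targets
pred_train = np.full(len(y_train), y_train.mean())
pred_test = np.full(len(y_test), y_train.mean())
train_mse = [mean_squared_error(y_train, pred_train)]

for _ in range(n_rounds):
    residuals = y_train - pred_train               # errors of the current ensemble
    tree = DecisionTreeRegressor(max_depth=2, random_state=42)
    tree.fit(X_train, residuals)                   # next model learns those errors
    pred_train += learning_rate * tree.predict(X_train)
    pred_test += learning_rate * tree.predict(X_test)
    train_mse.append(mean_squared_error(y_train, pred_train))

print(f"Training MSE: {train_mse[0]:.1f} -> {train_mse[-1]:.1f}")
print(f"Test MSE after {n_rounds} rounds: {mean_squared_error(y_test, pred_test):.1f}")
```

Training error keeps falling round after round, which is also why boosting can overfit if the number of rounds and the learning rate are not tuned on held-out data.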


21. Train a Bagging Classifier using Decision Trees on a sample dataset and print the model accuracy.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Load sample dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create base model (Decision Tree)
dt = DecisionTreeClassifier()

# Create Bagging Classifier
# (the `estimator` parameter was called `base_estimator` before scikit-learn 1.2)
bagging_model = BaggingClassifier(
    estimator=dt,
    n_estimators=50,
    random_state=42
)

# Train model
bagging_model.fit(X_train, y_train)

# Make predictions
y_pred = bagging_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)
```

```text
Model Accuracy: 0.9590643274853801
```


22. Train a Bagging Regressor using Decision Trees and evaluate it using Mean Squared Error (MSE).

```python
# Import required libraries
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error

# Load dataset
data = load_diabetes()
X = data.data
y = data.target

# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create base model (Decision Tree Regressor)
dt = DecisionTreeRegressor()

# Create Bagging Regressor
bagging_model = BaggingRegressor(
    estimator=dt,
    n_estimators=50,
    random_state=42
)

# Train model
bagging_model.fit(X_train, y_train)

# Make predictions
y_pred = bagging_model.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)

print("Mean Squared Error (MSE):", mse)
```

```text
Mean Squared Error (MSE): 2987.0073593984966
```


23. Train a Random Forest Classifier on the Breast Cancer dataset and print feature importance scores.

```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import pandas as pd

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target
feature_names = data.feature_names

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Random Forest Classifier
rf = RandomForestClassifier(
    n_estimators=100,
    random_state=42
)

rf.fit(X_train, y_train)

# Get feature importance scores
importances = rf.feature_importances_

# Create DataFrame for better visualization
feature_importance_df = pd.DataFrame({
    "Feature": feature_names,
    "Importance": importances
}).sort_values(by="Importance", ascending=False)

# Print feature importance scores
print(feature_importance_df)
```

```text
                    Feature  Importance
7       mean concave points    0.141934
27     worst concave points    0.127136
23               worst area    0.118217
6            mean concavity    0.080557
20             worst radius    0.077975
22          worst perimeter    0.074292
2            mean perimeter    0.060092
3                 mean area    0.053810
26          worst concavity    0.041080
0               mean radius    0.032312
13               area error    0.029538
21            worst texture    0.018786
25        worst compactness    0.017539
10             radius error    0.016435
28           worst symmetry    0.012929
12          perimeter error    0.011770
24         worst smoothness    0.011769
1              mean texture    0.011064
5          mean compactness    0.009216
19  fractal dimension error    0.007135
29  worst fractal dimension    0.006924
4           mean smoothness    0.006223
14         smoothness error    0.005881
16          concavity error    0.005816
```


24. Train a Random Forest Regressor and compare its performance with a single Decision Tree.


```python
# Import required libraries
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Load dataset
data = load_diabetes()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Decision Tree Regressor
dt = DecisionTreeRegressor(random_state=42)
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
dt_mse = mean_squared_error(y_test, dt_pred)

# Train Random Forest Regressor
rf = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)
rf.fit(X_train, y_train)
rf_pred = rf.predict(X_test)
rf_mse = mean_squared_error(y_test, rf_pred)

# Print results
print("Decision Tree MSE:", dt_mse)
print("Random Forest MSE:", rf_mse)
```

```text
Decision Tree MSE: 5697.789473684211
Random Forest MSE: 2859.641982706767
```


25. Compute the Out-of-Bag (OOB) Score for a Random Forest Classifier.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Create Random Forest with OOB enabled
rf = RandomForestClassifier(
    n_estimators=100,
    oob_score=True,      # Enable OOB scoring
    bootstrap=True,      # Must be True for OOB
    random_state=42
)

# Train model
rf.fit(X, y)

# Print OOB Score
print("Out-of-Bag (OOB) Score:", rf.oob_score_)
```

```text
Out-of-Bag (OOB) Score: 0.961335676625659
```


26. Train a Bagging Classifier using SVM as a base estimator and print accuracy.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create base model (SVM)
# probability=True gives SVC a predict_proba method, so BaggingClassifier
# averages predicted probabilities (soft voting) instead of counting votes
svm = SVC(kernel='rbf', probability=True)

# Create Bagging Classifier with SVM
bagging_model = BaggingClassifier(
    estimator=svm,
    n_estimators=10,
    random_state=42
)

# Train model
bagging_model.fit(X_train, y_train)

# Make predictions
y_pred = bagging_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print("Model Accuracy:", accuracy)
```

```text
Model Accuracy: 0.9473684210526315
```


27. Train a Random Forest Classifier with different numbers of trees and compare accuracy.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Different numbers of trees
n_trees = [10, 50, 100, 200]

# Train and compare models
for n in n_trees:
    rf = RandomForestClassifier(
        n_estimators=n,
        random_state=42
    )
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)

    print(f"Number of Trees: {n}, Accuracy: {accuracy:.4f}")
```

```text
Number of Trees: 10, Accuracy: 0.9649
Number of Trees: 50, Accuracy: 0.9708
Number of Trees: 100, Accuracy: 0.9708
Number of Trees: 200, Accuracy: 0.9708
```


28. Train a Bagging Classifier using Logistic Regression as a base estimator and print the AUC score.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Base model (Logistic Regression)
log_reg = LogisticRegression(max_iter=5000)

# Bagging Classifier with Logistic Regression
bagging_model = BaggingClassifier(
    estimator=log_reg,
    n_estimators=20,
    random_state=42
)

# Train model
bagging_model.fit(X_train, y_train)

# Predict probabilities
y_prob = bagging_model.predict_proba(X_test)[:, 1]

# Compute AUC Score
auc = roc_auc_score(y_test, y_prob)

print("AUC Score:", auc)
```

```text
AUC Score: 0.9976484420928865
```


29. Train a Random Forest Regressor and analyze feature importance scores.


```python
# Import required libraries
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
import pandas as pd

# Load dataset
data = load_diabetes()
X = data.data
y = data.target
feature_names = data.feature_names

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Train Random Forest Regressor
rf = RandomForestRegressor(
    n_estimators=100,
    random_state=42
)

rf.fit(X_train, y_train)

# Get feature importance scores
importances = rf.feature_importances_

# Create DataFrame for better interpretation
feature_importance_df = pd.DataFrame({
    "Feature": feature_names,
    "Importance": importances
}).sort_values(by="Importance", ascending=False)

# Print feature importance
print(feature_importance_df)
```

```text
  Feature  Importance
2     bmi    0.400000
8      s5    0.166602
3      bp    0.104839
9      s6    0.071358
6      s3    0.061730
0     age    0.058633
4      s1    0.049191
5      s2    0.047138
7      s4    0.029427
1     sex    0.011082
```


30. Train a Bagging ensemble and a Random Forest, and compare their accuracy.


```python
# Import required libraries
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# -------------------------
# 1. Bagging Classifier
# -------------------------
dt = DecisionTreeClassifier(random_state=42)

bagging_model = BaggingClassifier(
    estimator=dt,
    n_estimators=100,
    random_state=42
)

bagging_model.fit(X_train, y_train)
bagging_pred = bagging_model.predict(X_test)
bagging_acc = accuracy_score(y_test, bagging_pred)

# -------------------------
# 2. Random Forest Classifier
# -------------------------
rf_model = RandomForestClassifier(
    n_estimators=100,
    random_state=42
)

rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)
rf_acc = accuracy_score(y_test, rf_pred)

# Print results
print("Bagging Accuracy:", bagging_acc)
print("Random Forest Accuracy:", rf_acc)
```

```text
Bagging Accuracy: 0.9590643274853801
Random Forest Accuracy: 0.9707602339181286
```
