#Ensemble Techniques:

##What are Ensemble Techniques?

Ensemble techniques in machine learning involve combining multiple base models to produce a single, powerful model. The idea is to leverage the strengths of various models to achieve better performance than any single model alone. Ensemble methods can reduce overfitting, improve accuracy, and provide more robust predictions.

##Types of Ensemble Methods

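**Bagging (Bootstrap Aggregating)**: Involves training multiple instances of the same model, each on a bootstrap sample of the training data (a sample of the same size drawn with replacement).
Each model makes its own prediction, and the final prediction is the majority vote (classification) or the average (regression).
Example: Random Forest, which also considers only a random subset of features at each split.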
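
The bootstrap-then-vote idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how Random Forest is implemented: the toy 1-D dataset, the threshold "stump" base learner, and the helper names `bootstrap_sample`, `fit_stump`, and `bagging_predict` are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows):
    """Hypothetical base learner: choose the threshold t (taken from the
    sample) so that the rule 'predict 1 when x > t' gets the most rows right."""
    best_threshold, best_correct = None, -1
    for threshold, _ in rows:
        correct = sum((x > threshold) == bool(y) for x, y in rows)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold

def bagging_predict(thresholds, x):
    """Each stump votes 0 or 1; the majority label wins."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
# Toy data (an assumption for illustration): label is 1 exactly when x > 5.
train = [(x, int(x > 5)) for x in range(10)] * 3

# Train 25 stumps, each on its own bootstrap sample.
thresholds = [fit_stump(bootstrap_sample(train, rng)) for _ in range(25)]

print(bagging_predict(thresholds, 8), bagging_predict(thresholds, 2))
```

Because each stump sees a slightly different sample, the learned thresholds vary, and the vote smooths out that variation.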

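**Boosting**: Involves training multiple models sequentially, each trying to correct the errors of the previous ones.
In AdaBoost, each new model is trained on reweighted data that gives more weight to previously misclassified instances; in Gradient Boosting, each new model is fit to the residual errors (more precisely, the negative gradient of the loss) of the current ensemble.
Examples: AdaBoost, Gradient Boosting.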
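
To make the reweighting concrete, here is a small sketch of a single weight update in the classic discrete AdaBoost formulation for binary labels. The function name `adaboost_round` and the five-sample example are assumptions for illustration; scikit-learn's `AdaBoostClassifier` handles this internally and its details may differ.

```python
import math

def adaboost_round(weights, correct):
    """One discrete-AdaBoost weight update (binary classification).

    weights: current sample weights, summing to 1
    correct: booleans, True where the current weak learner was right
    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak learner.
    error = sum(w for w, ok in zip(weights, correct) if not ok)
    # The learner's say in the final vote: larger when the error is smaller.
    alpha = 0.5 * math.log((1 - error) / error)
    # Shrink weights of correctly classified samples, grow the others.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

weights = [0.2] * 5                      # five samples, uniform start
correct = [True, True, True, True, False]
alpha, new_weights = adaboost_round(weights, correct)
print(round(alpha, 4), [round(w, 3) for w in new_weights])
```

After this round the single misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.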

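**Stacking**: Involves training multiple models (base learners) and then using another model (meta-learner) to combine their predictions.
The base models are trained on the initial dataset, and the meta-learner is trained on their predictions, usually out-of-fold predictions obtained through cross-validation so that the meta-learner does not see predictions made on data the base models were fit on.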
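
A minimal sketch of stacking with scikit-learn's `StackingClassifier` follows. The synthetic dataset from `make_classification`, the choice of base learners, and the logistic-regression meta-learner are assumptions made for illustration, not part of the Titanic workflow in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (assumption: stands in for a real dataset).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

stack = StackingClassifier(
    estimators=[
        ('rf', RandomForestClassifier(n_estimators=50, random_state=0)),
        ('tree', DecisionTreeClassifier(max_depth=3, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=5,  # meta-learner is trained on 5-fold out-of-fold predictions
)
stack.fit(X_train, y_train)

stack_accuracy = stack.score(X_test, y_test)
print(f'Stacking accuracy: {stack_accuracy:.4f}')
```

The `cv` argument is what separates this from simply fitting the meta-learner on in-sample predictions, which would tend to overstate how reliable the base models are.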

**Voting**: Combines the predictions of multiple models by averaging (for regression) or taking a majority vote (for classification).
Models can be assigned different weights based on their performance.
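
The difference between a plain majority vote and a weighted vote over predicted probabilities can be sketched in a few lines. The helper names `hard_vote` and `soft_vote` and the three made-up probability vectors are hypothetical, chosen only to show that the two schemes can disagree.

```python
from collections import Counter

def hard_vote(labels):
    """Majority vote over predicted class labels.
    On ties, the label seen first wins."""
    return Counter(labels).most_common(1)[0][0]

def soft_vote(probabilities, weights):
    """Weighted average of per-model class probabilities.
    probabilities: one list of class probabilities per model."""
    n_classes = len(probabilities[0])
    total = sum(weights)
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    return averaged.index(max(averaged))

# Three models' probabilities for classes [0, 1] on one sample (made-up values).
probs = [[0.45, 0.55], [0.40, 0.60], [0.90, 0.10]]
labels = [p.index(max(p)) for p in probs]   # [1, 1, 0]

print(hard_vote(labels))                    # two of three models say class 1
print(soft_vote(probs, [1, 1, 1]))          # the confident third model tips it to 0
print(soft_vote(probs, [3, 3, 1]))          # down-weighting that model restores 1
```

In scikit-learn, the corresponding options are `voting='hard'`, `voting='soft'`, and the `weights` parameter of `VotingClassifier`.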

**Blending**: Similar to stacking but involves a holdout validation set instead of cross-validation for training the meta-learner.
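
scikit-learn has no dedicated blending estimator, so blending is usually written by hand. The sketch below is one possible arrangement, assuming a synthetic dataset and a three-way split; the split sizes, model choices, and the helper `meta_features` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption), split into base-training, holdout, and test parts.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_base, X_hold, y_base, y_hold = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

# Step 1: fit the base models on the base-training part only.
base_models = [
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
]
for model in base_models:
    model.fit(X_base, y_base)

def meta_features(models, X):
    """One column per base model: its predicted probability of class 1."""
    return np.column_stack([m.predict_proba(X)[:, 1] for m in models])

# Step 2: fit the meta-learner on base-model predictions for the holdout part.
meta = LogisticRegression()
meta.fit(meta_features(base_models, X_hold), y_hold)

# Step 3: evaluate the whole pipeline on the untouched test part.
blend_accuracy = meta.score(meta_features(base_models, X_test), y_test)
print(f'Blending accuracy: {blend_accuracy:.4f}')
```

The trade-off against stacking is simplicity versus data efficiency: blending needs only one fit per base model, but the meta-learner sees just the holdout rows.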

##Advantages of Ensemble Methods

**Improved Accuracy**: By combining multiple models, ensemble methods can achieve higher accuracy than individual models.

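**Robustness**: Averaging over several models tends to reduce variance, so ensembles (bagging in particular) are often less prone to overfitting the training data and generalize better to unseen data. Boosting can still overfit if run for too many rounds on noisy data.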

**Versatility**: They can be applied to both classification and regression tasks.

##Implementation

Applying Ensemble Techniques on the Titanic Dataset

The Titanic dataset is a well-known dataset for binary classification tasks (survival prediction). We will demonstrate the use of various ensemble techniques on this dataset.

The cell below imports the necessary libraries, loads and preprocesses the dataset, and then trains and evaluates a hard-voting ensemble.

```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Load the Titanic dataset
url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
data = pd.read_csv(url)

# Data preprocessing: drop free-text / sparse columns
data = data.drop(['Cabin', 'Ticket', 'Name'], axis=1)

# Fill missing values. Assigning the result back avoids the chained
# `inplace=True` pattern, which newer pandas versions warn about or ignore.
data['Age'] = data['Age'].fillna(data['Age'].median())
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
data['Fare'] = data['Fare'].fillna(data['Fare'].median())

# Encode categorical features as integers
# (LabelEncoder assigns codes in sorted order, e.g. female=0, male=1)
label_encoder = LabelEncoder()
data['Sex'] = label_encoder.fit_transform(data['Sex'])
data['Embarked'] = label_encoder.fit_transform(data['Embarked'])

# Define features and target
# Note: PassengerId is only a row identifier but is kept as a feature here
X = data.drop(['Survived'], axis=1)
y = data['Survived']

# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define base learners
clf1 = RandomForestClassifier(n_estimators=100, random_state=42)
clf2 = AdaBoostClassifier(n_estimators=100, random_state=42)
clf3 = GradientBoostingClassifier(n_estimators=100, random_state=42)

# Voting classifier (hard voting: each base learner casts one vote per sample)
voting_clf = VotingClassifier(estimators=[
    ('rf', clf1),
    ('ada', clf2),
    ('gb', clf3)], voting='hard')

# Train the voting classifier (fits a clone of each base learner)
voting_clf.fit(X_train, y_train)

# Predict and evaluate on the held-out test set
y_pred = voting_clf.predict(X_test)
print(f'Voting Classifier Accuracy: {accuracy_score(y_test, y_pred):.4f}')
```


Voting Classifier Accuracy: 0.8101


##Explanation

**Data Preprocessing**:
<ul>
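<li>Dropped the <code>Cabin</code>, <code>Ticket</code>, and <code>Name</code> columns.</li>
<li>Filled missing <code>Age</code> and <code>Fare</code> values with the column median and missing <code>Embarked</code> values with the most frequent value.</li>
<li>Encoded the categorical <code>Sex</code> and <code>Embarked</code> columns as integers with <code>LabelEncoder</code>.</li>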
</ul>

**Base Learners**:
<ul>
<li>Random Forest, AdaBoost, and Gradient Boosting classifiers were chosen as base learners, each with 100 estimators and a fixed <code>random_state</code> of 42 for reproducibility.</li>
</ul>

**Voting Classifier**:
<ul>
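<li>Combined the predictions of the base learners using hard voting: each base learner predicts a class, and the class with the most votes wins.</li>
<li>Trained the voting classifier on the training data (80% of the rows).</li>
<li>Evaluated its performance on the held-out test data (the remaining 20%), where it reached an accuracy of 0.8101.</li>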
</ul>
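
A single train/test split gives only one accuracy figure. One way to see whether the voting ensemble actually helps is to compare it against its own base learners with cross-validation. The sketch below assumes a synthetic dataset from `make_classification` so that it runs offline; the same loop could be pointed at the preprocessed Titanic `X` and `y`, and the scores it prints are not results from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption, not the Titanic dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=42)

base_learners = [
    ('rf', RandomForestClassifier(n_estimators=50, random_state=42)),
    ('ada', AdaBoostClassifier(n_estimators=50, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=50, random_state=42)),
]
models = dict(base_learners)
models['voting'] = VotingClassifier(estimators=base_learners, voting='hard')

# Mean 5-fold cross-validated accuracy for each model.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in models.items()}

for name, score in scores.items():
    print(f'{name}: {score:.4f}')
```

If the ensemble's score is not clearly above its best base learner, the base models are probably making similar mistakes, and more diverse learners may be worth trying.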