# Introduction
Stacking, or stacked generalization, is an ensemble learning technique that trains a meta-model on the predictions of several base classification or regression models, with the aim of improving on what any single model achieves. In this tutorial, we will use stacking to predict diabetes onset with the Pima Indians Diabetes dataset. The dataset contains diagnostic measurements for female patients of Pima Indian heritage, including glucose concentration, blood pressure, and BMI.

## Steps

### Step 1: Import Required Libraries
First, import the necessary libraries for data manipulation, model training, and evaluation.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score

### Step 2: Load and Preprocess the Dataset
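Load the Pima Indians Diabetes dataset and preprocess it. The CSV file has no header row, so the column names are supplied explicitly. Preprocessing covers three things: replacing zeros that stand in for missing measurements, passing the target through LabelEncoder (Outcome is already coded as 0/1 and the dataset has no categorical features, so this leaves the values unchanged), and scaling the features.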

In [2]:
# Load the dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv"
columns = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']
data = pd.read_csv(url, names=columns)

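# Handle missing values: in these columns a 0 is a placeholder for a missing measurement,
# so replace it with the mean of the non-zero values (a mean that includes the zeros is biased low)
for col in ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']:
    nonzero_mean = data.loc[data[col] != 0, col].mean()
    data[col] = data[col].replace(0, nonzero_mean)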

# Encode the target (Outcome is already 0/1, so LabelEncoder leaves it unchanged)
label_encoder = LabelEncoder()
data['Outcome'] = label_encoder.fit_transform(data['Outcome'])

# Split the data into features and target variable
X = data.drop('Outcome', axis=1)
y = data['Outcome']

# Scale the features (the scaler is fitted on all rows here, so test-set statistics influence the scaling)
scaler = StandardScaler()
X = scaler.fit_transform(X)
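
The zero-handling step can be easier to follow on a small example. The sketch below uses a hypothetical four-row frame (the values are made up for illustration, not taken from the dataset) and shows one way to count the placeholder zeros and then impute them via `NaN`, which is an alternative to the `replace` loop used in the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a 0 stands in for a missing measurement
toy = pd.DataFrame({
    "Glucose": [100.0, 0.0, 140.0, 120.0],
    "BMI": [30.0, 25.0, 0.0, 35.0],
})

# Count the placeholder zeros per column before changing anything
zero_counts = (toy == 0).sum()

# Mark zeros as missing, then fill each column with the mean of its observed values
imputed = toy.replace(0, np.nan)
imputed = imputed.fillna(imputed.mean())

print(zero_counts.to_dict())        # {'Glucose': 1, 'BMI': 1}
print(imputed["Glucose"].tolist())  # [100.0, 120.0, 140.0, 120.0]
```

Going through `NaN` keeps the missing entries out of the mean automatically, because pandas skips `NaN` when averaging.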

### Step 3: Split the Dataset
Split the dataset into training and testing sets, holding out 15% of the rows to evaluate the models on data they were not trained on.

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=12)
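
In this tutorial the features are scaled in Step 2, before the split. A common alternative is to split first and fit the scaler on the training rows only, so the test set plays no part in preprocessing. The following is a minimal sketch of that ordering on synthetic data; `make_classification`, the `stratify` argument, and all variable names ending in `_demo`, `_tr`, or `_te` are assumptions for illustration, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the diabetes features (768 rows, 8 columns)
X_demo, y_demo = make_classification(n_samples=768, n_features=8, random_state=0)

# Split first; stratify keeps the class ratio similar in both sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.15, random_state=12, stratify=y_demo
)

# Fit the scaler on the training rows only, then apply it to both sets
demo_scaler = StandardScaler().fit(X_tr)
X_tr_scaled = demo_scaler.transform(X_tr)
X_te_scaled = demo_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```

With this ordering the training features have zero mean by construction, while the test features are only approximately centered, which is what a model would see on new data.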

### Step 4: Initialize Base Models and the Meta-Model
Initialize several base models and a meta-model. We will use a Decision Tree, a Support Vector Machine, a K-Nearest Neighbors classifier, and a Random Forest as base models, and Logistic Regression as the meta-model. The SVC is created with probability=True so that it exposes predict_proba like the other base models.

In [9]:
# Initialize base models
base_models = [
    ('decision_tree', DecisionTreeClassifier(random_state=42)),
    ('svc', SVC(probability=True, random_state=42)),
    ('knn', KNeighborsClassifier()),
    ('random_forest', RandomForestClassifier(random_state=42))
]

# Initialize the meta-model
meta_model = LogisticRegression()
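
Before stacking, it can help to know how each model performs on its own, since that gives a baseline to compare the ensemble against. The sketch below scores a few classifiers with 5-fold cross-validation on synthetic data; the dataset, the reduced model list, and the names `candidates` and `cv_scores` are assumptions chosen to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical candidate models, each scored on its own
candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "knn": KNeighborsClassifier(),
    "logistic_regression": LogisticRegression(),
}

# Mean 5-fold cross-validated accuracy per model
cv_scores = {
    name: cross_val_score(model, X_demo, y_demo, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in cv_scores.items():
    print(f"{name}: {score:.3f}")
```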

### Step 5: Initialize and Train the Stacking Classifier
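Create the StackingClassifier from the base models and the meta-model, then train it on the training data. With cv=5, the meta-model is trained on out-of-fold predictions that the base models produce under 5-fold cross-validation; the base models themselves are then fitted on the full training set.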

In [None]:
# Initialize the Stacking Classifier
stacking_classifier = StackingClassifier(estimators=base_models, final_estimator=meta_model, cv=5)

# Train the Stacking Classifier
stacking_classifier.fit(X_train, y_train)
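
To make the role of `cv` more concrete, the following sketch builds a two-model stack by hand. It is a simplified illustration of the idea, assuming synthetic data and hypothetical names (`demo_bases`, `meta_train`, `demo_meta`), and is not a reproduction of scikit-learn's internal code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

demo_bases = [DecisionTreeClassifier(max_depth=3, random_state=0), KNeighborsClassifier()]

# Out-of-fold positive-class probabilities: every training row is predicted
# by a copy of the model that did not see that row during fitting
meta_train = np.column_stack([
    cross_val_predict(m, X_tr, y_tr, cv=5, method="predict_proba")[:, 1]
    for m in demo_bases
])

# The meta-model learns how to weigh the base models' predictions
demo_meta = LogisticRegression().fit(meta_train, y_tr)

# Refit each base model on all training rows, then build test-time meta-features
meta_test = np.column_stack([
    m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1] for m in demo_bases
])
manual_predictions = demo_meta.predict(meta_test)

print(meta_train.shape, manual_predictions.shape)
```

Training the meta-model on out-of-fold predictions matters because predictions made on rows a base model has already memorized would look better than they are, and the meta-model would learn to trust them too much.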

### Step 6: Make Predictions
Use the trained Stacking Classifier to make predictions on the test data.

In [11]:
# Make predictions on the test data
predictions = stacking_classifier.predict(X_test)
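
Besides hard class labels, a fitted `StackingClassifier` whose final estimator supports it can return class probabilities through `predict_proba`. The sketch below, on synthetic data with a smaller stack, shows how a custom decision threshold could be applied; the threshold of 0.3 and all names are hypothetical, chosen only to illustrate the mechanics.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the diabetes dataset)
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

demo_stack = StackingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
).fit(X_demo[:250], y_demo[:250])

# Probability of the positive class for the 50 held-out rows
proba = demo_stack.predict_proba(X_demo[250:])[:, 1]

# Hypothetical lower threshold: flag more rows as positive than predict() would
custom_threshold = 0.3
flagged = (proba >= custom_threshold).astype(int)
default_labels = demo_stack.predict(X_demo[250:])

print(flagged.sum(), default_labels.sum())
```

Lowering the threshold trades more false positives for fewer missed positives, which can be a reasonable choice in a screening setting.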

### Step 7: Evaluate the Model
Compute the accuracy of the predictions on the test set and print it.

In [None]:
# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, predictions)
print(f'Stacking Classifier Model Accuracy: {accuracy * 100:.2f}%')
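
Accuracy alone does not show which kind of error a classifier makes. As a small illustration, the sketch below computes a confusion matrix, precision, and recall on hand-written labels; the label lists are made up for the example and are not results from the model above.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical true and predicted labels (1 = diabetes, 0 = no diabetes)
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
acc = accuracy_score(y_true_demo, y_pred_demo)
precision = precision_score(y_true_demo, y_pred_demo)
recall = recall_score(y_true_demo, y_pred_demo)

print(cm.tolist())                  # [[5, 1], [2, 2]]
print(f"accuracy:  {acc:.2f}")      # accuracy:  0.70
print(f"precision: {precision:.2f}")  # precision: 0.67
print(f"recall:    {recall:.2f}")   # recall:    0.50
```

Here the accuracy is 70%, yet only half of the positive cases are found, which is the kind of gap a single accuracy figure can hide.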