# Predict campaign response

This example uses a dataset containing customer information and each customer's response to a marketing campaign.

The goal is to predict whether a customer will respond positively to a future campaign, based on these features.

## Setup

In [7]:
import pandas as pd
from joblib import dump, load

from sklearn.ensemble import GradientBoostingClassifier

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.inspection import permutation_importance

## Data

- `age`: Customer's age (integer)
- `city`: Customer's place of residence (string: 'Berlin', 'Stuttgart')
- `income`: Customer's annual income (integer)
- `membership_days`: Number of days the customer has been a member (integer)
- `campaign_engagement`: Number of times the customer engaged with previous campaigns (integer)
- `target`: Whether the customer responded positively to the campaign (0 or 1)

### Data import

In [8]:
# Load the dataset into a DataFrame
df = pd.read_csv('data.csv')


### Data structure

In [9]:
df

     age       city  income  membership_days  campaign_engagement  target
0     56     Berlin  136748              837                    3       1
1     46  Stuttgart   25287              615                    8       0
2     32     Berlin  146593             2100                    3       0
3     60     Berlin   54387             2544                    0       0
4     25     Berlin   28512              138                    6       0
..   ...        ...     ...              ...                  ...     ...
995   22     Berlin   49241             2123                    4       0
996   40  Stuttgart  116214              970                    5       1
997   27  Stuttgart   64569             2552                    6       0
998   61  Stuttgart   31745             2349                    8       1


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 6 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   age                  1000 non-null   int64 
 1   city                 1000 non-null   object
 2   income               1000 non-null   int64 
 3   membership_days      1000 non-null   int64 
 4   campaign_engagement  1000 non-null   int64 
 5   target               1000 non-null   int64 
dtypes: int64(5), object(1)
memory usage: 47.0+ KB


### Data corrections

In [11]:

# Encode categorical variables 
df = pd.get_dummies(df, columns=['city'])

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   age                  1000 non-null   int64
 1   income               1000 non-null   int64
 2   membership_days      1000 non-null   int64
 3   campaign_engagement  1000 non-null   int64
 4   target               1000 non-null   int64
 5   city_Berlin          1000 non-null   uint8
 6   city_Stuttgart       1000 non-null   uint8
dtypes: int64(5), uint8(2)
memory usage: 41.1 KB
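To make the encoding step concrete, here is a minimal sketch on a three-row toy frame (the frame and the `drop_first=True` variant are illustrative assumptions, not part of the original notebook). It shows that `pd.get_dummies` replaces `city` with one indicator column per category. Depending on the pandas version, the indicator dtype is `uint8` (as in the output above) or `bool`.

```python
import pandas as pd

# Toy frame (hypothetical) with the same categorical column as the dataset
toy = pd.DataFrame({"age": [56, 46, 32], "city": ["Berlin", "Stuttgart", "Berlin"]})

# One indicator column per category; the original 'city' column is dropped
encoded = pd.get_dummies(toy, columns=["city"])
print(encoded.columns.tolist())

# Optional variant (an assumption, not used in the notebook): with only two cities,
# city_Berlin and city_Stuttgart carry the same information, so one can be dropped
encoded_compact = pd.get_dummies(toy, columns=["city"], drop_first=True)
print(encoded_compact.columns.tolist())
```

Tree-based models such as gradient boosting are usually not harmed by the redundant indicator, so keeping both columns, as the notebook does, is a reasonable choice.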


### Data splitting

Separate the features from the target, then split the data into training (80%) and test (20%) sets:


In [13]:
# Split the df into features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
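The split above can be checked on stand-in data. The sketch below uses synthetic arrays (`X_demo`, `y_demo` are hypothetical) with the same number of rows and feature columns, and additionally passes `stratify`, which the notebook does not use; it is shown here only as an option for keeping the class balance similar in both sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for X and y: 1000 rows, 6 feature columns, binary target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 6))
y_demo = rng.integers(0, 2, size=1000)

# test_size=0.2 holds out 20% of the rows; random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)    # row counts of the two sets
print(y_tr.mean(), y_te.mean())  # share of positive labels in each set
```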

## Model

### Select model

Define hyperparameters:

In [None]:
params = {
    "n_estimators": 50,
    "max_depth": 3,
    "min_samples_split": 5,
}

In [15]:
# Set up scikit-learn's gradient boosting classifier with the parameters above
model = GradientBoostingClassifier(**params)


### Train model

Train the gradient boosting model:


In [16]:
# Train the model on the training data
model.fit(X_train, y_train)


### Model evaluation

Make predictions

In [17]:
# Predict on the testing data
y_pred = model.predict(X_test)

Evaluate the model

In [18]:
# Calculate accuracy
accuracy_score(y_test, y_pred)

0.925

In [19]:
# Print the confusion matrix (rows: true class, columns: predicted class)
print(confusion_matrix(y_test, y_pred))

[[95  6]
 [ 9 90]]
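The headline metrics follow directly from these four counts. As a worked check, the sketch below recomputes accuracy and the precision, recall, and F1 score for class 1 from the matrix printed above (the variable names are illustrative).

```python
import numpy as np

# Confusion matrix from the output above: rows are true classes, columns are predictions
cm = np.array([[95, 6],
               [9, 90]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # correct predictions over all predictions
precision_1 = tp / (tp + fp)      # of the predicted responders, how many responded
recall_1 = tp / (tp + fn)         # of the actual responders, how many were found
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

print(f"accuracy={accuracy:.3f} precision={precision_1:.3f} "
      f"recall={recall_1:.3f} f1={f1_1:.3f}")
```

The accuracy works out to 185 / 200 = 0.925, matching the value returned by `accuracy_score`.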


In [20]:
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.91      0.94      0.93       101
           1       0.94      0.91      0.92        99

    accuracy                           0.93       200
   macro avg       0.93      0.92      0.92       200
weighted avg       0.93      0.93      0.92       200
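The setup imports `permutation_importance`, but the notebook never calls it. A minimal sketch of how it could be used follows, on synthetic stand-in data where, by construction, only `campaign_engagement` determines the target. The data, the `demo_model` name, and `n_repeats=10` are assumptions for illustration; on the real data one would pass `model`, `X_test`, and `y_test` instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): only 'campaign_engagement' drives the target
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "age": rng.integers(18, 70, size=500),
    "income": rng.integers(20_000, 150_000, size=500),
    "campaign_engagement": rng.integers(0, 10, size=500),
})
y_demo = (X_demo["campaign_engagement"] > 4).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
demo_model = GradientBoostingClassifier(
    n_estimators=50, max_depth=3, min_samples_split=5, random_state=0
)
demo_model.fit(X_tr, y_tr)

# Shuffle each feature column in turn and measure the mean drop in test accuracy
result = permutation_importance(demo_model, X_te, y_te, n_repeats=10, random_state=0)
importances = pd.Series(result.importances_mean, index=X_te.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Computing the importances on the held-out set, rather than the training set, indicates which features the model relies on for generalization.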



### Save model

In [21]:
# Save the trained model to a file
# (the filename says "xgboost", but the model is scikit-learn's GradientBoostingClassifier)
model_filename = 'xgboost_model.joblib'
dump(model, model_filename)

['xgboost_model.joblib']
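The setup also imports `load`, which is the counterpart for restoring a saved model. The sketch below shows a save and load round trip on a small synthetic model written to a temporary directory (the data, `demo_model`, and the `demo_model.joblib` filename are hypothetical). For the notebook's own file, the equivalent call would be `load(model_filename)`.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.ensemble import GradientBoostingClassifier

# Small synthetic model (hypothetical) so the example is self-contained
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
demo_model = GradientBoostingClassifier(
    n_estimators=50, max_depth=3, min_samples_split=5, random_state=0
).fit(X_demo, y_demo)

# Save to a temporary file, then restore it as a new object
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.joblib")
    dump(demo_model, path)
    restored = load(path)

# The restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(demo_model.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Joblib files are pickle-based, so they should only be loaded from trusted sources, and ideally with the same scikit-learn version that was used to save them.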