Due to the cost of delivering email, SMARTMARKET decides to send email to only 25% of its
subscriber base for week 27.

Given the data provided:

(1) Which subscribers would you send email to?

(2) Which campaign(s) would you deliver to them?

(3) What do you expect the response rate to be?

In [106]:
import pandas as pd
import numpy as np

In [107]:
file_path = 'hist.xlsx'

In [108]:
df = pd.read_excel(file_path)

In [109]:
print(df.head(5))

   week_id  customer_id attribute1  state_id Sex  campaign_id  response
0        1            1          B         2   M            1         1
1        1            2          A        38   F            2         0
2        1            3          C        46   M            3         0
3        1            4          B        35   M            4         0
4        1            5          B        22   M            5         1


In [110]:
rows, columns = df.shape
print(f"Number of rows: {rows}")
print(f"Number of columns: {columns}")

Number of rows: 260000
Number of columns: 7


In [111]:
filtered_data = df[['state_id', 'Sex', 'attribute1']]
filtered_data.head()

   state_id Sex attribute1
0         2   M          B
1        38   F          A
2        46   M          C
3        35   M          B
4        22   M          B


### Calculate the response rate grouped by 'state_id', 'Sex', and 'attribute1'

In [112]:
grouped_response_rates = df.groupby(['state_id', 'Sex', 'attribute1'])['response'].mean().reset_index()
grouped_response_rates.columns = ['state_id', 'Sex', 'attribute1', 'response_rate']

### Merge the response rates back into the original dataset

In [113]:
data_with_rates = pd.merge(df, grouped_response_rates, on=['state_id', 'Sex', 'attribute1'])
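
As an aside, the same per-row group rate can be attached without a separate merge by using `groupby(...).transform("mean")`. The sketch below runs on a small made-up frame (`toy` and its values are illustrative assumptions, not the notebook's data) and is only meant to show the idea.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; values are invented for illustration.
toy = pd.DataFrame({
    "state_id":   [1, 1, 2],
    "Sex":        ["F", "F", "M"],
    "attribute1": ["C", "C", "A"],
    "response":   [1, 0, 1],
})
keys = ["state_id", "Sex", "attribute1"]

# transform("mean") returns one value per original row, aligned on the index,
# so the group-level rate lands next to each record without a merge.
toy["response_rate"] = toy.groupby(keys)["response"].transform("mean")
print(toy)
```

Because `transform` keeps the original row order and index, the row count cannot change, which is one less thing to verify compared with a merge.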

### Preview the data with response rates

In [114]:
data_with_rates.head()

   week_id  customer_id attribute1  state_id Sex  campaign_id  response  response_rate
0        1            1          B         2   M            1         1       0.137821
1        1            2          A        38   F            2         0       0.432692
2        1            3          C        46   M            3         0       0.587278
3        1            4          B        35   M            4         0       0.124126
4        1            5          B        22   M            5         1       0.123932


### Determine the threshold for top 25% response rates for groups

In [115]:
top_25_percent_threshold = grouped_response_rates['response_rate'].quantile(0.75)

### Select groups with response rates in the top 25%

In [116]:
top_groups = grouped_response_rates[grouped_response_rates['response_rate'] >= top_25_percent_threshold]
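
Note that the question asks for 25% of the subscriber base, whereas the cell above keeps the top quartile of groups. The two match only if the groups are of similar size. The sketch below shows one way to check how many distinct subscribers a selection actually covers. It uses a small made-up frame (`toy`, `coverage`, and the values are illustrative assumptions), not the notebook's data.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; values are invented for illustration.
toy = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "state_id":    [1, 1, 1, 1, 1, 1, 2, 2],
    "Sex":         ["F", "F", "F", "F", "F", "F", "M", "M"],
    "attribute1":  ["C", "C", "C", "C", "C", "C", "A", "A"],
    "response":    [1, 1, 1, 0, 1, 1, 0, 0],
})
keys = ["state_id", "Sex", "attribute1"]

# Same selection logic as the notebook: group rates, then the 75th percentile.
rates = toy.groupby(keys)["response"].mean().reset_index(name="response_rate")
threshold = rates["response_rate"].quantile(0.75)
top = rates[rates["response_rate"] >= threshold]

# Share of distinct subscribers who belong to a selected group.
selected = toy.merge(top[keys], on=keys, how="inner")
coverage = selected["customer_id"].nunique() / toy["customer_id"].nunique()
print(f"Groups selected: {len(top)} of {len(rates)}")
print(f"Subscribers covered: {coverage:.0%}")
```

In this toy case one of two groups is selected, yet it holds six of the eight subscribers. If the real coverage is far from 25%, the threshold could be adjusted, for example by ranking groups by rate and accumulating subscriber counts until the budget is reached.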

### Preview the top 25% groups

In [117]:
top_groups.head()

    state_id Sex attribute1  response_rate
2          1   F          C       0.716346
4          1   M          A       0.708502
10         2   F          C       0.730769
12         2   M          A       0.741758
18         3   F          C       0.681090


### Find the last campaign delivered within each group (state_id, Sex, attribute1)
### Assuming campaigns are delivered in rotation, take the campaigns sent in week 26 as the most recent ones

In [118]:
last_campaign_per_group = data_with_rates[data_with_rates['week_id'] == 26][['state_id', 'Sex', 'attribute1', 'campaign_id']]

### Preview the last campaign data


In [119]:
last_campaign_per_group.head()

        state_id Sex attribute1  campaign_id
250000         2   M          B            6
250001        38   F          A            7
250002        46   M          C            8
250003        35   M          B            9
250004        22   M          B           10
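
The week-26 slice has one row per record rather than one per group, so a group can appear with several different `campaign_id` values. One possible way to keep the rotation unambiguous is to carry `customer_id` along and assign the next campaign per subscriber. The sketch below assumes that rotation is tracked per customer and that there are 10 campaigns (as the wrap-around in `next_campaign` suggests). The frame `toy` and its values are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for the notebook's `df` and `top_groups`; values are invented.
toy = pd.DataFrame({
    "week_id":     [25, 25, 26, 26, 26],
    "customer_id": [1, 2, 1, 2, 3],
    "state_id":    [1, 1, 1, 1, 2],
    "Sex":         ["F", "F", "F", "F", "M"],
    "attribute1":  ["C", "C", "C", "C", "A"],
    "campaign_id": [9, 3, 10, 4, 7],
})
top = pd.DataFrame({"state_id": [1], "Sex": ["F"], "attribute1": ["C"]})
keys = ["state_id", "Sex", "attribute1"]

def next_campaign(campaign_id, n_campaigns=10):
    # Assumed rotation over campaigns 1..n_campaigns, wrapping back to 1.
    return campaign_id % n_campaigns + 1

# Keep customer_id so each subscriber retains their own last campaign.
week26 = toy[toy["week_id"] == 26][["customer_id", *keys, "campaign_id"]]
targets = week26.merge(top, on=keys, how="inner")
targets["next_campaign"] = targets["campaign_id"].apply(next_campaign)
print(targets[["customer_id", "campaign_id", "next_campaign"]])
```

The result has one row per targeted subscriber, which also answers "which subscribers" directly.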


### For week 27, based on the rotation logic, assign the next campaign in sequence

In [120]:
def next_campaign(campaign_id):
    return 1 if campaign_id == 10 else campaign_id + 1

### Apply the function to determine the next campaign for each top group

In [121]:
top_groups_with_campaign = pd.merge(top_groups, last_campaign_per_group, on=['state_id', 'Sex', 'attribute1'])
top_groups_with_campaign['next_campaign'] = top_groups_with_campaign['campaign_id'].apply(next_campaign)

### Preview the next campaigns to be sent to each group

In [122]:
top_groups_with_campaign.head()

   state_id Sex attribute1  response_rate  campaign_id  next_campaign
0         1   F          C       0.716346            2              3
1         1   F          C       0.716346            4              5
2         1   F          C       0.716346           10              1
3         1   F          C       0.716346            8              9
4         1   F          C       0.716346            8              9


### Based on historical response rates, estimate the response rate for week 27

In [123]:
predicted_response_rate = top_groups_with_campaign['response_rate'].mean()

### Display the predicted response rate

In [124]:
print(f"Predicted response rate for week 27: {predicted_response_rate:.2%}")

Predicted response rate for week 27: 71.06%
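
The figure above is a plain mean over the rows of the merged table. An alternative estimate, if the goal is the expected rate across the subscribers actually emailed, is to weight each group's historical rate by the number of subscribers in it. The sketch below uses invented numbers and a hypothetical `n_subscribers` column to show the difference.

```python
import pandas as pd

# Invented group-level summary; `n_subscribers` is a hypothetical column
# holding the number of distinct subscribers targeted in each group.
groups = pd.DataFrame({
    "response_rate": [0.70, 0.60],
    "n_subscribers": [300, 100],
})

# Plain mean treats every group equally.
unweighted = groups["response_rate"].mean()

# Weighted mean reflects how many emails go to each group.
weighted = (
    (groups["response_rate"] * groups["n_subscribers"]).sum()
    / groups["n_subscribers"].sum()
)
print(f"Unweighted: {unweighted:.2%}, weighted: {weighted:.2%}")
```

Either way, the estimate assumes that week 27 behaves like the historical weeks, which the data here cannot confirm.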


### Compute the response rate for each campaign_id within each segment, then pick the campaign_id with the highest rate as the one to send next

In [142]:
segment_response_rates = top_groups_with_campaign.groupby(['state_id', 'Sex', 'attribute1', 'campaign_id'])['response_rate'].mean().reset_index()

Here we group by state_id, Sex, attribute1, and campaign_id to calculate response rates.

In [143]:
best_campaigns_by_segment = segment_response_rates.loc[segment_response_rates.groupby(['state_id','Sex', 'attribute1'])['response_rate'].idxmax()]

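Here we select the best campaign for each segment based on the highest response rate. Note that `response_rate` in this table is the segment-level rate merged in earlier, so it is identical for every campaign within a segment. `idxmax` then returns the first row of each segment, which is why campaign_id 1 appears throughout the output.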

In [144]:
print(best_campaigns_by_segment[['state_id', 'Sex', 'attribute1','campaign_id', 'response_rate']])

     state_id Sex attribute1  campaign_id  response_rate
0           1   F          C            1       0.716346
10          1   M          A            1       0.708502
19          2   F          C            1       0.730769
29          2   M          A            1       0.741758
36          3   F          C            1       0.681090
..        ...  ..        ...          ...            ...
838        48   M          A            1       0.662088
847        49   F          C            1       0.725962
856        49   M          A            1       0.717949
865        50   F          C            1       0.724615
873        50   M          A            1       0.719780

[100 rows x 5 columns]
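
To compare campaigns within a segment, the rate would need to come from the raw `response` column rather than from the segment-level `response_rate`. The sketch below shows that variant on a small made-up frame (`toy`, `campaign_rates`, `n_sends`, and the values are illustrative assumptions).

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; values are invented for illustration.
toy = pd.DataFrame({
    "state_id":    [1, 1, 1, 1, 1, 1],
    "Sex":         ["F"] * 6,
    "attribute1":  ["C"] * 6,
    "campaign_id": [1, 1, 2, 2, 3, 3],
    "response":    [0, 1, 1, 1, 0, 0],
})
keys = ["state_id", "Sex", "attribute1"]

# Response rate per (segment, campaign), computed from raw responses,
# plus the number of sends so that thin cells can be spotted.
campaign_rates = (
    toy.groupby(keys + ["campaign_id"])["response"]
       .agg(response_rate="mean", n_sends="count")
       .reset_index()
)

# For each segment, keep the campaign with the highest observed rate.
best = campaign_rates.loc[
    campaign_rates.groupby(keys)["response_rate"].idxmax()
]
print(best)
```

With many segments and ten campaigns, individual cells can hold few sends, so `n_sends` is worth checking before trusting a per-campaign rate.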


We can display the recommended campaign for each (state_id, Sex, attribute1) segment.
