## Scenario Analysis: Influencer Marketing Impact on Retailer Profit

**Context**: A retailer engages influencers to promote a product, deriving benefit only from influencer tweets. The product has a certain profit margin per unit, and customers can purchase only one unit each.

#### Assumptions:
- **Non-influencer tweets** yield no benefit.
- A single influencer tweet results in a **0.02%** purchase probability among followers.
- Two influencer tweets increase the purchase probability to **0.03%**.

#### Compensation Strategy:
- **Without Analytics**: \$5 to each individual (A and B) for a single tweet.
- **With Analytics**: \$10 to each identified influencer for two tweets; non-influencers receive nothing.

#### Questions:
- What is the boost in expected net profit from using your analytic model (versus not using analytics)?
- What is the boost in net profit from using a perfect analytic model (versus not using analytics)?
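
The assumptions and compensation rules above can be condensed into a per-pair calculation. The following is a minimal sketch, not the notebook's own code: the function names and the 100,000-follower example are hypothetical, and the profit of 10 per unit is the value assigned to `profit` later in this notebook.

```python
# Illustrative sketch: expected net profit for ONE pair (A, B) under each strategy.
PROFIT_PER_UNIT = 10    # assumed; matches `profit` defined later in the notebook
P_ONE_TWEET = 0.0002    # 0.02% purchase probability after one influencer tweet
P_TWO_TWEETS = 0.0003   # 0.03% purchase probability after two influencer tweets


def profit_without_analytics(influencer_followers):
    """Both A and B are paid 5; only the influencer's single tweet sells."""
    revenue = PROFIT_PER_UNIT * P_ONE_TWEET * influencer_followers
    return revenue - 2 * 5


def profit_with_analytics(influencer_followers, prediction_correct):
    """Only the predicted influencer is paid 10 for two tweets.

    Revenue is earned only when the prediction picked the true influencer.
    """
    revenue = PROFIT_PER_UNIT * P_TWO_TWEETS * influencer_followers if prediction_correct else 0.0
    return revenue - 10


example_followers = 100_000  # hypothetical follower count
print(profit_without_analytics(example_followers))
print(profit_with_analytics(example_followers, prediction_correct=True))
print(profit_with_analytics(example_followers, prediction_correct=False))
```

In this sketch a wrong prediction costs the full payment with no revenue, which is why the model's accuracy matters for the comparison.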

In [1]:
# Load libraries
import pandas as pd
from pycaret.classification import load_model
import numpy as np

In [2]:
# Load test data
df = pd.read_csv('test.csv')

In [3]:
# Define variables
profit = 10
cost_without_analytics = 5
cost_with_analytics = 10
prob_one_tweet = 0.0002
prob_two_tweets = 0.0003

### Scenario 1: Without Analytics
To estimate the model's financial value fairly, we work with unseen data, which here is the `test.csv` file. In the code below, `Choice` = 1 means A is the influencer of the pair and 0 means B.

In [4]:
# Convert the target variable to binary (values above 0.5 become 1, otherwise 0)
df['Choice'] = df['Choice'].apply(lambda x: 1 if x > 0.5 else 0)

In [5]:
# PROFIT WITHOUT ANALYTICS
# Each person receives $5 to tweet once

revenue_without_analytics = sum(df.apply(lambda row: profit * prob_one_tweet * (row['A_follower_count'] if row['Choice'] == 1 else row['B_follower_count']), axis=1))

total_cost_without_analytics = cost_without_analytics * 2 * len(df)

total_profit_without_analytics = revenue_without_analytics - total_cost_without_analytics
print(f'Total profit without analytics: ${round(total_profit_without_analytics,0)}')

Total profit without analytics: $14442737.0
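
The row-wise `apply` above can also be written as a vectorized expression. The sketch below is an illustration on a small made-up DataFrame (the `toy` data and its values are hypothetical, not taken from `test.csv`); it checks that `np.where` gives the same revenue as the row-wise version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for test.csv
toy = pd.DataFrame({
    'Choice': [1, 0, 1],
    'A_follower_count': [1000, 500, 20000],
    'B_follower_count': [300, 8000, 100],
})
profit, prob_one_tweet, cost_without_analytics = 10, 0.0002, 5

# Follower count of the true influencer in each pair
influencer_followers = np.where(toy['Choice'] == 1,
                                toy['A_follower_count'],
                                toy['B_follower_count'])

# Vectorized revenue
revenue_vec = (profit * prob_one_tweet * influencer_followers).sum()

# Row-wise revenue, written the same way as in the notebook cell
revenue_apply = sum(toy.apply(
    lambda row: profit * prob_one_tweet *
    (row['A_follower_count'] if row['Choice'] == 1 else row['B_follower_count']),
    axis=1))

# Both individuals in every pair are paid, hence the factor 2
toy_profit = revenue_vec - cost_without_analytics * 2 * len(toy)
print(np.isclose(revenue_vec, revenue_apply), toy_profit)
```

On large DataFrames the vectorized form is usually faster than `apply(..., axis=1)`, though both should produce the same total.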


### Scenario 2: With Analytics
For this scenario, we'll apply the Gradient Boosting Classifier that was previously selected as the best classifier for this exercise. In order to run the model, we first pre-process the data.

In [6]:
# Same pre-processing steps as in 'part1_eda_features_eng'
# Drop the target for modeling
new_df = df.drop('Choice', axis=1)

# Add the differential features
differential_features = ['follower_count', 'following_count', 'listed_count',
                         'mentions_received', 'retweets_received', 'mentions_sent',
                         'retweets_sent', 'posts', 'network_feature_1',
                         'network_feature_2', 'network_feature_3']

for feature in differential_features:
    new_df[f'diff_{feature}'] = new_df[f'A_{feature}'] - new_df[f'B_{feature}']
    
# Add the ratio features
ratio_features = ['follower_count', 'following_count', 'mentions_received', 'posts']
for feature in ratio_features:
    # add a very small number to the denominator to avoid division by zero
    new_df[f'ratio_{feature}_A/B'] = new_df[f'A_{feature}'] / (new_df[f'B_{feature}'] + 1e-6)
    
# Add interaction features
new_df['social_reach_engagement_A'] = (new_df['A_follower_count'] + new_df['A_listed_count']) / (new_df['A_mentions_received'] + (new_df['A_retweets_received'])+ 1e-6) # this is to avoid division by 0
new_df['social_reach_engagement_B'] = (new_df['B_follower_count'] + new_df['B_listed_count']) / (new_df['B_mentions_received'] + (new_df['B_retweets_received'])+ 1e-6)

# Add the difference of social reach engagement
new_df['diff_social_reach_engagement'] = new_df['social_reach_engagement_A'] - new_df['social_reach_engagement_B']

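# Drop the intermediate per-user engagement columns; only their difference is kept
new_df.drop(columns=['social_reach_engagement_A', 'social_reach_engagement_B'], inplace=True)

# Keep only the columns from position 23 onward, i.e. drop the leading original columns
# (positional selection, so it depends on the column order of test.csv)
new_df = new_df.iloc[:,23:]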
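
Because `iloc[:, 23:]` depends on how many columns precede the engineered ones, selecting by name can be less fragile. The sketch below is one possible alternative on a made-up DataFrame, under the assumption (not confirmed by the notebook) that the model expects exactly the columns prefixed with `diff_` or `ratio_`.

```python
import pandas as pd

# Hypothetical miniature of new_df: two original columns plus engineered ones
toy = pd.DataFrame({
    'A_follower_count': [10, 20],
    'B_follower_count': [5, 40],
    'diff_follower_count': [5, -20],
    'ratio_follower_count_A/B': [2.0, 0.5],
    'diff_social_reach_engagement': [0.1, -0.3],
})

# Assumption: engineered features are exactly those with these prefixes
engineered_cols = [c for c in toy.columns if c.startswith(('diff_', 'ratio_'))]
features = toy[engineered_cols]
print(list(features.columns))
```

If the saved model was trained on a different column set, the prefix filter would need to be adjusted accordingly.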

In [7]:
# Load the GBC model
model = load_model('final_gbc')

Transformation Pipeline and Model Successfully Loaded


In [8]:
# Obtain the predictions
new_df['predicted_Choice'] = model.predict(new_df)

In [9]:
# Join the 'Choice' and follower-count columns back for the profit calculation
new_df = new_df.join(df[['Choice', 'A_follower_count', 'B_follower_count']])

In [10]:
# PROFIT WITH ANALYTICS
# Only the predicted influencer receives $10 to tweet twice;
# revenue is earned only when the prediction matches the true 'Choice'

revenue_with_analytics = sum(new_df.apply(lambda row: 
                                          (profit * prob_two_tweets * row['A_follower_count'] if row['predicted_Choice'] == 1 and row['Choice'] == row['predicted_Choice'] else 0) +
                                          (profit * prob_two_tweets * row['B_follower_count'] if row['predicted_Choice'] == 0 and row['Choice'] == row['predicted_Choice'] else 0), 
                                          axis=1))

total_cost_with_analytics = cost_with_analytics * len(df)

total_profit_with_analytics = revenue_with_analytics - total_cost_with_analytics
print(f'Total profit with analytics: ${round(total_profit_with_analytics,0)}')

Total profit with analytics: $19394146.0
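
The revenue rule used in this scenario (pay the predicted influencer, earn revenue only when the prediction is correct) can be sketched in vectorized form. The `toy` DataFrame and its values below are hypothetical and serve only to make the logic easy to verify by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two correct predictions, two wrong ones
toy = pd.DataFrame({
    'Choice': [1, 0, 1, 0],
    'predicted_Choice': [1, 0, 0, 1],
    'A_follower_count': [1000, 500, 20000, 700],
    'B_follower_count': [300, 8000, 100, 900],
})
profit, prob_two_tweets, cost_with_analytics = 10, 0.0003, 10

# True where the model picked the real influencer
correct = (toy['Choice'] == toy['predicted_Choice']).to_numpy()

# Followers of whoever the model chose to pay
paid_followers = np.where(toy['predicted_Choice'] == 1,
                          toy['A_follower_count'],
                          toy['B_follower_count'])

# Wrong predictions contribute zero revenue but still incur the cost
revenue = (profit * prob_two_tweets * paid_followers * correct).sum()
total = revenue - cost_with_analytics * len(toy)
accuracy = correct.mean()
print(revenue, total, accuracy)
```

In this toy case half of the predictions are wrong, so the payments outweigh the revenue; with the real model the reported total is positive.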


### Scenario 3: Perfect Analytics
In this scenario, we assume that every prediction is correct, so the true influencer in each pair is always the one paid to tweet twice.

In [11]:
# PROFIT PERFECT ANALYTICS
# Only the influencer receives $10 to tweet twice

revenue_perfect_analytics = sum(new_df.apply(lambda row: profit * prob_two_tweets * (row['A_follower_count'] if row['Choice'] == 1 else row['B_follower_count']), axis=1))

total_cost_perfect_analytics = cost_with_analytics * len(df)

total_profit_perfect_analytics = revenue_perfect_analytics - total_cost_perfect_analytics
print(f'Total profit with PERFECT analytics: ${round(total_profit_perfect_analytics,0)}')

Total profit with PERFECT analytics: $21693865.0


### Summarize the Scenarios

In [12]:
# Create a DataFrame to display the summary
profits_df = pd.DataFrame({
    'Scenario': ['Without Analytics', 'With Analytics', 'Perfect Analytics'],
    'Total Profit': [total_profit_without_analytics, total_profit_with_analytics, total_profit_perfect_analytics]})

profits_df['Diff in Value'] = profits_df['Total Profit'] - profits_df.loc[0, 'Total Profit']
profits_df['Diff in %'] = (profits_df['Diff in Value'] / profits_df.loc[0, 'Total Profit']) * 100

profits_df = profits_df.round(0)
profits_df

| | Scenario | Total Profit | Diff in Value | Diff in % |
|---|---|---|---|---|
| 0 | Without Analytics | 14442737.0 | 0.0 | 0.0 |
| 1 | With Analytics | 19394146.0 | 4951409.0 | 34.0 |
| 2 | Perfect Analytics | 21693865.0 | 7251128.0 | 50.0 |
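
The two questions posed at the start can be answered directly from the totals reported above. The snippet below simply redoes that arithmetic on the printed (rounded) figures; the variable names are illustrative.

```python
# Totals as printed by the notebook (rounded to whole dollars)
without_analytics = 14_442_737
with_model = 19_394_146
perfect_model = 21_693_865

# Boost from the trained model versus no analytics
boost_model = with_model - without_analytics
# Boost from a perfect model versus no analytics
boost_perfect = perfect_model - without_analytics

# Share of the perfect-model boost that the trained model achieves
share_captured = boost_model / boost_perfect

print(f'Model boost:   ${boost_model:,} ({100 * boost_model / without_analytics:.0f}%)')
print(f'Perfect boost: ${boost_perfect:,} ({100 * boost_perfect / without_analytics:.0f}%)')
print(f'Share of perfect boost captured: {share_captured:.2f}')
```

The differences match the `Diff in Value` column, and on these figures the trained model recovers roughly two thirds of the gain a perfect model would deliver.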
