# Brand Lift

Google Brand Lift estimates the lift from TrueView advertising campaigns by running surveys on YouTube within a randomized experiment. Users are assigned to the experiment (i.e. treatment) group or the control group using random number generation based on a hashed version of their cookies.
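
As a rough illustration of hash-based assignment, the sketch below maps a cookie identifier to a stable bucket in [0, 1) and compares it against a treatment share. This is not Google's actual mechanism: the function `assign_group`, the `salt`, the cookie format, and the `treatment_share` value are all hypothetical and chosen only for the example.

```python
import hashlib

def assign_group(cookie_id, treatment_share=0.7, salt="brand-lift-demo"):
    """Deterministically assign a cookie to a group (hypothetical helper)."""
    # Hash the salted cookie so the same cookie always lands in the same bucket
    digest = hashlib.sha256(f"{salt}:{cookie_id}".encode("utf-8")).hexdigest()
    # Use the first 8 hex digits as a pseudo-random number in [0, 1)
    bucket = int(digest[:8], 16) / 16**8
    return "EXPERIMENT" if bucket < treatment_share else "CONTROL"

# Made-up cookie ids, for illustration only
groups = [assign_group(f"cookie-{i}") for i in range(10_000)]
share = groups.count("EXPERIMENT") / len(groups)
print(round(share, 3))
```

Because the assignment depends only on the hashed cookie, a returning user stays in the same group for the whole experiment.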

Questions for brand lift are designed to address different stages of the advertising funnel: ad recall, brand awareness, consideration, etc. This dataset includes data from one campaign and a single question about recall.

# Data
For exposure data, the columns are:
* timestamp: time of the exposure
* user_id: a unique identifier for the user
* group: whether the user was assigned to the control group, or the experiment group.
* gender: Female (F), Male (M), or Other/Unknown (O)
* age: five different age buckets

For survey solicitations, the columns are:

* user_id: a unique identifier for the user.
* timestamp: time of the survey solicitation

For responses, the columns are:

* user_id: a unique identifier for the user.
* timestamp: the timestamp of the response
* positive_response: 0 or 1, where 1 indicates the user responded positively (i.e. gave the "correct answer" to the question).

The analysis below examines the difference in the proportion of positive responses between the experiment group and the control group, reporting a point estimate of the difference in response rate and a confidence interval on that difference.
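
The interval used throughout is a normal-approximation (Wald) interval for a difference of two proportions. A minimal sketch follows, assuming a hypothetical helper `diff_ci` that takes success counts and denominators directly. It assumes the denominator for each rate is the population that rate is computed over: exposed users for the solicitation rate, solicited users for the response rate, and respondents for the positive response rate.

```python
import math

def diff_ci(successes_exp, n_exp, successes_ctl, n_ctl, critical_value=1.96):
    """Wald CI for p_exp - p_ctl (hypothetical helper, not part of the notebook)."""
    p1 = successes_exp / n_exp
    p2 = successes_ctl / n_ctl
    # Standard error of the difference of two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n_exp + p2 * (1 - p2) / n_ctl)
    diff = p1 - p2
    return diff, (diff - critical_value * se, diff + critical_value * se)

# Toy counts, for illustration only (not the campaign data)
diff, (low, high) = diff_ci(150, 1000, 120, 1000)
print(round(diff, 4), round(low, 4), round(high, 4))
```

If the resulting interval excludes zero, the difference is treated as statistically significant at roughly the 95% level.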

In [None]:
import numpy as np
import pandas as pd

exposures_url = 'x'
exposure_data = pd.read_csv(exposures_url)

solicitations_url = 'y'
solicitation_data = pd.read_csv(solicitations_url)

responses_url = 'z'
response_data = pd.read_csv(responses_url)

In [None]:
# add a frequency column: the number of exposures per user
exposure_data['frequency'] = exposure_data.groupby('user_id')['user_id'].transform('count')
# keep only users with at most five exposures
exposure_data = exposure_data[exposure_data['frequency'] <= 5]

#keep first exposure timestamp
first_exposure_data = exposure_data.sort_values(by=['user_id', 'timestamp']).drop_duplicates(subset='user_id', keep='first')

#rename timestamps
first_exposure_data = first_exposure_data.rename(columns={'timestamp': 'exposure_timestamp'})
solicitation_data = solicitation_data.rename(columns={'timestamp': 'solicitation_timestamp'})
response_data = response_data.rename(columns={'timestamp': 'response_timestamp'})

#merge data
df = pd.merge(first_exposure_data, solicitation_data, on='user_id', how='left')
df = pd.merge(df, response_data, on='user_id', how='left')

df.head()

Unnamed: 0,exposure_timestamp,user_id,group,gender,age,frequency,solicitation_timestamp,response_timestamp,positive_response
0,1707673303,1000040,CONTROL,M,18-29,2,,,
1,1706969254,1000094,EXPERIMENT,M,65+,2,,,
2,1707010678,1000581,EXPERIMENT,M,50-64,2,,,
3,1707294314,1000602,CONTROL,M,65+,2,1707900000.0,,
4,1706911834,1000802,EXPERIMENT,M,30-39,3,,,


In [None]:
#solicitation rate (percent of exposed users that were solicited)

def two_sample_solicitation_rate_conf_interval(df):
    critical_value = 1.96
    exp_data = df[df['group'] == "EXPERIMENT"]

    exp_solicited = exp_data['solicitation_timestamp'].notnull().sum()
    exp_exposed = len(exp_data)
    p1 = exp_solicited / exp_exposed


    se_1 = 0
    if exp_data.shape[0] > 0:
        se_1 = np.sqrt(p1 * (1 - p1) / exp_data.shape[0])

    control_data = df[df['group'] == "CONTROL"]

    control_solicited = control_data['solicitation_timestamp'].notnull().sum()
    control_exposed = len(control_data)
    p2 = control_solicited / control_exposed

    se_2 = 0
    if control_data.shape[0] > 0:
        se_2 = np.sqrt(p2 * (1 - p2) / control_data.shape[0])

    se_diff = np.sqrt(se_1**2 + se_2**2)
    diff = p1 - p2

    return pd.Series([p1, p2, diff.round(4), "(%.4f - %.4f)" % (diff - critical_value * se_diff, diff + critical_value * se_diff)],
                     index=['solicitation rate - Experiment', 'solicitation rate - Control', 'Difference', 'Confidence Interval'])

result = two_sample_solicitation_rate_conf_interval(df)
print(result)

solicitation rate - Experiment               0.074703
solicitation rate - Control                  0.174402
Difference                                    -0.0997
Confidence Interval               (-0.1012 - -0.0982)
dtype: object


In [None]:
#response rate (percent of solicited users that responded)

def two_sample_response_rate_conf_interval(df):
    critical_value = 1.96
    exp_data = df[df['group'] == "EXPERIMENT"]

    exp_responded = exp_data['positive_response'].notnull().sum()
    exp_solicited = exp_data['solicitation_timestamp'].notnull().sum()
    p1 = exp_responded / exp_solicited

    se_1 = 0
    if exp_data.shape[0] > 0:
        se_1 = np.sqrt(p1 * (1 - p1) / exp_data.shape[0])

    control_data = df[df['group'] == "CONTROL"]

    control_responded = control_data['positive_response'].notnull().sum()
    control_solicited = control_data['solicitation_timestamp'].notnull().sum()
    p2 = control_responded / control_solicited

    se_2 = 0
    if control_data.shape[0] > 0:
        se_2 = np.sqrt(p2 * (1 - p2) / control_data.shape[0])

    se_diff = np.sqrt(se_1**2 + se_2**2)
    diff = p1 - p2

    return pd.Series([p1, p2, diff.round(4), "(%.4f - %.4f)" % (diff - critical_value * se_diff, diff + critical_value * se_diff)],
                     index=['response rate - Experiment', 'response rate - Control', 'Difference', 'Confidence Interval'])

result = two_sample_response_rate_conf_interval(df)
print(result)

response rate - Experiment              0.70061
response rate - Control                 0.69778
Difference                               0.0028
Confidence Interval           (0.0009 - 0.0048)
dtype: object


In [None]:
#positive response rate

def two_sample_positive_response_conf_interval(df):
  critical_value = 1.96
  exp_data = df[df['group'] == "EXPERIMENT"]
  p1 = exp_data['positive_response'].mean()
  se_1 = 0
  if exp_data.shape[0] > 0:
    se_1 = np.sqrt(p1 * (1 - p1) / exp_data.shape[0])

  control_data = df[df['group'] == "CONTROL"]
  p2 = control_data['positive_response'].mean()
  se_2 = 0
  if control_data.shape[0] > 0:
    se_2 = np.sqrt(p2 * (1 - p2) / control_data.shape[0])

  se = np.sqrt(se_1*se_1 + se_2*se_2)

  p_diff = p1 - p2
  return pd.Series([p1, p2, p_diff.round(4), "(%.4f - %.4f)" % (p_diff - critical_value * se, p_diff + critical_value * se)],
                   index=['Positive Response Rate - Experiment', 'Positive Response Rate - Control','diff', 'conf interval'])

result = two_sample_positive_response_conf_interval(df)
print(result)

Positive Response Rate - Experiment              0.13863
Positive Response Rate - Control                0.117966
diff                                              0.0207
conf interval                          (0.0193 - 0.0221)
dtype: object


In [None]:
#lift

relative_lift = (result['diff']) / (result['Positive Response Rate - Control']) *100
print(relative_lift)

absolute_lift = result['diff'] * 100
print(absolute_lift)

reach = df[df['group'] == 'EXPERIMENT']['user_id'].nunique()
lifted_users = result['diff'] * reach
lifted_users

17.54743036211699
2.07


14487.4125

In [None]:
#reach
experiment_group_reach = df[df['group'] == 'EXPERIMENT']['user_id'].nunique()
control_group_reach = df[df['group'] == 'CONTROL']['user_id'].nunique()

print(experiment_group_reach)
print(control_group_reach)



699875
300089


The experiment results indicate a positive impact of the campaign on brand recall. The positive response rate was 13.86% for the experiment group and 11.80% for the control group, a statistically significant difference of 2.07 percentage points (CI: 0.0193 - 0.0221). This corresponds to a relative lift of 17.55% and an absolute lift of 2.07 percentage points, or roughly 14,487 users positively influenced.
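
The lift figures above follow from simple arithmetic on the rounded difference, the control rate, and the experiment group's reach. A small sketch, using the values printed in the notebook output; the function name `lift_summary` is hypothetical:

```python
def lift_summary(diff, control_rate, reach):
    """Relative lift (%), absolute lift (pp), and lifted users (hypothetical helper)."""
    relative_lift = diff / control_rate * 100   # lift relative to the control baseline
    absolute_lift = diff * 100                  # difference in percentage points
    lifted_users = diff * reach                 # difference scaled to the exposed audience
    return relative_lift, absolute_lift, lifted_users

# Values taken from the outputs above: rounded diff, control rate, experiment reach
relative_lift, absolute_lift, lifted_users = lift_summary(0.0207, 0.117966, 699875)
print(relative_lift, absolute_lift, lifted_users)
```

Relative lift divides by the control rate, so the same absolute difference yields a larger relative lift when the baseline is low.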

In contrast, the solicitation rate was significantly lower in the experiment group (7.47%) than in the control group (17.44%), a difference of -9.97 percentage points (CI: -0.1012 - -0.0982), indicating that the control group was solicited more than twice as often.

The response rate was high for both groups, at 70.06% for the experiment group and 69.78% for the control group, a slight but statistically significant difference of 0.28 percentage points (CI: 0.0009 - 0.0048).

During data cleaning, users with more than five exposures were removed due to inaccuracies, and only each user's first exposure was retained to measure its impact. The data was merged using left joins, preserving null values in the solicitation and response fields so that non-responses remain distinct from negative responses. The timestamp columns were also renamed to differentiate between exposure, solicitation, and response events.
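
The effect of the left join can be seen on a toy example. The sketch below uses made-up user ids and responses (not the campaign data) to show how a user with no response row ends up with a null `positive_response`, while a negative response stays 0:

```python
import pandas as pd

# Toy data, for illustration only
exposures = pd.DataFrame({'user_id': [1, 2, 3],
                          'group': ['CONTROL', 'EXPERIMENT', 'EXPERIMENT']})
responses = pd.DataFrame({'user_id': [2, 3],
                          'positive_response': [0, 1]})

# Left join keeps every exposed user; users without a response get NaN
merged = pd.merge(exposures, responses, on='user_id', how='left')

n_non_responders = merged['positive_response'].isnull().sum()   # user 1 never responded
n_negative = (merged['positive_response'] == 0).sum()           # user 2 answered negatively
positive_rate = merged['positive_response'].mean()              # mean() skips NaN by default
print(n_non_responders, n_negative, positive_rate)
```

Because `mean()` skips nulls, the positive response rate here is computed over respondents only, not over all exposed users.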

The results are next sliced by gender and age.

In [None]:
def two_sample_conf_interval(df):
  critical_value = 1.96
  exp_data = df[df['group'] == "EXPERIMENT"]
  p1 = exp_data['positive_response'].mean()
  se_1 = 0
  if exp_data.shape[0] > 0:
    se_1 = np.sqrt(p1 * (1 - p1) / exp_data.shape[0])

  control_data = df[df['group'] == "CONTROL"]
  p2 = control_data['positive_response'].mean()
  se_2 = 0
  if control_data.shape[0] > 0:
    se_2 = np.sqrt(p2 * (1 - p2) / control_data.shape[0])

  se = np.sqrt(se_1*se_1 + se_2*se_2)

  p_diff = p1 - p2
  return pd.Series([p_diff.round(4), "(%.4f - %.4f)" % (p_diff - critical_value * se, p_diff + critical_value * se)],
                   index=['diff', 'conf interval'])

In [None]:
# sliced by gender
df.groupby(['gender']).apply(two_sample_conf_interval)

Unnamed: 0_level_0,diff,conf interval
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
F,0.0342,(0.0322 - 0.0362)
M,0.0073,(0.0053 - 0.0093)
O,0.0181,(0.0080 - 0.0282)


In [None]:
#sliced by age
df.groupby(['age']).apply(two_sample_conf_interval)

Unnamed: 0_level_0,diff,conf interval
age,Unnamed: 1_level_1,Unnamed: 2_level_1
18-29,0.0185,(0.0152 - 0.0219)
30-39,0.0256,(0.0224 - 0.0288)
40-49,0.0363,(0.0336 - 0.0390)
50-64,0.0041,(0.0009 - 0.0072)
65+,0.0105,(0.0070 - 0.0141)


In [None]:
df.groupby(['gender', 'age']).apply(two_sample_conf_interval)

Unnamed: 0_level_0,Unnamed: 1_level_0,diff,conf interval
gender,age,Unnamed: 2_level_1,Unnamed: 3_level_1
F,18-29,0.0242,(0.0194 - 0.0290)
F,30-39,0.0522,(0.0475 - 0.0569)
F,40-49,0.0671,(0.0632 - 0.0710)
F,50-64,0.002,(-0.0024 - 0.0065)
F,65+,0.0065,(0.0014 - 0.0115)
M,18-29,0.016,(0.0112 - 0.0207)
M,30-39,0.0022,(-0.0023 - 0.0066)
M,40-49,0.003,(-0.0009 - 0.0068)
M,50-64,0.007,(0.0026 - 0.0115)
M,65+,0.0125,(0.0074 - 0.0176)


When slicing by gender, none of the CIs include zero and all differences are positive, indicating that the positive response rate for the experiment group is higher than for the control group and that the differences are statistically significant. This is a good indicator that the campaign succeeded in increasing brand lift. The greatest increase in positive response came from the female group, which could show that the ad was received well by a female audience.

The same pattern is seen when slicing by age, both in significance and in an increased positive response rate relative to control. The greatest increases by age came from the 30-39 and 40-49 groups, which could indicate a positive response to the ad in these age groups.

Slicing by a combination of age and gender reveals some groups without statistical significance, as their CIs include 0: the female group aged 50-64, the male group aged 30-39 and 40-49, and the other gender group aged 50-64. For these groups the campaign did not show a significant influence on response, and the observed differences could be attributed to randomness. When slicing by age and gender, the greatest increase in response comes from the other gender category aged 65+, which is interesting because this result was not seen when slicing by the two variables independently. However, the second and third highest increases in response from control to experiment were found in the female group for the age ranges 30-39 and 40-49, which supports the findings from the independent slicing and indicates that this campaign was received well by females in those age buckets.
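
Checking each slice for significance amounts to testing whether its interval contains zero. Since the notebook's helper formats the interval as a string, one option is to parse the bounds back out. A minimal sketch, with a hypothetical helper `ci_includes_zero` and two interval strings copied from the table above:

```python
import re

def ci_includes_zero(ci_text):
    """Return True if an interval like '(-0.0024 - 0.0065)' contains 0 (hypothetical helper)."""
    # Match signed decimals; the separating ' - ' is not attached to a digit, so it is skipped
    low, high = (float(x) for x in re.findall(r"-?\d+\.\d+", ci_text))
    return low <= 0 <= high

print(ci_includes_zero("(-0.0024 - 0.0065)"))  # F, 50-64
print(ci_includes_zero("(0.0632 - 0.0710)"))   # F, 40-49
```

Returning the numeric bounds from the interval function instead of a formatted string would avoid the parsing step altogether.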


Next, covariate imbalance is analyzed for age and gender.

In [None]:
#covariate imbalance for gender
pd.crosstab(df['group'], df['gender'], normalize='index')

gender,F,M,O
group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
CONTROL,0.489828,0.489995,0.020177
EXPERIMENT,0.490178,0.48982,0.020002


In [None]:
#covariate imbalance for age
pd.crosstab(df['group'], df['age'], normalize='index')

age,18-29,30-39,40-49,50-64,65+
group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
CONTROL,0.169283,0.188974,0.278764,0.202597,0.160382
EXPERIMENT,0.170335,0.190597,0.278635,0.20021,0.160223


Overall, gender is very balanced: the absolute difference in the share of females between the control and experiment groups is approximately 0.00035 (about 0.04 percentage points), with smaller differences of about 0.0002 for males and other genders. This is a good indication that, for gender, the experiment and control groups were randomized well and represent the data well.

Age is less balanced. The largest difference is in the 50-64 bucket, where the control share is about 0.002 higher, followed by the younger 18-29 and 30-39 buckets, where the experiment share is higher by approximately 0.001 to 0.002. The 40-49 and 65+ buckets are closely balanced. This should be taken into consideration when targeting this ad, as the 30-39 age group showed a good increase in positive response when slicing, which originally indicated it was a good group to target.
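
The size and direction of each imbalance can be read off directly by subtracting the control row of the crosstab from the experiment row. A short sketch, rebuilding the age shares from the output above by hand; the variable names are illustrative:

```python
import pandas as pd

# Age shares copied from the crosstab output above
age_share = pd.DataFrame(
    {'18-29': [0.169283, 0.170335],
     '30-39': [0.188974, 0.190597],
     '40-49': [0.278764, 0.278635],
     '50-64': [0.202597, 0.200210],
     '65+':   [0.160382, 0.160223]},
    index=['CONTROL', 'EXPERIMENT'])

# Positive values mean the bucket is over-represented in the experiment group
signed_diff = age_share.loc['EXPERIMENT'] - age_share.loc['CONTROL']
largest_bucket = signed_diff.abs().idxmax()
print(signed_diff.round(4))
print(largest_bucket)
```

In the notebook itself the same subtraction could be applied to the result of `pd.crosstab(df['group'], df['age'], normalize='index')`.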

Next, the results are sliced by frequency (computed from the data).

In [None]:
df.groupby(['frequency']).apply(two_sample_conf_interval)

Unnamed: 0_level_0,diff,conf interval
frequency,Unnamed: 1_level_1,Unnamed: 2_level_1
1,0.0146,(0.0123 - 0.0169)
2,0.0238,(0.0215 - 0.0262)
3,0.0278,(0.0242 - 0.0314)
4,0.0172,(0.0122 - 0.0222)
5,0.0228,(0.0129 - 0.0326)


In [None]:
pd.crosstab(df['group'], df['frequency'], normalize='index')

frequency,1,2,3,4,5
group,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
CONTROL,0.3708,0.368887,0.160066,0.080003,0.020244
EXPERIMENT,0.369895,0.370156,0.159826,0.080107,0.020016


Frequency counts 1 to 5 all show significance, with the confidence intervals of their differences above 0, and each frequency shows an increase in positive responses in the experiment group over the control group. The greatest differences come from frequency counts 2, 3, and 5, which may indicate that these frequencies were most successful in improving positive response; however, all frequency counts saw an increase.

The greatest imbalance in frequency comes from counts 1, 2, and 3: the control group is over-represented at 1 and 3, and the experiment group at 2.


Recommendations:

The publisher should not set a frequency cap of one exposure per user, as the impact varies with the number of exposures and an increase in response is seen beyond one exposure. They should use a strategy that incorporates multiple exposures to increase effectiveness, while watching out for ad fatigue. From these results, 2 to 3 exposures appear likely to be successful.

Because the publisher should prioritize both reach and frequency for recall, they should aim to expose some users within this 2 to 3 exposure range, but not all, as they also need to capture new, unique users for the campaign.

Based on the overall results of this analysis, it is also recommended to put some effort into reaching a female audience in the 40-49 age group, as they saw the greatest increase in positive responses over the control group. However, targeting should not be limited to this group, as positive results can be seen across all genders and age groups to some degree, which indicates that the ad has broad appeal.