# Data Science Online
## Part IV: Rocket Fuel Costs, Benefits, and Efficacy

<img src="images/berkeley_img-4-1.jpg" style="width: 700px; height: 300px;" />

*In this notebook, we will apply what we learned about Python and DataFrames to compute the costs, benefits, and return on investment for the Rocket Fuel handbag case study.*

### Table of Contents


3. <a href='#section 3'>Problem: Rocket Fuel Costs, Benefits, and Efficacy</a>

     a. <a href='#subsection 3a'>Conversion Proportions</a>

     b. <a href='#subsection 3b'>Benefit</a>

     c. <a href='#subsection 3c'>ROI</a>

     d. <a href='#subsection 3d'>Opportunity Cost</a>

## 3. Problem: Rocket Fuel Costs, Benefits, and Efficacy <a id='section 3'></a>

Now that we know a bit about Python and DataFrames, we can start analyzing the Rocket Fuel case. 

As a reminder, here are the details of the Rocket Fuel case from Notebook 3.

## Rocket Fuel Ad Campaign <a id='section case'></a>

[Rocket Fuel Inc.](https://rocketfuel.com/programmatic-marketing-platform/) (NASDAQ: FUEL) works in digital advertising, offering a "Programmatic Marketing Platform" that claims to optimize digital marketing through big data and machine learning techniques.

In 2015, Rocket Fuel ran a trial ad campaign for handbag manufacturer TaskBella. TaskBella was interested in answering two questions:

1. Would the campaign be successful?
2. If the campaign was successful, how much of that success could be attributed to the ads?

With the second question in mind, they agreed to run an **A/B test**. The majority of the people exposed to Rocket Fuel's content delivery network would see TaskBella's handbag ad (the **experimental group**). A small portion of people (the **control group**) would instead see a Public Service Announcement (PSA) in the exact size and place where the ad would normally appear. One PSA example is below:

<img src="images/smokey_bear_psa.PNG" style="width: 700px; height: 300px;" />

In this section, we'll explore four questions:

> * *Was the campaign effective? Did more users convert as a result of seeing an ad?*
> * *How much more money did TaskBella make as a result of running the campaign (ignoring advertising costs)?*
> * *Was the campaign profitable (what was the ROI)?*
> * *What was the opportunity cost of including a control group? How much more could TaskBella have made with a smaller control group or with no control group at all?*

In [2]:
# load the necessary software. THIS CELL MUST BE RUN
import pandas as pd


In [3]:
# run this cell
ads = pd.read_csv('https://raw.githubusercontent.com/ds-modules/exec_ed/master/data/rocketfuel_data_renamed.csv', index_col=0)

# display the first five rows (the default for head())
ads.head()

| user id | test group | converted | total ads | most ads day | most ads hour |
|---|---|---|---|---|---|
| 1069124 | ad | 0 | 130 | 1:Mon | 20 |
| 1119715 | ad | 0 | 93 | 2:Tues | 22 |
| 1144181 | ad | 0 | 21 | 2:Tues | 18 |
| 1435133 | ad | 0 | 355 | 2:Tues | 10 |
| 1015700 | ad | 0 | 276 | 5:Fri | 14 |


In [5]:
control = ads[ads["test group"] == "psa"]
control.head()

| user id | test group | converted | total ads | most ads day | most ads hour |
|---|---|---|---|---|---|
| 900681 | psa | 0 | 248 | 6:Sat | 19 |
| 905704 | psa | 0 | 27 | 4:Thurs | 8 |
| 904595 | psa | 0 | 13 | 2:Tues | 19 |
| 901904 | psa | 0 | 32 | 3:Wed | 19 |
| 902234 | psa | 0 | 105 | 2:Tues | 19 |


In [6]:
experiment = ads[ads["test group"] == "ad"]
experiment.head()

| user id | test group | converted | total ads | most ads day | most ads hour |
|---|---|---|---|---|---|
| 1069124 | ad | 0 | 130 | 1:Mon | 20 |
| 1119715 | ad | 0 | 93 | 2:Tues | 22 |
| 1144181 | ad | 0 | 21 | 2:Tues | 18 |
| 1435133 | ad | 0 | 355 | 2:Tues | 10 |
| 1015700 | ad | 0 | 276 | 5:Fri | 14 |


### 3a. Did more users convert as a result of the ad campaign? <a id='subsection 3a'></a>

We're interested in seeing if the buying behavior of users differed between the control and experimental groups. The two groups are very different in size, so it isn't fair to compare the number of people who converted in each group. Instead, we're going to look at the *proportion* of people in each group who bought a bag.

For both groups, the proportion will be calculated as:
$$\frac{\text{number of people in group who converted}}{\text{total number of people in group}}$$
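To see why proportions rather than raw counts are the fair comparison, here is a minimal sketch on made-up data. The `toy` DataFrame and its values are hypothetical and are not taken from the Rocket Fuel file; only the column names match the `ads` table.

```python
import pandas as pd

# Hypothetical toy data: 8 users in the "ad" group, 2 in the "psa" group
toy = pd.DataFrame({
    "test group": ["ad"] * 8 + ["psa"] * 2,
    "converted":  [1, 1, 0, 0, 0, 0, 0, 0, 1, 0],
})

# Raw counts of converters favor the larger group simply because it is larger
convert_counts = toy.groupby("test group")["converted"].sum()

# The mean of a 0/1 column is the share of 1s, i.e.
# (number who converted) / (total number in group), computed per group
convert_props = toy.groupby("test group")["converted"].mean()

print(convert_counts)
print(convert_props)
```

In this toy example the "ad" group has more converters in absolute terms, yet the "psa" group has the higher conversion proportion, which is the kind of distortion the formula above avoids.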

Let's start with the control group. Getting the number of people in the control group is easy: the `shape` attribute of our `control` DataFrame is a `(rows, columns)` tuple, so `control.shape[0]` gives the number of rows.

In [7]:
# number of users in control group
num_control = control.shape[0]
num_control

23524

Next, we need a table with only users in the control group who converted. We can get this by filtering our table of control group users with a Boolean condition on the `converted` column.

In [None]:
# table with only converting control group users
ctrl_converts = control[control["converted"] == 1]
ctrl_converts
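The filtering step can be broken into two parts: build a Boolean mask, then use it to select rows. The sketch below does this on a tiny made-up table; `toy_control`, its user ids, and its values are hypothetical and only illustrate the mechanics.

```python
import pandas as pd

# Hypothetical control-group table with made-up user ids as the index
toy_control = pd.DataFrame(
    {"test group": ["psa"] * 4, "converted": [0, 1, 0, 1]},
    index=[101, 102, 103, 104],
)

# Comparing a column to a value yields one True/False per row
mask = toy_control["converted"] == 1

# Indexing with the mask keeps only the rows where it is True
toy_converts = toy_control[mask]

# shape is (rows, columns), so shape[0] counts the remaining rows
num_toy_converts = toy_converts.shape[0]

print(mask)
print(toy_converts)
print(num_toy_converts)
```

Writing the mask inline, as in `toy_control[toy_control["converted"] == 1]`, is equivalent; naming it separately just makes the intermediate step visible.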

From this new table, we can get the number of converting control group users by again using `shape[0]`.

In [None]:
# number of people in the ctrl_converts table
num_ctrl_converts = ctrl_converts.shape[0]
num_ctrl_converts

Finally, we can plug the number of control group converts and the total number of control group people into our formula to find the proportion.

In [None]:
# proportion of control group users who converted
ctrl_convert_proportion = (num_ctrl_converts / num_control)
ctrl_convert_proportion

**EXERCISE:** Find the proportion of people in the *experiment* group who converted. You can follow the exact same steps as we did above for the control group; in all steps the code will be identical except for the variable and table names.

Step 1: Get the number of people in the experiment group using the `experiment` table and `shape[0]`.


In [None]:
# number of people in the experiment (ad) group
num_exper = ...
num_exper

Step 2: Filter the `experiment` table with a Boolean condition to create a table with only the experiment group users who converted.

In [None]:
# filter to get only the experiment group users who converted
exper_converts = experiment[...]
exper_converts

Step 3: Get the number of converted experiment group users using the table you just created and `shape[0]`.

In [None]:
# count the number of converting experimental group members
num_exper_converts = ...
num_exper_converts

Step 4: Plug the values from step 1 and step 3 into the formula to calculate the proportion.

$$\frac{\text{number of people in group who converted}}{\text{total number of people in group}}$$

Hint: you don't have to type any numbers here; you can just use the names of the two variables you just created.

In [None]:
# the proportion of people in the experimental group that converted
exper_convert_proportion = ...
exper_convert_proportion

The next cell will print the values you calculated as percents of the control and experiment groups that converted, rounded to two decimal places. 

In [None]:
print("Control Group: {} % converted".format(round(ctrl_convert_proportion * 100, 2))) 
print("Experiment Group: {} % converted".format(round(exper_convert_proportion * 100, 2)))

**QUESTION:** Was the campaign effective? Was a user who saw the ad more likely to buy a bag than a user who didn't see the ad?

**ANSWER:** 

### 3b. How much more money did TaskBella make as a result of running the campaign (ignoring advertising costs)? <a id='subsection 3b'></a>

Here we're looking for the benefit of the campaign: the expected financial impact from the conversions resulting from the ads (excluding all advertising costs).

The formula for the benefit is as follows:

$$ (\text{value of a converted user}) * (\text{number of users in the experiment group}) * (\text{proportion of converting experiment group users} - \text{proportion of converting control group users}) $$

That is, we are looking for the number of people in the experiment group who bought a handbag and *wouldn't have bought one if they'd been in the control group*: the people whose conversion was the result of the ad campaign. This is why we subtract the control group conversion proportion from the experiment group conversion proportion.
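As a rough illustration of how the three factors combine, here is a small sketch. The function name `campaign_benefit` and every number passed to it are hypothetical; they are not results from the Rocket Fuel data.

```python
def campaign_benefit(value_per_convert, num_users, p_exper, p_ctrl):
    """Value of the conversions attributable to the ads.

    num_users * (p_exper - p_ctrl) estimates how many users converted
    only because they saw the ad; each is worth value_per_convert.
    """
    return value_per_convert * num_users * (p_exper - p_ctrl)


# Hypothetical inputs: 1,000 users, 3% vs. 2% conversion, 40 per convert
example_benefit = campaign_benefit(
    value_per_convert=40, num_users=1000, p_exper=0.03, p_ctrl=0.02
)
print(round(example_benefit, 2))
```

With these made-up inputs, a one-percentage-point lift across 1,000 users corresponds to about 10 extra converters, so the benefit comes to roughly 400.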

We already have most of the parts of this formula; we just need to assemble them.

First, TaskBella estimates the value of a converted user to be $\$40$. In the following cell, assign `40` to the name `convert_val`.

In [None]:
# dollar value of converted user
convert_val = ...

Next, let's get the difference in conversion proportions for the experiment and control groups: 

$$\text{proportion of converting experiment group users} - \text{proportion of converting control group users}$$

You can do this easily by using the variables you just calculated: `exper_convert_proportion` and `ctrl_convert_proportion`.

In [None]:
# the difference between the experiment conversion proportion and the control conversion proportion
proportion_diff = ...
proportion_diff

Lastly, plug all the appropriate values into the benefit formula to get the benefit.

Hint: the number of users in the experiment group is saved as `num_exper`.

In [None]:
benefit = ...
benefit

### 3c. What was the Return on Investment (ROI)? <a id='subsection 3c'></a>

In 3a and 3b we saw that advertising resulted in a higher percentage of converting users and a positive benefit. But would running the campaign still increase profits once advertising costs are accounted for?

Recall that back in part 1b we calculated the advertising costs and named them `cost`.

In [None]:
# the cost of the campaign
cost

**EXERCISE:** Calculate the ROI as 

$$\frac{\text{benefit} - \text{cost}}{\text{cost}}$$
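The formula can be read as "profit per unit of money spent". A minimal sketch, using a hypothetical helper `return_on_investment` and made-up benefit and cost figures rather than the campaign's actual values:

```python
def return_on_investment(benefit, cost):
    """ROI = (benefit - cost) / cost.

    The parentheses matter: benefit - cost / cost would compute
    benefit - 1 instead, because division binds tighter than subtraction.
    """
    if cost == 0:
        raise ValueError("ROI is undefined when cost is zero")
    return (benefit - cost) / cost


# Hypothetical figures: a positive ROI means the benefit exceeded the cost
profitable_roi = return_on_investment(benefit=150.0, cost=100.0)

# A negative ROI means the campaign cost more than it brought in
unprofitable_roi = return_on_investment(benefit=80.0, cost=100.0)

print(profitable_roi, unprofitable_roi)
```

An ROI of 0 is the break-even point, where the benefit exactly covers the cost.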

In [None]:
# calculate the ROI
# remember to mind your order of operations
roi = ...
roi

### 3d. What was the opportunity cost of including a control group? <a id='subsection 3d'></a>

As we saw in 3b, having a control group is important to get a baseline with which to compare the experimental data. However, users assigned to the control group do not see TaskBella's advertising, so the sales those ads would have generated are lost, eating into profits.

We can calculate the *opportunity cost* of the control group as:

$$(\text{value of converted user}) * (\text{number of users in control group}) * (\text{proportion of experiment group users who converted} - \text{proportion of control group users who converted})$$

In other words, the opportunity cost is the additional amount of money users in the control group would have spent if they had seen the ads *purely as a result of seeing the ads*. Note that this is almost the same formula as for the benefit in 3b, except with the control group instead of the experiment group.
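Because the opportunity cost is proportional to the size of the control group, shrinking the control group shrinks the cost by the same factor. The sketch below illustrates this with a hypothetical helper `incremental_value` and made-up numbers; none of the values come from the Rocket Fuel data.

```python
def incremental_value(value_per_convert, num_users, proportion_diff):
    """Money attributable to ads for a group of num_users users."""
    return value_per_convert * num_users * proportion_diff


value_per_convert = 40   # hypothetical value of one converted user
proportion_diff = 0.01   # hypothetical lift in conversion proportion

# Opportunity cost for three hypothetical control group sizes
opp_cost_by_size = {
    n_control: incremental_value(value_per_convert, n_control, proportion_diff)
    for n_control in (2000, 1000, 0)
}

for n_control, cost_of_control in opp_cost_by_size.items():
    print(n_control, round(cost_of_control, 2))
```

Halving the control group halves the opportunity cost, and with no control group the cost is zero, although without a control group there would be no baseline for estimating the lift in the first place.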

**EXERCISE:** Use `convert_val`, `num_control`, and `proportion_diff` to calculate the opportunity cost.

In [None]:
opp_cost = ...
opp_cost

**QUESTION:** Was the ad campaign profitable when all the costs are accounted for? Why or why not?

**ANSWER:**

#### References

- Sections of "Intro to Jupyter", "Table Transformation" adapted from materials by Kelly Chen and Ashley Chien in [UC Berkeley Data Science Modules core resources](http://github.com/ds-modules/core-resources)
- "A Note on Errors" subsection and "error" image adapted from materials by Chris Hench and Mariah Rogers for the Medieval Studies 250: Text Analysis for Graduate Medievalists [data science module](https://github.com/ds-modules/MEDST-250).
- Rocket Fuel data and discussion questions adapted from materials by Zsolt Katona and Brian Bell, BerkeleyHaas Case Series

Author: Keeley Takimoto