# Data Science: Bridging Principles and Practice
## Part 10: Rocket Fuel Costs, Benefits, and Efficacy

<img src="images/berkeley_img-4-1.jpg" style="width: 700px; height: 300px;" />

*In this notebook, we will apply what we learned about Python and DataFrames to compute the costs, benefits, and return on investment for the Rocket Fuel handbag case study.*

### Table of Contents


<ol start=10><li><a href='#section10'>Problem: Rocket Fuel Costs, Benefits, and Efficacy</a></li>
    <ol>
     <li><a href='#section10a'>Conversion Proportions</a></li>
     <li><a href='#section10b'>Benefit</a></li>
     <li><a href='#section10c'>ROI</a></li>
     <li><a href='#section10d'>Opportunity Cost</a></li>
     </ol>
    </ol>

In [None]:
# load the necessary software. THIS CELL MUST BE RUN
import pandas as pd
from gofer.ok import check
%matplotlib inline

## Rocket Fuel Ad Campaign <a id='section case'></a>
### (Review from notebooks 03, 04, and 05)

[Rocket Fuel Inc.](https://rocketfuel.com/programmatic-marketing-platform/) (NASDAQ: FUEL) works in digital advertising, offering a "Programmatic Marketing Platform" that claims to optimize digital marketing through big data and machine learning techniques.

In 2015, Rocket Fuel ran a trial ad campaign for handbag manufacturer TaskBella. TaskBella was interested in answering two questions:

1. Would the campaign be successful?
2. If the campaign was successful, how much of that success could be attributed to the ads?

With the second question in mind, they agreed to run an **A/B test**. The majority of the people exposed to Rocket Fuel's content delivery network would see TaskBella's handbag ad (the **experimental group**). A small portion of people (the **control group**) would instead see a Public Service Announcement (PSA) in the exact size and place where the ad would normally appear. One PSA example is below:

<img src="images/smokey_bear_psa.PNG" style="width: 700px; height: 300px;" />


## 10. Problem: Rocket Fuel Costs, Benefits, and Efficacy <a id='section10'></a>

Earlier in the course we explored the Rocket Fuel study in terms of grouping and visualizing the data. Now, we want to see whether or not the campaign worked.

In this section, we'll explore four questions:

> * *Was the campaign effective? Did more users convert as a result of seeing an ad?*
> * *How much more money did TaskBella make as a result of running the campaign (ignoring advertising costs)?*
> * *Was the campaign profitable (what was the ROI)?*
> * *What was the opportunity cost of including a control group? How much more could TaskBella have made with a smaller control group or with no control group at all?*

Run the next cell to load the data. Then, run the cell after it to split the data into two smaller DataFrames: one that only has subjects in the control ("psa") group, and another that only has subjects in the test ("ad") group.

In [None]:
# run this cell to load the data
ads = pd.read_csv('data/rocketfuel_data_renamed.csv', index_col=0)

# show the first five rows of the ads data
ads.head()

In [None]:
# create a DataFrame with only users in the control (PSA) group
control = ads[ads["test group"] == "psa"]

# create a DataFrame with only users in the experimental (ad) group
experiment = ads[ads["test group"] == "ad"]
experiment.head()

### 10a. Did more users convert as a result of the ad campaign? <a id='section10a'></a>

We're interested in seeing if the buying behavior of users differed between the control and experimental groups. The two groups are very different in size, so it isn't fair to compare the number of people who converted in each group. Instead, we're going to look at the *proportion* of people in each group who bought a bag.

For both groups, the proportion will be calculated as:
$$\frac{\text{number of people in group who converted}}{\text{total number of people in group}}$$

Let's start with the control group. To get the number of people in the control group, we can use the `shape` attribute to get the shape of the DataFrame.

In [None]:
control.shape

The `shape` attribute returns two values: the number of rows followed by the number of columns. We only want the number of rows (the first value), so we'll use indexing to get it. Remember that in Python indexing starts at 0.

In [None]:
# number of users in control group
num_control = control.shape[0]
num_control
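
As a small illustration of how `shape` and indexing fit together, here is a sketch on a made-up table. The DataFrame `toy` and its values are hypothetical; only the column names are borrowed from the ads data.

```python
import pandas as pd

# Hypothetical toy data: the values are made up for illustration only
toy = pd.DataFrame({
    "test group": ["ad", "ad", "psa", "ad", "psa"],
    "converted": [True, False, False, True, True],
})

# shape is a (rows, columns) pair, so it can be unpacked into two names
n_rows, n_cols = toy.shape

print(toy.shape)     # the full (rows, columns) pair
print(toy.shape[0])  # index 0 selects the number of rows
print(len(toy))      # len() of a DataFrame also gives the number of rows
```

Either `toy.shape[0]` or `len(toy)` gives the row count; this notebook uses `shape` throughout.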

Next, we need a table with only users in the control group who converted. We can get this using Boolean indexing. Our condition inside the square brackets (i.e. how we want to select the rows) is that we want rows where the "converted" feature is `True`.

In [None]:
# table with only converting control group users
ctrl_converts = control[control["converted"] == True]
ctrl_converts.head()
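
To see what Boolean indexing does step by step, the sketch below builds the condition separately before using it to select rows. The table `toy` and the name `mask` are hypothetical and are not part of the Rocket Fuel data.

```python
import pandas as pd

# Hypothetical toy data: four made-up users
toy = pd.DataFrame({
    "user id": [1, 2, 3, 4],
    "converted": [True, False, False, True],
})

# The condition produces one True/False value per row
mask = toy["converted"] == True
print(mask.tolist())

# Putting the condition inside square brackets keeps only the True rows
toy_converts = toy[mask]
print(toy_converts)
```

Writing `toy[toy["converted"] == True]` in one line, as in the cell above, does the same thing without naming the intermediate condition.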

From this new table, we can get the number of converting control group users by again using `shape`.

In [None]:
# number of people in the ctrl_converts table
num_ctrl_converts = ctrl_converts.shape[0]
num_ctrl_converts

Finally, we can plug the number of control group converts and the total number of control group people into our formula to find the proportion.

In [None]:
# proportion of control group users who converted
ctrl_convert_proportion = (num_ctrl_converts / num_control)
ctrl_convert_proportion
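
One way to sanity-check a proportion computed by counting and dividing is to compare it with the mean of the column. This is only a sketch on made-up data, and it assumes the "converted" column holds `True`/`False` values (pandas treats `True` as 1 and `False` as 0 when averaging). The table `toy` and every name starting with `toy_` or `prop_` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: 4 control ("psa") users and 5 experiment ("ad") users
toy = pd.DataFrame({
    "test group": ["psa", "psa", "psa", "psa", "ad", "ad", "ad", "ad", "ad"],
    "converted": [True, False, False, False, True, True, False, False, False],
})
toy_control = toy[toy["test group"] == "psa"]

# Approach used in this notebook: count the converting rows, then divide
toy_num_converts = toy_control[toy_control["converted"] == True].shape[0]
prop_by_count = toy_num_converts / toy_control.shape[0]

# Cross-check: the mean of a True/False column is the proportion of True values
prop_by_mean = toy_control["converted"].mean()

# The same idea for every group at once
prop_by_group = toy.groupby("test group")["converted"].mean()

print(prop_by_count, prop_by_mean)
print(prop_by_group)
```

If the two approaches disagree on the real data, that usually points to a mistake in the filtering step.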

<div class="alert alert-warning"><p><b>EXERCISE:</b> Find the proportion of people in the <b>experiment</b> group who converted. You can follow the exact same steps as we did above for the control group; in all steps the code will be identical except for the variable and table names.</p>

<p>Step 1: Get the number of people in the experiment group using the <code>experiment</code> table, the <code>shape</code> attribute, and indexing.</p>
</div>

In [None]:
# number of people in the experiment (ad) group
num_exper = ...
num_exper

<div class="alert alert-warning"><p>Step 2: Fill in the ellipses with the correct condition to select <b>users in the experiment group who converted</b>.</p>
<p> Hint: if you're stuck, look at how we did it for the control group. It's the same task, so the code will look very similar, but all references to the <code>control</code> DataFrame will be replaced by the <code>experiment</code> DataFrame.</p>
</div>

In [None]:
# get only the experiment group users who converted
exper_converts = experiment[...]
exper_converts.head()

<div class="alert alert-warning">Step 3: Get the number of converted experiment group users using the table you just created, the <code>shape</code> attribute, and indexing.</div>

In [None]:
# count the number of converting experimental group members
num_exper_converts = ...
num_exper_converts

<div class="alert alert-warning"><p>Step 4: Plug the values from step 1 and step 3 into the formula to calculate the proportion.</p>
<br>
$$\frac{\text{number of people in group who converted}}{\text{total number of people in group}}$$
<br>
<p>Hint: you don't have to type any numbers here; use the names of the two variables you just created.</p></div>

In [None]:
# the proportion of people in the experimental group that converted
exper_convert_proportion = ...
exper_convert_proportion

In [None]:
# run this cell to check your answer for common errors
check("tests/exper-convert.ok")

The next cell will print the values you calculated as percents of the control and experiment groups that converted, rounded to two decimal places. 

In [None]:
print("Control Group: {} % converted".format(round(ctrl_convert_proportion * 100, 2))) 
print("Experiment Group: {} % converted".format(round(exper_convert_proportion * 100, 2)))
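
For reference, the same kind of message can also be built with an f-string. The sketch below uses a made-up proportion (`toy_proportion` is hypothetical) to show one small difference: `round` drops trailing zeros, while the `.2f` format always shows two decimal places.

```python
# Hypothetical proportion, chosen only to make the formatting difference visible
toy_proportion = 0.125

# Style used in this notebook: round first, then insert with str.format
msg_format = "Toy Group: {} % converted".format(round(toy_proportion * 100, 2))

# Alternative: an f-string with a format specifier for two decimal places
msg_fstring = f"Toy Group: {toy_proportion * 100:.2f} % converted"

print(msg_format)   # Toy Group: 12.5 % converted
print(msg_fstring)  # Toy Group: 12.50 % converted
```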

<div class="alert alert-warning"><b>QUESTION:</b> Was the campaign effective? Was a user who saw the ad more likely to buy a bag than a user who didn't see the ad?</div>

**ANSWER:**  *Fill in your answer here*

### 10b. How much more money did TaskBella make as a result of running the campaign (ignoring advertising costs)? <a id='section10b'></a>

Here we're looking for the benefit of the campaign: the expected financial impact from the conversions resulting from the ads (excluding all advertising costs).

The formula for the benefit is as follows:

$$ (\text{value of a converted user}) * (\text{number of users in the experiment group}) * (\text{proportion of converting experiment group users} - \text{proportion of converting control group users}) $$

That is, we are looking for the number of people in the experiment group who bought a handbag and *wouldn't have bought one if they'd been in the control group*: the people whose conversion was the result of the ad campaign. This is why we subtract the control group conversion proportion from the experiment group conversion proportion.

We already have most of the parts of this formula; we just need to assemble them.
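
To make the logic of the benefit formula concrete, here is a sketch with made-up numbers. None of these values come from the Rocket Fuel data, and all the `toy_` names are hypothetical.

```python
# Hypothetical values, chosen only to keep the arithmetic easy to follow
toy_convert_val = 10     # dollars per converted user
toy_num_exper = 1000     # users who saw the ad
toy_exper_prop = 0.05    # 5% of ad viewers converted
toy_ctrl_prop = 0.03     # 3% would have converted without the ad

# Extra conversions attributable to the ad: roughly 1000 * 0.02 = 20 users
toy_extra_converts = toy_num_exper * (toy_exper_prop - toy_ctrl_prop)

# Benefit: roughly 20 extra conversions * $10 each = $200
toy_benefit = toy_convert_val * toy_extra_converts

print(toy_extra_converts)
print(toy_benefit)
```

Because of floating-point arithmetic, the printed values may differ from 20 and 200 by a tiny amount.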

<div class="alert alert-warning"><b>EXERCISE:</b> TaskBella estimates the value of a converted user to be $\$40$. In the following cell, assign <code>40</code> to the name <code>convert_val</code>.
</div>

In [None]:
# dollar value of converted user
convert_val = ...

<div class="alert alert-warning">Next, let's get the difference in conversion proportions for the experiment and control groups: <br>
<br>
$$\text{proportion of converting experiment group users} - \text{proportion of converting control group users}$$
<br>
You can do this easily by using the variables you just calculated: <code>exper_convert_proportion</code> and <code>ctrl_convert_proportion</code>.</div>

In [None]:
# the difference between the experiment conversion proportion and the control conversion proportion
proportion_diff = ...
proportion_diff

<div class="alert alert-warning"><p>Lastly, plug all the appropriate values into the benefit formula to get the benefit.</p>
<br>
$$ (\text{value of a converted user}) * (\text{number of users in the experiment group}) * (\text{proportion of converting experiment group users} - \text{proportion of converting control group users}) $$
<br>
    <p>Hint: the number of users in the experiment group is saved as <code>num_exper</code>.</p>
</div>

In [None]:
benefit = ...
benefit

In [None]:
# run this cell to check your answer for some common errors
check("tests/benefit.ok")

### 10c. What was the Return on Investment (ROI)? <a id='section10c'></a>

In parts 10a and 10b we saw that advertising resulted in a higher percentage of converting users and a positive benefit. But would the campaign still be profitable once advertising costs are accounted for?

Recall that back in Notebook 02 we calculated the advertising costs and named them `cost`. Run the next cell to re-define that variable with the cost we calculated.

In [None]:
# the cost of the campaign
cost = 131374.64

<div class="alert alert-warning"><b>EXERCISE:</b> Calculate the ROI as 

$$\frac{\text{benefit} - \text{cost}}{\text{cost}}$$
</div>
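
As a quick illustration of why order of operations matters for this formula, the sketch below uses made-up numbers. The names `toy_benefit` and `toy_cost` are hypothetical and unrelated to the real campaign.

```python
# Hypothetical values for illustration only
toy_benefit = 150.0
toy_cost = 100.0

# Parentheses force the subtraction to happen before the division
toy_roi = (toy_benefit - toy_cost) / toy_cost
print(toy_roi)        # 0.5, i.e. a 50% return

# Without parentheses, Python divides first: 150 - (100 / 100)
toy_roi_wrong = toy_benefit - toy_cost / toy_cost
print(toy_roi_wrong)  # 149.0, which is not an ROI
```

A positive ROI means the benefit exceeded the cost; a negative ROI means the campaign cost more than it brought in.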

In [None]:
# calculate the ROI
# remember to mind your order of operations
roi = ...
roi

In [None]:
# run this cell to check your answer for some common errors
check("tests/roi.ok")

### 10d. What was the opportunity cost of including a control group? <a id='section10d'></a>

As we saw in 10b, having a control group is important to get a baseline with which to compare the experimental data. However, users assigned to the control group never see TaskBella's advertising, so any purchases the ads would have prompted from them are lost, eating into profits.

We can calculate the *opportunity cost* of the control group as:

$$(\text{value of converted user}) * (\text{number of users in control group}) * (\text{proportion of experiment group users who converted} - \text{proportion of control group users who converted})$$

In other words, the opportunity cost is the additional amount of money that users in the control group would have spent *purely as a result of seeing the ads*, had they been shown them. Note that this is almost the same formula as for the benefit in 10b, except with the size of the control group instead of the size of the experiment group.
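
The relationship between the two formulas can be sketched with made-up numbers. All of the `toy_` values below are hypothetical and are not taken from the Rocket Fuel data.

```python
# Hypothetical values for illustration only
toy_convert_val = 10         # dollars per converted user
toy_num_exper = 1000         # users shown the ad
toy_num_control = 50         # users shown the PSA instead
toy_proportion_diff = 0.02   # experiment proportion minus control proportion

# Benefit: extra revenue from the users who did see the ad (about $200)
toy_benefit = toy_convert_val * toy_num_exper * toy_proportion_diff

# Opportunity cost: extra revenue missed from users who did not (about $10)
toy_opp_cost = toy_convert_val * toy_num_control * toy_proportion_diff

print(toy_benefit)
print(toy_opp_cost)
```

Since only the group size changes between the two formulas, the opportunity cost is small whenever the control group is small relative to the experiment group.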



<div class="alert alert-warning"><b>EXERCISE:</b> Use <code>convert_val</code>, <code>num_control</code>, and <code>proportion_diff</code> to calculate the opportunity cost.</div>

In [None]:
# calculate the opportunity cost
opp_cost = ...
opp_cost

In [None]:
# run this cell to check your answer for some common errors
check("tests/opp-cost.ok")

<div class="alert alert-warning"><b>QUESTION:</b> Was the ad campaign profitable when all the costs are accounted for? Why or why not?</div>

**ANSWER:** *Fill in your answer here*

#### References

- Rocket Fuel data and discussion questions adapted from materials by Zsolt Katona and Brian Bell, BerkeleyHaas Case Series

Author: Keeley Takimoto