<a href="https://colab.research.google.com/github/remerge/uplift-report/blob/master/uplift_report.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Remerge uplift report

This notebook lets you validate the uplift reporting numbers provided by remerge. To do so, it downloads and analyses exported campaign and event data from S3. The campaign data contains all users that remerge marked as part of an uplift test, their A/B group assignment, the timestamp of marking, conversion events (click, app open or similar) and their cost. The event data reflects the app event stream and includes the events, their timestamps and their revenue (if any). We calculate the incremental revenue and the iROAS in line with the [remerge whitepaper](https://drive.google.com/file/d/1PTJ93Cpjw1BeiVns8dTcs2zDDWmmjpdc/view).

**Hint**: This notebook can be run in any Jupyter instance with enough space/memory, as a [Google Colab notebook](#Google-Colab-version) or as a standalone Python script. If you are using a copy of this notebook, on Colab or locally, you can find the original template on [GitHub: remerge/uplift-report](https://github.com/remerge/uplift-report/blob/master/uplift_report_per_campaign.ipynb).

### Notebook configuration

For this notebook to work properly, several variables in the [Configuration](#Configuration) section need to be set: `customer`, `audiences`, `revenue_event`, `dates` and the AWS credentials. All of these will be provided by your remerge account manager.

In [0]:
# Import remerge uplift-report library
import os

# In a local Jupyter environment the repo has already been cloned and `lib` is available.
# On Colab we need to clone the repo and expose `lib` at the same path through a symlink.
if not os.path.exists('lib'):
    !git clone --branch master https://github.com/remerge/uplift-report.git
    !ln -s uplift-report/lib
    
    !pip install lib/
    
    # The install may have upgraded dependencies (specifically `pandas`) that only take
    # effect after a kernel restart, so it is safer to restart the kernel right away
    os.kill(os.getpid(), 9)    

## Import packages

This notebook/script needs our Uplift Report helper library, as well as the other dependencies it brings with it.


## Load helpers

In [0]:
import os
import pandas as pd

from lib.helpers import Helpers, display

## Version
The version of the analysis script corresponds to the methodology version in the whitepaper: the major and minor versions represent the whitepaper version, and the revision represents changes and fixes to the uplift report script.

In [0]:
display(Helpers.version())

## Configuration

Set the customer name, audience and access credentials for the S3 bucket and path. In addition, set `revenue_event` to the event for which we want to evaluate the uplift.

In [0]:
# configure path and revenue event 
customer = ''
audiences = ['']
revenue_event = 'purchase'

# date range for the report
dates = pd.date_range(start='2019-01-01',end='2019-01-01')

# AWS credentials
os.environ["AWS_ACCESS_KEY_ID"] = ''
os.environ["AWS_SECRET_ACCESS_KEY"] = ''

# Configure the reporting output: 

# named groups that aggregate several campaigns
groups = {}

# show uplift results per campaign:
per_campaign_results = False

# base statistical calculations on unique converters instead of conversions
use_converters_for_significance = False

# enable deduplication heuristic for appsflyer
use_deduplication = False
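The `dates` setting above is a plain pandas date range. As a minimal sketch (the variable name `example_dates` and the one-week window are only illustrative), a multi-day report period could be expressed like this; both endpoints are included in the range:

```python
import pandas as pd

# Illustrative one-week window; replace with the dates of your own test
example_dates = pd.date_range(start="2019-01-01", end="2019-01-07")

# date_range defaults to daily frequency and includes both endpoints
print(len(example_dates))
print(example_dates[0], example_dates[-1])
```

Setting `start` and `end` to the same day, as in the template, yields a range with a single entry.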

In [0]:
# Instantiate & configure the helpers
#
# Hint: Press Alt + / or Tab to see the docstring with parameter descriptions in Google Colab
helpers = Helpers(
    customer=customer,
    audiences=audiences,
    revenue_event=revenue_event,
    dates=dates,
    groups=groups,
    per_campaign_results=per_campaign_results,
    use_converters_for_significance=use_converters_for_significance,
    use_deduplication=use_deduplication,
    export_user_ids=False,
)
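The `use_converters_for_significance` flag switches the statistical calculations from conversions to unique converters. The toy example below only illustrates the difference between the two counts; the column names and data are hypothetical, and this is not the helper library's implementation:

```python
import pandas as pd

# Hypothetical conversion events: one row per conversion
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u3"],
    "ab_test_group": ["test", "test", "test", "control", "control", "control"],
})

# Conversions: every event counts
conversions = events.groupby("ab_test_group").size()

# Converters: each user counts at most once per group
converters = events.groupby("ab_test_group")["user_id"].nunique()

summary = pd.DataFrame({"conversions": conversions, "converters": converters})
print(summary)
```

When a few users convert many times, the two counts diverge, which is why the choice can affect significance calculations.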

## Load CSV data from S3

Load mark, spend and event data from S3. 

### IMPORTANT

**The event data is usually quite large (several GB) so this operation might take several minutes or hours to complete, depending on the size and connection.**

In [0]:
marks_and_spend_df = helpers.load_marks_and_spend_data()

In [0]:
attributions_df = helpers.load_attribution_data(marks_and_spend_df=marks_and_spend_df)

Print some statistics of the loaded data sets.

In [0]:
marks_and_spend_df.info(memory_usage='deep')

In [0]:
attributions_df.info(memory_usage='deep')
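With `memory_usage='deep'`, pandas inspects the actual contents of string columns instead of only counting references, so the reported size is closer to the real footprint. A small sketch on a toy frame (the data and the `event` column name are made up for illustration) shows the difference, along with how a repetitive string column shrinks when stored as a categorical:

```python
import pandas as pd

# Toy frame with a highly repetitive string column
toy = pd.DataFrame({"event": ["purchase", "open", "purchase"] * 1000})

# Shallow estimate vs. deep inspection of the stored values, in bytes
shallow_bytes = toy.memory_usage(deep=False).sum()
deep_bytes = toy.memory_usage(deep=True).sum()

# The same values stored as a categorical column
category_bytes = toy["event"].astype("category").memory_usage(deep=True)

print(shallow_bytes, deep_bytes, category_bytes)
```

The exact byte counts depend on the pandas version and platform, so none are quoted here.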

### Calculate and display uplift report for the data set as a whole

This takes the whole data set and calculates uplift KPIs.
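As a rough orientation for the KPIs, the sketch below assumes that control revenue is scaled up to the size of the test group before it is subtracted, and that iROAS is the incremental revenue divided by ad spend. The function name, parameters and numbers are hypothetical; the whitepaper and the helper library remain the reference for the actual methodology:

```python
def incremental_kpis(test_size, control_size, test_revenue, control_revenue, ad_spend):
    """Illustrative calculation of incremental revenue and iROAS (assumed formula)."""
    # Scale the control group to the size of the test group
    ratio = test_size / control_size
    scaled_control_revenue = control_revenue * ratio

    # Revenue the test group generated beyond the scaled control baseline
    incremental_revenue = test_revenue - scaled_control_revenue

    # Incremental return on ad spend
    iroas = incremental_revenue / ad_spend
    return incremental_revenue, iroas


# Made-up numbers: 80/20 split, control revenue scaled by a factor of 4
incremental_revenue, iroas = incremental_kpis(
    test_size=80_000,
    control_size=20_000,
    test_revenue=5_000.0,
    control_revenue=1_000.0,
    ad_spend=500.0,
)
print(incremental_revenue, iroas)
```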

In [0]:
report = helpers.uplift_report(marks_and_spend_df=marks_and_spend_df, attributions_df=attributions_df)

## Uplift Results

You can configure the output using the variables in the 'Configuration' section.

In [0]:
helpers.display_report(report, raw=False)

### CSV Export - combined reports

In [0]:
start = dates[0]
end = dates[-1]

helpers.export_csv(df=report, file_name='{}_{}-{}.csv'.format(customer, start, end))
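Because `dates` holds pandas timestamps, the file name built above includes the time component of `start` and `end`. If a shorter, date-only name is preferred, a sketch along these lines could be used; the customer name `acme` and the variable names are placeholders:

```python
import pandas as pd

# Placeholder values for illustration only
example_customer = "acme"
example_dates = pd.date_range(start="2019-01-01", end="2019-01-07")
start, end = example_dates[0], example_dates[-1]

# Formatting the timestamps directly keeps the time of day in the name
full_name = "{}_{}-{}.csv".format(example_customer, start, end)

# Using .date() keeps only the calendar date
date_only_name = "{}_{}-{}.csv".format(example_customer, start.date(), end.date())

print(full_name)
print(date_only_name)
```

The date-only variant avoids spaces and colons, which some file systems handle poorly.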