# Advanced Bootstrap Features
In this example we will cover:
* Bootstrapping with custom aggregation functions
* Bootstrapping Ratios  
  * Bootstrap( SUM(numerator) / SUM(denominator) )
  * Bootstrap( AVG(numerator / denominator) )

**Note:** Here we demonstrate bs.bootstrap; the same functionality is available with bootstrap_ab.

In [1]:
import matplotlib.pyplot as plt
import matplotlib
import pandas as pd
import numpy as np

In [2]:
import bootstrapped.bootstrap as bs
import bootstrapped.compare_functions as bs_compare
import bootstrapped.stats_functions as bs_stats

## Set up the data

In [3]:
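df = pd.DataFrame()
# Simulated revenue per record; non-positive draws are replaced with 1
df['revenue'] = np.random.normal(loc=100, scale=90, size=1000000)
df['revenue'] = df['revenue'].where(df['revenue'] > 0, 1)

# Simulated clicks per record
df['clicks'] = np.random.binomial(100, 0.15, 1000000)

# Treat the first 5000 records as the observed sample
sample_df = df[:5000]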

In [4]:
# Compare the mean revenue per record over all records
# with the bootstrapped estimate from the 5000-record sample
print(df.revenue.mean())

print(bs.bootstrap(values=sample_df.revenue.values, stat_func=bs_stats.mean))

106.133594774
105.37894894    (103.155297296, 107.549433364)


### Bootstraps with custom functions
Let's demonstrate a simple example of how to calculate the bootstrapped mean with a custom function. It should give results similar to the function calls above.

In [5]:
def custom_mean(values):
    '''Calculate the mean of each bootstrap resample.

    Args:
        values: a 2d np.array (matrix) of resampled values. Each row is one
            bootstrap resample simulation that we aggregate across.

    Returns:
        A 1d np.array with one mean per bootstrap resample.
    '''

    # With num_iterations=12345 below, the input has 12345 rows
    print('function input shape {}'.format(values.shape))
    
    mean_values = values.mean(axis=1)
    
    print('function output shape {} == num bootstrap iterations'.format(mean_values.shape))
    
    return mean_values

print('length of the array', len(sample_df.revenue.values))

results = bs.bootstrap(sample_df.revenue.values, custom_mean, num_iterations=12345)
print()
print(results)

length of the array 5000
function input shape (12345, 5000)
function output shape (12345,) == num bootstrap iterations

105.350491418    (103.246572005, 107.598528079)
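
To make the shapes printed above concrete, here is a minimal NumPy-only sketch of how a resample matrix could be built and handed to a row-wise statistic function. It is not the library's implementation: resample_matrix and percentile_interval are hypothetical helpers, and the plain percentile interval used here may differ from the interval bs.bootstrap reports.

```python
import numpy as np

def resample_matrix(values, num_iterations, rng):
    """Draw num_iterations resamples with replacement, one per row (hypothetical helper)."""
    idx = rng.integers(0, len(values), size=(num_iterations, len(values)))
    return values[idx]

def percentile_interval(values, stat_func, num_iterations=1000, alpha=0.05, seed=0):
    """Point estimate plus a plain percentile interval (an assumption, not the library's method)."""
    rng = np.random.default_rng(seed)
    # stat_func receives a 2d array and returns one statistic per row
    stats = stat_func(resample_matrix(values, num_iterations, rng))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    # Evaluate the statistic on the original sample, passed as a single-row matrix
    point = stat_func(values[np.newaxis, :])[0]
    return point, lower, upper

def row_mean(values):
    return values.mean(axis=1)

data = np.random.default_rng(42).normal(loc=100, scale=90, size=500)
point, lower, upper = percentile_interval(data, row_mean)
print(point, lower, upper)
```

Any function that maps a (num_iterations, sample_size) matrix to a vector of length num_iterations can be dropped in place of row_mean.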


In [6]:
# Alternatively, we could aggregate row by row in Python instead of using the axis argument.
# This is often slower than the vectorized version, but not always.

def alternate_mean(values):
    # Returns one mean per row (bootstrap resample)
    return np.array([np.mean(v) for v in values])
    
print(bs.bootstrap(sample_df.revenue.values, alternate_mean))

105.390518051    (103.167009787, 107.583823505)
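
Before swapping one implementation for the other, it may be worth checking that they agree and timing them on your own machine. The sketch below does both on a small random matrix; the matrix size, the function names, and the repeat count are arbitrary choices for illustration, and the timings will vary by environment.

```python
import timeit
import numpy as np

def vectorized_mean(values):
    # One mean per row, computed by NumPy along axis 1
    return values.mean(axis=1)

def rowwise_mean(values):
    # One mean per row, computed in a Python-level loop
    return np.array([np.mean(v) for v in values])

rng = np.random.default_rng(0)
resamples = rng.normal(size=(200, 1000))  # stand-in for a bootstrap resample matrix

same = np.allclose(vectorized_mean(resamples), rowwise_mean(resamples))
print('same result:', same)

t_vec = timeit.timeit(lambda: vectorized_mean(resamples), number=20)
t_row = timeit.timeit(lambda: rowwise_mean(resamples), number=20)
print('vectorized: {:.4f}s, row-wise: {:.4f}s'.format(t_vec, t_row))
```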


### Bootstraps with 'advanced' functions

In [7]:
def percentile_range(values):
    '''Calculate the range between the 5th and 95th percentiles of each resample.

    Args:
        values: a 2d np.array (matrix) of resampled values. Each row is one
            bootstrap resample simulation that we aggregate across.
    '''
    p95 = np.percentile(values, 95, axis=1)
    p5 = np.percentile(values, 5, axis=1)  
    return p95 - p5
    

print(bs.bootstrap(sample_df.revenue.values, percentile_range))

240.377455222    (235.409574035, 246.39621899)
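
A custom function is easy to get subtly wrong, for example by reducing over the wrong axis, so it can help to try it on a tiny hand-made matrix first. The sketch below repeats percentile_range and applies it to a two-row toy matrix (an illustrative input, not part of the original example) where the answer can be worked out by hand.

```python
import numpy as np

def percentile_range(values):
    # Difference between the 95th and 5th percentiles of each row
    p95 = np.percentile(values, 95, axis=1)
    p5 = np.percentile(values, 5, axis=1)
    return p95 - p5

# Row 0 holds 0..100, so its 95th percentile is 95 and its 5th is 5 (range 90).
# Row 1 holds the same values doubled, so its range is 180.
toy = np.vstack([np.arange(101), 2 * np.arange(101)]).astype(float)
ranges = percentile_range(toy)
print(ranges)
```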


## Denominator Values

If you would like to bootstrap the global clicks-per-dollar ratio (total clicks divided by total revenue), you need a function that computes:

**Bootstrap(SUM(Clicks) / SUM(Revenue))**

and not

**Bootstrap(AVG(Clicks / Revenue))**

The first gives you the global ratio. In this case you want to resample whole records, each a (revenue, clicks) pair, and then calculate the ratio from the resampled totals. This is achievable with the following code, which uses the denominator_values argument, or with more complex custom functions.

In [8]:
# Mean of the per-record ratios: all records first, then the bootstrapped sample estimate
print((df.clicks / df.revenue).mean())

print(bs.bootstrap(
    sample_df.clicks.values / sample_df.revenue.values,
    bs_stats.mean
))

2.52033023071
2.34355715099    (2.1880213099, 2.50488848647)


In [9]:
# Ratio of sums: all records first, then the bootstrapped sample estimate
print(df.clicks.sum() / df.revenue.sum())

print(bs.bootstrap(
    sample_df.clicks.values,
    bs_stats.mean,
    denominator_values=sample_df.revenue.values
))

0.141290804593
0.141671340896    (0.138651692035, 0.14478146765)
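
The gap between the two estimates computed above can be reproduced with plain NumPy. The sketch below simulates a smaller dataset in the same way, compares the mean of ratios with the ratio of sums, and bootstraps the ratio of sums by resampling whole records. It is an illustration under stated assumptions, not the library's implementation: the sample size, iteration count, and the plain percentile interval are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulate records as in the setup: non-positive revenue draws become 1
raw = rng.normal(loc=100, scale=90, size=n)
revenue = np.where(raw > 0, raw, 1.0)
clicks = rng.binomial(100, 0.15, size=n)

ratio_of_sums = clicks.sum() / revenue.sum()
mean_of_ratios = (clicks / revenue).mean()  # dominated by records with tiny revenue
print('ratio of sums :', ratio_of_sums)
print('mean of ratios:', mean_of_ratios)

# Resample whole records: the same row indices select numerator and denominator
idx = rng.integers(0, n, size=(500, n))
boot = clicks[idx].sum(axis=1) / revenue[idx].sum(axis=1)
lower, upper = np.percentile(boot, [2.5, 97.5])
print('bootstrap interval for ratio of sums:', lower, upper)
```

Using one shared index matrix keeps each record's clicks and revenue together; resampling the two columns independently would break that pairing.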
