# Initial broad optimization for number of offers

Because OR-Tools could not solve the full per-customer problem for more than about 80 customers, we investigate a two-stage process:

  1. Determine the number of offers to make using mean probabilities over the customer space.
  2. Use Bob Agnew's dual optimization to calculate the final allocations.

| Run | Size | Time (s)  | Value  |
|-----|------|-----------|--------|
|   1 |  27  |  0.9      |    603 |
|   2 |  50  |  1.8      |  1 283 |
|   3 |  75  |  0.25     |  1 975 | 
|   4 |  85  |  0.55     |  2 227 |
|   5 |  90  |   --      |  --    |



In [1]:
from __future__ import print_function  # __future__ imports must come first

import time

import numpy as np
from ortools.linear_solver import pywraplp

n_obs_new = 27

We have four product types:

  * car loan
  * savings
  * mortgage
  * pension
  
Each product has a different `productValue`: the revenue that can be obtained for the product on average. To get a fair representation of marketing across the various offers, each is allocated a `budgetShare`. 

In [2]:
products = ['Car loan', 'Savings', 'Mortgage', 'Pension']
productValue = [100, 200, 300, 400]
budgetShare = [0.6, 0.1, 0.2, 0.1]

  
Each of these products can be offered over one of the following channels:

  * gift
  * newsletter
  * seminar
  
Each of these channels has different costs, and each has a different _influence factor_. We use the influence to weight the estimated value of the response accordingly.

In [3]:
channels = ['gift', 'newsletter', 'seminar']
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]

Total spend must not exceed the available marketing budget of $\$500$.

In [4]:
availableBudget = 500

Read in the offers data, originally from IBM and since massaged. It gives, for each customer, the probability of taking up each product.

Rather than using the full 10,000, test that it works on a smaller size.

In [5]:
import pandas

product_probs_orig = pandas.read_csv('offers_ibm_pivot.csv')
n_obs_original = product_probs_orig.shape[0]

product_probs = pandas.read_csv('sample_data_10000.csv')
# product_probs = product_probs[product_probs.index > product_probs.shape[0] - n_obs_new]
product_probs = product_probs[product_probs.index < n_obs_new]
n_obs = product_probs.shape[0]

adjustment_factor = n_obs/n_obs_original
availableBudget = availableBudget*adjustment_factor

product_probs.rename(columns={'Unnamed: 0': 'customerid'}, inplace=True)
product_probs.head()

Unnamed: 0,customerid,name,Car loan,Savings,Mortgage,Pension
0,0,Matthew Harvey,0.0,0.0,0.0,0.0
1,1,Joshua Wilcox,0.0,0.0,0.179932,0.0
2,2,Yolanda Vasquez,0.330731,0.580556,0.0,0.0
3,3,Jessica Alvarado,0.0,0.630242,0.509746,0.0
4,4,Gregory Martinez,0.0,0.320511,0.0,0.288832
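
The cell above scales the budget in proportion to the sample size. As a minimal sketch of that arithmetic, the helper below (`scale_budget` is a hypothetical name, and the sizes of 50 and 100 customers are made-up illustration values, not the sizes of the actual files) shows the same calculation in isolation:

```python
def scale_budget(full_budget, n_sample, n_full):
    """Scale a budget in proportion to the sample size (hypothetical helper)."""
    if n_full <= 0:
        raise ValueError("n_full must be positive")
    return full_budget * n_sample / n_full


# Illustration only: half of the customers get half of the budget.
print(scale_budget(500, 50, 100))  # 250.0
```

Keeping the budget per customer constant means results from different sample sizes stay roughly comparable.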


Calculate the average probability for each product across customers. Zeroes are replaced with `NaN` first, so they are excluded from the mean. Also count the non-zero probabilities per product: the number of customers we could reasonably make each offer to.

In [6]:
# np.nan works across NumPy versions; the np.NaN alias was removed in NumPy 2.0
product_probs = product_probs.replace(0, np.nan)
product_probs_mean = product_probs[['Car loan', 'Savings', 'Mortgage', 'Pension']].mean()
product_probs_count = product_probs[['Car loan', 'Savings', 'Mortgage', 'Pension']].count()
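
Whether zeroes count towards the mean changes the result substantially. The sketch below uses only the five `Savings` values visible in the preview above (not the full sample) to compare the two conventions in plain Python; the function names are illustrative, not part of the notebook:

```python
import math

# Savings probabilities of the five customers shown in the preview above.
savings = [0.0, 0.0, 0.580556, 0.630242, 0.320511]


def mean_including_zeros(values):
    """Mean over all customers, treating 0 as a valid probability."""
    return sum(values) / len(values)


def mean_excluding_zeros(values):
    """Mean over customers with a non-zero probability only."""
    nonzero = [v for v in values if v != 0]
    return sum(nonzero) / len(nonzero) if nonzero else math.nan


def count_nonzero(values):
    """Number of customers we could reasonably make the offer to."""
    return sum(1 for v in values if v != 0)


print(mean_including_zeros(savings))
print(mean_excluding_zeros(savings))
print(count_nonzero(savings))
```

Excluding zeroes yields a higher mean over fewer customers, which is why the counts matter as an upper bound on the number of offers per product.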

> **Note:** I will need to add the counts as constraints on the products.

Instantiate the solver as an MIP problem.

In [7]:
solver = pywraplp.Solver('SolveCampaignProblem', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)

Define the decision variables $x_{jk}$: the number of offers for product $j$ made over channel $k$. Each is an integer between 0 and the number of customers.

In [8]:
num_customers = product_probs.shape[0]
num_products = len(products)
num_channels = len(channels)

x = {}

for j in range(num_products):
    for k in range(num_channels):
        x[j, k] = solver.IntVar(0, num_customers, 'x[%i,%i]' % (j, k))

### Set up the constraints

  1. Offer at most one product per customer.
  2. Do not exceed the budget.
  3. Balance the offers/customers among products.
  

In [9]:
# Offer at most one product per customer:
# the total number of offers cannot exceed the number of customers
total_offers = solver.Sum([x[j, k]
                           for j in range(num_products)
                           for k in range(num_channels)])
solver.Add(total_offers <= num_customers)

# Do not exceed the budget
solver.Add(solver.Sum([x[j, k]*cost[k]
                       for j in range(num_products)
                       for k in range(num_channels)
                      ]) <= availableBudget)

# Balance the offers/customers among products:
# each product gets at most its share of the total number of offers
for j in range(num_products):
    solver.Add(solver.Sum([x[j, k] for k in range(num_channels)])
               <= budgetShare[j]*total_offers)

#    for j in range(num_products):
#        solver.Add(solver.Sum([x[j, k]
#                             for k in range(num_channels)])
#                  <= product_probs_count[j])
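
To make the three constraints concrete, here is a minimal pure-Python checker for a candidate allocation `x[j][k]`. It is a sketch, not part of the solver model: `check_allocation` is a hypothetical helper, and the budget of 500 is the unscaled figure, used here as an assumption because the scaled budget is not printed in the notebook. The first candidate is the allocation reported later in this notebook.

```python
products = ['Car loan', 'Savings', 'Mortgage', 'Pension']
channels = ['gift', 'newsletter', 'seminar']
cost = [20, 15, 23]
budget_share = [0.6, 0.1, 0.2, 0.1]


def check_allocation(x, num_customers, budget, tol=1e-9):
    """Return constraint name -> satisfied? for allocation x[j][k] (hypothetical helper)."""
    per_product = [sum(row) for row in x]
    total = sum(per_product)
    spend = sum(x[j][k] * cost[k]
                for j in range(len(products))
                for k in range(len(channels)))
    return {
        'customers': total <= num_customers,
        'budget': spend <= budget + tol,
        'shares': all(per_product[j] <= budget_share[j] * total + tol
                      for j in range(len(products))),
    }


# Allocation reported by the solver: every offer goes out via seminar.
reported = [[0, 0, 12], [0, 0, 2], [0, 0, 4], [0, 0, 2]]
print(check_allocation(reported, num_customers=27, budget=500))

# Moving one offer from Pension to Car loan breaks the share constraint.
skewed = [[0, 0, 13], [0, 0, 2], [0, 0, 4], [0, 0, 1]]
print(check_allocation(skewed, num_customers=27, budget=500))
```

Because the shares sum to 1, adding up the four share inequalities gives total offers on both sides, so each one can only hold with equality: the product mix is fixed at exactly 60/10/20/10 and the solver chooses only the total and the channels.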

### Maximize revenue

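We want to maximize revenue $R$. In the full problem, $x_{ijk}$ denotes whether customer $i$ receives an offer for product $j$ over channel $k$, $f_k$ denotes the channel adjustment factor, $v_j$ the product value and $p_{ij}$ the probability that customer $i$ takes up product $j$:

$ \max R = \sum_{ijk} x_{ijk} \times f_k \times v_j \times p_{ij}$

In this first stage we aggregate over customers: $x_{jk}$ is the number of offers for product $j$ over channel $k$, and $p_{ij}$ is replaced by the mean probability $\bar{p}_j$ for product $j$:

$ \max R = \sum_{jk} x_{jk} \times f_k \times v_j \times \bar{p}_j$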

In [10]:
# Use .iloc for positional access: integer keys on a label-indexed Series are deprecated
solver.Maximize(solver.Sum([x[j, k]*factor[k]*productValue[j]*product_probs_mean.iloc[j]
                           for j in range(num_products)
                           for k in range(num_channels)]))
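
The objective is linear, so each variable contributes a fixed expected revenue per offer. The sketch below tabulates those coefficients in plain Python, using the mean probabilities printed by cell `In [15]` further down, and also computes the influence factor per unit cost for each channel. The variable names are illustrative and not taken from the notebook.

```python
products = ['Car loan', 'Savings', 'Mortgage', 'Pension']
channels = ['gift', 'newsletter', 'seminar']
product_value = [100, 200, 300, 400]
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]
# Mean take-up probabilities, copied from the notebook output (zeroes excluded).
mean_prob = [0.451461, 0.614748, 0.356984, 0.334177]

# Expected revenue per single offer of product j over channel k.
coef = [[factor[k] * product_value[j] * mean_prob[j]
         for k in range(len(channels))]
        for j in range(len(products))]

for j, name in enumerate(products):
    row = ', '.join('%s=%.2f' % (channels[k], coef[j][k])
                    for k in range(len(channels)))
    print('%-9s %s' % (name, row))

# Influence bought per unit of budget, by channel.
factor_per_cost = [factor[k] / cost[k] for k in range(len(channels))]
best_channel = channels[max(range(len(channels)),
                            key=lambda k: factor_per_cost[k])]
print('Best channel per unit cost:', best_channel)
```

Under these numbers the seminar has both the largest factor and the largest factor per unit cost, which suggests why a budget-limited solution would favour it.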

### Invoke the solver

In [11]:
# Invoke the solver
t = time.process_time()
sol = solver.Solve()
elapsed_time = time.process_time() - t

Print out the solution. We can print out more information about the constraints.

In [12]:
# One row per (product, channel) pair with a non-zero number of offers
report = [(channels[k], products[j], x[j, k].solution_value(), x[j, k].solution_value()*cost[k],
          x[j, k].solution_value()*factor[k]*productValue[j]*product_probs_mean.iloc[j])
          for j in range(num_products)
          for k in range(num_channels) if x[j, k].solution_value() > 0]

report_bd = pandas.DataFrame(report, columns=['channel', 'product', 'number', 'cost', 'revenue'])

print('Total revenue = %d' % (solver.Objective().Value()))
print('Total budget  = %d' % (report_bd['cost'].sum()) )
print('Time = ', elapsed_time, " seconds.")
display(report_bd)

Total revenue = 445
Total budget  = 460
Time =  0.011603999999999948  seconds.


Unnamed: 0,channel,product,number,cost,revenue
0,seminar,Car loan,12.0,276.0,162.525834
1,seminar,Savings,2.0,46.0,73.769771
2,seminar,Mortgage,4.0,92.0,128.514192
3,seminar,Pension,2.0,46.0,80.202494
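
At this size the aggregated model is small enough to cross-check by brute force. The sketch below is an independent pure-Python enumeration, not the OR-Tools model: it assumes a budget of 500 (the unscaled figure; the scaled budget is not printed above), uses the mean probabilities from cell `In [15]`, and relies on the fact that shares summing to 1 force each product count to equal its share of the total. `splits` and `brute_force` are hypothetical helper names.

```python
from itertools import product as cartesian

product_value = [100, 200, 300, 400]
budget_share = [0.6, 0.1, 0.2, 0.1]
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]
mean_prob = [0.451461, 0.614748, 0.356984, 0.334177]  # from the notebook output


def splits(n, parts):
    """Yield every way to split n offers across `parts` channels."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in splits(n - first, parts - 1):
            yield (first,) + rest


def brute_force(num_customers, budget, tol=1e-9):
    """Enumerate all feasible allocations and return (best value, best allocation)."""
    best_value, best_alloc = float('-inf'), None
    for total in range(num_customers + 1):
        # Shares sum to 1, so every product count must equal share * total exactly.
        counts = [round(s * total) for s in budget_share]
        if sum(counts) != total or any(
                abs(c - s * total) > 1e-6 for c, s in zip(counts, budget_share)):
            continue
        for alloc in cartesian(*(splits(c, len(cost)) for c in counts)):
            spend = sum(row[k] * cost[k] for row in alloc for k in range(len(cost)))
            if spend > budget + tol:
                continue
            value = sum(alloc[j][k] * factor[k] * product_value[j] * mean_prob[j]
                        for j in range(len(alloc)) for k in range(len(cost)))
            if value > best_value:
                best_value, best_alloc = value, alloc
    return best_value, best_alloc


value, alloc = brute_force(num_customers=27, budget=500)
print('Best value = %.2f' % value)
print('Allocation (rows = products, columns = channels):', alloc)
```

Under the assumed budget, the enumeration arrives at the same 12/2/4/2 seminar-only allocation and a revenue of about 445, matching the solver report above.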


In [13]:
product_probs_mean*productValue

Car loan     45.146065
Savings     122.949619
Mortgage    107.095160
Pension     133.670824
dtype: float64

In [14]:
productValue

[100, 200, 300, 400]

In [15]:
product_probs_mean

Car loan    0.451461
Savings     0.614748
Mortgage    0.356984
Pension     0.334177
dtype: float64

In [16]:
cost

[20, 15, 23]