# Does MIP scale?

Now that I have translated IBM's problem to the Google OR-Tools framework, will it fall over at some point as the problem grows?

This example uses a file of 10,000 customers, compared with 27 in the original example. Where appropriate (for example, the marketing budget), I scale quantities up by a factor.
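Concretely, the notebook computes the factor as the sample size divided by the original 27 customers (`adjustment_factor` in the data-loading cell) and multiplies the \$500 budget by it. A minimal sketch in plain Python of the scaled budget for each size in the table below; `scaled_budget` is a hypothetical helper, not part of the notebook.

```python
# Sketch: the original budget is scaled linearly with the number of customers,
# mirroring adjustment_factor = n_obs / n_obs_original in the notebook.
ORIGINAL_SIZE = 27      # customers in IBM's original example
ORIGINAL_BUDGET = 500   # dollars available for those 27 customers

def scaled_budget(n_customers):
    """Hypothetical helper: marketing budget scaled to n_customers."""
    adjustment_factor = n_customers / ORIGINAL_SIZE
    return ORIGINAL_BUDGET * adjustment_factor

for size in [27, 50, 75, 85, 90]:
    print(size, round(scaled_budget(size), 2))
```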

| Run | Customers | Time (s) | Revenue |
|-----|-----------|----------|---------|
|   1 |  27       |  0.9     |     603 |
|   2 |  50       |  1.8     |   1 283 |
|   3 |  75       |  0.25    |   1 975 |
|   4 |  85       |  0.55    |   2 227 |
|   5 |  90       |   --     |   --    |

As the table shows, the solver failed to complete at size 90.

In [1]:
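from __future__ import print_function  # must come before any other import in the cell
import time
from ortools.linear_solver import pywraplp

n_obs_new = 75  # number of customers to take from the 10,000-row file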

We have four product types:

  * car loan
  * savings
  * mortgage
  * pension

Each product has a `productValue`: the average revenue obtained when a customer takes up the product. To spread marketing fairly across the offers, each product is also allocated a `budgetShare`.

In [2]:
products = ['Car loan', 'Savings', 'Mortgage', 'Pension']
productValue = [100, 200, 300, 400]
budgetShare = [0.6, 0.1, 0.2, 0.1]
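The `budgetShare` balance constraint (commented out in the constraints cell below) caps the number of offers for product $j$ at `budgetShare[j]` times the total number of offers. A minimal sketch in plain Python of those caps for a hypothetical total of 60 offers; `share_caps` is a hypothetical helper, not part of the notebook.

```python
import math

products = ['Car loan', 'Savings', 'Mortgage', 'Pension']
budgetShare = [0.6, 0.1, 0.2, 0.1]

def share_caps(total_offers, shares):
    """Hypothetical helper: largest whole number of offers each product may
    receive when its count must not exceed share * total_offers."""
    # The small tolerance guards against floating-point products such as
    # 5.999999999 that should count as 6.
    return [math.floor(s * total_offers + 1e-9) for s in shares]

caps = share_caps(60, budgetShare)
print(dict(zip(products, caps)))
```

Since each customer receives at most one offer, 75 customers allow at most 75 offers, so a 0.1 share caps Savings and Pension at 7 offers each. That is below the `product_min` of 8 set in the constraints cell, so enabling both the balance constraint and the product minimums at this size would make the model infeasible.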

Each of these products can be offered over one of the following channels:

  * gift
  * newsletter
  * seminar

Each channel has a different cost and a different _influence factor_. The influence factor weights the estimated value of a customer's response.

In [3]:
channels = ['gift', 'newsletter', 'seminar']
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]
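Dividing each channel's influence factor by its cost gives the influence bought per dollar, which hints at why the seminar channel dominates the solution further down. A minimal sketch in plain Python; the second part weights a single offer, using Yolanda Vasquez's Savings probability (0.580556) from the sample rows shown below.

```python
channels = ['gift', 'newsletter', 'seminar']
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]

# Influence bought per dollar spent on each channel.
per_dollar = {ch: f / c for ch, f, c in zip(channels, factor, cost)}
for ch, ratio in sorted(per_dollar.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ch:10s} {ratio:.4f}")

# Weighted value of one offer: factor[k] * productValue[j] * p_ij.
# Example: a Savings offer (value 200) by seminar (k = 2) to a customer
# whose take-up probability is 0.580556.
print(round(factor[2] * 200 * 0.580556, 4))
```

The weighted value, about 34.83, matches Yolanda Vasquez's row in the solver's report to rounding.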

Total spend must not exceed the available marketing budget of \$500 (for the original 27 customers).

In [4]:
availableBudget = 500

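Read in the offers data, originally from IBM and pivoted into one column per product. It gives each customer's probability of taking up each product.

Rather than using all 10,000 customers, first check that the model works on the first `n_obs_new` rows. The budget is scaled by the ratio of the sample size to the original 27 customers.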

In [5]:
import pandas

product_probs_orig = pandas.read_csv('offers_ibm_pivot.csv')
n_obs_original = product_probs_orig.shape[0]

product_probs = pandas.read_csv('sample_data_10000.csv')
# product_probs = product_probs[product_probs.index > product_probs.shape[0] - n_obs_new]
product_probs = product_probs[product_probs.index < n_obs_new]
n_obs = product_probs.shape[0]

# Scale the budget in proportion to the number of customers
adjustment_factor = n_obs/n_obs_original
availableBudget = availableBudget*adjustment_factor

product_probs.rename(columns={'Unnamed: 0': 'customerid'}, inplace=True)
product_probs.head()

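|   | customerid | name             | Car loan | Savings  | Mortgage | Pension  |
|---|------------|------------------|----------|----------|----------|----------|
| 0 | 0          | Matthew Harvey   | 0.0      | 0.0      | 0.0      | 0.0      |
| 1 | 1          | Joshua Wilcox    | 0.0      | 0.0      | 0.179932 | 0.0      |
| 2 | 2          | Yolanda Vasquez  | 0.330731 | 0.580556 | 0.0      | 0.0      |
| 3 | 3          | Jessica Alvarado | 0.0      | 0.630242 | 0.509746 | 0.0      |
| 4 | 4          | Gregory Martinez | 0.0      | 0.320511 | 0.0      | 0.288832 |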


Instantiate the solver as a mixed-integer programming (MIP) problem, using the CBC backend.

In [6]:
solver = pywraplp.Solver('SolveCampaignProblem', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)

Define the binary decision variables $x_{ijk}$: $x_{ijk} = 1$ if customer $i$ is offered product $j$ over channel $k$, and $0$ otherwise.

In [7]:
num_customers = product_probs.shape[0]
num_products = len(products)
num_channels = len(channels)

x = {}

for i in range(num_customers):
    for j in range(num_products):
        for k in range(num_channels):
            x[i, j, k] = solver.IntVar(0, 1, 'x[%i,%i,%i]' % (i, j, k))
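The model grows linearly with the number of customers: one binary variable per (customer, product, channel) combination, plus one "at most one offer" constraint per customer and a fixed handful of budget and minimum-count constraints. A minimal sketch of those counts; `num_binary_vars` and `num_constraints` are hypothetical helpers, and the constraint count assumes the constraints cell below with the balance constraint disabled. Note that model size alone does not tell you how hard CBC finds the problem, since branch-and-bound effort can grow much faster than the variable count.

```python
num_products, num_channels = 4, 3

def num_binary_vars(num_customers):
    """One 0/1 variable x[i, j, k] per (customer, product, channel)."""
    return num_customers * num_products * num_channels

def num_constraints(num_customers):
    """One-offer-per-customer rows, one budget row, and one minimum-count
    row per channel and per product (balance constraint assumed disabled)."""
    return num_customers + 1 + num_channels + num_products

for n in [27, 75, 90, 10_000]:
    print(n, num_binary_vars(n), num_constraints(n))
```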

### Set up the constraints

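  1. Offer at most one product per customer.
  2. Do not exceed the budget.
  3. Balance the offers/customers among products (commented out in this run).
  4. Make at least `channel_min` offers over each channel.
  5. Make at least `product_min` offers of each product.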

In [8]:
## Offer at most one product per customer
for i in range(num_customers):
    solver.Add(solver.Sum([x[i, j, k]
                           for j in range(num_products)
                           for k in range(num_channels)
                           ]) <= 1)

## Do not exceed the (scaled) budget
solver.Add(solver.Sum([x[i, j, k]*cost[k]
                       for i in range(num_customers)
                       for j in range(num_products)
                       for k in range(num_channels)
                       ]) <= availableBudget)

## Balance the offers/customers among products (disabled for this run)
# for j in range(num_products):
#     solver.Add(solver.Sum([x[i, j, k]
#                            for i in range(num_customers)
#                            for k in range(num_channels)
#                            ]) <= budgetShare[j]*solver.Sum([x[i, jj, k]
#                                                             for i in range(num_customers)
#                                                             for jj in range(num_products)
#                                                             for k in range(num_channels)
#                                                             ]))

## Minimum number of offers per channel
channel_min = 5

for k in range(num_channels):
    solver.Add(solver.Sum([x[i, j, k]
                           for i in range(num_customers)
                           for j in range(num_products)
                           ]) >= channel_min)

## Minimum number of offers per product
product_min = 8

for j in range(num_products):
    solver.Add(solver.Sum([x[i, j, k]
                           for i in range(num_customers)
                           for k in range(num_channels)
                           ]) >= product_min)
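Before solving, a quick check shows whether the minimum-count constraints can be met at all. Every channel needs `channel_min` offers and every product `product_min`, so at least max(3 * 5, 4 * 8) = 32 offers are required; the cheapest way to make them is `channel_min` offers per channel with the remainder on the cheapest channel. A sketch of that bound in plain Python (`min_offers_and_cost` and `passes_bound` are hypothetical helpers); it covers only the constraints active in the cell above and ignores the take-up probabilities.

```python
def min_offers_and_cost(cost, num_products, channel_min, product_min):
    """Lower bounds implied by the minimum-count constraints.

    At least max(num_channels * channel_min, num_products * product_min)
    offers are needed. The cheapest way to reach that count is channel_min
    offers on every channel plus the remainder on the cheapest channel.
    """
    num_channels = len(cost)
    min_offers = max(num_channels * channel_min, num_products * product_min)
    extra = min_offers - num_channels * channel_min
    min_cost = channel_min * sum(cost) + extra * min(cost)
    return min_offers, min_cost

min_offers, min_cost = min_offers_and_cost([20, 15, 23], 4, channel_min=5, product_min=8)
print(min_offers, min_cost)

def passes_bound(n_customers, budget):
    """Necessary conditions: enough customers (one offer each) and budget."""
    return n_customers >= min_offers and budget >= min_cost

print(passes_bound(75, 500 * 75 / 27), passes_bound(27, 500))
```

With the values in this cell the bound is 32 offers costing at least \$545, well inside the scaled budget of about \$1,389 for 75 customers. At the original 27 customers these minimums could not be met, since each customer receives at most one offer.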


### Express the objective

We want to maximize revenue $R$. Here $x_{ijk}$ denotes whether customer $i$ receives an offer for product $j$ over channel $k$, $f_k$ denotes the channel influence factor, $v_j$ the product value and $p_{ij}$ the probability that customer $i$ takes up product $j$.

$$\max R = \sum_{i,j,k} x_{ijk} \times f_k \times v_j \times p_{ij}$$
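The objective can be evaluated by hand for a candidate assignment, which is a useful sanity check on the solver's numbers. A minimal sketch in plain Python (no OR-Tools), using three customers from the sample rows shown earlier; `revenue` is a hypothetical helper. These three offers also appear in the solver's report below, with matching revenues to rounding.

```python
factor = {'gift': 0.2, 'newsletter': 0.05, 'seminar': 0.3}
productValue = {'Car loan': 100, 'Savings': 200, 'Mortgage': 300, 'Pension': 400}

# Take-up probabilities p_ij for three customers from the sample rows.
probs = {
    'Yolanda Vasquez':  {'Car loan': 0.330731, 'Savings': 0.580556, 'Mortgage': 0.0, 'Pension': 0.0},
    'Jessica Alvarado': {'Car loan': 0.0, 'Savings': 0.630242, 'Mortgage': 0.509746, 'Pension': 0.0},
    'Gregory Martinez': {'Car loan': 0.0, 'Savings': 0.320511, 'Mortgage': 0.0, 'Pension': 0.288832},
}

def revenue(assignment):
    """R = sum of f_k * v_j * p_ij over the offers actually made.

    assignment maps customer -> (product, channel); combinations with
    x_ijk = 0 are simply absent, so their terms drop out of the sum.
    """
    return sum(factor[ch] * productValue[prod] * probs[cust][prod]
               for cust, (prod, ch) in assignment.items())

plan = {'Yolanda Vasquez': ('Savings', 'seminar'),
        'Jessica Alvarado': ('Mortgage', 'seminar'),
        'Gregory Martinez': ('Pension', 'seminar')}
print(round(revenue(plan), 2))
```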

In [9]:

solver.Maximize(solver.Sum([x[i, j, k]*factor[k]*productValue[j]*product_probs[products[j]].iloc[i]
                           for i in range(num_customers)
                           for j in range(num_products)
                           for k in range(num_channels)]))

### Invoke the solver

In [10]:
# Invoke the solver
t = time.process_time()
sol = solver.Solve()
elapsed_time = time.process_time() - t
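The cell above measures CPU time with `time.process_time()`, which excludes time the process spends waiting. When a run appears to hang, wall-clock time from `time.perf_counter()` is also worth recording. A minimal sketch of a helper that reports both (`time_call` is a hypothetical name), demonstrated on a built-in function so it runs without OR-Tools:

```python
import time

def time_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, wall_s, cpu_s).

    time.process_time() counts CPU time of the current process only
    (it excludes sleep); time.perf_counter() measures elapsed wall time.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu

# In the notebook this would be: sol, wall, cpu = time_call(solver.Solve)
result, wall, cpu = time_call(sum, range(1_000_000))
print(result, round(wall, 3), round(cpu, 3))
```

For runs that do not finish, such as size 90, pywraplp's `solver.SetTimeLimit(...)` (in milliseconds) can cap the search; comparing `sol` with `pywraplp.Solver.OPTIMAL` or `pywraplp.Solver.FEASIBLE` then shows whether the returned solution is proven optimal or only the best found so far.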

Print out the solution: total revenue, budget used and time taken, followed by one row per offer made (channel, product, customer, cost and expected revenue).

In [11]:
report = [(channels[k], products[j], product_probs.loc[i, 'name'], x[i, j, k].solution_value()*cost[k],
          x[i, j, k].solution_value()*factor[k]*productValue[j]*product_probs[products[j]].iloc[i]) 
          for i in range(num_customers) 
          for j in range(num_products) 
          for k in range(num_channels)  if x[i, j, k].solution_value() > 0]

report_bd = pandas.DataFrame(report, columns=['channel', 'product', 'customer', 'cost', 'revenue'])

print('Total revenue = %d' % (solver.Objective().Value()))
print('Total budget  = %d' % (report_bd['cost'].sum()) )
print('Time = ', elapsed_time, " seconds.")
display(report_bd)

Total revenue = 3114
Total budget  = 1386
Time =  0.27495  seconds.


|   | channel    | product  | customer         | cost | revenue   |
|---|------------|----------|------------------|------|-----------|
| 0 | newsletter | Car loan | Joshua Wilcox    | 15.0 | 0.000000  |
| 1 | seminar    | Savings  | Yolanda Vasquez  | 23.0 | 34.833389 |
| 2 | seminar    | Mortgage | Jessica Alvarado | 23.0 | 45.877099 |
| 3 | seminar    | Pension  | Gregory Martinez | 23.0 | 34.659849 |
| 4 | seminar    | Pension  | Kathryn Maxwell  | 23.0 | 71.253490 |
| 5 | seminar    | Mortgage | Leah Nelson      | 23.0 | 57.169339 |
| 6 | newsletter | Car loan | Rebecca Ross     | 15.0 | 0.000000  |
| 7 | seminar    | Pension  | Angela Sanchez   | 23.0 | 75.004685 |
| 8 | seminar    | Mortgage | Michelle Roy     | 23.0 | 68.208093 |
| 9 | seminar    | Savings  | David Lewis      | 23.0 | 33.735030 |


In [12]:
report_bd.groupby(['channel', 'product']).count()

| channel    | product  | customer | cost | revenue |
|------------|----------|----------|------|---------|
| gift       | Car loan | 2        | 2    | 2       |
| gift       | Mortgage | 1        | 1    | 1       |
| gift       | Pension  | 2        | 2    | 2       |
| newsletter | Car loan | 6        | 6    | 6       |
| seminar    | Mortgage | 16       | 16   | 16      |
| seminar    | Pension  | 23       | 23   | 23      |
| seminar    | Savings  | 13       | 13   | 13      |
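These counts can be checked against the constraints by hand. A minimal sketch in plain Python using the numbers from the grouped counts above; the scaled budget for 75 customers is recomputed here as an assumption matching the notebook's `adjustment_factor`. The gift channel (5 offers) and car loans (8 offers) sit exactly at their minimums, which suggests those constraints are binding.

```python
from collections import Counter

# Offer counts per (channel, product) from the grouped counts above.
counts = {
    ('gift', 'Car loan'): 2, ('gift', 'Mortgage'): 1, ('gift', 'Pension'): 2,
    ('newsletter', 'Car loan'): 6,
    ('seminar', 'Mortgage'): 16, ('seminar', 'Pension'): 23, ('seminar', 'Savings'): 13,
}
cost = {'gift': 20, 'newsletter': 15, 'seminar': 23}
channel_min, product_min = 5, 8
available_budget = 500 * 75 / 27   # assumed: budget scaled to 75 customers

per_channel, per_product = Counter(), Counter()
for (ch, prod), n in counts.items():
    per_channel[ch] += n
    per_product[prod] += n

total_cost = sum(cost[ch] * n for (ch, _), n in counts.items())
print(dict(per_channel), dict(per_product), total_cost)
```

The total cost of 1,386 matches the "Total budget" line printed by the solver cell.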
