# Does CPLEX Community Studio scale?

The Google OR framework failed as I tried to scale. The CPLEX documentation warns that you cannot scale problems indefinitely without a paid subscription. The question is: at what size does it either fail or take too long?

This example draws on a file of 10,000 customers, compared with 27 in the original example. Where appropriate, inputs such as the budget are scaled up by the same factor.

| Run | Size | Time (s) | Value |
|-----|------|----------|-------|
|   1 |   27 |    0.009 |   436 |
|   2 |   50 |    0.05  |   925 |
|   3 |   75 |    0.13  |  1388 |
|   4 |   85 |    --    |   --  |

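Runs up to size 75 complete in well under a second. The run at size 85 fails to complete, most likely because the model then exceeds the size limit of the Community Edition.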
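
A rough size estimate makes the cut-off concrete. The sketch below assumes that the Community Edition rejects models with more than 1,000 variables or 1,000 constraints (the limit IBM documents for that edition) and counts variables and constraints the way the model later in this notebook builds them. The helper names `model_size` and `largest_feasible_size` are illustrative and not part of DOcplex.

```python
# Rough size estimate for the marketing model in this notebook.
# Assumption: Community Edition rejects models above 1,000 variables
# or 1,000 constraints.
COMMUNITY_LIMIT = 1000

def model_size(n_customers, n_products=4, n_channels=3):
    """Return (variables, constraints) for a given number of customers."""
    # One binary per (customer, product, channel), plus totaloffers and budgetSpent.
    n_vars = n_customers * n_products * n_channels + 2
    # One constraint per customer, two defining equalities,
    # one balance constraint per product, and one budget cap.
    n_cons = n_customers + 2 + n_products + 1
    return n_vars, n_cons

def largest_feasible_size(limit=COMMUNITY_LIMIT):
    """Largest customer count whose model stays within the assumed limit."""
    n = 0
    while max(model_size(n + 1)) <= limit:
        n += 1
    return n

for n in (27, 50, 75, 85):
    print(n, model_size(n))
print("largest size within the limit:", largest_feasible_size())
```

Under these assumptions, 75 customers give 902 variables and 82 constraints, which matches the `print_information()` output further down, while 85 customers give 1,022 variables.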

## How to make targeted offers to customers?

_This is an excerpt from the original file, kept for reference._

This tutorial includes everything you need to set up IBM Decision Optimization CPLEX Modeling for Python (DOcplex), build a Mathematical Programming model, and get its solution by solving the model with IBM ILOG CPLEX Optimizer.

When you finish this tutorial, you'll have a foundational knowledge of _Prescriptive Analytics_.

>This notebook is part of the [Prescriptive Analytics for Python](https://rawgit.com/IBMDecisionOptimization/docplex-doc/master/docs/index.html).

>It requires a valid subscription to **Decision Optimization on Cloud** or a **local installation of CPLEX Optimizers**.

Discover us [here](https://developer.ibm.com/docloud).


## Describe the business problem
* The Self-Learning Response Model (SLRM) node enables you to build a model that you can continually update. Such updates are useful in building a model that assists with predicting which offers are most appropriate for customers and the probability of the offers being accepted. These sorts of models are most beneficial in customer relationship management, such as marketing applications or call centers.
* This example is based on a fictional banking company. 
* The marketing department wants to achieve more profitable results in future campaigns by matching the right offer of financial services to each customer. 
* Specifically, the data science department has used previous offers and responses to identify the characteristics of customers who are most likely to respond favorably and to promote the best current offer. It now needs to compute the best offering plan.

A set of business constraints has to be respected:

* We have a limited budget to run a marketing campaign based on "gifts", "newsletter", "seminar"...
* We want to determine which is the best way to contact the customers.
* We need to identify which customers to contact.

## Prepare the data


In [90]:
import pandas as pd
import time

n_obs_new = 75

We have four product types:

  * car loan
  * savings
  * mortgage
  * pension
  
Each product has a different `productValue`: the revenue that can be obtained for the product on average. To get a fair representation of marketing across the various offers, each is allocated a `budgetShare`. 

In [91]:
products = ["Car loan", "Savings", "Mortgage", "Pension"]
productValue = [100, 200, 300, 400]
budgetShare = [0.6, 0.1, 0.2, 0.1]


Each of these products can be offered over one of the following channels:

  * gift
  * newsletter
  * seminar
  
Each of these channels has different costs, and each has a different _influence factor_. We use the influence to weight the estimated value of the response accordingly.

In [92]:
channels = ['gift', 'newsletter', 'seminar']
cost = [20, 15, 23]
factor = [0.2, 0.05, 0.3]

Total spending must not exceed the available marketing budget of $500.

In [93]:
availableBudget = 500

# channels =  pd.DataFrame(data=[("gift", 20.0, 0.20), ("newsletter", 15.0, 0.05), ("seminar", 23.0, 0.30)], columns=["name", "cost", "factor"])
# channels.head()

Read in the offers data (originally from IBM, then massaged). It gives, for each customer, the probability of accepting an offer for each product.

Rather than using all 10,000 rows, first test that it works on a smaller sample of `n_obs_new` rows.

In [94]:
import pandas

product_probs_orig = pandas.read_csv('offers_ibm_pivot.csv')
n_obs_original = product_probs_orig.shape[0]

product_probs = pandas.read_csv('sample_data_10000.csv')
# Keep only the first n_obs_new customers
# product_probs = product_probs[product_probs.index > product_probs.shape[0] - n_obs_new]
product_probs = product_probs[product_probs.index < n_obs_new]
n_obs = product_probs.shape[0]

# Scale the budget in proportion to the number of customers
adjustment_factor = n_obs/n_obs_original
availableBudget = availableBudget*adjustment_factor

product_probs.rename(columns={'Unnamed: 0': 'customerid'}, inplace=True)
product_probs.head()

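|   | customerid | name             | Car loan | Savings  | Mortgage | Pension  |
|---|------------|------------------|----------|----------|----------|----------|
| 0 | 0          | Matthew Harvey   | 0.0      | 0.0      | 0.0      | 0.0      |
| 1 | 1          | Joshua Wilcox    | 0.0      | 0.0      | 0.179932 | 0.0      |
| 2 | 2          | Yolanda Vasquez  | 0.330731 | 0.580556 | 0.0      | 0.0      |
| 3 | 3          | Jessica Alvarado | 0.0      | 0.630242 | 0.509746 | 0.0      |
| 4 | 4          | Gregory Martinez | 0.0      | 0.320511 | 0.0      | 0.288832 |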
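
The scaling step in the cell above keeps the budget per customer constant. Here is a small sketch of the arithmetic, assuming the original file has 27 rows as stated at the top of this notebook. `scaled_budget` is an illustrative helper, not part of the notebook's code.

```python
base_budget = 500
n_obs_original = 27   # rows in the original example (assumed from the introduction)

def scaled_budget(n_obs, base=base_budget, n_base=n_obs_original):
    """Scale the budget linearly with the number of customers."""
    return base * n_obs / n_base

for n in (27, 50, 75):
    print(n, round(scaled_budget(n), 2))
```

For 50 and 75 customers this gives roughly 925.93 and 1388.89, which line up with the values 925 and 1388 in the table at the top and suggest that the solver spends nearly the whole budget.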


Next, import DOcplex and create an empty model to hold the variables, constraints, and objective.

In [95]:
import sys
import docplex.mp
from docplex.mp.model import Model

mdl = Model(name="marketing_campaign")

#### Define the decision variables
- The binary decision variables `channelVars` represent whether or not a customer will be made an offer for a particular product via a particular channel.
- The integer decision variable `totaloffers` represents the total number of offers made.
- The continuous variable `budgetSpent` represents the total cost of the offers made.

In [96]:
num_customers = product_probs.shape[0]
num_products = len(products)
num_channels = len(channels)

offersR = range(0, num_customers)
productsR = range(0, num_products)
channelsR = range(0, num_channels)

channelVars = mdl.binary_var_cube(offersR, productsR, channelsR)
totaloffers = mdl.integer_var(lb=0)
budgetSpent = mdl.continuous_var()

#### Set up the constraints
- Offer at most one product per customer.
- Compute the budget and set a maximum on it.
- Compute the number of offers to be made.

In [97]:
# Only 1 product is offered to each customer     
mdl.add_constraints( mdl.sum(channelVars[o,p,c] for p in productsR for c in channelsR) <=1
                   for o in offersR)

mdl.add_constraint( totaloffers == mdl.sum(channelVars[o,p,c] 
                                           for o in offersR 
                                           for p in productsR 
                                           for c in channelsR) )

mdl.add_constraint( budgetSpent == mdl.sum(channelVars[o,p,c]*cost[c]
                                           for o in offersR 
                                           for p in productsR 
                                           for c in channelsR) )

docplex.mp.LinearConstraint[](_x902,EQ,20_x1+15_x2+23_x3+20_x4+15_x5+23_x6+20_x7+15_x8+23_x9+...

In [98]:
# Balance the offers among products   
for p in productsR:
    mdl.add_constraint( mdl.sum(channelVars[o,p,c] for o in offersR for c in channelsR) 
                       <= budgetShare[p] * totaloffers )
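
To see what the balancing constraint implies, the sketch below turns the shares into per-product caps for a given number of offers. It uses the 70 offers reported in the solution at the end of this notebook. `offer_caps` is an illustrative helper, not part of DOcplex.

```python
import math

products = ["Car loan", "Savings", "Mortgage", "Pension"]
budgetShare = [0.6, 0.1, 0.2, 0.1]

def offer_caps(total_offers, shares=budgetShare):
    """Largest whole number of offers each product may receive."""
    # The small epsilon guards against floating-point noise in share * total.
    return [math.floor(s * total_offers + 1e-9) for s in shares]

caps = offer_caps(70)
for name, cap in zip(products, caps):
    print(f"{name}: at most {cap} offers")
```

With 70 offers the caps add up to exactly 70, so each product has to hit its cap: the constraint acts as a fixed split rather than a loose upper bound in that case.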
            

In [99]:
# Do not exceed the budget
mdl.add_constraint( mdl.sum(channelVars[o,p,c]*cost[c] 
                            for o in offersR 
                            for p in productsR 
                            for c in channelsR)  <= availableBudget )  

mdl.print_information()

Model: marketing_campaign
 - number of variables: 902
   - binary=900, integer=1, continuous=1
 - number of constraints: 82
   - linear=82
 - parameters: defaults


#### Express the objective

We want to maximize the expected revenue.

In [100]:
mdl.maximize(
    mdl.sum(
        channelVars[i, j, k] * factor[k] * productValue[j] * product_probs[products[j]].iloc[i]
            for i in offersR
            for j in productsR
            for k in channelsR
    )
)
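
Each term of the objective is the expected revenue of one offer: influence factor × product value × acceptance probability. As a worked sketch, the snippet below evaluates that term for one customer and one product across the three channels, using Yolanda Vasquez's Savings probability from the data preview (0.580556). `expected_revenue` is an illustrative helper.

```python
channels = ['gift', 'newsletter', 'seminar']
factor = [0.2, 0.05, 0.3]
cost = [20, 15, 23]
savings_value = 200      # productValue for "Savings"
p_accept = 0.580556      # Yolanda Vasquez, "Savings" column

def expected_revenue(channel_factor, product_value, probability):
    """Objective coefficient of a single (customer, product, channel) offer."""
    return channel_factor * product_value * probability

for name, f, c in zip(channels, factor, cost):
    rev = expected_revenue(f, savings_value, p_accept)
    print(f"{name:10s} expected revenue {rev:6.2f}  cost {c}")
```

The seminar has the highest expected revenue for this customer but also the highest cost, which is the trade-off the solver resolves under the budget.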

#### Solve with the Decision Optimization solve service

If `url` and `key` are `None`, the modeling layer looks for a local runtime; otherwise it uses the credentials.
See the documentation for details on the various solving and generation modes.

If you're using a Community Edition of the CPLEX runtime, the solve stage may fail depending on the size of the problem. Larger problems need a paid subscription or a full product installation.

> I _am_ using a Community Edition.

In [102]:
url = None
key = None

t = time.process_time()

s = mdl.solve(url=url, key=key)

elapsed_time = time.process_time() - t

assert s, "No Solution !!!"
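
A note on the timing: `time.process_time()` counts the CPU time of the current process and ignores time spent sleeping or waiting, whereas `time.perf_counter()` measures wall-clock time. The two can differ noticeably, so it helps to know which one a reported figure refers to. The sketch below uses `time.sleep` as a stand-in for a call that mostly waits; `timed` is an illustrative helper.

```python
import time

def timed(fn):
    """Run fn and return (wall_seconds, cpu_seconds)."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    fn()
    return time.perf_counter() - wall0, time.process_time() - cpu0

# time.sleep stands in for a solve that spends its time waiting.
wall, cpu = timed(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```

For a solve that runs on several threads the relation can go the other way, because CPU time is summed over all threads of the process.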

### Analyze the solution

First, let's display the **Optimal Marketing Channel per customer**.

In [103]:
report = [(channels[c], products[p], product_probs.loc[o, 'name']) 
          for c in channelsR 
          for p in productsR 
          for o in offersR  if channelVars[o,p,c].solution_value > 0.5]  # solution values are floats

assert len(report) == round(totaloffers.solution_value)

print("Marketing plan has {0} offers costing {1}".format(totaloffers.solution_value, budgetSpent.solution_value))
print('Time = ', elapsed_time, " seconds.")

report_bd = pd.DataFrame(report, columns=['channel', 'product', 'customer'])
display(report_bd)

Marketing plan has 70.0 offers costing 1388.0
Time =  0.15585399999999971  seconds.


|   | channel    | product  | customer         |
|---|------------|----------|------------------|
| 0 | gift       | Car loan | Yolanda Vasquez  |
| 1 | gift       | Car loan | Kristin Lewis    |
| 2 | newsletter | Car loan | Gregory Martinez |
| 3 | newsletter | Car loan | Shane Scott      |
| 4 | newsletter | Car loan | Rebecca Ross     |
| 5 | newsletter | Car loan | Jillian Cook     |
| 6 | newsletter | Car loan | Douglas Marsh    |
| 7 | newsletter | Car loan | Jessica Daniel   |
| 8 | newsletter | Car loan | Robert Shelton   |
| 9 | newsletter | Car loan | Kyle Gordon      |
