# Optimization

Using real data from a fundraising organisation, this workbook shows a prototype of marketing optimization. It works in conjunction with an Excel spreadsheet used as the presentation layer and the user interface – `01-optimize-nswcc.xlsm`.

## To do

  * Move the configuration settings to a file (YAML) or Excel.
  * Configure mapping for the customer file to the standard fields:
    - `customerid`
    - `value`
    - `response` (optional)
  * Define a threshold (say 20,000 rows) above which the customer file is read externally rather than from the Excel tab.
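
A minimal sketch of how the column mapping and the row threshold from the list above might be held in one configuration object. The names `OptimizerConfig`, `column_map` and `standardize_columns` are hypothetical, as are the source column names; a YAML file loaded into a dictionary could populate the same fields.

```python
from dataclasses import dataclass, field


@dataclass
class OptimizerConfig:
    """Hypothetical settings container; the field names are illustrative only."""
    # maps column names in the customer file to the standard field names
    column_map: dict = field(default_factory=dict)
    # above this many rows, read the customer file externally instead of from the Excel tab
    customer_records_threshold: int = 20000


REQUIRED_FIELDS = {"customerid", "value"}  # `response` is optional


def standardize_columns(columns, config):
    """Rename columns to the standard fields and check that the required ones exist."""
    renamed = [config.column_map.get(c, c) for c in columns]
    missing = REQUIRED_FIELDS - set(renamed)
    if missing:
        raise ValueError("Missing required fields: %s" % sorted(missing))
    return renamed


config = OptimizerConfig(column_map={"constituent_id": "customerid",
                                     "expected_value": "value"})
print(standardize_columns(["constituent_id", "product", "expected_value"], config))
```

Keeping the required fields in one place means a badly mapped file fails early, before any optimization is attempted.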
  
## Interface

After talking with Nadav, the aim is to make the interface simple, even wizard-driven.

  1. Set budget min/max (all optional)
  
      1. overall
      2. by channel
      3. by product
      
  2. Select offers
     
     1. Pull in associated probabilities or values for the offers. Offers are linked to products.
  
  3. Select channels
  
    1. costs
    2. adjustment factors for the probabilities or values above
    
  4. Set exclusions
  
  
## Data requirements

  * Product information
  * Channel information
  * Customer expected value for a product–channel offer
  
## Set up the project

In [1]:
import xlwings as xw
import pandas as pd
from ortools.linear_solver import pywraplp
import numpy as np
import os

sample_size = 70 # initial sample to optimize on
number_of_samples = 10 # not used yet

wb = xw.Book(r'myproject/01-optimize-nswcc.xlsm')  # connect to an existing file 

trace = False # if true, print out diagnostic information

## Read in product and channel information

Read in customer, product and channel information from Excel.

As the customer data is comparatively large, we will read it from a `.csv` then load it into the Excel spreadsheet.

> **To do:** Map the input files to the key types of field:

  * Customer key
  * Product/offer name (in the product tab)
  * Response or actual (for testing and validation)


In [2]:
customer_data_filename = "constituent_value.csv"
customer_data_directory = "/Users/jamespearce/repos/simu-late"
customer_df = pd.read_csv(customer_data_filename)
customer_df.columns = ["customerid", "response", "product", "value"]

Write to the customer tab.

In [3]:
customer_sheet = wb.sheets['customer_data']
customer_records_threshold = 20000

if customer_df.shape[0] <= customer_records_threshold:
    customer_sheet.range('A1').value = customer_df
    customer_in_tab = True
else:
    customer_sheet.range('A1').value = os.path.join(customer_data_directory, customer_data_filename)
    customer_in_tab = False

Read in the data from the Excel tabs.

In [4]:
product_sheet = wb.sheets['products']
channel_sheet = wb.sheets['channels']
scenario_sheet = wb.sheets['Scenario A']

In [5]:
product_probs_all = customer_sheet.range('A1').options(pd.DataFrame, expand='table').value if customer_in_tab else customer_df.copy()
products_df = product_sheet.range('A1').options(pd.DataFrame, expand='table').value

# is the value of products in the table or in customer
product_in_customer = True

products = products_df.index
if not(product_in_customer):
    productValue = products_df.loc[:, 'value'] 

channels_df = channel_sheet.range('A1').options(pd.DataFrame, expand='table').value
channels = channels_df.index
cost = channels_df['cost']
factor = channels_df['factor']

sample_scaling = sample_size/product_probs_all.shape[0]

In [6]:
product_value_list = [product_probs_all.copy() for j in channels]

In [7]:
for j, channel in enumerate(channels):
#     print(j, channel)
    product_value_list[j]['channel'] = channel
# product_value_list
product_value = pd.concat(product_value_list)
# product_value.set_index(['customerid', 'product', 'channel'], inplace=True)


Update `product_value` so that all revenue calculations exist at the customer level. This means merging the channel `cost` and `factor` columns in alongside the `value` column.

In [8]:
product_value = product_value.merge(channels_df[['cost', 'factor']], left_on='channel', right_index=True, copy=True)
product_value.set_index(['customerid', 'product', 'channel'], inplace=True)

In [9]:
# product_value.head()

In [10]:
## Multi-index slicing
idx = pd.IndexSlice
# product_value.loc[idx[1,:,:], idx[:]].head()

> **Note**: because of the sparse notation, the constraints and the objective calculation need to handle missing values rather than zero values.
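
One way to handle the missing combinations, sketched in plain Python with a dictionary standing in for the indexed data frame. The keys and numbers are invented, and `objective_terms` is a hypothetical helper rather than part of this workbook.

```python
# Sparse table of expected values keyed by (customer, product, channel).
# The keys and numbers are made up for illustration.
value = {
    (1, "Tax Appeal", "SMS"): 2.8,
    (1, "Tax Appeal", "DM"): 3.1,
    (2, "Tax Appeal", "SMS"): 0.9,
}


def objective_terms(customers, products, channels, value):
    """Yield (key, coefficient) only for combinations that have a value."""
    for i in customers:
        for j in products:
            for k in channels:
                v = value.get((i, j, k))  # None when the combination is absent
                if v is not None:
                    yield (i, j, k), v


terms = dict(objective_terms([1, 2], ["Tax Appeal"], ["SMS", "DM"], value))
print(len(terms))
```

Skipping absent combinations, rather than treating them as zero, also keeps the number of decision variables down.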

In [11]:
# put some checks/tests in place
if any(products.isnull()):
    print("Null values exist in products")
    
if any(channels.isnull()):
    print("Null values exist in channels")
    
if any(cost.isnull()):
    print("Null values exist in channel cost")
    
if any(factor.isnull()):
    print("Null values exist in channel response multiplier (factor)")

Get the available marketing budget from the `Scenario` sheet.

In [12]:
budget_range = scenario_sheet.range('budgetConstraints').value
availableBudget_total = budget_range[1]
if availableBudget_total is None:
    availableBudget = None
    print("No budget constraints.")
else:
    availableBudget = availableBudget_total*sample_scaling # scale to sample_size for initial optimization
    print("Sampled available budget: %d" % availableBudget)


Sampled available budget: 32
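
The scaling step can be sketched as a small helper. The function name `scale_to_sample` and the figures are hypothetical. Note that the notebook divides by the number of rows in `product_probs_all`, which matches the number of customers only if each customer appears once.

```python
def scale_to_sample(total, sample_size, population_size):
    """Scale a population-level quantity down to the sample, keeping None as 'no limit'."""
    if total is None:
        return None
    return total * sample_size / population_size


# Hypothetical figures: a budget of 10,000 over 1,000 rows, optimized on a sample of 70.
print(scale_to_sample(10000, 70, 1000))
print(scale_to_sample(None, 70, 1000))
```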


Create a sample of size `sample_size` for the initial optimization.


In [13]:
num_products = len(products)
random_seed = 2058
np.random.seed(random_seed)
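# note: np.random.choice samples with replacement by default, so IDs can repeat; pass replace=False for distinct customers
customerid_sample = np.random.choice(product_probs_all['customerid'].unique(), sample_size)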
product_probs = product_probs_all[product_probs_all['customerid'].isin(customerid_sample)]
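
A sketch of drawing a reproducible sample of distinct customer IDs, using only the standard library. The helper `sample_customer_ids` and the IDs are invented for illustration; this shows the idea of sampling without replacement and is not the sampling call used in the notebook.

```python
import random


def sample_customer_ids(customer_ids, sample_size, seed):
    """Draw distinct customer IDs; random.sample never repeats an element."""
    unique_ids = sorted(set(customer_ids))
    rng = random.Random(seed)
    return rng.sample(unique_ids, min(sample_size, len(unique_ids)))


ids = [101, 102, 102, 103, 104, 105, 105, 106]
sample = sample_customer_ids(ids, 4, seed=2058)
print(sample)
```

With distinct IDs, the number of sampled customers always equals the requested sample size, as long as the population is large enough.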

Instantiate the solver as an MIP problem.

In [14]:
solver = pywraplp.Solver('SolveCampaignProblem', pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
solver.Clear()

Count the customers, products and channels, then define a binary decision variable $x_{ijk}$ for each customer $i$, product $j$ and channel $k$.

In [15]:
num_customers = len(product_probs['customerid'].unique())
num_channels = len(channels)

x = {}

for i in range(num_customers):
    for j in range(num_products):
        for k in range(num_channels):
            x[i, j, k] = solver.IntVar(0, 1, 'x[%i,%i,%i]' % (i, j, k))

print('Number of customers: %d' % num_customers)
print('Number of products: %d' % num_products)
print('Number of channels: %d' % num_channels)

Number of customers: 70
Number of products: 4
Number of channels: 4


## Set up the constraints

  1. Offer only one product per customer. 
  
      _(**TO DO:** update this.)_  Can be done trivially by expanding out the 
     different combinations of products and restricting _those_ to one per 
     customer.
     
  2. Adhere to budget, channel and product constraints from the Excel spreadsheet.
  
  3. Adhere to the constraints on the number of offers.
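
To make the first two constraints concrete, here is a toy instance solved by brute force rather than by the MIP solver. All values, costs and the budget are invented, and enumeration only works at this tiny scale.

```python
from itertools import product as cartesian

# Toy data, invented for illustration: 2 customers, 2 products, 2 channels.
value = {  # expected value of offering product j to customer i over channel k
    (0, 0, 0): 5.0, (0, 0, 1): 3.0, (0, 1, 0): 4.0, (0, 1, 1): 2.0,
    (1, 0, 0): 1.0, (1, 0, 1): 0.5, (1, 1, 0): 6.0, (1, 1, 1): 3.5,
}
cost = [2.0, 0.5]  # cost per offer on each channel
budget = 2.5
max_offers_per_customer = 1

keys = sorted(value)
best_value, best_x = -1.0, None
for bits in cartesian([0, 1], repeat=len(keys)):  # every 0/1 assignment of x[i, j, k]
    x = dict(zip(keys, bits))
    # constraint 1: at most one offer per customer
    if any(sum(x[i, j, k] for (i, j, k) in keys if i == c) > max_offers_per_customer
           for c in (0, 1)):
        continue
    # constraint 2: total channel cost within budget
    if sum(x[i, j, k] * cost[k] for (i, j, k) in keys) > budget:
        continue
    total = sum(x[key] * value[key] for key in keys)
    if total > best_value:
        best_value, best_x = total, x

chosen = sorted(key for key, on in best_x.items() if on)
print(best_value, chosen)
```

The budget rules out sending both customers the expensive channel, so the best feasible plan mixes one offer on each channel.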
  


In [16]:
## offer only one product per customer
max_offers_per_customer = 1
for i in range(num_customers):
    solver.Add(solver.Sum([x[i, j, k] 
                           for j in range(num_products)
                           for k in range(num_channels)
                          ]) <= max_offers_per_customer) 

## Do not exceed the budget
### channel-specific costs; later -- include in customer-level calculations
if availableBudget is not None:
    solver.Add(solver.Sum([x[i, j, k]*cost[k]
                           for i in range(num_customers)
                           for j in range(num_products)
                           for k in range(num_channels)
                          ]) <= availableBudget)

### Get the channel constraints

Adjust the constraints for the sample size.

In [17]:
channels_df['minimum offers adjusted'] = channels_df['minimum offers']*sample_scaling
channels_df['maximum offers adjusted'] = channels_df['maximum offers']*sample_scaling
channels_df['minimum expenditure adjusted'] = channels_df['minimum expenditure']*sample_scaling
channels_df['maximum expenditure adjusted'] = channels_df['maximum expenditure']*sample_scaling
channels_df['minimum revenue adjusted'] = channels_df['minimum revenue']*sample_scaling
channels_df['maximum revenue adjusted'] = channels_df['maximum revenue']*sample_scaling
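
Because each $x_{ijk}$ is 0 or 1, a count of offers is always a whole number, so a fractional scaled bound behaves like its rounded value. The sketch below uses hypothetical limits and a hypothetical helper `integer_bounds` to show how a small sample can turn a reasonable pair of limits into an impossible one, which may be worth checking when the solver reports infeasibility.

```python
import math


def integer_bounds(minimum, maximum, scaling):
    """Scale offer-count bounds to the sample and show their effect on an integer count."""
    # a whole-number count >= 0.175 must be at least 1
    lo = None if minimum is None else math.ceil(minimum * scaling)
    # a whole-number count <= 0.875 must be 0
    hi = None if maximum is None else math.floor(maximum * scaling)
    return lo, hi


# Hypothetical limits: between 1,000 and 5,000 offers across 400,000 customers, sample of 70.
scaling = 70 / 400000
print(integer_bounds(1000, 5000, scaling))
```

Here the effective lower bound (1) exceeds the effective upper bound (0), so no assignment can satisfy both.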

### Set the channel constraints

In [18]:
# minimum number of offers per channel
if channels_df['minimum offers adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'minimum offers adjusted']):
            solver.Add(solver.Sum([x[i, j, k]
                for i in range(num_customers)
                for j in range(num_products)
                ]) >= channels_df.loc[channels[k], 'minimum offers adjusted'])

# maximum number of offers per channel
if channels_df['maximum offers adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'maximum offers adjusted']):
            solver.Add(solver.Sum([x[i, j, k]
                for i in range(num_customers)
                for j in range(num_products)
                ]) <= channels_df.loc[channels[k], 'maximum offers adjusted'])

# minimum expenditure per channel
if channels_df['minimum expenditure adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'minimum expenditure adjusted']):
            solver.Add(solver.Sum([x[i, j, k]*cost[k]
                for i in range(num_customers)
                for j in range(num_products)
                ]) >= channels_df.loc[channels[k], 'minimum expenditure adjusted'])

# maximum expenditure per channel
if channels_df['maximum expenditure adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'maximum expenditure adjusted']):
            solver.Add(solver.Sum([x[i, j, k]*cost[k]
                for i in range(num_customers)
                for j in range(num_products)
                ]) <= channels_df.loc[channels[k], 'maximum expenditure adjusted'])

# minimum revenue per channel
if channels_df['minimum revenue adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'minimum revenue adjusted']):
            print('Minimum revenue %d for %s' % (channels_df.loc[channels[k], 'minimum revenue adjusted'], 
                                                channels[k]))
            solver.Add(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']
                for i in range(num_customers)
                for j in range(num_products)
                ]) >= channels_df.loc[channels[k], 'minimum revenue adjusted'])

# maximum revenue per channel
if channels_df['maximum revenue adjusted'].notnull().any():
    for k in range(num_channels):
        if pd.notnull(channels_df.loc[channels[k], 'maximum revenue adjusted']):
            print('Maximum revenue %d (%d) for %s' % (channels_df.loc[channels[k], 'maximum revenue adjusted'], 
                                                      channels_df.loc[channels[k], 'maximum revenue'],
                                                channels[k]))
            solver.Add(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']
                for i in range(num_customers)
                for j in range(num_products)
                ]) <= channels_df.loc[channels[k], 'maximum revenue adjusted'])

### Get the product constraints

Adjust the constraints for the sample size.

In [19]:
products_df['minimum offers adjusted'] = products_df['minimum offers']*sample_scaling
products_df['maximum offers adjusted'] = products_df['maximum offers']*sample_scaling
products_df['minimum expenditure adjusted'] = products_df['minimum expenditure']*sample_scaling
products_df['maximum expenditure adjusted'] = products_df['maximum expenditure']*sample_scaling
products_df['minimum revenue adjusted'] = products_df['minimum revenue']*sample_scaling
products_df['maximum revenue adjusted'] = products_df['maximum revenue']*sample_scaling


### Set the product constraints

In [20]:
# minimum number of offers per product
if products_df['minimum offers adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'minimum offers adjusted']):
            solver.Add(solver.Sum([x[i, j, k]
                for i in range(num_customers)
                for k in range(num_channels)
                ]) >= products_df.loc[products[j], 'minimum offers adjusted'])

# maximum number of offers per product
if products_df['maximum offers adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'maximum offers adjusted']):
            solver.Add(solver.Sum([x[i, j, k]
                for i in range(num_customers)
                for k in range(num_channels)
                ]) <= products_df.loc[products[j], 'maximum offers adjusted'])

# minimum expenditure per product
if products_df['minimum expenditure adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'minimum expenditure adjusted']):
            solver.Add(solver.Sum([x[i, j, k]*cost[k]
                for i in range(num_customers)
                for k in range(num_channels)
                ]) >= products_df.loc[products[j], 'minimum expenditure adjusted'])

# maximum expenditure per product
if products_df['maximum expenditure adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'maximum expenditure adjusted']):
            solver.Add(solver.Sum([x[i, j, k]*cost[k]
                for i in range(num_customers)
                for k in range(num_channels)
                ]) <= products_df.loc[products[j], 'maximum expenditure adjusted'])

# Is this causing the infeasible solution?

# minimum revenue per product
if products_df['minimum revenue adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'minimum revenue adjusted']):
            print('Minimum revenue %d (%d) for %s' % (products_df.loc[products[j], 'minimum revenue adjusted'], 
                                                      products_df.loc[products[j], 'minimum revenue'],
                                                products[j]))

            solver.Add(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']
                                  for i in range(num_customers)
                                  for k in range(num_channels)
                                  ]) >= products_df.loc[products[j], 'minimum revenue adjusted'])
            
# maximum revenue per product
if products_df['maximum revenue adjusted'].notnull().any():
    for j in range(num_products):
        if pd.notnull(products_df.loc[products[j], 'maximum revenue adjusted']):
            print('Maximum revenue %d (%d) for %s.' % (products_df.loc[products[j], 'maximum revenue adjusted'], 
                                                      products_df.loc[products[j], 'maximum revenue'],
                                                products[j]))
            solver.Add(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']
                for i in range(num_customers)
                for k in range(num_channels)
                ]) <= products_df.loc[products[j], 'maximum revenue adjusted'])

## Set the _objective function_

Set to maximise the revenue $R$. Here $x_{ijk}$ denotes whether customer $i$ receives an offer for product $j$ over channel $k$, $f_k$ denotes the channel adjustment factor, $v_j$ the product value and $p_{ij}$ the probability that customer $i$ takes up product $j$.

$ \max R = \sum_{ijk} x_{ijk} \times f_k \times v_j \times p_{ij}$
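
A worked instance of the formula on invented numbers, to show how the four factors combine for each selected offer. Only the symbols $x$, $f$, $v$ and $p$ come from the definition above; the values are hypothetical.

```python
# Invented toy values: 2 customers, 2 products, 2 channels.
f = [1.0, 0.5]      # channel adjustment factor f_k
v = [20.0, 50.0]    # product value v_j
p = [[0.10, 0.02],  # p[i][j]: probability customer i takes up product j
     [0.05, 0.08]]
# chosen offers: customer 0 gets product 0 on channel 0, customer 1 gets product 1 on channel 1
x = {(0, 0, 0): 1, (1, 1, 1): 1}

# R = sum over selected (i, j, k) of x_ijk * f_k * v_j * p_ij
R = sum(x_ijk * f[k] * v[j] * p[i][j] for (i, j, k), x_ijk in x.items())
print(R)
```

Each selected offer contributes 2.0 here: $1.0 \times 20 \times 0.10$ for the first customer and $0.5 \times 50 \times 0.08$ for the second.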


> At some point, need to be able to specify 
  1. What to optimize, and 
  2. Whether to maximise or minimise.  

> At the moment we maximise revenue. We could instead maximise profit, minimise budget or ~~maximise ROI.~~
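
One possible shape for that choice is a function returning the per-offer coefficient and the direction of optimization. The name `objective_term` and the `"max"`/`"min"` labels are hypothetical, not part of the workbook or of OR-Tools.

```python
def objective_term(optimize, value, cost):
    """Return (coefficient, sense) for one offer under the chosen objective."""
    if optimize == "Revenue":
        return value, "max"
    if optimize == "Profit":
        return value - cost, "max"  # cost is only incurred when the offer is made
    if optimize == "Expenditure":
        return cost, "min"
    raise ValueError("Unknown objective: %r" % optimize)


print(objective_term("Profit", value=3.5, cost=0.5))
```

The coefficient would multiply $x_{ijk}$ in the objective, so for profit the cost sits inside the term for each offer rather than being subtracted once per combination.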

In [21]:
#    solver.Minimize(solver.Sum([cost[i][j] * x[i, j] for i in range(num_workers)
#                                                     for j in range(num_tasks)]))
# optimize = 'Profit' # to do: get this from the interface
optimize = 'Revenue'
# optimize = 'Expenditure'
# require that the product_probs contain the value _already multiplied out_

if optimize == 'Revenue':
#     solver.Maximize(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value'].values[0]
    solver.Maximize(solver.Sum([x[i, j, k]*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']

                               for i in range(num_customers)
                               for j in range(num_products)
                               for k in range(num_channels)]))
elif optimize == 'Profit':
#     solver.Maximize(solver.Sum([x[i, j, k]*(product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value'].values[0]) - cost[k]
    solver.Maximize(solver.Sum([x[i, j, k]*(product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value'] - cost[k])

                               for i in range(num_customers)
                               for j in range(num_products)
                               for k in range(num_channels)]))
elif optimize == 'Expenditure':
    solver.Minimize(solver.Sum([x[i, j, k]*cost[k]
                               for i in range(num_customers)
                               for j in range(num_products)
                               for k in range(num_channels)]))


### Invoke the solver

> Need a routine here to evaluate whether the solver is making progress. That is, set a maximum number of iterations and a time limit.

In [22]:
# Invoke the solver
# t = time.process_time()
sol = solver.Solve()
# elapsed_time = time.process_time() - t
print('Solver completed with return value %d.' % sol)

Solver completed with return value 0.


> **To do:** If not returned 0, throw an error.

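A return value of `0` corresponds to `pywraplp.Solver.OPTIMAL`, meaning the solver found an optimal solution. A value of `1` is `FEASIBLE` (a solution was found but not proven optimal) and `2` is `INFEASIBLE`.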
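
A sketch of the check described in the to-do. The status table reflects my understanding of the `pywraplp` result codes and should be verified against constants such as `pywraplp.Solver.OPTIMAL` in the installed version; `check_status` is a hypothetical helper.

```python
# Result-status codes as understood from pywraplp; verify against the installed version.
STATUS_NAMES = {0: "OPTIMAL", 1: "FEASIBLE", 2: "INFEASIBLE", 3: "UNBOUNDED",
                4: "ABNORMAL", 6: "NOT_SOLVED"}


def check_status(sol):
    """Raise unless the solver reported an optimal solution."""
    name = STATUS_NAMES.get(sol, "UNKNOWN")
    if sol != 0:
        raise RuntimeError("Solver finished with status %d (%s)" % (sol, name))
    return name


print(check_status(0))
```

Calling this straight after `solver.Solve()` would stop the workbook before it reports figures from a failed run.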

Print out the solution. We can print out more information about the constraints. What happens in `xlwings` when the python routine prints – does it go to the logs?

In [23]:
report = [(channels[k], products[j], customerid_sample[i], x[i, j, k].solution_value()*cost[k],
          x[i, j, k].solution_value()*product_value.loc[idx[customerid_sample[i], products[j], channels[k]], 'value']) 
          for i in range(num_customers) 
          for j in range(num_products) 
          for k in range(num_channels)  
           if x[i, j, k].solution_value() > 0 # else 0
         ]

report_bd = pd.DataFrame(report, columns=['channel', 'product', 'customer', 'cost', 'revenue'])

print('Total revenue = %d' % (solver.Objective().Value()))
print('Total budget  = %d' % (report_bd['cost'].sum()) )


if trace:
    display(report_bd)

Total revenue = 1213
Total budget  = 32


Channel counts.

In [24]:
report_count = report_bd.groupby(['channel', 'product']).count()
report_count['channel'] = report_count.index.get_level_values('channel')
report_count['product'] = report_count.index.get_level_values('product')

The sample has given us the rough outline of the optimization. Using these figures, replicate it on the full customer base using non-linear minimization.

In [25]:
n_obs_orig = num_customers
# n_obs_new = product_probs_all.shape[0]
n_obs_new = len(product_value.index.get_level_values('customerid').unique())

print(n_obs_orig, n_obs_new)



70 401347


In [26]:
# product_probs = product_probs_all

# n_obs = product_probs.shape[0]

adjustment_factor = n_obs_new/n_obs_orig
availableBudget = availableBudget_total

# product_probs.head()

In [27]:
# num_customers = n_obs_new

offer_scale = int(n_obs_new/n_obs_orig)

# get the offers from the original optimization by product and channel
sample_counts = pd.pivot_table(report_bd, index='channel', columns='product', values='customer', 
                                   aggfunc=len, fill_value=0)

offers = report_count
offers['n_offers'] = offers['customer']*offer_scale

In [28]:
offers

| channel (index) | product (index) | customer | cost | revenue | channel | product | n_offers |
|---|---|---|---|---|---|---|---|
| DM | Daffodil Day Appeal | 2 | 2 | 2 | DM | Daffodil Day Appeal | 11466 |
| DM | September Appeal RG | 3 | 3 | 3 | DM | September Appeal RG | 17199 |
| EDM | September Appeal RG | 3 | 3 | 3 | EDM | September Appeal RG | 17199 |
| SMS | Daffodil Day Appeal | 4 | 4 | 4 | SMS | Daffodil Day Appeal | 22932 |
| SMS | September Appeal RG | 56 | 56 | 56 | SMS | September Appeal RG | 321048 |
| SMS | Tax Appeal | 1 | 1 | 1 | SMS | Tax Appeal | 5733 |
| TM | September Appeal RG | 1 | 1 | 1 | TM | September Appeal RG | 5733 |
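
The scaling arithmetic behind `n_offers` can be checked by hand. This sketch reuses the observation counts and sample counts printed in this workbook; only the variable layout is mine.

```python
n_obs_orig, n_obs_new = 70, 401347         # figures printed earlier in this workbook
offer_scale = int(n_obs_new / n_obs_orig)  # truncates 5733.5... to 5733

sample_counts = [2, 3, 3, 4, 56, 1, 1]     # offers per channel-product pair in the sample
n_offers = [c * offer_scale for c in sample_counts]

print(offer_scale, n_offers)
print(n_obs_new - sum(n_offers))           # customers left without an offer after truncation
```

Because `offer_scale` is truncated to an integer, the scaled offers add up to slightly fewer than the number of customers, so a few customers receive no offer.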


In [29]:

# product_offer = product_value.reset_index()

product_offer = product_value.reset_index().merge(offers[['channel', 'product', 'n_offers']], how='left', on=['channel', 'product']).dropna().pivot_table(values='value', columns=['product', 'channel'], index=['customerid'])

The variable `offers` has a MultiIndex. We want this for the `product_profit` data frame. We can construct it from `channels` and `products`.

In [30]:
# offers_ndx = pd.MultiIndex.from_product([channels, products], names=['channel', 'product'])
# product_profit = pd.DataFrame(index=product_value.index.get_level_values('customerid').unique(), columns=offers.index)

In [31]:
# product_offer.pivot_table(values='value', index='customerid', columns=['channel', 'product']).head()

In [32]:

# for cust in product_value.index.get_level_values('customerid').unique():
#     for ch in offers.index.get_level_values('channel').unique():
#         for pr in offers.index.get_level_values('product').unique():
#             product_profit.loc[cust, (ch, pr)] = product_value.loc[idx[cust, pr, ch], 'value']
        

# The world of R

As of yet, the non-linear minimization in Python has not worked properly, but it _has_ with R and `nlm()`. Until I can get it to work, the workaround is to use `rpy2` to run R from Python.

> **To do:** Get the non-linear minimization right in Python.

Import the requisite libraries.

In [33]:
import rpy2.robjects as robjects

from rpy2.robjects.packages import importr
# import R's "base" package
base = importr('base')

# import R's "utils" package
utils = importr('utils')
stats = importr('stats')
data_table = importr('data.table')

Select a mirror for R packages.

In [34]:
# utils.chooseCRANmirror(ind=1) # select the first mirror in the list

Install packages using R's `install.packages`. (This needs to be done once.)

> **To do:** rewrite the routine to check if installed.

In [35]:
# R package names
# packnames = ('magrittr', 'dplyr', 'data.table', 'dtplyr', 'stringr')

# R vector of strings
# import rpy2.robjects.packages as rpackages
# from rpy2.robjects.vectors import StrVector

# Selectively install what needs to be installed.
# We are fancy, just because we can.
# for x in packnames:
#    if not rpackages.isinstalled(x):
#        utils.install_packages(StrVector([x]))

All I need to run in R is the non-linear minimization, and whatever is needed to supply the appropriate data. Here is the original R code.

### The dual function (R)

```{r}
dual <- function(u, pp, offers) {
  if (dim(pp)[2] != length(u)) {
    print(c(dim(pp)[2], length(u)))
    stop("Mismatched dimensions")
  }
  d <- sweep(pp, 2, u)
  v <- apply(d, 1, max)
  v[v < 0] <- 0
  y <- offers %*% u + sum(v)
  y
}
```
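
As a step towards the Python version, here is a pure-Python sketch of the same function. It is a port written for illustration, not tested against the R output, and the example values are made up. Since the function is built from `max`, it is piecewise linear in `u`, so a derivative-free or subgradient method may be worth trying.

```python
def dual(u, pp, offers):
    """Pure-Python port of the R `dual` function, as a sketch.

    u      -- one multiplier per offer (column of pp)
    pp     -- list of rows, one per customer, holding the value of each offer
    offers -- number of each offer available
    """
    if any(len(row) != len(u) for row in pp):
        raise ValueError("Mismatched dimensions")
    # sweep(pp, 2, u) subtracts u from every row; apply(d, 1, max) takes the row maximum
    v = [max(value - u_j for value, u_j in zip(row, u)) for row in pp]
    v = [max(best, 0.0) for best in v]  # v[v < 0] <- 0
    return sum(n * u_j for n, u_j in zip(offers, u)) + sum(v)


pp = [[3.0, 1.0], [2.0, 4.0], [0.5, 0.25]]  # made-up values
print(dual([0.0, 0.0], pp, offers=[1, 1]))
print(dual([1.0, 2.0], pp, offers=[1, 1]))
```

In this example, raising the multipliers from zero lowers the function value from 7.5 to 7.0, which is the direction a minimizer would follow.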

### The optimisation (R)

```{r}
u_init <- offers*0
out <- nlm(dual, p=u_init, pp=product_profit, offers=offers, print.level = 1)
```

### Getting the solution (R)

```{r}
mindual <- out$minimum
u <- out$estimate
mindual
u
```

In [36]:
robjects.r('''
        # create a function `dual`
            dual <- function(u, pp, offers) {
              if (dim(pp)[2] != length(u)) {
                print(c(dim(pp)[2], length(u)))
                stop("Mismatched dimensions")
                }
              d <- sweep(pp, 2, u)
              v <- apply(d, 1, max) 
              v[v < 0] <- 0
              y <- offers%*%u + sum(v)
              y
            }
        ''')

R object with classes: ('function',) mapped to:
<SignatureTranslatedFunction - Python:0x11e0a53c8 / R:0x7fc7271e8008>

### Test the new function

To do this, need to create the `product_profit` array in R.

Import the required libraries and activate the interface between R and `pandas`.

In [37]:
from rpy2.robjects import r, pandas2ri
pandas2ri.activate()

# pp_columns = product_profit.columns # if I need them
# pp_columns = product_offer.columns # if I need them

# product_offer.to_csv('temp_product_offer.csv', index=False, header=False)

~~> **NB:** _This appears to kill the kernel._~~

In [38]:
# r_product_offer = data_table.fread('temp_product_offer.csv')
r_product_offer = data_table.as_data_table(product_offer)

In [39]:
r_product_offer

| X..Daffodil.Day.Appeal....DM.. | X..Daffodil.Day.Appeal....SMS.. | X..September.Appeal.RG....DM.. | X..September.Appeal.RG....EDM.. | X..September.Appeal.RG....SMS.. | X..September.Appeal.RG....TM.. | X..Tax.Appeal....SMS.. |
|---|---|---|---|---|---|---|
| 2.847796 | 2.847796 | 3.992865 | 3.992865 | 3.992865 | 3.992865 | 2.847796 |
| 0.915791 | 0.915791 | 2.060860 | 2.060860 | 2.060860 | 2.060860 | 0.915791 |
| 2.788690 | 2.788690 | 3.933759 | 3.933759 | 3.933759 | 3.933759 | 2.788690 |
| 0.915791 | 0.915791 | 2.060860 | 2.060860 | 2.060860 | 2.060860 | 0.915791 |
| ... | ... | ... | ... | ... | ... | ... |
| 0.915791 | 0.915791 | 2.060860 | 2.060860 | 2.060860 | 2.060860 | 0.915791 |
| 0.915791 | 0.915791 | 2.060860 | 2.060860 | 2.060860 | 2.060860 | 0.915791 |


This has decidedly unfriendly column names.

In [40]:
# u_test = robjects.FloatVector([11.2, 15, 6.02, 19.5])
r_offers = robjects.IntVector(offers['n_offers'])

~~Test the dual function.~~

In [41]:
dual = robjects.r['dual']
# dual(u_test, pp=r_product_offer, offers=r_offers)

Perform the non-linear minimisation.

In [42]:
u_init = robjects.FloatVector(0.0*offers['n_offers'])
# r_out = stats.nlm(dual, p=u_init, pp=r_product_profit, offers=r_offers, print_level=1)
r_out = stats.nlm(dual, p=u_init, pp=r_product_offer, offers=r_offers, print_level=1)

iteration = 0

Step:
[1] 0 0 0 0 0 0 0

Parameter:
[1] 0 0 0 0 0 0 0

Function Value
[1] 6900948

Gradient:
[1]  11466  17199  17199  22932 321048   5733   5733

iteration = 1

Parameter:
[1] 0 0 0 0 0 0 0

Function Value
[1] 6900948

Gradient:
[1]    -694.5    5038.5 -171191.0 -165458.0  132658.0 -182657.0    3847.0

Last global step failed to locate a point lower than x.

Either x is an approximate local minimum of the function,
the function is too non-linear for this algorithm,
or steptol is too large.


Extract the estimate of $u$.

In [43]:
r_u = r_out.rx('estimate')[0]

u = [r_u[i] if abs(r_u[i]) > 1e-5 else 0 for i in range(len(r_u))] # ugly way to convert

d = product_offer.sub(u) 
v = d.max(axis = 1)
v[v<0] = 0
ndx = np.argsort(-v)
d['customerid'] = d.index

In [44]:
d.head()

| customerid (index) | Daffodil Day Appeal / DM | Daffodil Day Appeal / SMS | September Appeal RG / DM | September Appeal RG / EDM | September Appeal RG / SMS | September Appeal RG / TM | Tax Appeal / SMS | customerid |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.847796 | 2.847796 | 3.992865 | 3.992865 | 3.992865 | 3.992865 | 2.847796 | 1 |
| 17 | 0.915791 | 0.915791 | 2.06086 | 2.06086 | 2.06086 | 2.06086 | 0.915791 | 17 |
| 18 | 2.78869 | 2.78869 | 3.933759 | 3.933759 | 3.933759 | 3.933759 | 2.78869 | 18 |
| 23 | 0.915791 | 0.915791 | 2.06086 | 2.06086 | 2.06086 | 2.06086 | 0.915791 | 23 |
| 34 | 1.19685 | 1.19685 | 2.341918 | 2.341918 | 2.341918 | 2.341918 | 1.19685 | 34 |


## Allocate the optimised solution to customers

In [45]:
d_melt = pd.melt(d, id_vars=['customerid']).sort_values(by=['customerid', 'value'], ascending=[True, False])

# Delete the offers from `d_melt` where already completely allocated.

allocated_counts = offers['n_offers']*0

offers_include_df = offers[allocated_counts < offers['n_offers']]
# offers_include_df = pd.DataFrame(offers_include)

# offers_include_df.reset_index(inplace=True)

d_melt = d_melt.merge(offers_include_df[['channel', 'product']], on=['channel', 'product'])

In [46]:
d_melt.head()

| | customerid | product | channel | value |
|---|---|---|---|---|
| 0 | 1 | September Appeal RG | DM | 3.992865 |
| 1 | 17 | September Appeal RG | DM | 2.06086 |
| 2 | 18 | September Appeal RG | DM | 3.933759 |
| 3 | 23 | September Appeal RG | DM | 2.06086 |
| 4 | 34 | September Appeal RG | DM | 2.341918 |


Create the initial allocation by taking each customer's highest-value offer. `d_melt` needs to be updated once the first offer has been fully allocated.

In [47]:
allocated_counts = offers.n_offers*0


# d_melt.groupby(['variable']).agg({'value':'first'}).head()

# d_alloc = d_melt.groupby(['customerid']).first().sort_values(by=['value'], ascending=False)
d_alloc = d_melt.sort_values('value', ascending=False).drop_duplicates('customerid')

alloc_list = []


Repeat the next part until every offer in `offers` is allocated.

In [48]:
counter = 0 # not sure where this goes yet
old_counter = counter
failsafe_threshold = 20000
failsafe = 0

while any(allocated_counts < offers.n_offers) and len(d_alloc) > 0: # stop when offers or customers run out
    offers_to_alloc = (allocated_counts < offers.n_offers)
    
    # allocate while we haven't hit the limit for one of the offers
    # and there are still unallocated customers left in d_alloc
    while (counter - old_counter < len(d_alloc)
           and all((allocated_counts < offers.n_offers) == (offers_to_alloc))):
        selected_offer = d_alloc.iloc[counter - old_counter]
        allocated_counts.loc[(selected_offer['channel'], selected_offer['product'])] += 1
        counter += 1
#     print(allocated_counts)      
    # note: selected_offer will contain the offer that has just been completely allocated!!
#     print(selected_offer)
    
    ## allocate the selected offers
    
    print(counter, old_counter)
    d_alloc_select = d_alloc.iloc[[i for i in range(counter - old_counter)]]
    alloc_list.append(d_alloc_select)
    old_counter = counter
    
    ## delete the selected records from d_melt
    rows_to_keep = np.invert(d_melt.customerid.isin(d_alloc_select.customerid))
    d_melt = d_melt[rows_to_keep]
    
    ## delete the selected offers from the data frame
    offers_include = offers.loc[allocated_counts < offers.n_offers, 'n_offers']
    offers_include_df = pd.DataFrame(offers_include)
    print(offers_include_df.head())
    offers_include_df.reset_index(inplace=True)

    d_melt = d_melt.merge(offers_include_df[['channel', 'product']], on=['channel', 'product'])
    
    d_alloc = d_melt.sort_values('value', ascending=False).drop_duplicates('customerid')
    
    ## create the allocation data frame
    
    failsafe += 1
    if failsafe > failsafe_threshold:
        break # to protect against logic errors causing an infinite loop
    


26386 0
                             n_offers
channel product                      
DM      Daffodil Day Appeal     11466
        September Appeal RG     17199
EDM     September Appeal RG     17199
SMS     Daffodil Day Appeal     22932
        September Appeal RG    321048
62244 26386
                             n_offers
channel product                      
DM      Daffodil Day Appeal     11466
EDM     September Appeal RG     17199
SMS     Daffodil Day Appeal     22932
        September Appeal RG    321048
        Tax Appeal               5733
62910 62244
                             n_offers
channel product                      
DM      Daffodil Day Appeal     11466
SMS     Daffodil Day Appeal     22932
        September Appeal RG    321048
        Tax Appeal               5733
385497 62910
                             n_offers
channel product                      
DM      Daffodil Day Appeal     11466
SMS     Daffodil Day Appeal     22932
        Tax Appeal               5733
39847
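
The allocation idea can be sketched in a simplified form: walk through the (customer, offer) pairs from highest value to lowest, and give a customer an offer only if the customer is still unallocated and the offer has capacity left. This is a plain-Python variant on invented data, not the data-frame loop above, and `allocate` is a hypothetical name.

```python
def allocate(values, capacity):
    """Greedy allocation: highest-value (customer, offer) pairs first.

    values   -- dict mapping (customer, offer) to the customer's value for that offer
    capacity -- dict mapping offer to the number of offers available
    """
    remaining = dict(capacity)
    allocation = {}
    for (customer, offer), value in sorted(values.items(), key=lambda kv: -kv[1]):
        if customer in allocation or remaining.get(offer, 0) <= 0:
            continue  # customer already has an offer, or the offer is used up
        allocation[customer] = offer
        remaining[offer] -= 1
    return allocation


# Invented example: three customers, one unit of offer "A" and two of offer "B".
values = {(1, "A"): 5.0, (1, "B"): 4.0, (2, "A"): 4.5,
          (2, "B"): 1.0, (3, "A"): 2.0, (3, "B"): 1.5}
print(allocate(values, {"A": 1, "B": 2}))
```

Customer 2 prefers offer "A", but it is exhausted by customer 1, so customer 2 falls back to "B". This mirrors the re-ranking step that happens each time an offer is fully allocated.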

In [49]:
print(counter - old_counter)
print(d_alloc.shape)
print(offers_to_alloc)
print(offers.n_offers)

0
(0, 4)
channel  product            
DM       Daffodil Day Appeal    False
         September Appeal RG    False
EDM      September Appeal RG    False
SMS      Daffodil Day Appeal     True
         September Appeal RG    False
         Tax Appeal             False
TM       September Appeal RG    False
Name: n_offers, dtype: bool
channel  product            
DM       Daffodil Day Appeal     11466
         September Appeal RG     17199
EDM      September Appeal RG     17199
SMS      Daffodil Day Appeal     22932
         September Appeal RG    321048
         Tax Appeal               5733
TM       September Appeal RG      5733
Name: n_offers, dtype: int64


In [50]:
d_alloc.head()

Unnamed: 0,customerid,product,channel,value


In [51]:
d_melt.shape

(0, 4)

Concatenate the allocation files to create the final allocation.

In [52]:
final_allocation = pd.concat(alloc_list)
final_allocation = final_allocation.reset_index(drop=True)

## Calculate the profit and cost

In [53]:
# product_profit_allocated = pd.merge(pd.melt(product_profit.reset_index(), id_vars='customerid'), 
#          final_allocation.drop('value', axis=1), on=['customerid', 'channel', 'product'], how="inner")

product_profit_allocated = pd.merge(pd.melt(product_offer.reset_index(), id_vars='customerid'), 
         final_allocation.drop('value', axis=1), on=['customerid', 'channel', 'product'], how="inner")

In [54]:
channel_costs = pd.DataFrame(cost).reset_index()


In [55]:
product_profit_allocated = pd.merge(product_profit_allocated, channel_costs, on='channel', how='left')


In [56]:
def my_agg(df):
    names = {
        'offers':  df['value'].count(),
        'revenue': df['value'].sum(),
        'expenditure': df['cost'].sum()
    }
    return pd.Series(names, index=['offers', 'revenue', 'expenditure'])

In [57]:
def summarize_benefits(df, grouper=None):
    if grouper is None:
        df['Total'] = 'Total'
        grouper = 'Total'
    df_grouped = df.groupby(grouper).apply(my_agg) 
    df_grouped['ROI'] = df_grouped['revenue']/df_grouped['expenditure']
#     df_grouped['investigations closed'].apply(lambda x: x if x > 0 else 1)

#     inv_formats = {
#         'offers': '{:,.0f}',
#         'revenue': '${:,.0f}',
#         'expenditure': '${:,.0f}',
#         'ROI': '{:.1%}'
#     }

#     return df_grouped.style.format(inv_formats)
    return df_grouped
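
The same summary can be sketched without pandas, including a guard for groups with no expenditure, where ROI would otherwise divide by zero. The rows are invented and `summarize` is a hypothetical stand-in for `summarize_benefits`.

```python
from collections import defaultdict


def summarize(rows, key):
    """Group allocated offers by `key` and report offers, revenue, expenditure and ROI."""
    totals = defaultdict(lambda: {"offers": 0, "revenue": 0.0, "expenditure": 0.0})
    for row in rows:
        t = totals[row[key]]
        t["offers"] += 1
        t["revenue"] += row["value"]
        t["expenditure"] += row["cost"]
    for t in totals.values():
        # leave ROI undefined when nothing was spent, rather than dividing by zero
        t["ROI"] = t["revenue"] / t["expenditure"] if t["expenditure"] else None
    return dict(totals)


rows = [  # invented allocations
    {"channel": "SMS", "value": 2.0, "cost": 0.5},
    {"channel": "SMS", "value": 4.0, "cost": 0.5},
    {"channel": "EDM", "value": 3.0, "cost": 0.0},
]
summary = summarize(rows, "channel")
print(summary)
```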

> **To do:** There should be a better way of doing this, as I am referencing each of these multiple times.

In [58]:
ppa_total = summarize_benefits(product_profit_allocated)
ppa_channel = summarize_benefits(product_profit_allocated, 'channel')
ppa_product = summarize_benefits(product_profit_allocated, 'product')

Write the summaries to the sheets `data_total`, `data_channel` and `data_product`.

  * Need to clear these sheets first.

In [59]:
data_python_sheet = wb.sheets['data_total']
data_channel_sheet = wb.sheets['data_channel']
data_product_sheet = wb.sheets['data_product']

In [60]:
data_python_sheet.range('A1').value = ppa_total
data_channel_sheet.range('A1').value = ppa_channel
data_product_sheet.range('A1').value = ppa_product

## The end (for now)

> **To do**: write the customer file to Excel, or to a file if it is large.