<a href="https://colab.research.google.com/github/pmontman/tmp_choicemodels/blob/main/nb/tutorials/WK_10_tuto_efficient_designs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to design efficient experiments

In this lecture we are going to:

* Properly define what an efficient design is
* Use the mathematical definition to create good designs
* Give some final guidelines on design of experiments for choice modelling

In [None]:
import pandas as pd
import numpy as np

In [None]:
betas = np.matrix(' 1 1; 2 2')

In [None]:
def choice_prob(betas, X):
  V = np.matmul(X, betas)
  P = np.exp(V)
  return P / np.sum(P, axis = 1)

In [None]:
colnames = ['price_apple', 'size_apple', 'os_apple', 'price_android', 'size_android', 'os_android']

# Working example: a discrete choice experiment for automobile preferences

We want to understand population preferences for cars. We will consider the following variables.

`price`, `power`, `engine_type`

Ignoring all realistic values, let's go for:

* Prices: 20000, 30000, 40000 AUD.
* Power: 130 hp, 170 hp, 220 hp.
* Engine types, encoded as integer initially: 0=petrol, 1=diesel, 2=hybrid, 3=electric.



# Creating all combinations of variables and values

We can create the full factorial design in Python as a cartesian product, using the
`cartesian` function from scikit-learn.

Here is an example of its use.

In [None]:
from sklearn.utils.extmath import cartesian
pd.DataFrame(cartesian(([800.0, 1000.0, 1200.0], [4.7, 5.8], [1.8, 3.2])), columns=['price', 'size', 'speed'])

Unnamed: 0,price,size,speed
0,800.0,4.7,1.8
1,800.0,4.7,3.2
2,800.0,5.8,1.8
3,800.0,5.8,3.2
4,1000.0,4.7,1.8
5,1000.0,4.7,3.2
6,1000.0,5.8,1.8
7,1000.0,5.8,3.2
8,1200.0,4.7,1.8
9,1200.0,4.7,3.2
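
The later cells use a `full_factorial` object whose definition is not shown in this notebook. Judging from the six-column rows printed further down, it is probably the cartesian product of the three attributes for two alternatives. A minimal sketch under that assumption, using `itertools.product` so that it only needs NumPy (`cartesian(levels + levels)` should give the same rows):

```python
import numpy as np
from itertools import product

# Levels of the three attributes, as in the example above.
levels = ([800.0, 1000.0, 1200.0], [4.7, 5.8], [1.8, 3.2])

# Assumption: each row describes one choice task with two alternatives,
# so the three attribute columns appear twice (first alternative, then second).
full_factorial = np.array(list(product(*levels, *levels)))

print(full_factorial.shape)   # 12 profiles x 12 profiles, 6 columns
print(full_factorial[:3])
```

This includes tasks in which both alternatives are identical; whether the original notebook filtered those out is not known.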


In practice the full factorial can be too large to compute, but we do not need to
build all of it at once.
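
One way to avoid building the whole table is to draw row numbers and decode each one into attribute levels. The sketch below assumes two alternatives with the smartphone levels used above; the variable names are illustrative. `np.unravel_index` turns a flat row number into one level index per column, in the same order a cartesian product would use.

```python
import numpy as np

# Assumed setup: three attributes, repeated for two alternatives.
one_alt = [np.array([800.0, 1000.0, 1200.0]), np.array([4.7, 5.8]), np.array([1.8, 3.2])]
levels = one_alt * 2
sizes = tuple(len(lv) for lv in levels)
n_total = int(np.prod(sizes))          # number of rows in the full factorial

rng = np.random.default_rng(1234)
row_ids = rng.choice(n_total, size=10, replace=False)

# Decode each flat row number into one level index per column.
level_ids = np.unravel_index(row_ids, sizes)
sample = np.column_stack([lv[idx] for lv, idx in zip(levels, level_ids)])

print(n_total)
print(sample)
```

Only the 10 sampled rows are ever stored, so the same idea works when the number of combinations is far larger than here.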


# Efficiency

Recall that we want to find the subset of $N$ rows from the full experiment that maximizes the efficiency of the resulting experiment.
There are several concepts of efficiency, all relatively similar, but we will
focus on D-efficiency. Roughly speaking, D-efficiency aims to make the covariance matrix of the coefficients, $\text{covariance}(\beta)$, as 'small' as possible.
In discrete choice, the formula for the covariance matrix of the coefficients is a bit more complex than for linear regression.


$$\text{covariance}(\beta) = (Z' P Z )^{-1}$$

When working with $J$ alternatives:
* $P$ is the matrix of choice probabilities computed by the model.
* $Z$ is similar to the design matrix, but 'centered' using the choice probabilities. Basically, from each row of observations we subtract the weighted mean of the variables across all alternatives. The weights are the choice probabilities computed by the model.

$$z_{jn} = x_{jn} - \sum_{i=1}^J x_{in}P_{in}$$

To compute the $Z$ matrix, we need the choice probabilities. In our context we do not yet know them, so we work with an initial guess. This guess usually comes from an 'initial' value of the coefficients that produces equal choice probabilities, basically a 'no-information' starting model. In some cases we might get a better starting guess, for example if we have data from a similar problem or from a similar experiment.
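
The two formulas above can be sketched directly in NumPy. This is an illustrative implementation, not the notebook's own function: the name `mnl_covariance`, the `(N, J, K)` array layout and the four example rows are assumptions made for the sketch. The coefficients are set to zero, which gives the equal-probability 'no-information' prior described above.

```python
import numpy as np

def mnl_covariance(X, beta):
    """X has shape (N tasks, J alternatives, K attributes); beta has shape (K,)."""
    V = X @ beta                              # utilities, shape (N, J)
    V = V - V.max(axis=1, keepdims=True)      # stabilise the exponentials
    P = np.exp(V)
    P = P / P.sum(axis=1, keepdims=True)      # choice probabilities per task
    x_bar = np.einsum("nj,njk->nk", P, X)     # probability-weighted mean per task
    Z = X - x_bar[:, None, :]                 # z_jn = x_jn - sum_i x_in P_in
    info = np.einsum("nj,njk,njl->kl", P, Z, Z)   # Z' P Z
    return P, Z, np.linalg.inv(info)

# Four choice tasks in wide format: three attributes for each of two alternatives.
wide = np.array([[ 800.0, 4.7, 1.8, 1000.0, 5.8, 1.8],
                 [1200.0, 4.7, 1.8, 1000.0, 4.7, 3.2],
                 [1000.0, 4.7, 3.2,  800.0, 5.8, 3.2],
                 [1000.0, 5.8, 3.2, 1000.0, 4.7, 1.8]])
X = wide.reshape(len(wide), 2, 3)

P, Z, cov = mnl_covariance(X, beta=np.zeros(3))
print(P)      # every probability is 0.5 under the zero prior
print(cov)
```

Within each task the probability-weighted sum of the rows of `Z` is zero, which is a quick check that the centering was applied correctly.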

In [None]:
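def cov_mnl(Xj, J, betas):
  # Split the wide design into one block of attribute columns per alternative.
  # The split is hard-coded to two alternatives; J is not used.
  Xj = np.hsplit(np.array(Xj), 2)
  beta = np.asarray(betas[0]).ravel()
  # Utilities of both alternatives under the guessed coefficients, one column each.
  V = np.column_stack([Xj[0] @ beta, Xj[1] @ beta])
  P = np.exp(V)
  PP = P / np.sum(P, axis=1, keepdims=True)
  # Diagonal matrix with the choice probabilities of the first alternative.
  P0D = np.diag(PP[:, 0])
  # Inverse of X' P X, built from the (uncentered) attributes of the first alternative.
  return np.linalg.inv(Xj[0].T @ P0D @ Xj[0])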

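And now we calculate the covariance matrix for a random subset of 10 rows of the full factorial, using a guess for the coefficients.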

In [None]:
sub_fact = np.array(full_factorial)[np.random.choice(full_factorial.shape[0], 10, replace=False), :]

In [None]:
betas = [ np.matrix('0.5 0.1 1.1')]
betas[0]

matrix([[0.5, 0.1, 1.1]])

In [None]:
pd.DataFrame(cov_mnl(sub_fact, 2, betas))

Unnamed: 0,0,1,2
0,9e-06,-0.00232,0.000663
1,-0.00232,0.834454,-0.59347
2,0.000663,-0.59347,0.80061


In [None]:
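def deffic_mnl(X, J, betas):
  # Summarize the covariance matrix in one number through its determinant.
  # Smaller values mean tighter coefficient estimates, i.e. a more efficient design.
  covX = cov_mnl(X, J, betas)
  return np.power(np.linalg.det(covX), 1 / (covX.shape[0] + 1))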

In [None]:
deffic_mnl(sub_fact, 2, betas)

0.015101961211840247

In [None]:
deffic_mnl(full_factorial, 2, betas)

0.001539231431536287
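
For reference, the summary most often reported for this purpose is the D-error. Writing $\Omega$ for $\text{covariance}(\beta)$ and $K$ for the number of coefficients (both symbols are introduced here only for this note), it is usually defined as

$$\text{D-error} = \det(\Omega)^{1/K}$$

The `deffic_mnl` function above uses the exponent $1/(K+1)$ instead. For a fixed $K$, both exponents are increasing functions of $\det(\Omega)$, so they rank designs in the same order even though the reported numbers differ.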

# Relationship to the principles of design of experiments

Recall the four principles:

1. Level balance
2. Orthogonality
3. Minimal level overlap
4. Utility balance


These principles are all summarized by D-efficiency: they are 'rules of thumb' for creating designs with good efficiency. Nowadays we can just put the computer to work to find a good design. Before that, designs were picked manually by following the principles. It is still important to get an intuition for how they work.


# Example: Level balance and overlap



In [None]:
np.random.seed(1234) 
sub_fact = np.array(full_factorial)[np.random.choice(full_factorial.shape[0], 20, replace=False), :]
sub_fact

array([[ 800. ,    4.7,    1.8, 1000. ,    5.8,    1.8],
       [1200. ,    4.7,    1.8, 1000. ,    4.7,    3.2],
       [1000. ,    4.7,    3.2,  800. ,    5.8,    3.2],
       [1000. ,    5.8,    3.2, 1000. ,    4.7,    1.8],
       [1200. ,    4.7,    3.2, 1200. ,    4.7,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    4.7,    1.8],
       [1200. ,    4.7,    3.2,  800. ,    5.8,    3.2],
       [1200. ,    4.7,    1.8,  800. ,    5.8,    1.8],
       [ 800. ,    5.8,    1.8,  800. ,    5.8,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    3.2],
       [1000. ,    4.7,    3.2, 1000. ,    5.8,    1.8],
       [1000. ,    4.7,    1.8, 1200. ,    5.8,    1.8],
       [1000. ,    4.7,    1.8,  800. ,    5.8,    3.2],
       [ 800. ,    5.8,    1.8,  800. ,    4.7,    3.2],
       [1000. ,    5.8,    1.8, 1000. ,    5.8,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    1.8],
       [1000. ,    5.8,    3.2,  800. ,    4.7,    3.2],
       [ 800. ,    5.8,    3.2,

In [None]:

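# Hand-pick 8 rows in which the first alternative is priced at 800 in 7 of them (poor level balance).
sub_fact = sub_fact[[0, 5, 8, 9, 13, 15, 17, 2], :]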


In [None]:
sub_fact

array([[ 800. ,    4.7,    1.8, 1000. ,    5.8,    1.8],
       [ 800. ,    5.8,    3.2, 1000. ,    4.7,    1.8],
       [ 800. ,    5.8,    1.8,  800. ,    5.8,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    3.2],
       [ 800. ,    5.8,    1.8,  800. ,    4.7,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    1.8],
       [ 800. ,    5.8,    3.2,  800. ,    4.7,    1.8],
       [1000. ,    4.7,    3.2,  800. ,    5.8,    3.2]])

In [None]:
deffic_mnl(sub_fact, 2, betas)

0.025075612050753034
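
Level balance and overlap can also be checked by simple counting. The sketch below copies the 8 rows printed above and assumes the columns are price, size and speed for the first and then the second alternative. It counts how often each level appears for the first alternative, and in how many tasks both alternatives share the same level.

```python
import numpy as np

# The 8 hand-picked rows shown above.
design = np.array([[ 800.0, 4.7, 1.8, 1000.0, 5.8, 1.8],
                   [ 800.0, 5.8, 3.2, 1000.0, 4.7, 1.8],
                   [ 800.0, 5.8, 1.8,  800.0, 5.8, 3.2],
                   [ 800.0, 5.8, 3.2, 1000.0, 5.8, 3.2],
                   [ 800.0, 5.8, 1.8,  800.0, 4.7, 3.2],
                   [ 800.0, 5.8, 3.2, 1000.0, 5.8, 1.8],
                   [ 800.0, 5.8, 3.2,  800.0, 4.7, 1.8],
                   [1000.0, 4.7, 3.2,  800.0, 5.8, 3.2]])
names = ["price", "size", "speed"]   # assumed attribute names
first, second = design[:, :3], design[:, 3:]

# Level balance: how often each level is used for the first alternative.
for k, name in enumerate(names):
    values, counts = np.unique(first[:, k], return_counts=True)
    print(name, dict(zip(values.tolist(), counts.tolist())))

# Level overlap: tasks in which both alternatives have the same level.
overlap = (first == second).sum(axis=0)
print(dict(zip(names, overlap.tolist())))
```

A balanced design would use each level roughly equally often, and a design with minimal overlap would have counts close to zero in the last line.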

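Compare with random experiments of the same size: the cell below runs a random search over experiments of 8 rows. Because `deffic_mnl` is computed from the determinant of the covariance matrix, smaller values indicate a more efficient design, so look at the smallest values in the list.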

In [None]:
np.random.seed(1234) 
[deffic_mnl(np.array(full_factorial)[np.random.choice(full_factorial.shape[0], 8, replace=False), :], 2, betas) for i in range(20)]

[0.012616982133300486,
 0.011695402568277412,
 0.03184815359360849,
 0.03509450126161441,
 0.012031835107409962,
 0.012735934085607309,
 0.012301612107096209,
 0.024778969051915976,
 0.012241126754476505,
 0.015347405493874052,
 0.016536761463146362,
 0.015980132366145514,
 0.017566893133088412,
 0.008988701322920645,
 0.013424721792247204,
 0.02550199312206731,
 0.05737907706040104,
 0.016783149094670952,
 0.011277858292034334,
 0.019826977666191882]
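
The random search can be wrapped in a small helper that remembers the best subset it has seen. This is a sketch with hypothetical names (`random_search`, `d_error_equal_probs`). To stay self-contained it uses a stand-in criterion, the D-error of a two-alternative design under equal choice probabilities, rather than the notebook's `deffic_mnl`, and it rebuilds the candidate rows under the assumption that they are the two-alternative cartesian product.

```python
import numpy as np
from itertools import product

def d_error_equal_probs(design):
    """Stand-in criterion: D-error when both choice probabilities are 0.5."""
    d = design[:, :3] - design[:, 3:]            # attribute differences between alternatives
    if np.linalg.matrix_rank(d) < d.shape[1]:    # information matrix would be singular
        return np.inf
    info = d.T @ d / 4.0                         # Z' P Z for two equally likely alternatives
    return np.linalg.det(info) ** (-1.0 / d.shape[1])

def random_search(candidates, n_rows, criterion, n_trials=200, seed=0):
    """Draw n_trials random subsets and keep the one with the lowest criterion."""
    rng = np.random.default_rng(seed)
    best_value, best_rows = np.inf, None
    for _ in range(n_trials):
        rows = rng.choice(candidates.shape[0], size=n_rows, replace=False)
        value = criterion(candidates[rows])
        if value < best_value:
            best_value, best_rows = value, np.sort(rows)
    return best_value, best_rows

levels = ([800.0, 1000.0, 1200.0], [4.7, 5.8], [1.8, 3.2])
candidates = np.array(list(product(*levels, *levels)))

best_value, best_rows = random_search(candidates, n_rows=8, criterion=d_error_equal_probs)
print(best_value)
print(candidates[best_rows])
```

In the notebook itself the same helper could be called with `criterion=lambda X: deffic_mnl(X, 2, betas)`. Random search is only the simplest option; it gives no guarantee of finding the best design.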

# Orthogonality

We pick rows in which attributes (columns) 2 and 3 cannot be told apart: in the first alternative, 4.7 always appears with 1.8 and 5.8 with 3.2.

In [None]:
np.random.seed(1234)
sub_fact = np.array(full_factorial)[np.random.choice(full_factorial.shape[0], 20, replace=False), :]
# Keep only rows in which columns 2 and 3 move together.
sub_fact_orth = sub_fact[[0, 1, 3, 7, 9, 11, 12, 15], :]
sub_fact_orth

array([[ 800. ,    4.7,    1.8, 1000. ,    5.8,    1.8],
       [1200. ,    4.7,    1.8, 1000. ,    4.7,    3.2],
       [1000. ,    5.8,    3.2, 1000. ,    4.7,    1.8],
       [1200. ,    4.7,    1.8,  800. ,    5.8,    1.8],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    3.2],
       [1000. ,    4.7,    1.8, 1200. ,    5.8,    1.8],
       [1000. ,    4.7,    1.8,  800. ,    5.8,    3.2],
       [ 800. ,    5.8,    3.2, 1000. ,    5.8,    1.8]])

In [None]:

deffic_mnl(sub_fact_orth, 2, betas)

0.028968245815118584
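
A quick way to see the problem is to compute the correlations between the attribute columns of the first alternative. The sketch below copies the 8 rows printed above; the variable names are illustrative.

```python
import numpy as np

# The 8 rows of sub_fact_orth shown above.
sub_fact_orth = np.array([[ 800.0, 4.7, 1.8, 1000.0, 5.8, 1.8],
                          [1200.0, 4.7, 1.8, 1000.0, 4.7, 3.2],
                          [1000.0, 5.8, 3.2, 1000.0, 4.7, 1.8],
                          [1200.0, 4.7, 1.8,  800.0, 5.8, 1.8],
                          [ 800.0, 5.8, 3.2, 1000.0, 5.8, 3.2],
                          [1000.0, 4.7, 1.8, 1200.0, 5.8, 1.8],
                          [1000.0, 4.7, 1.8,  800.0, 5.8, 3.2],
                          [ 800.0, 5.8, 3.2, 1000.0, 5.8, 1.8]])

# Correlation matrix of the first alternative's three attribute columns.
corr = np.corrcoef(sub_fact_orth[:, :3], rowvar=False)
print(np.round(corr, 2))
```

The entry for columns 2 and 3 is 1, so within the first alternative their effects cannot be separated. An orthogonal design would have off-diagonal entries close to 0.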

# The workflow

1) Define attributes and levels


2) Pilot Study

3) Design of the Experiment

4) Design the Survey

5) Conduct the survey and data analysis

# Recommendations


* **Which variables should we choose?**
 Create an exhaustive list of attributes, then reduce it to between 3 and 7 by discarding some and merging others (important combinations of a pair of attributes). For example, screen size and speed can be merged if they do not really vary independently (no small fast smartphones): just create a new categorical attribute with a few levels for the realistic combinations.

* **How do we choose the levels?**
 Try a large range and pick the best subset using a computer.

* **How many alternatives?**
 People can handle 2 to 3 alternatives before decision fatigue sets in.
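
The merging of attributes mentioned in the first recommendation can be sketched as follows. The merged levels and their labels are hypothetical; they assume that a small, fast phone is not a realistic product. The merged attribute is used to build the profiles and is expanded back to the original columns afterwards.

```python
import numpy as np
from itertools import product

prices = [800.0, 1000.0, 1200.0]

# Hypothetical merged attribute: only the (size, speed) pairs assumed to be realistic.
size_speed = {
    "small_slow": (4.7, 1.8),
    "large_slow": (5.8, 1.8),
    "large_fast": (5.8, 3.2),
}

# Profiles are built from price and the merged attribute only.
profiles = list(product(prices, size_speed))

# Expand the merged attribute back to its two original columns.
expanded = np.array([(price, *size_speed[label]) for price, label in profiles])

print(len(profiles))
print(expanded)
```

With the merged attribute there are 9 profiles instead of 12, and the unrealistic combination never appears in the design.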