# Conditional Choice Probability Estimators in 4 Easy Steps!

## Author: Eric Schulman

The following guide demonstrates how to use conditional choice probability (CCP) estimators in Python. It was written in part as homework for the University of Texas second-year course in industrial organization. These estimators have become a standard way to think about how various factors influence forward-looking decisions in industrial organization and related fields.

To demonstrate how to implement a CCP estimator, we recover parameters for the cost function in Rust (1987). Rust's paper considers the decision of a bus manager, who had to decide whether or not to replace the engines in his fleet of buses in Madison, Wisconsin. This is a very simple 'yes' or 'no' decision. However, the payoffs to 'yes' and 'no' depend on the future. The goal is to recover parameters that tell us how much mileage matters when the bus manager decides to replace an engine.

This decision problem is really general. You can think of it as a 'tree cutting' problem. Agents do not think only about the present when making this decision. To account for the future, you must consider a Bellman equation that contains the expected value of 'yes' and 'no'. The solution to this equation is called the value function. Rust's approach involved calculating the value function explicitly. He alternated between picking parameters using MLE and re-solving for the value function at those parameters.
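
As a rough illustration of what "calculating the value function explicitly" means, here is a minimal sketch that solves a toy replacement problem by iterating on the Bellman equation. Everything in it (the grid size, the payoffs, the discount factor, the jump probabilities) is a made-up assumption for illustration and is not taken from Rust's data or code. It also assumes logit (type 1 extreme value) shocks.

```python
import numpy as np

GAMMA_E = 0.5772156649            # Euler's constant
beta = 0.95                       # hypothetical discount factor (the guide uses .9999)
n = 20                            # hypothetical number of mileage bins
pmf = np.array([0.3, 0.5, 0.2])   # hypothetical pmf of monthly jumps of 0, 1, 2 bins

# 'no' (keep the engine): mileage moves up by 0-2 bins; mass that would
# pass the last bin is piled onto the last bin (a simplification)
f_keep = np.zeros((n, n))
for row in range(n):
    for jump, prob in enumerate(pmf):
        f_keep[row, min(row + jump, n - 1)] += prob

# 'yes' (replace): mileage restarts from the first bin in every state
f_replace = np.tile(f_keep[0], (n, 1))

u_keep = -0.3 * np.arange(n)      # hypothetical maintenance cost, rising in mileage
u_replace = -4.0 * np.ones(n)     # hypothetical replacement cost

# iterate the Bellman equation until the value function stops changing
value = np.zeros(n)
for iteration in range(5000):
    v_keep = u_keep + beta * f_keep.dot(value)
    v_replace = u_replace + beta * f_replace.dot(value)
    new_value = GAMMA_E + np.logaddexp(v_keep, v_replace)
    converged = np.max(np.abs(new_value - value)) < 1e-10
    value = new_value
    if converged:
        break

# logit choice probabilities implied by the solved value function
pr_replace = 1.0 / (1.0 + np.exp(v_keep - v_replace))
```

In this toy setup the replacement probability rises with mileage, which is the qualitative pattern the estimation below tries to explain.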

John Rust's website:
https://editorialexpress.com/jrust/nfxp.html


CCP estimators differ from Rust's original approach because of their focus on prediction. Instead of solving for a value function at each candidate parameter vector and maximizing its likelihood, you solve for the value function using the choice probabilities in the data. Once you have this value function, you can recalculate what the choice probabilities would be if you varied the model parameters. Moreover, you can maximize the likelihood of these 'conditional' choice probabilities to find consistent estimates of the parameters in the model.
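
The link between choice probabilities and values can be seen in a small numerical sketch. It assumes logit (type 1 extreme value) shocks, and the choice-specific values `v_no` and `v_yes` are made-up numbers. Under that assumption the log odds of 'yes' equal the difference in values, and the expected maximum can be written using either choice and its probability.

```python
import numpy as np

GAMMA_E = 0.5772156649                # Euler's constant

# hypothetical choice-specific values in three states
v_no = np.array([1.0, 0.2, -0.5])
v_yes = np.array([-1.0, 0.0, 0.4])

# logit choice probability of 'yes'
p_yes = np.exp(v_yes) / (np.exp(v_no) + np.exp(v_yes))

# inversion: the log odds recover the difference in values
diff_from_ccp = np.log(p_yes) - np.log(1 - p_yes)

# expected maximum, computed directly and through each choice
emax = GAMMA_E + np.logaddexp(v_no, v_yes)
emax_via_no = v_no + GAMMA_E - np.log(1 - p_yes)
emax_via_yes = v_yes + GAMMA_E - np.log(p_yes)
```

The terms `GAMMA - np.log(1-pr_obs)` and `GAMMA - np.log(pr_obs)` in the value function code later in this guide have the same form as the last two lines.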

This approach was first formalized in Joseph Hotz and Bob Miller's paper:
https://www.jstor.org/stable/2298122

However, the code and data this guide is modeled on come from Victor Aguirregabiria and Pedro Mira's website (more on them later):
http://individual.utoronto.ca/vaguirre/wpapers/program_code_survey_joe_2008.html

In [2]:
import pandas as pd
import math
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

from scipy.interpolate import interp1d #pre written interpolation function
from statsmodels.base.model import GenericLikelihoodModel
from scipy import stats #for kernel function

## Step 1: Pre-processing the data and Constants

In [33]:
#fix the bus .dat from Aguirregabiria and Mira's website
data = np.fromfile('bus1234.dat')
data = data.reshape(len(data)//6,6) #integer division so the shape is an int
data = pd.DataFrame(data,columns=['id','group','year','month','replace','miles'])

#save to .csv so other people don't need to be confused
data.to_csv("bus1234.csv")

#divide by 1e6 (use the same scale as Rust and AM)
data['miles'] = (data['miles'])/1e6

#switch to date time for ease 
data['date'] = pd.to_datetime(data[['year', 'month']].assign(Day=1))
data = data[['id','group','date','replace','miles']]

#shift dates back one month so the merge attaches next month's values
date_lag = data.copy()
date_lag['date'] = date_lag['date'] - pd.DateOffset(months=1)
data = data.merge(date_lag, how='left', on=['id','group','date'] , suffixes=('','_next'))
data = data.dropna()

print(data.max())

id                              162
group                        530875
date            1985-04-01 00:00:00
replace                           1
miles                      0.388254
replace_next                      1
miles_next                 0.388254
dtype: object


In [19]:
#constants
BETA = .9999
GAMMA = .5772 #euler's constant

#size of step in discretization
STEP = .002

#make states global variables
STATES = np.arange(data['miles'].min(),data['miles'].max() + STEP, STEP)

## Step 2: Calculating Choice Probabilities 'Non-Parametrically'

In [12]:
def miles_pdf(i_obs, x_obs, x_next):
    """estimation of mileage pdf following AM using the
    kernel function
    
    this corresponds to pdfdx in AM's code"""
    
    #mileage increment: x_next - x_obs if kept, x_next if replaced
    dx = (1-i_obs)*(x_next - x_obs) + i_obs*x_next
    
    #number of 'transition' states
    dx_states = np.arange(dx.min(),dx.max() +STEP , STEP)
    
    #use kernel groups to make pdf
    kernel1 = stats.gaussian_kde(dx, bw_method='silverman')
    pdfdx = kernel1(dx_states)
    
    return np.array([pdfdx/pdfdx.sum()]).transpose()


MILES_PDF = miles_pdf(data['replace'], data['miles'], data['miles_next'])
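
To see what this step does in isolation, here is a minimal sketch on synthetic data. The gamma-distributed increments and the names `dx_toy`, `grid_toy`, and `pmf_toy` are assumptions made up for illustration, not part of AM's code. It fits a Gaussian kernel density, evaluates it on a grid with the same spacing as `STEP`, and rescales the result so it sums to one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
STEP_TOY = 0.002                       # same spacing as STEP in this guide

# synthetic monthly mileage increments (made-up distribution)
dx_toy = rng.gamma(shape=4.0, scale=0.0006, size=500)

# grid of possible increments, from the smallest to the largest observed
grid_toy = np.arange(dx_toy.min(), dx_toy.max() + STEP_TOY, STEP_TOY)

# kernel density evaluated on the grid, then rescaled into a discrete pmf
kde = stats.gaussian_kde(dx_toy, bw_method='silverman')
density = kde(grid_toy)
pmf_toy = density / density.sum()
```

The rescaling matters because a density evaluated at a few grid points does not sum to one on its own.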

In [13]:
def transition_1(i_obs, x_obs , x_next):
    """calculate transitions probabilities,
    non-parametrically
    
    this corresponds to fmat2 in AM's code"""
    
    #transitions when i=1
    pdfdx = miles_pdf(i_obs, x_obs, x_next).transpose()
    
    #zero probability of transitioning to large states
    zeros = np.zeros( (len(STATES),len(STATES)-pdfdx.shape[1]) )
    
    #transitioning to first state and 'jumping' dx states
    fmat1 = np.tile(pdfdx,(len(STATES),1))
    fmat1 = np.concatenate( (fmat1, zeros), axis=1 )

    return fmat1

FMAT1 = transition_1(data['replace'], data['miles'],data['miles_next'])

In [14]:
def transition_0(i_obs, x_obs , x_next):
    """calculate transitions probabilities,
    non-parametrically
    
    this corresponds to fmat1 in AM's code"""
    
    pdfdx = miles_pdf(i_obs, x_obs, x_next).transpose()
    
    #initialize fmat array, transitions when i=0
    end_zeros = np.zeros((1, len(STATES) - pdfdx.shape[1]))
    fmat0 = np.concatenate( (pdfdx, end_zeros), axis=1 )

    for row in range(1, len(STATES)):
        
        #this corresponds to colz i think
        cutoff = ( len(STATES) - row - pdfdx.shape[1] )
        
        #case 1 far enough from the 'end' of the matrix
        if cutoff >= 0:
            start_zeros = np.zeros((1,row))
            end_zeros = np.zeros((1, len(STATES) - pdfdx.shape[1] - row))
            fmat_new = np.concatenate( (start_zeros, pdfdx, end_zeros), axis=1 )
            fmat0 = np.concatenate((fmat0, fmat_new))
       
        #case 2, too close to the end: truncate the pdf and renormalize
        else:
            pdf_adj = pdfdx[:,0:cutoff]
            pdf_adj = pdf_adj/pdf_adj.sum(axis=1)
            
            start_zeros = np.zeros((1,row))
            fmat_new = np.concatenate( (start_zeros, pdf_adj), axis=1 )
            fmat0 = np.concatenate((fmat0, fmat_new))
            
    return fmat0

FMAT0 = transition_0(data['replace'],data['miles'],data['miles_next'])

PR_TRANS = FMAT0, FMAT1
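
A quick way to check matrices like these is to build them for a tiny example and confirm that every row is a probability distribution. The sketch below is a simplified stand-in for `transition_0` and `transition_1`, written with hypothetical helper names and a made-up three-point pmf. It assumes the first entry of the pmf is positive so that the renormalization never divides by zero.

```python
import numpy as np

def toy_keep_matrix(pmf, n_states):
    """Row r places the pmf on columns r, r+1, ...; mass that would fall
    past the last state is dropped and the row is renormalized."""
    fmat = np.zeros((n_states, n_states))
    for row in range(n_states):
        width = min(len(pmf), n_states - row)
        fmat[row, row:row + width] = pmf[:width] / pmf[:width].sum()
    return fmat

def toy_replace_matrix(pmf, n_states):
    """Every row is the pmf starting from the first state."""
    row = np.zeros(n_states)
    row[:len(pmf)] = pmf
    return np.tile(row, (n_states, 1))

pmf_example = np.array([0.2, 0.5, 0.3])   # made-up increment pmf
fmat0_toy = toy_keep_matrix(pmf_example, 6)
fmat1_toy = toy_replace_matrix(pmf_example, 6)

# each row should sum to one, and mileage should never fall without replacement
rows_ok = np.allclose(fmat0_toy.sum(axis=1), 1) and np.allclose(fmat1_toy.sum(axis=1), 1)
no_decrease = np.allclose(fmat0_toy, np.triu(fmat0_toy))
```

The same two checks can be run on `FMAT0` and `FMAT1` once the data are loaded.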

In [15]:
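def initial_pr(i_obs, x_obs, d=0):
    """initial estimate of the replacement probability in each state,
    following AM: fit a logit of the replacement decision on a cubic
    polynomial in mileage and predict on the state grid
    
    the third argument, d, controls whether the fit is displayed"""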
    
    X = np.array([x_obs, x_obs**2, x_obs**3]).transpose()
    X = sm.add_constant(X)
    
    model = sm.Logit(i_obs,X)
    fit = model.fit(disp=d)
    if d: print(fit.summary())
    
    x_states = np.array([STATES, STATES**2, STATES**3]).transpose()
    x_states = sm.add_constant(x_states)
    
    return fit.predict(x_states)

PR_OBS = initial_pr(data['replace'], data['miles'], d=1)

Optimization terminated successfully.
         Current function value: 0.036201
         Iterations 23
                           Logit Regression Results                           
Dep. Variable:                replace   No. Observations:                 8156
Model:                          Logit   Df Residuals:                     8152
Method:                           MLE   Df Model:                            3
Date:                Tue, 15 Jan 2019   Pseudo R-squ.:                  0.1671
Time:                        14:20:03   Log-Likelihood:                -295.26
converged:                       True   LL-Null:                       -354.51
                                        LLR p-value:                 1.623e-25
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const        -17.3136      4.188     -4.134      0.000     -25.522      -9.105
x1           149.3089     56

## Step 3: Alternative Value Function Representation

To do CCP estimation, you first estimate the probability of 'yes' and 'no' in the data without considering the process that agents use to make their decision. To recover the parameters that govern agents' decisions, you then relate these parameters to the probability of 'yes' and 'no' in the data. Formally, this involves solving for the value function in terms of the probability of replacement and the parameters.

You can find a formal derivation of this in Hotz and Miller (1993) and Aguirregabiria and Mira (2002). However, the two key relationships are given below:
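
Read off the code that follows, the two relationships can be written as below. The notation is a labeling assumption made here, not necessarily the papers' own: $P$ is the vector of replacement probabilities by state, $F_0$ and $F_1$ are the transition matrices without and with replacement, $u_i$ is the flow payoff returned by `cost(params, STATES, i)`, $\beta$ is the discount factor, $\gamma$ is Euler's constant, and $\circ$ is the elementwise product.

$$V = \left[ I - \beta \left( \operatorname{diag}(1-P)\, F_0 + \operatorname{diag}(P)\, F_1 \right) \right]^{-1} \left[ (1-P) \circ \left( u_0 + \gamma - \ln(1-P) \right) + P \circ \left( u_1 + \gamma - \ln P \right) \right]$$

$$\Psi(P) = \frac{\exp\left( u_1 + \beta F_1 V \right)}{\exp\left( u_0 + \beta F_0 V \right) + \exp\left( u_1 + \beta F_1 V \right)}$$

The first expression corresponds to `hm_value` and the second to `hm_prob`, where the exponentials and the division are taken elementwise.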

In [34]:
def hm_value(params, cost, pr_obs, pr_trans):
    """calculate value function using hotz miller approach"""
    
    #unpack the transition matrices for i=0 and i=1
    trans0, trans1 = pr_trans
    
    #calculate value function for all states
    pr_tile = np.tile( pr_obs.reshape( len(STATES) ,1), (1, len(STATES) ))
    
    denom = (np.identity( len(STATES) ) - BETA*(1-pr_tile)*trans0 - BETA*pr_tile*trans1)
    
    numer = ( (1-pr_obs)*(cost(params, STATES, 0) + GAMMA - np.log(1-pr_obs)) + 
                 pr_obs*(cost(params, STATES, 1) + GAMMA - np.log(pr_obs) ) )
    
    value = np.linalg.inv(denom).dot(numer)
    return value

In [35]:
def hm_prob(params, cost, pr_obs, pr_trans):
    """calculate kappa (i.e. CCP likelihood) using value function"""
    
    value = hm_value(params, cost, pr_obs, pr_trans)
    value = value - value.min() #subtract out smallest value
    trans0, trans1 = pr_trans
    
    delta1 = np.exp( cost(params, STATES, 1) + BETA*trans1.dot(value))
    delta0 = np.exp( cost(params, STATES, 0) + BETA*trans0.dot(value) )
    
    return delta1/(delta1+delta0)
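
One way to build confidence in these two functions is to check them on a toy model where the answer is known. The sketch below uses made-up payoffs, transitions, and a discount factor of 0.95. It solves the toy model by value function iteration, computes the implied choice probabilities, and then confirms that the Hotz-Miller representation recovers the same value function and the same probabilities. It uses `np.linalg.solve` in place of an explicit matrix inverse, which is a substitution made here for numerical convenience.

```python
import numpy as np

GAMMA_E = 0.5772156649             # Euler's constant
beta = 0.95                        # hypothetical discount factor
n = 15                             # hypothetical number of mileage bins
pmf = np.array([0.3, 0.5, 0.2])    # hypothetical pmf of jumps of 0, 1, 2 bins

# keep: move up 0-2 bins, piling excess mass on the top bin (a simplification)
f0 = np.zeros((n, n))
for row in range(n):
    for jump, prob in enumerate(pmf):
        f0[row, min(row + jump, n - 1)] += prob
f1 = np.tile(f0[0], (n, 1))        # replace: restart from the first bin

u0 = -0.3 * np.arange(n)           # hypothetical payoff from keeping
u1 = -4.0 * np.ones(n)             # hypothetical payoff from replacing

# 1) solve the model directly by value function iteration
v_direct = np.zeros(n)
for _ in range(2000):
    v_direct = GAMMA_E + np.logaddexp(u0 + beta * f0.dot(v_direct),
                                      u1 + beta * f1.dot(v_direct))
gap = (u0 + beta * f0.dot(v_direct)) - (u1 + beta * f1.dot(v_direct))
p_true = 1.0 / (1.0 + np.exp(gap))

# 2) recover the value function from the choice probabilities alone
f_p = (1 - p_true)[:, None] * f0 + p_true[:, None] * f1
flow = ((1 - p_true) * (u0 + GAMMA_E - np.log(1 - p_true))
        + p_true * (u1 + GAMMA_E - np.log(p_true)))
v_hm = np.linalg.solve(np.eye(n) - beta * f_p, flow)

# 3) map the recovered value function back into choice probabilities
gap_hm = (u0 + beta * f0.dot(v_hm)) - (u1 + beta * f1.dot(v_hm))
p_implied = 1.0 / (1.0 + np.exp(gap_hm))
```

If the representation is coded correctly, `v_hm` matches `v_direct` and `p_implied` matches `p_true` up to numerical tolerance.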

## Step 4: (Pseudo) Maximum Likelihood Estimation

In [28]:
class CCP(GenericLikelihoodModel):
    """class for estimating the values of R and theta
    using the CCP routine and the helper functions
    above"""
    
    def __init__(self, i, x, x_next, params, cost, **kwds):
        """initialize the class
        
        i - replacement decisions
        x - miles
        x_next - next periods miles
        params - names for cost function parameters
        cost - cost function specification, takes arguments (params, x, i) """
        
        super(CCP, self).__init__(i, x, **kwds)
        
        #data
        self.endog = i #these names don't work exactly
        self.exog = x #the idea is that x is mean indep of epsilon
        self.x_next = x_next
        
        #transitions
        self.pr_obs = initial_pr(i, x)
        self.trans =  transition_0(i,x,x_next), transition_1(i,x,x_next)
        
        #should probably make these class parameters
        self.num_states = ( x.max()/STEP).astype(int) + 2
        self.states = np.arange(x.min(),x.max() + STEP, STEP)
        
        #initial model fit
        self.cost = cost
        self.num_params = len(params)
        self.data.xnames =  params
        self.results = self.fit( start_params=np.ones(self.num_params) )
        
        
    def nloglikeobs(self, params, v=False):
        """pseudo log likelihood function for the CCP estimator"""
        
        # Input our data into the model
        i = self.endog
        x = (self.exog/STEP).astype(int)*STEP #discretized x
           
        #set up hm state pr
        prob = hm_prob(params, self.cost, self.pr_obs, self.trans).transpose()
        prob = interp1d(self.states, prob)
        prob = prob(x)
        
        log_likelihood = (1-i)*np.log(1-prob) + i*np.log(prob)
        
        return -log_likelihood.sum()
    
    
    def iterate(self, numiter):
        """iterate the Hotz Miller estimation procedure 'numiter' times"""
        i = 0
        while(i < numiter):
            #update pr_obs based on parameters
            self.pr_obs = hm_prob(self.results.params, self.cost, self.pr_obs, self.trans)
            
            #refit the model
            self.results = self.fit(start_params=np.ones(self.num_params))
            i = i +1
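
The same idea can be sketched without the class machinery. The example below is a simplified stand-in, not the estimator above: it simulates choices from a toy model with made-up parameters, estimates first-stage probabilities by state frequencies instead of a logit, and minimizes the pseudo log likelihood with `scipy.optimize.minimize` instead of `GenericLikelihoodModel`. States are drawn uniformly rather than from the Markov chain, which is another simplifying assumption.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

GAMMA_E = 0.5772156649              # Euler's constant
n, beta = 10, 0.9                   # hypothetical grid size and discount factor
pmf = np.array([0.3, 0.5, 0.2])     # hypothetical pmf of jumps of 0, 1, 2 bins
x_grid = np.arange(n)

f0 = np.zeros((n, n))               # keep: move up, excess mass on the top bin
for row in range(n):
    for jump, prob in enumerate(pmf):
        f0[row, min(row + jump, n - 1)] += prob
f1 = np.tile(f0[0], (n, 1))         # replace: restart from the first bin

def psi(theta, p):
    """Choice probabilities implied by parameters theta and beliefs p."""
    u0, u1 = theta[0] * x_grid, theta[1] * np.ones(n)
    f_p = (1 - p)[:, None] * f0 + p[:, None] * f1
    flow = ((1 - p) * (u0 + GAMMA_E - np.log(1 - p))
            + p * (u1 + GAMMA_E - np.log(p)))
    v = np.linalg.solve(np.eye(n) - beta * f_p, flow)
    return expit((u1 + beta * f1.dot(v)) - (u0 + beta * f0.dot(v)))

# choice probabilities at made-up 'true' parameters, found by iterating psi
theta_true = np.array([-0.3, -3.0])
p_true = np.full(n, 0.5)
for _ in range(50):
    p_true = psi(theta_true, p_true)

# simulate states and replacement decisions
rng = np.random.default_rng(0)
x_obs = rng.integers(0, n, size=5000)
i_obs = (rng.random(5000) < p_true[x_obs]).astype(float)

# first stage: replacement frequency in each state, kept away from 0 and 1
p_hat = np.array([i_obs[x_obs == s].mean() for s in range(n)])
p_hat = np.clip(p_hat, 0.01, 0.99)

def neg_pseudo_loglike(theta):
    """Negative pseudo log likelihood, holding first-stage beliefs fixed."""
    p = np.clip(psi(theta, p_hat)[x_obs], 1e-10, 1 - 1e-10)
    return -np.sum(i_obs * np.log(p) + (1 - i_obs) * np.log(1 - p))

start = np.array([-1.0, -1.0])
result = minimize(neg_pseudo_loglike, start, method='Nelder-Mead')
theta_hat = result.x
```

With enough simulated observations, `theta_hat` would be expected to land in the neighborhood of `theta_true`, though how close depends on the sample and on the first-stage estimates.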

### Linear Costs

In [31]:
#define cost function using lambda expression
LINEAR_COST = lambda params, x, i: (1-i)*x*params[i] + i*params[i]

model_ccp = CCP(data['replace'], data['miles'], data['miles_next'], ['theta1','theta2'], LINEAR_COST)
print(model_ccp.results.summary())

Optimization terminated successfully.
         Current function value: 0.036544
         Iterations: 63
         Function evaluations: 120
                                 CCP Results                                  
Dep. Variable:                replace   Log-Likelihood:                -298.05
Model:                            CCP   AIC:                             598.1
Method:            Maximum Likelihood   BIC:                             605.1
Date:                Tue, 15 Jan 2019                                         
Time:                        14:37:02                                         
No. Observations:                8156                                         
Df Residuals:                    8155                                         
Df Model:                           0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------

### Quadratic Costs

We can see that the change in specification does not drastically change the estimates. Considering the limited data, the cost function is probably not identified.

In [27]:
QUAD_COST = lambda params, x, i: (1-i)*(x*params[0] + x**2*params[1]) + i*params[2]

model_ccp = CCP(data['replace'], data['miles'], data['miles_next'], ['theta1','theta2', 'theta3'], QUAD_COST)
print(model_ccp.results.summary())

Optimization terminated successfully.
         Current function value: 0.036261
         Iterations: 147
         Function evaluations: 260
                                 CCP Results                                  
Dep. Variable:                replace   Log-Likelihood:                -295.75
Model:                            CCP   AIC:                             593.5
Method:            Maximum Likelihood   BIC:                             600.5
Date:                Tue, 15 Jan 2019                                         
Time:                        14:35:24                                         
No. Observations:                8156                                         
Df Residuals:                    8155                                         
Df Model:                           0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
----------------------------------------------------------------------

## Step 5: Iterating the Model

It turns out that you can iterate on these estimates to converge to the policy function in Rust (1987). I iterate the procedure below, updating the choice probabilities and refitting the model at each pass. For more information, see:

Victor Aguirregabiria and Pedro Mira's website:
http://individual.utoronto.ca/vaguirre/wpapers/program_code_survey_joe_2008.html

Victor Aguirregabiria and Pedro Mira's 2002 paper:
https://www.jstor.org/stable/3082006
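
To get a feel for why iterating helps, the sketch below holds the parameters fixed and repeatedly applies the mapping from beliefs about choice probabilities to implied choice probabilities, starting from a deliberately crude guess of 0.5 in every state. The toy payoffs, transitions, and discount factor are made-up assumptions. The full procedure in this guide also re-estimates the parameters at each pass, which this sketch leaves out.

```python
import numpy as np
from scipy.special import expit

GAMMA_E = 0.5772156649              # Euler's constant
n, beta = 15, 0.95                  # hypothetical grid size and discount factor
pmf = np.array([0.3, 0.5, 0.2])     # hypothetical pmf of jumps of 0, 1, 2 bins

f0 = np.zeros((n, n))               # keep: move up, excess mass on the top bin
for row in range(n):
    for jump, prob in enumerate(pmf):
        f0[row, min(row + jump, n - 1)] += prob
f1 = np.tile(f0[0], (n, 1))         # replace: restart from the first bin

u0 = -0.3 * np.arange(n)            # hypothetical payoff from keeping
u1 = -4.0 * np.ones(n)              # hypothetical payoff from replacing

def psi(p):
    """Value the beliefs p, then return the choice probabilities they imply."""
    f_p = (1 - p)[:, None] * f0 + p[:, None] * f1
    flow = ((1 - p) * (u0 + GAMMA_E - np.log(1 - p))
            + p * (u1 + GAMMA_E - np.log(p)))
    v = np.linalg.solve(np.eye(n) - beta * f_p, flow)
    return expit((u1 + beta * f1.dot(v)) - (u0 + beta * f0.dot(v)))

# start from a crude guess and track how much each pass changes the beliefs
p = np.full(n, 0.5)
gaps = []
for _ in range(50):
    p_new = psi(p)
    gaps.append(np.max(np.abs(p_new - p)))
    p = p_new
```

In this toy setup the entries of `gaps` shrink quickly, so the beliefs settle on probabilities that reproduce themselves under `psi`.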


In [30]:
model_ccp = CCP(data['replace'], data['miles'], data['miles_next'], ['theta1','theta2'], LINEAR_COST)
model_ccp.iterate(2)
print(model_ccp.results.summary())

Optimization terminated successfully.
         Current function value: 0.036544
         Iterations: 63
         Function evaluations: 120
Optimization terminated successfully.
         Current function value: 0.036530
         Iterations: 62
         Function evaluations: 117
Optimization terminated successfully.
         Current function value: 0.036528
         Iterations: 63
         Function evaluations: 118
                                 CCP Results                                  
Dep. Variable:                replace   Log-Likelihood:                -297.93
Model:                            CCP   AIC:                             597.9
Method:            Maximum Likelihood   BIC:                             604.9
Date:                Tue, 15 Jan 2019                                         
Time:                        14:35:56                                         
No. Observations:                8156                                         
Df Residuals:                 