#  Assignment 3: Structural Econometrics
## Eric Schulman

Solutions to ECO 388E assignment 3 at the University of Texas, written by Eric Schulman.

In [1]:
import pandas as pd
import math
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

from scipy.interpolate import interp1d #pre written interpolation function
from statsmodels.base.model import GenericLikelihoodModel

### Part 1

The value for a machine of age $a_t$ is

$$ V(a_t, \epsilon_t; \theta) = \underset{i_t}{\max} E[\sum_{\tau=t}^\infty \beta^{\tau-t}\pi(a_\tau, i_\tau, \epsilon_{\tau};\theta)|a_t; \theta] $$


As a result, the Bellman equation for a firm maximizing expected discounted profits is:

$$V(a_t,i_t,\epsilon_{0t},\epsilon_{1t}) = \pi(a_t, i_t, \epsilon_{1t}, \epsilon_{0t}) + \max_{i_{t+1}} \beta E(V(a_{t+1},i_{t+1},\epsilon_{0t+1},\epsilon_{1t+1}) |a_t, i_t; \theta)$$

Because of the conditional independence assumption, $\epsilon_{it}$ is not serially correlated.
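
With i.i.d. type 1 extreme value shocks, the expectation over next period's shocks has the closed form $\gamma + \log(e^{v_0} + e^{v_1})$, where $\gamma$ is the Euler-Mascheroni constant; the code in Part 3 relies on this expression. The sketch below checks the identity by simulation. The values `v0` and `v1`, the seed, and the number of draws are illustrative assumptions, not quantities from the assignment.

```python
import numpy as np

def emax_closed_form(v0, v1):
    """Closed-form E[max(v0 + e0, v1 + e1)] for standard Gumbel shocks."""
    return np.euler_gamma + np.logaddexp(v0, v1)

def emax_monte_carlo(v0, v1, n_draws=200000, seed=0):
    """Simulated counterpart: average the maximum over random shock draws."""
    rng = np.random.default_rng(seed)
    shocks = rng.gumbel(size=(n_draws, 2))  # standard Gumbel, mean = gamma
    return np.maximum(v0 + shocks[:, 0], v1 + shocks[:, 1]).mean()

# Illustrative choice-specific values (assumed, not from the data)
v0, v1 = -4.0, -5.0
closed = emax_closed_form(v0, v1)
simulated = emax_monte_carlo(v0, v1)
print("closed form: %.4f, simulated: %.4f" % (closed, simulated))
```

The two numbers should agree up to simulation noise, which shrinks as `n_draws` grows.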

### Part 2

In class, $x_t$ was continuous; here $a_t$ is discrete. Additionally, we did not parameterize $c(x_t, \theta)$ in class, whereas here $c(a_t,\theta) = \theta a_t$. As a result, $c(0,\theta) = 0$.

### Part 3

The code below calculates the value function by forward recursion. The starting values come from a contraction mapping, which is iterated until it converges.

The value function is a 5x2 array. The rows correspond to the ages $a_t = 1, \dots, 5$. The columns correspond to the choice $i_t$: column 0 is keeping the engine and column 1 is replacing it.
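
The fixed point that supplies the starting values can also be computed with an explicit loop. The sketch below assumes the same one-period update used in this notebook (flow payoff plus $\beta$ times the log-sum term) but a much tighter tolerance. `solve_fixed_point` and its arguments are names introduced here for illustration.

```python
import numpy as np

BETA = 0.9
GAMMA = 0.5772  # Euler-Mascheroni constant, rounded as in the notebook

def solve_fixed_point(theta1, cost, tol=1e-10, max_iter=10000):
    """Iterate v -> (theta1 + BETA*emax(v), cost + BETA*emax(v)) to convergence."""
    v = np.zeros(2)  # start from zero values for (keep, replace)
    for _ in range(max_iter):
        emax = GAMMA + np.logaddexp(v[0], v[1])  # expected max over shocks
        v_new = np.array([theta1 + BETA * emax, cost + BETA * emax])
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("value iteration did not converge")

v_star = solve_fixed_point(-1.0, -3.0)
print(v_star)
```

Because the update shrinks differences by a factor of $\beta = 0.9$ each pass, the loop converges from any starting point. With $\theta_1 = -1$ and $R = -3$ the result should land near the first row of the Part 4 table, which is computed with a looser tolerance.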

In [2]:
BETA = .9 #discount factor
GAMMA = .5772 #Euler-Mascheroni constant (mean of a standard Gumbel shock)

def value_helper(a, theta1, cost, v_init):
    """helper function for calculating value function.
    
    given the values for the initial period, recurse forward
    'a' periods and return one row [v_0, v_1] per period"""
    if a <= 0: #initial period
        return  v_init  
    else:
        v_next = value_helper(a-1, theta1, cost, v_init)
        #expected max over the shocks (log-sum formula)
        emax = GAMMA + np.log( np.exp(v_next[-1][0]) + np.exp(v_next[-1][1]) )
        v_0 = a*theta1 + BETA*emax #keep the engine
        v_1 = cost + BETA*emax #replace the engine
        
        return np.concatenate( ( v_next,[[v_0,v_1]]) )


v = value_helper(5, -1, -3, [[0,0]])

In [3]:
def value_function(a, theta1, cost, v_init, error, maxiter):
    """solve for the value function with the contraction mapping
    
    iterate one period at a time until successive values differ by less
    than 'error' (or 'maxiter' runs out), then compute 'a' periods forward"""
    
    #only need to iterate 1 period into the future
    v = value_helper(1, theta1, cost, v_init)
    
    #keep iterating while the last two periods still differ
    if ( maxiter >= 0  
           and ( abs(  v[1,0] - v[0,0] )  > error
           or abs(  v[1,1] - v[0,1] )  > error) ):
        
        #recompute value function until convergence
        return value_function(a, theta1, cost, [v[1,:]], error, maxiter-1)  
    
    return value_helper(a, theta1, cost, [v[1,:]])[1:,:]

### Part 4

* Below I solve the model when $\theta_1 = -1$ and $R = -3$. 

* When $a_t = 2$, we can see that $V(2,0) - V(2,1) = 1$, so absent the shocks the firm chooses not to replace the engine. If $\epsilon_{1t} - \epsilon_{0t}$ exceeds 1, then the firm will choose to replace the engine at age 2.

* I calculate the probability of this event below using the extreme value distribution.

* Below I also calculate the value function when $a_t = 4$, $\epsilon_{0t} = 1$ and $\epsilon_{1t} =-1.5$. It is still cheaper to replace this period than to wait until period 5 to replace.
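
The replacement probability in the second and third bullets follows from the logistic form implied by extreme value shocks. A minimal sketch, where `replace_probability` is a name introduced here and the inputs are the age-2 values printed in the table that follows:

```python
import numpy as np

def replace_probability(v_keep, v_replace):
    """P(replace) = exp(v_replace) / (exp(v_keep) + exp(v_replace))."""
    return 1.0 / (1.0 + np.exp(v_keep - v_replace))

# Age-2 values for keeping (column 0) and replacing (column 1)
p = replace_probability(-4.656008, -5.656008)
print("P(replace | a = 2) = %.4f" % p)
```

Only the difference between the two values matters, so any pair of values one unit apart gives the same probability.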

In [4]:
v = value_function(5, -1, -3, [[0,0]] , .001 , 100)
print('1. Value Function:')
print(pd.DataFrame(v))

#difference between V(2,0) and V(2,1)
diff = v[1,0] - v[1,1]
print('\n2. V(2,0) - V(2,1) = %s'%diff)

#probability of replacing at a = 2
print('\n3. Likelihood: %s'%( np.exp(-diff)/(1+np.exp(-diff)) ))

#PDV a = 4, e0= 1, e1=-1.5
print('\n4. PDV: %s'%np.maximum(-3 + 1 + BETA*v[3,0], -4 + -1.5 +BETA*v[3,1] ))

1. Value Function:
           0         1
0  -3.655248 -5.655248
1  -4.656008 -5.656008
2  -6.388992 -6.388992
3  -8.606780 -7.606780
4 -11.044687 -9.044687

2. V(2,0) - V(2,1) = 1.0

3. Likelihood: 0.2689414213699951

4. PDV: -9.74610218495787


### Part 5 - 6 

Below I estimate $\theta_1$ and $R$ and their standard errors.

The likelihood of $i_t$ is conditional on $a_t$ because the decision to replace this period depends on how old the engine is: the expected future costs, on which the decision is based, depend on the current age of the engine.
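
The conditional likelihood described above can be sketched as follows. The arrays are hypothetical toy inputs, not the assignment data, and `log_likelihood` is a name introduced for illustration. `p_replace[k]` stands for the model's replacement probability at age `k + 1`.

```python
import numpy as np

def log_likelihood(ages, replaced, p_replace):
    """Sum of log P(i_t | a_t) over observations."""
    ages = np.asarray(ages, dtype=int)
    replaced = np.asarray(replaced)
    p = np.asarray(p_replace)[ages - 1]  # look up the probability at each observed age
    return np.sum(replaced * np.log(p) + (1 - replaced) * np.log(1 - p))

# Hypothetical observations: engine age and whether it was replaced
ages = [1, 2, 2, 4]
replaced = [0, 1, 0, 1]
p_replace = [0.1, 0.25, 0.5, 0.8, 0.9]  # assumed probabilities for ages 1..5

ll = log_likelihood(ages, replaced, p_replace)
print("log-likelihood: %.4f" % ll)
```

In the estimation, `p_replace` would be recomputed from the value function at each trial value of $(\theta_1, R)$.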

In [5]:
#load data into memory for part 5
data = np.loadtxt("data.asc")

In [6]:
class Rust(GenericLikelihoodModel):
    """class for estimating the values of R and theta"""
    
    def nloglikeobs(self, params, v=False):
        
        theta1, R = params
        
        # Input our data into the model
        i = self.exog[:,0] #replacement decisions (exog is stored as a 2-D array)
        a = self.endog.astype(int) #engine ages
        
        #solve value function based on params
        v = value_function(5, theta1, R, [[0,0]] , .01 , 100)
        
        #interpolate using scipy (easier than indexing)
        v0 = interp1d(range(1,6), v[:,0],fill_value="extrapolate")
        v1 = interp1d(range(1,6), v[:,1],fill_value="extrapolate")
        
        diff = v1(a) - v0(a)
    
        #calculate likelihood of each obs
        pr0 = 1/(1+np.exp(diff))
        pr1 = np.exp(diff)/(1+np.exp(diff))

        likelihood = (1-i)*pr0 + i*pr1
        return -np.log(likelihood).sum()
    
    
    def fit(self, start_params=None, maxiter=1000, maxfun=5000, **kwds):
        if start_params is None:
            start_params = [1,1]
        return super(Rust, self).fit(start_params=start_params,
                                       maxiter=maxiter, maxfun=maxfun, **kwds)
    
    
model = Rust(data[:,0],data[:,1])

result = model.fit()
print(result.summary(xname=['theta_1', 'R']))

Optimization terminated successfully.
         Current function value: 0.425320
         Iterations: 55
         Function evaluations: 104
                                 Rust Results                                 
Dep. Variable:                      y   Log-Likelihood:                -425.32
Model:                           Rust   AIC:                             852.6
Method:            Maximum Likelihood   BIC:                             857.5
Date:                Thu, 20 Dec 2018                                         
Time:                        12:29:17                                         
No. Observations:                1000                                         
Df Residuals:                     999                                         
Df Model:                           0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------

### Part 7

#### Section A

To accommodate $\theta_{1A}$ and $\theta_{1B}$ we would need to modify the dynamic program as follows:

$$V_j(a_{t},i_t,\epsilon_{it}) = \pi(a_t, i_t, \epsilon_{jt}; \theta_j) + max_{i_{t+1}} \beta E(V_j(a_{t+1},i_{t+1},\epsilon_{jt+1}) |a_t, \epsilon_{it}, i_t; \theta_j)$$

Now there is a $j$ subscript on $\theta$. In principle, this means we have to calculate two different value functions using the contraction mapping, one for each type. Using the extreme value distribution and conditional independence, the likelihood of the replacement decision $i_j$ is given by:


$$Pr(i_j =1 | a_j, \epsilon_{jt} ) = \alpha \dfrac{e^{V_{1A}(a_j, \epsilon_{jt})}}{e^{V_{0A}(a_j, \epsilon_{jt})} + e^{V_{1A}(a_j, \epsilon_{jt})}} + (1-\alpha) \dfrac{e^{V_{1B}(a_j, \epsilon_{jt})}}{e^{V_{0B}(a_j, \epsilon_{jt})} + e^{V_{1B}(a_j, \epsilon_{jt})}}$$

#### Section B

We must now consider the likelihood of a sequence of decisions $ \{i(a_{jt},\epsilon_{jt}) = 1\}_{t<T}$. 

$$Pr( \{i(a_{jt},\epsilon_{jt}) = 1\}_{t<T} ) = \alpha Pr(\{i(a_{jt},\epsilon_{jt}) = 1\}_{t<T} ; \theta_{1A} ) + (1-\alpha) Pr(\{i(a_{jt},\epsilon_{jt}) = 1\}_{t<T} ; \theta_{1B} ) $$


Under conditional independence and the extreme value distribution, we would calculate the following:


$$Pr( \{i(a_{jt},\epsilon_{jt}) = 1\}_{t<T}) = \alpha \prod_{t<T} \dfrac{e^{V_{1A}(a_{jt}, \epsilon_{jt})}}{e^{V_{0A}(a_{jt}, \epsilon_{jt})} + e^{V_{1A}(a_{jt}, \epsilon_{jt})}} + (1-\alpha) \prod_{t<T} \dfrac{e^{V_{1B}(a_{jt}, \epsilon_{jt})}}{e^{V_{0B}(a_{jt}, \epsilon_{jt})} + e^{V_{1B}(a_{jt}, \epsilon_{jt})}}$$
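
To make the difference between Sections A and B concrete, the sketch below evaluates both expressions for one machine observed over three periods. The per-period probabilities `p_A` and `p_B` and the weight `alpha` are made-up numbers; in the model they would come from the two value functions.

```python
import numpy as np

def sequence_mixture(alpha, p_A, p_B):
    """Section B: multiply within each type, then mix over types."""
    return alpha * np.prod(p_A) + (1 - alpha) * np.prod(p_B)

def product_of_mixtures(alpha, p_A, p_B):
    """Multiply the per-period mixture of Section A across periods."""
    return np.prod(alpha * np.asarray(p_A) + (1 - alpha) * np.asarray(p_B))

alpha = 0.5
p_A = [0.9, 0.9, 0.9]  # assumed per-period probabilities for type A
p_B = [0.1, 0.1, 0.1]  # assumed per-period probabilities for type B

seq = sequence_mixture(alpha, p_A, p_B)
per = product_of_mixtures(alpha, p_A, p_B)
print("mixture of products: %.3f, product of mixtures: %.3f" % (seq, per))
```

The two numbers differ because a machine keeps its type across periods. Mixing period by period would treat the type as if it were redrawn every period.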

#### Section C

If machines differ, then $\epsilon_{jt}$ is now serially correlated. We must therefore make $\epsilon_{jt}$ a state variable in our expected value function.

Before, we had $\bar{V}(a_t, \epsilon_{it}) = E(V_j(a_{t+1},\epsilon_{jt+1}) |a_t ; \theta_j)$.

Now we need to condition on $\epsilon_{jt}$, i.e. $\bar{V}(a_t, \epsilon_{it}) = E(V_j(a_{t+1},\epsilon_{jt+1}) |a_t, \epsilon_{jt} ; \theta_j)$.

#### Section D

The initial conditions problem arises because we do not observe the initial period. As a result, any serial correlation can be caused by unobserved heterogeneity or by the initial draw of $\epsilon_{jt}$.

We can solve the problem by simulating $\epsilon_{jt}$ all the way back to the initial period.

#### Section E

For unobserved reasons firms will not replace their engines as often as expected in the data. If firms systematically wait longer than expected to replace their engines, $\lambda$ will be identified. More specifically, the distribution of $\epsilon_{it}$, the unobservables, will have a mean given by the extreme value distribution. The difference between this mean (in theory) and its value in practice will identify $\lambda$.

### Part 8

Below I estimate the model using the Hotz-Miller algorithm.

In [35]:
def hm_initial_pr(a_obs, i_obs):
    """calculate the observed replacement frequency at each age"""
    
    df = np.array([a_obs,i_obs]).transpose()
    df = pd.DataFrame(df, columns=('a','i'))
    grouped = df.groupby('a')
    #share of observations at each age where the engine was replaced
    pr_obs = grouped.sum()/(1.*grouped.count())

    return  np.array(pr_obs)
    

def hm_transitions(a_max):
    """calculate transitions, deterministic
    in this case"""
    
    #replace: every age moves to the first state
    trans1 = np.zeros((a_max,a_max))
    trans1[:,0] = np.ones(a_max)
    
    #keep: age moves up by one, and the oldest state stays put
    trans0 = np.vstack( (np.identity(a_max-1), np.zeros(a_max-1)))
    trans0 = np.hstack( ( np.zeros((a_max,1)), trans0 ))
    trans0[a_max-1][a_max-1] = 1

    return trans0,trans1


def hm_value(a_max, theta1, cost, pr_obs):
    """calculate value function using hotz miller approach"""
    
    #set up matrices, transition is deterministic
    trans0, trans1 = hm_transitions(a_max)
    a = np.arange(1,a_max+1).reshape(a_max,1)
    
    #calculate value function for all state
    pr_tile = np.tile( pr_obs.reshape(a_max,1), (1,a_max))
    
    denom = (np.identity(a_max) - BETA*(1-pr_tile)*trans0 - BETA*trans1*pr_tile)
        
    numer = ( (1-pr_obs)*(theta1*a  + GAMMA - np.log(1-pr_obs)) + 
                 pr_obs*(cost+ GAMMA - np.log(pr_obs) ) )
    
    value = np.linalg.inv(denom).dot(numer)
    return value


def hm_prob(a_max, theta1, cost, pr_obs):
    """calculate kappa using value function"""
    
    value = hm_value(a_max, theta1, cost, pr_obs)
    trans0,trans1 = hm_transitions(a_max)
    a = np.arange(1,a_max+1).reshape(a_max,1)

    delta1 = np.exp( cost + BETA*trans1.dot(value))
    delta0 = np.exp( a*theta1 + BETA*trans0.dot(value) )
    
    return delta1/(delta1+delta0)

hm_value(5, -1, -3, hm_initial_pr(data[:,0], data[:,1]))
hm_prob(5, -1, -3, hm_initial_pr(data[:,0], data[:,1]))

array([[0.22344462],
       [0.52314252],
       [0.77256052],
       [0.90702516],
       [0.96366077]])
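
The terms `GAMMA - np.log(pr_obs)` and `GAMMA - np.log(1-pr_obs)` in `hm_value` are the expected shocks conditional on the chosen action under type 1 extreme value errors. The sketch below checks that identity by simulation. The values `v0` and `v1`, the seed, and the number of draws are illustrative and not taken from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
v0, v1 = 0.0, -1.0  # assumed choice-specific values (keep, replace)

# Draw standard Gumbel shocks and record when option 1 is chosen
shocks = rng.gumbel(size=(400000, 2))
choose_1 = v1 + shocks[:, 1] > v0 + shocks[:, 0]

p1 = choose_1.mean()                     # simulated choice probability
simulated = shocks[choose_1, 1].mean()   # average shock among those choosing 1
predicted = np.euler_gamma - np.log(p1)  # Hotz-Miller correction term

print("simulated: %.3f, predicted: %.3f" % (simulated, predicted))
```

The rarer a choice is, the larger the shock needed to trigger it, which is why the correction grows as the choice probability falls.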

Below I estimate the parameters using the Hotz Miller estimation routine.

In [38]:
from scipy.optimize import minimize

class HotzMiller():
    """class for estimating the values of R and theta"""
    
    def __init__(self, a_max, a, i):
        self.a_max = a_max
        self.pr_obs = hm_initial_pr(a,i)
        self.a = a
        self.i = i
        self.theta1 = 0
        self.R = 0
        
        
    def likelihood(self, params): 
        theta1, R = params
        
        # Input our data into the model
        i = self.i
        a = self.a.astype(int)
        
        #set up hm state pr
        prob = hm_prob(self.a_max, theta1, R, self.pr_obs).transpose()[0]
        prob = interp1d(range(1,self.a_max+1), prob,fill_value="extrapolate")
        
        log_likelihood = (1-i)*np.log(1-prob(a)) + i*np.log(prob(a))
        
        return -log_likelihood.sum()
    
    
    def fit(self):
        result = minimize(self.likelihood, [-1,-3], method = 'Nelder-Mead', options={'disp': False})
        self.theta1, self.R = result.x

model_hm = HotzMiller(5, data[:,0],data[:,1])
model_hm.fit()

print('\n theta_1:%s, R:%s'%(round(model_hm.theta1,4) , round(model_hm.R,4)))


 theta_1:-1.1495, R:-4.4537


Below I write code to iterate the Hotz Miller estimation procedure 3 times. Although the parameters get closer to their maximum likelihood estimates, the rate at which they converge is slow.

In [39]:
class AM( HotzMiller):
    """A class for doing the AM contraction mapping"""
    
    def iterate(self,numiter):
        self.fit() 
        for _ in range(numiter):
            #update pr_obs based on parameters
            self.pr_obs = hm_prob(self.a_max, self.theta1, self.R, self.pr_obs)
            #refit the model
            self.fit()
    

model_am = AM(5,data[:,0],data[:,1])
model_am.iterate(0)
print('\n theta_1:%s, R:%s'%(round(model_am.theta1,4) , round(model_am.R,4)))
n = 3
print('\n Iterating %s times ...'%n)
model_am.iterate(n)
print('\n theta_1:%s, R:%s'%(round(model_am.theta1,4) , round(model_am.R,4)))


 theta_1:-1.1495, R:-4.4537

 Iterating 3 times ...

 theta_1:-1.1484, R:-4.4464
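
The passes above use a fixed iteration count. An alternative is to stop once successive estimates stop moving. The helper below is a generic sketch, not part of the estimation code in this notebook: `iterate_until_stable` and `update` are hypothetical names, and the cosine map is only a stand-in for one refit of the model.

```python
import numpy as np

def iterate_until_stable(update, start, tol=1e-8, max_iter=1000):
    """Apply `update` repeatedly until the largest change falls below `tol`."""
    current = np.asarray(start, dtype=float)
    for n_iter in range(1, max_iter + 1):
        new = np.asarray(update(current), dtype=float)
        if np.max(np.abs(new - current)) < tol:
            return new, n_iter
        current = new
    raise RuntimeError("no convergence within max_iter")

# Stand-in update: x -> cos(x), which has a unique fixed point
fixed_point, n_iter = iterate_until_stable(np.cos, [1.0])
print("fixed point: %.6f after %d iterations" % (fixed_point[0], n_iter))
```

Applied to the estimator, `update` would map the current parameter vector to the refitted one, and the returned iteration count would give a direct measure of how slowly the procedure converges.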
