# Productivity and Fixed Effects Monte Carlo

In this section, we will simulate the behavior of a number of productive firms. We want to simulate the following scenario. Suppose that we observe a measure of the value added $Y_{it}$ of a firm $i$ in time period $t$. Suppose that we also observe a measure of the capital $K_{it}$ and labor $L_{it}$ that the firm uses in any given time period.

Assume a Cobb-Douglas production function as follows: $Y = A L^\alpha K^{1-\alpha}$, where $A$ is total factor productivity. Let lower-case letters represent logs of the respective variables so that we have

$$
y = a + \alpha \ell + (1-\alpha) k.
$$

(Notice, first of all, that we can interpret this production function as a log-linear approximation of a wide class of other nonlinear production functions. This is part of the reason why Cobb-Douglas is so commonly used.) Suppose that we want to estimate the parameters of this production function. Suppose
we run a regression of $y$ on $\ell$ and $k$:

$$
y = \hat a + \hat \beta_\ell \ell + \hat \beta_k k + \hat \epsilon.
$$

From this regression, we should hopefully see that $\hat \beta_\ell \overset{p}{\rightarrow} \alpha$ and $\hat \beta_k \overset{p}{\rightarrow} (1-\alpha)$.
In this exercise, we will simulate a plausible scenario for a firm's production process and we will determine if our estimation procedure is unbiased and/or consistent.


**NOTE:** This section has 9 questions, labelled Q1-Q9. Each is worth 4 points.


## Model Description

We will generate a balanced panel dataset of 100 firms, each operating in 50 periods (for a total of 5000 observations). We will create data for the firms' value added (output), total factor productivity, labor inputs, capital inputs, and real wage rates. Suppose that we know a firm's production function to be

$$
y_{it} = a_{it} + 0.7 \ell_{it} + 0.3 k_{it}, 
$$

where all variables are expressed in logs (and, again, $a$ is log total factor productivity, TFP). Suppose that $a_{it} = \gamma_i + \omega_{it}$. Here $\gamma_i$ is a firm fixed effect and $\omega_{it}$ follows an AR(1) process:
$\omega_{it} = \rho \omega_{i,t-1} + \varepsilon_{it}$. Assume that $\gamma_i \sim \mathcal N(0, 1/4)$ across firms, that is, $\gamma_i$ has a standard deviation of $1/2$. Suppose that $\rho = 0.8$ and $\varepsilon_{it}$ is iid $\mathcal N(0, 0.01)$ (a standard deviation of $0.1$) across firms and time. The firm observes $\gamma_i$ and $\omega_{i,t-1}$ at the beginning of period $t$ and then chooses its inputs. After that, $\varepsilon_{it}$ is revealed.

Assume that capital is exogenous to the firm and randomly determined (obviously a nonsensical assumption, but it will make things easier here). Specifically, $k_{it}$ is iid $\mathcal N(0, 0.01)$ (a standard deviation of $0.1$) across firms and time. Likewise, the log real wage rate facing the firm is also iid with $w_{it} \sim \mathcal N(0, 0.25)$ (a standard deviation of $0.5$).

To sum up, the model's shocks are given by

\begin{align*}
\gamma_i &\overset{\text{iid}}{\sim} \mathcal N(0, \sigma^2_\gamma) \\
\varepsilon_{it} &\overset{\text{iid}}{\sim} \mathcal N(0, \sigma^2_\varepsilon) \\
k_{it} & \overset{\text{iid}}{\sim} \mathcal N(0, \sigma^2_k) \\
w_{it} & \overset{\text{iid}}{\sim} \mathcal N(0, \sigma^2_w),
\end{align*}

with $\sigma^2_\gamma = \sigma^2_w = 1/4$ and $\sigma^2_\varepsilon = \sigma^2_k = 0.01$.
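As a quick sanity check on the AR(1) component, the sketch below simulates many independent $\omega$ paths with the parameter values above and compares the cross-sectional variance after a long burn-in with the stationary variance $\sigma^2_\varepsilon / (1 - \rho^2)$. The seed, the number of paths, and the burn-in length are arbitrary choices made for this illustration.

```python
import numpy as np

rho = 0.8
sigma2epsilon = 0.01

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n_paths, n_periods = 5000, 200  # illustrative sizes, not part of the exercise

# Start every path at zero and iterate the AR(1) recursion.
omega = np.zeros(n_paths)
for _ in range(n_periods):
    omega = rho * omega + rng.normal(0.0, np.sqrt(sigma2epsilon), n_paths)

stationary_var = sigma2epsilon / (1.0 - rho**2)
simulated_var = omega.var()
print(stationary_var, simulated_var)
```

The two numbers should be close, which is why drawing the initial $\omega$ from a normal distribution with this variance starts the process in its stationary distribution.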

### Solving For Labor Inputs

Now that we have defined these exogenous variables, we will figure out what the firm's labor inputs are. To do so, we will assume that the firm is a price taker in its output and labor markets. We will use the expression for the profit-maximizing labor level to derive the labor demand equation, and construct the firms' labor input values according to this equation.

Assuming that the price of the output is 1 and is fixed throughout time, the firm's maximization problem is

\begin{align*}
\max_{L_{it}} &\quad E_{t-1}[A_{it} L_{it}^{.7}K_{it}^{.3} - e^{w_{it}} L_{it} - rK_{it} \mid w_{it}, \gamma_i, \omega_{i,t-1}, k_{it}] \\
&\Leftrightarrow \\
\max_{\ell_{it}} &\quad E_{t-1}[ \exp\{a_{it} + .7 \ell_{it} + .3 k_{it}\} - \exp(w_{it} + \ell_{it}) \mid w_{it}, \gamma_i, \omega_{i,t-1}, k_{it}],
\end{align*}

where I have removed the cost of capital since $k$ is exogenous.
Notice also that I am assuming that the firm knows, at the beginning of the period, its level of capital as well as the wages that it faces, before it chooses its level of labor demanded.
Evaluating the objective further,

\begin{align*}
& E_{t-1}[ \exp\{a_{it} + .7 \ell_{it} + .3 k_{it}\} - \exp(w_{it} + \ell_{it}) \mid w_{it}, \gamma_i, \omega_{i,t-1}, k_{it}] \\
  &\quad = \, \exp \left (
    \gamma_i + \rho \omega_{i,t-1} + .7 \ell_{it} + .3 k_{it}
    + \frac 12 \sigma^2_\varepsilon
    \right ) - \exp\left( \ell_{it} + w_{it} \right).
\end{align*}
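The $\frac 12 \sigma^2_\varepsilon$ term comes from the mean of a lognormal variable: if $\varepsilon \sim \mathcal N(0, \sigma^2_\varepsilon)$, then $E[e^{\varepsilon}] = e^{\sigma^2_\varepsilon / 2}$. A minimal simulation check of that fact, with an arbitrary seed and sample size:

```python
import numpy as np

sigma2epsilon = 0.01

rng = np.random.default_rng(1)  # arbitrary seed for this sketch
eps = rng.normal(0.0, np.sqrt(sigma2epsilon), size=1_000_000)

mc_mean = np.exp(eps).mean()               # Monte Carlo estimate of E[exp(eps)]
closed_form = np.exp(0.5 * sigma2epsilon)  # lognormal mean formula
print(mc_mean, closed_form)
```

Both values should be slightly above 1, which shows that the expected level of productivity exceeds the exponential of expected log productivity.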

The first-order condition is then given by

\begin{align*}
[\ell_{it} :] & \quad .7
  \exp \left (
    \gamma_i + \rho \omega_{i,t-1} + .7 \ell_{it} + .3 k_{it}
    + \frac 12 \sigma^2_\varepsilon
    \right ) - \exp\left( \ell_{it} + w_{it} \right) = 0.
\end{align*}

#### Q1. Use `scipy` to numerically solve the first-order condition for the optimal labor input.

Although this equation can be solved analytically, for practice solve the first-order condition numerically, as it is written above, using `scipy.optimize.fsolve`. Find the optimal labor input when $\gamma_i = 1$, $\omega_{i,t-1} = -1$, $k_{it} = 3$, and $w_{it} = -1$. (Recall that lower-case letters are logs, so a negative value such as $w_{it} = -1$ still makes sense.)

**Hint:** Write the left-hand side of the equation as a function and use `fsolve` to find the value of labor that sets the equation to zero. 
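The pattern in the hint can be sketched on a simpler equation. The example below is not the first-order condition itself: `toy_equation` and its parameter `c` are hypothetical names used only to show how `fsolve` takes a function, a starting guess, and extra arguments.

```python
import numpy as np
from scipy.optimize import fsolve

def toy_equation(x, c=2.0):
    # Left-hand side of exp(x) - c = 0; its root is ln(c).
    return np.exp(x) - c

# fsolve returns an array of roots, one per unknown; take the single entry.
root = fsolve(toy_equation, x0=0.0, args=(2.0,))[0]
print(root, np.log(2.0))
```

The same structure carries over to the labor problem: the unknown is $\ell_{it}$, and the firm's state variables play the role of the extra arguments.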

If you want to check your answers, the analytical solution to the equation is given by

$$
\ell_{it} = \frac{1}{.3}\left[
\gamma_i +\rho \omega_{i,t-1} + .3 k_{it} + \frac 1 2 \sigma^2_\varepsilon 
- w_{it}  + \ln(.7) \right]
$$
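One way to arrive at this expression, sketched here as a check, is to move the wage term to the right-hand side of the first-order condition and take logs of both sides:

\begin{align*}
\ln(.7) + \gamma_i + \rho \omega_{i,t-1} + .7 \ell_{it} + .3 k_{it} + \frac 12 \sigma^2_\varepsilon &= \ell_{it} + w_{it} \\
.3 \ell_{it} &= \gamma_i + \rho \omega_{i,t-1} + .3 k_{it} + \frac 12 \sigma^2_\varepsilon - w_{it} + \ln(.7).
\end{align*}

Dividing both sides by $.3$ gives the solution above.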

Below I provide you with code that generates the answers using the analytical solution.

In [None]:
# Code Provided

import numpy as np
import scipy.optimize
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
sns.set()

# Set seed of random number generator 
np.random.seed(10)

# The variance parameters of the random variables
sigma2gamma = 0.25
sigma2epsilon = 0.01
sigma2k = 0.01
sigma2w = 0.25

# The autocorrelation parameter of the AR(1) process
rho = .8

# Number of firms
N = 10

# Number of time periods
T = 20

def labor_demand(gamma_i = 1.0, L_omega_it = -1.0, k_it = 1.0, 
                 w_it = -1.):
    '''
    Log-labor demanded, given the firm's state, solved analytically.
    '''
    l = 1/.3 * (gamma_i + rho * L_omega_it + .3 * k_it + 
                .5 * sigma2epsilon - w_it + np.log(.7))
    return l


In [None]:
# YOUR CODE HERE

#### Q2. What happens to labor inputs when $\gamma_i$ goes up? Answer this question for each of $\omega_{i,t-1}$, $k_{it}$, and $w_{it}$.

In [None]:
# YOUR ANSWER HERE

## Simulate Panel Data

Here I generate the panel of firm data. I start by generating the fundamental shocks and then use the model of labor demand above to generate each firm's output level and labor demanded. I give you the code below to fully generate a panel with `N` different firms over `T` time periods. The function `gen_all` gives you a dictionary with all the data generated, including the firm fixed effects $\gamma_i$. The function `create_DF` converts this dictionary into a pandas DataFrame (without the fixed effects information) that you can use in your estimation.
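A minimal sketch of working with a DataFrame stacked by firm and time, assuming index levels named `firm` and `time` as in the provided code. The frame `toy_df` and its values are made up for illustration and are not the simulated panel.

```python
import numpy as np
import pandas as pd

# Toy stacked frame: 3 firms, 4 periods, one variable.
index = pd.MultiIndex.from_product([range(3), range(4)],
                                   names=['firm', 'time'])
toy_df = pd.DataFrame({'y': np.arange(12.0)}, index=index)

# One column per firm, one row per period: convenient for plotting.
y_wide = toy_df['y'].unstack(level='firm')

# All periods for a single firm.
firm_2 = toy_df.xs(2, level='firm')
print(y_wide)
print(firm_2)
```

Calling `.plot()` on a wide frame such as `y_wide` draws one line per firm, which is one possible starting point for the plotting questions.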

In [None]:
import warnings

def gen_shocks(N, T):
    r'''
    Generate the model shocks.
    
    L_omega are the omega shocks shifted forward one period.
    Therefore, \omega_{i,2} = L_omega[i,3]. Think of the
    `L` as representing a lag operator.
    '''
    #generate iid shocks
    gamma = np.random.randn(N, 1) * np.sqrt(sigma2gamma)
    epsilon = np.random.randn(N, T) * np.sqrt(sigma2epsilon)
    k = np.random.randn(N, T) * np.sqrt(sigma2k)
    w = np.random.randn(N, T) * np.sqrt(sigma2w)
    
    #create AR(1) process
    L_omega = np.zeros((N,T+1))
    #initial omega drawn from stationary distribution
    L_omega[:,0] = np.random.randn(N,1).reshape(N) * \
      np.sqrt(sigma2epsilon / (1. - rho**2))
    L_omega[:,1:] = rho * L_omega[:,0:-1] + epsilon
    return gamma, k, w, L_omega

def gen_all(N=5,T=10):
    '''
    Generate the full panel of firm data
    '''
    gamma, k, w, L_omega = gen_shocks(N=N,T=T)
    l = np.empty((N,T))
    y = np.empty((N,T))
    a = gamma * np.ones((N,T)) + L_omega[:,1:]
    
    #calculate labor demand and output
    for i in range(N):
        for t in range(T):
            l[i,t] = labor_demand(gamma_i=gamma[i,0],
                                  L_omega_it=L_omega[i,t], 
                                  k_it=k[i,t], w_it=w[i,t])
    y = a + .7 * l + .3 * k
    all_data = {'y':y, 'a':a, 'l':l, 'k':k, 'gamma':gamma, 
             'w':w, 'L_omega':L_omega}
    return all_data

def create_DF(all_data):
    '''
    Create a properly stacked pandas.DataFrame.

    Rows are indexed by (firm, time); the columns are [y, a, l, k, w].
    pandas.Panel has been removed from pandas, so the (N, T) arrays
    are stacked directly instead.
    '''
    panel_vars = ['y', 'a', 'l', 'k', 'w']
    N, T = all_data['y'].shape
    index = pd.MultiIndex.from_product([range(N), range(T)],
                                       names=['firm', 'time'])
    # ravel() flattens row by row, matching the firm-major index order
    df = pd.DataFrame({key: all_data[key].ravel() for key in panel_vars},
                      index=index)
    df.columns.name = 'vars'
    return df

In [None]:
#generate a full panel of data
np.random.seed(100)
all_data = gen_all(N=3,T=50)
df = create_DF(all_data)
df.head()

#### Q3. Plot firm output over time.

Use the data generated above (in `df` or in `all_data`) to plot the firm output over time. Plot the output for all three firms over time on the same graph. Be sure to label the axes of the graph. The data should include `N=3` firms over `T=50` time periods.

In [None]:
# YOUR CODE HERE

#### Q4. Plot firm productivity over time.

Use the data generated above (in `df` or in `all_data`) to plot the firm productivity over time. Plot productivity for all three firms over time on the same graph. Be sure to label the axes of the graph. The data should include `N=3` firms over `T=50` time periods.

In [None]:
# YOUR CODE HERE

#### Q5. Plot capital usage and labor usage for firm `2` over time (there are three firms, 0, 1, and 2)

Use the data generated above (in `df` or in `all_data`) to plot capital usage and labor usage. Be sure to label the axes of the graph. The data should include `N=3` firms over `T=50` time periods.

In [None]:
# YOUR CODE HERE

## Monte Carlo Experiment

#### Q6. Run a Monte Carlo experiment. Is the OLS procedure biased? 

Estimate the production function coefficients (as described at the beginning of this section) by regressing $y$ on $k$ and $\ell$ using OLS. Are the coefficients on $k$ and $\ell$ biased? If so, in what direction? Run `M=100` experiments and plot the histograms of the *bias* on each coefficient. (If this weren't a time-constrained exam, we would run more.)


**Reminder:**

The production function is

$$
y = a + \beta_\ell \ell + \beta_k k,
$$

where $\beta_\ell = .7$ and $\beta_k = .3$. We want to know about the bias by plotting the distribution of $\hat \beta_\ell - \beta_\ell$
and $\hat \beta_k - \beta_k$.
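The mechanics of a Monte Carlo bias experiment can be sketched on a toy model that is deliberately simpler than the firm model above. Here a single regressor `x` is correlated with an unobserved term `a`. All names, sample sizes, and distributions in this block are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed for this sketch
beta_true = 0.7
M, n_obs = 100, 500

bias = np.empty(M)
for m in range(M):
    a = rng.normal(size=n_obs)      # unobserved term, left out of the regression
    x = a + rng.normal(size=n_obs)  # regressor that depends on a
    y = a + beta_true * x
    X = np.column_stack([np.ones(n_obs), x])  # intercept and slope
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    bias[m] = coef[1] - beta_true

print(bias.mean())
```

In this toy setup the slope estimate converges to $\beta + \operatorname{cov}(x, a)/\operatorname{var}(x) = 0.7 + 0.5$, so the stored biases cluster around $0.5$. Passing `bias` to `plt.hist` gives the kind of histogram the question asks for.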

In [None]:
# YOUR CODE HERE

#### Q7. Explain the economics of why OLS might produce biased results. Be specific.

In [None]:
# YOUR ANSWER HERE

#### Q8. Run a Monte Carlo Simulation to see if a Fixed Effects model produces biased estimates of the production function.

Now estimate the production function using firm fixed effects. Is there a bias, and how does this compare to OLS? Plot the histograms of the bias for each parameter as we did before. (To make the plots easier to read, plot the *bias* and not the parameter estimates themselves.)
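One common way to estimate a fixed effects model is the within transformation: subtract each firm's time average from every variable and run OLS on the demeaned data. The sketch below applies it to a toy panel in which the only unobserved term is a time-invariant firm effect. The frame `toy` and its columns are hypothetical, and the firm model above also contains a persistent, time-varying component, so the results here need not carry over.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)  # arbitrary seed for this sketch
n_firms, n_periods, beta_true = 200, 10, 0.7

firm = np.repeat(np.arange(n_firms), n_periods)
gamma = rng.normal(size=n_firms)[firm]  # firm effect, constant over time
x = gamma + rng.normal(size=firm.size)  # regressor correlated with the effect
y = gamma + beta_true * x + 0.1 * rng.normal(size=firm.size)
toy = pd.DataFrame({'firm': firm, 'x': x, 'y': y})

# Within transformation: subtract each firm's time average.
means = toy.groupby('firm')[['x', 'y']].transform('mean')
x_dm = (toy['x'] - means['x']).to_numpy()
y_dm = (toy['y'] - means['y']).to_numpy()

beta_pooled = np.polyfit(toy['x'], toy['y'], 1)[0]  # OLS ignoring firm effects
beta_within = np.dot(x_dm, y_dm) / np.dot(x_dm, x_dm)
print(beta_pooled, beta_within)
```

In this toy panel the pooled slope is pushed well above $0.7$, while the within estimate is close to it, because demeaning removes anything that is constant within a firm.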

In [None]:
# YOUR CODE HERE

#### Q9. Explain why adding firm fixed effects might reduce the bias of the parameter estimates.

In [None]:
# YOUR ANSWER HERE