Alex

Revisiting Misallocation: Endogeneity Issue in Measuring Total Factor Productivity

Since GitHub makes it hard to include equations, a version with the equations is in the PDF file linked below.

Short proposal

PDF version https://drive.google.com/file/d/1erYdpP7N2vXoAyvsDTNJkbmH4qJ7lMdc/view?usp=sharing

Object of the project

This project asks whether estimating firm-level physical productivity with econometric and machine learning methods gives a better measure of misallocation. It follows the key paper, Hsieh and Klenow (2009), in most respects but changes the technique used to obtain physical productivity.

Original paper and what to extend

Hsieh and Klenow (2009) do not estimate physical productivity. They obtain it by calibration from a structural model of monopolistic competition with heterogeneous firms. Firms produce differentiated products, which a CES aggregator combines into industry-level output. Final output is a Cobb-Douglas aggregate of the industry-level outputs, and each firm produces according to a firm-level Cobb-Douglas production function.

By solving the model, the key equation that measures the effect of misallocation on the total factor productivity is

$TFP_s= \left [ \sum_{i=1}^{M_s} ( A_{si} \cdot \frac{\overline{TFPR_s}}{ TFPR_{si} })^{\sigma-1} \right ] ^{\frac{1}{\sigma-1}}$

Vollrath (2014) points out that total factor productivity is a monotonically increasing, concave function of each element of the sum when $\frac{1}{\sigma-1}$ is less than 1. More dispersion in $A_{si}$ therefore lowers total factor productivity. Estimating $A_{si}$ correctly is thus important for measuring the effect of misallocation on total factor productivity and, eventually, on output.
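As a minimal sketch, the key equation can be evaluated directly once $A_{si}$ and $TFPR_{si}$ are in hand. The function name `industry_tfp`, the value of `sigma`, and all numbers below are made up for illustration. In particular, `tfpr_bar` is simply passed in by the caller; the industry-level definition of $\overline{TFPR_s}$ in Hsieh and Klenow (2009) is not reproduced here.

```python
import numpy as np

def industry_tfp(A, tfpr, tfpr_bar, sigma):
    """Evaluate TFP_s = [sum_i (A_si * tfpr_bar / TFPR_si)^(sigma-1)]^(1/(sigma-1))."""
    A = np.asarray(A, dtype=float)
    tfpr = np.asarray(tfpr, dtype=float)
    terms = (A * tfpr_bar / tfpr) ** (sigma - 1.0)
    return terms.sum() ** (1.0 / (sigma - 1.0))

sigma = 3.0                        # hypothetical elasticity of substitution
A = np.array([1.0, 2.0, 4.0])      # made-up firm-level physical productivities

# Case 1: every firm has TFPR equal to tfpr_bar, so the ratio drops out
# and the formula collapses to a CES aggregate of A alone.
no_distortion = industry_tfp(A, tfpr=np.ones(3), tfpr_bar=1.0, sigma=sigma)
ces_of_A = (A ** (sigma - 1.0)).sum() ** (1.0 / (sigma - 1.0))

# Case 2: more productive firms face higher TFPR (made-up values).
distorted = industry_tfp(A, tfpr=np.array([0.5, 1.0, 2.0]), tfpr_bar=1.0, sigma=sigma)

print(no_distortion, ces_of_A, distorted)
```

With these illustrative numbers the second case gives a lower value than the first, because the most productive firm is scaled down the most.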

Hsieh and Klenow (2009) propose a calibration method to obtain $A_{si}$ from the model. As Vollrath (2014) notes, total factor productivity and $A_{si}$ are like Solow residuals, so it may be possible to estimate the model by regression. This project begins with that idea. The literature contains several attempts to recover residual productivity with econometric techniques. Following Ackerberg, Caves and Frazer (2015), a simple Cobb-Douglas production function in logs is a linear regression model, and productivity can be thought of as the residual of that equation. The problem is that some productivity shocks, such as the managerial ability of a firm, are potentially observed or predicted by firms when they make input decisions. This part of the residual causes the endogeneity problem: a productivity shock that the firm observes but the economist does not affects the input decision, so the inputs are correlated with the error term and the OLS coefficients are biased and inconsistent.
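The bias can be seen in a small simulation. This is only a sketch on made-up data: the elasticities, the variances, and the rule by which labor responds to the shock are all assumptions chosen for illustration, not taken from any of the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
beta_k, beta_l = 0.4, 0.6          # hypothetical true output elasticities

omega = rng.normal(0.0, 0.5, n)    # productivity shock seen by the firm only
k = rng.normal(0.0, 1.0, n)        # log capital
# Assumed input rule: log labor responds to the shock the firm observes.
l = 0.5 * k + 1.0 * omega + rng.normal(0.0, 0.3, n)
eps = rng.normal(0.0, 0.1, n)      # shock unobserved by firm and economist
y = beta_k * k + beta_l * l + omega + eps   # log output

# OLS of y on a constant, k and l; omega is left in the error term.
X = np.column_stack([np.ones(n), k, l])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
b0_hat, bk_hat, bl_hat = coef

print(f"true beta_l = {beta_l}, OLS beta_l = {bl_hat:.3f}")
print(f"true beta_k = {beta_k}, OLS beta_k = {bk_hat:.3f}")
```

Because labor is positively correlated with the omitted shock in this setup, the OLS labor coefficient comes out well above its true value, and the residual is no longer a clean measure of productivity.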

In Hsieh and Klenow (2009), this endogeneity problem might not be important, since the Solow residual captures every productivity shock that is not explained by the input combination. This may hold for industry-level TFP, but for firm-level TFP, estimating the correct productivity of each firm seems important because it affects the measurement of misallocation. I am still reading, but part of the literature also argues that residual TFP can be misleading.

For example, suppose managerial skill matters for choosing the levels of capital and labor. If high managerial skill consistently raises input levels (perhaps firms with low managerial skill underpredict the optimal input levels), the TFP equation in Hsieh and Klenow (2009) would give systematically high firm-level TFP, since the scaled revenue term is raised to a power. In that case the equation seems to capture managerial skill well. However, higher managerial skill might instead raise capital while lowering labor. Then we cannot tell whether this measure of TFP accounts correctly for the higher managerial skill. This project therefore tries to obtain a consistent estimate of TFP by addressing the endogeneity problem with econometric tools and machine learning.

Method to be used for extension

3.1 Econometric modification

First, this project estimates firm-level productivity with the econometric method presented in Ackerberg, Caves and Frazer (2015). ACF, for short, modifies two earlier papers, Olley and Pakes (1996) and Levinsohn and Petrin (2003).

This short proposal introduces the OP method only briefly. Among many other assumptions, the key assumption in OP is that investment is a function of the state variables. A further assumption makes the investment function strictly increasing in the productivity shock that the firm observes but the economist does not. The investment function can then be inverted, so the productivity shock can be written as an unknown function of observables. Substituting this into the production function gives an equation from which the GMM estimates are obtained.

The unknown function, called phi in the literature, can be approximated with a polynomial. This is the first stage of the estimation. A second-stage OP estimation procedure then yields consistent estimates of the coefficients. With the coefficients in hand, Hicks-neutral TFP is the difference between observed output and predicted output.
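The first stage can be sketched on simulated data. Everything below is an assumption made for illustration: the investment rule `inv`, the labor rule, the elasticities, and the polynomial degree are hypothetical, and the second stage is not implemented. Labor is given some variation that is independent of the shock, which is the case where the first stage can identify the labor coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta_k, beta_l = 0.4, 0.6          # hypothetical true elasticities

omega = rng.normal(0.0, 0.5, n)    # shock observed by the firm only
k = rng.normal(0.0, 1.0, n)        # log capital (state variable)
# Hypothetical investment policy, strictly increasing in omega given k.
inv = 0.3 * k + omega
# Labor responds to omega but also has independent variation.
l = 0.5 * k + omega + rng.normal(0.0, 0.3, n)
y = beta_k * k + beta_l * l + omega + rng.normal(0.0, 0.1, n)

def poly_terms(a, b, degree):
    """All monomials a^p * b^q with p + q <= degree (includes the constant)."""
    return [a**p * b**q for p in range(degree + 1) for q in range(degree + 1 - p)]

# First stage: regress y on l and a polynomial in (inv, k).
# The polynomial stands in for phi(inv, k) = beta_k * k + omega(inv, k).
X = np.column_stack([l] + poly_terms(inv, k, degree=2))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
bl_hat = coef[0]
phi_hat = X[:, 1:] @ coef[1:]
phi_true = beta_k * k + omega

print(f"true beta_l = {beta_l}, first-stage beta_l = {bl_hat:.3f}")
```

In a full OP procedure, `phi_hat` would feed a second stage that separates `beta_k` from the productivity term; here it is only compared with the simulated truth.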

LP is similar but uses the demand function for an intermediate input rather than the investment demand equation. One advantage is that intermediate input demand is not a dynamic decision, unlike investment. Ackerberg, Caves and Frazer (2015) point out that even with all the adjustments and assumptions in OP and LP, a functional dependence problem may remain: labor may be functionally dependent on the same variables that enter the investment or intermediate input demand function, so the labor coefficient cannot be identified in the first stage. To solve this problem, ACF use different assumptions and a different model specification. The key change is that the input demand function is now conditional on labor.
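The conditional input demand idea can be sketched in standard notation, which is an assumption here since this page leaves its equations to the PDF. Lowercase letters are logs, $m_{it}$ is the intermediate input, $\omega_{it}$ is the productivity shock the firm observes, and $\epsilon_{it}$ is the remaining error.

$y_{it} = \beta_0 + \beta_k k_{it} + \beta_l l_{it} + \omega_{it} + \epsilon_{it}$

$m_{it} = f_t(k_{it}, l_{it}, \omega_{it}) \;\Rightarrow\; \omega_{it} = f_t^{-1}(k_{it}, l_{it}, m_{it})$

$y_{it} = \Phi_t(k_{it}, l_{it}, m_{it}) + \epsilon_{it}, \quad \Phi_t(k_{it}, l_{it}, m_{it}) = \beta_0 + \beta_k k_{it} + \beta_l l_{it} + f_t^{-1}(k_{it}, l_{it}, m_{it})$

Because labor now enters $\Phi_t$ nonparametrically, the first stage identifies no coefficient and only removes $\epsilon_{it}$. The coefficients are recovered in a second stage from moment conditions on the innovation to productivity, for example under a first-order Markov process for $\omega_{it}$:

$\omega_{it} = g(\omega_{i,t-1}) + \xi_{it}, \quad E[\xi_{it} \, k_{it}] = 0, \quad E[\xi_{it} \, l_{i,t-1}] = 0$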

Rearranging the firm-level TFP equation in Hsieh and Klenow (2009) gives an equation similar to the one in ACF, with an added indicator variable for industry s that captures the industry-specific kappa term. It therefore seems possible to estimate firm-level TFP with the ACF method. Note that some papers, such as Topalova and Khandelwal (2011), estimate TFP with the LP method using output levels directly, for India. It would be interesting to know whether there are data sources for China and the US similar to the data in Topalova and Khandelwal (2011), and whether using output levels directly changes the misallocation results relative to Hsieh and Klenow (2009).
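A sketch of that rearrangement, assuming the firm-level expression in Hsieh and Klenow (2009) takes the form below (the attached PDF is the authoritative version for this project):

$A_{si} = \kappa_s \frac{(P_{si} Y_{si})^{\frac{\sigma}{\sigma-1}}}{K_{si}^{\alpha_s} L_{si}^{1-\alpha_s}}$

Taking logs and moving the input terms to the right-hand side gives

$\frac{\sigma}{\sigma-1} \ln (P_{si} Y_{si}) = -\ln \kappa_s + \alpha_s \ln K_{si} + (1-\alpha_s) \ln L_{si} + \ln A_{si}$

Under this form, scaled log revenue is linear in log capital and log labor, the industry indicator absorbs $-\ln \kappa_s$, and $\ln A_{si}$ plays the role of the productivity term in the regression.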

3.2 Machine learning modification

Machine learning is attracting attention as a way to approximate an unknown function when enough data are available. Since we have input data and revenue data, we can approximate a prediction function for revenue given inputs. The residual can then be thought of as TFP. One popular machine learning technique for approximating an unknown function is the neural network. By the universal approximation theorem, a neural network can approximate a continuous function on a compact domain to any desired accuracy. The advantage is that the inputs can have a highly non-linear relationship in producing the firm-specific product.
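A minimal sketch of the residual idea, using a small one-hidden-layer network written in plain NumPy so that it does not depend on any particular library. The data are simulated, the network size and learning rate are arbitrary choices, and the inputs are drawn independently of the productivity term, so this example deliberately sets the endogeneity problem aside.

```python
import numpy as np

def fit_mlp(X, y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y                    # prediction error, (n,)
        dH = np.outer(err, W2) * (1.0 - H**2)    # backprop through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2

# Simulated log inputs and log revenue (all values are made up).
rng = np.random.default_rng(2)
n = 500
k = rng.normal(0.0, 1.0, n)
l = rng.normal(0.0, 1.0, n)
omega = rng.normal(0.0, 0.3, n)                  # term the residual should recover
y = 0.4 * k + 0.6 * l + omega + rng.normal(0.0, 0.05, n)

X = np.column_stack([k, l])
params = fit_mlp(X, y)
tfp_resid = y - predict(params, X)               # residual interpreted as log TFP

print("share of variance left in residual:", np.var(tfp_resid) / np.var(y))
print("corr(residual, omega):", np.corrcoef(tfp_resid, omega)[0, 1])
```

On real data the network would need a held-out sample to guard against overfitting, since an overfitted network would absorb part of the productivity term into the prediction.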

One direction under consideration is to use a neural network to approximate the production function and then predict the value without the bias term, which plays the role of the intercept in a linear regression model. The idea is that the difference between the observed and predicted values is the Solow residual. The advantage of this model is that it assumes no functional form for the firm-level production function.

Another direction is to use a neural network in the first stage of Ackerberg, Caves and Frazer (2015), that is, to approximate the unknown function phi with a neural network. ACF approximate this function with a polynomial. By the Stone–Weierstrass theorem, a polynomial can approximate the function arbitrarily closely if it is continuous on a closed interval. However, Azinovic, Gaegauf and Scheidegger (2021) compare polynomial approximation with neural networks. One advantage of the neural network is that it can resolve local features accurately, while a polynomial may fail on kinked features. Thus, there may be a gain from using a neural network.
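The point about kinks can be illustrated with a toy example. This is not a reproduction of the comparison in Azinovic, Gaegauf and Scheidegger (2021): the target function is made up, the polynomial degree is arbitrary, and the single ReLU unit has hand-set weights instead of trained ones.

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 201)
target = np.maximum(x, 0.0)          # hypothetical function with a kink at 0

# Least-squares cubic polynomial fit over the whole interval.
coeffs = np.polyfit(x, target, deg=3)
poly_err = np.max(np.abs(np.polyval(coeffs, x) - target))

def relu_unit(x, w=1.0, b=0.0, v=1.0):
    """One hidden ReLU unit with hand-set (untrained) weights: v * max(w*x + b, 0)."""
    return v * np.maximum(w * x + b, 0.0)

relu_err = np.max(np.abs(relu_unit(x) - target))

print("max abs error, cubic polynomial:", poly_err)
print("max abs error, single ReLU unit:", relu_err)
```

The smooth polynomial cannot bend sharply at the kink, so its error stays visibly above zero, while a piecewise-linear unit represents this particular target exactly. Whether a trained network does as well on an estimated phi is an empirical question.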
