# Intro to Development Accounting
In this note, we produce results that conceptually replicate [Caselli 2005](https://personal.lse.ac.uk/casellif/papers/handbook.pdf) (see also [Caselli 2016](https://documents1.worldbank.org/curated/ru/199521487336202705/pdf/112882-WP-PUBLIC-WDR17BPAccountingforCrossCountryDifferencesTenYearsLater.pdf)). However, we use Penn World Table 10, with a different and improved treatment of both capital and human capital.

## Loading modules

First, we load Python modules (informally: collections of functions and capabilities):

```python
import numpy as np               # numerical routines (e.g., taking logarithms)
import pandas as pd              # handling datasets
import plotly.express as px      # plotting
import plotly.graph_objects as go
```

## Loading the dataset and constructing variables

Then we load the dataset from the website.

```python
# Download data from the Penn World Table (reading .xlsx files requires openpyxl)
url = "https://www.rug.nl/ggdc/docs/pwt100.xlsx"
data_in = pd.read_excel(url, sheet_name='Data')
```

What does the dataset look like?

```python
data = data_in.copy()
data
```

|  | countrycode | country | currency_unit | year | rgdpe | rgdpo | pop | emp | avh | hc | ... | csh_x | csh_m | csh_r | pl_c | pl_i | pl_g | pl_x | pl_m | pl_n | pl_k |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ABW | Aruba | Aruban Guilder | 1950 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 1 | ABW | Aruba | Aruban Guilder | 1951 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 2 | ABW | Aruba | Aruban Guilder | 1952 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 3 | ABW | Aruba | Aruban Guilder | 1953 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| 4 | ABW | Aruba | Aruban Guilder | 1954 |  |  |  |  |  |  | ... |  |  |  |  |  |  |  |  |  |  |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 12805 | ZWE | Zimbabwe | US Dollar | 2015 | 40141.617188 | 39798.644531 | 13.814629 | 6.393752 |  | 2.584653 | ... | 0.140172 | -0.287693 | -0.051930 | 0.479228 | 0.651287 | 0.541446 | 0.616689 | 0.533235 | 0.422764 | 1.533909 |
| 12806 | ZWE | Zimbabwe | US Dollar | 2016 | 41875.203125 | 40963.191406 | 14.030331 | 6.504374 |  | 2.616257 | ... | 0.131920 | -0.251232 | -0.016258 | 0.470640 | 0.651027 | 0.539631 | 0.619789 | 0.519718 | 0.416510 | 1.491724 |
| 12807 | ZWE | Zimbabwe | US Dollar | 2017 | 44672.175781 | 44316.742188 | 14.236595 | 6.611773 |  | 2.648248 | ... | 0.126722 | -0.202827 | -0.039897 | 0.473560 | 0.639560 | 0.519956 | 0.619739 | 0.552042 | 0.415592 | 1.514525 |
| 12808 | ZWE | Zimbabwe | US Dollar | 2018 | 44325.109375 | 43420.898438 | 14.438802 | 6.714952 |  | 2.680630 | ... | 0.144485 | -0.263658 | -0.020791 | 0.543757 | 0.655473 | 0.529867 | 0.641361 | 0.561526 | 0.425143 | 1.590120 |


## Development accounting

As the organizing framework for our development accounting exercise, we employ an aggregate production function $Y=AK^\alpha(hL)^{1-\alpha}$, where $Y$ is real output, $K$ is the stock of capital, $L$ is the number of workers, $h$ is a measure of the human capital of the workers, and $A$ is total-factor productivity. Dividing by $L$, we express the production function in per-worker terms (loosely called "per capita" in the text and code below), arriving at
$$ y = Ak^\alpha h^{1-\alpha}.$$

To perform our accounting exercise, we need cross-country comparable measures of $y$, $k$, and $h$, as well as a calibrated value for $\alpha$. The Penn World Table provides us with $y$, $k$, and $h$. We set $\alpha \approx 1/3$ (0.33 in the code below) to match the capital share of income in, e.g., the US.
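
To make the accounting step concrete, here is a minimal sketch that backs out $A$ from $y = Ak^\alpha h^{1-\alpha}$ for two hypothetical countries, "Rich" and "Poor". The numbers are invented for illustration and are not Penn World Table values. The helper `implied_tfp` is our own, and $\alpha$ is set to exactly $1/3$ here.

```python
# A minimal sketch: back out TFP from y = A * k**alpha * h**(1 - alpha).
# The two "countries" below are hypothetical; the numbers are illustrative only.
ALPHA = 1 / 3


def implied_tfp(y: float, k: float, h: float, alpha: float = ALPHA) -> float:
    """Return A such that y = A * k**alpha * h**(1 - alpha)."""
    return y / (k**alpha * h ** (1 - alpha))


# (output per worker, capital per worker, human capital index)
rich = (100.0, 300.0, 3.5)
poor = (10.0, 20.0, 2.0)

tfp_ratio = implied_tfp(*poor) / implied_tfp(*rich)
income_ratio = poor[0] / rich[0]
print(f"Poor/Rich output per worker: {income_ratio:.2f}")
print(f"Poor/Rich TFP:               {tfp_ratio:.2f}")
```

With these made-up inputs, Poor produces a tenth of Rich's output per worker but has roughly 0.36 of its TFP. Part of the income gap is attributed to Poor's lower capital and human capital per worker rather than to $A$.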

_Note that our approach here is dangerous. We are using someone else's data set and crossing our fingers that whatever they did to construct $y$, $k$, and $h$ is the best we can do. Research on development accounting is largely about identifying conceptual problems in how we measure these variables, and about improving the estimates or, at the very least, casting doubt on previous ones._

### Constructing our per capita variables for 2015

First, we use `country` as our index and restrict the data to 2015. For this sample, we pick our variables for output (`cgdpo`), capital (`cn`), human capital per person (`hc`), and labor (`emp`, the number of persons engaged).

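```python
data.index = data['country']
restriction = (data['year'] == 2015)
# .copy() makes `data` an independent DataFrame, which avoids pandas'
# SettingWithCopyWarning when we add new columns below
data = data[restriction].copy()

data['output'] = data['cgdpo']
data['capital'] = data['cn']
data['human_capital_per_capita'] = data['hc']  # alternative: data['hc'] * data['avh']
data['labor'] = data['emp']
```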


With these variable definitions, we construct our per-worker measures and back out total-factor productivity relative to the US (and, for later use, relative to Germany). For this, we need our calibrated value of $\alpha$.

```python
alpha = 0.33  # capital share, approximately 1/3
data['output_per_capita'] = data['output'] / data['labor']
data['capital_per_capita'] = data['capital'] / data['labor']
# Factor-only prediction k^alpha * h^(1-alpha), i.e. y with A = 1
data['output_per_capita_implied'] = (data['capital_per_capita']**alpha
                                     * data['human_capital_per_capita']**(1 - alpha))
# TFP is the residual: A = y / (k^alpha * h^(1-alpha))
data['TFP'] = data['output_per_capita'] / data['output_per_capita_implied']
data['TFP_rel_US'] = data['TFP'] / data['TFP'].loc['United States']
data['TFP_rel_DE'] = data['TFP'] / data['TFP'].loc['Germany']
```
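
Expressing TFP relative to a reference country is more than a convenience. Suppose output and capital per worker are rescaled by a common factor $c$, for example through a change of currency units. Then every country's implied $A$ is multiplied by $c^{1-\alpha}$, so TFP ratios do not change. A quick numerical check on a small hypothetical frame follows. The country labels, values, and the helper `tfp_relative` are invented for illustration.

```python
import numpy as np
import pandas as pd

ALPHA = 0.33  # same calibration as in the cell above

# Hypothetical data: output and capital per worker (some currency unit), hc index.
toy = pd.DataFrame(
    {"y": [60_000.0, 20_000.0, 5_000.0],
     "k": [250_000.0, 60_000.0, 10_000.0],
     "h": [3.7, 2.8, 1.9]},
    index=["Reference", "Middle", "Low"],
)


def tfp_relative(df: pd.DataFrame, ref: str, alpha: float = ALPHA) -> pd.Series:
    """TFP implied by y = A k^alpha h^(1-alpha), divided by the reference row's TFP."""
    tfp = df["y"] / (df["k"] ** alpha * df["h"] ** (1 - alpha))
    return tfp / tfp.loc[ref]


base = tfp_relative(toy, "Reference")
# Rescale output and capital by the same factor, e.g. a switch of currency units.
rescaled = toy.assign(y=toy["y"] * 1e-3, k=toy["k"] * 1e-3)
print(base)
print(np.allclose(base, tfp_relative(rescaled, "Reference")))  # True
```

Note that the invariance relies on $y$ and $k$ being measured in the same units. Rescaling only one of them would shift all relative TFP levels.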


Just like with growth accounting, "this is it". After having chosen our measures of capital, human capital, etc., the analysis is very simple.

Below, we plot TFP relative to the US against output per capita relative to the US. The main takeaway is that TFP lines up well with GDP per capita: countries with higher GDP per capita also tend to have higher TFP.

```python
data['output_per_capita_rel_US'] = data['output_per_capita'] / data['output_per_capita'].loc['United States']
data['output_per_capita_implied_rel_US'] = data['output_per_capita_implied'] / data['output_per_capita_implied'].loc['United States']

data['output_per_capita_rel_DE'] = data['output_per_capita'] / data['output_per_capita'].loc['Germany']
data['output_per_capita_implied_rel_DE'] = data['output_per_capita_implied'] / data['output_per_capita_implied'].loc['Germany']

# Scatter plot with country names as markers, both axes on a log scale
fig = px.scatter(data, 'output_per_capita_rel_US', 'TFP_rel_US', text=data.index,
                 log_x=True, log_y=True)
fig.update_traces(mode='text')

fig.update_layout(template='simple_white',
                  width=800, height=600,
                  xaxis_title='GDP per capita (relative to the US)',
                  yaxis_title='TFP (relative to the US)',
                  title='TFP is higher in rich countries')

# Label ticks as fractions of the US level
ticks = [1/32, 1/16, 1/8, 1/4, 1/2, 1]
labels = ["1/32", "1/16", "1/8", "1/4", "1/2", "1"]
fig.update_xaxes(tickvals=ticks, ticktext=labels)
fig.update_yaxes(tickvals=ticks, ticktext=labels)
fig.show()
```
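
The visual claim that TFP lines up with GDP per capita can also be summarized in a single number. Below is a minimal sketch of a log-log correlation. The helper `log_correlation` is our own; it uses only rows where both columns are present and positive, and it is checked here on a made-up frame rather than on PWT data.

```python
import numpy as np
import pandas as pd


def log_correlation(df: pd.DataFrame, x: str, y: str) -> float:
    """Pearson correlation between log(x) and log(y).

    Uses only rows where both columns are non-missing and strictly positive.
    """
    sub = df[[x, y]].dropna()
    sub = sub[(sub[x] > 0) & (sub[y] > 0)]
    return float(np.corrcoef(np.log(sub[x]), np.log(sub[y]))[0, 1])


# Hypothetical check: if y = x**0.5 exactly, log(y) is linear in log(x),
# so the correlation is 1; the row with a missing x is skipped.
demo = pd.DataFrame({"x": [1.0, 2.0, 4.0, 8.0, np.nan]})
demo["y"] = np.sqrt(demo["x"])
print(log_correlation(demo, "x", "y"))
```

With the cells above, a call such as `log_correlation(data, 'output_per_capita_rel_US', 'TFP_rel_US')` would summarize the scatter plot. A correlation is only a descriptive summary, not a test of the accounting framework.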



For the poorer countries, TFP is closer to the US level than GDP per capita is. This indicates that shortages of physical and human capital also account for part of the income gap with the US. Part of this is arguably mechanical, since higher productivity in itself leads to greater capital accumulation. Following Klenow and Rodriguez-Clare (1997) and Hall and Jones (1999), we can rewrite the accounting identity as
$$ y = (k/y)^{\alpha/(1-\alpha)}h A^{1/(1-\alpha)}. $$
Holding $A$ fixed across countries in this version of the identity means that the capital-output ratio, rather than the capital stock, is held fixed in the thought experiment. This is consistent with, e.g., the Solow model, where the steady-state capital-output ratio depends on the saving rate, depreciation, and growth, but not on the level of productivity. The formulation therefore takes into account that an increase in productivity endogenously raises the capital stock.
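
The rewritten identity follows in two steps. First, divide both sides of $y = Ak^\alpha h^{1-\alpha}$ by $y^\alpha$, which gives $y^{1-\alpha} = A (k/y)^\alpha h^{1-\alpha}$. Then raise both sides to the power $1/(1-\alpha)$. As a sanity check, the sketch below draws hypothetical inputs for five countries (random numbers, not PWT data) and confirms numerically that both forms give the same $y$. The last two lines compute the two factor-only counterfactuals with $A$ set to 1: the first holds $k$ fixed and the second holds $k/y$ fixed.

```python
import numpy as np

alpha = 0.33
rng = np.random.default_rng(0)

# Hypothetical per-worker inputs for five countries (illustrative only).
A = rng.uniform(0.5, 2.0, size=5)      # TFP
k = rng.uniform(1e4, 3e5, size=5)      # capital per worker
h = rng.uniform(1.5, 3.8, size=5)      # human capital index

# Per-worker production function: y = A k^alpha h^(1-alpha)
y = A * k**alpha * h ** (1 - alpha)

# Rewritten identity: y = (k/y)^(alpha/(1-alpha)) * h * A^(1/(1-alpha))
y_kr = (k / y) ** (alpha / (1 - alpha)) * h * A ** (1 / (1 - alpha))
print(np.allclose(y, y_kr))  # True

# Factor-only counterfactuals with A = 1 for every country
y_kh = k**alpha * h ** (1 - alpha)                 # holds k fixed
y_kh_kr = (k / y) ** (alpha / (1 - alpha)) * h     # holds k/y fixed
```

These two counterfactuals correspond to the columns `output_per_capita_implied` and `output_per_capita_implied_KR` in the notebook.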

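```python
data['capital_to_output'] = data['capital'] / data['output']
# Factor-only prediction in the Klenow/Rodriguez-Clare form: (k/y)^(alpha/(1-alpha)) * h
data['output_per_capita_implied_KR'] = (data['capital_to_output']**(alpha / (1 - alpha))
                                        * data['human_capital_per_capita'])
```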








To quantify how large a share of the variation is accounted for by factor inputs, we follow Table 1 in Caselli 2005. We compute three quantities:

- the variance of log GDP per capita;
- the variance of log GDP per capita that would result if all countries kept their factor inputs but total-factor productivity were equalized, $y_{KH} = k^\alpha h^{1-\alpha}$;
- the ratio of the two, $\text{success}_1 = \text{var}[\log y_{KH}]/\text{var}[\log y]$.

```python
var_y = np.var(np.log(data['output_per_capita']))
var_y_kh = np.var(np.log(data['output_per_capita_implied']))
table = pd.DataFrame([[var_y, var_y_kh, var_y_kh / var_y]],
                     columns=['var[log(y)]', 'var[log(y_KH)]', 'success_1'],
                     index=['Baseline success of the factor-only model'])
table
```





|  | var[log(y)] | var[log(y_KH)] | success_1 |
|---|---|---|---|
| Baseline success of the factor-only model | 1.081702 | 0.366579 | 0.338891 |


The factor-only model accounts for only about 34 percent of the variance of log GDP per capita. With Klenow and Rodriguez-Clare's formulation, the factor-only model does even worse. See Caselli for the intuition behind this result.

```python
var_y = np.var(np.log(data['output_per_capita']))
var_y_kh = np.var(np.log(data['output_per_capita_implied_KR']))
table = pd.DataFrame([[var_y, var_y_kh, var_y_kh / var_y]],
                     columns=['var[log(y)]', 'var[log(y_KH)]', 'success_1'],
                     index=['Baseline success of the factor-only model with Klenow-Rodriguez-Clare formulation'])
table
```





|  | var[log(y)] | var[log(y_KH)] | success_1 |
|---|---|---|---|
| Baseline success of the factor-only model with Klenow-Rodriguez-Clare formulation | 1.081702 | 0.185329 | 0.171331 |
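
One caveat when reading the two tables above concerns the samples. `np.var` applied to a pandas Series delegates to `Series.var`, which skips missing values by default. As a result, $\text{var}[\log y]$ and $\text{var}[\log y_{KH}]$ can end up computed over different sets of countries whenever $k$ or $h$ is missing but $y$ is not. A variance ratio is easier to interpret when both variances use the same countries. Below is a minimal sketch of such a common-sample version. The helper name `success_1` and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd


def success_1(df: pd.DataFrame, y_col: str, y_kh_col: str) -> pd.DataFrame:
    """var[log(y_KH)] / var[log(y)], with both variances taken over the same rows.

    Rows where either column is missing or non-positive are dropped first.
    np.var is used with its default ddof=0, as in the cells above.
    """
    sub = df[[y_col, y_kh_col]].dropna()
    sub = sub[(sub[y_col] > 0) & (sub[y_kh_col] > 0)]
    var_y = np.var(np.log(sub[y_col].to_numpy()))
    var_y_kh = np.var(np.log(sub[y_kh_col].to_numpy()))
    return pd.DataFrame(
        [[len(sub), var_y, var_y_kh, var_y_kh / var_y]],
        columns=["n", "var[log(y)]", "var[log(y_KH)]", "success_1"],
    )


# Hypothetical example: the last row has no factor-only prediction.
demo = pd.DataFrame(
    {"y": [1.0, 2.0, 4.0, 0.1], "y_kh": [1.0, 1.5, 2.0, np.nan]},
    index=["A", "B", "C", "D"],
)
print(success_1(demo, "y", "y_kh"))
```

Applied to the notebook's data, e.g. `success_1(data, 'output_per_capita', 'output_per_capita_implied')`, this may give numbers that differ from the tables above if the two samples differ.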


## Focusing on the EU

One may be interested in a narrower question. Rather than studying the global dispersion of income, we now focus on the European Union (Ireland is left out of the list below).

```python
# EU member states, excluding Ireland
eu = ['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
      'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary',
      'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands',
      'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain',
      'Sweden']

# `eu_90s` is defined here but not used below
eu_90s = ['Austria', 'Belgium', 'Denmark', 'Finland', 'France', 'Germany', 'Greece',
          'Ireland', 'Italy', 'Luxembourg', 'Netherlands', 'Portugal', 'Spain', 'Sweden']

fig = px.scatter(data.loc[eu], 'output_per_capita_rel_DE', 'TFP_rel_DE', text=data.loc[eu].index,
                 log_x=True, log_y=True)
fig.update_traces(mode='text')

fig.update_layout(template='simple_white',
                  width=800, height=600,
                  xaxis_title='GDP per capita (relative to Germany)',
                  yaxis_title='TFP (relative to Germany)',
                  title='European Union excluding Ireland')
fig.show()
```
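
Group comparisons like this one silently drop countries whose inputs are missing from the plots and the variances. In addition, `data.loc[eu]` raises a `KeyError` if a name in the list does not exactly match the index. Below is a minimal sketch of a coverage check one might run before comparing groups. The helper name `coverage` is hypothetical, and the demo frame is made up.

```python
import numpy as np
import pandas as pd


def coverage(df: pd.DataFrame, countries: list, cols: list) -> pd.DataFrame:
    """For each listed country: is it in df's index, and are all `cols` non-missing?"""
    out = pd.DataFrame(index=pd.Index(countries, name="country"))
    out["in_data"] = [c in df.index for c in countries]
    present = [c for c in countries if c in df.index]
    out["complete"] = False
    if present:
        # Aligns on the country labels of the rows that exist in df
        out.loc[present, "complete"] = df.loc[present, cols].notna().all(axis=1)
    return out


# Hypothetical demo frame indexed by country name (values are made up).
demo = pd.DataFrame(
    {"y": [1.0, 2.0, np.nan], "y_kh": [1.0, np.nan, 3.0]},
    index=["Austria", "Belgium", "Bulgaria"],
)
print(coverage(demo, ["Austria", "Belgium", "Bulgaria", "Croatia"], ["y", "y_kh"]))
```

With the notebook's data, something like `coverage(data, eu, ['output_per_capita', 'output_per_capita_implied'])` would list each EU country with both flags.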




Again, we compute how much the variance of log income would fall if all countries had the same TFP:

```python
var_y = np.var(np.log(data.loc[eu, 'output_per_capita']))
var_y_kh = np.var(np.log(data.loc[eu, 'output_per_capita_implied']))
table = pd.DataFrame([[var_y, var_y_kh, var_y_kh / var_y]],
                     columns=['var[log(y)]', 'var[log(y_KH)]', 'success_1'],
                     index=['Baseline success of the factor-only model'])
table
```



|  | var[log(y)] | var[log(y_KH)] | success_1 |
|---|---|---|---|
| Baseline success of the factor-only model | 0.073353 | 0.022376 | 0.305051 |


## Focusing on the OECD

Let's do the same analysis for the OECD because why not?

```python
oecd = ['Australia', 'Austria', 'Belgium', 'Canada', 'Chile', 'Colombia', 'Costa Rica', 'Czech Republic',
        'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece', 'Hungary', 'Iceland',
        'Israel', 'Italy', 'Japan', 'Republic of Korea', 'Latvia', 'Lithuania', 'Luxembourg', 'Mexico',
        'Netherlands', 'New Zealand', 'Norway', 'Poland', 'Portugal', 'Slovakia', 'Slovenia', 'Spain',
        'Sweden', 'Switzerland', 'Turkey', 'United Kingdom', 'United States']  # 'Ireland' left out

# trendline="ols" requires the statsmodels package
fig = px.scatter(data.loc[oecd], 'output_per_capita_rel_DE', 'TFP_rel_DE', text=data.loc[oecd].index,
                 trendline="ols")
# Show country names instead of markers, but keep the OLS trendline drawn as a line
fig.for_each_trace(lambda t: t.update(mode='text') if t.mode != 'lines' else None)

fig.update_layout(template='simple_white',
                  width=800, height=600,
                  xaxis_title='GDP per capita (relative to Germany)',
                  yaxis_title='TFP (relative to Germany)',
                  title='OECD excluding Ireland')
fig.show()
```

```python
var_y = np.var(np.log(data.loc[oecd, 'output_per_capita']))
var_y_kh = np.var(np.log(data.loc[oecd, 'output_per_capita_implied']))
table = pd.DataFrame([[var_y, var_y_kh, var_y_kh / var_y]],
                     columns=['var[log(y)]', 'var[log(y_KH)]', 'success_1'],
                     index=['Baseline success of the factor-only model'])
table
```

|  | var[log(y)] | var[log(y_KH)] | success_1 |
|---|---|---|---|
| Baseline success of the factor-only model | 0.105484 | 0.041043 | 0.389096 |
