# Replication Exercise: Edmonds, Pavcnik and Topalova (2010)

The original paper can be found [here](https://www.aeaweb.org/articles?id=10.1257/app.2.4.42).

In this paper, the authors investigate how India's 1991 trade liberalisation reforms affected households' investment in their children's human capital, specifically their decisions about schooling and work. The basic idea is that trade liberalisation often imposes adjustment costs on domestic workers who were previously "protected" and are now exposed to the new trade regime. Because households' schooling decisions are an important determinant of long-term human capital development, it is worth exploring how these adjustment costs (caused by the "loss of protection") affect child labour and school attendance.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from linearmodels.iv import IV2SLS
from IPython.display import display

The authors contend that trade liberalisation may affect schooling decisions through three main channels: changes in living standards, the demand for child labour and the returns to education. The representative household has one child, one adult and one decision-maker. The household's net income when the child is enrolled in school is $y_s = y_0 - w^* - c$, where $y_0$ is household income when the child is not enrolled, $w^*$ is the child's economic contribution in the absence of schooling and $c$ is the net cost of schooling. The household therefore sends the child to school if:

$$
u(y_s, s) + e_s \geq u(y_0, 0) + e_0 \tag{1}
$$

where $e_k$, $k \in \{s, 0\}$, is a stochastic utility shock attached to each choice. The utility the household derives from sending the child to school is $u(y_s, s) = v(y_0 - w^* - c, p) + \alpha r$, where $v(\cdot)$ is the indirect utility from income at consumer prices $p$ and $\alpha r$ captures the value the household places on the returns to education $r$. The probability that the child is enrolled in school is then:

$$
Pr (s = 1) = Pr(v(y_0 - w^* - c, p) + \alpha r + e_s \geq v(y_0, p) + e_0)
$$

$$
Pr (s = 1) = Pr(e_0 - e_s \leq v(y_0 - w^* - c, p) + \alpha r - v(y_0, p)) \tag{2}
$$

Defining $u = e_0 - e_s$, with CDF $F(u)$ and strictly positive density $f(u)$, equation (2) can be written as:

$$
Pr (s = 1) = F(v(y_0 - w^* - c, p) + \alpha r - v(y_0, p))
$$

Totally differentiating:

$$
dPr(s = 1) = f(u) \left ( \left [ \frac{\partial v_s}{\partial y} - \frac{\partial v_0}{\partial y} \right ] dy_0 - \frac{\partial v_s}{\partial y} dw^* + \alpha dr + \left [ \frac{\partial v_s}{\partial p} - \frac{\partial v_0}{\partial p} \right ] dp - \frac{\partial v_s}{\partial y} dc \right ) \tag{3}
$$

where $v_s = v(y_0 - w^* - c, p)$ and $v_0 = v(y_0, p)$. Trade liberalisation enters the model through the tariff $t$, which also influences the marginal utility of income through the consumption channel. Holding schooling costs fixed ($dc = 0$) and setting aside the price term in $dp$, equation (3) reduces to:

$$
dPr(s = 1) = f(u) \left ( \left [ \frac{\partial v_s}{\partial y} - \frac{\partial v_0}{\partial y} \right ] \frac{\partial y_0}{\partial t} - \frac{\partial v_s}{\partial y} \frac{\partial w^*}{\partial t} + \alpha \frac{\partial r}{\partial t} \right ) dt \tag{4}
$$

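It follows that if a decline in tariffs ($dt < 0$) lowers living standards ($\partial y_0 / \partial t > 0$), schooling may also fall: income is lower when the child attends school ($y_s < y_0$) and the marginal utility of income is diminishing, so $\frac{\partial v_s}{\partial y} > \frac{\partial v_0}{\partial y} > 0$ and the first term in equation (4) is negative. Schooling may also decline if the tariff cut raises the value of the child's work $w^*$ or lowers the returns to education $r$.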
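The comparative statics in equation (4) can be checked numerically. The sketch below assumes log utility, $v(y) = \log y$ with prices held fixed, and a standard logistic distribution for $u$; both are illustrative choices rather than the authors' specification, and the parameter values are hypothetical. With log utility $\partial v / \partial y = 1/y$, which is larger at the lower income $y_s$, so a fall in $y_0$ should lower the probability of schooling.

```python
import math

def logistic_cdf(u: float) -> float:
    """F(u) for a standard logistic distribution (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-u))

def logistic_pdf(u: float) -> float:
    """f(u) = F(u) * (1 - F(u)) for the standard logistic distribution."""
    F = logistic_cdf(u)
    return F * (1.0 - F)

def schooling_index(y0, w_star, c, alpha, r):
    """v(y0 - w* - c) + alpha * r - v(y0), assuming v(y) = log(y)."""
    return math.log(y0 - w_star - c) + alpha * r - math.log(y0)

def prob_school(y0, w_star, c, alpha, r):
    """Pr(s = 1) = F(index), the transformed version of equation (2)."""
    return logistic_cdf(schooling_index(y0, w_star, c, alpha, r))

def dprob_dy0(y0, w_star, c, alpha, r):
    """Income term of equation (3): f(u) * (dv_s/dy - dv_0/dy), with dv/dy = 1/y."""
    u = schooling_index(y0, w_star, c, alpha, r)
    return logistic_pdf(u) * (1.0 / (y0 - w_star - c) - 1.0 / y0)

# Hypothetical parameter values
params = dict(w_star=20.0, c=10.0, alpha=0.5, r=1.0)
y0 = 100.0

analytic = dprob_dy0(y0, **params)
h = 1e-5  # step size for a central finite difference
numeric = (prob_school(y0 + h, **params) - prob_school(y0 - h, **params)) / (2 * h)

print(f"Pr(s=1) at y0=100: {prob_school(y0, **params):.3f}")
print(f"Pr(s=1) at y0=90:  {prob_school(90.0, **params):.3f}")
print(f"dPr/dy0: analytic {analytic:.6f}, numeric {numeric:.6f}")
```

The analytic income term matches the finite-difference derivative, and lowering $y_0$ from 100 to 90 reduces the schooling probability, as the sign argument above predicts.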

The primary dataset comprises the rural samples of the 43rd (July 1987–June 1988) and 55th (July 1999–June 2000) rounds of the National Sample Survey (NSS). The sample is restricted to children aged 10–14. Table 1 presents summary statistics on schooling and child labour, tracing trends from the 38th round (1983) to the 55th round (2000).

In [2]:
child_df = pd.read_stata("data/child.dta")

In [34]:
summary = child_df[['schoolattend', 'labor', 'workonly', 'marketwork', 'domesticwork', 'round', 'mult']].dropna()
def weighted_mean(x, weights):
    return np.average(x, weights=weights)

summarystat = summary.groupby(['round']).apply(
    lambda x: pd.Series({
        'Attend School': weighted_mean(x['schoolattend'], x['mult']),
        'Work': weighted_mean(x['labor'], x['mult']),
        'Work Only': weighted_mean(x['workonly'], x['mult']),
        'Market Work': weighted_mean(x['marketwork'], x['mult']),
        'Domestic Work': weighted_mean(x['domesticwork'], x['mult']),
    })
).T

summarystat = summarystat.round(3)
summarystat.columns = [f'Round {col}' for col in summarystat.columns]

print("Table 1: Activities of Children in Rural India, 1983–2000")
print("="*60)
print(summarystat)

Table 1: Activities of Children in Rural India, 1983–2000
               Round 38  Round 43  Round 55
Attend School     0.485     0.551     0.727
Work              0.360     0.250     0.142
Work Only         0.355     0.246     0.137
Market Work       0.193     0.138     0.076
Domestic Work     0.167     0.112     0.066


Attend School: if a child reports attending school in the household roster regardless of his/her usual principal activity \
Work: if a child reports participation in market work or domestic work as a principal usual activity \
Work Only: if a child reports market or domestic work as a principal usual activity and does not report attending school \
Market Work: if a child reports work in a household enterprise such as a farm or business, wage work, and begging \
Domestic Work: if a child reports work including chores, collection activities, and sewing, tailoring, weaving, etc. for household use
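The categories above can be reproduced from a child's reported activities. The sketch below is illustrative only: the records are toy data, and it assumes that market and domestic work are mutually exclusive principal activities (consistent with Work equalling Market Work plus Domestic Work in every round of Table 1). As in the Table 1 code, the shares are weighted by `mult`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; 'mult' plays the role of the NSS sampling weight
kids = pd.DataFrame({
    "schoolattend": [1, 1, 0, 0, 1],
    "marketwork":   [0, 1, 1, 0, 0],
    "domesticwork": [0, 0, 0, 1, 0],
    "mult":         [2.0, 1.0, 1.0, 3.0, 1.0],
})

# Assumed construction: "Work" is market work or domestic work as principal activity
kids["labor"] = ((kids["marketwork"] == 1) | (kids["domesticwork"] == 1)).astype(int)
# "Work Only" is work without reported school attendance
kids["workonly"] = ((kids["labor"] == 1) & (kids["schoolattend"] == 0)).astype(int)

activities = ["schoolattend", "labor", "workonly", "marketwork", "domesticwork"]
shares = {a: np.average(kids[a], weights=kids["mult"]) for a in activities}
print({a: round(s, 3) for a, s in shares.items()})
```

Note that a child can both attend school and work, so Attend School and Work Only need not sum to one.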

The authors measure the effect of national tariff changes at the district level. Under the new tariff regime, the reduction in tariff protection varies across districts according to their industrial composition of employment. The authors exploit this heterogeneity and compare changes in schooling and child labour in districts that experienced larger reductions in tariff protection with those in districts that experienced smaller reductions. Specifically, each district $d$ has a district tariff at time $t$, defined as the average of industry tariffs weighted by the district's 1991 industry employment. Using the 1991 Population and Housing Census, the authors measure rural employment $Emp_{i,d}$ in each industry $i$ of district $d$ and construct industry employment weights $w_{i,d} = \frac{Emp_{i,d}}{\sum_i Emp_{i,d}}$. The district tariff is then:

$$
Tariff_{dt} = \sum_i w_{i,d} \times Tariff_{it}\tag{5}
$$

A second measure, the traded tariff $TrTariff_{dt}$, is constructed from the same equation using employment in traded sectors only. Table 2 presents summary statistics for the district tariff measures in rural India.
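A toy computation of equation (5) shows why the overall and traded measures can differ so much. The districts, industries, employment counts and tariff levels below are all hypothetical, non-traded services are assigned a zero tariff as an assumption for illustration, and `district_tariff` is a hypothetical helper rather than part of the replication files.

```python
import pandas as pd

# Hypothetical 1991 rural employment by district and industry
emp = pd.DataFrame({
    "district": ["A", "A", "A", "B", "B", "B"],
    "industry": ["agri", "mfg", "services"] * 2,
    "emp": [60, 20, 20, 30, 10, 60],
})
# Hypothetical industry tariffs in one year; non-traded services carry a zero tariff
tariffs = pd.Series({"agri": 0.8, "mfg": 0.9, "services": 0.0})
traded = {"agri", "mfg"}

def district_tariff(emp, tariffs, industries=None):
    """Employment-weighted average tariff per district, as in equation (5)."""
    df = emp if industries is None else emp[emp["industry"].isin(industries)]
    df = df.assign(tariff=df["industry"].map(tariffs))
    # w_{i,d} = Emp_{i,d} / sum_i Emp_{i,d}, computed within each district
    w = df["emp"] / df.groupby("district")["emp"].transform("sum")
    return (w * df["tariff"]).groupby(df["district"]).sum()

tariff_dt = district_tariff(emp, tariffs)            # all industries
trtariff_dt = district_tariff(emp, tariffs, traded)  # traded industries only
print(pd.DataFrame({"Tariff": tariff_dt, "TrTariff": trtariff_dt}).round(3))
```

Both toy districts have the same mix within traded industries, so they share a traded tariff of 0.825, but district B's larger non-traded sector pulls its overall tariff down to 0.33, against 0.66 for district A.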

In [35]:
# Table 2

summary2 = child_df[['tariff', 'trallmtariff', 'agralltariff', 'minmfgmtariff', 'round', 'mult']].dropna()
def weighted_mean(x, weights):
    return np.average(x, weights=weights)

summarystat2 = summary2.groupby(['round']).apply(
    lambda x: pd.Series({
        'Tariff': weighted_mean(x['tariff'], x['mult']),
        'Tariff on traded goods': weighted_mean(x['trallmtariff'], x['mult']),
        'Agricultural goods only': weighted_mean(x['agralltariff'], x['mult']),
        'Mining and manufacturing only': weighted_mean(x['minmfgmtariff'], x['mult']),
    })
).T

summarystat2 = summarystat2.round(3)
summarystat2.columns = [f'Round {col}' for col in summarystat2.columns]

print("Table 2: District Tariff Measures in Rural India")
print("="*60)
print(summarystat2)

Table 2: District Tariff Measures in Rural India
                               Round 43  Round 55
Tariff                            0.080     0.025
Tariff on traded goods            0.883     0.308
Agricultural goods only           0.812     0.230
Mining and manufacturing only     0.901     0.337


The empirical framework examines how schooling and child labour changed in districts facing different levels of tariff protection. The base specification models an indicator $y_{jhdt}$ of whether child $j$, living in household $h$ in district $d$, participated in activity $y$ (such as school attendance or domestic work) at time $t$:

$$
y_{jhdt} = \beta_0 + \beta_1 Tariff_{dt} + \pi (A_{jt}, G_{jt}) + \alpha H_{ht} + \tau_t + \lambda_d + \epsilon_{jhdt}\tag{6}
$$

where $\pi (A_{jt}, G_{jt})$ is a third-order polynomial in the child's age, a gender indicator, and their interactions. The vector $H_{ht}$ contains household characteristics that may influence the household's decision about the child's activity, including caste, religion and the household head's gender, age, literacy and education. The coefficient of interest is $\beta_1$, the coefficient on the district tariff. The district fixed effect $\lambda_d$ absorbs time-invariant district characteristics, and the post-reform fixed effect $\tau_t$ controls for average changes in children's activities between the 43rd and 55th survey rounds.
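The covariate lists used in the regressions below include `female`, `age`, `age2`, `age3` and the interactions `fem_age`, `fem_age2`, `fem_age3`. A minimal sketch of how such a polynomial could be built, assuming `age2` and `age3` are simply the square and cube of age; the dataset already ships these columns, so `add_age_gender_polynomial` is a hypothetical helper.

```python
import pandas as pd

def add_age_gender_polynomial(df: pd.DataFrame) -> pd.DataFrame:
    """Add a cubic in age fully interacted with a female indicator."""
    out = df.copy()
    for k in (1, 2, 3):
        name = "age" if k == 1 else f"age{k}"
        out[name] = out["age"] ** k                      # age, age2, age3
        out[f"fem_{name}"] = out["female"] * out[name]   # fem_age, fem_age2, fem_age3
    return out

kids = pd.DataFrame({"age": [10, 12, 14], "female": [1, 0, 1]})
print(add_age_gender_polynomial(kids))
```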

Because non-traded sectors carry no tariff, changes in the district tariff $Tariff_{dt}$ from equation (5) depend on the size of a district's non-traded sector: districts with larger non-traded sectors mechanically experience smaller tariff declines. If the initial size of the non-traded sector is also linked to different time trends in schooling, the estimates may be biased. The authors address this in two ways: they interact measures of pre-reform district conditions $D_d$ with the post-reform indicator $\tau_t$, and they instrument the district tariff $Tariff_{dt}$ with the district tariff on traded goods $TrTariff_{dt}$. The main specification then becomes:

$$
y_{jhdt} = \beta_0 + \beta_1 Tariff_{dt} + \pi (A_{jt}, G_{jt}) + \alpha H_{ht} + \delta D_d \times \tau_t + \tau_t + \lambda_d + \epsilon_{jhdt}\tag{7}
$$
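The logic of instrumenting $Tariff_{dt}$ with $TrTariff_{dt}$ can be illustrated with a small simulation. This is a sketch on made-up data (not the NSS sample) that omits fixed effects and controls: an unobserved district trait `u` moves both the tariff measure and schooling, so OLS is biased, while an instrument correlated with the tariff but independent of `u` recovers the true slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical data-generating process (simulated, not the NSS data):
# z stands in for TrTariff, x for Tariff, and u for an unobserved district trait
z = rng.uniform(0.3, 0.9, n)
u = rng.normal(size=n)
x = 1.0 * z + 0.3 * u + rng.normal(scale=0.1, size=n)
beta_true = 0.4
y = 1.0 + beta_true * x + 0.5 * u + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])  # constant + endogenous regressor
Z = np.column_stack([np.ones(n), z])  # constant + excluded instrument

# OLS is biased upward here because x and the outcome error both load on u
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# 2SLS: first stage fits X on Z, second stage regresses y on the fitted values
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

print(f"true slope {beta_true}, OLS {beta_ols[1]:.3f}, 2SLS {beta_2sls[1]:.3f}")
```

The manual second stage gives the correct 2SLS point estimate but not valid standard errors, which is one reason the cells below use `linearmodels.iv.IV2SLS`.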

In [None]:
# Table 3

tab3 = child_df[child_df['round'] != 38].copy()  # keep rounds 43 and 55 only
tab3['schoolattendtrend'] = tab3['scht'] * tab3['post']  # pre-reform schooling trend x post

In [25]:
cov1 = ["female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3",
        "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage",
        "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher", "post"]

cov2 = ["female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3",
        "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage",
        "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher",
        "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost",
        "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", "postlaw1", "postlaw2"]

tab3['district'] = tab3['district'].astype('category')
tab3['stateyear'] = tab3['stateyear'].astype('category')

In [26]:
required_cols = ['schoolattend', 'tariff', 'trallmtariff'] + cov1 + cov2 + ['district', 'stateyear']
tab3 = tab3.dropna(subset=required_cols)

In [None]:
# Columns 1-3

ols_model = smf.ols(f'schoolattend ~ tariff + {" + ".join(cov1)} + C(district)', data=tab3
                   ).fit(cov_type='cluster', cov_kwds={'groups': tab3['stateyear']})
ols_results = ols_model.summary()

rf_model = smf.ols(f'schoolattend ~ trallmtariff + {" + ".join(cov2)} + C(district)', data=tab3
                  ).fit(cov_type='cluster', cov_kwds={'groups': tab3['stateyear']})
rf_results = rf_model.summary()

iv_model = IV2SLS.from_formula(f'schoolattend ~ 1 + {" + ".join(cov2)} + C(district) [tariff ~ trallmtariff]', data=tab3
                              ).fit(cov_type='clustered', clusters=tab3['stateyear'])
iv_results = iv_model.summary

In [7]:
def add_significance_stars(p_values):
    stars = []
    for p in p_values:
        if p < 0.01:
            stars.append('***')
        elif p < 0.05:
            stars.append('**')
        elif p < 0.1:
            stars.append('*')
        else:
            stars.append('')
    return stars

ols_coeffs = ols_model.params[['tariff', 'post']]
ols_se = ols_model.bse[['tariff', 'post']]
ols_pvalues = ols_model.pvalues[['tariff', 'post']]

rf_coeffs = rf_model.params[['trallmtariff']]
rf_se = rf_model.bse[['trallmtariff']]
rf_pvalues = rf_model.pvalues[['trallmtariff']]

ols_coeffs = ols_coeffs.round(3)
ols_se = ols_se.round(3)
rf_coeffs = rf_coeffs.round(3)
rf_se = rf_se.round(3)

ols_stars = add_significance_stars(ols_pvalues)
rf_stars = add_significance_stars(rf_pvalues)

col12 = pd.DataFrame({
    'Coefficients (1)': ols_coeffs.astype(str) + ols_stars,
    'SE (1)': ols_se.round(3),
    'Coefficients (2)': rf_coeffs.astype(str) + rf_stars,
    'SE (2)': rf_se.round(3)
})

styled_results12 = col12.style.set_table_styles([
    {'selector': 'th', 'props': [('font-size', '9pt'), ('text-align', 'center'), ('background-color', '#605d63')]},
    {'selector': 'td', 'props': [('font-size', '9pt'), ('text-align', 'center')]},
]).set_caption("Table 3: The Effect of Tariffs on School Attendance")

display(styled_results12)

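Table 3: The Effect of Tariffs on School Attendance

| Variable | Coefficients (1) | SE (1) | Coefficients (2) | SE (2) |
|---|---|---|---|---|
| post | 0.172*** | 0.011 | | |
| tariff | 0.376*** | 0.09 | | |
| trallmtariff | | | 0.124** | 0.055 |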


Column 1 reports the OLS estimate of equation (6) for school attendance, with the district tariff and the post-reform indicator as the main regressors. Column 2 replaces the district tariff with the district tariff on traded goods, entered directly as a regressor rather than as an instrument. Both coefficients are positive: districts that experienced larger tariff declines saw smaller gains in school attendance. The traded-goods coefficient is smaller (0.124 versus 0.376), but the two are not directly comparable because the traded-goods tariff is measured on a much larger scale (Table 2). The coefficient on post in column 1 implies that school attendance rose by about 17 percentage points in districts whose tariff did not change.

In [27]:
# Column 4
variables_to_interact = ['regpcnt_serv', 'regpcnt_tran', 'regpcnt_trade',
                        'regpcnt_min', 'regpcnt_farm', 'regpcnt_mfg',
                        'regpcnt_lit', 'regpcnt_scst']

for X in variables_to_interact:
    tab3[f'{X}post'] = tab3[X] * tab3['post']

cov3 = ["female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3",
        "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage",
        "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher",
        "post"]

interaction_terms = [f'{X}post' for X in variables_to_interact]
cov3 += interaction_terms + ['postlaw1', 'postlaw2']

required_cols = ['schoolattend', 'regtariff', 'regtralltariff'] + cov3 + ['regcod50', 'stateyear']
tab3 = tab3.dropna(subset=required_cols)

iv_region_model = IV2SLS.from_formula(f'schoolattend ~ 1 + {" + ".join(cov3)} + C(regcod50) [regtariff ~ regtralltariff]'
                                      , data=tab3).fit(cov_type='clustered', clusters=tab3['stateyear'])

iv_region_results = iv_region_model.summary

In [28]:
# Column 5: falsification test on the pre-reform rounds 38 and 43

col5 = child_df[(child_df['round'] == 38) | (child_df['round'] == 43)]
col5 = col5.drop(columns=['post'] + [col for col in col5.columns if col.startswith('postlaw')])

col5.loc[:, 'post'] = 0
col5.loc[col5['round'] == 43, 'post'] = 1

variables5 = ['regpcnt_serv', 'regpcnt_tran', 'regpcnt_trade',
    'regpcnt_min', 'regpcnt_farm', 'regpcnt_mfg',
    'regpcnt_lit', 'regpcnt_scst']

for X in variables5:
    col5[f'{X}post'] = col5[X] * col5['post']

for X in range(1, 4):
    law_col = f'law{X}'
    postlaw_col = f'postlaw{X}'
    
    col5[law_col] = (col5['employer'] == X).astype(int)
    col5[postlaw_col] = col5[law_col] * col5['post']
    
    col5 = col5.drop(columns=[law_col])

In [29]:
cov4 = ["female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3",
        "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage",
        "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher",
        "post", "regpcnt_litpost", "regpcnt_scstpost", "regpcnt_mfgpost",
        "regpcnt_farmpost", "regpcnt_tradepost", "regpcnt_tranpost",
        "regpcnt_minpost", "regpcnt_servpost", "postlaw1", "postlaw2"]

required_cols = ['schoolattend', 'regtarifftest', 'regtrtarifftest'] + cov4 + ['regcod50', 'stateyear']
col5 = col5[required_cols].dropna()

col5['regcod50'] = col5['regcod50'].astype('category')
col5['stateyear'] = col5['stateyear'].astype('category')

iv_model5 = IV2SLS.from_formula(
    'schoolattend ~ 1 + ' + ' + '.join(cov4) + ' + C(regcod50) [regtarifftest ~ regtrtarifftest]',
    data=col5
).fit(cov_type='clustered', clusters=col5['stateyear'])

iv_results5 = iv_model5.summary

In [30]:
col6 = child_df[child_df['round'] != 38].copy()

In [None]:
# Column 6

col6['schoolattendtrend'] = col6['scht'] * col6['post']  # pre-reform schooling trend x post

cov2 = [
    "female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3",
    "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage",
    "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher",
    "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost",
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost",
    "postlaw1", "postlaw2"
]

required_cols = ['schoolattend', 'tariff', 'trallmtariff', 'schoolattendtrend'] + cov2 + ['district', 'stateyear']
col6 = col6[required_cols].dropna()

col6['district'] = col6['district'].astype('category')
col6['stateyear'] = col6['stateyear'].astype('category')

In [32]:
ivtrend_model = IV2SLS.from_formula(
    'schoolattend ~ 1 + schoolattendtrend + ' + ' + '.join(cov2) + ' + C(district) [tariff ~ trallmtariff]',
    data=col6
).fit(cov_type='clustered', clusters=col6['stateyear'])

ivtrend_results = ivtrend_model.summary

In [33]:
# Column 7

col7 = child_df[child_df['round'] != 38]
required_cols = ['schoolattend', 'tariff', 'trallmtariff', 'minmfgmlicense', 'minmfgmfdi', 'bankpercap',
                 'trallmexp97', 'prmscl_pop2'] + cov2 + ['district', 'stateyear']
col7 = col7[required_cols].dropna()

In [34]:
oref_model = IV2SLS.from_formula(
    'schoolattend ~ 1 + ' + ' + '.join(cov2) + ' + minmfgmlicense + minmfgmfdi + bankpercap + trallmexp97 + prmscl_pop2 + C(district) [tariff ~ trallmtariff]',
    data=col7
).fit(cov_type='clustered', clusters=col7['stateyear'])

oref_results = oref_model.summary

In [35]:
# Column 8

col8 = child_df[child_df['round'] != 38]
required_cols = ['schoolattend', 'tariff', 'trallmtariff', 'inp2tariff', 'inp2trallmtariff', 'constariff'] + cov2 + ['district', 'stateyear']
col8 = col8[required_cols].dropna()

In [36]:
S7_model = IV2SLS.from_formula(
    'schoolattend ~ 1 + ' + ' + '.join(cov2) + ' + constariff + C(district) [tariff + inp2tariff ~ trallmtariff + inp2trallmtariff]',
    data=col8
).fit(cov_type='clustered', clusters=col8['stateyear'])

S7_results = S7_model.summary

In [41]:
def extract_model_results(model, variables):
    coeffs = model.params[variables]
    se = model.std_errors[variables]
    pvalues = model.pvalues[variables]

    coeffs = coeffs.round(3)
    se = se.round(3)
    stars = add_significance_stars(pvalues)

    return coeffs.astype(str) + stars, se

variables_iv_model = ['tariff']
variables_iv_region_model = ['regtariff']
variables_iv_model5 = ['regtarifftest']
variables_ivtrend_model = ['tariff', 'schoolattendtrend']
variables_oref_model = ['tariff']
variables_S7_model = ['tariff', 'inp2tariff']

iv_coeffs, iv_se = extract_model_results(iv_model, variables_iv_model)
iv_region_coeffs, iv_region_se = extract_model_results(iv_region_model, variables_iv_region_model)
iv_model5_coeffs, iv_model5_se = extract_model_results(iv_model5, variables_iv_model5)
ivtrend_coeffs, ivtrend_se = extract_model_results(ivtrend_model, variables_ivtrend_model)
oref_coeffs, oref_se = extract_model_results(oref_model, variables_oref_model)
S7_coeffs, S7_se = extract_model_results(S7_model, variables_S7_model)

col38 = pd.DataFrame({
    'Coeff (3)': iv_coeffs,
    'SE (3)': iv_se,
    'Coeff (4)': iv_region_coeffs,
    'SE (4)': iv_region_se,
    'Coeff (5)': iv_model5_coeffs,
    'SE (5)': iv_model5_se,
    'Coeff (6)': ivtrend_coeffs,
    'SE (6)': ivtrend_se,
    'Coeff (7)': oref_coeffs,
    'SE (7)': oref_se,
    'Coeff (8)': S7_coeffs,
    'SE (8)': S7_se
})

styled_results = col38.style.set_table_styles([
    {'selector': 'th', 'props': [('font-size', '8.5pt'), ('text-align', 'center'), ('background-color', '#605d63')]},
    {'selector': 'td', 'props': [('font-size', '8.5pt'), ('text-align', 'center')]},
]).set_caption("Table 3: Columns 3-8")

display(styled_results)

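Table 3: Columns 3-8

| Variable | Coeff (3) | SE (3) | Coeff (4) | SE (4) | Coeff (5) | SE (5) | Coeff (6) | SE (6) | Coeff (7) | SE (7) | Coeff (8) | SE (8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| inp2tariff | | | | | | | | | | | -0.413 | 1.174 |
| regtariff | | | 0.618*** | 0.155 | | | | | | | | |
| regtarifftest | | | | | -0.087 | 0.128 | | | | | | |
| schoolattendtrend | | | | | | | 0.178** | 0.077 | | | | |
| tariff | 0.362*** | 0.136 | | | | | 0.37** | 0.146 | 0.394*** | 0.14 | 0.471* | 0.266 |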


Column 3 presents the IV estimate of equation (7), in which the district tariff is instrumented with the district tariff on traded goods. Column 4 re-estimates the model at the region level, using region tariff measures, region fixed effects and interactions of pre-reform regional conditions with the post-reform indicator. Column 5 applies the same region-level specification to the pre-reform rounds (38 and 43) as a falsification test; the coefficient is small and statistically insignificant (-0.087, SE 0.128). Column 6 adds an interaction between the pre-reform trend in schooling and the post-reform indicator (schoolattendtrend), alongside the interactions of initial district conditions with the post-reform indicator and the district indicators. Column 7 controls for other reforms, such as industrial licensing and foreign direct investment. Column 8 adds controls for input tariffs (instrumented with their traded-goods counterpart) and consumption tariffs. Outside the falsification test, the tariff coefficient is positive in every column, indicating that districts experiencing larger tariff declines saw smaller improvements in school attendance.
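A back-of-the-envelope calculation helps gauge magnitudes. Multiplying the tariff coefficients by the change in the average district tariff between rounds (Table 2) gives the implied change in school attendance for a district experiencing the average decline. This is a rough sketch: it treats the weighted means in Table 2 as a representative district and ignores variation in tariff changes across districts; `implied_change_pp` is a hypothetical helper.

```python
# Weighted average district tariff in rounds 43 and 55, copied from Table 2
tariff_round43, tariff_round55 = 0.080, 0.025
# Tariff coefficients for school attendance, copied from Table 3
coefficients = {"OLS, column 1": 0.376, "IV, column 3": 0.362}

def implied_change_pp(coef: float, d_tariff: float) -> float:
    """Implied change in Pr(school attendance), in percentage points."""
    return 100 * coef * d_tariff

d_tariff = tariff_round55 - tariff_round43  # average change in the district tariff
for label, b in coefficients.items():
    print(f"{label}: {implied_change_pp(b, d_tariff):+.2f} pp")
```

With these numbers, the average tariff decline of 0.055 implies roughly 2 percentage points less growth in school attendance, compared with the 17 percentage point rise captured by the post coefficient in column 1.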

In [37]:
# Table 4

adult_df = pd.read_stata("data/adult.dta")

In [38]:
var4 = [
    "age", "age2", "age3", "sc", "st", "hindu", "islam", "christ", "sikh",
    "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost",
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost",
    "postlaw1", "postlaw2"
]

dependent_vars = ['wagework', 'twage']

results1 = {}

for Y in dependent_vars:
    df_subset = adult_df[adult_df['literate'] == 0].dropna(subset=[Y, 'tariff', 'trallmtariff'] + var4)
    
    formula = f'{Y} ~ 1 + {" + ".join(var4)} + C(district) [tariff ~ trallmtariff]'
    
    model1 = IV2SLS.from_formula(formula, data=df_subset).fit(cov_type='clustered', clusters=df_subset['stateyear'])
   
    results1[Y] = model1

In [39]:
summary_table1 = []

for Y, model1 in results1.items():
    tariff_coeff = model1.params.get('tariff', None)
    tariff_se = model1.std_errors.get('tariff', None)
   
    if tariff_coeff is not None:
        tariff_coeff = round(tariff_coeff, 3)
    if tariff_se is not None:
        tariff_se = round(tariff_se, 3)
    
    summary_table1.append({'Model': Y, 'Tariff': tariff_coeff, 'SE': tariff_se})

summary_df1 = pd.DataFrame(summary_table1)

In [40]:
var4 = [
    "age", "age2", "age3", "sc", "st", "hindu", "islam", "christ", "sikh",
    "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost",
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost",
    "postlaw1", "postlaw2"
]

dependent_vars = ['wagework', 'twage']

results2 = {}

for Y in dependent_vars:
    df_subset = adult_df[adult_df['literate'] == 1].dropna(subset=[Y, 'tariff', 'trallmtariff'] + var4)
    
    formula = f'{Y} ~ 1 + {" + ".join(var4)} + C(district) [tariff ~ trallmtariff]'
    
    model2 = IV2SLS.from_formula(formula, data=df_subset).fit(cov_type='clustered', clusters=df_subset['stateyear'])
    
    results2[Y] = model2

In [45]:
summary_table2 = []

for Y, model2 in results2.items():
    tariff_coeff = model2.params.get('tariff', None)
    tariff_se = model2.std_errors.get('tariff', None)
   
    if tariff_coeff is not None:
        tariff_coeff = round(tariff_coeff, 3)
    if tariff_se is not None:
        tariff_se = round(tariff_se, 3)
    
    summary_table2.append({'Model': Y, 'Tariff': tariff_coeff, 'SE': tariff_se})

summary_df2 = pd.DataFrame(summary_table2)

print("Table 4: The Effect of Tariffs on Adult Male Employment in Wage Work by Literacy Status")
print("="*50)
print("Panel A: Illiterate Men")
print("="*50)
print(summary_df1)
print("="*50)
print("Panel B: Literate Men")
print("="*50)
print(summary_df2)

Table 4: The Effect of Tariffs on Adult Male Employment in Wage Work by Literacy Status
Panel A: Illiterate Men
      Model  Tariff     SE
0  wagework   0.112  0.292
1     twage   0.472  1.763
Panel B: Literate Men
      Model  Tariff     SE
0  wagework  -0.210  0.115
1     twage  -2.399  0.755


Table 4 presents estimates of equation (7) for adult men aged 25–50, split by literacy. The dependent variables are participation in wage work (wagework) and the number of days worked in wage work (twage). For literate men, the negative coefficients imply that both participation and days worked in wage work rose more in districts with larger tariff declines. For illiterate men the point estimates have the opposite sign, but they are imprecise and statistically insignificant.

In [42]:
# Table 5

cov5 = [
    "female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3", 
    "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage", 
    "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher", 
    "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost", 
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", 
    "postlaw1", "postlaw2"
]

dependent_vars = ["schoolattend", "labor", "workonly", "marketwork", "domesticwork", "idle"]

results = {}

for Y in dependent_vars:
    tab5 = child_df.dropna(subset=[Y, 'tariff', 'trallmtariff', 'district', 'stateyear'] + cov5)
    
    formula = f'{Y} ~ 1 + {" + ".join(cov5)} + C(district) [tariff ~ trallmtariff]'
    
    model = IV2SLS.from_formula(formula, data=tab5).fit(cov_type='clustered', clusters=tab5['stateyear'])
    
    results[Y] = model

In [43]:
summary_table = []

for Y, model in results.items():
    tariff_coeff = model.params.get('tariff', None)
    tariff_se = model.std_errors.get('tariff', None)
    
    if tariff_coeff is not None:
        tariff_coeff = round(tariff_coeff, 3)
    if tariff_se is not None:
        tariff_se = round(tariff_se, 3)
        
    summary_table.append({'Model': Y, 'Tariff': tariff_coeff, 'SE': tariff_se})

summary_df = pd.DataFrame(summary_table)

print("Table 5")
print("="*50)
print(summary_df)

Table 5
          Model  Tariff     SE
0  schoolattend   0.362  0.136
1         labor  -0.117  0.109
2      workonly  -0.122  0.110
3    marketwork   0.050  0.092
4  domesticwork  -0.167  0.075
5          idle  -0.240  0.096


Table 5 shows how children's participation in different activities responds to the district tariff, estimating equation (7) for each activity category defined in Table 1. Relative tariff declines are associated with lower school attendance, more domestic work and more idleness; the estimates for any work, work only and market work are small relative to their standard errors. The rise in idleness is harder to explain. The authors speculate that it could reflect measurement error in children's reported activities, or that when a child's marginal product in work falls to zero, the child may withdraw from all work.
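To see which Table 5 estimates are distinguishable from zero, the t-statistics and two-sided p-values can be computed directly from the reported coefficients and standard errors. This sketch uses a normal approximation and only the rounded values printed above, so its p-values may differ slightly from those of the fitted models.

```python
import math

# Coefficients and clustered standard errors copied from Table 5
table5 = {
    "schoolattend": (0.362, 0.136),
    "labor": (-0.117, 0.109),
    "workonly": (-0.122, 0.110),
    "marketwork": (0.050, 0.092),
    "domesticwork": (-0.167, 0.075),
    "idle": (-0.240, 0.096),
}

def normal_pvalue(t: float) -> float:
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(t) / math.sqrt(2))

pvalues = {}
for outcome, (coef, se) in table5.items():
    t = coef / se
    pvalues[outcome] = normal_pvalue(t)
    print(f"{outcome:>13}: t = {t:+.2f}, p = {pvalues[outcome]:.3f}")
```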

In [8]:
loan_df = pd.read_stata("data/loan.dta")

In [9]:
# Table 6

cov6 = [
    "post", "postlaw1", "postlaw2", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", 
    "pcnt_farmpost", "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", 
    "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage", "headliterate", 
    "headprimary", "headmiddle", "headsecond", "headhigher"
]

loan_df1 = loan_df.dropna(subset=['tariff', 'trallmtariff', 'district', 'stateyear'] + cov6)

Y = 'eduloan'

formula = f'{Y} ~ 1 + {" + ".join(cov6)} + C(district) [tariff ~ trallmtariff]'

model6a = IV2SLS.from_formula(formula, data=loan_df1).fit(cov_type='clustered', clusters=loan_df1['stateyear'])

In [10]:
tariff_coeff = model6a.params['tariff']
tariff_se = model6a.std_errors['tariff']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_dfa = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

In [11]:
edu_df = pd.read_stata("data/educationexp.dta")

In [12]:
edu_df1 = edu_df.dropna(subset=['tariff', 'trallmtariff', 'district', 'stateyear'] + cov6)

Y = 'edupercap'

formula = f'{Y} ~ 1 + {" + ".join(cov6)} + C(district) [tariff ~ trallmtariff]'

model6b = IV2SLS.from_formula(formula, data=edu_df1).fit(cov_type='clustered', clusters=edu_df1['stateyear'])

In [13]:
tariff_coeff = model6b.params['tariff']
tariff_se = model6b.std_errors['tariff']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_dfb = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

In [14]:
Y = 'shareeduexp'

formula = f'{Y} ~ 1 + {" + ".join(cov6)} + C(district) [tariff ~ trallmtariff]'

model6c = IV2SLS.from_formula(formula, data=edu_df1).fit(cov_type='clustered', clusters=edu_df1['stateyear'])

In [15]:
tariff_coeff = model6c.params['tariff']
tariff_se = model6c.std_errors['tariff']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_dfc = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

In [16]:
eduexp_df = pd.read_stata("data/eduexpend4252.dta")

In [17]:
cov6d = [
    "post", "postlaw1", "postlaw2", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", 
    "pcnt_farmpost", "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", 
    "sc", "st", "headfemale", "headage", "headliterate", 
    "headprimary", "headmiddle", "headsecond", "headhigher"
]


eduexp_df = eduexp_df.dropna(subset=['tariff4', 'trallmtariff4', 'district', 'stateyear', 'toteduexp'] + cov6d)


Y = 'toteduexp'

formula = f'{Y} ~ 1 + {" + ".join(cov6d)} + C(district) [tariff4 ~ trallmtariff4]'

model6d = IV2SLS.from_formula(formula, data=eduexp_df, weights=eduexp_df['mult_comb']
                             ).fit(cov_type='clustered', clusters=eduexp_df['stateyear'])

In [18]:
tariff_coeff = model6d.params['tariff4']
tariff_se = model6d.std_errors['tariff4']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_dfd = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

In [19]:
eduexp_df = eduexp_df.dropna(subset=['tariff4', 'trallmtariff4', 'district', 'stateyear', 'shareeduexp'] + cov6d)


Y = 'shareeduexp'

formula = f'{Y} ~ 1 + {" + ".join(cov6d)} + C(district) [tariff4 ~ trallmtariff4]'

model6e = IV2SLS.from_formula(formula, data=eduexp_df, weights=eduexp_df['mult_comb']
                             ).fit(cov_type='clustered', clusters=eduexp_df['stateyear'])

In [20]:
tariff_coeff = model6e.params['tariff4']
tariff_se = model6e.std_errors['tariff4']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_dfe = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

print("Table 6: The Effect of Tariffs on Educational Expenditures")
print("="*50)
print(results_dfa)
print("="*50)
print(results_dfb)
print("="*50)
print(results_dfc)
print("="*50)
print(results_dfd)
print("="*50)
print(results_dfe)

Table 6: The Effect of Tariffs on Educational Expenditures
  Variable  Coefficient    SE
0   tariff        -0.03  0.01
  Variable  Coefficient     SE
0   tariff       16.581  4.526
  Variable  Coefficient     SE
0   tariff        0.054  0.016
  Variable  Coefficient     SE
0   tariff        27.29  8.089
  Variable  Coefficient     SE
0   tariff        0.044  0.028


The authors investigate whether tariff declines reduced educational expenditure through a poverty channel: in districts with larger tariff declines, worsening poverty may reduce households' investment in their children's education. In Table 6, the first row reports the effect on whether a household took a loan to finance education (eduloan); the negative coefficient suggests that larger tariff declines are associated with more borrowing for education. Rows 2 and 3 report the effects on household per capita education expenditure (edupercap) and the share of education in household expenditure (shareeduexp). Their positive coefficients imply that both fell more in districts with larger tariff declines, which is consistent with the increase in borrowing for education. Rows 4 and 5 repeat the expenditure outcomes (toteduexp and shareeduexp) using a separate expenditure dataset and the alternative tariff measure tariff4; the signs are the same, although the share estimate is imprecise. These results are consistent with the tariff-induced decline in school attendance found in Tables 3 and 5.

In [26]:
eduexp_df = pd.read_stata("data/eduexpend4252.dta")

In [27]:
# Table 7

eduexp_df = eduexp_df.dropna(subset=['tariff4', 'trallmtariff4', 'district', 'stateyear', 
                                     'tar_meal', 'tar_schlshp', 'tar_free', 'attend', 'enrolled', 
                                     'trmalltar_free', 'trmalltar_meal', 'trmalltar_schlshp'] + cov6d)

X_vars = ['attend', 'enrolled']

models = {}

for X in X_vars:
    # Instrument tariff4 and its interaction with midday meals (tar_meal)
    # using the tariff instrument and its corresponding interaction
    formula = (f'{X} ~ 1 + {" + ".join(cov6d)} + C(district) '
               f'[tariff4 + tar_meal ~ trmalltar_meal + trallmtariff4]')

    model7 = IV2SLS.from_formula(formula, data=eduexp_df, weights=eduexp_df['mult_comb']
                                 ).fit(cov_type='clustered', clusters=eduexp_df['stateyear'])

    models[X] = model7

In [28]:
for X in models:
    print("Table 7: Panel  A")
    print("\n" + "="*80 + "\n") 
    print(f"Results for {X}:")
    print(models[X].params[['tariff4', 'tar_meal']])
    print(models[X].std_errors[['tariff4', 'tar_meal']])  
    print(models[X].pvalues[['tariff4', 'tar_meal']])  
    print("\n" + "="*80 + "\n") 

Table 7: Panel  A


Results for attend:
tariff4     0.798595
tar_meal   -0.737046
Name: parameter, dtype: float64
tariff4     0.226790
tar_meal    0.297791
Name: stderr, dtype: float64
tariff4     0.000429
tar_meal    0.013322
Name: pvalue, dtype: float64


Table 7: Panel  A


Results for enrolled:
tariff4     0.770442
tar_meal   -0.641632
Name: parameter, dtype: float64
tariff4     0.220232
tar_meal    0.307710
Name: stderr, dtype: float64
tariff4     0.000468
tar_meal    0.037053
Name: pvalue, dtype: float64




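In table 7, the district tariff measure is interacted with measures of welfare provision for schoolchildren: free midday meals, scholarships and free tuition. The dependent variables are school attendance and enrollment. Both the tariff and its interaction are treated as endogenous and are instrumented with trallmtariff4 and the corresponding instrument interaction (trmalltar_meal, trmalltar_schlshp or trmalltar_free). Panel A contains the interaction of tariff and midday meals, Panel B the interaction of tariff and scholarships, and Panel C the interaction of tariff and free tuition.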
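Instrumenting an interaction with "instrument times moderator" can look opaque, so here is a minimal sketch of the idea in plain NumPy: it implements two-stage least squares directly on simulated data and compares it with OLS. Everything in it is hypothetical. The data-generating process, the names `w`, `z`, `x`, `u` and the coefficient values are invented, and it omits the district fixed effects, covariates, survey weights and clustered standard errors used in the actual regressions.

```python
import numpy as np


def two_sls(y, X, Z):
    """Two-stage least squares: project X on Z, then regress y on the fitted X."""
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first stage for every column
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]   # second stage


rng = np.random.default_rng(0)
n = 20_000
w = rng.binomial(1, 0.5, n).astype(float)  # hypothetical welfare indicator (e.g. midday meal)
z = rng.normal(size=n)                     # hypothetical instrument for the tariff
u = rng.normal(size=n)                     # unobservable driving both tariff and outcome
x = 0.8 * z + u + rng.normal(size=n)       # endogenous tariff measure
y = 1.0 + 0.8 * x - 0.7 * x * w + 0.2 * w + u + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, w, x, x * w])  # x and x*w are endogenous
Z = np.column_stack([const, w, z, z * w])  # instruments: z and z*w (w instruments itself)

beta_2sls = two_sls(y, X, Z)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("2SLS:", beta_2sls.round(2))  # should be close to the true [1.0, 0.2, 0.8, -0.7]
print("OLS: ", beta_ols.round(2))   # main tariff coefficient biased upward by u
```

Because `x * w` inherits the endogeneity of `x`, it needs its own excluded instrument, and `z * w` is the natural candidate; this mirrors the role that trmalltar_meal plays alongside trallmtariff4 in the code above.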

In [29]:
# Table 7: Panel B
models = {}

for X in X_vars:
    # Instrument tariff4 and its interaction with scholarships (tar_schlshp)
    # using the tariff instrument and its corresponding interaction
    formula = (f'{X} ~ 1 + {" + ".join(cov6d)} + C(district) '
               f'[tariff4 + tar_schlshp ~ trmalltar_schlshp + trallmtariff4]')

    model7b = IV2SLS.from_formula(formula, data=eduexp_df, weights=eduexp_df['mult_comb']
                                  ).fit(cov_type='clustered', clusters=eduexp_df['stateyear'])

    models[X] = model7b

In [30]:
for X in models:
    print("Table 7: Panel  B")
    print("\n" + "="*80 + "\n") 
    print(f"Results for {X}:")
    print(models[X].params[['tariff4', 'tar_schlshp']])
    print(models[X].std_errors[['tariff4', 'tar_schlshp']])  
    print(models[X].pvalues[['tariff4', 'tar_schlshp']])  
    print("\n" + "="*80 + "\n") 

Table 7: Panel  B


Results for attend:
tariff4        0.592451
tar_schlshp   -1.107483
Name: parameter, dtype: float64
tariff4        0.186170
tar_schlshp    3.070023
Name: stderr, dtype: float64
tariff4        0.001461
tar_schlshp    0.718293
Name: pvalue, dtype: float64


Table 7: Panel  B


Results for enrolled:
tariff4        0.594518
tar_schlshp   -1.695559
Name: parameter, dtype: float64
tariff4        0.185263
tar_schlshp    3.100940
Name: stderr, dtype: float64
tariff4        0.001332
tar_schlshp    0.584524
Name: pvalue, dtype: float64




In [31]:
# Table 7: Panel C
models = {}

for X in X_vars:
    # Instrument tariff4 and its interaction with free tuition (tar_free)
    # using the tariff instrument and its corresponding interaction
    formula = (f'{X} ~ 1 + {" + ".join(cov6d)} + C(district) '
               f'[tariff4 + tar_free ~ trmalltar_free + trallmtariff4]')

    model7c = IV2SLS.from_formula(formula, data=eduexp_df, weights=eduexp_df['mult_comb']
                                  ).fit(cov_type='clustered', clusters=eduexp_df['stateyear'])

    models[X] = model7c

In [32]:
for X in models:
    print("Table 7: Panel  C")
    print("\n" + "="*80 + "\n") 
    print(f"Results for {X}:")
    print(models[X].params[['tariff4', 'tar_free']])
    print(models[X].std_errors[['tariff4', 'tar_free']])  
    print(models[X].pvalues[['tariff4', 'tar_free']])  
    print("\n" + "="*80 + "\n") 

Table 7: Panel  C


Results for attend:
tariff4     3.187643
tar_free   -2.678858
Name: parameter, dtype: float64
tariff4     1.885163
tar_free    1.954244
Name: stderr, dtype: float64
tariff4     0.090854
tar_free    0.170440
Name: pvalue, dtype: float64


Table 7: Panel  C


Results for enrolled:
tariff4     3.254809
tar_free   -2.748844
Name: parameter, dtype: float64
tariff4     1.857098
tar_free    1.928974
Name: stderr, dtype: float64
tariff4     0.079665
tar_free    0.154149
Name: pvalue, dtype: float64




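In all three panels the main tariff coefficient is positive and the interaction coefficient is negative, so welfare provisions for schoolchildren appear to partly offset the adverse effect of tariff declines on school attendance and enrollment. The evidence is strongest for free midday meals: the Panel A interactions are significant at the 5% level (p = 0.013 for attendance and 0.037 for enrollment), whereas the scholarship (Panel B) and free-tuition (Panel C) interactions are imprecisely estimated and not statistically significant.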
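One way to read the interactions is to compute the implied marginal effect of tariff4 at different levels of the welfare measure, which is the main coefficient plus the interaction coefficient times that level. The sketch below does this with the estimates printed above. It assumes, hypothetically, that the welfare measure behind tar_meal, tar_schlshp and tar_free runs from 0 (no provision) to 1 (full provision); check how these variables are constructed before relying on that. A standard error for the combined slope also needs the covariance between the two coefficients, which the printed output omits; linearmodels exposes the full parameter covariance matrix through a fitted result's `cov` attribute.

```python
# Table 7 estimates printed above:
# (panel, outcome) -> (tariff4 coefficient, interaction coefficient, interaction p-value)
TABLE7 = {
    ("A: midday meal", "attend"): (0.798595, -0.737046, 0.013322),
    ("A: midday meal", "enrolled"): (0.770442, -0.641632, 0.037053),
    ("B: scholarship", "attend"): (0.592451, -1.107483, 0.718293),
    ("B: scholarship", "enrolled"): (0.594518, -1.695559, 0.584524),
    ("C: free tuition", "attend"): (3.187643, -2.678858, 0.170440),
    ("C: free tuition", "enrolled"): (3.254809, -2.748844, 0.154149),
}


def tariff_slope(b_tariff, b_inter, w):
    """Marginal effect of tariff4 on the outcome when the welfare measure equals w."""
    return b_tariff + b_inter * w


for (panel, outcome), (b_t, b_i, p_i) in TABLE7.items():
    without = tariff_slope(b_t, b_i, 0.0)         # hypothetical: no provision (w = 0)
    with_provision = tariff_slope(b_t, b_i, 1.0)  # hypothetical: full provision (w = 1)
    print(f"{panel:16s} {outcome:9s} slope w=0: {without:6.3f}  w=1: {with_provision:6.3f}  "
          f"interaction p = {p_i:.3f}")
```

Under this coding, the Panel A attendance slope falls from about 0.80 without midday meals to about 0.06 with them. The Panel B slope at w = 1 turns negative, but given the interaction's standard error of roughly 3.1 that sign should not be over-read.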

In [12]:
dist_df = pd.read_stata("data/district.dta")

In [13]:
# Table 8

tab8 = dist_df[dist_df['bigstate'] == 1]

cov8 = [
    "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost", 
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", 
    "postlaw1", "postlaw2"
]

tab8a = tab8.dropna(subset=['tariff', 'trallmtariff', 'district', 'stateyear', 'inpov', 'post'] + cov8)

formula = (f'inpov ~ 1 + post + {" + ".join(cov8)} + C(district) '
           f'[tariff ~ trallmtariff]')

model8a = IV2SLS.from_formula(formula, data=tab8a).fit(cov_type='clustered', clusters=tab8a['stateyear'])

In [17]:
tariff_coeff = model8a.params['tariff']
tariff_se = model8a.std_errors['tariff']

tariff_coeff = round(tariff_coeff, 3)
tariff_se = round(tariff_se, 3)

results_df8a = pd.DataFrame({
    'Variable': ['tariff'],
    'Coefficient': [tariff_coeff],
    'SE': [tariff_se]
})

print("Table 8: The Effect of Poverty on Activities of Children")
print("="*60)
print("Dependent Variable: Headcount Rate")
print("="*60)
print(results_df8a)

Table 8: The Effect of Poverty on Activities of Children
Dependent Variable: Headcount Rate
  Variable  Coefficient     SE
0   tariff       -0.494  0.164


In [20]:
child_df = pd.read_stata("data/child.dta")

In [49]:
child_df = child_df[child_df['bigstate'] == 1]

In [50]:
# Table 8

cov9 = [
    "female", "age", "fem_age", "age2", "fem_age2", "age3", "fem_age3", 
    "sc", "st", "hindu", "islam", "christ", "sikh", "headfemale", "headage", 
    "headliterate", "headprimary", "headmiddle", "headsecond", "headhigher", 
    "post", "pcnt_litpost", "pcnt_scstpost", "pcnt_mfgpost", "pcnt_farmpost", 
    "pcnt_tradepost", "pcnt_tranpost", "pcnt_minpost", "pcnt_servpost", 
    "postlaw1", "postlaw2"
]

dependent_vars = ["schoolattend", "labor", "workonly", "marketwork", "domesticwork", "idle"]

results = {}

for Y in dependent_vars:
    tab8 = child_df.dropna(subset=[Y, 'tariff', 'trallmtariff', 'district', 'stateyear', 'inpov'] + cov9)

    # 'post' is already part of cov9, so it is not repeated here; the poverty
    # headcount (inpov) is instrumented with the tariff on traded goods (trallmtariff)
    formula = f'{Y} ~ 1 + {" + ".join(cov9)} + C(district) [inpov ~ trallmtariff]'

    model = IV2SLS.from_formula(formula, data=tab8).fit(cov_type='clustered', clusters=tab8['stateyear'])

    results[Y] = model

In [51]:
summary_table = []

for Y, model in results.items():
    # Coefficient and SE on the instrumented poverty headcount rate (inpov)
    inpov_coeff = model.params.get('inpov', None)
    inpov_se = model.std_errors.get('inpov', None)

    if inpov_coeff is not None:
        inpov_coeff = round(inpov_coeff, 3)
    if inpov_se is not None:
        inpov_se = round(inpov_se, 3)

    summary_table.append({'Model': Y, 'Headcount Rate': inpov_coeff, 'SE': inpov_se})

summary_df = pd.DataFrame(summary_table)
print("Table 8: The Effect of Poverty on Activities of Children")
print("="*60)
print(summary_df)

Table 8: The Effect of Poverty on Activities of Children
          Model  Headcount Rate     SE
0  schoolattend          -0.794  0.361
1         labor           0.303  0.242
2      workonly           0.321  0.242
3    marketwork          -0.036  0.190
4  domesticwork           0.338  0.220
5          idle           0.473  0.217


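In table 8, equation (7) is estimated with the child activity categories described earlier as the dependent variables. The poverty headcount rate (inpov) is the endogenous regressor, instrumented by the district tariff on traded goods (trallmtariff). The first panel reports the effect of tariffs on the headcount rate at the district level: lower tariffs are associated with higher poverty (coefficient -0.494, SE 0.164). The authors note that the instrument would be invalid if the tariff on traded goods also influenced returns to education or the demand for child labour, since it would then affect children's activities through channels other than poverty, though there is some reason to believe this is not the case. The estimates suggest that a reduction in poverty would raise school attendance (coefficient -0.794, SE 0.361) and reduce idleness (0.473, SE 0.217). The point estimates for labor and workonly imply that the overall probability of working would also fall, but they are not statistically significant, and the marketwork estimate is close to zero (-0.036, SE 0.190).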
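With one endogenous regressor and one instrument, as in the table 8 regressions, the 2SLS coefficient on inpov equals the reduced-form effect of the instrument on the outcome divided by its first-stage effect on poverty (controls can be partialled out of both regressions first). The simulated sketch below illustrates this ratio and the exclusion-restriction concern noted above. All data and coefficients are invented, `z` and `poverty` are only stand-ins for trallmtariff and inpov, and covariates, fixed effects and clustering are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Simulated stand-ins (hypothetical): z for trallmtariff, poverty for inpov
z = rng.normal(size=n)
v = rng.normal(size=n)  # unobserved shock that moves both poverty and attendance
poverty = -0.5 * z + v + rng.normal(size=n)
attend = 0.6 - 0.8 * poverty + 0.5 * v + rng.normal(size=n)


def slope(y, x):
    """OLS slope of y on x, with an intercept."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


first_stage = slope(poverty, z)           # instrument -> poverty
reduced_form = slope(attend, z)           # instrument -> attendance
iv_estimate = reduced_form / first_stage  # indirect least squares (Wald) estimate
ols_estimate = slope(attend, poverty)     # biased toward zero here by the shared shock v

# If the instrument also affected attendance directly (e.g. via returns to
# education), the ratio would wrongly attribute that effect to poverty.
attend_direct = attend + 0.2 * z
iv_violated = slope(attend_direct, z) / first_stage

print(f"first stage {first_stage:.3f}, reduced form {reduced_form:.3f}")
print(f"IV {iv_estimate:.3f}, OLS {ols_estimate:.3f}, IV with direct effect {iv_violated:.3f}")
```

In this simulation the ratio recovers the true effect of -0.8 while OLS does not, but once a direct effect of 0.2 is added the IV estimate moves to roughly (0.4 + 0.2) / -0.5 = -1.2, overstating the role of poverty.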

In essence, this paper sheds light on the mechanisms through which the intensity of trade liberalisation may affect households' human capital decisions. Schooling improved and child labour declined over the period in question, but districts that experienced larger tariff declines show smaller improvements in these measures. The primary channel appears to be poverty: the most affected districts saw smaller reductions in poverty relative to the national trend. The effect is also a function of schooling costs, and districts with welfare provisions such as free midday meals, scholarships and free tuition fare better in this regard, most clearly in the case of midday meals. Overall this is a very fun and interesting paper on the local effects of trade liberalisation, though there are some possible heterogeneity issues with regard to the higher tariff intensity -> lower relative living standards -> lower schooling improvements channel. If you find this paper interesting, you might also enjoy another [paper](https://doi.org/10.7208/chicago/9780226318004.003.0008) in the same vein by Petia Topalova.
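As a back-of-the-envelope check on the tariff -> living standards -> schooling channel, the two table 8 estimates can be multiplied: the effect of tariffs on the headcount rate times the effect of the headcount rate on school attendance. This is only a rough sketch under strong assumptions: the two estimates come from different samples (district-level and child-level data), the effects are assumed to compose linearly, and the delta-method standard error below treats the two estimates as independent, which is a simplification.

```python
import math

# Table 8 estimates printed above: (coefficient, SE)
b_tariff_pov, se_tariff_pov = -0.494, 0.164  # tariff -> headcount poverty (district data)
b_pov_att, se_pov_att = -0.794, 0.361        # poverty -> school attendance (child data)


def chained_effect(b1, se1, b2, se2):
    """Product of two coefficients with a delta-method SE assuming independent estimates."""
    effect = b1 * b2
    se = math.sqrt((b2 * se1) ** 2 + (b1 * se2) ** 2)
    return effect, se


effect, se = chained_effect(b_tariff_pov, se_tariff_pov, b_pov_att, se_pov_att)
print(f"implied effect of tariff on attendance via poverty: {effect:.3f} (SE ~ {se:.3f})")
```

The positive product (about 0.39, with an approximate SE of 0.22) has the same sign as the tariff4 coefficients in table 7: lower tariffs are associated with lower attendance, here operating through higher poverty.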