# Assignment 2: Convergence, Accounting, & Development

**GLBL 5010: Economics for Global Affairs**

**Group Members:**
* Kevin Chen
* Pranav Pattatathunaduvil
* Lucy Kim

---

# Data Cleaning

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.formula.api as smf

# Choosing country code for later question
MY_COUNTRY_CODE = "USA"

# Load the data
df = pd.read_csv('hwk2_accounting.csv', sep='\t')

# Filter for rows that have data for both 1960 and 2018 to ensure consistent samples
df = df.dropna(subset=['pop1960', 'cgdpo1960', 'cn1960', 'pop2018', 'cgdpo2018', 'cn2018']).copy()

print("Data loaded. Number of countries:", len(df))
print(df.head())

Data loaded. Number of countries: 102
  countrycode    country    pop1960    hc1960  cgdpo1960      cn1960  \
3         ARG  Argentina  20.545674  1.953866    65001.0  107963.000   
5         AUS  Australia  10.470019  2.746758   163815.0  744405.700   
6         AUT    Austria   7.093828  2.403941    71811.0  176051.700   
7         BDI    Burundi   2.781159  1.095495     1920.0    5961.864   
8         BEL    Belgium   9.113383  2.307354    94903.0  493083.300   

     pop2018    hc2018  cgdpo2018      cn2018                     region  
3  44.361150  3.065968    1022236  3361087.00  Latin America & Caribbean  
5  24.898152  3.536047    1350340  5795477.00        East Asia & Pacific  
6   8.891388  3.369997     470542  2835065.00      Europe & Central Asia  
7  11.175374  1.402834       9048    18111.23         Sub-Saharan Africa  
8  11.482178  3.142735     510223  3436160.00      Europe & Central Asia  


---
# 1A

The production function is $Y=AK^{\alpha}L^{1-\alpha}$.

To express this in per-capita terms (GDPPC), we divide both sides by $L$ and use the lowercase variables $y = Y/L$ and $k = K/L$:

$$\frac{Y}{L} = \frac{A K^{\alpha} L^{1-\alpha}}{L} = A K^{\alpha} L^{-\alpha} = A \left(\frac{K}{L}\right)^{\alpha}$$

$$y = A k^\alpha$$
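
As a quick numeric sanity check on the algebra, the sketch below plugs hypothetical values into both the aggregate and the per-capita form. The numbers for `A`, `K`, `L`, and `alpha` are made up for illustration and are not taken from the dataset.

```python
import math

# Hypothetical values chosen only to illustrate the identity (not from the data)
A, K, L, alpha = 2.0, 500.0, 40.0, 0.30

# Aggregate output from Y = A * K^alpha * L^(1 - alpha), then divide by L
Y = A * K**alpha * L**(1 - alpha)
y_from_aggregate = Y / L

# Per-capita form y = A * k^alpha with k = K / L
k = K / L
y_per_capita = A * k**alpha

# The two expressions should agree up to floating-point error
print(math.isclose(y_from_aggregate, y_per_capita))
```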

--- 
# 1B

In [8]:
# Create GDPPC for 1960 and 2018
df['y_1960'] = df['cgdpo1960'] / df['pop1960']
df['y_2018'] = df['cgdpo2018'] / df['pop2018']

# Create log variables
df['ln_y_1960'] = np.log(df['y_1960'])
df['ln_y_2018'] = np.log(df['y_2018'])

# Calculate Growth (Difference in logs)
df['growth'] = df['ln_y_2018'] - df['ln_y_1960']

# Run OLS regression of growth based on initial log GDPPC (1960)
reg_1b = smf.ols('growth ~ ln_y_1960', data=df).fit()
print(reg_1b.summary())

                            OLS Regression Results                            
Dep. Variable:                 growth   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     1.816
Date:                Sat, 24 Jan 2026   Prob (F-statistic):              0.181
Time:                        10:22:41   Log-Likelihood:                -132.01
No. Observations:                 102   AIC:                             268.0
Df Residuals:                     100   BIC:                             273.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.3232      0.726      3.198      0.0


**Interpretation**

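The coefficient on `ln_y_1960` is **-0.1214**: a one-unit increase in 1960 log GDP per capita is associated with 0.1214 log points less growth between 1960 and 2018.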

While the coefficient is negative, which would typically suggest convergence (poorer countries growing faster), the p-value is **0.181**. 

Because this exceeds 0.05, the result is **not statistically significant** at the 5% level, so we do not find strong evidence of convergence across this sample of 102 countries.
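
The significance call can be cross-checked by hand from the numbers reported in the summary. The sketch below assumes a simple regression with one regressor and nonrobust standard errors (as the summary header states), in which case the F-statistic equals the squared t-statistic on the slope. The 5% two-sided critical value of roughly 1.984 for 100 degrees of freedom is taken from standard t-tables, and the inputs are rounded, so the results are approximate.

```python
import math

# Values as reported in the regression summary and interpretation
coef = -0.1214      # slope on ln_y_1960
f_stat = 1.816      # F-statistic (one regressor, so F = t^2)
t_crit = 1.984      # approx. 5% two-sided critical value, t with 100 df (from tables)

# Recover the slope's t-statistic; it carries the sign of the coefficient
t_stat = math.copysign(math.sqrt(f_stat), coef)

# Implied standard error and approximate 95% confidence interval
std_err = coef / t_stat
ci_low = coef - t_crit * std_err
ci_high = coef + t_crit * std_err

print(f"t = {t_stat:.3f}, se = {std_err:.4f}")
print(f"95% CI ~ [{ci_low:.3f}, {ci_high:.3f}]")
print("Significant at 5%:", abs(t_stat) > t_crit)
```

Under these assumptions the interval spans zero, which is consistent with the reported p-value of 0.181.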

--- 
# 1C


In [10]:
# Assume alpha = 0.30
alpha = 0.30

# Calculate k (capital per capita)
df['k_1960'] = df['cn1960'] / df['pop1960']
df['k_2018'] = df['cn2018'] / df['pop2018']

# Log k
df['ln_k_1960'] = np.log(df['k_1960'])
df['ln_k_2018'] = np.log(df['k_2018'])

# Calculate TFP (ln A)
# Formula: ln y = ln A + alpha * ln k  =>  ln A = ln y - alpha * ln k
df['ln_A_1960'] = df['ln_y_1960'] - alpha * df['ln_k_1960']
df['ln_A_2018'] = df['ln_y_2018'] - alpha * df['ln_k_2018']

# Report Mean and Std Dev for 1960
print("1960 ln A Statistics:")
print(df['ln_A_1960'].describe()[['mean', 'std']])

1960 ln A Statistics:
mean    5.366537
std     0.620780
Name: ln_A_1960, dtype: float64
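
To make the TFP formula concrete, the sketch below applies it to a single row, using Argentina's 1960 values as printed in the `df.head()` output. It uses only the standard library, and the comparison against the sample mean and standard deviation relies on the figures reported above, so treat the result as an illustration, not a new estimate.

```python
import math

alpha = 0.30

# Argentina, 1960 (values copied from the df.head() output)
pop1960 = 20.545674
cgdpo1960 = 65001.0
cn1960 = 107963.0

# Per-capita output and capital
y = cgdpo1960 / pop1960
k = cn1960 / pop1960

# ln A = ln y - alpha * ln k
ln_A = math.log(y) - alpha * math.log(k)

# Position relative to the reported 1960 sample moments
sample_mean, sample_std = 5.366537, 0.620780
z = (ln_A - sample_mean) / sample_std

print(f"ln A (ARG, 1960) = {ln_A:.3f}, {z:.2f} std devs from the sample mean")
```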
