### Goal
1. Determine whether the Heritage Foundation's Economic Freedom Index correlates as expected with the FDI restrictiveness index from the Organisation for Economic Co-operation and Development (OECD).
2. If it does, this suggests that the factors that shape economic freedom are directly related to incentives for FDI from other countries.
3. Additionally, given a high correlation, only the Economic Freedom dataset would need to be used for regression modeling with FDI data.
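
One way to test goal 1 directly is to compute a correlation coefficient between the "Overall" score and the restrictiveness values. The sketch below is a minimal example, assuming a merged frame with the column names used later in this notebook (`Indicator`, `OBS_VALUE`, `Overall`, `Country`, `Year`). The helper name `restrictiveness_correlation` and the toy values are illustrative assumptions, not real data.

```python
import pandas as pd

def restrictiveness_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between Overall and OBS_VALUE on restrictiveness rows only."""
    rows = df[df["Indicator"] == "FDI restrictiveness"]
    # Average the sector-level scores so each country-year contributes one point.
    per_year = rows.groupby(["Country", "Year"], as_index=False).agg(
        restrictiveness=("OBS_VALUE", "mean"),
        overall=("Overall", "first"),
    )
    return per_year["restrictiveness"].corr(per_year["overall"])

# Toy values for illustration only (not taken from either dataset).
toy = pd.DataFrame({
    "Country": ["A"] * 6,
    "Year": [2000, 2000, 2001, 2001, 2002, 2002],
    "Indicator": ["FDI restrictiveness"] * 6,
    "OBS_VALUE": [0.60, 0.50, 0.45, 0.35, 0.30, 0.20],
    "Overall": [50.0, 50.0, 55.0, 55.0, 60.0, 60.0],
})
r = restrictiveness_correlation(toy)
print(f"Pearson r = {r:.3f}")
```

If higher economic freedom goes with lower restrictiveness, `r` should come out negative. Averaging per country-year is a design choice here: it keeps the repeated "Overall" value from being counted once per sector.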

In [15]:
%pip install statsmodels

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder

import statsmodels.api as sm

import matplotlib.pyplot as plt



In [3]:
oecd_path = "C:\\Users\\Harb\\OneDrive\\Documents\\foreign-direct-investment-analysis\\data\\processed\\political_factors\\OECD_fdi_flows_fdi_restrictiveness.csv"
economic_freedom_path = "C:\\Users\\Harb\\OneDrive\\Documents\\foreign-direct-investment-analysis\\data\\processed\\political_factors\\economic_freedom_country_scores.csv"

oecd_df = pd.read_csv(oecd_path)
economic_freedom_df = pd.read_csv(economic_freedom_path)

print(oecd_df.head(10))
economic_freedom_df.head()

   Unnamed: 0  index LOCATION                       Country  \
0           0    407      IND                         India   
1           1    408      IND                         India   
2           2    409      IND                         India   
3           3    427      IND                         India   
4           4    428      IND                         India   
5           5    429      IND                         India   
6           6    434      CHN  China (People's Republic of)   
7           7    435      CHN  China (People's Republic of)   
8           8    474      IND                         India   
9           9    548      IND                         India   

             Indicator    SUBJECT  OBS_VALUE  TIME_PERIOD  
0  FDI restrictiveness  TRANSPORT   0.450000         1997  
1  FDI restrictiveness  FINANSERV   0.552000         1997  
2  FDI restrictiveness   BSNSSERV   0.663000         1997  
3  FDI restrictiveness  TRANSPORT   0.370000         2003  
4  FDI

Unnamed: 0.1,Unnamed: 0,name_web,Year,Overall,Property Rights,Government Integrity,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Trade Freedom,Monetary Freedom,Investment Freedom,Financial Freedom,Labor Freedom
0,34,china,2024,48.5,46.9,41.6,39.5,69.1,65.7,8.1,68.1,73.6,71.8,20.0,20.0,57.8
1,74,india,2024,52.9,49.2,40.8,52.1,73.7,73.5,6.9,68.3,62.2,69.1,40.0,40.0,58.4
2,218,china,2023,48.3,45.3,38.2,42.0,69.5,65.1,9.8,68.3,73.6,72.5,20.0,20.0,55.2
3,258,india,2023,52.9,49.7,53.0,42.2,78.5,73.8,5.8,64.3,59.8,70.0,40.0,40.0,58.1
4,402,china,2022,48.0,43.7,39.3,37.4,71.2,64.2,11.1,68.8,73.2,70.0,20.0,20.0,57.2


In [None]:
oecd_df = oecd_df.rename(columns={"TIME_PERIOD": "Year"})

economic_freedom_df = economic_freedom_df.rename(columns={"name_web": "Country"})

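# Normalize the OECD label "China (People's Republic of)" to "China" so it matches the Heritage naming.
# Note: this relabels every non-India row, which is safe only if the file contains no other countries.
oecd_df.loc[oecd_df['Country'] != "India", 'Country'] = 'China'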

oecd_df['Country'] = oecd_df['Country'].str.upper()
economic_freedom_df['Country'] = economic_freedom_df['Country'].str.upper()
oecd_df.head(20)

Unnamed: 0.1,Unnamed: 0,index,LOCATION,Country,Indicator,SUBJECT,OBS_VALUE,Year
0,0,407,IND,INDIA,FDI restrictiveness,TRANSPORT,0.45,1997
1,1,408,IND,INDIA,FDI restrictiveness,FINANSERV,0.552,1997
2,2,409,IND,INDIA,FDI restrictiveness,BSNSSERV,0.663,1997
3,3,427,IND,INDIA,FDI restrictiveness,TRANSPORT,0.37,2003
4,4,428,IND,INDIA,FDI restrictiveness,FINANSERV,0.512,2003
5,5,429,IND,INDIA,FDI restrictiveness,BSNSSERV,0.603,2003
6,6,434,CHN,CHINA,FDI restrictiveness,PRIMSECT,0.597,2003
7,7,435,CHN,CHINA,FDI restrictiveness,BSNSSERV,0.45,2003
8,8,474,IND,INDIA,FDI flows,OUTWARD,1.560575,2006
9,9,548,IND,INDIA,FDI restrictiveness,MEDIA,0.5,2006


In [5]:
merged_df = pd.merge(oecd_df, economic_freedom_df, on=['Year', 'Country'], how='inner')
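
An inner merge silently drops any country-year that appears in only one frame, so it can help to check what would be lost before relying on `merged_df`. The sketch below is a minimal example using toy frames with placeholder values; it relies on the `indicator` and `validate` parameters of `pd.merge`, and the frame names `left` and `right` are hypothetical stand-ins for `oecd_df` and `economic_freedom_df`.

```python
import pandas as pd

# Toy stand-ins for the two datasets; values are placeholders, not real data.
left = pd.DataFrame({
    "Country": ["INDIA", "INDIA", "CHINA"],
    "Year": [1997, 2003, 2003],
    "OBS_VALUE": [0.1, 0.2, 0.3],
})
right = pd.DataFrame({
    "Country": ["INDIA", "CHINA"],
    "Year": [1997, 2003],
    "Overall": [50.0, 51.0],
})

# how="left" keeps every OECD row; indicator=True adds a "_merge" column
# saying whether a match was found. validate raises if the right frame
# has duplicate (Year, Country) keys.
check = pd.merge(
    left, right,
    on=["Year", "Country"],
    how="left",
    indicator=True,
    validate="many_to_one",
)
unmatched = check[check["_merge"] == "left_only"]
print(unmatched[["Country", "Year"]])
```

Rows listed in `unmatched` are the ones an inner merge would discard.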

In [7]:
merged_df.head()

Unnamed: 0,Unnamed: 0_x,index,LOCATION,Country,Indicator,SUBJECT,OBS_VALUE,Year,Unnamed: 0_y,Overall,...,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Trade Freedom,Monetary Freedom,Investment Freedom,Financial Freedom,Labor Freedom
0,0,407,IND,INDIA,FDI restrictiveness,TRANSPORT,0.45,1997,5045,49.7,...,,67.1,88.7,,55.0,13.2,65.1,50.0,30.0,
1,1,408,IND,INDIA,FDI restrictiveness,FINANSERV,0.552,1997,5045,49.7,...,,67.1,88.7,,55.0,13.2,65.1,50.0,30.0,
2,2,409,IND,INDIA,FDI restrictiveness,BSNSSERV,0.663,1997,5045,49.7,...,,67.1,88.7,,55.0,13.2,65.1,50.0,30.0,
3,86,3315,IND,INDIA,FDI restrictiveness,MFG,0.237,1997,5045,49.7,...,,67.1,88.7,,55.0,13.2,65.1,50.0,30.0,
4,87,3317,IND,INDIA,FDI restrictiveness,PRIMSECT,0.488,1997,5045,49.7,...,,67.1,88.7,,55.0,13.2,65.1,50.0,30.0,


In [None]:

merged_df.columns

Index(['Country', 'Indicator', 'SUBJECT', 'OBS_VALUE', 'Year', 'Overall',
       'Property Rights', 'Government Integrity', 'Judicial Effectiveness',
       'Tax Burden', 'Government Spending', 'Fiscal Health',
       'Business Freedom', 'Trade Freedom', 'Monetary Freedom',
       'Investment Freedom', 'Financial Freedom', 'Labor Freedom'],
      dtype='object')

In [None]:
independent_vars = [
    'Overall',
       'Property Rights', 'Government Integrity', 'Judicial Effectiveness',
       'Tax Burden', 'Government Spending', 'Fiscal Health',
       'Business Freedom', 'Trade Freedom', 'Monetary Freedom',
       'Investment Freedom', 'Financial Freedom', 'Labor Freedom'
]

merged_df['Indicator_Binary'] = LabelEncoder().fit_transform(
    merged_df['Indicator']
)

dependent_var = 'Indicator_Binary'


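# Copy the slice so the conversions below do not modify a view of merged_df.
regression_data = merged_df[independent_vars + [dependent_var]].copy()

# Coerce to numeric first: errors='coerce' turns unparseable values into NaN,
# so dropna() has to run afterwards to remove them.
for col in independent_vars + [dependent_var]:
    regression_data[col] = pd.to_numeric(regression_data[col], errors='coerce')

regression_data = regression_data.dropna()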


regression_data

Unnamed: 0,Overall,Property Rights,Government Integrity,Judicial Effectiveness,Tax Burden,Government Spending,Fiscal Health,Business Freedom,Trade Freedom,Monetary Freedom,Investment Freedom,Financial Freedom,Labor Freedom,Indicator_Binary
52,58.4,62.2,46.4,71.5,72.6,67.6,54.8,80.2,71.2,69.8,20.0,20.0,64.9,0
53,58.4,62.2,46.4,71.5,72.6,67.6,54.8,80.2,71.2,69.8,20.0,20.0,64.9,0
54,58.4,62.2,46.4,71.5,72.6,67.6,54.8,80.2,71.2,69.8,20.0,20.0,64.9,0
55,58.4,62.2,46.4,71.5,72.6,67.6,54.8,80.2,71.2,69.8,20.0,20.0,64.9,0
132,56.5,63.0,47.2,64.1,79.4,77.9,13.1,65.6,73.4,73.0,40.0,40.0,41.2,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
387,53.9,49.9,42.4,51.5,79.5,76.2,4.0,63.9,71.0,70.0,40.0,40.0,58.2,0
406,56.5,59.2,48.1,55.9,78.7,78.5,18.0,76.7,69.4,72.1,40.0,40.0,41.3,0
407,56.5,59.2,48.1,55.9,78.7,78.5,18.0,76.7,69.4,72.1,40.0,40.0,41.3,0
408,56.5,59.2,48.1,55.9,78.7,78.5,18.0,76.7,69.4,72.1,40.0,40.0,41.3,0


In [31]:

# LabelEncoder assigns codes in sorted label order, so with the two labels seen in this data
# 0 = 'FDI flows' and 1 = 'FDI restrictiveness'.
print(regression_data['Indicator_Binary'].unique())

[0 1]
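
The mapping between integer codes and indicator labels can be read from the encoder's `classes_` attribute instead of being inferred. The sketch below is a minimal example on a hand-written list containing the two label strings that appear in the OECD frame; it assumes no other labels are present.

```python
from sklearn.preprocessing import LabelEncoder

# The two indicator labels seen in the OECD data, in arbitrary order.
labels = ["FDI restrictiveness", "FDI flows", "FDI restrictiveness"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, and each label's code is its position in classes_.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)  # {'FDI flows': 0, 'FDI restrictiveness': 1}
```

Keeping a reference to the fitted encoder (rather than calling `LabelEncoder().fit_transform(...)` inline) makes this lookup possible on the real data.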


In [None]:
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_with_const = sm.add_constant(regression_data[independent_vars])

vif_data = pd.DataFrame({
    "Variable": X_with_const.columns,
    "VIF": [variance_inflation_factor(X_with_const.values, i) for i in range(X_with_const.shape[1])]
})
print(vif_data)


                  Variable  VIF
0                    const  0.0
1                  Overall  inf
2          Property Rights  inf
3     Government Integrity  inf
4   Judicial Effectiveness  inf
5               Tax Burden  inf
6      Government Spending  inf
7            Fiscal Health  inf
8         Business Freedom  inf
9            Trade Freedom  inf
10        Monetary Freedom  inf
11      Investment Freedom  inf
12       Financial Freedom  inf
13           Labor Freedom  inf




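## Finding
- The variance inflation factor (VIF) is infinite for every economic freedom score, which means each predictor can be reproduced exactly from the others in this sample.
- This is expected for "Overall", which is the average of the twelve component scores (for row 52 above, the components average to about 58.4, matching "Overall"). VIF measures collinearity among the predictors only; it does not by itself measure how strongly they correlate with FDI restrictiveness, so the earlier hypothesis still needs a direct correlation test.
- Therefore the full set of scores should not be entered together: use "Overall" alone (or the components without it) when regressing against the FDI data. Whether "Overall" can stand in for the FDI restrictiveness variable depends on that direct correlation test.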
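
The infinite VIF values reported above can be reproduced on synthetic data whenever one column is an exact average of the others. The sketch below is a minimal illustration, not the notebook's method: it computes VIF by hand with NumPy as 1 / (1 - R²) to make the formula explicit, whereas the notebook uses statsmodels' `variance_inflation_factor`. The function name `vif` and all data are assumptions for illustration.

```python
import numpy as np

def vif(X: np.ndarray, i: int) -> float:
    """VIF of column i: 1 / (1 - R^2), with R^2 from regressing column i
    on the remaining columns plus an intercept."""
    target = X[:, i]
    others = np.delete(X, i, axis=1)
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    r2 = 1.0 - (resid @ resid) / ((target - target.mean()) ** 2).sum()
    # A perfect fit gives R^2 = 1, i.e. an infinite VIF.
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
components = rng.uniform(20, 80, size=(50, 3))   # synthetic component scores
overall = components.mean(axis=1)                # composite built from them

with_overall = np.column_stack([overall, components])
vif_with = [vif(with_overall, i) for i in range(with_overall.shape[1])]
vif_without = [vif(components, i) for i in range(components.shape[1])]

print("with composite:   ", vif_with)
print("without composite:", vif_without)
```

With the composite column included, every VIF is infinite or astronomically large; once it is dropped, the independent components have VIFs close to 1. The same check on the real data would be to recompute VIF after removing "Overall" from `independent_vars`.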

In [None]:
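from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# X and y are not defined earlier in the notebook. The 14 coefficients below, with a
# leading 0 for the constant column, suggest X was the design matrix with the constant.
X = X_with_const
y = regression_data[dependent_var]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# L2-regularised logistic regression on the standardised predictors
model = LogisticRegression(penalty='l2', solver='lbfgs')
model.fit(X_scaled, y)

print("Model coefficients:", model.coef_)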


Model coefficients: [[ 0.          0.27245953  0.50875614  0.00602257  0.36577481 -0.39058237
  -0.19347746  0.25201508 -0.61325732  0.47993043  0.49217433 -0.00491573
   0.21788857 -0.39795512]]
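
A raw coefficient array is hard to read without the variable names next to it. The sketch below is a minimal example of pairing `model.coef_` with column names and ranking by magnitude; it runs on synthetic data, and the names `X_toy` and `y_toy` and the generated values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "Trade Freedom": rng.normal(70, 5, 200),
    "Fiscal Health": rng.normal(20, 10, 200),
})
# Synthetic binary target driven only by the first column, plus noise.
y_toy = (X_toy["Trade Freedom"] + rng.normal(0, 2, 200) > 70).astype(int)

X_scaled = StandardScaler().fit_transform(X_toy)
model = LogisticRegression(penalty="l2", solver="lbfgs").fit(X_scaled, y_toy)

# Label each coefficient with its column and sort by absolute size.
coef = pd.Series(model.coef_[0], index=X_toy.columns)
coef = coef.sort_values(key=abs, ascending=False)
print(coef)
```

Because the predictors are standardised, the magnitudes are comparable across columns. With perfectly collinear predictors, as in this notebook, individual coefficients are not uniquely determined and should be read with caution.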
