# Test a Logistic Regression Model

This is an assignment for week 4 of the 'Regression Modeling in Practice' course by Wesleyan University.

The task at hand is to fit a multiple logistic regression model to a chosen dataset and:
1. Discuss the results for the associations between all explanatory variables and the response variable
2. Report whether the results supported the original hypothesis
3. Discuss whether there was evidence of confounding for the association between the primary explanatory variable and the response variable

The dataset was prepared earlier and contains information about 217 countries for 2017: GDP, imports, exports, population, ease of doing business, whether a country is landlocked, and WTO status.

The original hypothesis is that countries that are more open to trade are more likely to be members of the World Trade Organization (WTO).

## Prepare the dataset 

In [28]:
# Import libraries for working with data
import pandas as pd
import numpy as np

In [29]:
# Load the prepared CSV file
data = pd.read_csv('openness_df.csv')

In [30]:
data.sample(5)

| Unnamed: 0 | country | year | gdp | gdp_ppc | imports | exports | population | business_ease | code | region | tariff | wto_status | landlocked | openness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 151 | Pakistan | 2017 | 304567300000.0 | 1464.993305 | 53590180000.0 | 25149090000.0 | 207896686.0 | 53.01708 | PAK | South Asia | 10.710943 | Member | 0 | 0.258528 |
| 44 | Costa Rica | 2017 | 58174550000.0 | 11752.543401 | 19194610000.0 | 19229170000.0 | 4949954.0 | 69.30028 | CRI | Latin America & Caribbean | 5.264909 | Member | 0 | 0.660491 |
| 140 | Namibia | 2017 | 13566190000.0 | 5646.456008 | 6324886000.0 | 5088883000.0 | 2402603.0 | 61.10494 | NAM | Sub-Saharan Africa | 7.73297 | Member | 0 | 0.841339 |
| 184 | Eswatini | 2017 | 4446249000.0 | 3953.08897 | 1949223000.0 | 1915237000.0 | 1124753.0 | 58.62573 | SWZ | Sub-Saharan Africa | 9.307752 | Member | 1 | 0.869151 |
| 200 | Tanzania | 2017 | 53320630000.0 | 1004.841121 | 9117114000.0 | 8072891000.0 | 54663906.0 | 53.98306 | TZA | Sub-Saharan Africa | 8.482603 | Member | 0 | 0.322389 |
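
The `openness` column is not defined in the text. Judging from the sampled rows, it appears to equal the trade-to-GDP ratio, (imports + exports) / GDP. This is an inference from the numbers shown, not something the dataset description states. A minimal sketch that checks it against three of the rows above (the function name `trade_openness` is introduced here for illustration):

```python
# Values copied from the sampled rows above: (gdp, imports, exports, reported openness)
rows = {
    "Pakistan": (304567300000.0, 53590180000.0, 25149090000.0, 0.258528),
    "Costa Rica": (58174550000.0, 19194610000.0, 19229170000.0, 0.660491),
    "Namibia": (13566190000.0, 6324886000.0, 5088883000.0, 0.841339),
}

def trade_openness(gdp, imports, exports):
    """Assumed definition: total trade (imports + exports) as a share of GDP."""
    return (imports + exports) / gdp

# Compare the assumed formula with the value stored in the dataset
for country, (gdp, imports, exports, reported) in rows.items():
    computed = trade_openness(gdp, imports, exports)
    print(f"{country}: computed {computed:.6f}, reported {reported:.6f}")
```

For these rows the computed and reported values agree to the printed precision, which supports reading `openness` as total trade relative to GDP.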


In [31]:
# Center quantitative variables 
data['gdp_ppc_c'] = (data['gdp_ppc'] - data['gdp_ppc'].mean()) 
data['population_c'] = (data['population'] - data['population'].mean()) 

# Recode WTO status as a binary outcome: 1 = member, 0 = observer or no participation
data["wto_status"] = data["wto_status"].replace({'Member': 1, 'Observer': 0, 'No perticipation': 0})

## Logistic Regression Model

Logistic regression models for the association between WTO membership and trade openness, with the other explanatory variables added one at a time.

In [32]:
# Import statistical libraries
import statsmodels.api
import statsmodels.formula.api as smf

# Build logistic regression model 
model = smf.logit('wto_status ~ openness', data=data).fit()
model.summary()

Optimization terminated successfully.
         Current function value: 0.471728
         Iterations 5


| | | | |
|---|---|---|---|
| Dep. Variable: | wto_status | No. Observations: | 183 |
| Model: | Logit | Df Residuals: | 181 |
| Method: | MLE | Df Model: | 1 |
| Date: | Thu, 28 May 2020 | Pseudo R-squ.: | 0.0003432 |
| Time: | 19:44:30 | Log-Likelihood: | -86.326 |
| converged: | True | LL-Null: | -86.356 |
| | | LLR p-value: | 0.8076 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.5886 | 0.361 | 4.405 | 0.000 | 0.882 | 2.295 |
| openness | -0.0804 | 0.326 | -0.246 | 0.805 | -0.720 | 0.559 |


Unfortunately, the relationship between WTO membership and trade openness is not statistically significant (coefficient = -0.0804, p = 0.805).
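
To see how flat the fitted relationship is, the coefficients reported above can be plugged into the logistic function. This is a small illustration rather than part of the original notebook; the openness values 0.25 and 1.0 are arbitrary points chosen for the example, and `predicted_probability` is a helper name introduced here.

```python
import math

# Coefficients copied from the first model's summary table
intercept = 1.5886
b_openness = -0.0804

def predicted_probability(openness):
    """Predicted probability of WTO membership from the one-predictor logit model."""
    log_odds = intercept + b_openness * openness
    return 1.0 / (1.0 + math.exp(-log_odds))

p_low = predicted_probability(0.25)   # illustrative low-openness value
p_high = predicted_probability(1.0)   # illustrative high-openness value
print(f"openness 0.25: {p_low:.3f}")
print(f"openness 1.00: {p_high:.3f}")
```

The two predicted probabilities differ by about one percentage point, which is consistent with a coefficient that cannot be distinguished from zero.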

In [33]:
# Add income levels
model = smf.logit('wto_status ~ openness + gdp_ppc_c', data=data).fit()
model.summary()

Optimization terminated successfully.
         Current function value: 0.463573
         Iterations 6


| | | | |
|---|---|---|---|
| Dep. Variable: | wto_status | No. Observations: | 183 |
| Model: | Logit | Df Residuals: | 180 |
| Method: | MLE | Df Model: | 2 |
| Date: | Thu, 28 May 2020 | Pseudo R-squ.: | 0.01762 |
| Time: | 19:44:46 | Log-Likelihood: | -84.834 |
| converged: | True | LL-Null: | -86.356 |
| | | LLR p-value: | 0.2183 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.8972 | 0.426 | 4.454 | 0.000 | 1.062 | 2.732 |
| openness | -0.3385 | 0.373 | -0.908 | 0.364 | -1.069 | 0.392 |
| gdp_ppc_c | 2.185e-05 | 1.4e-05 | 1.564 | 0.118 | -5.53e-06 | 4.92e-05 |


The level of income also does not have a statistically significant relationship with WTO membership (p = 0.118), and openness remains non-significant (p = 0.364).

In [34]:
# Add Ease of doing business score
model = smf.logit('wto_status ~ openness + gdp_ppc_c + business_ease', data=data).fit()
model.summary()

Optimization terminated successfully.
         Current function value: 0.391374
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | wto_status | No. Observations: | 175 |
| Model: | Logit | Df Residuals: | 171 |
| Method: | MLE | Df Model: | 3 |
| Date: | Thu, 28 May 2020 | Pseudo R-squ.: | 0.08998 |
| Time: | 19:45:04 | Log-Likelihood: | -68.49 |
| converged: | True | LL-Null: | -75.263 |
| | | LLR p-value: | 0.003596 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -1.2176 | 1.304 | -0.934 | 0.350 | -3.774 | 1.338 |
| openness | -0.3234 | 0.438 | -0.739 | 0.460 | -1.181 | 0.535 |
| gdp_ppc_c | 8.913e-06 | 2.03e-05 | 0.439 | 0.660 | -3.08e-05 | 4.87e-05 |
| business_ease | 0.0553 | 0.020 | 2.746 | 0.006 | 0.016 | 0.095 |


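business_ease does have a statistically significant positive relationship with WTO membership (coefficient = 0.0553, p = 0.006). Note that the number of observations drops from 183 to 175, presumably because some countries have no ease of doing business score.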
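
A logit coefficient is easier to read as an odds ratio, obtained by exponentiating it. As a hedged illustration using the numbers from the table above (the 10-point comparison is an arbitrary choice for this example):

```python
import math

# business_ease estimate and 95% confidence limits from the model above
coef, ci_low, ci_high = 0.0553, 0.016, 0.095

odds_ratio = math.exp(coef)                     # per 1-point increase in the score
or_ci = (math.exp(ci_low), math.exp(ci_high))   # 95% CI on the odds-ratio scale
odds_ratio_10 = math.exp(10 * coef)             # per 10-point increase

print(f"OR per point: {odds_ratio:.3f} (95% CI {or_ci[0]:.3f} to {or_ci[1]:.3f})")
print(f"OR per 10 points: {odds_ratio_10:.2f}")
```

Holding openness and income constant, each additional point on the score is associated with roughly 6% higher odds of membership, and the interval excludes 1, in line with p = 0.006.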

In [35]:
# Add categorical variable: whether a country is landlocked
model = smf.logit('wto_status ~ openness + gdp_ppc_c + business_ease + landlocked', data=data).fit()
model.summary()

Optimization terminated successfully.
         Current function value: 0.389285
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | wto_status | No. Observations: | 175 |
| Model: | Logit | Df Residuals: | 170 |
| Method: | MLE | Df Model: | 4 |
| Date: | Thu, 28 May 2020 | Pseudo R-squ.: | 0.09484 |
| Time: | 19:45:26 | Log-Likelihood: | -68.125 |
| converged: | True | LL-Null: | -75.263 |
| | | LLR p-value: | 0.006467 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -1.1740 | 1.288 | -0.912 | 0.362 | -3.698 | 1.350 |
| openness | -0.3499 | 0.439 | -0.797 | 0.426 | -1.211 | 0.511 |
| gdp_ppc_c | 7.524e-06 | 1.96e-05 | 0.385 | 0.700 | -3.08e-05 | 4.58e-05 |
| business_ease | 0.0567 | 0.020 | 2.841 | 0.004 | 0.018 | 0.096 |
| landlocked | -0.4264 | 0.491 | -0.869 | 0.385 | -1.388 | 0.535 |
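
One way to judge whether `landlocked` improves the fit is a likelihood-ratio test against the previous model, which is reasonable here because both models use the same 175 observations. The sketch below is an addition to the original analysis. It uses the log-likelihoods printed in the two summaries and the fact that, for one degree of freedom, the chi-squared tail probability equals `erfc(sqrt(x / 2))`.

```python
import math

# Log-likelihoods copied from the summaries of the two nested models (both N = 175)
ll_without_landlocked = -68.49
ll_with_landlocked = -68.125

# Likelihood-ratio statistic: twice the gain in log-likelihood
lr_stat = 2 * (ll_with_landlocked - ll_without_landlocked)

# Chi-squared survival function for 1 degree of freedom (one added parameter)
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR statistic: {lr_stat:.3f}, p-value: {p_value:.3f}")
```

The resulting p-value is about 0.39, close to the Wald p-value of 0.385 in the table, so adding `landlocked` does not significantly improve the model.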


In [36]:
# Add applied MFN tariffs
model = smf.logit('wto_status ~ openness + gdp_ppc_c + business_ease + landlocked + tariff', data=data).fit()
model.summary()

Optimization terminated successfully.
         Current function value: 0.325489
         Iterations 7


| | | | |
|---|---|---|---|
| Dep. Variable: | wto_status | No. Observations: | 135 |
| Model: | Logit | Df Residuals: | 129 |
| Method: | MLE | Df Model: | 5 |
| Date: | Thu, 28 May 2020 | Pseudo R-squ.: | 0.1057 |
| Time: | 19:45:41 | Log-Likelihood: | -43.941 |
| converged: | True | LL-Null: | -49.135 |
| | | LLR p-value: | 0.06496 |

| | coef | std err | z | P>\|z\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 1.7375 | 2.117 | 0.821 | 0.412 | -2.411 | 5.886 |
| openness | 0.1791 | 0.687 | 0.260 | 0.794 | -1.168 | 1.526 |
| gdp_ppc_c | -6.732e-06 | 2.69e-05 | -0.250 | 0.803 | -5.95e-05 | 4.61e-05 |
| business_ease | 0.0245 | 0.031 | 0.794 | 0.427 | -0.036 | 0.085 |
| landlocked | -0.8753 | 0.599 | -1.462 | 0.144 | -2.048 | 0.298 |
| tariff | -0.1095 | 0.058 | -1.876 | 0.061 | -0.224 | 0.005 |


## Odds ratios

In [24]:
# Exponentiate the coefficients and their 95% confidence limits to get odds ratios
params = model.params
conf = model.conf_int()
conf['OR'] = params
conf.columns = ['Lower CI', 'Upper CI', 'OR']
print(np.exp(conf))

               Lower CI    Upper CI        OR
Intercept      0.089732  359.938515  5.683142
openness       0.310935    4.601036  1.196087
gdp_ppc_c      0.999940    1.000046  0.999993
business_ease  0.964711    1.088589  1.024780
landlocked     0.128931    1.347072  0.416749
tariff         0.799408    1.004912  0.896289


## Conclusions

The null hypothesis that 'WTO membership does not have a relationship with trade openness' cannot be rejected based on this analysis, so the results do not support the original hypothesis. The analysis also did not find a statistically significant relationship between WTO membership and either the income level of a country or whether a country is landlocked.
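
Since openness is the primary explanatory variable, it helps to line up its estimate across the five models. The snippet below simply collects the values from the summary tables above; the model labels are shorthand introduced here.

```python
# (model label, openness coefficient, openness p-value, N) copied from the summaries
openness_by_model = [
    ("openness only", -0.0804, 0.805, 183),
    ("+ gdp_ppc_c", -0.3385, 0.364, 183),
    ("+ business_ease", -0.3234, 0.460, 175),
    ("+ landlocked", -0.3499, 0.426, 175),
    ("+ tariff", 0.1791, 0.794, 135),
]

alpha = 0.05
for label, coef, p, n_obs in openness_by_model:
    verdict = "significant" if p < alpha else "not significant"
    print(f"{label:<16} coef={coef:+.4f}  p={p:.3f}  N={n_obs}  {verdict}")

# True if openness reaches significance in any of the five models
any_significant = any(p < alpha for _, _, p, _ in openness_by_model)
```

The coefficient changes sign once tariffs are added, but in no model is it distinguishable from zero at the 0.05 level.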

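The relationship between WTO membership and 'Ease of doing business' was significant (p = 0.004 in the model without tariffs), but the significance disappeared once tariffs were added (p = 0.427). This suggests that tariffs confound the relationship between WTO membership and 'Ease of doing business'. Note, however, that the sample also shrinks from 175 to 135 countries when tariff is included, so part of the change may reflect the smaller sample rather than confounding alone.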

Tariffs do not have a statistically significant relationship with WTO membership at alpha = 0.05. However, tariff has the smallest p-value in the final model (p = 0.061). If we raised alpha to 0.1 (for the sake of practice), we could conclude that the relationship is significant. The relationship is negative (the coefficient is negative and the odds ratio is less than one): the lower the applied most-favoured-nation (MFN) tariff, the more likely a country is to be a WTO member, which is understandable since WTO membership often requires negotiating tariff reductions. The odds ratio is 0.90 with a 95% confidence interval of 0.80 to 1.005: for each one-unit increase in the tariff, the odds of WTO membership are multiplied by a factor between 0.80 and 1.005, holding the other variables constant. Because the interval includes 1, the effect is not significant at the 0.05 level.
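
The odds ratios printed earlier are per one-unit increase in each predictor. To express the tariff effect per one-unit decrease instead, the ratio and its confidence limits can be inverted, which swaps the lower and upper limits. A small sketch using the printed values:

```python
# Tariff odds ratio and 95% confidence limits, copied from the odds-ratio output
or_tariff, lower, upper = 0.896289, 0.799408, 1.004912

# Per one-unit decrease: take reciprocals; the limits swap places
or_decrease = 1 / or_tariff
ci_decrease = (1 / upper, 1 / lower)

print(f"OR per unit decrease: {or_decrease:.3f}")
print(f"95% CI: {ci_decrease[0]:.3f} to {ci_decrease[1]:.3f}")
```

This gives an odds ratio of about 1.12 per unit decrease, with an interval from just under 1 to about 1.25. The interval still includes 1, so the conclusion at the 0.05 level is unchanged.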