#### Overview: A Regularized Regression Approach to Predicting Democracy Index
**Summary**: This Python assignment explores the relationship between income per capita, democracy, and other demographic features across 195 countries from 1960 to 2000. The task involves employing regularized regression techniques—Ridge, Lasso, Adaptive Lasso, and Elastic Net—to predict the democracy index based on the given features. The notebook concludes with a comparison of the models to select the best predictor of the democracy index, based on their performance.

We analyze the impact of income per capita and demographic factors on democracy levels across 195 countries from 1960 to 2000.

**Data Preparation**
Split: 80% training, 20% testing; exclude missing values.

**Models & Evaluation**
Utilize and evaluate the following models:

- Ridge Regression
- Lasso Regression
- Adaptive Lasso Regression
- Elastic Net Regression

Tasks:

- Train on the training set.
- Optimize parameters via cross-validation.
- Report coefficients and MSPE on the test set.
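
For reference, the mean squared prediction error (MSPE) on the test set can be written as follows. The symbols $n_{\text{test}}$ (number of test rows), $y_i$ (observed democracy index), and $\hat{y}_i$ (model prediction) are notation introduced here for illustration. This is the quantity that `mean_squared_error` computes when it is given the test targets and the test predictions.

$$
\mathrm{MSPE} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} \left( y_i - \hat{y}_i \right)^2
$$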

**Conclusion**
Determine the best model for predicting the democracy index based on MSPE and model suitability.

| Variable Name | Description |
|----------|----------|
| country    | country name   |
| year    | year   |
| dem_ind    | index of democracy   |
| log_gdppc    | logarithm of real GDP per capita   |
| log_pop    | logarithm of population   |
| age_1    | fraction of the population age 0-14   |
| age_2    | fraction of the population age 15-29   |
| age_3    | fraction of the population age 30-44   |
| age_4    | fraction of the population age 45-59   |
| age_5    | fraction of the population age 60+   |
| educ    | average years of education for adults (25 years and older)   |
| age_median    | median age   |
| code    | country code   |

In [10]:
import pandas as pd
import numpy as np

df = pd.read_csv("income_democracy.csv")
df.head(10)

Unnamed: 0,country,year,dem_ind,log_gdppc,log_pop,age_1,age_2,age_3,age_4,age_5,educ,age_median,code
0,Andorra,1960,,,,,,,,,,,1
1,Andorra,1965,,,,,,,,,,,1
2,Andorra,1970,0.5,,,,,,,,,,1
3,Andorra,1975,,,,,,,,,,,1
4,Andorra,1980,,,,,,,,,,,1
5,Andorra,1985,,,,,,,,,,,1
6,Andorra,1990,,,,,,,,,,,1
7,Andorra,1995,1.0,,,,,,,,,,1
8,Andorra,2000,1.0,,,,,,,,,,1
9,Afghanistan,1960,0.14,,,0.427272,0.265505,0.165539,0.096638,0.045046,,18.6,2


In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.metrics import mean_squared_error

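# Drop rows with any missing value, then drop the non-numeric country name
df = df.dropna()
df = df.drop(columns=['country'])

# Standardize the data. Note that this scales every column, including the
# target dem_ind, and fits the scaler on all rows before the split, so the
# MSPE values reported later are in standardized units of dem_ind.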
scaler = StandardScaler()
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns = df.columns)

# Split data into training and testing sets
train, test = train_test_split(df_scaled, test_size=0.2, random_state=42)

# Create the feature matrices and target vectors for training and testing sets
X_train = train.drop('dem_ind', axis=1)
y_train = train['dem_ind']
X_test = test.drop('dem_ind', axis=1)
y_test = test['dem_ind']
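
A common alternative is to split first and fit the scaler on the training rows only, so that no information from the test set enters the preprocessing. The sketch below illustrates the idea on synthetic data. The arrays `X` and `y` and the names `X_tr_s` and `X_te_s` are hypothetical stand-ins, not the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned data (hypothetical values, not the real dataset)
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.0]) + rng.normal(scale=0.1, size=200)

# Split first, so the test rows play no part in estimating means and variances
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training rows only, then apply it to both splits
x_scaler = StandardScaler().fit(X_tr)
X_tr_s = x_scaler.transform(X_tr)
X_te_s = x_scaler.transform(X_te)

print(X_tr_s.mean(axis=0).round(6), X_te_s.mean(axis=0).round(3))
```

With this ordering the training columns have mean zero by construction, while the test columns are only approximately centered. Leaving the target unscaled would also keep the MSPE in the original units of the democracy index.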

### 1. Ridge Regression

In [15]:
from sklearn.linear_model import RidgeCV

# Define the grid of lambda (alpha) values to search over
# You might want to choose a more refined grid
alphas = np.logspace(-4, 4, 100)

# Fit Ridge Regression model with cross-validation to find the optimal alpha
ridge = RidgeCV(alphas=alphas, cv=5)
ridge.fit(X_train, y_train)

# Optimal tuning parameter (lambda) obtained using cross-validation
ridge_alpha_optimal = ridge.alpha_
print("Optimal tuning parameter from Ridge Regression: ", ridge_alpha_optimal)

# Coefficients obtained with optimal lambda
ridge_coefficients = ridge.coef_
print("Ridge coefficients: ", ridge_coefficients)

# Mean Squared Prediction Error for the test set
ridge_predictions = ridge.predict(X_test)
ridge_mse = mean_squared_error(y_test, ridge_predictions)
print("Mean Squared Prediction Error for Ridge Regression: ", ridge_mse)


Optimal tuning parameter from Ridge Regression:  54.62277217684348
Ridge coefficients:  [-0.00858558  0.27852792 -0.03212271 -0.06159966  0.0360874  -0.03687195
  0.03623417  0.09116055  0.22497409  0.07506949 -0.04037391]
Mean Squared Prediction Error for Ridge Regression:  0.4753448657852806


### 2. Lasso Regression

In [16]:
from sklearn.linear_model import LassoCV

# Define the grid of lambda (alpha) values to search over
# You might want to choose a more refined grid
alphas = np.logspace(-4, 4, 100)

# Fit Lasso Regression model with cross-validation to find the optimal alpha
lasso = LassoCV(alphas=alphas, cv=5)
lasso.fit(X_train, y_train)

# Optimal tuning parameter (lambda) obtained using cross-validation
lasso_alpha_optimal = lasso.alpha_
print("Optimal tuning parameter from Lasso Regression: ", lasso_alpha_optimal)

# Coefficients obtained with optimal lambda
lasso_coefficients = lasso.coef_
print("Lasso coefficients: ", lasso_coefficients)

# Mean Squared Prediction Error for the test set
lasso_predictions = lasso.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_predictions)
print("Mean Squared Prediction Error for Lasso Regression: ", lasso_mse)


Optimal tuning parameter from Lasso Regression:  0.03853528593710531
Lasso coefficients:  [-0.          0.33507084 -0.         -0.00984714 -0.          0.
  0.          0.12358628  0.23063758  0.         -0.01392352]
Mean Squared Prediction Error for Lasso Regression:  0.4795620994064463


### 3. Adaptive Lasso Regression

In [18]:
from sklearn.linear_model import LinearRegression, LassoCV

# Step 1: Run an initial OLS regression
ols = LinearRegression()
ols.fit(X_train, y_train)

# Step 2: Use inverse of absolute value of coefficient estimates as weights
weights = np.abs(1 / ols.coef_)

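# Step 3: Run a Lasso regression with weights
# Note: multiplying each column by 1/|beta_ols| penalizes features with large
# OLS coefficients more heavily; the standard adaptive lasso rescales the
# other way (X_train / weights). This fit also omits the intercept.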
alphas = np.logspace(-4, 4, 100)
adaptive_lasso = LassoCV(alphas=alphas, fit_intercept=False, cv=5)
adaptive_lasso.fit(X_train * weights, y_train)

# Optimal tuning parameter (lambda) obtained using cross-validation
adaptive_lasso_alpha_optimal = adaptive_lasso.alpha_
print("Optimal tuning parameter from Adaptive Lasso Regression: ", adaptive_lasso_alpha_optimal)

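# Coefficients obtained with optimal lambda
# Caution: the model was fit on X_train * weights, so coefficients on the
# original feature scale would be coef_ * weights; the values printed by the
# division below are therefore not on that scale.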
adaptive_lasso_coefficients = adaptive_lasso.coef_ / weights
print("Adaptive Lasso coefficients: ", adaptive_lasso_coefficients)

# Mean Squared Prediction Error for the test set
adaptive_lasso_predictions = adaptive_lasso.predict(X_test * weights)
adaptive_lasso_mse = mean_squared_error(y_test, adaptive_lasso_predictions)
print("Mean Squared Prediction Error for Adaptive Lasso Regression: ", adaptive_lasso_mse)

Optimal tuning parameter from Adaptive Lasso Regression:  0.022051307399030457
Adaptive Lasso coefficients:  [-8.84569645e-06  3.98006982e-02 -3.18465735e-05 -0.00000000e+00
  0.00000000e+00 -0.00000000e+00  0.00000000e+00  0.00000000e+00
  1.63550711e-02  3.01256798e-02 -1.01433183e-04]
Mean Squared Prediction Error for Adaptive Lasso Regression:  0.4636908277091001
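
For comparison, here is a minimal sketch of the usual adaptive lasso reparameterization on synthetic data. It assumes penalty weights of the form 1/|OLS coefficient|. Each column is divided by its weight before an ordinary Lasso fit, and the fitted coefficients are divided by the same weights to return to the original scale. The data and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV

# Synthetic data with a sparse true coefficient vector (hypothetical)
rng = np.random.default_rng(0)
n, p = 300, 6
X = rng.normal(size=(n, p))
true_beta = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.8])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

# Step 1: initial OLS estimates
beta_ols = LinearRegression().fit(X, y).coef_

# Step 2: penalty weights, large where the OLS estimate is small
w = 1.0 / np.abs(beta_ols)

# Step 3: ordinary Lasso on rescaled columns X_j / w_j
X_scaled = X / w
fit = LassoCV(cv=5, random_state=0).fit(X_scaled, y)

# Step 4: map the coefficients back to the original feature scale
beta_adaptive = fit.coef_ / w

# Predictions agree whether computed from the rescaled model or from beta_adaptive
pred_model = fit.predict(X_scaled)
pred_direct = X @ beta_adaptive + fit.intercept_

print("OLS:           ", beta_ols.round(3))
print("Adaptive lasso:", beta_adaptive.round(3))
```

The consistency check at the end is a quick way to confirm that the back-transformed coefficients describe the same model that produces the predictions.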


### 4. Elastic Net Regression

In [19]:
from sklearn.linear_model import ElasticNetCV

# Define the grid of alpha (penalty strength) values and l1_ratio (mixing parameter) values to search over
alphas = np.logspace(-4, 4, 100)
l1_ratios = np.linspace(0.1, 1, 10)  # l1_ratio=1 corresponds to Lasso

# Fit Elastic Net model with cross-validation to find the optimal alpha and l1_ratio
elastic_net = ElasticNetCV(alphas=alphas, l1_ratio=l1_ratios, cv=5)
elastic_net.fit(X_train, y_train)

# Optimal tuning parameters obtained using cross-validation
elastic_net_alpha_optimal = elastic_net.alpha_
elastic_net_l1_ratio_optimal = elastic_net.l1_ratio_
print("Optimal alpha from Elastic Net Regression: ", elastic_net_alpha_optimal)
print("Optimal l1_ratio from Elastic Net Regression: ", elastic_net_l1_ratio_optimal)

# Coefficients obtained with optimal parameters
elastic_net_coefficients = elastic_net.coef_
print("Elastic Net coefficients: ", elastic_net_coefficients)

# Mean Squared Prediction Error for the test set
elastic_net_predictions = elastic_net.predict(X_test)
elastic_net_mse = mean_squared_error(y_test, elastic_net_predictions)
print("Mean Squared Prediction Error for Elastic Net Regression: ", elastic_net_mse)

Optimal alpha from Elastic Net Regression:  0.08111308307896872
Optimal l1_ratio from Elastic Net Regression:  0.30000000000000004
Elastic Net coefficients:  [-0.          0.29905701 -0.00468801 -0.04430731  0.          0.
  0.          0.09427257  0.22442681  0.03780381 -0.02477155]
Mean Squared Prediction Error for Elastic Net Regression:  0.4827581412578443


### Conclusion
-------------------------
The four models perform similarly in terms of mean squared prediction error on the test set: about 0.464 for Adaptive Lasso, 0.475 for Ridge, 0.480 for Lasso, and 0.483 for Elastic Net. By this metric the Adaptive Lasso model has the best performance on the test set, so I would recommend using it, although its margin over the other models is small.
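
The comparison can be laid out as a small sorted table. The sketch below copies the four test-set MSPE values printed in the outputs above. The names `mspe`, `best_model`, and `spread` are introduced here for illustration.

```python
import pandas as pd

# Test-set MSPE values as printed in the model outputs above
mspe = pd.Series(
    {
        "Ridge": 0.4753448657852806,
        "Lasso": 0.4795620994064463,
        "Adaptive Lasso": 0.4636908277091001,
        "Elastic Net": 0.4827581412578443,
    },
    name="test MSPE",
).sort_values()

# Model with the lowest MSPE, and the gap between the best and worst models
best_model = mspe.index[0]
spread = mspe.max() - mspe.min()

print(mspe)
print("Best model:", best_model)
print("Spread between best and worst:", round(spread, 4))
```

The spread between the best and worst models is under 0.02, so the ranking comes from a single train/test split and could change with a different `random_state`.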