<a href="https://colab.research.google.com/github/shahiryar/conflict-modelling/blob/main/Conflict_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
## Problem statement
The objective of this project is to develop a predictive model using Support Vector Machines (SVM) to determine the likelihood of conflicts or civil unrest in a specific region or country. By analyzing historical data and various socio-economic, political, and demographic factors, the model aims to provide early warning indicators and insights that can help policymakers and organizations proactively address potential conflicts.



# Data Needs

## Data Needed

The model needs the following indicators, grouped by category:
> Socio-Economic Indicators
* Year
* GDP
* Gini coefficient
* Literacy rate
* Health indicators
* Infrastructure development
* Employment level
* Labor force as a percentage of total population
* Dependency ratio
* Income level
* Per capita income
* Unemployment rate

> Political Indicators
* Worldwide Governance Indicators
* Democracy Index
* Political Instability Task Force

> Demographic
* Ethnic/religious makeup
* Population density
* Urbanisation rate
* Size of middle class
* Migration patterns
* Age structure (middle quantile)

> Environmental
* Per capita water availability
* Deforestation rate
* Climate change impact
* Natural disasters

> Conflict Related
* Type of conflict/unrest
* Casualties
* Number of conflicts (in the year under review)

[Link](https://colab.research.google.com/drive/1bO-ccukQ-ZdE8mIQwBhHYU3ysM_sjC3b#scrollTo=l48gd4osbfAP)

## Code

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [1]:
import pandas as pd

In [48]:
df = pd.read_csv('/content/drive/MyDrive/Datasets/20year-world-soci-politico-economic.csv', na_values='..')

In [49]:
df.head()

Unnamed: 0,Time,Time Code,Country Name,Country Code,Access to clean fuels and technologies for cooking (% of population) [EG.CFT.ACCS.ZS],"Access to clean fuels and technologies for cooking, rural (% of rural population) [EG.CFT.ACCS.RU.ZS]","Access to clean fuels and technologies for cooking, urban (% of urban population) [EG.CFT.ACCS.UR.ZS]",Access to electricity (% of population) [EG.ELC.ACCS.ZS],"Access to electricity, rural (% of rural population) [EG.ELC.ACCS.RU.ZS]","Access to electricity, urban (% of urban population) [EG.ELC.ACCS.UR.ZS]",...,Military expenditure (% of GDP) [MS.MIL.XPND.GD.ZS],Political Stability and Absence of Violence/Terrorism: Estimate [PV.EST],Rule of Law: Estimate [RL.EST],Voice and Accountability: Estimate [VA.EST],Adequacy of social insurance programs (% of total welfare of beneficiary households) [per_si_allsi.adq_pop_tot],Coverage of social insurance programs (% of population) [per_si_allsi.cov_pop_tot],Coverage of social safety net programs in poorest quintile (% of population) [per_sa_allsa.cov_q1_tot],International migrant stock (% of population) [SM.POP.TOTL.ZS],"Unemployment, youth total (% of total labor force ages 15-24) (modeled ILO estimate) [SL.UEM.1524.ZS]","Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]"
0,2022,YR2022,Afghanistan,AFG,,,,,,,...,,,,,,,,,,
1,2022,YR2022,Bangladesh,BGD,,,,,,,...,,,,,,,,,12.928,4.699
2,2022,YR2022,Bhutan,BTN,,,,,,,...,,,,,,,,,14.67,3.604
3,2022,YR2022,India,IND,,,,,,,...,,,,,,,,,23.224,7.33
4,2022,YR2022,Maldives,MDV,,,,,,,...,,,,,,,,,15.126,4.883


## Data Cleaning

In [4]:
df.describe()

Unnamed: 0,Access to clean fuels and technologies for cooking (% of population) [EG.CFT.ACCS.ZS],"Access to clean fuels and technologies for cooking, rural (% of rural population) [EG.CFT.ACCS.RU.ZS]","Access to clean fuels and technologies for cooking, urban (% of urban population) [EG.CFT.ACCS.UR.ZS]",Access to electricity (% of population) [EG.ELC.ACCS.ZS],"Access to electricity, rural (% of rural population) [EG.ELC.ACCS.RU.ZS]","Access to electricity, urban (% of urban population) [EG.ELC.ACCS.UR.ZS]",Account ownership at a financial institution or with a mobile-money-service provider (% of population ages 15+) [FX.OWN.TOTL.ZS],GDP per capita (constant LCU) [NY.GDP.PCAP.KN],Gross capital formation (constant LCU) [NE.GDI.TOTL.KN],Trade (% of GDP) [NE.TRD.GNFS.ZS],...,Military expenditure (% of GDP) [MS.MIL.XPND.GD.ZS],Political Stability and Absence of Violence/Terrorism: Estimate [PV.EST],Rule of Law: Estimate [RL.EST],Voice and Accountability: Estimate [VA.EST],Adequacy of social insurance programs (% of total welfare of beneficiary households) [per_si_allsi.adq_pop_tot],Coverage of social insurance programs (% of population) [per_si_allsi.cov_pop_tot],Coverage of social safety net programs in poorest quintile (% of population) [per_sa_allsa.cov_q1_tot],International migrant stock (% of population) [SM.POP.TOTL.ZS],"Unemployment, youth total (% of total labor force ages 15-24) (modeled ILO estimate) [SL.UEM.1524.ZS]","Unemployment, total (% of total labor force) (modeled ILO estimate) [SL.UEM.TOTL.ZS]"
count,4280.0,4266.0,4266.0,4729.0,4544.0,4676.0,653.0,3895.0,2897.0,4360.0,...,3701.0,3869.0,3861.0,3853.0,394.0,411.0,428.0,784.0,4692.0,4692.0
mean,62.561839,53.009329,72.741742,80.79798,75.377445,90.595375,57.602542,2060684.0,48208920000000.0,85.742416,...,1.970373,-0.027057,-0.027847,-0.022646,36.22576,20.665637,52.998388,10.218443,16.874527,7.732645
std,37.92359,41.188944,34.310303,28.07122,33.601625,17.446701,29.153401,12424950.0,482974800000000.0,55.922315,...,1.438138,0.999585,0.995386,0.99881,42.421437,18.227067,28.794172,15.260712,11.237297,5.356714
min,0.0,0.0,0.0,0.643132,0.522863,3.429757,0.4,360.8784,-8634937000000000.0,0.756876,...,0.0054,-3.312951,-2.590877,-2.313395,0.242614,0.370971,0.0,0.052003,0.304,0.095
25%,23.075,6.78896,46.613441,66.170357,52.408893,90.090456,33.39,16058.09,13945150000.0,53.211282,...,1.143856,-0.667904,-0.791663,-0.857851,25.008318,5.402161,28.94154,1.373828,9.174912,4.17425
50%,78.85,54.65,93.0,98.091935,96.72004,99.595647,55.35,49922.33,170235200000.0,72.435251,...,1.61547,0.06997,-0.181318,0.024783,33.241091,11.515865,53.903945,3.663202,14.34452,6.32
75%,99.9,99.9,100.0,100.0,100.0,100.0,85.38,340426.6,1397067000000.0,102.38377,...,2.380104,0.839443,0.764781,0.876453,42.483496,36.338993,79.367049,11.739982,22.256,9.85425
max,100.0,100.0,100.0,100.0,100.0,100.0,100.0,176817800.0,6152640000000000.0,863.195099,...,20.865745,1.965062,2.124782,1.800992,811.547901,59.520414,99.858678,88.404048,78.778,37.32


In [16]:
df_cf = pd.read_csv('/content/drive/MyDrive/Datasets/ucdp-nonstate-221.csv')

In [20]:
df_cf.columns

Index(['conflict_id', 'dyad_id', 'org', 'side_a_name', 'side_a_name_fulltext',
       'side_a_name_mothertongue', 'side_a_id', 'side_a_components',
       'side_a_2nd', 'gwno_a_2nd', 'side_b_name', 'side_b_name_fulltext',
       'side_b_name_mothertongue', 'side_b_id', 'side_b_components',
       'side_b_2nd', 'gwno_b_2nd', 'start_date', 'start_prec', 'start_date2',
       'start_prec2', 'ep_end', 'ep_end_date', 'ep_end_prec', 'year',
       'best_fatality_estimate', 'low_fatality_estimate',
       'high_fatality_estimate', 'location', 'gwno_location', 'region',
       'version'],
      dtype='object')

In [25]:
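# Keep only the needed columns; .copy() avoids a SettingWithCopyWarning on the assignment below
features = ['year', 'location', 'best_fatality_estimate']
df_cf = df_cf[features].copy()
# A location may list several countries separated by commas
df_cf['location'] = df_cf.location.str.split(',')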

In [29]:
df_cf = df_cf.explode('location').reset_index(drop=True)
df_cf.head()

Unnamed: 0,year,location,best_fatality_estimate
0,2021,Nigeria,41
1,2013,Guinea,98
2,2021,Sudan,412
3,2005,Sudan,130
4,2017,Sudan,41


In [30]:
df_cf['location'] = df_cf.location.str.strip()
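
The three steps applied to `location` (split on commas, explode to one row per country, strip whitespace) can be seen end to end on a toy frame. This is a minimal sketch that assumes multi-country events store their locations as a single comma-separated string, which is what the split above implies; the rows are illustrative only.

```python
import pandas as pd

# Toy rows in the shape of the conflict table; values are illustrative only.
toy = pd.DataFrame({
    "year": [2005, 2013],
    "location": ["Sudan, Chad", "Guinea"],
    "best_fatality_estimate": [130, 98],
})

# Split on commas, give each country its own row, then trim stray spaces.
toy["location"] = toy["location"].str.split(",")
toy = toy.explode("location").reset_index(drop=True)
toy["location"] = toy["location"].str.strip()

print(toy)
```

Note that `explode` copies the fatality estimate to every country in a multi-country event, so summing the column across countries would count those fatalities more than once.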

In [64]:
# Rename 'location' to 'country' so it matches the merge key used later
features = ['year', 'country', 'best_fatality_estimate']
df_cf.columns = features
df_cf.head()

Unnamed: 0,year,country,best_fatality_estimate
1272,1989,Mauritania,221
958,1989,Afghanistan,167
1080,1989,Lebanon,66
1273,1989,Senegal,221
70,1989,Sri Lanka,25


In [36]:
df_cf.sort_values('year', inplace=True)

In [54]:
print(len(df.columns))
features = ['year', 'year_code', 'country', 'country_code', 'clean_fuel_access', 'clean_fuel_access_rural', 
            'clean_fuel_access_urban', 'electricity', 'electricity_rural', 'electricity_urban',
            'financial_acc_ownership', 'gdp_pc', 'gcf', 'trade', 'houshold_consumption', 'primary_intake',
            'out_of_school', 'compulsory_edu', 'education_bs_eq', 'literacy_rate_youth', 'literacy_rate_female',
            'literacy_rate_male', 'literacy_rate_total', 'co2_em', 'water_stress', 'pm2.5_above_who', 'pm2.5_expo',
            'pop_slums', 'renewable_water_src', 'pop_rural', 'pop_urban', 'urban_growth', 'forest_cov', 'women_seats', 
            'women_violence', 'women_indep', 'women_life_dec', 'women_subj', 'fertility', 'dependency_ratio', 
            'dependency_ratio_old','dependency_ratio_young', 'iodine_consumption', 'contraceptive_prev',
            'female_headed_house', 'exc_breastfeeding','pop_growth', 'food_insec_mod', 'food_insec_sever',
            'undernourishment', 'broadband_sub', 'gini_index', 'income_top20pc', 'income_low20pc', 
            'internet_pen', 'mdp_headcount', 'mpi_0-1','povertry_ratio_$2.15', 'povertry_ratio_$3.65',
            'povertry_ratio_$6.85', 'rail_total', 'r&d',
            'armed_forces', 'corruption', 'gov_effectiveness', 'homocides', 'idp_conflict', 'idp_disaster',
            'idp_conflict_violence_total', 'military_expenditure_gov', 'military_expenditure_gdp',
            'political_stability', 'rule_of_law', 'voice_accountibility', 'social_insurance_adq', 'social_insurance_cov',
            'social_safety_cov', 'intl_migrant', 'umemployment_youth', 'umemployment_total'
            ]
len(features)

80


80

In [55]:
# Map each short feature name to its original column header
dict_col_name = dict(zip(features, df.columns))

In [56]:
df.columns = features
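
Assigning a list to `df.columns` renames by position, so it only stays correct while the list has the same length and order as the original headers. A small sketch of the same pattern with a length check is shown below; the toy frame and the name `short_to_original` are hypothetical and stand in for `df` and `dict_col_name`.

```python
import pandas as pd

# Toy frame with two verbose headers in the style of the indicator file.
toy = pd.DataFrame({
    "Time": [2022],
    "GDP per capita (constant LCU) [NY.GDP.PCAP.KN]": [1.0],
})
short_names = ["year", "gdp_pc"]

# Positional renaming mislabels data if the lists differ in length,
# so verify the length before assigning.
assert len(short_names) == len(toy.columns)

# Keep a lookup from short name back to the original header.
short_to_original = dict(zip(short_names, toy.columns))
toy.columns = short_names

print(short_to_original["gdp_pc"])
```

The lookup makes it possible to recover the full indicator description and code for any short name later, for example when labelling plots.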

In [61]:
df.drop('year_code', axis=1, inplace=True)

In [62]:
df.head()

Unnamed: 0,year,country,country_code,clean_fuel_access,clean_fuel_access_rural,clean_fuel_access_urban,electricity,electricity_rural,electricity_urban,financial_acc_ownership,...,military_expenditure_gdp,political_stability,rule_of_law,voice_accountibility,social_insurance_adq,social_insurance_cov,social_safety_cov,intl_migrant,umemployment_youth,umemployment_total
0,2022,Afghanistan,AFG,,,,,,,,...,,,,,,,,,,
1,2022,Bangladesh,BGD,,,,,,,,...,,,,,,,,,12.928,4.699
2,2022,Bhutan,BTN,,,,,,,,...,,,,,,,,,14.67,3.604
3,2022,India,IND,,,,,,,,...,,,,,,,,,23.224,7.33
4,2022,Maldives,MDV,,,,,,,,...,,,,,,,,,15.126,4.883


In [69]:
df.dropna(subset=['country', 'year'], inplace=True)

In [80]:
df.year = df.year.astype('int64')

In [83]:
dataset_conflict_socio_pol = df.merge(df_cf, how='inner', on=['country', 'year'])

In [84]:
dataset_conflict_socio_pol.to_csv('/content/drive/MyDrive/Datasets/conflictFatality_socioPolitical.csv', index=False)

In [85]:
dataset_conflict_socio_pol.head()

Unnamed: 0,year,country,country_code,clean_fuel_access,clean_fuel_access_rural,clean_fuel_access_urban,electricity,electricity_rural,electricity_urban,financial_acc_ownership,...,political_stability,rule_of_law,voice_accountibility,social_insurance_adq,social_insurance_cov,social_safety_cov,intl_migrant,umemployment_youth,umemployment_total,best_fatality_estimate
0,2021,Bolivia,BOL,,,,,,,68.89,...,-0.318567,-1.163193,-0.109921,,,,,8.451,5.09,43
1,2021,Brazil,BRA,,,,,,,84.04,...,-0.485007,-0.280397,0.278197,,,,,28.496,13.34,25
2,2021,Brazil,BRA,,,,,,,84.04,...,-0.485007,-0.280397,0.278197,,,,,28.496,13.34,38
3,2021,Brazil,BRA,,,,,,,84.04,...,-0.485007,-0.280397,0.278197,,,,,28.496,13.34,25
4,2021,Brazil,BRA,,,,,,,84.04,...,-0.485007,-0.280397,0.278197,,,,,28.496,13.34,80
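
In the output above, Brazil 2021 appears four times with identical indicator values and different fatality counts, because the conflict table has one row per event and the inner merge repeats the country-year indicators for each of them. If one row per country-year is wanted instead, the events could be aggregated before merging. The sketch below shows one way to do that on toy data; the column names `total_fatalities` and `n_conflicts` are hypothetical, and whether event-level or country-year-level rows are the right unit depends on the modelling target.

```python
import pandas as pd

# Toy stand-ins for df_cf and df; values are illustrative only.
events = pd.DataFrame({
    "year": [2021, 2021, 2021, 2021],
    "country": ["Brazil", "Brazil", "Brazil", "Bolivia"],
    "best_fatality_estimate": [25, 38, 80, 43],
})
indicators = pd.DataFrame({
    "year": [2021, 2021],
    "country": ["Brazil", "Bolivia"],
    "umemployment_total": [13.34, 5.09],
})

# Collapse events to one row per country-year: summed fatalities and event count.
per_year = events.groupby(["country", "year"], as_index=False).agg(
    total_fatalities=("best_fatality_estimate", "sum"),
    n_conflicts=("best_fatality_estimate", "count"),
)

# validate='one_to_one' raises an error if either side has duplicate keys.
merged = indicators.merge(per_year, how="inner", on=["country", "year"],
                          validate="one_to_one")
print(merged)
```

The event count produced here corresponds to the "number of conflicts (in the year under review)" item in the data needs list.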


# Data Preprocessing

## Handling Missing Values

In [87]:
dataset_conflict_socio_pol.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 820 entries, 0 to 819
Data columns (total 80 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   year                         820 non-null    int64  
 1   country                      820 non-null    object 
 2   country_code                 820 non-null    object 
 3   clean_fuel_access            698 non-null    float64
 4   clean_fuel_access_rural      698 non-null    float64
 5   clean_fuel_access_urban      698 non-null    float64
 6   electricity                  749 non-null    float64
 7   electricity_rural            649 non-null    float64
 8   electricity_urban            712 non-null    float64
 9   financial_acc_ownership      164 non-null    float64
 10  gdp_pc                       736 non-null    float64
 11  gcf                          636 non-null    float64
 12  trade                        688 non-null    float64
 13  houshold_consumption

In [93]:
# Drop columns with fewer than 20% non-null values (thresh = minimum non-null count required to keep a column)
drop_thresh = int(dataset_conflict_socio_pol.shape[0]*0.2)
dataset_conflict_socio_pol.dropna(axis=1, thresh = drop_thresh, inplace=True)
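
The `thresh` argument of `dropna` is the minimum number of non-null values a column needs in order to be kept, not the amount of missing data allowed. With `thresh` set to 20% of the row count, a column survives as long as at least 20% of its values are present. This is why `financial_acc_ownership`, with 164 non-null values out of 820 rows in the `info()` listings in this section, is retained. The toy example below illustrates the semantics and, as an alternative, a stricter threshold that drops columns with more than 20% missing; the frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with 10 rows: 'sparse' has 1 value, 'edge' has 2, 'dense' has 9.
toy = pd.DataFrame({
    "sparse": [1.0] + [np.nan] * 9,
    "edge": [1.0, 2.0] + [np.nan] * 8,
    "dense": [1.0] * 9 + [np.nan],
})

# Same rule as the notebook: keep columns with at least 20% non-null values.
min_non_null = int(toy.shape[0] * 0.2)  # 2 of 10 rows
kept = toy.dropna(axis=1, thresh=min_non_null)

# Stricter alternative: keep only columns with at least 80% non-null values.
strict = toy.dropna(axis=1, thresh=int(toy.shape[0] * 0.8))

print(list(kept.columns))
print(list(strict.columns))
```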

In [95]:
dataset_conflict_socio_pol.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 820 entries, 0 to 819
Data columns (total 65 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   year                         820 non-null    int64  
 1   country                      820 non-null    object 
 2   country_code                 820 non-null    object 
 3   clean_fuel_access            698 non-null    float64
 4   clean_fuel_access_rural      698 non-null    float64
 5   clean_fuel_access_urban      698 non-null    float64
 6   electricity                  749 non-null    float64
 7   electricity_rural            649 non-null    float64
 8   electricity_urban            712 non-null    float64
 9   financial_acc_ownership      164 non-null    float64
 10  gdp_pc                       736 non-null    float64
 11  gcf                          636 non-null    float64
 12  trade                        688 non-null    float64
 13  houshold_consumption

In [115]:
# Use a KNN imputer to fill missing values; add_indicator=True also appends one missing-value flag column per imputed feature
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer

columns_to_impute = dataset_conflict_socio_pol.columns[dataset_conflict_socio_pol.isnull().any()].tolist()
ct = ColumnTransformer([('imputer', KNNImputer(n_neighbors=5, add_indicator=True), columns_to_impute)], remainder='passthrough')
transfomer_imputer_cf_data = ct.fit(dataset_conflict_socio_pol)

In [117]:
transfomer_imputer_cf_data.get_feature_names_out().shape

(112,)