# Armed conflict events in Latin America, the Middle East and Africa

by: Sara Mendoza

Data Analytics - Ironhack Amsterdam / cohort Jan - June 2020

Project 3 - April 2020

## 1 Introduction

In this project, I analyze and visualize armed conflict events from 2019 in three regions: Latin America, the Middle East and Africa.

The data on Armed Conflict can be found here: https://acleddata.com/curated-data-files/

The data on country demographics can be found here: https://www.kaggle.com/sudalairajkumar/undata-country-profiles/data

The files used have been stored in Google Drive and can be downloaded from here: https://drive.google.com/drive/folders/1cMNgPfnamJHK7AOCYj6X9xdhZktulOuM


### Target

I chose to focus on armed conflict, as it's a subject that interests me and that I have read a lot about, but never analyzed. The purpose of this project is to identify the most conflict-ridden countries and to try to understand, from their demographic information, whether other elements set them apart from other countries in the same region.

I'll be using a great data source: ACLED (Armed Conflict Location & Event Data). This NGO documents armed conflict events and provides very clear and clean information for each event.

I decided to download data on Africa, Latin America and the Middle East as they are the most conflict-ridden regions.

To better understand these regions and their profiles, I also downloaded a dataset with some demographic information.

## 2 Importing libraries and reading the files

In [None]:
#importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import chart_studio.plotly as py
import cufflinks as cf
from ipywidgets import interact

cf.go_offline()

In [None]:
# reading the data files:

africa = pd.read_excel('../data/Africa_1997-2020_Mar21.xlsx')
latinamerica = pd.read_excel('../data/LatinAmerica_2018-2020_Mar21.xlsx')
middleeast = pd.read_excel('../data/MiddleEast_2015-2020_Mar21-1.xlsx')
demographics = pd.read_excel('../data/country_profile_variables.xlsx')

#testing they are all working
# africa.head()
# latinamerica.head()
# middleeast.head()
# demographics.head()

# demographics.dtypes

#checking for nan values
#null_cols = demographics.isna().sum()
#null_cols[null_cols > 0]

#there are a lot of missing values, but as I don't know yet which countries I'll be using, I'll leave them for now

## 3 Selecting relevant data and merging files

The imported files are very large, and as ACLED only started documenting events in Latin America in mid-2018, I decided to select all events from 2019 for all regions to have a balanced picture of what happened in the space of one year.

In [None]:
# select events only from 2019 (.copy() gives independent frames, so adding a column below is safe)
year_africa = africa.loc[africa.YEAR == 2019].copy()
year_latinamerica = latinamerica.loc[latinamerica.YEAR == 2019].copy()
year_middleeast = middleeast.loc[middleeast.YEAR == 2019].copy()

# adding column to signify continent
year_africa['continent'] = 'Africa'
year_latinamerica['continent'] = 'LatinAmerica'
year_middleeast['continent'] = 'MiddleEast'

#concatenating all countries in one file
percountry_df = pd.concat((year_africa, year_latinamerica, year_middleeast), ignore_index=True)

#percountry_df.head()
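The 2019 filter plus continent label can be sketched on toy data as follows. The column names `YEAR` and `COUNTRY` follow the ACLED files used here, but the rows are made up, and `select_year` is a hypothetical helper name. Taking a `.copy()` of the filtered slice before adding the `continent` column avoids pandas' `SettingWithCopyWarning`.

```python
import pandas as pd


def select_year(events: pd.DataFrame, year: int, continent: str) -> pd.DataFrame:
    """Keep only the events from `year` and tag them with a continent label (hypothetical helper)."""
    # .copy() makes an independent frame, so adding a column does not touch `events`
    subset = events.loc[events["YEAR"] == year].copy()
    subset["continent"] = continent
    return subset


# Toy event tables (made-up rows, not ACLED data)
africa = pd.DataFrame({"COUNTRY": ["A", "A", "B"], "YEAR": [2018, 2019, 2019]})
latam = pd.DataFrame({"COUNTRY": ["C", "D"], "YEAR": [2019, 2020]})

events_2019 = pd.concat(
    [select_year(africa, 2019, "Africa"), select_year(latam, 2019, "LatinAmerica")],
    ignore_index=True,
)
print(events_2019)
```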

In [None]:
# number of events of each type per country (one row per continent/country)
eventpercountry_df = percountry_df.pivot_table(index=['continent', 'COUNTRY'], columns='EVENT_TYPE',
                                               values='FATALITIES', aggfunc=len, fill_value=0)
eventpercountry_df = eventpercountry_df.reset_index()

# and total events and fatalities per country
fatalitiespercountry_df = (percountry_df.groupby('COUNTRY')['FATALITIES']
                           .agg(['size', 'sum'])
                           .rename(columns={'size': 'total conflicts 2019', 'sum': 'total fatalities 2019'})
                           .reset_index())

# merge everything to have one file; the inner joins keep only countries whose names
# match in both sources. The continent column comes along from eventpercountry_df.
df = pd.merge(eventpercountry_df, demographics, left_on='COUNTRY', right_on='country')
df = pd.merge(df, fatalitiespercountry_df, on='COUNTRY')

#df.head()
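An inner merge silently drops any country whose name is spelled differently in the ACLED files and in the UN profiles. One way to spot those cases (a sketch, not part of the original analysis) is an outer merge with `indicator=True`, which adds a `_merge` column saying whether each row matched. The country names and numbers below are made up.

```python
import pandas as pd

# Toy stand-ins: country names as spelled in the event data and in the profiles
events = pd.DataFrame({"COUNTRY": ["Alphaland", "Betaland", "Gamma Republic"],
                       "total conflicts 2019": [120, 45, 300]})
profiles = pd.DataFrame({"country": ["Alphaland", "Betaland", "Republic of Gamma"],
                         "Population in thousands (2017)": [5000, 1200, 8000]})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
check = pd.merge(events, profiles, left_on="COUNTRY", right_on="country",
                 how="outer", indicator=True)

# Rows found in only one source are the names that would need harmonizing
unmatched = check.loc[check["_merge"] != "both", ["COUNTRY", "country", "_merge"]]
print(unmatched)
```

Once the mismatches are listed, a small rename dictionary applied to one side before the inner merge would keep those countries in the analysis.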

As countries have different populations, I decided to divide the total events, event types and fatalities by the population of each country to obtain a per-capita measure. Since the population column is given in thousands, these ratios are events per 1,000 inhabitants.
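A small worked example of the units, with made-up numbers: a country with 500 events and a population of 10,000 thousand (10 million people) gets 500 / 10,000 = 0.05 events per 1,000 inhabitants, or 5 per 100,000. The per-100,000 column is an illustrative addition, not part of the original notebook.

```python
import pandas as pd

# Made-up values; the population column uses the same unit as the UN profiles (thousands)
toy = pd.DataFrame({
    "country": ["Alphaland", "Betaland"],
    "total conflicts 2019": [500, 30],
    "Population in thousands (2017)": [10_000, 1_500],  # 10 million and 1.5 million people
})

# Count divided by population in thousands = events per 1,000 inhabitants
toy["conflicts per 1,000 inhabitants"] = (
    toy["total conflicts 2019"] / toy["Population in thousands (2017)"]
)
# Multiplying by 100 expresses the same rate per 100,000 inhabitants
toy["conflicts per 100,000 inhabitants"] = toy["conflicts per 1,000 inhabitants"] * 100
print(toy)
```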

In [None]:
# adding a few new columns per capita (population is in thousands, so these are rates per 1,000 inhabitants)
df['total conflicts 2019 per capita'] = df['total conflicts 2019'] / df['Population in thousands (2017)']
df['total fatalities 2019 per capita'] = df['total fatalities 2019'] / df['Population in thousands (2017)']
df['total Protests 2019 per capita'] = df['Protests'] / df['Population in thousands (2017)']
df['total Explosions/Remote violence 2019 per capita'] = df['Explosions/Remote violence'] / df['Population in thousands (2017)']
df['total Battles 2019 per capita'] = df['Battles'] / df['Population in thousands (2017)']


## 4 Initial visualization of the data

As the dataset is very large, I will first visualize everything in plots to see if I can already pick out patterns or interesting elements.
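The first chart boils down to counting events per country and event type. A minimal sketch of that count on toy rows, using `groupby(...).size().unstack()` (one of several equivalent ways to build the table; `events_per_country` is a hypothetical helper name). The resulting frame has one row per country and one column per event type, which is the shape a grouped bar chart expects.

```python
import pandas as pd

# Toy event log (made-up rows) with the same column names as percountry_df
events = pd.DataFrame({
    "continent": ["Africa"] * 4 + ["MiddleEast"] * 2,
    "COUNTRY": ["A", "A", "A", "B", "C", "C"],
    "EVENT_TYPE": ["Protests", "Battles", "Protests", "Riots", "Battles", "Battles"],
})


def events_per_country(frame: pd.DataFrame, continent: str) -> pd.DataFrame:
    """Count events per country (rows) and event type (columns) for one continent."""
    subset = frame[frame["continent"] == continent]
    # size() counts rows per (country, type); unstack turns the types into columns
    return subset.groupby(["COUNTRY", "EVENT_TYPE"]).size().unstack(fill_value=0)


table = events_per_country(events, "Africa")
print(table)
```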

In [None]:
# initial chart to see the data: number of events per country and event type
df_cont_event = percountry_df[['continent', 'COUNTRY', 'EVENT_TYPE', 'EVENT_DATE']]

@interact(Continent=list(df_cont_event['continent'].unique()))
def dyn_linec(Continent):
    data = df_cont_event[df_cont_event['continent'] == Continent]
    # count events per country (rows) and event type (columns)
    counts = data.groupby(['COUNTRY', 'EVENT_TYPE']).size().unstack(fill_value=0)
    counts.iplot(kind='bar', xTitle='Country', yTitle='Total Events',
                 title='Events per country in ' + Continent)

In [None]:
#trying to find some correlations in all the data (numeric columns only)
corr = df.corr(numeric_only=True)

# Generate a mask for the upper triangle
mask = np.triu(np.ones_like(corr, dtype=bool))

# Set up the matplotlib figure
f, ax = plt.subplots(figsize=(20, 15))

# Generate a custom diverging colormap
cmap = sns.diverging_palette(220, 10, as_cmap=True)

# Draw the heatmap with the mask and correct aspect ratio
sns.heatmap(corr, mask=mask, cmap=cmap, vmax=.3, center=0,
            square=True, linewidths=.5, cbar_kws={"shrink": .5});

In the correlation graph I'm only focusing on the last 7 rows, as I want to see the correlation of armed conflict events with demographics, and I'm ignoring the correlations of the different demographic variables with each other.
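Instead of reading those rows off the heatmap, the same block can be taken straight from the correlation matrix with `.loc`, with rows set to the conflict measures and columns set to the demographic variables. A sketch on synthetic data with shortened, hypothetical column names. With the real `df`, the row list would be the seven conflict columns and the matrix would come from `df.corr(numeric_only=True)`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 40
# Synthetic data: two demographic columns and two conflict columns built to be related
gdp_growth = rng.normal(3, 2, n)
female_labour = rng.uniform(15, 70, n)
toy = pd.DataFrame({
    "GDP growth": gdp_growth,
    "Female labour participation": female_labour,
    "conflicts per capita": 1.0 - 0.1 * gdp_growth + rng.normal(0, 0.1, n),
    "fatalities per capita": 0.5 - 0.005 * female_labour + rng.normal(0, 0.05, n),
})

conflict_cols = ["conflicts per capita", "fatalities per capita"]
demographic_cols = ["GDP growth", "Female labour participation"]

# Rows = conflict measures, columns = demographics: only the block of interest
corr_block = toy.corr().loc[conflict_cols, demographic_cols]
print(corr_block.round(2))
```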

The following points I found interesting and want to explore further:

I - There is a negative correlation between GDP growth and total conflicts and fatalities, but it becomes much weaker for protests

II - Education has little to no correlation with fatalities, but a stronger one with protests: does more education mean more protests?

III - All the armed conflict rows in the data are negatively correlated with labour force participation (female pop. %)

## 5 Visualizing the data

To explore each point, I will create separate plots and analyses.

### I - There is a negative correlation between GDP growth and total conflicts and fatalities, but it becomes much weaker for protests:

In [None]:
#selecting only the relevant data and seeing the correlations as actual numbers
df[['total conflicts 2019 per capita',
    'total fatalities 2019 per capita',
    'total Protests 2019 per capita',
    'total Explosions/Remote violence 2019 per capita',
    'total Battles 2019 per capita',
    'GDP growth rate (annual %, const. 2005 prices)']].corr()

The negative correlations range from -0.68 for explosions to -0.03 for protests, suggesting that countries where explosions occur tend to have lower (or even negative) GDP growth.
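Per-capita event rates tend to be dominated by a few extreme countries, and Pearson's r (what `.corr()` computes by default) is sensitive to such outliers. One robustness check, not part of the original analysis, is to compare it with Spearman's rank correlation, which is Pearson's r computed on ranks. The values below are made up to show the effect of a single outlier.

```python
import pandas as pd

# Toy data: four countries with few explosions and one extreme outlier
toy = pd.DataFrame({
    "explosions per capita": [0.0, 0.1, 0.2, 0.3, 5.0],
    "GDP growth": [5.0, 4.0, 3.0, 2.0, 1.0],
})

x, y = toy["explosions per capita"], toy["GDP growth"]
pearson = x.corr(y)                  # linear association, pulled by the outlier
spearman = x.rank().corr(y.rank())   # Pearson on ranks = Spearman's rho
print(f"Pearson r = {pearson:.2f}, Spearman rho = {spearman:.2f}")
```

Here Spearman's rho is exactly -1 because GDP growth falls monotonically as explosions rise, while Pearson's r is weaker (about -0.74) because the outlier breaks the straight-line pattern.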

In [None]:
# Plotting explosions vs GDP growth
explpergdp = df[['country','total Explosions/Remote violence 2019 per capita','GDP growth rate (annual %, const. 2005 prices)']]

explpergdp.iplot( x='total Explosions/Remote violence 2019 per capita', y='GDP growth rate (annual %, const. 2005 prices)', 
                 categories='country',xTitle='total Explosions/Remote violence per capita', yTitle='GDP growth per year', 
                 title='Impact of war on GDP growth')


The points go from the upper left to the lower right of the graph, indicating that more explosions per capita go together with lower GDP growth.
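The direction of a scatter cloud can be summarized with a least-squares trend line: a negative slope means the points fall from the upper left to the lower right. A minimal sketch with `numpy.polyfit` on made-up points (illustrative only, not taken from the dataset):

```python
import numpy as np

# Toy points (made up): explosions per capita vs annual GDP growth (%)
explosions = np.array([0.00, 0.02, 0.05, 0.10, 0.20, 0.40])
gdp_growth = np.array([4.5, 4.0, 3.8, 2.5, 1.0, -1.5])

# Degree-1 least-squares fit: gdp_growth ≈ slope * explosions + intercept
slope, intercept = np.polyfit(explosions, gdp_growth, deg=1)
print(f"slope = {slope:.1f} GDP points per unit of explosions per capita")
print(f"intercept = {intercept:.1f} % growth at zero explosions")
```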

### II - Education has little to no correlation with fatalities, but a stronger one with protests: does more education mean more protests?

In this second point I want to explore the difference in how education correlates with fatalities and with protests.

I'm choosing the column "Education: Tertiary gross enrol. ratio (f per 100 pop.)" (tertiary education for women), as it has the strongest correlation.

I also find it interesting that higher education for women is more strongly correlated with violence in a country than other measures of education.

In [None]:
eduvsprotes = df[['country','total Protests 2019 per capita','total fatalities 2019 per capita','Education: Tertiary gross enrol. ratio (f per 100 pop.)']]

@interact(Selection=['total Protests 2019 per capita', 'total fatalities 2019 per capita'])
def scatterchart(Selection):
    eduvsprotes.iplot(kind='scatter', mode='markers', x=Selection, xTitle=Selection.title(),
                      y='Education: Tertiary gross enrol. ratio (f per 100 pop.)',
                      yTitle='tertiary education (f per 100 pop.)',
                      categories='country',
                      title='Education vs ' + Selection.title())

If we compare the graph for Education vs Fatalities with the one for Education vs Protests, we see an important difference. Countries where women are less educated tend to have more fatalities per capita, while countries where women have received a higher level of education have more protests. The difference between these two types of conflict is very important, as one is much less violent than the other. It seems, then, that countries where women are less educated tend more towards violence, and countries where women are more educated tend less towards it.
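The contrast between the two charts can also be stated as numbers by correlating the education column with each conflict measure at once using `DataFrame.corrwith`. A sketch on made-up values, chosen only to mirror the pattern described in the text (falling fatalities, rising protests as education increases); column names are shortened.

```python
import pandas as pd

# Toy data (made up): female tertiary enrolment and two conflict measures
toy = pd.DataFrame({
    "tertiary enrolment (f per 100 pop.)": [2, 5, 10, 20, 40, 60],
    "fatalities per capita": [0.90, 0.70, 0.50, 0.30, 0.20, 0.10],
    "protests per capita": [0.01, 0.02, 0.02, 0.05, 0.08, 0.12],
})

edu = toy["tertiary enrolment (f per 100 pop.)"]
conflict = toy[["fatalities per capita", "protests per capita"]]

# One Pearson correlation per conflict column, all against the same education column
result = conflict.corrwith(edu)
print(result.round(2))
```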

### III - All the armed conflict rows in the data are negatively correlated with labour force participation (female pop. %)

The third and last point of this analysis refers to the negative correlation between the labour force participation of the female population and armed conflict.

Same as in point II, we can see that in countries where more of the female population is included in education and labour, the number of fatalities and violent events (battles / explosions) tends to diminish.

More analysis is required to know which is the cause and which is the effect (do fewer battles allow more women into education, or does more female education lead to fewer battles?).

I started with a histogram to see the distribution of this variable.

In [None]:
# selecting the relevant columns
labvscon = df[['continent','total conflicts 2019 per capita', 
    'total fatalities 2019 per capita',
    'total Protests 2019 per capita',
    'total Explosions/Remote violence 2019 per capita',
    'total Battles 2019 per capita',
    'Labour force participation (female pop. %)']]

sns.histplot(labvscon['Labour force participation (female pop. %)'], bins=10, kde=True)

As we can see in the above graph, the data is widely dispersed, without a clear central tendency.
I chose to plot it below per continent to better compare the regions and see the deviations between them.

In [None]:
# checking per continent the labour force participation
labvscon.pivot_table(index='continent', values='Labour force participation (female pop. %)', aggfunc=['mean', 'std'])

In [None]:
# plotting to visualize the data
sns.boxplot(x="continent", y='Labour force participation (female pop. %)', data=labvscon)
plt.title("Labour force participation (female pop. %) per continent")
plt.xlabel('continents')   
plt.ylabel('Labour force participation (female pop. %)')
plt.show()

We can see from the above boxplot that Latin America has a very narrow spread, with most of the countries in this region behaving the same way (around 50% participation of the female population in the labour force). There are some outliers, but most of the data is concentrated in the narrow band between 50% and 60%.
Africa and the Middle East, however, have a larger spread, with countries having different profiles. The Middle East clearly has the lowest values on this graph, with an average of only 33%.
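A box plot draws the interquartile range (IQR, the 25th to 75th percentile) rather than the standard deviation, so the two measures of spread can be reported side by side per continent. A sketch on made-up values (the column name is shortened):

```python
import pandas as pd

# Toy values (made up) for female labour force participation, % of female population
toy = pd.DataFrame({
    "continent": ["LatinAmerica"] * 5 + ["MiddleEast"] * 5,
    "female labour %": [50, 52, 55, 57, 60, 12, 20, 35, 45, 55],
})

grouped = toy.groupby("continent")["female labour %"]
spread = pd.DataFrame({
    "std": grouped.std(),
    # IQR: the height of each box in a box plot
    "IQR": grouped.quantile(0.75) - grouped.quantile(0.25),
})
print(spread)
```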

Below is a comparison of total conflicts per capita against the labour force participation of the female population. The conflicts data is right-skewed, and we see that as female labour participation increases, total conflicts decrease. The Middle East countries tend towards the bottom half of the graph, meaning lower labour participation of women and more conflicts.
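Right skew (most countries close to zero, a few far to the right) can be confirmed numerically with `Series.skew()`, or by checking that the mean sits well above the median. A sketch with made-up per-capita rates:

```python
import pandas as pd

# Toy per-capita conflict rates (made up): most countries low, a few very high
rates = pd.Series([0.01, 0.02, 0.02, 0.03, 0.04, 0.05, 0.40, 1.20])

# Positive sample skewness means a long tail to the right
print(f"skewness: {rates.skew():.2f}")
# In a right-skewed distribution the tail pulls the mean above the median
print(f"mean {rates.mean():.3f} > median {rates.median():.3f}")
```

For plotting such data, a logarithmic x-axis (for example `plt.xscale('log')` with matplotlib) is a common way to spread out the countries bunched near zero.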

In [None]:
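# correlation of each conflict measure with female labour participation (numeric columns only)
print(labvscon.corr(numeric_only=True)['Labour force participation (female pop. %)'])

# plotting the data
sns.scatterplot(x="total conflicts 2019 per capita", y="Labour force participation (female pop. %)", data=labvscon, hue='continent')
plt.show()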

## 6 Conclusion
Does the fact that education and labour participation for women are the lowest in the Middle East make it a more conflict-ridden region? Does a higher level of education reduce fatalities and increase protests in regions with developing countries?

It's hard to draw firm conclusions about the reasons WHY, since many elements (several of which are not evaluated or present in this dataset) affect a country and the reasons for armed conflict. We can, however, see some clear correlations. I personally was surprised to see such strong correlations between gender and conflict.

I would venture to say that I walk away from this dataset much more confident that including women in education and the economy is beneficial not only for women themselves, but for the entire population, for the decrease of violence and for the pursuit of peace.