# Dataset Preparation

## 1. Introduction to the dataset

The dataset presented here covers reported conflict events in Africa from 1997 to 2020.

Source: https://www.acleddata.com

The Armed Conflict Location & Event Data Project (ACLED) is a disaggregated data collection, analysis, and crisis mapping project. ACLED collects the dates, actors, locations, fatalities, and types of all reported political violence and protest events across Africa, East Asia, South Asia, Southeast Asia, the Middle East, Central Asia and the Caucasus, Latin America and the Caribbean, and Southeastern and Eastern Europe and the Balkans. The ACLED team conducts analysis to describe, explore, and test conflict scenarios, and makes both data and analysis open for free use by the public.
ACLED is a registered non-profit organization with 501(c)(3) status in the United States.

To better understand the data columns, ACLED provides a quick-reference PDF: https://www.acleddata.com/wp-content/uploads/2017/12/ACLED-Data-Columns_Quick-ReferenceFINAL.pdf
A copy of this file is also stored alongside the notebook.

## 2. Dataset conversion

The raw dataset is provided in .xlsx format. We first convert it to .csv, which is easier to manipulate with pandas and other frameworks or libraries.

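In [3]:
import pandas as pd

# Read the raw Excel workbook and write it back out as CSV.
read_file = pd.read_excel(r'africa.xlsx')
# Note: to_csv also writes the row index by default, which shows up as an 'Unnamed: 0' column on reload.
read_file.to_csv(r'/home/psn/Hochschule/Project/africa.csv')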
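The `Unnamed: 0` columns visible in the outputs further down are consistent with pandas writing the row index to the CSV file. The following is a minimal sketch of that behaviour, not part of the original notebook: it uses a tiny in-memory stand-in (the `sample` frame and its values are illustrative assumptions) instead of `africa.xlsx`, and compares the default call with `index=False`.

```python
import io

import pandas as pd

# Tiny stand-in for the converted sheet (illustrative values only).
sample = pd.DataFrame(
    {"ISO": [12, 12], "EVENT_DATE": ["2010-12-20", "2010-12-25"]}
)

# Default behaviour: the row index is written as an unnamed first column.
with_index = io.StringIO()
sample.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# With index=False only the data columns are written.
without_index = io.StringIO()
sample.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))     # ['Unnamed: 0', 'ISO', 'EVENT_DATE']
print(list(reloaded_without_index.columns))  # ['ISO', 'EVENT_DATE']
```

If the extra index columns are not wanted, passing `index=False` at conversion time is one way to avoid them.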

## 3. Dataset reduction
### Removing unwanted rows

The goal of this step is to remove unwanted data. As mentioned, the dataset contains records from 1997 to 2020. We keep only the records from 2010-12-17 (the start of the Arab Spring, the date of Mohamed Bouazizi's self-immolation) to 2020.

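In [15]:
dataframe = pd.read_csv('africa.csv')
# EVENT_DATE is read as 'YYYY-MM-DD' text, so string comparison follows date order.
# Use >= so that 2010-12-17 itself is kept; .copy() makes dframe independent of dataframe.
dframe = dataframe[dataframe.EVENT_DATE >= '2010-12-17'].copy()
dframe.head(3)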

| Unnamed: 0.1 | Unnamed: 0 | ISO | EVENT_ID_CNTY | EVENT_ID_NO_CNTY | EVENT_DATE | YEAR | TIME_PRECISION | EVENT_TYPE | SUB_EVENT_TYPE | ACTOR1 | ... | ADMIN3 | LOCATION | LATITUDE | LONGITUDE | GEO_PRECISION | SOURCE | SOURCE_SCALE | NOTES | FATALITIES | TIMESTAMP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2483 | 2483 | 12 | ALG2538 | 2538 | 2010-12-20 | 2010 | 1 | Battles | Armed clash | Military Forces of Algeria (1999-) | ... |  | Ait Dahmane | 36.611 | 3.577 | 1 | TSA Algerie | National | A militant was captured by security forces on ... | 0 | 1563903165 |
| 2484 | 2484 | 12 | ALG2539 | 2539 | 2010-12-25 | 2010 | 1 | Riots | Violent demonstration | Police Forces of Algeria (1999-) | ... |  | Algiers | 36.752 | 3.042 | 1 | Liberte (Algeria) | National | Riots broke out in districts covered by the pr... | 0 | 1579554013 |
| 2485 | 2485 | 12 | ALG2540 | 2540 | 2010-12-26 | 2010 | 1 | Battles | Armed clash | AQIM: Al Qaeda in the Islamic Maghreb | ... |  | Jijel | 36.8 | 5.767 | 2 | Xinhua | International | Two AQLMI militants are killed and five wounde... | 2 | 1572403789 |
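Comparing dates as text works only while every value is in `YYYY-MM-DD` form. As a hedged alternative, the sketch below parses `EVENT_DATE` into real timestamps and filters on an explicit, inclusive window. The `events` frame and its values are made up for illustration, and the end date `2020-12-31` is an assumption standing in for "to 2020".

```python
import pandas as pd

# Illustrative stand-in for the EVENT_DATE column (values are made up).
events = pd.DataFrame(
    {
        "EVENT_DATE": ["2010-12-16", "2010-12-17", "2010-12-20", "2021-01-03"],
        "FATALITIES": [1, 0, 0, 2],
    }
)

# Parse the text dates into timestamps so comparisons are true date comparisons.
events["EVENT_DATE"] = pd.to_datetime(events["EVENT_DATE"], format="%Y-%m-%d")

START = pd.Timestamp("2010-12-17")  # start of the Arab Spring
END = pd.Timestamp("2020-12-31")    # assumed end of the period of interest

# Series.between includes both endpoints by default.
in_window = events["EVENT_DATE"].between(START, END)
recent = events.loc[in_window].copy()

print(recent)
```

Here only the rows dated 2010-12-17 and 2010-12-20 remain. When reading the real file, `pd.read_csv('africa.csv', parse_dates=['EVENT_DATE'])` would perform the parsing step at load time.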


### Removing unwanted columns

Columns judged irrelevant to the purpose of this dataset are dropped. These are:
* EVENT_ID_CNTY
* EVENT_ID_NO_CNTY
* YEAR
* TIME_PRECISION
* INTER1
* INTER2
* ADMIN3
* LATITUDE
* LONGITUDE
* GEO_PRECISION
* SOURCE_SCALE
* TIMESTAMP

In [16]:
# Drop all of the columns listed above in a single call.
columns_to_drop = [
    'EVENT_ID_CNTY', 'EVENT_ID_NO_CNTY', 'YEAR', 'TIME_PRECISION',
    'INTER1', 'INTER2', 'ADMIN3', 'LATITUDE', 'LONGITUDE',
    'GEO_PRECISION', 'SOURCE_SCALE', 'TIMESTAMP',
]
dframe = dframe.drop(columns=columns_to_drop)
dframe.head(5)

| Unnamed: 0.1 | Unnamed: 0 | ISO | EVENT_DATE | EVENT_TYPE | SUB_EVENT_TYPE | ACTOR1 | ASSOC_ACTOR_1 | ACTOR2 | ASSOC_ACTOR_2 | INTERACTION | REGION | COUNTRY | ADMIN1 | ADMIN2 | LOCATION | SOURCE | NOTES | FATALITIES |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2483 | 2483 | 12 | 2010-12-20 | Battles | Armed clash | Military Forces of Algeria (1999-) |  | Unidentified Armed Group (Algeria) |  | 13 | Northern Africa | Algeria | Boumerdes | Ammal | Ait Dahmane | TSA Algerie | A militant was captured by security forces on ... | 0 |
| 2484 | 2484 | 12 | 2010-12-25 | Riots | Violent demonstration | Police Forces of Algeria (1999-) |  | Rioters (Algeria) |  | 15 | Northern Africa | Algeria | Alger | Sidi M'Hamed | Algiers | Liberte (Algeria) | Riots broke out in districts covered by the pr... | 0 |
| 2485 | 2485 | 12 | 2010-12-26 | Battles | Armed clash | AQIM: Al Qaeda in the Islamic Maghreb |  | Military Forces of Algeria (1999-) |  | 12 | Northern Africa | Algeria | Jijel | Jijel | Jijel | Xinhua | Two AQLMI militants are killed and five wounde... | 2 |
| 2486 | 2486 | 12 | 2010-12-27 | Riots | Violent demonstration | Rioters (Algeria) |  |  |  | 50 | Northern Africa | Algeria | Alger | Baraki | Baraki | Liberte (Algeria) | Clashes have taken place in the city of Palmie... | 0 |
| 2487 | 2487 | 12 | 2010-12-27 | Riots | Violent demonstration | Police Forces of Algeria (1999-) |  | Rioters (Algeria) |  | 15 | Northern Africa | Algeria | Alger | Sidi M'Hamed | Sidi M'Hamed | Liberte (Algeria); Al Jazeera | Clashes have taken place in the city of Palmie... | 0 |
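The output above still carries the `Unnamed: 0.1` and `Unnamed: 0` columns, which look like leftover row indexes rather than ACLED fields. The sketch below shows one possible way to remove them together with the listed columns. The helper name `reduce_columns` and the `demo` frame are hypothetical, and treating every column whose name starts with `Unnamed` as disposable is an assumption that should be checked against the real file.

```python
import pandas as pd

# Columns listed in this section as irrelevant for the analysis.
COLUMNS_TO_DROP = [
    "EVENT_ID_CNTY", "EVENT_ID_NO_CNTY", "YEAR", "TIME_PRECISION",
    "INTER1", "INTER2", "ADMIN3", "LATITUDE", "LONGITUDE",
    "GEO_PRECISION", "SOURCE_SCALE", "TIMESTAMP",
]


def reduce_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame without the listed and leftover index columns."""
    # errors="ignore" skips listed names that are absent from the frame.
    reduced = frame.drop(columns=COLUMNS_TO_DROP, errors="ignore")
    # Assumption: columns named 'Unnamed: ...' are leftover row indexes.
    leftover = [c for c in reduced.columns if str(c).startswith("Unnamed")]
    return reduced.drop(columns=leftover)


# Hypothetical one-row frame with a subset of the real column names.
demo = pd.DataFrame(
    {
        "Unnamed: 0": [2483],
        "ISO": [12],
        "YEAR": [2010],
        "EVENT_TYPE": ["Battles"],
        "TIMESTAMP": [1563903165],
    }
)

print(list(reduce_columns(demo).columns))  # ['ISO', 'EVENT_TYPE']
```

Because `drop` returns a new frame, the input is left unchanged, so the helper can be rerun safely in a notebook cell.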
