# Project 1

## Dataset: Disorders involving peacekeepers

### Description:

The dataset was downloaded from the ACLED database. It contains instances of political disorder (violent and non-violent) directly or indirectly involving actors in peacekeeping missions from 1997 to 2024. It covers a variety of peacekeeping missions, such as MINUSMA and AMISOM. *More details are available here: https://acleddata.com/knowledge-base/faqs-what-can-and-cannot-be-done-with-acled-data-on-disorder-involving-peacekeepers/*

### Overall Approach:

I found it easier to write code with pandas than to work through it the 'hard way.' Nonetheless, it was an interesting exercise, especially in terms of interpreting and visualizing the data. A friend who studied computer science checked my code, and I made some modifications accordingly.

### Importing pandas & uploading the dataset

In [1]:
import pandas as pd

ModuleNotFoundError: No module named 'pandas'

In [4]:
data_peacekeeping_disorders = pd.read_csv("C:/Users/joosa/OneDrive/Desktop/peacekeeping_2024-10-25.csv")

### Computing mean, median and mode via pandas

I used pandas to calculate the mean, median and mode of the numeric column ('FATALITIES').

**Interpretation**:
* **Mean** - On average, disorders involving peacekeeping actors resulted in approximately 1.19 fatalities per event.
* **Median** - The middle value is 0; this means that at least half of the disorders involving peacekeepers did not result in any fatalities.
* **Mode** - The most frequently occurring value is 0; this is consistent with the median, since at least half of the disorders did not result in fatalities.

In [21]:
mean_fatalities = data_peacekeeping_disorders["FATALITIES"].mean()
print("a. Mean fatalities from disorders involving peacekeepers is", mean_fatalities)

median_fatalities = data_peacekeeping_disorders["FATALITIES"].median()
print("b. Median fatalities from disorders involving peacekeepers is", median_fatalities)

mode_fatalities = data_peacekeeping_disorders["FATALITIES"].mode()[0]
print("c. Mode fatalities from disorders involving peacekeepers is", mode_fatalities)

a. Mean fatalities from disorders involving peacekeepers is 1.1932650073206441
b. Median fatalities from disorders involving peacekeepers is 0.0
c. Mode fatalities from disorders involving peacekeepers is 0
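
To see how a mean above 1 can coexist with a median and mode of 0, here is a minimal sketch on made-up counts. The list `toy_fatalities` is invented for illustration and is not drawn from the ACLED file; the sketch uses only the standard library's `statistics` module.

```python
import statistics

# Hypothetical per-event fatality counts: mostly zeros plus a few larger values.
toy_fatalities = [0, 0, 0, 0, 0, 0, 0, 1, 2, 9]

toy_mean = statistics.mean(toy_fatalities)      # pulled upward by the few larger values
toy_median = statistics.median(toy_fatalities)  # middle of the sorted list
toy_mode = statistics.mode(toy_fatalities)      # most frequent value

# Share of events with no fatalities at all.
zero_share = toy_fatalities.count(0) / len(toy_fatalities)

print("Mean:", toy_mean)
print("Median:", toy_median)
print("Mode:", toy_mode)
print("Share of events with zero fatalities:", zero_share)
```

In this toy list a single event with 9 fatalities is enough to lift the mean to 1.2, while the median and mode stay at 0. The same kind of check (counting zeros and dividing by the number of rows) could be run on the real `FATALITIES` column.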


### Computing mean, median and mode the 'hard way'

**Approach**: I calculated the mean, median and mode using only the Python standard library. This section was slightly challenging since I had to translate the mathematical formulas into code, and it took me some time to make sure the code handled every case correctly (for example, odd and even list lengths in the median).

**Interpretation**: The mean, median and mode remain the same as interpreted above.

In [6]:
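def mean(numbers):
    # Arithmetic mean: total divided by the number of values.
    return sum(numbers) / len(numbers)

def median(numbers):
    # Sort first so the middle position holds the middle value.
    sorted_list = sorted(numbers)
    n = len(sorted_list)
    mid = n // 2
    if n % 2 == 0:
        # Even length: average the two middle values.
        return (sorted_list[mid - 1] + sorted_list[mid]) / 2.0
    else:
        # Odd length: return the single middle value.
        return sorted_list[mid]

def mode(numbers):
    # Count how often each value occurs.
    count = {}
    for number in numbers:
        count[number] = count.get(number, 0) + 1
    max_count = max(count.values())
    # Keep every value that reaches the highest count (there may be ties).
    modes = [key for key, value in count.items() if value == max_count]
    # If every value occurs exactly once, no value stands out as a mode.
    if len(modes) == len(numbers):
        return "No mode"
    return modes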

fatalities_list = data_peacekeeping_disorders["FATALITIES"].tolist()

mean_fatalities = mean(fatalities_list)
print("Mean fatalities are:", mean_fatalities)

median_fatalities = median(fatalities_list)
print("Median fatalities are:", median_fatalities)

mode_fatalities = mode(fatalities_list)
print("Mode fatalities are:", mode_fatalities)

Mean fatalities are: 1.1932650073206441
Median fatalities are: 0
Mode fatalities are: [0]
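
One way to gain confidence in the odd/even handling is to compare the hand-written `median` against the standard library's `statistics.median` on small lists. The sketch below repeats the `median` logic from the cell above so that it runs on its own; the two test lists are made up for illustration.

```python
import statistics

def median(numbers):
    # Same logic as the hand-written function above.
    sorted_list = sorted(numbers)
    n = len(sorted_list)
    mid = n // 2
    if n % 2 == 0:
        return (sorted_list[mid - 1] + sorted_list[mid]) / 2.0
    return sorted_list[mid]

# Hypothetical test lists: one with odd length, one with even length.
odd_list = [3, 0, 1, 0, 7]   # sorted: [0, 0, 1, 3, 7], single middle value
even_list = [4, 0, 2, 0]     # sorted: [0, 0, 2, 4], two middle values to average

for values in (odd_list, even_list):
    ours = median(values)
    reference = statistics.median(values)
    print(values, "->", ours, "| statistics.median:", reference)
    # Fail loudly if the two implementations ever disagree.
    assert ours == reference
```

For the mode, `statistics.multimode` (Python 3.8 and later) also returns a list of the most frequent values, which is comparable to the list-style result `[0]` printed above.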


### Data Visualization

**Approach**: I had trouble grouping and aggregating the fatalities by year and needed a friend's help to write this code. I was also unsure how to present the data (as a table or as a histogram).

**Interpretation**: The fatalities ebb and flow over the years from 1997 to 2024. The most interesting thing for me was that even though there were fatalities in every year except 1999, the mean, median and mode calculated earlier point the other way at the level of individual events: a typical disorder involving peacekeepers resulted in few or no fatalities. The yearly totals add up many events, so they can be large even when most single events have zero fatalities.

In [22]:
# Sum the fatalities of all events that fall in the same year.
fatalities_over_the_years = {}
for year, fatalities in zip(data_peacekeeping_disorders["YEAR"], data_peacekeeping_disorders["FATALITIES"]):
    fatalities_over_the_years[year] = fatalities_over_the_years.get(year, 0) + fatalities

def Fatalities_Chart(data):
    # Print one "year: total" line per year, in chronological order.
    print("Fatalities from 1997-2024")
    for year, count in sorted(data.items()):
        print(f"{year}: {count}")

Fatalities_Chart(fatalities_over_the_years)

Fatalities from 1997-2024
1997: 108
1998: 87
1999: 0
2000: 1
2001: 50
2002: 3
2003: 40
2004: 15
2005: 54
2006: 105
2007: 56
2008: 31
2009: 78
2010: 114
2011: 60
2012: 133
2013: 226
2014: 406
2015: 325
2016: 296
2017: 372
2018: 416
2019: 136
2020: 157
2021: 389
2022: 221
2023: 165
2024: 31
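
On the question of a table versus a histogram, one middle ground is a text bar chart that stays within the standard library. The sketch below copies the yearly totals printed above into a dictionary; the helper name `text_bar_chart` and the scale of one `#` per 20 fatalities are arbitrary choices made for illustration.

```python
# Yearly totals copied from the output above.
yearly_totals = {
    1997: 108, 1998: 87, 1999: 0, 2000: 1, 2001: 50, 2002: 3, 2003: 40,
    2004: 15, 2005: 54, 2006: 105, 2007: 56, 2008: 31, 2009: 78, 2010: 114,
    2011: 60, 2012: 133, 2013: 226, 2014: 406, 2015: 325, 2016: 296,
    2017: 372, 2018: 416, 2019: 136, 2020: 157, 2021: 389, 2022: 221,
    2023: 165, 2024: 31,
}

def text_bar_chart(data, per_symbol=20):
    """Return one line per year, with one '#' per `per_symbol` fatalities."""
    lines = []
    for year, count in sorted(data.items()):
        bar = "#" * (count // per_symbol)  # whole symbols only; remainders are dropped
        lines.append(f"{year} | {bar} {count}")
    return lines

for line in text_bar_chart(yearly_totals):
    print(line)

# Year with the highest total, taken directly from the dictionary.
peak_year = max(yearly_totals, key=yearly_totals.get)
print("Year with the most fatalities:", peak_year)
```

The bars make the rise after 2012 easier to spot than the plain list does, while the count at the end of each line keeps the exact totals visible.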
