# Milestone 2 - Exploratory Analysis

This notebook gathers the plots produced during the exploratory analysis of our dataset.

In [None]:
import pandas as pd
import requests
import re
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

## Colonizing countries

We first look at which countries were the biggest colonisers over the past centuries. A first observation is that all of them are European countries.

In [None]:
colonies_df = pd.read_csv('datasets/colonies.csv')

In [None]:
# Count colonies per colonizer (value_counts sorts in descending order)
counts = colonies_df['Colonizer Country'].value_counts()

sns.set(style="whitegrid")
plt.figure(figsize=(20,10), dpi=60, facecolor='w', edgecolor='k')
# Pass x and y as keywords (recent seaborn versions reject positional data)
# and let the country names label the bars, whatever their number
sns.barplot(x=counts.index, y=counts.values, palette="Reds_d")
plt.ylabel('Number of colonies')
plt.title('Most colonizing countries')
plt.show()

## Decolonisation over time and continent

> This part cannot be continued until the dataset has been cleaned.

By looking at the decolonisation dates, we can observe when European colonisers lost their grasp on their colonies. (Does it happen in waves within a continent? Does it spread between continents? A map of decolonisation dates would help to compare neighbouring countries.)

In [None]:
decolonisation_df = pd.read_csv('datasets/colonies_wikipedia.csv')
decolonisation_df = decolonisation_df[decolonisation_df['Date'] != 'False']
decolonisation_df.head()

In [None]:
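# Count colonies per decolonisation date (value_counts sorts by descending frequency)
counts = decolonisation_df['Date'].value_counts()

plt.figure(figsize=(20,10), dpi=60, facecolor='w', edgecolor='k')
# Label the bars with the dates themselves: a fixed 1776-2017 axis only works
# if there are exactly 242 distinct dates
sns.barplot(x=counts.index, y=counts.values, palette="Reds_d")
plt.xticks(rotation=90)
plt.xlabel('Decolonisation date')
plt.ylabel('Number of colonies')
plt.title('Number of colonies decolonised per date')
plt.show()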

In [None]:
y = decolonisation_df['Date'].value_counts()
y.head(21)
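
The `Date` column appears to be stored as text (the filter above compares it against the string `'False'`), so a chronological plot needs numeric years first. Below is a minimal sketch of one way to get there, assuming every usable entry is a plain year such as `'1960'`. The toy series and the helper name `yearly_counts` are hypothetical and only stand in for our data.

```python
import pandas as pd

def yearly_counts(dates: pd.Series) -> pd.Series:
    """Count entries per year, dropping values that cannot be read as numbers."""
    # errors='coerce' turns unparsable entries (e.g. 'False') into NaN
    years = pd.to_numeric(dates, errors='coerce').dropna().astype(int)
    # sort_index orders the counts chronologically instead of by frequency
    return years.value_counts().sort_index()

# Toy data standing in for decolonisation_df['Date'] (hypothetical values)
toy_dates = pd.Series(['1960', '1947', '1960', 'False', 'unknown', '1962'])
counts = yearly_counts(toy_dates)
print(counts)
```

The resulting series can be passed to `plt.bar(counts.index, counts.values)` to get a time axis with gaps for years without any decolonisation.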

## Evolution of conflicts

### Presence of conflicts on continents

In this part, all conflicts are taken into account, provided that they caused more than 25 casualties.
Splitting this table into minor and major conflicts might also be relevant.

In [None]:
ucdp_df = pd.read_csv('datasets/clean_conflict.csv')
ucdp_df.head()

In [None]:
# Keep the five most frequent regions (value_counts sorts in descending order)
y = ucdp_df['region'].value_counts().values[:5]
# Labels are hard-coded and assume this descending order of counts
myticks = ['Asia','Africa','Middle East','Americas','Europe']

sns.set(style="whitegrid")
plt.figure(figsize=(20,10), dpi=60, facecolor='w', edgecolor='k')
# Pass x and y as keywords so the labels are attached to the bars directly
sns.barplot(x=myticks, y=y, palette="Reds_d")
plt.ylabel('Number of conflicts')
plt.title('Number of conflicts per region')
plt.show()
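
The hard-coded tick labels in the cell above only match the bars if the regions come out in exactly that order of frequency. Below is a sketch of a safer labelling that maps each region code to its name before counting. It assumes that the codes 1 to 5 stand for Europe, Middle East, Asia, Africa and Americas (the order used for the tick labels in the next section) and that the column holds strings. The toy series and the names `REGION_NAMES` and `named_region_counts` are hypothetical.

```python
import pandas as pd

# Assumed mapping from region code to region name
REGION_NAMES = {'1': 'Europe', '2': 'Middle East', '3': 'Asia',
                '4': 'Africa', '5': 'Americas'}

def named_region_counts(regions: pd.Series) -> pd.Series:
    """Count conflicts per named region, ignoring multi-region entries."""
    codes = regions.astype(str).str.strip()
    # Entries such as '1, 3' span several regions and are left out here
    single = codes[codes.isin(list(REGION_NAMES))]
    return single.map(REGION_NAMES).value_counts()

# Toy data standing in for ucdp_df['region'] (hypothetical values)
toy_regions = pd.Series(['3', '3', '4', '1, 3', '3', '4', '1'])
counts = named_region_counts(toy_regions)
print(counts)
```

Because the names travel with the counts, `counts.index` can be used directly as bar labels.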

### What are they fighting for?

The dataset records the main incompatibility behind each conflict: what are the parties fighting for? The graph below shows that most conflicts are about territorial claims, followed by conflicts over government. Only a few combine both reasons. (Separate this further by continent next?)

In [None]:
# Drop conflicts that span several regions (comma-separated region codes)
cleaned_df = ucdp_df.loc[~ucdp_df['region'].isin(['1, 3', '1, 2', '1, 4', '1, 5', '1, 3, 5', '1, 2, 3, 5'])]
myticks = ['Europe','Middle East','Asia','Africa','Americas']
# Rows: region, columns: incompatibility type, values: number of conflicts
df2 = cleaned_df.groupby(['region', 'incomp'])['region'].count().unstack('incomp').fillna(0)

# DataFrame.plot creates its own figure, so a separate plt.figure call
# would only leave an empty figure behind
df2[[1,2,3]].plot(kind='bar', stacked=True, figsize = (20,10), fontsize = 13)
plt.xticks(np.arange(5), myticks)
plt.legend(['Territories', 'Governments', 'Territories & Governments'],fontsize = 11)
plt.title('Origin of conflicts per continent', fontsize = 14)
plt.ylabel('Number of conflicts')
plt.show()
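
To check the dominant incompatibility in each region numerically rather than by eye, the counts can be turned into row-wise shares. Below is a sketch on toy data, assuming that the `incomp` codes 1, 2 and 3 mean territory, government and both, as in the legend above. The toy frame is hypothetical and does not reflect our actual numbers.

```python
import pandas as pd

# Toy data standing in for cleaned_df[['region', 'incomp']] (hypothetical values)
toy = pd.DataFrame({
    'region': ['1', '1', '1', '3', '3', '4', '4', '4', '4'],
    'incomp': [1, 1, 2, 1, 2, 2, 2, 1, 3],
})

# normalize='index' divides each row by its total, giving shares per region
shares = pd.crosstab(toy['region'], toy['incomp'], normalize='index')
print(shares.round(2))
```

Each row sums to 1, so regions with very different numbers of conflicts become directly comparable.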

### Period of conflicts

In [None]:
plt.figure(figsize=(20,10), dpi=60, facecolor='w', edgecolor='k')
y = ucdp_df['year'].value_counts()
y = y.sort_index()
plt.bar(y.index, y.values, width = 0.9)
plt.xlabel('Year')
plt.ylabel('Number of conflicts')
plt.title('Number of conflicts per year')
plt.show()

## Relevant countries examples

### Violently decolonized country - Algeria, Congo, Syria, Vietnam?

### Peacefully decolonized country - India, Lebanon, Philippines?