# Water Scarcity and Global Conflict Analysis
This project explores the relationship between armed conflict and water scarcity by integrating and analyzing datasets from several sources. We use geospatial and environmental data to assess how water scarcity influences the occurrence and intensity of conflicts.

## Definitions
- Scarcity: Demand for a good or service is greater than the availability of the good or service (Oxford Languages).
- Supply: total freshwater resources available in cubic meters per person, per year (The ImpEE Project).
- Withdrawal: amount extracted for use by country (The ImpEE Project).
- Water Stress: ratio between total freshwater withdrawn (TFWW) and total renewable freshwater resources (TRWR). Water stress = TFWW / TRWR (Wikipedia).
- Water Scarcity: volume of fresh water available does not meet the per person per day recommendations for human health (University of Nottingham).
- Human Development Index (HDI): a statistical composite index of life expectancy, education (mean years of schooling completed and expected years of schooling upon entering the education system), and per capita income indicators, which is used to rank countries into four tiers of human development.
- Political Stability Index (PSI): Political Stability and Absence of Violence/Terrorism measures perceptions of the likelihood of political instability and/or politically-motivated violence, including terrorism. Estimate gives the country's score on the aggregate indicator, in units of a standard normal distribution, i.e. ranging from approximately -2.5 to 2.5.
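
As a worked illustration of the water stress definition above, the ratio can be computed directly. This is a minimal sketch: the function name `water_stress` and the sample figures are assumptions made for illustration and do not come from the datasets.

```python
def water_stress(tfww: float, trwr: float) -> float:
    """Return water stress as the ratio TFWW / TRWR."""
    if trwr <= 0:
        raise ValueError("total renewable freshwater resources must be positive")
    return tfww / trwr


# Hypothetical country withdrawing 50 of its 200 (10^9 m3/yr) renewable resources
stress = water_stress(50.0, 200.0)
print(f"Water stress: {stress:.0%}")
```

Multiplying the ratio by 100 gives a percentage, which appears to be the scale of the `Wtr Stress` column used later in this notebook.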

## Data Sources
- Food and Agriculture Organization (FAO) https://data.apps.fao.org/aquastat/?lang=en
- University of Alabama https://internationalconflict.ua.edu/data-download/
- Correlates of War (COW) https://correlatesofwar.org/data-sets/cow-country-codes-2/
- World Bank Group's Data Bank https://databank.worldbank.org/reports.aspx
- International Monetary Fund (IMF) https://www.imf.org/en/Home

## Task 1: Data Collection

In [1]:
# Dependencies
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import linregress
from functools import partial, reduce
import json
import requests
from config import api_key

In [2]:
# Collect water scarcity data from the Food and Agriculture Organization (FAO)
aqua_csv = pd.read_csv('Resources/AQUASTAT Dissemination System.csv')

# Collect international conflict data from the University of Alabama
mie_csv = pd.read_csv('Resources/ua-mie-1.0.csv')
micnames = pd.read_csv('Resources/ua-micnames-1.0.csv')

# Collect country codes from Correlates of War (COW)
COW_Country_Codes = pd.read_csv('Resources/COW-country-codes.csv')

# Collect crime datasets annually by country from World Bank Group
crime_csv = pd.read_csv('Resources/wbgroup_crime.csv')

# Collect political stability data from World Bank Group
stability_csv = pd.read_csv('Resources/political_stability.csv')

# Collect precipitation by country from World Bank Group
precip_csv = pd.read_csv('Resources/wb_precipitation.csv')

# Collect freshwater resources per capita by country from World Bank Group
wtr_rsrc_csv = pd.read_csv('Resources/wb_wtr_rsrc.csv')

# Collect surface temperature by country from International Monetary Fund
srfc_temp_csv = pd.read_csv('Resources/imf_surface_temp.csv')

## Task 2: Data Cleanup

### Clean up the Militarized Interstate Events (MIE) csv file

In [3]:
# Copy the dataframe with only the columns we want 
mie_df = mie_csv[['styear', 'ccode1', 'eventnum', 'micnum', 'hostlev', 'ccode2']].copy()

# Create dictionaries mapping country codes to country names and confrontation numbers to confrontation names
code_to_country = pd.Series(COW_Country_Codes.StateNme.values, index=COW_Country_Codes.CCode).to_dict()
conflict_name = pd.Series(micnames.micname.values, index= micnames.micnum).to_dict()

# Map the country codes to their names from the dictionary and replace
mie_df['ccode1'] = mie_df['ccode1'].map(code_to_country)
mie_df['ccode2'] = mie_df['ccode2'].map(code_to_country)
mie_df['micnum'] = mie_df['micnum'].map(conflict_name)

# Rename column headers
mie_df = mie_df.rename(columns={'styear': 'Year',
                                'ccode1': 'Country',
                                'ccode2': 'Target Country',
                                'eventnum': 'Event Number',
                                'micnum': 'Conflict Name',
                                'hostlev': 'Hosility Level'
                                })
# Display the clean dataframe
mie_df.head()

Unnamed: 0,Year,Country,Event Number,Conflict Name,Hosility Level,Target Country
0,1902,United States of America,1,Alaska Boundary Dispute (1902),3,United Kingdom
1,1913,Austria-Hungary,1,Serbian and Austro-Hungarian Fighting over Alb...,2,Yugoslavia
2,1946,Albania,2,British Attempts to Pass the Albanian Corfu Ch...,4,United Kingdom
3,1946,United Kingdom,3,British Attempts to Pass the Albanian Corfu Ch...,3,Albania
4,1946,United Kingdom,4,British Attempts to Pass the Albanian Corfu Ch...,3,Albania
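
`Series.map` returns `NaN` for any code that has no entry in the dictionary, so it can be worth checking for unmapped codes before replacing the columns. The sketch below uses toy stand-ins for the lookup and the events table. The helper name `unmapped_codes` and the code `999` are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the COW lookup and the MIE events (999 is a made-up code)
code_to_country = {2: 'United States of America', 200: 'United Kingdom'}
events = pd.DataFrame({'ccode1': [2, 200, 999], 'ccode2': [200, 2, 2]})


def unmapped_codes(codes: pd.Series, lookup: dict) -> list:
    """Return the sorted unique codes that have no entry in the lookup."""
    missing = codes[~codes.isin(list(lookup))]
    return sorted(missing.unique().tolist())


print(unmapped_codes(events['ccode1'], code_to_country))
print(unmapped_codes(events['ccode2'], code_to_country))
```

An empty list for both columns would indicate that every event row received a country name.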


### Clean up the AQUASTAT csv

In [4]:
# Copy the dataframe with only the columns we want 
aqua_df = aqua_csv[['Year', 'Area', 'Variable', 'Value', 'Unit']].copy()

# Rename column header
aqua_df = aqua_df.rename(columns={'Area': 'Country'})

# Replace country with dictionary values
aqua_df['Country'] = aqua_df['Country'].replace(code_to_country)
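
If the `Area` column already holds country names rather than COW codes, a `replace` keyed by numeric codes leaves it unchanged, and spelling differences between sources can still prevent rows from matching in later merges. One possible approach is a small, hand-maintained alias table. The sketch below is illustrative only: `NAME_ALIASES`, `harmonize_country`, and the alias entries are hypothetical and would need to be checked against the actual files.

```python
import pandas as pd

# Hypothetical aliases: source-specific spelling -> spelling used for merging
NAME_ALIASES = {
    'United States': 'United States of America',
    'Russian Federation': 'Russia',
}


def harmonize_country(names: pd.Series, aliases: dict = NAME_ALIASES) -> pd.Series:
    """Strip whitespace and map known alternate spellings to one canonical name."""
    return names.str.strip().replace(aliases)


sample = pd.Series(['United States ', 'Russian Federation', 'Zambia'])
print(harmonize_country(sample).tolist())
```

Applying the same function to the `Country` column of every source before merging keeps the join keys consistent.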

In [5]:
# Create dataframe for Human Development Index (max value = 1)
hdi_df = aqua_df.loc[aqua_df['Variable'] == 'Human Development Index (HDI)']
hdi_df = hdi_df.rename(columns={'Value': 'HDI'}).drop(columns=['Variable', 'Unit'])

In [6]:
# Create dataframe for Pop Density (ppl/km2)
pop_dens_df = aqua_df.loc[aqua_df['Variable'] == 'Population density']
pop_dens_df = pop_dens_df.rename(columns={'Value': 'Pop Density'}).drop(columns=['Variable', 'Unit'])

In [7]:
# Create dataframe for Wtr Stress %
wstress_df = aqua_df.loc[aqua_df['Variable'] == 'SDG 6.4.2. Water Stress']
wstress_df = wstress_df.rename(columns={'Value': 'Wtr Stress'}).drop(columns=['Variable', 'Unit'])

In [8]:
# Create dataframe for Total exploitable water resources (10^9 m3/yr)
tw_res_df = aqua_df.loc[aqua_df['Variable'] == 'Total exploitable water resources']
tw_res_df = tw_res_df.rename(columns={'Value': 'Tot Wtr Resource'}).drop(columns=['Variable', 'Unit'])

In [9]:
# Create dataframe for Total freshwater withdrawal (10^9 m3/yr)
tfw_wdrl_df = aqua_df.loc[aqua_df['Variable'] == 'Total freshwater withdrawal']
tfw_wdrl_df = tfw_wdrl_df.rename(columns={'Value': 'FreshW Wdrl'}).drop(columns=['Variable', 'Unit'])

In [10]:
# Create dataframe for Total Population (1000ppl)
tpop_df = aqua_df.loc[aqua_df['Variable'] == 'Total population']
tpop_df = tpop_df.rename(columns={'Value': 'Total Population'}).drop(columns=['Variable', 'Unit'])

In [11]:
# Create dataframe for Total water withdrawal (10^9 m3/yr)
twdrl_df = aqua_df.loc[aqua_df['Variable'] == 'Total water withdrawal']
twdrl_df = twdrl_df.rename(columns={'Value': 'Total Withdrawl'}).drop(columns=['Variable', 'Unit'])

In [12]:
# Create dataframe for Total water withdrawal per capita (m3/ppl/yr)
tw_wdrl_pc_df = aqua_df.loc[aqua_df['Variable'] == 'Total water withdrawal per capita']
tw_wdrl_pc_df = tw_wdrl_pc_df.rename(columns={'Value': 'Wtr Withdrawl'}).drop(columns=['Variable', 'Unit'])

In [13]:
# Make a list of dataframes
aqua_df_lst = [hdi_df, pop_dens_df, wstress_df, tw_res_df, tfw_wdrl_df, tpop_df, twdrl_df, tw_wdrl_pc_df]

# Create a clean dataframe for water data 
aqua_df_merge = reduce(lambda left,right: pd.merge(left,right,on=['Year', 'Country'],how='outer'), aqua_df_lst)

# Display the clean dataframe
aqua_df_merge.head()

Unnamed: 0,Year,Country,HDI,Pop Density,Wtr Stress,Tot Wtr Resource,FreshW Wdrl,Total Population,Total Withdrawl,Wtr Withdrawl
0,1967,Afghanistan,,15.332583,,,,10010.03,,
1,1967,Albania,,74.17193,,13.0,,2132.443,,
2,1967,Algeria,,5.414997,,7.9,,12897.115,,
3,1967,Andorra,,33.5,,,,15.745,,
4,1967,Angola,,4.674343,,,,5827.503,,
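
The filter, rename, and merge steps above can also be expressed as a single pivot from long to wide format. The following is a sketch on toy data shaped like `aqua_df`, not a drop-in replacement: the values are illustrative, and `aggfunc='first'` is an assumption that silently keeps the first value if a country-year-variable combination appears more than once.

```python
import pandas as pd

# Toy long-format frame shaped like aqua_df (values are illustrative)
long_df = pd.DataFrame({
    'Year': [2020, 2020, 2020, 2021],
    'Country': ['Zambia', 'Zambia', 'Yemen', 'Zambia'],
    'Variable': ['Total population', 'Population density',
                 'Total population', 'Total population'],
    'Value': [18927.7, 25.1, 32284.0, 19473.1],
})

# AQUASTAT variable name -> short column name used in this notebook
columns = {'Total population': 'Total Population',
           'Population density': 'Pop Density'}

wide = (long_df[long_df['Variable'].isin(list(columns))]
        .pivot_table(index=['Year', 'Country'], columns='Variable',
                     values='Value', aggfunc='first')
        .rename(columns=columns)
        .reset_index())
wide.columns.name = None  # drop the leftover 'Variable' axis label
print(wide)
```

Adding a variable then only requires a new entry in the `columns` dictionary instead of a new cell.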


### Clean up World Bank Group csv files

#### Clean up World Bank Group Political Stability csv

In [14]:
# Create stability dataframe from csv: set country as index, drop unused columns, stack the remaining columns, and reset the index
stability_df = stability_csv.set_index('Country Name').drop(columns=['Series Name', 'Series Code', 'Country Code']).stack().reset_index()

# Rename the column headers and replace null values ".." with 0
stability_df = stability_df.rename(columns={'Country Name': 'Country',
                                                  'level_1': 'Year',
                                                  0: 'Pol Stability'
                                                 }).replace('..', 0)

# Grab the first four characters of the year column and convert to an integer
stability_df['Year'] = stability_df['Year'].str[0:4].astype(int)

# Display the clean dataframe
stability_df.tail()

Unnamed: 0,Country,Year,Pol Stability
5131,Zimbabwe,2018,-0.721038401
5132,Zimbabwe,2019,-0.943286121
5133,Zimbabwe,2020,-1.052728176
5134,Zimbabwe,2021,-0.954425931
5135,Zimbabwe,2022,-0.884499907
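
Because the index ranges from roughly -2.5 to 2.5, a score of 0 is a legitimate value, so replacing `..` with 0 makes missing observations indistinguishable from average stability. The column may also remain text after the replacement. An alternative, sketched here on a toy column with the same missing-value marker, is to coerce the strings to floats so that `..` becomes `NaN`.

```python
import pandas as pd

# Toy column mimicking the export, where '..' marks a missing value
raw = pd.Series(['-0.954425931', '..', '0.052347746'], name='Pol Stability')

# Coerce to float; anything non-numeric (including '..') becomes NaN
scores = pd.to_numeric(raw, errors='coerce')

print(scores.dtype)
print(scores.isna().sum())
```

Downstream statistics such as means and correlations then skip the missing years instead of treating them as zeros.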


#### Clean up World Bank Group Precipitation csv

In [15]:
# Create precipitation dataframe from csv: set country as index, drop unused columns, stack the remaining columns, and reset the index
precip_df = precip_csv.set_index('Country Name').drop(columns=['Country Code', 'Indicator Name', 'Indicator Code']).stack().reset_index()

# Rename the column headers
precip_df = precip_df.rename(columns={'Country Name': 'Country',
                                                    'level_1': 'Year',
                                                    0: 'Precipitation'
                                                    })

# Convert to an integer
precip_df['Year'] = precip_df['Year'].astype(int)

# Display the clean dataframe
precip_df.tail()

Unnamed: 0,Country,Year,Precipitation
10081,Zimbabwe,2016,657.0
10082,Zimbabwe,2017,657.0
10083,Zimbabwe,2018,657.0
10084,Zimbabwe,2019,657.0
10085,Zimbabwe,2020,657.0


#### Clean up World Bank Group Freshwater Resources Per Capita csv

In [16]:
# Create a freshwater resources dataframe from csv: set country as index, drop unused columns, stack the remaining columns, and reset the index
wtr_rsrc_df = wtr_rsrc_csv.set_index('Country Name').drop(columns=['Country Code', 'Indicator Name', 'Indicator Code']).stack().reset_index()

# Rename the column headers
wtr_rsrc_df = wtr_rsrc_df.rename(columns={'Country Name': 'Country',
                                                    'level_1': 'Year',
                                                    0: 'FrshW / Cap'
                                                    })

# Convert to an integer
wtr_rsrc_df['Year'] = wtr_rsrc_df['Year'].astype(int)

# Display the clean dataframe
wtr_rsrc_df.tail()

Unnamed: 0,Country,Year,FrshW / Cap
12852,Zimbabwe,2016,848.284169
12853,Zimbabwe,2017,831.124402
12854,Zimbabwe,2018,814.499743
12855,Zimbabwe,2019,798.457375
12856,Zimbabwe,2020,782.403403


### Clean up International Monetary Fund Surface Temperature csv

In [17]:
# Create a surface temp dataframe from csv: set country as index, drop unused columns, stack the remaining columns, and reset the index
srfc_temp_df = srfc_temp_csv.set_index('Country').drop(columns=['ObjectId',
                                                                'ISO2',
                                                                'ISO3',
                                                                'Indicator',
                                                                'Unit',
                                                                'Source',
                                                                'CTS Code',
                                                                'CTS Name',
                                                                'CTS Full Descriptor',
                                                                ]).stack().reset_index()

# Rename the column headers
srfc_temp_df = srfc_temp_df.rename(columns={'level_1': 'Year',
                                                    0: 'Surf Temp'
                                                    })

# Convert to an integer
srfc_temp_df['Year'] = srfc_temp_df['Year'].astype(int)

# Display the clean dataframe
srfc_temp_df.tail()

Unnamed: 0,Country,Year,Surf Temp
13243,Zimbabwe,2019,1.199
13244,Zimbabwe,2020,0.581
13245,Zimbabwe,2021,0.109
13246,Zimbabwe,2022,-0.251
13247,Zimbabwe,2023,0.612
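
The political stability, precipitation, freshwater, and surface temperature cells all reshape one-column-per-year data into one row per country and year. A shared helper could reduce the repetition. The sketch below uses `melt` followed by `dropna`, which mirrors the way `stack` drops missing values by default. The function name `wide_to_long`, its parameters, and the toy input are assumptions made for illustration.

```python
import pandas as pd


def wide_to_long(df: pd.DataFrame, id_col: str, drop_cols: list,
                 value_name: str) -> pd.DataFrame:
    """Reshape one-column-per-year data into Country / Year / value rows."""
    long_df = (df.drop(columns=drop_cols)
                 .melt(id_vars=id_col, var_name='Year', value_name=value_name)
                 .dropna(subset=[value_name])
                 .rename(columns={id_col: 'Country'}))
    # Keep the leading four characters so labels such as '2018 [YR2018]' also work
    long_df['Year'] = long_df['Year'].astype(str).str[:4].astype(int)
    return long_df.sort_values(['Country', 'Year']).reset_index(drop=True)


# Toy input with illustrative values
toy = pd.DataFrame({'Country Name': ['Zambia', 'Yemen'],
                    'Country Code': ['ZMB', 'YEM'],
                    '2019': [1020.0, 167.0],
                    '2020': [1020.0, None]})
print(wide_to_long(toy, 'Country Name', ['Country Code'], 'Precipitation'))
```

Each source then needs only its identifier column, its list of unused columns, and the name of the value column.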


## Task 3: Data Integration: Create Master Dataframe and Export to csv for Analysis

### Integrate Geographic Data Sets

In [18]:
# Make a list of dataframes
geo_df_lst = [precip_df, wtr_rsrc_df, srfc_temp_df]

# Create a clean dataframe for geographic data
geo_df = reduce(lambda left,right: pd.merge(left,right,on=['Year', 'Country'],how='inner'), geo_df_lst)

# Replace country with dictionary values
geo_df['Country'] = geo_df['Country'].replace(code_to_country)

# Display the clean dataframe
geo_df.tail()

Unnamed: 0,Country,Year,Precipitation,FrshW / Cap,Surf Temp
7376,Zimbabwe,2016,657.0,848.284169,1.248
7377,Zimbabwe,2017,657.0,831.124402,0.243
7378,Zimbabwe,2018,657.0,814.499743,0.636
7379,Zimbabwe,2019,657.0,798.457375,1.199
7380,Zimbabwe,2020,657.0,782.403403,0.581
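
An inner merge silently drops any country-year that is missing from one of the inputs, including rows that fail to match only because a country is spelled differently. One way to audit this is an outer merge with `indicator=True`, sketched here on toy frames with illustrative values and a deliberately mismatched spelling.

```python
import pandas as pd

# Toy frames; 'Yemen' and 'Yemen, Rep.' illustrate a spelling mismatch
left = pd.DataFrame({'Country': ['Zambia', 'Yemen', 'Egypt'],
                     'Year': [2020, 2020, 2020],
                     'Precipitation': [1020.0, 167.0, 18.1]})
right = pd.DataFrame({'Country': ['Zambia', 'Yemen, Rep.'],
                      'Year': [2020, 2020],
                      'Surf Temp': [0.9, 1.4]})

# The _merge column records whether each key was found in left, right, or both
audit = pd.merge(left, right, on=['Year', 'Country'], how='outer', indicator=True)
print(audit['_merge'].value_counts())

# Rows that an inner merge would have discarded
unmatched = audit.loc[audit['_merge'] != 'both', ['Country', 'Year', '_merge']]
print(unmatched)
```

Reviewing the unmatched rows before switching back to `how='inner'` shows how much data the merge removes and why.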


### Integrate Conflicts, Water, Geo, and Political Stability Datasets

In [19]:
# Merge aquastat and conflicts dataframes by year and country
aqua_mie_df = pd.merge(aqua_df_merge, mie_df, how='left', on=['Year', 'Country'])

# Merge aquastat, conflicts and geo dataframes by year and country
aqua_mie_geo_df = pd.merge(aqua_mie_df, geo_df, how='left', on=['Year', 'Country'])

# Merge aquastat, conflicts, geo, and stability dataframes by year and country
master_df = pd.merge(aqua_mie_geo_df, stability_df, how='left', on=['Year', 'Country'])

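# Label rows without a matching conflict as 'No Conflict'
master_df['Conflict Name'] = master_df['Conflict Name'].fillna('No Conflict')
# Fill every remaining NaN with 0 (this also zero-fills missing measurements such as HDI)
master_df = master_df.fillna(0)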

# Display the final merged clean dataframe for analysis
master_df.tail()

Unnamed: 0,Year,Country,HDI,Pop Density,Wtr Stress,Tot Wtr Resource,FreshW Wdrl,Total Population,Total Withdrawl,Wtr Withdrawl,Event Number,Conflict Name,Hosility Level,Target Country,Precipitation,FrshW / Cap,Surf Temp,Pol Stability
26384,2021,Western Asia,0.0,0.0,62.92,0.0,169.22356,289733.124,180.411929,622.68313,0.0,No Conflict,0.0,0,0.0,0.0,0.0,0.0
26385,2021,World,0.0,0.0,18.55,0.0,3949.09152,7915610.122,3990.183502,504.090454,0.0,No Conflict,0.0,0,0.0,0.0,0.0,0.0
26386,2021,Yemen,0.0,62.468779,169.761905,0.0,3.565,32981.641,3.565,108.090437,0.0,No Conflict,0.0,0,0.0,0.0,0.0,0.0
26387,2021,Zambia,0.0,25.874125,2.835498,0.0,1.572,19473.125,1.572,80.726642,0.0,No Conflict,0.0,0,0.0,0.0,0.0,0.052347746
26388,2021,Zimbabwe,0.0,40.929276,46.09162,1.5,4.909679,15993.524,4.909679,306.979212,0.0,No Conflict,0.0,0,0.0,0.0,0.0,-0.954425931
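
In the output above, columns such as `HDI` and `Precipitation` show 0.0 where no measurement was available, which later analysis could mistake for a measured zero. A possible alternative is to fill only the columns that come from the conflict table and leave missing measurements as `NaN`. The sketch below uses toy rows with illustrative values and the column names from this notebook.

```python
import numpy as np
import pandas as pd

# Toy rows shaped like master_df before any filling (values are illustrative)
toy = pd.DataFrame({
    'Country': ['Zambia', 'Yemen'],
    'HDI': [np.nan, 0.46],
    'Wtr Stress': [2.8, np.nan],
    'Event Number': [np.nan, 3.0],
    'Conflict Name': [np.nan, 'Example confrontation'],
})

# Only columns that come from the conflict table get a "no event" default
conflict_defaults = {'Event Number': 0, 'Conflict Name': 'No Conflict'}
filled = toy.fillna(value=conflict_defaults)

print(filled)
```

Passing a dictionary to `fillna` restricts the fill to the listed columns, so `HDI` and `Wtr Stress` keep their `NaN` entries.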


In [20]:
# Write the new merged dataframe to a csv file
master_df.to_csv('Resources/master_data.csv')

## Task 4: Analysis and Visualizations

### Hypothesis: 
Water is one of the most critical resources on the planet for human survival. When the demand for critical resources, such as water, exceeds the supply, the number of conflicts increases.

### Questions:
- How does water scarcity correlate with the frequency and intensity of armed conflicts?
- What are the geographical patterns of conflict relative to water scarcity?
- Can changes in water availability predict increases in conflict events?
- Are certain types of conflicts more likely to occur in water-scarce regions?
- Are there other factors that affect the frequency of armed conflicts in water-scarce areas?
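
One possible starting point for the first question is to count conflict events per country and year and correlate the count with water stress. Because the master dataframe holds one row per event after the left merge, counting rows that are not labeled `No Conflict` gives an event count. The sketch below runs on synthetic rows, so its numbers say nothing about the real data. The column names follow this notebook, while `is_event`, `stress`, and `events` are names introduced for illustration.

```python
import pandas as pd

# Synthetic event-level rows shaped like master_df (not real data)
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A', 'B', 'C', 'C'],
    'Year': [2020] * 6,
    'Wtr Stress': [80.0, 80.0, 80.0, 10.0, 45.0, 45.0],
    'Conflict Name': ['X', 'X', 'Y', 'No Conflict', 'Z', 'Z'],
})

# Count conflict events per country-year; 'No Conflict' rows count as zero
toy['is_event'] = toy['Conflict Name'] != 'No Conflict'
per_year = (toy.groupby(['Country', 'Year'], as_index=False)
               .agg(stress=('Wtr Stress', 'first'), events=('is_event', 'sum')))

# Pearson correlation between water stress and event count
r = per_year['stress'].corr(per_year['events'])
print(per_year)
print(round(r, 3))
```

A correlation computed this way describes association only; other factors such as political stability would need to be considered before drawing conclusions.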

### Topic 1: Water Scarcity and Armed Conflicts

### Topic 2: Geographical Patterns Relative to Water Scarcity

### Topic 3: Conflict Types and Water Scarcity

### Topic 4: Other Factors Affecting Frequency of Armed Conflict

#### 4.1: Sociopolitical Factors

#### 4.2: Geographical Factors