# DATS 6103-11 Introduction to Data Mining - Prof. Nima Zahadat

# Project 3 - A Network Analysis of Trade and Security Cooperation Agreements

# Graham Hulsey
# December 15, 2020

In this project, I will use data from the [Correlates of War Database](https://correlatesofwar.org/data-sets) on international trade and defense cooperation agreements. Trade and alliance structure are key concepts in political science and international relations, but network analysis has only recently emerged as a tool to study them. I will use the Python package NetworkX to graph these networks and calculate statistics based on their structure, and then use the graphs and statistics to make inferences about the nature of trade, alliance structure, globalization, and conflict.

In [None]:
# Import necessary packages and turn off warnings
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
import imageio
from IPython.display import Image
import warnings
warnings.filterwarnings('ignore')

In [None]:
# Load the data:
# Data on trade available here: https://correlatesofwar.org/data-sets/bilateral-trade
trade = pd.read_csv("Dyadic_COW_4.0.csv")
# Data on defense cooperation agreements available here: https://correlatesofwar.org/data-sets/defense-cooperation-agreement-dataset
defense = pd.read_csv("DCAD-v1.0-main.csv")
# This file will be used to crosswalk numerical country codes with country names
codes = pd.read_csv("codes.csv")

First, there are just a few data cleaning tasks for the defense cooperation agreement dataset.

In [None]:
# Create column to merge with codes.csv and get country name. This cell block merges for one country party to 
# each defense cooperation agreement
defense.rename(columns={'ccode1':'CCode'},inplace=True)
defense2 = pd.merge(defense, codes, on='CCode')
defense2.rename(columns={'StateNme':'state1'},inplace=True)

In [None]:
# This cell block does the same as above for the other country party to each defense cooperation agreement
defense2.rename(columns={'CCode':'CCode1'}, inplace=True)
defense2.rename(columns={'ccode2':'CCode'}, inplace=True)
defense3 = pd.merge(defense2,codes,on='CCode')
defense3.rename(columns={'StateNme':'state2'}, inplace=True)
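
The two merge cells above attach a country name to each side of an agreement by renaming the key column back and forth. A more compact sketch uses a `CCode` to `StateNme` lookup with `Series.map`. It assumes each `CCode` should map to a single name (duplicates are dropped, keeping the first), and the toy frames `toy_codes` and `toy_defense` are made up to stand in for `codes` and `defense`. Because `pd.merge` performs an inner join by default, rows whose code has no match are dropped at the end to keep the same behavior.

```python
import pandas as pd

# Made-up stand-ins for codes.csv and DCAD-v1.0-main.csv (same column names as above)
toy_codes = pd.DataFrame({"CCode": [2, 20, 200],
                          "StateNme": ["United States of America", "Canada", "United Kingdom"]})
toy_defense = pd.DataFrame({"ccode1": [2, 2, 20],
                            "ccode2": [20, 200, 999],
                            "signYear": [1990, 1995, 2000]})

# Lookup table from numerical country code to country name (assumes one name per code)
code_to_name = toy_codes.drop_duplicates("CCode").set_index("CCode")["StateNme"]

named = toy_defense.assign(state1=toy_defense["ccode1"].map(code_to_name),
                           state2=toy_defense["ccode2"].map(code_to_name))
# pd.merge uses an inner join by default, so drop rows whose code had no match
named = named.dropna(subset=["state1", "state2"]).reset_index(drop=True)
print(named[["state1", "state2", "signYear"]])
```

In the notebook, the same lookup would be built from `codes` and applied to `defense`.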

Now, let's get an idea of how many countries we'll be working with.

In [None]:
# Get a list of each country in the trade data
all_countries = list(trade['importer1'].unique())
len(all_countries)

In [None]:
# Get a list of each country in the defense cooperation agreement data (on either side of an agreement)
defense_states = list(set(defense3['state1']).union(defense3['state2']))
len(defense_states)

In the trade data, there are 206 unique countries. We are interested in bilateral trade, that is, in dyads (or pairs) of countries. This means that there are ${206 \choose 2} = 21{,}115$ possible pairs of countries. For a network graph, this is simply too many pairs; it would be impossible to read anything off the graph. So, I construct four regions based on World Bank region definitions and the European Union (EU) member list.
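
The pair count is the binomial coefficient $n(n-1)/2$. A quick check in Python, together with the number of within-region pairs for the four region lists defined in the next cell (the sizes 20, 26, 44 and 27 are counted from those lists), shows how much the regional approach shrinks each graph:

```python
from math import comb

# Unordered pairs (dyads) among all countries in the trade data: n(n-1)/2
print(comb(206, 2))  # 21115

# Dyads within each region, using the number of names in each region list
region_sizes = {"Middle East": 20, "East Asia": 26, "Sub-Saharan Africa": 44, "European Union": 27}
region_pairs = {name: comb(size, 2) for name, size in region_sizes.items()}
print(region_pairs)
```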

In [None]:
# Create one list of country names for each region
# Middle East
middle_east = ['Algeria', 'Bahrain', 'Djibouti', 'Egypt', 'Iran', 'Iraq', 'Israel', 'Jordan', 'Kuwait',
              'Lebanon', 'Libya', 'Malta', 'Morocco', 'Oman', 'Qatar', 'Saudi Arabia', 'Syria', 'Tunisia',
              'United Arab Emirates', 'Yemen']

# East Asia
e_asia = ['Australia', 'Brunei', 'Cambodia', 'China', 'Fiji', 'Indonesia',
         'Japan', 'Kiribati', 'Korea', 'North Korea', 'Laos', 'Malaysia',
         'Mongolia', 'Myanmar', 'Nauru', 'New Zealand', 'Palau', 'Papua New Guinea',
         'Philippines', 'Singapore', 'Solomon Islands', 'Thailand', 'Tonga','Tuvalu',
         'Vanuatu', 'Vietnam']

# Sub-Saharan Africa
ss_africa = ['Angola', 'Benin', 'Botswana', 'Burkina Faso', 'Burundi',
             'Cameroon', 'Central African Republic', 'Chad', 'Comoros',
             'Democratic Republic of the Congo', 'Congo', 'Equatorial Guinea',
             'Eritrea', 'Ethiopia', 'Gabon', 'Gambia', 'Ghana', 'Guinea', 'Guinea-Bissau',
             'Kenya', 'Lesotho', 'Liberia', 'Madagascar', 'Malawi', 'Mali', 'Mauritania',
             'Mauritius', 'Mozambique', 'Namibia', 'Niger', 'Nigeria', 'Rwanda', 'Senegal',
             'Seychelles', 'Sierra Leone', 'Somalia', 'South Africa', 'South Sudan',
             'Sudan', 'Tanzania', 'Togo', 'Uganda', 'Zambia', 'Zimbabwe']

# European Union
eu = ['Austria', 'Belgium', 'Bulgaria', 'Croatia', 'Cyprus', 'Czech Republic',
     'Denmark', 'Estonia', 'Finland', 'France', 'Germany', 'Greece','Hungary',
     'Ireland', 'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta', 'Netherlands',
     'Poland', 'Portugal', 'Romania', 'Slovakia', 'Slovenia', 'Spain',
     'Sweden']
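
Two details of these lists are easy to miss. Malta appears in both `middle_east` and `eu`, so it is a node in both regional graphs. And because the plotting functions look countries up by exact string match, any name spelled differently from the Correlates of War data silently becomes an isolated node. A small check, using the hypothetical helpers `missing_names` and `shared_names` on made-up data:

```python
def missing_names(region, known_names):
    """Region entries that never appear among the country names in the data."""
    known = set(known_names)
    return sorted(country for country in region if country not in known)

def shared_names(region_a, region_b):
    """Countries listed in both regions."""
    return sorted(set(region_a) & set(region_b))

# Toy example; in the notebook one could pass (eu, all_countries) or (middle_east, eu)
toy_region = ["Algeria", "Egypt", "Malta", "Yemen"]
toy_known = ["Algeria", "Egypt", "Malta", "France"]
print(missing_names(toy_region, toy_known))
print(shared_names(toy_region, ["France", "Malta"]))
```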

Before we begin plotting, there are three key concepts from network science that I will use. The first is called an adjacency matrix.

An adjacency matrix is a compact notation that describes the relations, or edges, in a graph. Each row of the matrix refers to one unit of observation (in this case, a country), and the columns list the same units in the same order. The element for any pair of countries is 1 if the two countries are connected and 0 if they are not. For an undirected graph, such as the ones in this project, the matrix is symmetric.
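
As a minimal illustration with made-up countries, the sketch below fills a pandas adjacency matrix from a list of connected pairs, setting both cells for each pair so the matrix stays symmetric, and converts it into a NetworkX graph with `nx.from_pandas_adjacency`, the same call the plotting functions use.

```python
import pandas as pd
import networkx as nx

# Made-up countries and connected pairs, for illustration only
toy_countries = ["A", "B", "C", "D"]
toy_pairs = [("A", "B"), ("B", "C")]

# Start from all zeros, then mark each connected pair with a 1
toy_adjacency = pd.DataFrame(0, index=toy_countries, columns=toy_countries)
for i, j in toy_pairs:
    toy_adjacency.loc[i, j] = 1  # undirected: fill both cells so the matrix is symmetric
    toy_adjacency.loc[j, i] = 1

toy_graph = nx.from_pandas_adjacency(toy_adjacency)
print(toy_adjacency)
print(sorted(toy_graph.edges()))  # "D" is still a node, just with no edges
```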

The second concept is Eigenvector Centrality, which measures how "central" each node is to the entire network. Eigenvector Centrality can be calculated from the adjacency matrix using the following formula:

$x_i = \frac{1}{\lambda} \sum_k a_{k,i} x_{k}$

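where $a_{k,i}$ is the element of the adjacency matrix corresponding to countries $k$ and $i$, $x_{k}$ is the Eigenvector Centrality of the $k^{th}$ node, and $\lambda$ is the largest eigenvalue of the adjacency matrix. In other words, a country is central when it is connected to other central countries.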
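
To make the formula concrete, the sketch below computes Eigenvector Centrality directly from a small made-up graph: it takes the eigenvector of the largest eigenvalue of the adjacency matrix, checks that it satisfies $x_i = \frac{1}{\lambda} \sum_k a_{k,i} x_k$, and compares it with NetworkX's power-iteration function `nx.eigenvector_centrality`, which also scales the result to unit Euclidean length. The graph is an assumption for illustration only.

```python
import numpy as np
import networkx as nx

# Small made-up connected graph
G = nx.Graph([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
nodes = list(G.nodes())
A = nx.to_numpy_array(G, nodelist=nodes)  # symmetric adjacency matrix a_{k,i}

# For a symmetric matrix, eigh returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(A)
lam = eigenvalues[-1]          # largest eigenvalue, the lambda in the formula
x = eigenvectors[:, -1]
x = x / (np.sign(x.sum()) * np.linalg.norm(x))  # make entries positive, unit length

# Check the defining relation x_i = (1/lambda) * sum_k a_{k,i} x_k
assert np.allclose(A.T @ x / lam, x)

# Compare with NetworkX's power-iteration implementation
nx_scores = nx.eigenvector_centrality(G)
print({n: round(v, 3) for n, v in zip(nodes, x)})
print({n: round(nx_scores[n], 3) for n in nodes})
```

Node "A", linked to every other node, gets the highest score; "D", linked only to "A", gets the lowest.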

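The third concept is Average Node Connectedness, which I compute with NetworkX's `average_node_connectivity`. For each pair of nodes, the local node connectivity is the number of paths between them that share no intermediate nodes; `average_node_connectivity` averages this quantity over all pairs of nodes. In this project, I normalize the result by $N - 1$, where $N$ is the total number of nodes in the network. Since $N - 1$ is the largest possible value (reached when every pair of countries is directly connected), the normalized Average Node Connectedness ranges from 0 (no connections) to 1 (a fully connected graph).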
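
A minimal sketch of the normalization, using the same calculation as the plotting functions (`nx.average_node_connectivity` divided by N - 1) on three small standard graphs; the helper name `normalized_avg_connectivity` is hypothetical:

```python
import networkx as nx

def normalized_avg_connectivity(G):
    """Average node connectivity divided by N - 1, so a complete graph scores 1."""
    return nx.average_node_connectivity(G) / (G.number_of_nodes() - 1)

complete = nx.complete_graph(5)   # every pair directly linked
path = nx.path_graph(3)           # 0 - 1 - 2: one independent path per pair
empty = nx.empty_graph(3)         # no links at all

print(normalized_avg_connectivity(complete))  # 1.0
print(normalized_avg_connectivity(path))      # 0.5
print(normalized_avg_connectivity(empty))     # 0.0
```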

In [None]:
# Create a function that will plot one year of trade data for a region
def plot_trade_single(year, region):
    # Get plot title corresponding to selected region
    if region == middle_east:
        region_title = "Middle East"
    elif region == eu:
        region_title = "European Union"
    elif region == ss_africa:
        region_title = "Sub-Saharan Africa"
    elif region == e_asia:
        region_title = "East Asia"
    # Get a list of all countries in a region
    countries = region
    
    # Create an empty matrix that will store total volume of trade between two countries
    df = pd.DataFrame(0,index=countries,columns=countries)
    # Create an empty matrix that will become the adjacency matrix
    adjacency = pd.DataFrame(0, index=countries, columns=countries)

    # Select the chosen year
    temp = trade[trade['year'] == year]
    
    # Calculate the total volume of trade between two countries i and j and add into df
    # Skip if i = j as we ignore self-relations
    for i in countries:
        for j in countries:
            if i != j: 
                pair = temp[(temp['importer1'] == i) & (temp['importer2'] == j)]
                if len(pair) > 0:
                    float1 = float(pair['flow1'].iloc[0])
                    float2 = float(pair['flow2'].iloc[0])
                    df.loc[i,j] = float1 + float2
                    df.loc[j,i] = float1 + float2
                else:
                    continue
            else:
                continue
    # Remove any negative values (missing data)
    df[df < 0 ] = 0
    # Create adjacency matrix   
    for i in countries:
        for j in countries:
            if df.loc[i,j] > 0:
                adjacency.loc[i, j] = 1
                adjacency.loc[j, i] = 1
            elif df.loc[i, j] <= 0:
                adjacency.loc[i, j] = 0
                adjacency.loc[j, i] = 0
    # Create a network from the adjacency matrix; this network will be used to calculate Average Node Connectivity           
    temp_network = nx.from_pandas_adjacency(adjacency)
    avg_connectivity = (float(nx.average_node_connectivity(temp_network) / float(len(countries) - 1)))
    # Create a new graph G for plotting
    G = nx.Graph()
    # Add one node per country in region
    for country in countries:
        G.add_node(country)
    # Add an edge between countries i and j if their corresponding element in the adjacency matrix is 1
    for i in countries:
        for j in countries:
            if adjacency.loc[i,j] == 1:
                G.add_edge(i, j, alpha = 0.5, weight = df.loc[i,j])
    
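    # Calculate eigenvector centrality for each node (undefined when the graph has no edges)
    if G.number_of_edges() > 0:
        centrality = nx.eigenvector_centrality_numpy(G)
    else:
        centrality = {country: 0 for country in countries}
    # Rescale eigenvector centrality for plotting; clip small negative round-off values at 0
    ec = [(max(centrality[country], 0) * 1500) ** 1.2 for country in countries]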
    # Create a list of tuples of edges to pass to plot
    edges = []
    for i in countries:
        for j in countries:
            if adjacency.loc[i,j] == 1:
                edge = (i,j)
                edges.append(edge)
    # Total volume of trade for each edge, divided by 1000; optional, pass width=weights
    # to draw_networkx_edges to scale edge widths by trade volume
    weights = [df.loc[i, j] / 1000 for i, j in edges]
    # Label each node as country name
    labels={}
    for i in countries:
        labels[i] = i
    # Draw network G; a fixed seed keeps nodes in the same positions with each draw
    pos = nx.spring_layout(countries, seed=1)
    plt.figure(figsize=(25,18))
    plt.title("{0} Trade - {1}\nAverage Node Connectivity: {2}".format(region_title,year, round(avg_connectivity, 3)))
    # Draw nodes, sized by eigenvector centrality
    nx.draw_networkx_nodes(G,pos,
                           nodelist=countries,
                           node_color='lightblue',
                           node_size=ec,
                           alpha=1)
    # Draw edges
    nx.draw_networkx_edges(G,pos,
                           edgelist=edges,
                           alpha=0.2,edge_color='gray')
    # Draw labels
    nx.draw_networkx_labels(G,pos,labels,font_size=14)
    # Save plot as png
    plt.savefig("trade_{0}_{1}.png".format(region_title,year),bbox_inches="tight")
    # Display the figure
    plt.show()


Let's call this function and see what we get.

In [None]:
plot_trade_single(1970, ss_africa)
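
The nested loops in `plot_trade_single` and `plot_trade_time` filter the whole year's table once for every ordered pair of countries, which can be slow for the larger regions. A sketch of an alternative, assuming the columns `year`, `importer1`, `importer2`, `flow1` and `flow2` used above: filter the year and region once, then fill the matrix from the matching rows only. It keeps the notebook's rule of adding both flows and then setting negative totals to 0; note that this rule can drop or understate a dyad when only one of its flows is missing, so clipping each flow separately before adding is a possible alternative. `trade_matrices` and `toy_trade` are hypothetical names, and the rows are made up.

```python
import pandas as pd

# Made-up rows in the layout read from Dyadic_COW_4.0.csv: one row per dyad and year,
# flow1/flow2 are the two directional flows, and negative values mark missing data
toy_trade = pd.DataFrame({
    "year":      [1970, 1970, 1970, 1971],
    "importer1": ["Kenya", "Kenya", "Uganda", "Kenya"],
    "importer2": ["Uganda", "Tanzania", "Tanzania", "Uganda"],
    "flow1":     [10.0, -9.0, 2.5, 12.0],
    "flow2":     [5.0, 3.0, 1.5, 4.0],
})

def trade_matrices(trade, year, countries):
    """Return (volume matrix, adjacency matrix) for one year and one region."""
    rows = trade[(trade["year"] == year)
                 & trade["importer1"].isin(countries)
                 & trade["importer2"].isin(countries)
                 & (trade["importer1"] != trade["importer2"])]
    # Same rule as the nested loops: add both flows, then zero out negative totals
    volume = (rows["flow1"] + rows["flow2"]).clip(lower=0)
    df = pd.DataFrame(0.0, index=countries, columns=countries)
    for i, j, v in zip(rows["importer1"], rows["importer2"], volume):
        df.loc[i, j] = v
        df.loc[j, i] = v
    adjacency = (df > 0).astype(int)
    return df, adjacency

df_1970, adj_1970 = trade_matrices(toy_trade, 1970, ["Kenya", "Tanzania", "Uganda"])
print(df_1970)
print(adj_1970)
```

Inside the plotting functions, `df, adjacency = trade_matrices(trade, year, countries)` could stand in for the two nested-loop blocks.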

In [None]:
# Define a plotting function that creates one graph per year and combines them into a gif
# This function is the same as the above function except where noted
def plot_trade_time(startyear, endyear, region):
    
    if region == middle_east:
        region_title = "Middle East"
    elif region == eu:
        region_title = "European Union"
    elif region == ss_africa:
        region_title = "Sub-Saharan Africa"
    elif region == e_asia:
        region_title = "East Asia"
    
    countries = region
    # Create the adjacency matrix once; every cell is overwritten from each year's trade below,
    # so each frame shows that year's trade links only
    adjacency = pd.DataFrame(0, index=countries, columns=countries)
    # Create list of plot images and file names
    images = []
    file_names = []
    
    for year in range(startyear,endyear+1):
        
        df = pd.DataFrame(0,index=countries,columns=countries)

        temp = trade[trade['year'] == year]

        
        for i in countries:
            for j in countries:
                if i != j:
                    pair = temp[(temp['importer1'] == i) & (temp['importer2'] == j)]
                    if len(pair) > 0:
                        float1 = float(pair['flow1'].iloc[0])
                        float2 = float(pair['flow2'].iloc[0])
                        df.loc[i,j] = float1 + float2
                        df.loc[j,i] = float1 + float2
                    else:
                        continue
                else:
                    continue

        df[df < 0 ] = 0

        for i in countries:
            for j in countries:
                if df.loc[i,j] > 0:
                    adjacency.loc[i, j] = 1
                    adjacency.loc[j, i] = 1
                elif df.loc[i, j] <= 0:
                    adjacency.loc[i, j] = 0
                    adjacency.loc[j, i] = 0

        temp_network = nx.from_pandas_adjacency(adjacency)
        avg_connectivity = (float(nx.average_node_connectivity(temp_network) / float(len(countries) - 1)))

        G = nx.Graph()
        for country in countries:
            G.add_node(country)

        for i in countries:
            for j in countries:
                if adjacency.loc[i,j] == 1:
                    G.add_edge(i, j, alpha = 0.5, weight = df.loc[i,j])

        # If the number of edges in G is 0 in some year, set all eigenvector centrality measures to 0
        if G.number_of_edges() > 0:
            centrality = nx.eigenvector_centrality_numpy(temp_network)
        else:
            centrality = {country: 0 for country in countries}
        # Rescale for plotting; clip small negative round-off values at 0
        ec = [(max(centrality[country], 0) * 1500) ** 1.2 for country in countries]
        
        
        edges = []
        for i in countries:
            for j in countries:
                if adjacency.loc[i,j] == 1:
                    edge = (i,j)
                    edges.append(edge)

        # Total volume of trade for each edge, divided by 1000; optional, pass width=weights
        # to draw_networkx_edges to scale edge widths by trade volume
        weights = [df.loc[i, j] / 1000 for i, j in edges]

        labels={}
        for i in countries:
            labels[i] = i

        pos = nx.spring_layout(countries, seed=1)
        plt.figure(figsize=(25,18))
        plt.title("{0} Trade - {1}\nAverage Node Connectivity: {2}".format(region_title,year, round(avg_connectivity, 3)))
        nx.draw_networkx_nodes(G,pos,
                               nodelist=countries,
                               node_color='lightblue',
                               node_size=ec,
                               alpha=1)

        nx.draw_networkx_edges(G,pos,
                               edgelist=edges,
                               alpha=0.2,edge_color='gray')

        nx.draw_networkx_labels(G,pos,labels,font_size=14)
        
        plot_title = "trade_{0}_{1}.png".format(region_title,year)
        file_names.append(plot_title)
        plt.savefig("trade_{0}_{1}.png".format(region_title,year),bbox_inches="tight")
        plt.close()
        images.append(imageio.imread(plot_title))
        
        
        G.clear()
    # Number of frames (years) plotted, counting both endpoints
    years = endyear - startyear + 1
    # Calculate time for each year to be displayed in gif so that the total gif runs for 30 seconds
    time = 30/years
    # Combine all images (from list images) into a gif and save
    imageio.mimsave('trade_{0}_all_years.gif'.format(region_title), images, duration = time)


Now let's call the plotting function and load the GIF for each of the four regions.

In [None]:
# European Union
plot_trade_time(1870,2014,eu)
Image(filename='trade_European Union_all_years.gif')

In [None]:
# Sub-Saharan Africa
plot_trade_time(1950,2014,ss_africa)
Image(filename='trade_Sub-Saharan Africa_all_years.gif')

In [None]:
# East Asia
plot_trade_time(1920,2014,e_asia)
Image(filename='trade_East Asia_all_years.gif')

In [None]:
# Middle East
plot_trade_time(1950,2014,middle_east)
Image(filename='trade_Middle East_all_years.gif')
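
Each GIF frame reports the normalized Average Node Connectivity only in its title, which makes the trend hard to compare across years. A sketch of collecting the same statistic into a pandas Series, one value per year, so it can be printed or plotted as a single line. `connectivity_series` is a hypothetical helper, `edges_by_year` is an assumed input mapping each year to that year's list of linked pairs (for example, built from each year's adjacency matrix), and the toy data are made up.

```python
import networkx as nx
import pandas as pd

def connectivity_series(edges_by_year, nodes):
    """Normalized average node connectivity for each year's snapshot of edges."""
    values = {}
    for year, edges in sorted(edges_by_year.items()):
        G = nx.Graph()
        G.add_nodes_from(nodes)   # keep every country, even with no links that year
        G.add_edges_from(edges)
        values[year] = nx.average_node_connectivity(G) / (len(nodes) - 1)
    return pd.Series(values, name="avg_node_connectivity")

toy_nodes = ["A", "B", "C"]
toy_edges = {
    1950: [("A", "B")],
    1951: [("A", "B"), ("B", "C")],
    1952: [("A", "B"), ("B", "C"), ("A", "C")],
}
series = connectivity_series(toy_edges, toy_nodes)
print(series)
```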

Let's do the same thing, but instead of trade, use the data on defense cooperation agreements.

In [None]:
# This plotting function is the same as the single year plotting function for trade, except where noted
def plot_defense_single(year, region):
    
    if region == middle_east:
        region_title = "Middle East"
    elif region == eu:
        region_title = "European Union"
    elif region == ss_africa:
        region_title = "Sub-Saharan Africa"
    elif region == e_asia:
        region_title = "East Asia"
    
    countries = region
    
    df = pd.DataFrame(0,index=countries,columns=countries)
    adjacency = pd.DataFrame(0, index=countries, columns=countries)

    
    temp = defense3[defense3['signYear'] == year]
    
    # In this loop, get counts for each pair of state on both sides of the agreement, ie 'state1' and 'state2'
    for i in countries:
        for j in countries:
            if i != j:
                one_way = len(temp.loc[(temp['state1'] == i) & (temp['state2'] == j)])
                other_way = len(temp.loc[(temp['state1'] == j) & (temp['state2'] == i)])
                total = one_way + other_way
                if total > 0:
                    # Divide by 2 to avoid double counting
                    df.loc[i,j] = int(total / 2)
                    df.loc[j,i] = int(total / 2)
                else:
                    continue
            else:
                continue
    
    
    for i in countries:
        for j in countries:
            if df.loc[i,j] > 0:
                adjacency.loc[i, j] = 1
                adjacency.loc[j, i] = 1
            elif df.loc[i, j] <= 0:
                adjacency.loc[i, j] = 0
                adjacency.loc[j, i] = 0
                
    temp_network = nx.from_pandas_adjacency(adjacency)
    avg_connectivity = (float(nx.average_node_connectivity(temp_network) / float(len(countries) - 1)))
   
    G = nx.Graph()
    for country in countries:
        G.add_node(country)
    
    for i in countries:
        for j in countries:
            if adjacency.loc[i,j] == 1:
                G.add_edge(i, j, alpha = 0.5, weight = df.loc[i,j])
    
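    # Eigenvector centrality is undefined when the graph has no edges; use 0 for every node then
    if G.number_of_edges() > 0:
        centrality = nx.eigenvector_centrality_numpy(G)
    else:
        centrality = {country: 0 for country in countries}
    # Rescale for plotting; clip small negative round-off values at 0
    ec = [(max(centrality[country], 0) * 1500) ** 1.2 for country in countries]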
    
    edges = []
    for i in countries:
        for j in countries:
            if adjacency.loc[i,j] == 1:
                edge = (i,j)
                edges.append(edge)
                
    # Number of agreements for each edge; optional, pass width=weights to draw_networkx_edges to use it
    weights = [df.loc[i, j] for i, j in edges]
    
    labels={}
    for i in countries:
        labels[i] = i
        
    pos = nx.spring_layout(countries, seed=1)
    plt.figure(figsize=(25,18))
    plt.title("{0} Defense Cooperation Agreements - {1}\nAverage Node Connectivity: {2}".format(region_title,year, round(avg_connectivity, 3)))
    nx.draw_networkx_nodes(G,pos,
                           nodelist=countries,
                           node_color='lightblue',
                           node_size=ec,
                           alpha=1)

    nx.draw_networkx_edges(G,pos,
                           edgelist=edges,
                           alpha=0.2,edge_color='gray')

    nx.draw_networkx_labels(G,pos,labels,font_size=14)

    plt.savefig("dca_{0}_{1}.png".format(region_title,year),bbox_inches="tight")
    plt.show()
    

Now let's call this plotting function and take a look.

In [None]:
plot_defense_single(2000,e_asia)
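
The defense functions count the rows for a pair in both orientations and store `int(total / 2)`. That halving is only right if the file lists each agreement once per direction; if an agreement appears in a single row, `int(1 / 2)` is 0 and the pair gets no edge. A small sketch for checking which layout the data use, with the hypothetical helpers `rows_per_dyad` and `listed_both_ways` and made-up rows that use the notebook's column names:

```python
from collections import Counter
import pandas as pd

# Made-up rows with the column names used by plot_defense_single
toy_dca = pd.DataFrame({
    "state1": ["Japan", "Australia", "Japan"],
    "state2": ["Australia", "Japan", "Philippines"],
    "signYear": [2000, 2000, 2000],
})

def rows_per_dyad(dca, year):
    """Count rows for each unordered pair of states signing in `year`."""
    rows = dca[dca["signYear"] == year]
    return Counter(tuple(sorted(pair)) for pair in zip(rows["state1"], rows["state2"]))

def listed_both_ways(dca):
    """True if every (state1, state2) row has a mirrored (state2, state1) row."""
    ordered = set(zip(dca["state1"], dca["state2"]))
    return all((b, a) in ordered for a, b in ordered)

print(rows_per_dyad(toy_dca, 2000))
print(listed_both_ways(toy_dca))
```

If `listed_both_ways(defense3)` returned False on the real data, dividing by 2 would undercount agreements.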

As before, let's look at how these networks evolve over time.

In [None]:
# This plotting function is the same as the trade plotting function over time, adapted for defense cooperation
# agreement data, except where noted
def plot_defense_time(startyear, endyear, region):
    
    if region == middle_east:
        region_title = "Middle East"
    elif region == eu:
        region_title = "European Union"
    elif region == ss_africa:
        region_title = "Sub-Saharan Africa"
    elif region == e_asia:
        region_title = "East Asia"
    
    countries = region
    
    adjacency = pd.DataFrame(0, index=countries, columns=countries)
    
    images = []
    file_names = []
    
    for year in range(startyear,endyear+1):
        
        df = pd.DataFrame(0,index=countries,columns=countries)

        temp = defense3[defense3['signYear'] == year]

        
        for i in countries:
            for j in countries:
                if i != j:
                    one_way = len(temp.loc[(temp['state1'] == i) & (temp['state2'] == j)])
                    other_way = len(temp.loc[(temp['state1'] == j) & (temp['state2'] == i)])
                    total = one_way + other_way
                    if total > 0:
                        df.loc[i,j] = int(total / 2)
                        df.loc[j,i] = int(total / 2)
                    else:
                        continue
                else:
                    continue

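        # Unlike the trade version, edges are only ever added: once two countries sign an agreement,
        # they stay linked in every later frame
        for i in countries:
            for j in countries:
                if adjacency.loc[i,j] == 0:
                    if df.loc[i,j] > 0:
                        adjacency.loc[i, j] = 1
                        adjacency.loc[j, i] = 1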

        temp_network = nx.from_pandas_adjacency(adjacency)
        avg_connectivity = (float(nx.average_node_connectivity(temp_network) / float(len(countries) - 1)))

        G = nx.Graph()
        for country in countries:
            G.add_node(country)

        for i in countries:
            for j in countries:
                if adjacency.loc[i,j] == 1:
                    G.add_edge(i, j, alpha = 0.5, weight = df.loc[i,j])
       
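        # If the number of edges in G is 0 in some year, set all eigenvector centrality measures to 0
        if G.number_of_edges() > 0:
            centrality = nx.eigenvector_centrality_numpy(temp_network)
        else:
            centrality = {country: 0 for country in countries}
        # Rescale for plotting; clip small negative round-off values at 0
        ec = [(max(centrality[country], 0) * 1500) ** 1.2 for country in countries]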
    
        edges = []
        for i in countries:
            for j in countries:
                if adjacency.loc[i,j] == 1:
                    edge = (i,j)
                    edges.append(edge)

        # Number of agreements for each edge; optional, pass width=weights to draw_networkx_edges to use it
        weights = [df.loc[i, j] for i, j in edges]

        labels={}
        for i in countries:
            labels[i] = i

        pos = nx.spring_layout(countries, seed=1)
        plt.figure(figsize=(25,18))
        plt.title("{0} Defense Cooperation Agreements - {1}\nAverage Node Connectivity: {2}".format(region_title,year, round(avg_connectivity, 3)))
        nx.draw_networkx_nodes(G,pos,
                               nodelist=countries,
                               node_color='lightblue',
                               node_size=ec,
                               alpha=1)

        nx.draw_networkx_edges(G,pos,
                               edgelist=edges,
                               alpha=0.2,edge_color='gray')

        nx.draw_networkx_labels(G,pos,labels,font_size=14)
        
        plot_title = "dca_{0}_{1}.png".format(region_title,year)
        file_names.append(plot_title)
        plt.savefig("dca_{0}_{1}.png".format(region_title,year),bbox_inches="tight")
        plt.close()
        images.append(imageio.imread(plot_title))
        
        
        G.clear()
    
    # Number of frames (years) plotted, counting both endpoints
    years = endyear - startyear + 1
    # Normalize each frame so it takes 15 seconds to watch the entire gif
    time = 15/years
    imageio.mimsave('dca_{0}_all_years.gif'.format(region_title), images, duration = time)


Now let's plot each region over time, from 1980 to 2010.

In [None]:
# European Union
plot_defense_time(1980,2010,eu)
Image(filename="dca_European Union_all_years.gif")

In [None]:
# Sub-Saharan Africa
plot_defense_time(1980,2010,ss_africa)
Image(filename="dca_Sub-Saharan Africa_all_years.gif")

In [None]:
# Middle East
plot_defense_time(1980,2010,middle_east)
Image(filename="dca_Middle East_all_years.gif")

In [None]:
# East Asia
plot_defense_time(1980,2010,e_asia)
Image(filename="dca_East Asia_all_years.gif")
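
Unlike the trade GIFs, the defense GIFs never remove an edge: `plot_defense_time` only ever sets adjacency cells to 1, so each frame shows every agreement signed since the start year, whether or not it is still in force. A sketch of the same cumulative construction with NetworkX, using the hypothetical generator `cumulative_graphs` and made-up signing data, makes the difference explicit:

```python
import networkx as nx

def cumulative_graphs(signed_by_year, nodes):
    """Yield (year, graph) pairs; each graph holds every edge signed up to that year."""
    G = nx.Graph()
    G.add_nodes_from(nodes)
    for year in sorted(signed_by_year):
        G.add_edges_from(signed_by_year[year])  # edges are added, never removed
        yield year, G.copy()

toy_nodes = ["A", "B", "C", "D"]
toy_signed = {1980: [("A", "B")], 1981: [], 1982: [("C", "D"), ("B", "C")]}

history = list(cumulative_graphs(toy_signed, toy_nodes))
for year, G in history:
    score = nx.average_node_connectivity(G) / (len(toy_nodes) - 1)
    print(year, G.number_of_edges(), round(score, 3))
```

To show only the agreements signed in each year instead, each frame would be built from that year's edges alone, as in the trade version.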

# Conclusions

There are several important conclusions we can draw from this analysis. First, economic and security networks exhibit vastly different levels of interconnectedness across regions and over time. Overall, globalization has led to more interconnectedness, but the trend is not absolute. More research into network formation and network collapse would add qualitative understanding to these patterns.

Second, large conflicts such as WWI and WWII are both preceded by a collapse of trade networks and cause trade networks to collapse further. This is roughly in line with the theory of neoliberal institutionalism, which argues that trade moderates conflict between states. However, trade is itself a political decision: if states anticipate conflict in the future, they will be less likely to trade and more likely to sever trade with potential adversaries.

Third, defense cooperation agreements can have cascading effects. It is commonly held that the onset of WWI was driven by a system of interlocking alliances that were triggered one after another. At the same time, alliances can deter attacks from states that do not wish to fight an entire alliance. Therefore, alliances may decrease the probability of war by adding to deterrence while also increasing the probability that a conflict cascades through the network should one alliance be called upon.

Lastly, the importance of some individual countries shifts over time. Trade deals and other political agreements, such as the formation of the EU, tend to "level the playing field" on a network. Countries that enter trade or other agreements give up some of their influence and power in exchange for wealth or security. For countries that have little influence, entering trade deals and political agreements can effectively increase their influence and power.

This project is available at [Github](https://github.com/grahamh39/DATS6103-Proect-3-Graham-Hulsey/), [Github Pages](https://grahamh39.github.io/), and [Zenodo](https://zenodo.org/record/4321990). 