# Project Two - Global Terrorism Data
---
Jeff Shamp, John Kellogg, Grace Han - CUNY MSDS 620 - Spring 2021

## Purpose

In this project, we were asked to:

- Identify a large 2-mode network dataset. You can start with a dataset in a repository.
  - Your data should meet the criterion that it consists of ties between, and not within, two (or more) distinct groups.
- Reduce the size of the network using a method such as the island method described in chapter 4 of social network analysis.
- What can you infer about each of the distinct groups?

We chose to use the same data as in Project 1: the Global Terrorism Database (GTD).

## Data

The data comes from the Global Terrorism Database (GTD) at the University of Maryland. For more information about the database, please visit the [UMD GTD Site](https://www.start.umd.edu/data-tools/global-terrorism-database-gtd).

The GTD “is an open-source database including information on domestic and international terrorist attacks around the world from 1970 through 2019, and now includes more than 200,000 cases”. For each event, the team gathers as much data as possible, including:

> - Date and location of the incident
> - The weapons used and nature of the target
> - The number of casualties
> - When identifiable – the group or individual responsible

We subset this database to only the columns needed for this project and restrict it to incidents from 1985 onward to make the dataset more manageable.
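A minimal sketch of that subsetting step, assuming the raw GTD export uses the codebook columns `iyear` (incident year), `country_txt`, and `gname`. The CSV loaded below was already trimmed in Project 1, so the helper name `subset_gtd` and the toy rows are hypothetical and only illustrate the filter:

```python
import pandas as pd


def subset_gtd(raw: pd.DataFrame, start_year: int = 1985) -> pd.DataFrame:
    """Hypothetical helper: keep incidents from start_year on and only the columns used here.

    Assumes the GTD codebook column names (iyear, country_txt, gname).
    """
    recent = raw[raw["iyear"] >= start_year]
    return recent[["iyear", "country_txt", "gname"]].reset_index(drop=True)


# Tiny made-up frame standing in for the full GTD export.
raw = pd.DataFrame({
    "iyear": [1979, 1985, 2014],
    "country_txt": ["Lebanon", "Lebanon", "Iraq"],
    "gname": ["Unknown", "Hezbollah", "Unknown"],
    "nkill": [0, 3, 12],  # an extra column that the subset drops
})
subset = subset_gtd(raw)
print(subset)
```

Filtering rows before selecting columns keeps the year column available for the comparison; drop `iyear` afterwards if only the country/group pairs are needed.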

In [1]:
import pandas as pd
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import seaborn as sns
from pyvis.network import Network
import networkx.algorithms.bipartite as bipartite

import warnings
warnings.filterwarnings('ignore')




In [2]:
df=pd.read_csv('./global_terror_data_proj_1.csv')

In [3]:
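# Function adapted from the textbook
def trim_edges(g, weight=1):
    """Return a new graph keeping only edges whose weight is strictly greater than `weight`."""
    g2 = nx.Graph()
    for f, to, edata in g.edges(data=True):
        if edata['weight'] > weight:
            # Unpack the attribute dict so this works on networkx 1.x and 2.x+
            g2.add_edge(f, to, **edata)
    return g2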

In [4]:
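# Function adapted from the textbook
def island_method(g, iterations=5):
    weights = [edata['weight'] for f, to, edata in g.edges(data=True)]

    mn = int(min(weights))
    mx = int(max(weights))
    # Compute the step size so we get roughly `iterations` thresholds;
    # never let it fall to 0, which would make range() raise a ValueError
    step = max(1, int((mx - mn) / iterations))

    # Each entry is [threshold, graph keeping only edges heavier than threshold]
    return [[threshold, trim_edges(g, threshold)] for threshold in range(mn, mx, step)]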
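To see which thresholds `island_method` will visit, the schedule can be reproduced on its own. This is a sketch; the weight lists are made-up values chosen only so that the minimum and maximum reproduce the schedules printed later in this notebook (1 through 7 for the groups, 2/7/12/17 for the countries), and `island_thresholds` is a hypothetical helper name:

```python
def island_thresholds(weights, iterations=5):
    """Hypothetical helper: the threshold schedule island_method walks through."""
    mn, mx = int(min(weights)), int(max(weights))
    # Guard against a zero step when the weight range is smaller than `iterations`
    step = max(1, int((mx - mn) / iterations))
    return list(range(mn, mx, step))


print(island_thresholds([1, 3, 8], iterations=5))    # group-style schedule
print(island_thresholds([2, 10, 18], iterations=3))  # country-style schedule
```

Because `range` stops before `mx`, the maximum weight is never used as a threshold; since `trim_edges` keeps only edges strictly heavier than the threshold, that level would be empty anyway.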

In [5]:
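# Build the two-mode graph: one node set for countries, one for groups
# (nx.from_pandas_dataframe is the networkx 1.x API used throughout this notebook)
ter_df = df[["country_txt", "gname"]]
G = nx.from_pandas_dataframe(ter_df, "country_txt", "gname")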

In [6]:
print(nx.info(G))

Name: 
Type: Graph
Number of nodes: 3046
Number of edges: 4249
Average degree:   2.7899


The graph has enough nodes and edges for this project. From Project 1, we already know the data splits cleanly into two node sets (countries and groups), which makes it a two-mode network.
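The cells in this notebook use networkx 1.x calls (`from_pandas_dataframe`, `connected_component_subgraphs`, `nx.info`) that were removed in later networkx releases. A minimal sketch of the equivalent steps on a recent networkx (3.x assumed), using a small made-up edge list in place of `ter_df`:

```python
import networkx as nx
import pandas as pd

# Made-up (country, group) rows standing in for ter_df
ter_df = pd.DataFrame({
    "country_txt": ["Iraq", "Iraq", "Syria", "Lebanon", "Turkey"],
    "gname": ["Unknown", "Group A", "Group A", "Group B", "Group C"],
})

# from_pandas_edgelist replaces from_pandas_dataframe
G = nx.from_pandas_edgelist(ter_df, source="country_txt", target="gname")

# connected_component_subgraphs is gone: build the largest component explicitly
largest = max(nx.connected_components(G), key=len)
CC = G.subgraph(largest).copy()

# nx.info is gone: query the counts directly
print(G.number_of_nodes(), G.number_of_edges())
print(CC.number_of_nodes(), CC.number_of_edges())
```

Taking `max(..., key=len)` makes the choice of component explicit instead of relying on the order in which components are generated.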

In [7]:
# Condense the data to the countries with United States involvement, as in Project 1
me_list = ["Syria", "Turkey", "Iraq", "Jordan",
           "Pakistan", "Afghanistan", "Iran", "Lebanon"]
ter_df = ter_df[ter_df.country_txt.isin(me_list)]
G = nx.from_pandas_dataframe(ter_df, "country_txt", "gname")
ter_df.head()

| | country_txt | gname |
|---|---|---|
| 59 | Lebanon | Relatives of terrorist |
| 60 | Lebanon | Unknown |
| 71 | Lebanon | Unknown |
| 79 | Lebanon | Unknown |
| 80 | Lebanon | Unknown |


In [8]:
# Group nodes: every node that appears in the gname column
group_names = set(ter_df.gname.unique())
orgs = [node for node in G.nodes() if node in group_names]

In [9]:
W = bipartite.weighted_projected_graph(G, orgs)
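What the projection weights mean can be checked by hand. By default, `bipartite.weighted_projected_graph` weights each group pair by the number of neighbors they share, which here is the number of distinct countries in which both groups appear. The pure-Python sketch below (hypothetical helper `project_weights`, made-up pairs) computes the same quantity. Because `G` is a simple graph, repeated incidents collapse into one edge, so the weight counts shared countries rather than attacks:

```python
from collections import defaultdict
from itertools import combinations


def project_weights(pairs):
    """Weighted one-mode projection of (country, group) pairs onto groups.

    Weight of a group pair = number of distinct countries both groups appear in.
    """
    countries_by_group = defaultdict(set)
    for country, group in pairs:
        countries_by_group[group].add(country)  # duplicates collapse, as in nx.Graph
    weights = {}
    for a, b in combinations(sorted(countries_by_group), 2):
        shared = len(countries_by_group[a] & countries_by_group[b])
        if shared:
            weights[(a, b)] = shared
    return weights


pairs = [("Iraq", "Unknown"), ("Syria", "Unknown"), ("Lebanon", "Unknown"),
         ("Iraq", "Group A"), ("Syria", "Group A"), ("Iraq", "Group A"),
         ("Lebanon", "Group B")]
print(project_weights(pairs))
```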

In [None]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in W.nodes():
    C.add_node(n)
for e in W.edges():
    C.add_edge(e[0], e[1])
C.show('network_2.html')

## Islands

It's difficult to make sense of the data as a whole. We can glean insights into its general structure, but a detailed analysis of the full graph would take too much time, and we could easily miss key features. To break the data down further, we employ the island method and clique detection side by side.

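The island method uses edge weights to drop the weakest ties. In the group projection, an edge's weight is the number of countries in which both groups are recorded, so heavier edges mean groups that are active in more of the same countries. Starting from a graph like the one above, we systematically "raise the water level" (increase the weight threshold) so that only the strongest ties survive and the remaining clusters separate into their own components.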
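The "raise the water level" loop can be sketched without networkx. The toy weights below are made up, `trim` mirrors `trim_edges` (keep edges strictly heavier than the threshold), and `count_components` is a hypothetical union-find helper. Each printed row has the same shape as the island output in this notebook: threshold, number of surviving nodes, number of components:

```python
def trim(weights, threshold):
    """Keep edges whose weight is strictly greater than threshold, like trim_edges."""
    return {edge: w for edge, w in weights.items() if w > threshold}


def count_components(edges):
    """Count connected components among the nodes touched by `edges` (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(n) for n in list(parent)})


weights = {("A", "B"): 5, ("B", "C"): 4, ("C", "D"): 1, ("D", "E"): 3, ("E", "F"): 1}
rows = []
for threshold in range(0, 5):
    kept = trim(weights, threshold)
    nodes = {n for edge in kept for n in edge}
    rows.append((threshold, len(nodes), count_components(kept)))
    print(threshold, len(nodes), count_components(kept))
```

At threshold 1 the chain splits into two islands ({A, B, C} and {D, E}); raising the level further leaves only the heaviest tie.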

A clique is a cohesive group of nodes in which every node is directly connected to every other node in the group. `nx.find_cliques` returns the maximal cliques: groups to which no other node can be added without breaking that all-to-all connection. A node can belong to several maximal cliques at once.
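For intuition about what `nx.find_cliques` enumerates, here is a compact Bron-Kerbosch sketch with pivoting. It is an illustrative stand-in written in pure Python, not the networkx source; `maximal_cliques` and the toy adjacency are hypothetical:

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the Bron-Kerbosch algorithm (with pivoting).

    adj maps each node to the set of its neighbors.
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        # Pivot on the node with the most neighbors among the candidates
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Triangle A-B-C plus a pendant edge C-D
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
cliques = maximal_cliques(adj)
print(sorted(sorted(c) for c in cliques))
```

Node C sits in both maximal cliques, which is why highly connected labels such as "Unknown" and "Muslim extremists" appear in nearly every row of the clique tables below.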

In [11]:
# Work on the largest connected component of the group projection
CC = max(nx.connected_component_subgraphs(W), key=len)
islands = island_method(CC)

In [12]:
for i in islands:
    # print the threshold level, size of the graph, and number of connected components
    print(i[0], len(i[1]), len(list(nx.connected_component_subgraphs(i[1]))))

1 53 1
2 18 1
3 9 1
4 5 1
5 5 1
6 2 1
7 2 1


The output above gives seven threshold levels. The last two (6 and 7) are numerically identical (two nodes in one component), so there are six distinct levels worth analyzing. Although level 1 still contains a large number of nodes (53), we can already start looking at the nodes that DO NOT have strong connections. Sometimes looking at what drops out is as useful as looking at what remains.

In [13]:
# Unpack each island into its own graph (each entry is [threshold, graph])
lvl1 = islands[0][1]
lvl2 = islands[1][1]
lvl3 = islands[2][1]
lvl4 = islands[3][1]
lvl5 = islands[4][1]
lvl6 = islands[5][1]
lvl7 = islands[6][1]

In [14]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl1.nodes():
    C.add_node(n)
for e in lvl1.edges():
    C.add_edge(e[0], e[1])
C.show('island1.html')

In [15]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco1 = trim_edges(eco, weight=1)
cliques = pd.DataFrame(nx.find_cliques(eco1))
# Show only the first 5 cliques and first 10 columns; the full DataFrame is large
cliques.iloc[:5, :10]

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Muslim extremists | Unknown | Southern Front | ISIS network: Islamic State of Iraq and the L... | Al-Qaida network: Al-Qaida in Iraq | | | | | |
| 1 | Muslim extremists | Unknown | Kurdistan Freedom Hawks (TAK) | Kurdistan Workers' Party (PKK) | ISIS network: Islamic State of Iraq and the L... | Hezbollah | Kurdish Rebels | | | |
| 2 | Muslim extremists | Unknown | Tawhid and Jihad | ISIS network: Islamic State of Iraq and the L... | Islamist extremists | Al-Qaida network: Al-Qaida in Iraq | | | | |
| 3 | Muslim extremists | Unknown | Baloch Nationalists | Sikh Extremists | Sipah-e-Sahaba/Pakistan (SSP) | Muslims | | | | |
| 4 | Muslim extremists | Unknown | Mujahedin-e Khalq (MEK) | Kurdistan Workers' Party (PKK) | ISIS network: Islamic State of Iraq and the L... | | | | | |


In [16]:
print(nx.info(eco1))

Name: 
Type: Graph
Number of nodes: 53
Number of edges: 417
Average degree:  15.7358


At level 1, the strength of the networks in Pakistan is still visible. It is the only country-specific cluster left (a circle of nodes that includes groups such as Jundallah, the Haqqani network, Khorasan, and Al-Qaida). Every group that survives to this stage carries more weight than those that have dropped out, but even here it is clear that a few groups are powerhouses.

Around the powerhouses, we can also see groups whose ties are still strong, and we can anticipate some of the ones that will drop off at higher levels (Baloch Nationalists, Sikh Extremists, etc.). We do have to take this data with a grain of salt: 'Al-Qaida network: Al-Qaida in Iraq' does not appear in many of the cliques, yet it is still present at the next level.

In [17]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl2.nodes():
    C.add_node(n)
for e in lvl2.edges():
    C.add_edge(e[0], e[1])
C.show('island2.html')

In [18]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco2 = trim_edges(eco, weight=2)
cliques = pd.DataFrame(nx.find_cliques(eco2))
cliques

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Unknown | Muslim extremists | Abdullah Azzam Brigades | Islamist extremists | | |
| 1 | Unknown | Muslim extremists | Gunmen | ISIS network: Islamic State of Iraq and the L... | Kurdistan Workers' Party (PKK) | Kurdish extremists |
| 2 | Unknown | Muslim extremists | Gunmen | ISIS network: Islamic State of Iraq and the L... | Kurdistan Workers' Party (PKK) | Hezbollah |
| 3 | Unknown | Muslim extremists | Gunmen | ISIS network: Islamic State of Iraq and the L... | Islamist extremists | Shia Muslim extremists |
| 4 | Unknown | Muslim extremists | Gunmen | ISIS network: Islamic State of Iraq and the L... | Free Syrian Army | Hezbollah |
| 5 | Unknown | Muslim extremists | Gunmen | ISIS network: Islamic State of Iraq and the L... | Shia Muslim extremists | Al-Nusrah Front |
| 6 | Unknown | Muslim extremists | Gunmen | Separatists | Islamist extremists | |
| 7 | Unknown | Muslim extremists | Gunmen | Muslim Militants | Al-Qaida network: Al-Qaida | Islamist extremists |
| 8 | Unknown | Muslim extremists | Gunmen | Militants | Islamist extremists | |
| 9 | Unknown | Muslim extremists | Gunmen | Abu Nidal Organization (ANO) | Shia Muslim extremists | Islamist extremists |


In [19]:
print(nx.info(eco2))

Name: 
Type: Graph
Number of nodes: 18
Number of edges: 67
Average degree:   7.4444


At level 2 we start getting into the most important and dangerous groups. Although the average degree drops as the threshold rises, the surviving nodes are more tightly knit: the density (the share of possible ties that are actually present) climbs from about 0.30 at level 1 to 0.44 at level 2 and 0.64 at level 3. This trend generally continues as we move up the levels, until only 2 nodes remain at levels 6 and 7. (Note: level 7 is omitted from this report because it is identical to level 6.)
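The density figures follow directly from the node and edge counts that `nx.info` printed for `eco1` through `eco6`. For a simple undirected graph, density is $2E / (N(N-1))$ and average degree is $2E/N$; the sketch below recomputes both (the hypothetical `density` helper matches the formula `nx.density` uses for undirected graphs):

```python
def density(n_nodes, n_edges):
    """Density of a simple undirected graph: 2E / (N(N - 1))."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# (nodes, edges) reported by nx.info for eco1 ... eco6 in this notebook
levels = {1: (53, 417), 2: (18, 67), 3: (9, 23), 4: (5, 8), 5: (5, 7), 6: (2, 1)}
for level, (n, e) in levels.items():
    # level, average degree, density
    print(level, round(2 * e / n, 2), round(density(n, e), 2))
```

Average degree falls from about 15.7 to 1.0 while density rises from about 0.30 to 1.0, with a small dip at level 5 (0.80 to 0.70).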

Above we see the relative importance of ISIS, Al-Qaida, and Hezbollah. We also see that various Kurdish groups are very active in this network. While the West has partnered with some Kurdish groups to combat ISIS, others continue to engage in terrorist acts. It could be an interesting project to analyze the extent to which US military involvement has increased or decreased attacks organized by groups like the PKK.

In [20]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl3.nodes():
    C.add_node(n)
for e in lvl3.edges():
    C.add_edge(e[0], e[1])
C.show('island3.html')

In [21]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco3 = trim_edges(eco, weight=3)
cliques = pd.DataFrame(nx.find_cliques(eco3))
cliques

| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | Unknown | Muslim extremists | Muslim Militants | Gunmen | Islamist extremists |
| 1 | Unknown | Muslim extremists | ISIS network: Islamic State of Iraq and the L... | Kurdistan Workers' Party (PKK) | |
| 2 | Unknown | Muslim extremists | ISIS network: Islamic State of Iraq and the L... | Gunmen | Islamist extremists |
| 3 | Unknown | Muslim extremists | ISIS network: Islamic State of Iraq and the L... | Hezbollah | |
| 4 | Unknown | Muslim extremists | Shia Muslim extremists | Gunmen | |


In [22]:
print(nx.info(eco3))

Name: 
Type: Graph
Number of nodes: 9
Number of edges: 23
Average degree:   5.1111


Again, we see the rising importance of ISIS, along with the general label for Shia extremism and Hezbollah. Hezbollah is the one group here that is _not_ an offshoot of US involvement in the Middle East.

In [23]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl4.nodes():
    C.add_node(n)
for e in lvl4.edges():
    C.add_edge(e[0], e[1])
C.show('island4.html')

In [24]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco4 = trim_edges(eco, weight=4)
cliques = pd.DataFrame(nx.find_cliques(eco4))
cliques

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Unknown | Muslim extremists | ISIS network: Islamic State of Iraq and the L... | |
| 1 | Unknown | Muslim extremists | Gunmen | Islamist extremists |


At this level, ISIS is the dominant named group, apart from general labels such as Islamist extremists.

In [25]:
print(nx.info(eco4))

Name: 
Type: Graph
Number of nodes: 5
Number of edges: 8
Average degree:   3.2000


In [26]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl5.nodes():
    C.add_node(n)
for e in lvl5.edges():
    C.add_edge(e[0], e[1])
C.show('island5.html')

In [27]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco5 = trim_edges(eco, weight=5)
cliques = pd.DataFrame(nx.find_cliques(eco5))
cliques

| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | Unknown | Muslim extremists | ISIS network: Islamic State of Iraq and the L... |
| 1 | Unknown | Muslim extremists | Gunmen |
| 2 | Unknown | Muslim extremists | Islamist extremists |


In [28]:
print(nx.info(eco5))

Name: 
Type: Graph
Number of nodes: 5
Number of edges: 7
Average degree:   2.8000


In [None]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl6.nodes():
    C.add_node(n)
for e in lvl6.edges():
    C.add_edge(e[0], e[1])
C.show('island6.html')

In [30]:
eco = bipartite.weighted_projected_graph(G, orgs)
eco6 = trim_edges(eco, weight=6)
print(nx.info(eco6))

Name: 
Type: Graph
Number of nodes: 2
Number of edges: 1
Average degree:   1.0000


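We finally get down to the strongest tie in the group network: 'Unknown' and 'Muslim extremists', which share at least seven of the eight countries. Because 'Unknown' is, by its very nature, not an identifiable group, we are left with only one label that has the strongest connections to the other groups: Muslim extremists.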

## Countries

Next, because this is a two-mode network, we flip perspectives and analyze the network from the country side. In the country projection, an edge's weight is the number of groups active in both countries, so focusing on countries instead of groups shows which countries act as the heaviest hubs for these terror groups.

In [31]:
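# Country nodes: every node that appears in the country_txt column
country_names = set(ter_df.country_txt.unique())
country = [node for node in G.nodes() if node in country_names]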

In [32]:
V = bipartite.weighted_projected_graph(G, country)

In [33]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in V.nodes():
    C.add_node(n)
for e in V.edges():
    C.add_edge(e[0], e[1])
C.show('network_3.html')

In [34]:
CV = max(nx.connected_component_subgraphs(V), key=len)
islands=island_method(CV, 3)

In [35]:
for i in islands:
    # print the threshold level, size of the graph, and number of connected components
    print(i[0], len(i[1]), len(list(nx.connected_component_subgraphs(i[1]))))

2 8 1
7 6 1
12 2 1
17 2 1


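The first island (threshold 2) still contains all eight countries, so it looks essentially the same as the full weighted projection above. We skip it and move on to levels 2 and 3. Note that with `iterations=3` the thresholds are 2, 7, 12, and 17, so the level 2 and level 3 islands keep only ties heavier than 7 and 12, while the clique cells below trim the projection at weights 2 and 3. Some interesting differences between the islands and the cliques start to emerge.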

In [36]:
lvl1 = islands[0][1]
lvl2 = islands[1][1]
lvl3 = islands[2][1]

In [37]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl2.nodes():
    C.add_node(n)
for e in lvl2.edges():
    C.add_edge(e[0], e[1])
C.show('C_island2.html')

In [38]:
eco = bipartite.weighted_projected_graph(G, country)
eco2c = trim_edges(eco, weight=2)
cliques = pd.DataFrame(nx.find_cliques(eco2c))
cliques

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | Turkey | Pakistan | Iraq | Jordan | Syria | Lebanon | Iran |
| 1 | Turkey | Pakistan | Iraq | Jordan | Syria | Lebanon | Afghanistan |


In [39]:
print(nx.info(eco2c))

Name: 
Type: Graph
Number of nodes: 8
Number of edges: 27
Average degree:   6.7500


As noted earlier, cliques measure cohesion: every member is tied to every other member, and the trimming level sets how strong those ties must be. Here we have a clear example of a clique that does not follow the island. Jordan is not part of the level 2 island (ties heavier than 7), yet it belongs to both maximal cliques when ties heavier than 2 are kept. It is interesting that Jordan borders several of the countries with the most active terror networks but manages to stay only moderately tied to them.

In [None]:
C = Network(height='1000px',
            width='1000px',
            bgcolor='#222222', 
            font_color='white', 
            notebook=True)

C.barnes_hut()
for n in lvl3.nodes():
    C.add_node(n)
for e in lvl3.edges():
    C.add_edge(e[0], e[1])
C.show('C_island3.html')  # distinct name so the group-level island3.html is not overwritten

In [41]:
eco = bipartite.weighted_projected_graph(G, country)
eco3c = trim_edges(eco, weight=3)
cliques = pd.DataFrame(nx.find_cliques(eco3c))
cliques

| | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| 0 | Turkey | Pakistan | Iraq | Lebanon | Afghanistan | |
| 1 | Turkey | Pakistan | Iraq | Lebanon | Syria | Jordan |
| 2 | Turkey | Pakistan | Iraq | Lebanon | Syria | Iran |


Finally, we reach the conclusion that Afghanistan and Pakistan are the central countries for these terror groups.

## Final thoughts

1) Neighbors Pakistan and Afghanistan are the strongest sources of terrorism. 

   - Afghanistan has been at war for 30 of the last 40 years, so this comes as no surprise.
   - Pakistan has not shared the same fate as its neighbor but is still a strong source for terror by the same groups. One might infer that perhaps Pakistan is a bit more than complicit.
    
2) Jordan maintains stability

   - For a small country (about 10 million people) that borders Syria, Iraq, Saudi Arabia, Palestine, and Israel, Jordan is able to avoid forming strong terror-related connections.
    
3) Lebanon 

   - This small country is home to a long-standing terror group, Hezbollah, that forges strong connections with the other countries in our list through acts of terror.

### Video Submission

[youtube](https://www.youtube.com/watch?v=WSUSleBMXXc)