# Project2 - Two-Node Network Analysis

**GROUP: Forhad Akbar, Adam Douglas, and Soumya Ghosh**

## Project Overview
1. Identify a large 2-node network dataset—you can start with a dataset in a repository.  Your data should meet the criteria that it consists of ties between and not within two (or more) distinct groups.
2. Reduce the size of the network using a method such as the island method described in chapter 4 of social network analysis.
3. What can you infer about each of the distinct groups?

## Dataset Background

In the 1990s, Rick Rosenfeld and Norm White used police records to collect data on crime in St. Louis. They began with five homicides and recorded the names of all the individuals who had been involved as victims, suspects, or witnesses. They then explored the files and recorded all the other crimes in which those same individuals appeared. This snowball process was continued until they had data on 557 crime events. Those events involved 870 participants, of which 569 appeared as victims, 682 appeared as suspects, 195 appeared as witnesses, and 41 were dual (they were recorded both as victims and suspects in the same crime). Their data appear, then, as an 870 by 557 individual-by-crime-event matrix. Victims are coded as 1, suspects as 2, witnesses as 3, and duals as 4. In addition, Rosenfeld and White recorded the sex of each individual.

### Data Sources:

https://github.com/nderzsy/Network-Analysis-in-Python---Tutorial-JupyterCon18-ODSCEast18/tree/master/datafiles/social/crime

http://moreno.ss.uci.edu/data.html#crime

## Data Import

We begin by importing our data from the source site (github). The data is provided in several files, so there are a few steps that need to be taken to put the data together into a single bipartite graph.

In [145]:
# Import all necessary packages

import pandas as pd
import numpy as np
import math
import networkx as nx
import matplotlib.pyplot as plt
from functools import reduce
%matplotlib inline
import networkx.algorithms.bipartite as bipartite
from pyvis import network as net

In [42]:
# Read in persons and their associated sex

person = pd.read_csv('Dataset/ent.moreno_crime_crime.person.name', sep='\t', header = None, names = ['Name'])
person['Sex'] = pd.read_csv('Dataset/ent.moreno_crime_crime.person.sex', header = None)

person.loc[person.Sex == 0, ['Sex']] = 'F'
person.loc[person.Sex == 1, ['Sex']] = 'M'

In [43]:
# Create the crime dataframe that associates a person with a crime in some manner

crime = pd.read_csv("Dataset/out.moreno_crime_crime", sep = r'\s+', skiprows = [0,1], names = ['Person', 'Crime'])

In [44]:
# Add the roles of each person (e.g. Person 1 is the suspect in Crime 1)

crime['Role'] = pd.read_csv("Dataset/rel.moreno_crime_crime.person.role", header = None)

In [45]:
# Add the name and sex to the main crime dataframe

crime["Name"] = ""
crime["Sex"] = ""

for i in range(0, len(person)):
    crime.loc[crime.Person == i+1, ['Sex']] = person.iloc[i]["Sex"]
    crime.loc[crime.Person == i+1, ['Name']] = person.iloc[i]["Name"]

# Change the crime event to a string and append "C"
crime = crime.astype({"Crime": str})
crime["Crime"] = "C" + crime["Crime"]

# Remove witness entries
crime.drop(crime[crime['Role'] == "Witness"].index, inplace = True) 

#Final dataset
crime.head()

Unnamed: 0,Person,Crime,Role,Name,Sex
0,1,C1,Suspect,AbelDennis,M
1,1,C2,Victim,AbelDennis,M
2,1,C3,Victim,AbelDennis,M
3,1,C4,Suspect,AbelDennis,M
4,2,C5,Victim,AbramsChad,M
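
The row-by-row loop above can also be expressed as a single join. The following is a minimal sketch rather than the code used in this notebook. It assumes, as the loop does, that a person's ID is their 1-based position in the name file, and it uses a three-row toy table built from the output above in place of the full `person` and `crime` frames.

```python
import pandas as pd

# Toy stand-ins for the `person` and `crime` frames (values taken from the output above)
person = pd.DataFrame({"Name": ["AbelDennis", "AbramsChad"], "Sex": ["M", "M"]})
crime = pd.DataFrame({"Person": [1, 1, 2], "Crime": ["C1", "C2", "C5"]})

# Assumption: person IDs are 1-based row positions, as in the loop
lookup = person.assign(Person=person.index + 1)

# A left join keeps every crime row and attaches Name and Sex in one step
merged = crime.merge(lookup, on="Person", how="left")
print(merged)
```

A left join is used so that a crime row with an unknown person ID would be kept (with missing Name and Sex) rather than silently dropped.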


Now that we have the data in a combined dataframe, we can begin to put it together into a graph format.

In [47]:
crime[crime['Crime']=='C82']

Unnamed: 0,Person,Crime,Role,Name,Sex
570,319,C82,Suspect,GreenByron,M
1028,567,C82,Suspect,OneilLinda,F
1120,632,C82,Victim Suspect,ReddickJohn,M
1365,772,C82,Victim,TylerOwen,M


## Creating the Graph
We are going to create the graph with the following features:
 1. A person (victim or suspect) will be represented as a node with *'bipartite' = 0*
   - For male, the color code is *'dodgerblue'*
   - For female, the color code is *'teal'*
 2. Each crime occurrence will be represented as a node with *'bipartite' = 1* and 'red' as the color code
 3. A *suspect* will be represented by an inbound edge towards the crime event
 4. A *victim* will be represented by an outbound edge from the crime event

In [104]:
G = nx.DiGraph()

for n, r, c, s in zip(crime['Name'],crime['Role'],crime['Crime'],crime['Sex']):
    if n not in G.nodes() and r != 'Witness' and s == 'M':
        G.add_node(n, Sex = s, bipartite = 0, color = 'dodgerblue')
    if n not in G.nodes() and r != 'Witness' and s == 'F':
        G.add_node(n, Sex = s, bipartite = 0, color = 'teal')
    if c not in G.nodes():
        G.add_node(c, bipartite = 1, color = 'red')
    if r == 'Suspect':
        G.add_edge(n,c)
    elif r == 'Victim':
        G.add_edge(c,n)

print(nx.info(G))

Name: 
Type: DiGraph
Number of nodes: 1259
Number of edges: 1322
Average in degree:   1.0500
Average out degree:   1.0500


In [105]:
nx.is_bipartite(G)

True
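
Note that `nx.is_bipartite` only tells us that some valid two-colouring of the nodes exists; it does not look at the `bipartite` attribute we assigned. A minimal sketch of a stricter check, using a toy graph built from the first rows of the data rather than the full `G`, verifies that every edge joins a person node to a crime node:

```python
import networkx as nx

# Toy graph following the conventions above (AbelDennis: suspect in C1, victim in C2)
G_toy = nx.DiGraph()
G_toy.add_node("AbelDennis", bipartite=0)
G_toy.add_node("C1", bipartite=1)
G_toy.add_node("C2", bipartite=1)
G_toy.add_edge("AbelDennis", "C1")  # suspect -> crime
G_toy.add_edge("C2", "AbelDennis")  # crime -> victim

# Every edge should connect a bipartite=0 node to a bipartite=1 node
crosses_sets = all(
    G_toy.nodes[u]["bipartite"] != G_toy.nodes[v]["bipartite"]
    for u, v in G_toy.edges()
)
print(crosses_sets, nx.is_bipartite(G_toy))
```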

In [106]:
list(G.nodes.data())[1:10]

[('C1', {'bipartite': 1, 'color': 'red'}),
 ('C2', {'bipartite': 1, 'color': 'red'}),
 ('C3', {'bipartite': 1, 'color': 'red'}),
 ('C4', {'bipartite': 1, 'color': 'red'}),
 ('AbramsChad', {'Sex': 'M', 'bipartite': 0, 'color': 'dodgerblue'}),
 ('C5', {'bipartite': 1, 'color': 'red'}),
 ('C6', {'bipartite': 1, 'color': 'red'}),
 ('C7', {'bipartite': 1, 'color': 'red'}),
 ('C8', {'bipartite': 1, 'color': 'red'})]

In [103]:
n = net.Network(height = "1000px", width = "100%", notebook = True, 
                bgcolor = "#ffffff", font_color = "black",
                heading = 'Crimes and People', directed = True)
nx_graph = nx.Graph(G)
n.from_nx(nx_graph, default_node_size = 50, default_edge_weight = 1)
n.show_buttons(filter_=['physics'])
n.show("crimenet.html")

In [81]:
#n1 = net.Network(height = "800px", width = "100%", notebook = True,
#               heading = 'Crimes and People', directed = True)

#n1.add_nodes(G.nodes())
#for u,v in G.edges():
#    n1.add_edge(u,v)

#n1.show("graph.html")

#bipartite.degrees(nx_graph, nx_graph.nodes, nx_graph.weight)

The first thing that jumps out is the number of nodes with no connections. Because we only excluded the witnesses, every remaining person and crime should have at least one edge, so this should not happen.

Let's look and see why that is:

In [107]:
list(nx.isolates(G))

[]

In [108]:
unconn = [n for n in nx.isolates(G)]

isol = [(n, c, r) for n, c, r in zip(crime["Name"],crime["Crime"],crime["Role"]) if c in unconn or n in unconn]

sorted(isol, key = lambda x: x[1], reverse = True)

[]

It appears that these unconnected nodes are where the person is both a victim and a suspect. Why would such a thing occur?

Well, one situation might be when the crime is a fight of sorts where both parties are responsible. In those cases, we would see more than one person in that role (e.g. C86). However, we see a few that only have a single person. So, unless that person is fighting themselves (see *Fight Club*), that makes no sense.
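
One way to see how a dual role produces an unconnected node: the graph-building loop only adds an edge when the role is exactly `'Suspect'` or exactly `'Victim'`, so a combined label such as `'Victim Suspect'` matches neither branch. The sketch below reproduces this on made-up rows (the person and crime names are hypothetical, not taken from the dataset).

```python
import networkx as nx

# Hypothetical rows: (name, role, crime)
rows = [
    ("PersonA", "Suspect", "C900"),
    ("PersonB", "Victim Suspect", "C901"),  # dual role stored as one string
]

G_demo = nx.DiGraph()
for name, role, crime_id in rows:
    G_demo.add_node(name, bipartite=0)
    G_demo.add_node(crime_id, bipartite=1)
    if role == "Suspect":
        G_demo.add_edge(name, crime_id)
    elif role == "Victim":
        G_demo.add_edge(crime_id, name)
    # "Victim Suspect" falls through both tests, so no edge is added

isolated = sorted(nx.isolates(G_demo))
print(isolated)
```

Here both the dual-role person and their crime end up isolated, because that crime has no other participant.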

Let's look at an example:

In [109]:
crime[crime["Crime"]=="C82"]

Unnamed: 0,Person,Crime,Name,Sex,Role
570,319,C82,GreenByron,M,Suspect
1028,567,C82,OneilLinda,F,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
1365,772,C82,TylerOwen,M,Victim


Now we see a bit more clearly. In the above example the person "ReddickJohn" is not the only participant in the crime. Perhaps the authorities suspect that this person was "in on" the crime despite their attempts to present themselves as another victim?

We should split each of these entries into two rows, one as a suspect and one as a victim.

In [110]:
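# Split the role column so that a dual role such as "Victim Suspect" becomes two rows

# One role per row, indexed by the original row label
roles = crime['Role'].str.split(' ', expand=True).stack().reset_index(level=1, drop=True).rename('Role')

# Replace the old Role column by joining the split roles back on the row label
crime = crime.drop('Role', axis=1).join(roles)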

# Check our example from above
crime[crime["Crime"]=="C82"]

Unnamed: 0,Person,Crime,Name,Sex,Role
570,319,C82,GreenByron,M,Suspect
1028,567,C82,OneilLinda,F,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
1120,632,C82,ReddickJohn,M,Victim
1120,632,C82,ReddickJohn,M,Suspect
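
The repeated ReddickJohn rows above are consistent with the split cell having been run more than once: joining on an index that already contains duplicate labels multiplies the matching rows on each run. As a hedged alternative (not the code used in this notebook), `str.split` followed by `DataFrame.explode` yields one row per role and gives the same result however many times it is applied. The two-row frame below is a toy stand-in for `crime`, and `split_roles` is a hypothetical helper name.

```python
import pandas as pd

def split_roles(df):
    """Return a copy with one row per role; safe to apply repeatedly."""
    # ignore_index=True gives a fresh 0..n-1 index, avoiding duplicate labels
    return df.assign(Role=df["Role"].str.split(" ")).explode("Role", ignore_index=True)

toy = pd.DataFrame(
    {"Name": ["GreenByron", "ReddickJohn"],
     "Crime": ["C82", "C82"],
     "Role": ["Suspect", "Victim Suspect"]},
    index=[570, 1120],
)

once = split_roles(toy)
twice = split_roles(once)  # a second application leaves the rows unchanged
print(twice)
```

Because `DiGraph.add_edge` ignores an edge that already exists, duplicated rows would not change the graph itself, but they would distort any counts taken directly from the dataframe.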


Now we see that "ReddickJohn" is listed twice, once as a Victim and once as a Suspect. We will need to regenerate the graph to show this update:

In [112]:
G = nx.DiGraph()

for n, r, c, s in zip(crime['Name'],crime['Role'],crime['Crime'],crime['Sex']):
    if n not in G.nodes() and r != 'Witness' and s == 'M':
        G.add_node(n, Sex = s, bipartite = 0, color = 'dodgerblue')
    if n not in G.nodes() and r != 'Witness' and s == 'F':
        G.add_node(n, Sex = s, bipartite = 0, color = 'teal')
    if c not in G.nodes():
        G.add_node(c, bipartite = 1, color = 'red')
    if r == 'Suspect':
        G.add_edge(n,c)
    elif r == 'Victim':
        G.add_edge(c,n)

print(nx.info(G))

Name: 
Type: DiGraph
Number of nodes: 1259
Number of edges: 1322
Average in degree:   1.0500
Average out degree:   1.0500


In [113]:
n2 = net.Network(height = "1000px", width = "100%", notebook = True, 
                bgcolor = "#ffffff", font_color = "black",
                heading = 'Crimes and PeopleV2', directed = True)
nx_graph = nx.Graph(G)
n2.from_nx(nx_graph, default_node_size = 50, default_edge_weight = 1)
n2.show_buttons(filter_=['physics'])
n2.show("crimenet2.html")

In [114]:
#n2 = net.Network(height = "800px", width = "100%", notebook = True,
#               heading = 'Crimes and People v2', directed = True)

#n2.add_nodes(G.nodes())
#for u,v in G.edges():
#    n2.add_edge(u,v)

#n2.show("graph.html")

This looks MUCH better. Let's check for isolates again:

In [115]:
unconn = [n for n in nx.isolates(G)]

isol = [(n, c, r) for n, c, r in zip(crime["Name"],crime["Crime"],crime["Role"]) if c in unconn or n in unconn]

sorted(isol, key = lambda x: x[1], reverse = True)

[]

As we hoped, there are none.

## Sub-Graph Analysis - People & Crime
Next, we project the bipartite graph onto each node set, giving one graph of people (victims/suspects) and one graph of crimes, and analyze them separately.

In [116]:
top_nodes = {n for n, d in G.nodes(data=True) if d["bipartite"] == 0}
bottom_nodes = set(G) - top_nodes

P = bipartite.projected_graph(G, top_nodes)
C = bipartite.projected_graph(G, bottom_nodes)
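
One detail worth keeping in mind when reading the projections: as far as we understand the NetworkX implementation, when the input is a `DiGraph`, `projected_graph` returns a directed graph and links `u` to `v` only if there is a directed two-step path from `u` through a shared neighbour to `v`. With our edge convention that means a suspect is linked to a victim of the same crime, while two suspects (or two victims) of the same crime are not linked to each other. The sketch below illustrates this on a hypothetical three-person crime and compares it with projecting the undirected version.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical mini-network: two suspects and one victim of the same crime
B = nx.DiGraph()
B.add_edges_from([("S1", "C1"), ("S2", "C1"), ("C1", "V1")])
people = {"S1", "S2", "V1"}

# Directed projection: only suspect -> victim paths through C1 are followed
directed_proj = bipartite.projected_graph(B, people)

# Undirected projection: everyone sharing a crime is linked
undirected_proj = bipartite.projected_graph(B.to_undirected(), people)

print(sorted(directed_proj.edges()))
print(undirected_proj.number_of_edges())
```

If plain co-participation is what is of interest, projecting `G.to_undirected()` may be the more natural choice; the directed projection keeps the suspect-to-victim orientation instead.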

Let's visualize them both:

In [117]:
# Named people_net rather than np so that the numpy alias is not overwritten
people_net = net.Network(height = "1000px", width = "100%", notebook = True, 
                bgcolor = "#ffffff", font_color = "black",
                heading = 'Network - People (Victim/Suspect)', directed = True)
nx_graph = nx.Graph(P)
people_net.from_nx(nx_graph, default_node_size = 50, default_edge_weight = 1)
people_net.show_buttons(filter_=['physics'])
people_net.show("people.html")

From the graph above, it looks like quite a few people are not connected to anyone else, which could be the result of isolated crime occurrences. But there are also quite a few clusters of nodes where multiple individuals are closely associated. This brings up a 

In [118]:
nc = net.Network(height = "1000px", width = "100%", notebook = True, 
                bgcolor = "#ffffff", font_color = "black",
                heading = 'Network - Crime', directed = True)
nx_graph = nx.Graph(C)
nc.from_nx(nx_graph, default_node_size = 50, default_edge_weight = 1)
nc.show_buttons(filter_=['physics'])
nc.show("crime.html")

### People (Suspect/Victim)
Next, we wanted to take a deeper look at the people associated with the crimes and see whether any specific pattern emerges.

In [141]:
person_df = pd.DataFrame(sorted(nx.degree(P), key = lambda x: x[1], reverse=True), columns=['Person','Degree'])
person_df.head(10)

Unnamed: 0,Person,Degree
0,WillisJenny,15
1,AbramsChad,12
2,TatumAnna,11
3,GodfreyTalia,11
4,CarverJustin,10
5,JonesEzekial,10
6,HemphillBud,9
7,DicksonCarter,9
8,JeffersonArnold,8
9,CandyCarol,7


In [152]:
# Compute centrality measures for people
pcc = pd.DataFrame(nx.closeness_centrality(P).items(), columns=['Person','Closeness Centrality'])
pbc = pd.DataFrame(nx.betweenness_centrality(P).items(), columns=['Person','Betweenness Centrality'])
pdc = pd.DataFrame(nx.degree_centrality(P).items(), columns=['Person','Degree Centrality'])

# Display all measures
data_frames = [person_df, pcc, pbc, pdc]
pmeasures = reduce(lambda left,right: pd.merge(left,right,on=['Person']), data_frames)

pmeasures['Rank'] = pmeasures['Degree Centrality'].rank(ascending=False)
pmeasures.sort_values(by=['Rank']).head(10)

Unnamed: 0,Person,Degree,Closeness Centrality,Betweenness Centrality,Degree Centrality,Rank
0,WillisJenny,15,0.001412,4e-05,0.021186,1.0
1,AbramsChad,12,0.013242,0.00018,0.016949,2.0
2,TatumAnna,11,0.014242,1.4e-05,0.015537,3.5
3,GodfreyTalia,11,0.011299,0.00024,0.015537,3.5
4,CarverJustin,10,0.0,0.0,0.014124,5.5
5,JonesEzekial,10,0.01538,0.0,0.014124,5.5
6,HemphillBud,9,0.001412,4e-05,0.012712,7.5
7,DicksonCarter,9,0.004237,4e-05,0.012712,7.5
8,JeffersonArnold,8,0.007062,2.8e-05,0.011299,9.0
18,CarverJason,7,0.0,0.0,0.009887,14.5
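
Several high-degree people in the table above have a closeness centrality of 0.0. One possible explanation, assuming `P` is directed (as it would be when projected from a `DiGraph`): `nx.degree` counts incoming plus outgoing edges, whereas `closeness_centrality` in recent NetworkX releases is based on inward distances, so a node that only has outgoing edges scores zero. The toy graph below is hypothetical and only illustrates that mechanism.

```python
import networkx as nx

# Hypothetical directed people projection: two suspects pointing at one victim
P_toy = nx.DiGraph([("S1", "V1"), ("S2", "V1")])

closeness = nx.closeness_centrality(P_toy)  # uses distances *into* each node
total_degree = dict(P_toy.degree())         # in-degree + out-degree

print(closeness)
print(total_degree)
```

If outward reach is the quantity of interest, the same function can be applied to `P_toy.reverse()`.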


### Crime Analysis


In [153]:
crime_df = pd.DataFrame(sorted(nx.degree(C), key = lambda x: x[1], reverse=True), columns=['Crime','Degree'])
crime_df.head(10)

Unnamed: 0,Crime,Degree
0,C23,27
1,C525,25
2,C550,23
3,C119,20
4,C34,18
5,C7,17
6,C5,17
7,C19,17
8,C431,15
9,C189,15


In [154]:
# Compute centrality measures for crimes
ccc = pd.DataFrame(nx.closeness_centrality(C).items(), columns=['Crime','Closeness Centrality'])
cbc = pd.DataFrame(nx.betweenness_centrality(C).items(), columns=['Crime','Betweenness Centrality'])
cdc = pd.DataFrame(nx.degree_centrality(C).items(), columns=['Crime','Degree Centrality'])

# Display all measures
data_frames = [crime_df, ccc, cbc, cdc]
pmeasures = reduce(lambda left,right: pd.merge(left,right,on=['Crime']), data_frames)

pmeasures['Rank'] = pmeasures['Degree Centrality'].rank(ascending=False)
pmeasures.sort_values(by=['Rank']).head(10)

Unnamed: 0,Crime,Degree,Closeness Centrality,Betweenness Centrality,Degree Centrality,Rank
0,C23,27,0.007286,0.000346,0.04918,1.0
1,C525,25,0.003643,0.000219,0.045537,2.0
2,C550,23,0.0,0.0,0.041894,3.0
3,C119,20,0.001821,7e-05,0.03643,4.0
4,C34,18,0.018735,0.000179,0.032787,5.0
5,C7,17,0.0,0.0,0.030965,7.0
6,C5,17,0.0,0.0,0.030965,7.0
7,C19,17,0.0,0.0,0.030965,7.0
9,C189,15,0.02743,0.0,0.027322,10.0
10,C426,15,0.0,0.0,0.027322,10.0
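
The fractional and repeated values in the Rank column come from how `Series.rank` handles ties: by default (`method="average"`) tied entries all receive the mean of the positions they occupy. A small sketch using the degrees of the eight highest-degree crimes from the table above:

```python
import pandas as pd

# Degrees of the eight highest-degree crimes, copied from the table above
degrees = pd.Series(
    [27, 25, 23, 20, 18, 17, 17, 17],
    index=["C23", "C525", "C550", "C119", "C34", "C7", "C5", "C19"],
)

# Default: tied values share the average of positions 6, 7 and 8
avg_rank = degrees.rank(ascending=False)

# Alternative: tied values all take the best (lowest) position
min_rank = degrees.rank(ascending=False, method="min")

print(pd.DataFrame({"degree": degrees, "avg": avg_rank, "min": min_rank}))
```

Ranking on degree gives the same ordering as ranking on degree centrality, since the latter is just degree divided by a constant.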
