# Assignment 6

**GROUP: Forhad Akbar, Adam Douglas, and Soumya Ghosh**

## Background

In the 1990s, Rick Rosenfeld and Norm White used police records to collect data on crime in St. Louis. They began with five homicides and recorded the names of all the individuals who had been involved as victims, suspects, or witnesses. They then searched the files and recorded all the other crimes in which those same individuals appeared. This snowball process continued until they had data on 557 crime events. Those events involved 870 participants, of which 569 appeared as victims, 682 as suspects, 195 as witnesses, and 41 as duals (recorded as both victim and suspect in the same crime). Their data therefore appear as an 870 by 557 individual-by-crime-event matrix. Victims are coded as 1, suspects as 2, witnesses as 3, and duals as 4. In addition, Rosenfeld and White recorded the sex of each individual.
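As a rough illustration of the coding scheme described above, the sketch below turns a tiny made-up individual-by-crime matrix into (person, crime, role) rows. The matrix values are invented, and treating 0 as "not involved" is an assumption rather than something stated in the source description.

```python
# Role codes as described in the background; "Victim Suspect" stands for a dual
ROLE_CODES = {1: "Victim", 2: "Suspect", 3: "Witness", 4: "Victim Suspect"}

# Hypothetical 3-person by 2-crime matrix (0 is ASSUMED to mean "not involved")
toy_matrix = [
    [2, 0],  # person 1: suspect in crime 1
    [1, 4],  # person 2: victim in crime 1, dual in crime 2
    [0, 3],  # person 3: witness in crime 2
]

# Flatten the matrix into 1-based (person, crime, role) tuples
edge_list = [
    (p + 1, c + 1, ROLE_CODES[code])
    for p, row in enumerate(toy_matrix)
    for c, code in enumerate(row)
    if code != 0
]

for row in edge_list:
    print(row)
```

This long format, one row per person-crime involvement, is the shape the imported files take once combined.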

Data Sources:

https://github.com/nderzsy/Network-Analysis-in-Python---Tutorial-JupyterCon18-ODSCEast18/tree/master/datafiles/social/crime

http://moreno.ss.uci.edu/data.html#crime

## Data Import

We begin by importing our data from the source site (GitHub). The data is provided in several files, so a few steps are needed to combine them into a single bipartite graph.

In [125]:
# Import all necessary packages

import pandas as pd
import numpy as np
import math
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
import networkx.algorithms.bipartite as bipartite
from pyvis import network as net

In [126]:
# Read in persons and their associated sex

person = pd.read_csv('ent.moreno_crime_crime.person.name', sep='\t', header = None, names = ['Name'])
person['Sex'] = pd.read_csv('ent.moreno_crime_crime.person.sex', header = None)

# Recode sex from 0/1 to 'F'/'M' in one step (avoids writing strings into an integer column)
person['Sex'] = person['Sex'].map({0: 'F', 1: 'M'})

In [127]:
# Create the crime dataframe that associates a person with a crime in some manner

# sep=r'\s+' splits on any whitespace (delim_whitespace is deprecated in recent pandas)
crime = pd.read_csv("out.moreno_crime_crime", sep = r'\s+', skiprows = [0,1], names = ['Person', 'Crime'])

In [128]:
# Add the roles of each person (e.g. Person 1 is the suspect in Crime 1)

crime['Role'] = pd.read_csv("rel.moreno_crime_crime.person.role", header = None)

In [129]:
# Add the name and sex to the main crime dataframe

crime["Name"] = ""
crime["Sex"] = ""

for i in range(0, len(person)):
    crime.loc[crime.Person == i+1, ['Sex']] = person.iloc[i]["Sex"]
    crime.loc[crime.Person == i+1, ['Name']] = person.iloc[i]["Name"]

# Change the crime event to a string and append "C"
crime = crime.astype({"Crime": str})
crime["Crime"] = "C" + crime["Crime"]

# Remove witness entries
crime.drop(crime[crime['Role'] == "Witness"].index, inplace = True) 

# Final dataset
crime.head()

|  | Person | Crime | Role | Name | Sex |
|---|---|---|---|---|---|
| 0 | 1 | C1 | Suspect | AbelDennis | M |
| 1 | 1 | C2 | Victim | AbelDennis | M |
| 2 | 1 | C3 | Victim | AbelDennis | M |
| 3 | 1 | C4 | Suspect | AbelDennis | M |
| 4 | 2 | C5 | Victim | AbramsChad | M |
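The row-by-row loop used to attach names and sex can also be written as a vectorized lookup. The following is a minimal sketch on small stand-in dataframes (the two `person` rows and three `crime` rows are illustrative only), assuming, as in the loop, that person IDs are 1-based while the `person` index is 0-based.

```python
import pandas as pd

# Stand-in data with the same column names as the notebook's dataframes
person = pd.DataFrame({"Name": ["AbelDennis", "AbramsChad"], "Sex": ["M", "M"]})
crime = pd.DataFrame({"Person": [1, 1, 2], "Crime": ["C1", "C2", "C5"]})

# Shift the 0-based index so it lines up with the 1-based person IDs
lookup = person.copy()
lookup.index = lookup.index + 1

# Map each person ID to its name and sex in one pass per column
crime["Name"] = crime["Person"].map(lookup["Name"])
crime["Sex"] = crime["Person"].map(lookup["Sex"])

print(crime)
```

One difference from the loop: IDs with no match come out as `NaN` instead of an empty string.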


Now that we have the data in a combined dataframe, we can begin to put it together into a graph format.

## Creating the Graph

In [120]:
G = nx.DiGraph()

for n, r, c, s in zip(crime['Name'],crime['Role'],crime['Crime'],crime['Sex']):
    if n not in G.nodes() and r != 'Witness':
        G.add_node(n, Sex = s, bipartite = 0)
    if c not in G.nodes():
        G.add_node(c, bipartite = 1)
    if r == 'Suspect':
        G.add_edge(n,c)
    elif r == 'Victim':
        G.add_edge(c,n)

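# Note: nx.info() was removed in NetworkX 3.0; on newer versions use print(G) for a one-line summary
print(nx.info(G))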

Name: 
Type: DiGraph
Number of nodes: 1259
Number of edges: 1240
Average in degree:   0.9849
Average out degree:   0.9849


In [121]:
nx.is_bipartite(G)

True
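Since each node carries a `bipartite` attribute, the two node sets can be recovered from it directly. Below is a minimal sketch on a toy directed graph built with the same conventions (suspect to crime, crime to victim). The specific edges are made up for illustration and are not taken from the data.

```python
import networkx as nx

# Toy graph following the notebook's conventions; the edges are hypothetical
G_demo = nx.DiGraph()
G_demo.add_node("AbelDennis", Sex="M", bipartite=0)
G_demo.add_node("AbramsChad", Sex="M", bipartite=0)
G_demo.add_node("C1", bipartite=1)
G_demo.add_edge("AbelDennis", "C1")  # suspect -> crime
G_demo.add_edge("C1", "AbramsChad")  # crime -> victim

# Recover the two node sets from the stored attribute
people = {n for n, d in G_demo.nodes(data=True) if d["bipartite"] == 0}
crimes = set(G_demo) - people

# Every edge should join a person to a crime, never two nodes of the same kind
every_edge_crosses = all((u in people) != (v in people) for u, v in G_demo.edges())

print(sorted(people), sorted(crimes), every_edge_crosses)
```

Checking that every edge crosses between the two sets confirms the stored labels match the structure, which `nx.is_bipartite` alone does not verify.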

In [122]:
n1 = net.Network(height = "800px", width = "100%", notebook = True,
               heading = 'Crimes and People', directed = True)

n1.add_nodes(G.nodes())
for u,v in G.edges():
    n1.add_edge(u,v)

n1.show("graph.html")

The first thing that jumps out is the number of nodes with no connections. Because we specifically excluded the witnesses, every remaining person should be linked to at least one crime, so this should not happen.

Let's look and see why that is: 

In [134]:
unconn = [n for n in nx.isolates(G)]

isol = [(n, c, r) for n, c, r in zip(crime["Name"],crime["Crime"],crime["Role"]) if c in unconn or n in unconn]

sorted(isol, key = lambda x: x[1], reverse = True)

[('BeckerMax', 'C86', 'Victim Suspect'),
 ('TraskBenjie', 'C86', 'Victim Suspect'),
 ('ReddickJohn', 'C82', 'Victim Suspect'),
 ('AndrewsSally', 'C54', 'Victim Suspect'),
 ('KirklandRudy', 'C439', 'Victim Suspect'),
 ('KetterPercy', 'C436', 'Victim Suspect'),
 ('StanleyMaurice', 'C390', 'Victim Suspect'),
 ('GleesonMatt', 'C341', 'Victim Suspect'),
 ('TillieNigel', 'C324', 'Victim Suspect'),
 ('FindlayGary', 'C318', 'Victim Suspect'),
 ('NoblesCary', 'C318', 'Victim Suspect'),
 ('EvansJay', 'C311', 'Victim Suspect'),
 ('RosenHenry', 'C271', 'Victim Suspect'),
 ('CoreyAlonzo', 'C241', 'Victim Suspect'),
 ('GuntherMatt', 'C241', 'Victim Suspect'),
 ('KirklandNiles', 'C241', 'Victim Suspect'),
 ('ConrackCarol', 'C228', 'Victim Suspect'),
 ('CanfieldAristides', 'C192', 'Victim Suspect'),
 ('SprintMelody', 'C192', 'Victim Suspect'),
 ('StithCarlton', 'C169', 'Victim Suspect'),
 ('BoyleAlice', 'C151', 'Victim Suspect'),
 ('ForesterCarol', 'C146', 'Victim Suspect')]

It appears that these unconnected nodes are where the person is both a victim and a suspect. Why would such a thing occur?

Well, one situation might be when the crime is a fight of sorts where both parties are responsible. In those cases, we would see more than one person in that role (e.g. C86). However, we see a few crimes with only a single such person. Unless that person is fighting themselves (see *Fight Club*), that makes no sense.
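To make the "only a single person" observation concrete, the sketch below counts dual-role entries per crime. It uses a handful of tuples copied from the isolate listing above, not the full list, so the result only illustrates the idea.

```python
from collections import Counter

# A few (name, crime, role) tuples copied from the isolate listing
isol_sample = [
    ("BeckerMax", "C86", "Victim Suspect"),
    ("TraskBenjie", "C86", "Victim Suspect"),
    ("ReddickJohn", "C82", "Victim Suspect"),
    ("CoreyAlonzo", "C241", "Victim Suspect"),
    ("GuntherMatt", "C241", "Victim Suspect"),
    ("KirklandNiles", "C241", "Victim Suspect"),
]

# Count how many dual-role people are attached to each crime
dual_per_crime = Counter(c for _, c, _ in isol_sample)

# Crimes with exactly one dual-role person are the puzzling cases
single_dual = sorted(c for c, k in dual_per_crime.items() if k == 1)

print(dual_per_crime)
print(single_dual)
```

In this sample, C86 and C241 have several dual-role participants, while C82 has only one.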

Let's look at an example:

In [138]:
crime[crime["Crime"]=="C82"]

|  | Person | Crime | Role | Name | Sex |
|---|---|---|---|---|---|
| 570 | 319 | C82 | Suspect | GreenByron | M |
| 1028 | 567 | C82 | Suspect | OneilLinda | F |
| 1120 | 632 | C82 | Victim Suspect | ReddickJohn | M |
| 1365 | 772 | C82 | Victim | TylerOwen | M |


Now we see a bit more clearly. In the above example the person "ReddickJohn" is not the only participant in the crime. Perhaps the authorities suspect that this person was "in on" the crime despite their attempts to present themselves as another victim?

We should split each of these entries into two rows, one as a suspect and one as a victim.

In [140]:
# Split the role column

crime = crime.drop('Role', axis=1).join(crime['Role'].str.split(' ', expand=True).stack().reset_index(level=1, drop=True).rename('Role'))

# Check our example from above
crime[crime["Crime"]=="C82"]

|  | Person | Crime | Name | Sex | Role |
|---|---|---|---|---|---|
| 570 | 319 | C82 | GreenByron | M | Suspect |
| 1028 | 567 | C82 | OneilLinda | F | Suspect |
| 1120 | 632 | C82 | ReddickJohn | M | Victim |
| 1120 | 632 | C82 | ReddickJohn | M | Suspect |
| 1365 | 772 | C82 | TylerOwen | M | Victim |
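The same role split can also be sketched with `str.split` followed by `DataFrame.explode`, which may be easier to read than the stack-and-join one-liner. The toy frame below reuses three of the C82 rows; it is an alternative illustration, not the code that produced the output above.

```python
import pandas as pd

# Three of the C82 rows, reduced to the columns needed for the demo
toy = pd.DataFrame({
    "Name": ["GreenByron", "ReddickJohn", "TylerOwen"],
    "Crime": ["C82", "C82", "C82"],
    "Role": ["Suspect", "Victim Suspect", "Victim"],
})

# Split "Victim Suspect" into a list, then give each list element its own row
split = toy.assign(Role=toy["Role"].str.split(" ")).explode("Role")

print(split)
```

Unlike the one-liner, `explode` leaves `Role` in its original column position, and the duplicated row keeps its original index label.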


Now we see that "ReddickJohn" is listed twice, once as a Victim and once as a Suspect. We will need to regenerate the graph to show this update:

In [141]:
G = nx.DiGraph()

for n, r, c, s in zip(crime['Name'],crime['Role'],crime['Crime'],crime['Sex']):
    if n not in G.nodes() and r != 'Witness':
        G.add_node(n, Sex = s, bipartite = 0)
    if c not in G.nodes():
        G.add_node(c, bipartite = 1)
    if r == 'Suspect':
        G.add_edge(n,c)
    elif r == 'Victim':
        G.add_edge(c,n)

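# Note: nx.info() was removed in NetworkX 3.0; on newer versions use print(G) for a one-line summary
print(nx.info(G))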

Name: 
Type: DiGraph
Number of nodes: 1259
Number of edges: 1322
Average in degree:   1.0500
Average out degree:   1.0500
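As a quick sanity check on the new edge count, the arithmetic below compares the increase in edges with the 41 dual entries mentioned in the background. It assumes each dual row adds exactly one suspect edge and one victim edge, and that none of them duplicates an existing edge.

```python
# Figures taken from the two graph summaries and the background section
edges_before = 1240
edges_after = 1322
dual_entries = 41

# Each dual row is ASSUMED to add two edges: person -> crime and crime -> person
added_edges = edges_after - edges_before
expected_added = 2 * dual_entries

print(added_edges, expected_added)
```

Under those assumptions the two numbers agree, which is consistent with the split having affected only the dual-role rows.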


In [142]:
n2 = net.Network(height = "800px", width = "100%", notebook = True,
               heading = 'Crimes and People v2', directed = True)

n2.add_nodes(G.nodes())
for u,v in G.edges():
    n2.add_edge(u,v)

n2.show("graph.html")

This looks MUCH better. Let's check for isolates again:

In [143]:
unconn = [n for n in nx.isolates(G)]

isol = [(n, c, r) for n, c, r in zip(crime["Name"],crime["Crime"],crime["Role"]) if c in unconn or n in unconn]

sorted(isol, key = lambda x: x[1], reverse = True)

[]

As we hoped, there are none.