# Lab 10: Social Network Analysis
## Sam Bacon - March 22, 2021
#### Building Social Networks (Society of Friends Activity)

In [None]:
# Import packages
from operator import itemgetter  # used below to sort (name, score) pairs

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx

In [1]:
# Create visualization
G = nx.Graph()

# Add nodes
G.add_nodes_from(nodes)

# Add edges (expects `edges` to be a DataFrame with 'Source' and 'Target' columns)
my_edges = []
for i, row in edges.iterrows():
    my_edges.append((row['Source'], row['Target']))

G.add_edges_from(my_edges)

# Specifications
plt.figure(figsize=(15, 15))
pos = nx.spring_layout(G, k=1.8) 
nx.draw(G, pos=pos, with_labels=True,
        node_color='#33ddff', node_size=500,
        edge_color='grey', width=0.5)
plt.show()

NameError: ignored
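
The cell above relies on `nodes` and `edges`, which are not defined anywhere in this notebook. The sketch below shows one way such inputs could look, assuming `nodes` is a list of names and `edges` is a DataFrame with `Source` and `Target` columns. The names `toy_nodes`, `toy_edges`, and `G_toy`, and the data itself, are hypothetical placeholders and not the lab's Quaker dataset.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy inputs standing in for the lab's node and edge tables.
toy_nodes = ["A", "B", "C"]
toy_edges = pd.DataFrame({
    "Source": ["A", "A"],
    "Target": ["B", "C"],
})

# Build the graph the same way as the cell above: nodes first, then edges.
G_toy = nx.Graph()
G_toy.add_nodes_from(toy_nodes)
G_toy.add_edges_from(zip(toy_edges["Source"], toy_edges["Target"]))

print(G_toy.number_of_nodes(), G_toy.number_of_edges())
```

Pairing the two columns with `zip` gives the same edge list as looping over `iterrows()`, without building the list by hand.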

# Responses

Based on the visualization, there are a few key takeaways about the early Quaker social network. The network appears to consist of a few highly connected people, while the overwhelming majority have few connections. The visualization also suggests that the network is sparse, because the number of edges is very small compared to the number of possible edges; a dense network would have significantly more. Lastly, a few individuals appear to know people from many different areas of the community, because they sit near the center of the web.

In [None]:
# Calculating density

density = nx.density(G)
print("Network density:", density)

Network density: 0.022451612903225806


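The density of this network is about 0.0225, which means the network is quite sparse. Density is the ratio of actual edges to possible edges: a value of 0 means there are no connections, and 1 indicates that every potential connection exists. A value of 0.0225 means that only about 2% of the possible connections are present.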
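
To make the ratio concrete, here is a minimal sketch that computes density by hand on a small toy graph and compares it with `nx.density`. The graph `H` is a hypothetical five-node path, not the Quaker network; for an undirected graph with n nodes and m edges, density is 2m / (n(n - 1)).

```python
import networkx as nx

# Toy undirected graph (not the lab data): 5 nodes joined in a line, 4 edges.
H = nx.path_graph(5)

n = H.number_of_nodes()
m = H.number_of_edges()

# Actual edges divided by possible edges, where possible = n * (n - 1) / 2.
manual_density = 2 * m / (n * (n - 1))

print("Manual density:  ", manual_density)
print("NetworkX density:", nx.density(H))
```

The two values should agree, which confirms that the figure reported above is simply the fraction of possible connections that actually exist.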

In [None]:
# 5 most connected people

degree_dict = dict(G.degree(G.nodes()))
nx.set_node_attributes(G, degree_dict, 'degree')
sorted_degree = sorted(degree_dict.items(), key=itemgetter(1), reverse=True)

print("Top 5 most connected people:")
for d in sorted_degree[:5]:
    print(d)

Top 5 most connected people:
('George Fox', 22)
('William Penn', 18)
('James Nayler', 16)
('Margaret Fell', 13)
('George Whitehead', 13)


The five most connected individuals are Fox, Penn, Nayler, Fell, and Whitehead.

In [None]:
# Howgill - Leavens shortest path

howgill_leavens_path = nx.shortest_path(G, source="Francis Howgill", target="Elizabeth Leavens")
print("Shortest path between Howgill and Leavens:", howgill_leavens_path)
print("Length:", len(howgill_leavens_path)-1)

Shortest path between Howgill and Leavens: ['Francis Howgill', 'Richard Farnworth', 'Margaret Fell', 'Elizabeth Leavens']
Length: 3


In [None]:
# Degree vs. Betweenness Centrality

# Degree
print("Top 20 nodes by degree:")
for d in sorted_degree[:20]:
    print(d)

print()

# Betweenness Centrality
betweenness_dict = nx.betweenness_centrality(G) # Run betweenness centrality
nx.set_node_attributes(G, betweenness_dict, 'betweenness')
sorted_betweenness = sorted(betweenness_dict.items(), key=itemgetter(1), reverse=True)

print("Top 20 nodes by betweenness centrality:")
for b in sorted_betweenness[:20]:
    print(b)


Top 20 nodes by degree:
('George Fox', 22)
('William Penn', 18)
('James Nayler', 16)
('Margaret Fell', 13)
('George Whitehead', 13)
('Benjamin Furly', 10)
('Edward Burrough', 9)
('George Keith', 8)
('Thomas Ellwood', 8)
('John Perrot', 7)
('Francis Howgill', 7)
('John Story', 6)
('Alexander Parker', 6)
('Richard Farnworth', 6)
('John Audland', 6)
('Anthony Pearson', 5)
('Thomas Curtis', 5)
('William Caton', 5)
('John Wilkinson', 5)
('John Stubbs', 5)

Top 20 nodes by betweenness centrality:
('William Penn', 0.21724133859263658)
('George Fox', 0.2143791346486075)
('George Whitehead', 0.11434417456250663)
('Margaret Fell', 0.10958980699342626)
('James Nayler', 0.09455667376595782)
('Benjamin Furly', 0.058109991459716084)
('Thomas Ellwood', 0.041811418394817314)
('George Keith', 0.040739615965815816)
('John Audland', 0.03770070227583995)
('Alexander Parker', 0.035245274584377644)
('John Story', 0.026241627431635318)
('John Burnyeat', 0.026227161465163042)
('John Perrot', 0.025613034356150

There are multiple people who appear on the betweenness centrality list who are not on the degree list. Degree is simply the number of edges attached to a node; in the context of this network, it indicates how many people a specific person knew. Betweenness centrality, on the other hand, measures how often a person lies on the shortest paths between other pairs of people in the network, so it highlights the brokers who link otherwise distant groups. It makes sense for some people to have high scores in both categories, but one can have a high betweenness centrality without a high degree. A person who bridges a few highly connected people could have a very high betweenness centrality and a relatively low degree.
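
The difference between the two measures can be illustrated with a small sketch. The graph below is a hypothetical toy example, not the Quaker data: two tightly knit groups of four are joined only through a single node named `bridge`, which ends up with the lowest degree but the highest betweenness centrality.

```python
from itertools import combinations

import networkx as nx

# Hypothetical toy network: two fully connected groups of four people.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]

B = nx.Graph()
B.add_edges_from(combinations(group_a, 2))
B.add_edges_from(combinations(group_b, 2))

# A single go-between links the two groups and knows only two people.
B.add_edges_from([("a1", "bridge"), ("bridge", "b1")])

degree = dict(B.degree())
betweenness = nx.betweenness_centrality(B)

lowest_degree = min(degree, key=degree.get)
highest_betweenness = max(betweenness, key=betweenness.get)

print("Lowest degree:", lowest_degree, degree[lowest_degree])
print("Highest betweenness:", highest_betweenness)
```

Every shortest path between the two groups has to pass through `bridge`, so it scores highest on betweenness even though it has only two connections.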

Elizabeth Leavens epitomizes the scenario that I described above. She has a low degree, meaning that she does not know many people. However, her high betweenness centrality indicates that she sits on many of the shortest paths between other people, which suggests that the people she does know are highly connected. The shortest-path result above shows that one of them is Margaret Fell, and I'm willing to bet that she also knows Penn, Fox, or both.

