# Lab 07 Tasks - Solution

The dataset for this lab is based on records of leading Irish companies and their directors from 2003 to 2013, originally compiled from the Irish Companies Registration Office website by Friel et al. (2016). The data was compiled to study the overlap between directorships among prominent Irish companies, and the frequency of “interlocks”, where a director sits on more than one company board at the same time.

For analysis purposes, edge lists and company name metadata are provided in text format. Director names are not included here.

In [1]:
import networkx as nx
from networkx.algorithms import bipartite
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

### Task 1

Load the (director, company) numeric pairs from the file *irish-directorates.edges*, and the list of company names from *irish-directorates-companies.txt*. Note that each line number in the second file (counting from 1) corresponds to a company number in the first file.

Now construct a bipartite network so that:

- There are two node sets – directors and companies. You can assume that company names are unique.
- Edges are between directors and companies.
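The mapping from company numbers to names can be illustrated on a small in-memory example before touching the real files. This is a minimal sketch: the names `toy_company_names`, `toy_pairs` and `toy` are made up for illustration, and it assumes company numbers start at 1, so number `k` maps to list index `k - 1`.

```python
import networkx as nx

# Hypothetical toy data standing in for the two input files
toy_company_names = ["Acme Plc", "Beta Plc"]   # line 1, line 2
toy_pairs = [(1, 1), (2, 1), (2, 2)]           # (director, company) numbers

toy = nx.Graph()
for director_num, company_num in toy_pairs:
    director = "director-%03d" % director_num
    # company numbers are 1-based, list indices are 0-based
    company = toy_company_names[company_num - 1]
    # add_node is idempotent, so repeated calls for the same node are harmless
    toy.add_node(director, bipartite=0)
    toy.add_node(company, bipartite=1)
    toy.add_edge(director, company)

# Recover the two node sets from the "bipartite" node attribute
toy_directors = {n for n, d in toy.nodes(data=True) if d["bipartite"] == 0}
toy_companies = set(toy) - toy_directors
print(sorted(toy_directors), sorted(toy_companies))
```

Storing the set membership in the `bipartite` node attribute follows the NetworkX convention, which lets the two node sets be recovered later without keeping separate variables around.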

In [2]:
# Load the (director,company) numeric pairs
pairs = []
with open("irish-directorates.edges", "r") as fin:
    for line in fin:
        parts = line.strip().split(" ")
        if len(parts) != 2:
            continue
        director_num, company_num = int(parts[0]), int(parts[1])
        pairs.append( (director_num, company_num) )
print("Read %d director-company pairs" % len(pairs))

Read 1126 director-company pairs


In [3]:
# Load company names
company_names = []
with open("irish-directorates-companies.txt", "r") as fin:
    for line in fin.readlines():
        company_names.append(line.strip())
print("Read names for %d companies" % len(company_names))

Read names for 91 companies


In [4]:
# Create the bipartite network
b = nx.Graph()
director_nodes, company_nodes = set(), set()
for p in pairs:
    # we don't have the actual names, so create a dummy name
    director = "director-%03d" % p[0]
    # map to the company name
    company = company_names[p[1]-1]    
    # create the nodes, if necessary
    if director not in director_nodes:
        b.add_node(director, bipartite=0)
        director_nodes.add(director)
    if company not in company_nodes:
        b.add_node(company, bipartite=1)
        company_nodes.add(company)
    # create the edge
    b.add_edge(director, company)
print("Created bipartite network with %d nodes and %d edges" % ( b.number_of_nodes(), b.number_of_edges() ) )

Created bipartite network with 1099 nodes and 1126 edges


### Task 2

Using the bipartite network from Task 1:
    
- Verify that the network is indeed bipartite.
- Identify the companies with the highest number of director records.
- Identify the number of directors sitting on more than one board.
- Identify the directors sitting on the largest number of company boards.
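The checks listed above can be tried on a toy graph where the answers are easy to verify by hand. This is only a sketch on invented data (`toy`, `toy_directors`, `toy_companies` are illustrative names). It also uses `bipartite.is_bipartite_node_set`, which is not used in the solution cells, to confirm that the company set is one valid side of the bipartition.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy network: three directors, two companies
toy = nx.Graph()
toy_directors = {"d1", "d2", "d3"}
toy_companies = {"A", "B"}
toy.add_nodes_from(toy_directors, bipartite=0)
toy.add_nodes_from(toy_companies, bipartite=1)
toy.add_edges_from([("d1", "A"), ("d1", "B"), ("d2", "A"), ("d3", "B")])

# Is the graph 2-colourable at all?
is_bip = bipartite.is_bipartite(toy)
# Is the company set exactly one side of the bipartition?
sets_ok = bipartite.is_bipartite_node_set(toy, toy_companies)

# Degree of a company = number of director records
company_deg = dict(toy.degree(toy_companies))
# Directors with degree > 1 sit on more than one board
interlocked = sorted(n for n, deg in toy.degree(toy_directors) if deg > 1)

print(is_bip, sets_ok, company_deg, interlocked)
```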

In [5]:
# Verify it is bipartite
bipartite.is_bipartite(b)

True

In [6]:
# Identify the companies with the highest number of director records
# note: only include the degrees for the company nodes
degrees = dict( b.degree(company_nodes) )
s_deg = pd.Series( degrees )
df_summary = pd.DataFrame( {"degree" : s_deg} )
df_summary.sort_values( by="degree", ascending=False ).head(10)

company,degree
Glanbia Plc,43
Bank Of Ireland,42
Kerry Group Plc,40
Independent News & Media Plc,37
Elan Corporation Plc,32
Donegal Investment Group Plc - Esm,31
Crh Plc,28
Allied Irish Banks Plc,25
Waterford Wedgwood Plc,24
Readymix Plc,24


In [7]:
# Identify the number of directors sitting on more than one board
# note: only include the degrees for the director nodes
count = 0
for node, deg in b.degree(director_nodes):
    if deg > 1:
        count += 1
print("%d directors sit on more than one board" % count)

92 directors sit on more than one board


In [8]:
# Identify the directors sitting on the largest number of company boards
director_deg = dict(  b.degree(director_nodes) )
s_deg = pd.Series( director_deg )
df_summary = pd.DataFrame( {"degree" : s_deg} )
df_summary.sort_values( by="degree", ascending=False ).head(10)

director,degree
director-089,5
director-462,5
director-478,4
director-816,4
director-904,4
director-199,3
director-445,3
director-512,3
director-577,3
director-828,3


### Task 3

Using the bipartite network from Task 1, create a new weighted projected network such that:

- Each node is a company.
- Each edge weight indicates the number of directors who sit on the boards of both companies (i.e. the number of "co-directorships").
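To see what the edge weights in such a projection mean, it may help to project a tiny bipartite graph by hand first. The sketch below uses invented node names. In the projection onto companies, two companies are linked when they share at least one director, and the weight counts the shared directors.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical toy network: d1 and d2 sit on both A and B, d3 sits on B and C
toy = nx.Graph()
toy.add_nodes_from(["d1", "d2", "d3"], bipartite=0)
toy.add_nodes_from(["A", "B", "C"], bipartite=1)
toy.add_edges_from([
    ("d1", "A"), ("d1", "B"),
    ("d2", "A"), ("d2", "B"),
    ("d3", "B"), ("d3", "C"),
])

# Project onto the company side; weight = number of shared neighbours
toy_proj = bipartite.weighted_projected_graph(toy, ["A", "B", "C"])

for u, v, w in toy_proj.edges(data="weight"):
    print(u, v, w)
```

Here A and B share two directors (weight 2), B and C share one (weight 1), and A and C share none, so no edge is created between them.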

In [9]:
g = bipartite.weighted_projected_graph(b, company_nodes)
print("Created projected network %d nodes, %d edges" % ( len(g), g.number_of_edges() ))

Created projected network 90 nodes, 130 edges
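The projection has 90 company nodes, although 91 company names were read in Task 1. The output above does not say why. Two possible explanations are a repeated company name, or a company that never appears in any (director, company) pair. The sketch below shows, on invented toy data, how both cases could be checked. The variable names are hypothetical, and with the real data `company_names` and `pairs` from Task 1 would take the place of the toy lists.

```python
from collections import Counter

# Hypothetical toy data: "Beta Plc" is duplicated, "Gamma Plc" is never used
toy_names = ["Acme Plc", "Beta Plc", "Beta Plc", "Gamma Plc"]
toy_pairs = [(1, 1), (2, 2), (3, 3)]  # (director, company) numbers, 1-based

# Names that occur on more than one line
duplicates = sorted(n for n, c in Counter(toy_names).items() if c > 1)

# Company numbers that appear in at least one pair
used_numbers = {company_num for _, company_num in toy_pairs}
# Names whose line number never appears in the edge list
unused = [toy_names[i - 1] for i in range(1, len(toy_names) + 1)
          if i not in used_numbers]

# Number of distinct company nodes the network would actually contain
distinct_used = {toy_names[i - 1] for i in used_numbers}

print(duplicates, unused, len(distinct_used))
```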


### Task 4

Using the projected network from Task 3, identify the most frequent company overlaps in the network, and plot the edge weight distribution for the network.

In [10]:
# get all the edge weights
weights = {}
for u, v, w in g.edges(data="weight"):
    weights[(u, v)] = w
s_weights = pd.Series( weights ) 
df_weights = pd.DataFrame( {"weight" : s_weights} )
# display the most frequent overlaps (i.e. top weights)
df_weights.sort_values(by="weight",ascending=False).head(10)

company_a,company_b,weight
Conroy Gold & Natural Resources Plc - Esm,Karelian Diamond Resources Plc - Esm,5
Total Produce Plc - Esm,Fyffes Plc - Esm,3
Norkom Group Plc - Esm,Tvc Holdings Plc - Esm,3
Origin Enterprises Plc - Esm,Iaws Group Plc,3
Allied Irish Banks Plc,Crh Plc,2
United Drug Plc,Kerry Group Plc,2
Allied Irish Banks Plc,Irish Continental Group Plc,2
Crh Plc,Bank Of Ireland,2
Bank Of Ireland,Elan Corporation Plc,2
Total Produce Plc - Esm,Balmoral International Land Plc - Esm,2
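Task 4 also asks for a plot of the edge weight distribution. One way to produce it is sketched below on a small invented graph (`toy_g` and its weights are illustrative only). With the real data, the projected network `g` from Task 3 would be used instead. The sketch counts how many edges have each weight and draws the counts as a bar chart.

```python
from collections import Counter

import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical weighted network standing in for the projected graph
toy_g = nx.Graph()
toy_g.add_weighted_edges_from([
    ("A", "B", 5), ("B", "C", 3),
    ("C", "D", 1), ("A", "D", 1), ("A", "C", 1),
])

# Count how many edges have each weight
edge_weights = [w for _, _, w in toy_g.edges(data="weight")]
dist = Counter(edge_weights)

# Bar chart: one bar per distinct weight
xs = sorted(dist)
fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(xs, [dist[x] for x in xs])
ax.set_xticks(xs)
ax.set_xlabel("Edge weight (number of shared directors)")
ax.set_ylabel("Number of company pairs")
ax.set_title("Edge weight distribution")
```

In a notebook with `%matplotlib inline` the figure is displayed when the cell finishes. Because the weights are small integers, a bar chart of exact counts is likely to be easier to read than a binned histogram.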
