# Oil & Gas as a Social Network

Almost all data types that we work with have many-to-many relationships:
* A well top has relationships to biostratigraphic picks, wells, and seismic surfaces.
* A seismic cube has relationships to a processing report, a velocity cube, and the wells inside it.
* A block has relationships to companies, lease terms, and the basin it sits within.

Graphs are a way to show these relationships. They should supplement maps, well sections, and other views so that we can see the larger story of how different pieces of data are connected.

As an example, let's look at the results of the latest lease sale in the U.S. GOM (Lease Sale 252). Companies can bid large sums of money on blocks, either alone or in partnership with others, and none of this is immediately apparent from a map of thousands of little squares.

Concepts and code for this exercise taken from:
https://github.com/rtidatascience/connected-nx-tutorial

![alt text](../images/gom_ls_252_map.png)

### Step 1. Load and Connect Data
Lease sale data is available from the US BOEM:</br>
https://www.boem.gov/Sale-252/</br>
However, the data are a __mess!__  They are split across five files with different names and indexes, and it takes a bit of study to connect them into a single dataframe.  Let's start:

In [None]:
#import libraries

import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from matplotlib import cm

The first table contains the bids, companies (as a code) and blocks (as a lease number).

In [None]:
#names for columns
names = ['sale', 'lease', 'bid_amount', 'company_num', 'precentage_bid']

#load data
bids = pd.read_csv('../data/BID.txt', sep=r'\s+', names=names)

#strip the first character of the lease ID and convert to int so it matches the PREBID table
bids['lease'] = bids['lease'].str[1:].astype(int)

#show first 5 rows
bids.head()

The next table connects the company name to company code.  We'll only need a portion of this dataframe to build the graph.

In [None]:
#load data
companies = pd.read_csv('../data/Company2.TXT', usecols=[0], header=None)

#split the single text column: the first 6 characters are the company code, the rest is the name
companies['company_num'] = companies[0].str[:6].astype(int)
companies['company'] = companies[0].str[6:]

#remove redundant columns
companies = companies.drop([0], axis=1)

#show first 5 rows
companies.head()

The last table to load lets us connect each lease number to a block name.

In [None]:
#names of columns
pnames = ['lease', 'protraction_id', 'block_num', 'acreage', 'size', 'percentage', 'rcode', 'term', 'number_bids']

#load data
prebid = pd.read_csv('../data/PREBID.txt', sep=r'\s+', names=pnames, index_col=None)

#show first 5 rows
prebid.head()

### Step 2. Merge to Single Table
One of pandas' strengths is merging, joining, and concatenating dataframes.  We'll merge twice:
1. Bids to Company to tie company name to their bid amounts and blocks
2. Bids to Blocks to tie official block names to companies and bid amounts
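`pd.merge` defaults to an inner join, so a bid whose company code is missing from the company table would be silently dropped. A minimal sketch of how that could be checked, using toy stand-ins for `bids` and `companies` (the values and the names `bids_toy` and `companies_toy` are hypothetical): `how="left"` keeps every bid, `indicator=True` flags rows that found no match, and `validate="many_to_one"` raises an error if a company code is duplicated in the company table.

```python
import pandas as pd

# Hypothetical toy tables with the same key column as the notebook's data.
bids_toy = pd.DataFrame({
    "lease": [12345, 12345, 12346],
    "bid_amount": [1_500_000, 1_500_000, 900_000],
    "company_num": [101, 102, 103],
})
companies_toy = pd.DataFrame({
    "company_num": [101, 102],
    "company": ["Company A", "Company B"],
})

# Left join keeps all bids; _merge marks rows with no matching company code;
# validate raises pandas.errors.MergeError if company_num repeats in companies_toy.
checked = pd.merge(bids_toy, companies_toy, on="company_num", how="left",
                   indicator=True, validate="many_to_one")

unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["lease", "company_num"]])
```

Running the same check on the real `bids` and `companies` would show whether any codes fail to match before relying on the default inner join.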

In [None]:
#Merge bids and companies to get official names
dfa = pd.merge(bids, companies, on='company_num')

#show first 5 rows
dfa.head()

In [None]:
#Merge bids and blocks
dfb = pd.merge(prebid[['lease', 'protraction_id', 'block_num']], dfa, on='lease')

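#Create single column for block name (block_num cast to text in case it was read as a number)
dfb['block_code'] = dfb['protraction_id'] + ' - ' + dfb['block_num'].astype(str)

#Convert bid amount to millions of dollars
dfb['bid_amount'] = dfb['bid_amount'] / 1_000_000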

#show first 5 rows
dfb.head()

We are ready to go.

### Step 3. Build the Graph

We'll be using two different libraries here:
* NetworkX - the engine to build and analyze graphs
* HoloViews - creates nice-looking, interactive plots in the Jupyter environment

First, let's build the graph in a series of steps:
1. Create list of nodes.
2. Create attributes for those nodes which will aid in visualization.
3. Connect edges to nodes and create edge attributes directly from Pandas
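The three steps can be sketched on a toy table shaped like `dfb` (the company and block names below are hypothetical). This version passes each node's type straight to `nx.set_node_attributes` through its `name` argument instead of building nested dictionaries; the resulting `status` attribute is the same.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy table in the same shape as dfb.
dfb_toy = pd.DataFrame({
    "block_code": ["AA1 - 10", "AA1 - 10", "AA1 - 11"],
    "company": ["Company A", "Company B", "Company A"],
    "bid_amount": [1.5, 1.2, 0.4],
})

# Step 3: one edge per (block, company) row, carrying the bid amount.
G_toy = nx.from_pandas_edgelist(dfb_toy, "block_code", "company",
                                edge_attr=["bid_amount"])

# Steps 1-2: tag each node with its type so it can be colored later.
status = {c: "company" for c in dfb_toy["company"].unique()}
status.update({b: "block" for b in dfb_toy["block_code"].unique()})
nx.set_node_attributes(G_toy, status, name="status")

print(G_toy.number_of_nodes(), G_toy.number_of_edges())  # 4 3
print(nx.is_bipartite(G_toy))  # True
```

Because `nx.Graph` holds at most one edge per node pair, if the same company appeared on the same block in two rows, only the last row's attributes would be kept.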

In [None]:
#Build lists of nodes
comps = dfb['company'].unique().tolist()
blocks = dfb['block_code'].unique().tolist()

In [None]:
#Create dictionaries of attributes for each node

#For Companies
cdicts = {c: {"status": "company"} for c in comps}

#For Blocks
bdicts = {b: {"status": "block"} for b in blocks}

In [None]:
#Build Edges from Pandas
G = nx.from_pandas_edgelist(dfb, 'block_code', 'company', edge_attr=['bid_amount', 'precentage_bid'])

#Add node attributes
nx.set_node_attributes(G, cdicts)
nx.set_node_attributes(G, bdicts)

#Metrics of graph (nx.info was removed in NetworkX 3.0; printing the graph shows node and edge counts)
print(G)

Before we plot the big graph, let's look at a small subgraph to understand what data is available.

In [None]:
#Generate small dataset
focus_exploration = dfb[dfb['company'].str.contains('Focus')]
focus_exploration

In [None]:
H = nx.from_pandas_edgelist(focus_exploration, 'block_code', 'company', edge_attr=['bid_amount', 'precentage_bid'])
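pos = nx.spring_layout(H, seed=1)
nx.draw(H, pos, with_labels=True)

#Label each edge with its bid amount ($ millions), read from the edge attributes
edge_labels = {edge: round(amount, 4) for edge, amount in nx.get_edge_attributes(H, 'bid_amount').items()}
nx.draw_networkx_edge_labels(H, pos, edge_labels=edge_labels)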
plt.show()

This is a graph of Focus Exploration's bids. They were the only bidders on two blocks, LA10A - 295 and LA6A - 267, and spent `$`0.18 million and `$`0.16 million respectively.
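Rather than rebuilding a small graph from a filtered DataFrame, a company's neighborhood can also be cut directly out of `G` with `nx.ego_graph`, which keeps the center node, every node within `radius` hops, and the edges (with their attributes) among them. A sketch on a hypothetical toy graph:

```python
import networkx as nx

# Hypothetical toy company-block graph built the same way as G.
G_toy = nx.Graph()
G_toy.add_edge("Company A", "AA1 - 10", bid_amount=0.18)
G_toy.add_edge("Company A", "AA1 - 11", bid_amount=0.16)
G_toy.add_edge("Company B", "AA1 - 11", bid_amount=0.30)

# radius=1: the company plus every block it bid on, edge attributes included.
sub = nx.ego_graph(G_toy, "Company A", radius=1)
print(sorted(sub.nodes()))  # ['AA1 - 10', 'AA1 - 11', 'Company A']
print(nx.get_edge_attributes(sub, "bid_amount"))
```

In the notebook this would be something like `nx.ego_graph(G, 'Focus Exploration')`, assuming that is the exact node name; the edge labels for drawing can then come from `nx.get_edge_attributes` instead of being typed by hand.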


### Step 4. Draw Lease Sale 252 Graph

In [None]:
#Load another set of libraries for visualization

import holoviews as hv
from holoviews import opts, dim
import hvplot.networkx as hvnx

#Load the Bokeh backend so HoloViews plots render interactively in the notebook
hv.extension('bokeh')

In [None]:
#Create graph with holoviews

kwargs = dict(width=900, height=900, xaxis=None, yaxis=None)
opts.defaults(opts.Nodes(**kwargs), opts.Graph(**kwargs))

#One color per node status (company or block)
color_cycle = hv.Cycle(['red', 'blue']).values
graph = hv.Graph.from_networkx(G, nx.layout.kamada_kawai_layout)

#Edge width scales with the bid amount ($ millions)
graph.opts(cmap=color_cycle, node_size=10, edge_line_width=hv.dim('bid_amount')/2,
           node_line_color='gray', node_color="status")

#### What is this graph telling us?  Total vs. Fieldwood
![alt text](../images/graph_explain.png)

In this case, Total bid considerably more for blocks 693 and 737 and won them.  There are many more interesting relationships in this graph; take a moment to explore them.

#### Building a true Social Network
The graph above shows the relationships between companies and blocks, but it doesn't capture who was bidding with whom.  Let's build a second graph with only that information.  This new graph needs to be built from the original DataFrame, but instead of showing every step, I've put the code in a __.py__ file: a plain text file containing a block of code that runs all at once.  This is a quick, useful way to share code with others without requiring them to open Jupyter.

In [None]:
#run a .py file, creating a new graph called "H"
%run -i "cobidders.py"

In [None]:
#generate plot of H
cobidders = hv.Graph.from_networkx(H, nx.layout.circular_layout)
cobidders.opts(node_size=25, edge_line_width=hv.dim('count'), node_line_color='gray')

This graph uses a circular layout to make the relationships easier to see.  The width of each line is the number of blocks the partnership bid on.  Beacon and Houston Energy have the thickest line (8 blocks total).  Shell bid 100% on all of its blocks, so it does not appear in this graph.
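The contents of `cobidders.py` are not shown here. The sketch below is a hypothetical reconstruction of what such a script could do, not the author's code. It assumes that a joint bid appears in `dfb` as several rows sharing the same `lease` and `bid_amount`, one row per partner, and that `precentage_bid` stores each partner's share as a percentage (100 = bid alone). The function name `build_cobidder_graph` and the toy data are illustrative.

```python
from itertools import combinations

import networkx as nx
import pandas as pd


def build_cobidder_graph(df):
    """Hypothetical sketch: link companies that share a single joint bid.

    Assumes each joint bid is stored as one row per partner with the same
    lease and bid_amount, and a share (precentage_bid) below 100.
    """
    H = nx.Graph()
    joint = df[df["precentage_bid"] < 100]
    for _, bid in joint.groupby(["lease", "bid_amount"]):
        partners = sorted(bid["company"].unique())
        # Every pair of partners in this bid gets one more shared block.
        for a, b in combinations(partners, 2):
            if H.has_edge(a, b):
                H[a][b]["count"] += 1
            else:
                H.add_edge(a, b, count=1)
    return H


# Toy data (hypothetical): A and B bid together twice; C always bids alone.
toy = pd.DataFrame({
    "lease": [1, 1, 2, 2, 3, 4],
    "bid_amount": [2.0, 2.0, 1.0, 1.0, 5.0, 3.0],
    "company": ["Company A", "Company B", "Company A", "Company B",
                "Company C", "Company C"],
    "precentage_bid": [50.0, 50.0, 60.0, 40.0, 100.0, 100.0],
})
H_toy = build_cobidder_graph(toy)
print(list(H_toy.edges(data=True)))  # [('Company A', 'Company B', {'count': 2})]
```

A bipartite projection of `G` (for example `weighted_projected_graph` in `networkx.algorithms.bipartite`) would be shorter, but it would link any two companies that bid on the same block, including competitors; grouping by the individual bid keeps only actual partners, and a company that always bids alone never gets an edge.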

### Step 5. Analysis of the Graphs

Graphs aren't just for pretty diagrams: the connections can be analyzed with different techniques to describe relationships and find groups of similar companies.  Let's experiment with three types of analysis:
1. Degree Centrality
2. Shortest Path
3. Modularity

#### Degree Centrality
Degree centrality is a per-node measure: the fraction of the other nodes in the graph that a node is directly connected to.  We can use it to figure out who the major and minor players were without having to look at the graph or table.
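In NetworkX, `nx.degree_centrality` computes each node's degree divided by n - 1, where n is the number of nodes in the graph. A small check on a hypothetical toy graph in which one company bids on three blocks and a second company bids on one of them:

```python
import networkx as nx

# Hypothetical toy graph: Company A bids on B1-B3, Company B only on B1.
G_toy = nx.Graph([("Company A", "B1"), ("Company A", "B2"),
                  ("Company A", "B3"), ("Company B", "B1")])

n = G_toy.number_of_nodes()  # 5
centrality = nx.degree_centrality(G_toy)

# Degree centrality equals degree divided by the n - 1 possible neighbors.
for node, deg in G_toy.degree():
    assert abs(centrality[node] - deg / (n - 1)) < 1e-12

print(centrality["Company A"])  # 0.75
print(centrality["Company B"])  # 0.25
```

In a company-block graph a company can only connect to blocks, so its centrality is simply the number of blocks it bid on divided by n - 1.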

In [None]:
#Calculate degree centrality
G_degree_centrality = nx.degree_centrality(G)

#Dataframe of top 5
dG = pd.DataFrame.from_records(sorted(G_degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5], columns=['company', 'degree_cent'])

#Quick plot of top 5
ax = dG.plot.barh(x='company', y='degree_cent')

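Because this graph only links companies to blocks, a company's degree centrality is proportional to the number of blocks it bid on, so this plot effectively shows the top 5 bidders for LS 252.  The co-bidder social network should show us a different set of players, however.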
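Because `G` also contains block nodes, a heavily contested block can rank alongside the companies. A sketch of restricting the ranking to companies with the `status` attribute set earlier, on a hypothetical toy graph where block `B1` ties with the top company:

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph: block B1 is contested by two companies.
G_toy = nx.Graph([("Company A", "B1"), ("Company A", "B2"),
                  ("Company B", "B1"), ("Company C", "B3")])
nx.set_node_attributes(G_toy, {"Company A": "company", "Company B": "company",
                               "Company C": "company", "B1": "block",
                               "B2": "block", "B3": "block"}, name="status")

centrality = pd.Series(nx.degree_centrality(G_toy))
status = pd.Series(nx.get_node_attributes(G_toy, "status"))

# B1 ties with Company A (degree 2 of 5 possible), so filter to companies first.
top_companies = centrality[status == "company"].nlargest(2)
print(top_companies)
```

On the real graph the same filter with `.nlargest(5)` would guarantee that the chart lists only companies.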

In [None]:
#Calculate degree centrality
H_degree_centrality = nx.degree_centrality(H)

#Dataframe of top 5
dH = pd.DataFrame.from_records(sorted(H_degree_centrality.items(), key=lambda x: x[1], reverse=True)[:5], columns=['company', 'degree_cent'])

#Quick plot of top 5
ax = dH.plot.barh(x='company', y='degree_cent')

These five companies are the most connected, and all have the same number of relationships.  Together with the previous chart, this speaks to the different strategies that companies have in the GOM: Shell goes it alone; Beacon loves a party.

#### Shortest Path

This is the same concept as Six Degrees of Kevin Bacon, except that here we are looking for the nodes (companies or blocks) that are farthest from a given node, in this case Shell.
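`nx.shortest_path_length` with only `target` set returns a dictionary from each node that can reach the target to its hop count. Two consequences are worth keeping in mind when reading the chart: nodes in a separate connected component are missing from the dictionary entirely rather than showing up as "far away", and because every edge in the company-block graph joins a company to a block, company-to-company distances are always even. A sketch on a hypothetical toy graph (the node "Shell" here is only a label):

```python
import networkx as nx

# Hypothetical toy graph; Company D and B9 form a separate component.
G_toy = nx.Graph([("Shell", "B1"), ("Company A", "B1"), ("Company A", "B2"),
                  ("Company B", "B2"), ("Company D", "B9")])

# Hop count from every node that can reach "Shell"; weight=None counts edges.
hops = nx.shortest_path_length(G_toy, target="Shell", weight=None)
print(sorted(hops.items(), key=lambda x: x[1], reverse=True))
# [('Company B', 4), ('B2', 3), ('Company A', 2), ('B1', 1), ('Shell', 0)]
```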

In [None]:
#Number of hops from every reachable node to Shell (weight=None counts edges)
Shell_shortest_path = nx.shortest_path_length(G, target='Shell Offshore Inc.', weight=None)

#Dataframe of the 10 farthest nodes
dS = pd.DataFrame.from_records(sorted(Shell_shortest_path.items(), key=lambda x: x[1], reverse=True)[:10], columns=['node', 'Number_of_hops'])

#Quick plot of the 10 farthest nodes
ax = dS.plot.barh(x='node', y='Number_of_hops')

This chart tells us that the blocks of interest to Ridgewood, Venari, and Talos are farthest from Shell.  Put another way, these companies and Shell are after different things.

#### Communities

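Another tool that graphs provide is the identification of communities: groups of nodes with many connections among themselves and few to the rest of the graph.  Nodes in a community can have similar connections even when they are not directly linked; two companies are never directly connected in the company-block graph, but they can share many blocks.  Here, communities show which companies are interested in which kinds of blocks (i.e., play).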
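`greedy_modularity_communities` repeatedly merges the pair of communities that most increases the modularity Q = Σ_c [L_c / m - (d_c / (2m))^2], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes; it stops when no merge increases Q. A minimal sketch on a hypothetical toy graph of two triangles joined by one edge, where the two triangles come out as the communities:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
G_toy = nx.Graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])

communities = greedy_modularity_communities(G_toy)
print([sorted(c) for c in communities])

# Each triangle: L_c = 3, d_c = 7, m = 7, so Q = 2 * (3/7 - (7/14)**2) = 5/14
print(round(modularity(G_toy, communities), 3))  # 0.357
```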

In [None]:
#import community algorithms
from networkx.algorithms.community import greedy_modularity_communities

#find communities by greedy modularity maximization (returns a list of sets of nodes)
mod_communities = greedy_modularity_communities(G)

In [None]:
def map_communities(G, communities):
    """Return a dict mapping each node to the index of its community (None if unassigned)."""
    community_map = {node: None for node in G.nodes()}
    for i, comm in enumerate(communities):
        for node in comm:
            community_map[node] = i
    return community_map

In [None]:
#Map each node to its community
community_map = map_communities(G, mod_communities)

#Add community to attribute of node
nx.set_node_attributes(G, community_map, 'community')

In [None]:
#Create graph with holoviews of communities

kwargs = dict(width=900, height=900, xaxis=None, yaxis=None)
opts.defaults(opts.Nodes(**kwargs), opts.Graph(**kwargs))

graph = hv.Graph.from_networkx(G, nx.layout.spring_layout)
graph.opts(cmap=cm.tab20c, node_size=10, edge_line_width=hv.dim('bid_amount')/2,
           node_line_color='gray', node_color="community")

In the plot above, Shell is so distinct from the other companies that it forms its own community.  Chevron, Total, and Fieldwood (orange) are all interested in the same blocks, so they form another community.
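Matching plot colors to communities by eye can be error-prone, and the community numbers themselves carry no meaning. A sketch of listing the company members of each community from the node attributes, on a hypothetical toy graph carrying the same `status` and `community` attributes the notebook sets on `G`:

```python
import networkx as nx

# Hypothetical toy graph with the notebook's node attributes.
G_toy = nx.Graph([("Company A", "B1"), ("Company B", "B1"), ("Company C", "B2")])
nx.set_node_attributes(G_toy, {"Company A": "company", "Company B": "company",
                               "Company C": "company", "B1": "block",
                               "B2": "block"}, name="status")
nx.set_node_attributes(G_toy, {"Company A": 0, "Company B": 0, "B1": 0,
                               "Company C": 1, "B2": 1}, name="community")

# Collect company nodes under their community index; blocks are skipped.
members = {}
for node, attrs in G_toy.nodes(data=True):
    if attrs.get("status") == "company":
        members.setdefault(attrs.get("community"), []).append(node)

for comm, names in members.items():
    print(comm, sorted(names))
```

Running the same loop over `G.nodes(data=True)` would give the member list behind each color in the plot.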