__Triadic Closure__: The tendency for people who share connections in a social network to become connected.
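To make the idea concrete, here is a minimal sketch assuming an undirected `networkx` graph. Triadic closure predicts that unconnected pairs sharing at least one neighbor are the most likely to link next. `closure_candidates` is a hypothetical helper and is not part of the `GraphH` class.

```python
import networkx as nx

def closure_candidates(G):
    """Hypothetical helper: map each non-adjacent pair that shares at least one
    neighbor to the number of neighbors it shares (the open triads it would close)."""
    candidates = {}
    for u, v in nx.non_edges(G):
        shared = len(list(nx.common_neighbors(G, u, v)))
        if shared > 0:
            candidates[tuple(sorted((u, v)))] = shared
    return candidates

# Toy graph: A-B-C form a triangle, D hangs off C
G = nx.Graph([('A', 'B'), ('B', 'C'), ('C', 'D'), ('A', 'C')])
print(closure_candidates(G))
```

Ranking pairs by the number of shared neighbors is one simple way to turn the tendency into a link-prediction score.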


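__Clustering Coefficient__: Measures the degree to which nodes in a network tend to cluster together, i.e. form triangles. A node's local coefficient is the fraction of pairs of its neighbors that are themselves connected; the global coefficient is either the mean of the local coefficients (average clustering) or the fraction of connected triads that are closed (transitivity).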
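The local coefficient can be computed by hand and checked against `nx.clustering`, the function that `get_local_clustering_coefficient` wraps. This is a sketch assuming an undirected, unweighted graph; `local_clustering` is a hypothetical name.

```python
import itertools
import networkx as nx

def local_clustering(G, node):
    """Fraction of pairs of `node`'s neighbors that are adjacent to each other."""
    neighbors = list(G.neighbors(node))
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs: networkx defines the coefficient as 0
    linked_pairs = sum(1 for u, v in itertools.combinations(neighbors, 2) if G.has_edge(u, v))
    return linked_pairs / (k * (k - 1) / 2)

# A has neighbors B, C, D; only the pair (B, C) is linked, so its coefficient is 1/3
G = nx.Graph([('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C')])
for n in G:
    print(n, local_clustering(G, n), nx.clustering(G, n))
```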

__Robust Networks__: Networks that have large minimum node and edge cuts, i.e. many nodes or edges must be removed before the network becomes disconnected.
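One way to compare robustness, assuming connected undirected graphs: `nx.node_connectivity` and `nx.edge_connectivity` give the sizes of the minimum cuts, and `nx.minimum_node_cut` / `nx.minimum_edge_cut` return one such cut. A path falls apart when a single interior node or edge is removed, while a cycle needs two. `robustness_summary` is a hypothetical helper.

```python
import networkx as nx

def robustness_summary(G):
    """Hypothetical helper: minimum cut sizes and one example cut of a connected undirected graph."""
    return {
        'node_connectivity': nx.node_connectivity(G),
        'edge_connectivity': nx.edge_connectivity(G),
        'min_node_cut': nx.minimum_node_cut(G),  # one smallest set of nodes whose removal disconnects G
        'min_edge_cut': nx.minimum_edge_cut(G),  # one smallest set of edges whose removal disconnects G
    }

for name, G in [('path', nx.path_graph(6)), ('cycle', nx.cycle_graph(6))]:
    s = robustness_summary(G)
    print(name, s['node_connectivity'], s['edge_connectivity'], s['min_node_cut'])
```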

## Sources

- https://jakevdp.github.io/PythonDataScienceHandbook/04.13-geographic-data-with-basemap.html
- https://stackoverflow.com/questions/19915266/drawing-a-graph-with-networkx-on-a-basemap
- https://rabernat.github.io/research_computing/intro-to-basemap.html
- https://matplotlib.org/stable/gallery/text_labels_and_annotations/custom_legends.html
- https://matplotlib.org/stable/tutorials/colors/colormaps.html
- https://stackoverflow.com/questions/30914462/matplotlib-how-to-force-integer-tick-labels
- https://www.tutorialspoint.com/how-to-insert-a-small-image-on-the-corner-of-a-plot-with-matplotlib
- https://stackoverflow.com/questions/2553521/setting-axes-linewidth-without-changing-the-rcparams-global-dict

Install Basemap on Colab (prefix each command with `!` when running it in a notebook cell):

```
apt-get install -q libgeos
apt-get install -q libgeos-dev
pip install -q https://github.com/matplotlib/basemap/archive/master.zip
pip install -q pyproj
```

In [1]:
### installations

# !apt-get install -q libgeos
# !apt-get install -q libgeos-dev
# !pip install -q https://github.com/matplotlib/basemap/archive/master.zip
# !pip install -q pyproj
#!pip install pycountry-convert

In [2]:
# imports
import os
import pandas as pd
import numpy as np
from scipy import io


import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.lines import Line2D

from matplotlib import pyplot, patches
from mpl_toolkits.basemap import Basemap
from matplotlib.ticker import MaxNLocator

import networkx as nx
from networkx.algorithms import bipartite

import missingno as msno  
import seaborn as sns
sns.set_context('talk')


%matplotlib inline

In [3]:
## specify current directory

running_in_drive = True
if running_in_drive:
  os.chdir("/content/drive/MyDrive/GA/capstone/code")

# environment variables

DATA_PATH = '../data'
ORIGINAL_DATA_PATH = f'{DATA_PATH}/original/csv_offshore_leaks'
CLEAN_DATA_PATH = f'{DATA_PATH}/clean'
NODE_EDGES_PATH = f'{CLEAN_DATA_PATH}/nodes_edges'
GRAPH_PATH = f'{CLEAN_DATA_PATH}/graphs'


MD_PATH = '../presentation/tables'
IMAGE_PATH = '../presentation/images'
IMAGE_PATH_MAPS = f'{IMAGE_PATH}/maps'
IMAGE_PATH_GRAPHS = f'{IMAGE_PATH}/graphs'
IMAGE_PATH_DEGREES = f'{IMAGE_PATH}/degrees'

In [4]:
%%writefile tools/graph.py

### A class including methods for handling graphs

import pandas as pd
import numpy as np
from scipy import io


import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.lines import Line2D

from matplotlib import pyplot, patches
from mpl_toolkits.basemap import Basemap
from matplotlib.ticker import MaxNLocator

import networkx as nx
from networkx.algorithms import bipartite

import missingno as msno  
import seaborn as sns
sns.set_context('talk')

IMAGE_PATH = '../presentation/images'
IMAGE_PATH_MAPS = f'{IMAGE_PATH}/maps'
IMAGE_PATH_GRAPHS = f'{IMAGE_PATH}/graphs'
IMAGE_PATH_DEGREES = f'{IMAGE_PATH}/degrees'


def change_width(ax, new_value):
    """Set every bar in `ax` to width `new_value`, keeping each bar centered.

    Taken from https://newbedev.com/changing-width-of-bars-in-bar-chart-created-using-seaborn-factorplot
    """
    try:
        for patch in ax.patches:
            current_width = patch.get_width()
            diff = current_width - new_value

            # change the bar width
            patch.set_width(new_value)

            # re-center the bar
            patch.set_x(patch.get_x() + diff * .5)
    except Exception as e:
        print(e)


class GraphH:
  def __init__(self):
    pass

  def get_local_clustering_coefficient(self, G, node):
    """
    Local clustering coefficient: the number of pairs of the node's neighbors
    that are connected to each other, divided by the total number of pairs of
    the node's neighbors.
    """
    coef = nx.clustering(G, node)
    if coef == 0:
      print(f'Local Clustering Coefficient of node "{node}" is {coef}')
    else:
      print(f'Local Clustering Coefficient of node "{node}" is {coef: .2e}')
    return coef


  def get_global_clustering_coefficient(self, G):
    """
    Average clustering coefficient: the mean of the local clustering
    coefficients of all nodes. Measures the tendency for edges to form triangles.
    """
    coef = nx.average_clustering(G)
    print(f'Global Clustering Coefficient is {coef: .2e}')
    return coef


  def get_transitivity(self, G):
    """
    Transitivity: three times the number of triangles divided by the number of
    connected triads (open and closed) in the network.

    Like the average clustering coefficient, it measures the tendency for edges
    to form triangles, but it weights nodes with large degree more heavily.
    """
    transitivity = nx.transitivity(G)
    print(f'Global Clustering Coefficient - Transitivity is {transitivity: .2e}.')
    return transitivity


  def compare_transitivity_avgclusteringcoef(self, transitivity, avg_clustering_coef):
    if transitivity == 0:
      print('Transitivity is 0 (the graph has no triangles); the ratio is undefined.')
      return
    ratio = avg_clustering_coef/transitivity
    print(f'Ratio of average clustering coefficient to transitivity is {round(ratio, 2)}.')
    if ratio > 1:
      print('\tMost nodes have a high LCC (local clustering coefficient).')
      print('\tHigh-degree nodes have a low LCC.')
    else:
      print('\tMost nodes have a low LCC (local clustering coefficient).')
      print('\tHigh-degree nodes have a high LCC.')

  def check_if_bipartite(self, edges):
    B = nx.Graph()
    start_nodes = edges['START_ID'].values.tolist()
    end_nodes = edges['END_ID'].values.tolist()
    edge_values = edges[['START_ID', 'END_ID']].values.tolist()
    B.add_nodes_from(start_nodes, bipartite=0)
    B.add_nodes_from(end_nodes, bipartite=1)
    B.add_edges_from(edge_values)
    is_bipartite = bipartite.is_bipartite(B)
    print('is bipartite: ', is_bipartite)
    common = len(set(start_nodes).intersection(set(end_nodes)))
    print(f'There are {common:,} common nodes among start and end nodes.')
    return B


  def plot_bfs_tree(self, G, key, alpha = 1, with_labels=True):
    plt.figure()
    g = nx.bfs_tree(G, key)
    color_map = ['red' if node == key else '0.7' for node in g]  
    nx.draw(g, with_labels=with_labels, node_size=1200, node_color=color_map, alpha=alpha)
    plt.title(f'BFS tree rooted at a node with degree {G.degree(key)}')


  def get_average_shortest_path_length(self, G):
    try:
      l = nx.average_shortest_path_length(G)
      print(f'Average distance between every pair of nodes: {l: .2f}')
      return l
    except Exception as e:
      print('average shortest path length:', e)

  def get_maximum_distance_between_any_pair_of_nodes(self, G):
    try:
      diameter = nx.diameter(G)
      print(f'Diameter - maximum distance between any pair of nodes {diameter}.')
    except Exception as e:
      print('diameter:', e)


  def get_largest_distance_between_node_and_all_other_nodes(self, G):
    try:
      eccentricity = nx.eccentricity(G)
      print(f'Eccentricity - the largest distance between node and all other nodes')
      return eccentricity
    except Exception as e:
      print('eccentricity:', e)

  def get_radius_minimum_eccentricity(self, G):
    try:
      radius = nx.radius(G)
      print(f'Radius - the minimum eccentricity is {radius}.')
    except Exception as e:
      print('radius:', e)


  def get_periphery(self, G):
    try:
      periphery = nx.periphery(G)
      print('Periphery - the set of nodes that have eccentricity equal to the diameter.')
      return periphery
    except Exception as e:
      print('periphery:', e)

  def get_center_of_graph(self, G):
    try:
      center = nx.center(G)
      print('Center - the set of nodes that have eccentricity equal to the radius.')
      return center
    except Exception as e:
      print('center:', e)

  #H = nx.convert_node_labels_to_integers(G, first_label=1)
  def get_number_of_connected_components(self, G):
    if G.is_directed():
      n = nx.number_connected_components(G.to_undirected())
    else:
      n = nx.number_connected_components(G)

    print(f'Number of connected components in this graph: {n}.')
    return n


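  def get_connected_components_sets_of_nodes(self, G):
    # sort by size, largest first (sorting raw sets compares them by subset inclusion)
    if G.is_directed():
      return sorted(nx.weakly_connected_components(G), key=len, reverse=True)
    else:
      return sorted(nx.connected_components(G), key=len, reverse=True)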

  def get_connected_components_to_a_node(self, G, node):
    if G.is_directed():
      return nx.node_connected_component(G.to_undirected(), node)
    else:
      return nx.node_connected_component(G, node)


  def is_graph_strongly_connected(self, G):
    """
    A directed graph is strongly connected if for every pair nodes u & v, there is
    a directed path from u to v and a directed path from v to u.
    """
    return nx.is_strongly_connected(G)


  def is_graph_weakly_connected(self, G):
    """
    A directed graph is weakly connected if replacing all directed edges with 
    undirected edges produces a connected undirected graph.
    """
    return nx.is_weakly_connected(G)



  def general_checks(self, G, extended=False):
    num_nodes = G.number_of_nodes()
    num_edges = G.number_of_edges()

    print(f'Graph has {num_nodes:,} nodes and {num_edges:,} edges.')
    if extended:
      print(f'The graph is{"" if G.is_directed() else " NOT"} directed.')
      if not G.is_directed():
        print(f'The graph is{"" if nx.is_connected(G) else " NOT"} connected.')
      print(f'The graph is{"" if G.is_multigraph() else " NOT"} a multigraph.')

      if G.is_directed():
        print(f'The graph is{"" if nx.is_strongly_connected(G) else " NOT"} strongly connected.')
        print(f'The graph is{"" if nx.is_weakly_connected(G) else " NOT"} weakly connected.')



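  def get_largest_connected_component(self, G):
    # weakly connected components for directed graphs, plain components otherwise
    components = nx.weakly_connected_components(G) if G.is_directed() else nx.connected_components(G)
    largest_cc = max(components, key=len)
    return G.subgraph(largest_cc)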


  def plot_degree_distribution(self, G, degrees, name='Degrees', figsize=(8,5), link='', extra=''):
    L = list(degrees.values())
    degrees_df = pd.DataFrame(L, columns=[name])
    plt.figure(figsize=figsize)
    g = sns.countplot(x=name,
                      data = degrees_df)
    x = list(set(L))
  
    if len(x) > 10:
      g.set_yscale("log")
      nint = 5
      xticks = np.linspace(0, round(len(x), -int(np.log10(len(x)))), nint)
      xticklabels = map(int, np.linspace(min(set(L)), round(max(L), -int(np.log10(max(L)))), nint))
      

    else:
      xticks = range(len(x))
      xticklabels = sorted(list(set(L)))


    g.set_xticks(xticks)
    g.set_xticklabels(xticklabels)

    if link!='':
      g.set_title(f'Distribution of Node {name} - {link.title()}')
    else:
      g.set_title(f'Distribution of Node {name}')
    sns.despine()
    plt.savefig(f'{IMAGE_PATH_DEGREES}/distribution_degrees_{name.replace("-", "_")}_{link.replace("/", "_or_")}_{extra}.png', dpi=300, bbox_inches='tight')
    

  def node_connectivity(self, G):
    if G.is_directed():
      n = nx.node_connectivity(G.to_undirected())
    else:
      n = nx.node_connectivity(G)

    ## nodes to remove to disconnect the graph
    # nx.minimum_node_cut(G.to_undirected())
    print(f'The smallest number of nodes that can be removed from the graph to disconnect it is {n}.')

  def edge_connectivity(self, G):
    if G.is_directed():
      n = nx.edge_connectivity(G.to_undirected())
    else:
      n = nx.edge_connectivity(G)

    ## edges to remove to disconnect the graph
    # nx.minimum_edge_cut(G.to_undirected())
    print(f'The smallest number of edges that can be removed from the graph to disconnect it is {n}.')


  def my_adjacency_matrix(self, G, name, figsize=(5,5)):
    A = nx.adjacency_matrix(G)
    f, ax = plt.subplots(figsize=figsize)
    with sns.axes_style("white"):
      g = sns.heatmap(A.todense(), square=True, cbar=False, cmap='Greys', vmin=0, vmax=1)
      g.set_xticks([])
      g.set_yticks([])
      title_name = name.replace('_', ' ').title()
      g.set_title(f'Adjacency Matrix - {title_name}')
      sns.despine(left=False, right=False, top=False, bottom=False)
    plt.savefig(f'{IMAGE_PATH_GRAPHS}/adjacency_matrix_{name}_color.png', dpi=300, bbox_inches='tight')



  def get_graph(self, nodes, edges, return_orig = False):
    col_edges_conv = {'START_ID': 'source', 
                    'END_ID': 'target',
                    'link': 'Type',
                    'active_days': 'Active Days'
                    }
    col_nodes_conv = {'country': 'Country', 
                      'continent': 'Region',
                      'jurisdiction': 'Jurisdiction', 
                      'service_provider': 'Service Provider',
                      'company_type': 'Company Type'}

    edges_cols = ['START_ID', 'END_ID', 'active_days', 'link']
    nodes_cols = ['node_id', 'country', 'continent','jurisdiction', 'service_provider', 'company_type', 'location']
    linkData = edges[edges_cols].rename(columns = col_edges_conv)
    nodeData = nodes[nodes_cols].rename(columns = col_nodes_conv )
    bigG = nx.from_pandas_edgelist(linkData, 'source', 'target', True, nx.DiGraph())
    nx.set_node_attributes(bigG, nodeData.set_index('node_id').to_dict('index'))
    if return_orig:
      # self.general_checks(bigG)
      return bigG
    
    G = self.get_largest_connected_component(bigG)
    # self.general_checks(G)
    return G

  def plot_location_layout_link(self, G, link, figsize=(12, 8)):
    # max_degree = max(dict(G.to_undirected().degree()).values())
    # node_size = [1000*G.degree(v)/max_degree for v in G]
    def get_width(d, field, default):
      value = d[field]
      if np.isnan(value):
        return default
      return value
      
    # edge_width = [get_width(G[u][v], 'Active Days', 1) for u,v in G.edges()]
    # max_edge_width = max(edge_width)
    # edge_width = [10* e/max_edge_width for e in edge_width]

    fig = plt.figure(num=None, figsize=figsize)
    m = Basemap(projection='cyl',llcrnrlat=-60, urcrnrlat=90,llcrnrlon=-180, urcrnrlon=180,resolution='c')
    pos = nx.get_node_attributes(G, 'location')
    nx.draw_networkx(G, pos, with_labels=False, alpha=.3, edge_color='.3', cmap=plt.cm.Blues, 
                    #  node_size=node_size, 
                    #  width=edge_width
                    )
    plt.title(f'Edge Type: "{link.title()}"')
    plt.axis('off');
    m.drawcoastlines(linewidth=.15)
    plt.tight_layout();
    plt.savefig(f'{IMAGE_PATH_GRAPHS}/graphs_{link.replace(" ", "_")}_locations_map.png', dpi=300, bbox_inches='tight')

  def plot_layouts(self, G, link):
    layouts = ['circular_layout',
    'kamada_kawai_layout',
    'random_layout',
    'shell_layout',
    'spring_layout',
    'spectral_layout',
    'fruchterman_reingold_layout',
    'spiral_layout']


    for layout in layouts:
      try:
        pos = getattr(nx, layout)(G)
        plt.figure(figsize=(10, 9))
        nx.draw_networkx(G, pos, with_labels=False, alpha=.4)
        plt.title(f'Edge Type: "{link.title()}"  ({layout.replace("_", " ").replace("layout", "")})')
        plt.axis('off');
        plt.savefig(f'{IMAGE_PATH_GRAPHS}/graphs_{link.replace(" ", "_").replace("/", "_or_")}_{layout}.png', dpi=300, bbox_inches='tight')
      except Exception as e:
        print(layout, e)

    self.plot_location_layout_link(G, link)



  def plot_location_layout_general(self, nodes, edges, links, figsize=(12, 8), test=False, extra=''):
    fig = plt.figure(num=1, figsize=figsize)

    # matplotlib.cm.get_cmap was removed in Matplotlib 3.9; pyplot.get_cmap still works
    cmap = plt.get_cmap('tab20', len(links))
    colors = cmap(np.linspace(0, 1, len(links)))

    custom_lines = []
    legend_text = []

    m = Basemap(projection='cyl',llcrnrlat=-60, urcrnrlat=90,llcrnrlon=-180, urcrnrlon=180,resolution='c')
    m.drawcoastlines(linewidth=.15) 
    plt.axis('off');
    plt.tight_layout();
    line_width = 0.05
    if not test:
      c = 0
      for link, color in zip(links, colors): 
        c +=1
        legend_text.append(link)
        custom_lines.append(Line2D([0], [0], color=color, lw=2)) 
        sub_edges = edges[edges['link'] == link]
        G = self.get_graph(nodes, sub_edges)
        max_degree = max(dict(G.to_undirected().degree()).values())
        node_size = [500*G.degree(v)/max_degree for v in G]
        pos = nx.get_node_attributes(G, 'location')
        nx.draw_networkx(G, pos, with_labels=False, alpha=.2, edge_color='.3', cmap=plt.cm.Blues, 
                        width=line_width,
                        node_size=node_size,
                        )
        link_arr = [x for x in G.edges(data=True) if x[2]['Type']==link]
        nx.draw_networkx_edges(G, pos, edgelist=link_arr, edge_color=color, alpha=0.2, width=line_width)
        plt.legend(custom_lines, links, bbox_to_anchor=(1, 1), loc='upper left', frameon=False)
        plt.savefig(f'{IMAGE_PATH_GRAPHS}/graphs_locations_map_{len(links)}_c{c}_{extra}.png', dpi=300, bbox_inches='tight')

    if not test:
      plt.savefig(f'{IMAGE_PATH_GRAPHS}/graphs_locations_map_{len(links)}_mod_{extra}.png', dpi=300, bbox_inches='tight')
    else:
      custom_lines = [Line2D([0], [0], color=color, lw=4) for color in colors]
      legend_text = links
      plt.legend(custom_lines, links, bbox_to_anchor=(1, 1), loc='upper left', frameon=False)



  def get_degree_df(self, G):
    df = pd.DataFrame(list(dict(G.degree()).values()), columns=['Degree'])
    df_in = pd.DataFrame(list(dict(G.in_degree()).values()), columns=['In-Degree'])
    df_out = pd.DataFrame(list(dict(G.out_degree()).values()), columns=['Out-Degree'])
    return {'Degree': df, 'In-Degree': df_in, 'Out-Degree': df_out }

  def plot_degree_distributions_subplots(self, G, figsize=(20, 5), suptitle='', extra='', ind='',
                                         to_save=True):
    degree_df_dicts = self.get_degree_df(G)

    fig, axs = plt.subplots(nrows=1, ncols=len(degree_df_dicts), sharey=True,figsize=figsize)

    uniques = {name: df[name].unique() for name, df in degree_df_dicts.items()}

    log_y = degree_df_dicts['Degree']['Degree'].value_counts().iloc[0] >= 100

    for ax, (name, df) in zip(axs, degree_df_dicts.items()):
      x = uniques[name]
      g = sns.countplot(x = name, data = df, facecolor=(0, 0, 0, 0), edgecolor=sns.color_palette("colorblind", len(x)), ax=ax)
      
      if len(x) > 10:
        nint = 5
        xticks = np.linspace(0, round(len(x), -int(np.log10(len(x)))), nint)
        xticklabels = map(int, np.linspace(min(x), round(max(x), -int(np.log10(max(x)))), nint))
      else:
        xticks = range(len(x))
        xticklabels = sorted(x)

      if log_y:
        g.set_yscale("log")
      else:
        ax.yaxis.set_major_locator(MaxNLocator(integer=True))

      if name!='Degree':
        ax.set_ylabel('')

      g.set_xticks(xticks)
      g.set_xticklabels(xticklabels)
    plt.suptitle(f'{suptitle}', fontsize=22)
    sns.despine()
    mod_suptitle = suptitle.lower().replace('\n', ' ').replace('/', ' or ').replace(':', '_').replace(' ', '_').replace('__', '_')
    plt.tight_layout();
    fname = f'{mod_suptitle}'
    if ind!='':
      fname = f'{ind:02}_'+fname
    if extra!='':
      fname +=f'_{extra}'
    if to_save:
      plt.savefig(f'{IMAGE_PATH_DEGREES}/distributions_{fname}.png', dpi=300, bbox_inches='tight')
    #print(f'file saved in '+ f'{IMAGE_PATH_DEGREES}/distributions_{fname}{nstr}.png')
  
  def plot_degree_distributions_subplots_normalized(self, G, figsize=(20, 5), suptitle='', extra='', ind='',
                                         to_save=True):
    degree_df_dicts = self.get_degree_df(G)

    # fraction of nodes with each degree value (count / number of nodes)
    ndegree_df_dicts = {name: df[name].value_counts(normalize=True)
                                      .rename('Normalized Count')
                                      .rename_axis(name)
                                      .reset_index()
                        for name, df in degree_df_dicts.items()}

    fig, axs = plt.subplots(nrows=1, ncols=len(degree_df_dicts), sharey=True,figsize=figsize)

    uniques = {name: df[name].unique() for name, df in degree_df_dicts.items()}


    for ax, (name, df) in zip(axs, ndegree_df_dicts.items()):
      x = uniques[name]
      g = sns.barplot(x = name, data = df, 
                        y = 'Normalized Count',
                        facecolor=(0, 0, 0, 0), 
                        edgecolor=sns.color_palette("colorblind", len(x)), 
                        ax=ax)


      g.set_yscale("log")
      g.set_xscale("log")

      if name!='Degree':
        ax.set_ylabel('')

    plt.suptitle(f'{suptitle}', fontsize=22)
    sns.despine()
    mod_suptitle = suptitle.lower().replace('\n', ' ').replace('/', ' or ').replace(':', '_').replace(' ', '_').replace('__', '_')
    plt.tight_layout();
    fname = f'{mod_suptitle}'
    if ind!='':
      fname = f'{ind:02}_'+fname
    if extra!='':
      fname +=f'_{extra}'
    if to_save:

      plt.savefig(f'{IMAGE_PATH_DEGREES}/distributions_{fname}_norm_loglog.png', dpi=300, bbox_inches='tight')

  def plot_degree_distribution_for_all_links(self, nodes, edges, normalized=True,to_save=False):
    for i, link in enumerate(edges['link'].value_counts().index.tolist()):
      sub_edges = edges[edges['link'] == link]
      print(f'\n{link}')
      G = self.get_graph(nodes, sub_edges, return_orig = True)
      suptitle = f'edge type:\n"{link.title().replace("Of", "of")}"'
      if normalized:
        self.plot_degree_distributions_subplots_normalized(G, suptitle=suptitle,
                                                           extra='orig', ind=i, 
                                                           to_save=to_save)
      else:
        self.plot_degree_distributions_subplots(G, suptitle=suptitle,
                                                           extra='orig', ind=i, 
                                                           to_save=to_save)
        

  def general_plot_degree_distribution(self, nodes, edges, figsize=(20, 5)):
    G = self.get_graph(nodes, edges, return_orig = True)
    suptitle = 'Degree Distribution'
    degree_df_dicts = self.get_degree_df(G)

    # fraction of nodes with each degree value (count / number of nodes)
    ndegree_df_dicts = {name: df[name].value_counts(normalize=True)
                                      .rename('Normalized Count')
                                      .rename_axis(name)
                                      .reset_index()
                        for name, df in degree_df_dicts.items()}

    fig, axs = plt.subplots(nrows=1, ncols=len(degree_df_dicts), sharey=True,figsize=figsize)

    uniques = {name: df[name].unique() for name, df in degree_df_dicts.items()}


    for ax, (name, df) in zip(axs, ndegree_df_dicts.items()):
      x = uniques[name]
      g = sns.barplot(x = name, data = df, 
                        y = 'Normalized Count',
                        facecolor=(0, 0, 0, 0), 
                        edgecolor=sns.color_palette("colorblind", len(x)), 
                        ax=ax)

      g.set_yscale("log")
      g.set_xscale("log")

      if name!='Degree':
        ax.set_ylabel('')

    plt.suptitle(f'{suptitle}', fontsize=22)
    sns.despine()
    plt.tight_layout();

    plt.savefig(f'{IMAGE_PATH_DEGREES}/distributions_norm_loglog.png', dpi=300, bbox_inches='tight')



  def create_country_weight_graph(self, G):
    field = 'Country'
    out = pd.DataFrame([(G.nodes[u][field], G.nodes[v][field]) for u,v in G.edges()], columns=['source', 'target'])
    # weight = number of node-level edges between each pair of countries
    out = out.groupby(['source', 'target']).size().reset_index(name='weight')
    g = nx.from_pandas_edgelist(out, 'source', 'target', 'weight', nx.DiGraph())
    return g

  def plot_nodes_locations_individual_links(self,nodes, edges, figsize=(12, 8), weakly_connected=False, test=False):
    return_orig = not weakly_connected
    links = edges['link'].value_counts().index.tolist()[-1:1:-1]


    colormaps = plt.get_cmap('tab20', len(links))
    colors = colormaps(np.linspace(0, 1, len(links)))
    line_width = 1
    arrowsize = 20
    node_alpha = .15
    arrow_fancy_alpha = .1
    edge_alpha = .2


    i = len(links)
    for link, color in zip(links, colors): 
      fig = plt.figure(figsize=figsize)
      m = Basemap(projection='cyl',llcrnrlat=-60, urcrnrlat=90,llcrnrlon=-180, urcrnrlon=180,resolution='c')
      m.drawcoastlines(linewidth=.15) 
      m.drawcountries(linewidth=.1)
      plt.axis('off');
      plt.tight_layout();
    

      sub_edges = edges[edges['link'] == link]
      G = self.get_graph(nodes, sub_edges, return_orig = return_orig)
      ###


      if not test:
        ### plot graph
        degree_centralities = nx.in_degree_centrality(G)
        node_size = [degree_centralities[n] for n in G]
        min_s = min(node_size)
        max_s = max(node_size)
        if max_s == min_s:
          r = 0
        else:
          r = int((1000-100)/(max_s-min_s))
        node_size = [r * (s-min_s)+ 100 for s in node_size]


        pos = nx.get_node_attributes(G, 'location')
        nx.draw_networkx(G, pos, with_labels=False, 
                        alpha=node_alpha, 
                        cmap=plt.cm.Blues, 
                        width=0,
                        node_size=node_size,
                        style=":"
                        )
        unique_node_attr = {}
        seen = {}
        for k, v in nx.get_node_attributes(G, 'Country').items():
          if v not in seen:
            seen.update({v: 1})
            unique_node_attr.update({k: v})
          else:
            seen[v]+=1
        seen_sorted = sorted(seen, key=seen.get, reverse=True)  # countries by node count, most frequent first

        l = min(25, len(seen_sorted))

        unique_node_attr = {k: v for k, v in unique_node_attr.items() if v in seen_sorted[:l]}

      
        link_arr = [x for x in G.edges(data=True) if x[2]['Type']==link]
        try:
          nx.draw_networkx_edges(G, pos, edgelist=link_arr, edge_color=color, alpha=arrow_fancy_alpha, width=line_width,arrowsize=arrowsize, arrowstyle='fancy', style=":")
        except Exception as e:
          nx.draw_networkx_edges(G, pos, edgelist=link_arr, edge_color=color, alpha=edge_alpha, width=line_width,arrowsize=arrowsize, style=":")


        nx.draw_networkx_labels(G, pos, labels=unique_node_attr, font_size=8, font_color='k', verticalalignment='center', horizontalalignment='center')


      legend_text = link.title().replace("Of", "of").split('/')[0]
      plt.legend([Line2D([0], [0], color=color, lw=2)], [legend_text],  loc='lower center', frameon=False)
    
      try:
        ### show litte bar plot
        #  [left, bottom, width, height]
        newax = fig.add_axes([.1,.25,.1,0.05], anchor='NE', zorder=1)
        degree_df_dicts = self.get_degree_df(G)
        degree_type = 'In-Degree'
        deg_df = degree_df_dicts[degree_type]
        x = deg_df[degree_type].unique()

        n_deg_df = (deg_df[degree_type].value_counts(normalize=True)
                    .rename('Normalized Count').rename_axis(degree_type).reset_index())
        g = sns.barplot(x = degree_type, y = 'Normalized Count', data = n_deg_df, ax=newax, alpha=.5)
        newax.set_ylabel('')
        little_title= f'edges: {G.number_of_edges():,}\nnodes: {G.number_of_nodes():,}'
        newax.set_title(little_title, fontsize=11)
        change_width(newax, .5)
        newax.axes.yaxis.set_visible(False)
        sns.despine(left=True, bottom=False)
        newax.spines['bottom'].set_linewidth(0.5)
        g.set_yscale("log")
        g.set_xscale("log")
        # hide ticks after setting the scales (changing a scale resets the tick locators)
        g.set_yticks([])
        g.set_yticklabels([])
        g.set_xticks([])
        g.set_xticklabels([])
        g.tick_params(bottom=False)
        newax.axes.xaxis.set_minor_locator(plt.FixedLocator([]))
        #newax.axes.xaxis.set_visible(False)
        g.set_xlabel(degree_type, fontsize=11)
        
      except Exception as e:
        print(e)
      

      #####

      extra = 'orig' if return_orig else 'wcon'

      fname = f'link_{legend_text.lower().replace(" ", "_")}'
      if extra!='':
        fname = fname+f'_{extra}'



      plt.savefig(f'{IMAGE_PATH_MAPS}/{i:02}_{fname}_loglog.png', dpi=300, bbox_inches='tight')
      i-=1

    return G

Overwriting tools/graph.py
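The interpretation printed by `compare_transitivity_avgclusteringcoef` can be checked on a small graph. In a friendship (windmill) graph every non-hub node has a local clustering coefficient of 1 while the hub's is low, so the average clustering coefficient exceeds transitivity (ratio > 1). This is a sketch; `windmill` is a hypothetical helper that builds the graph by hand.

```python
import networkx as nx

def windmill(num_triangles):
    """Hypothetical helper: hub node 0 shared by `num_triangles` triangles."""
    G = nx.Graph()
    for i in range(num_triangles):
        a, b = 2 * i + 1, 2 * i + 2
        G.add_edges_from([(0, a), (0, b), (a, b)])
    return G

G = windmill(4)
avg_cc = nx.average_clustering(G)  # (1/7 for the hub + 8 * 1 for the others) / 9 nodes
trans = nx.transitivity(G)         # 3 * 4 triangles / 36 connected triads
print(f'average clustering {avg_cc:.3f}, transitivity {trans:.3f}, ratio {avg_cc / trans:.2f}')
```

The hub has 8 neighbors (28 pairs) but only 4 of those pairs are linked, which is what drags transitivity down while barely moving the average.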


In [5]:
def create_country_weight_graph(G):
  field = 'Country'
  out = pd.DataFrame([(G.nodes[u][field], G.nodes[v][field]) for u,v in G.edges()], columns=['source', 'target'])
  # weight = number of node-level edges between each pair of countries
  out = out.groupby(['source', 'target']).size().reset_index(name='weight')
  g = nx.from_pandas_edgelist(out, 'source', 'target', 'weight', nx.DiGraph())
  return g


def create_location_weight_graph(G):
  field = 'location'
  out = pd.DataFrame([(G.nodes[u][field], G.nodes[v][field]) for u,v in G.edges()], columns=['source', 'target'])
  out['weight'] = 0
  out = out.groupby(['source', 'target']).count().reset_index()
  g = nx.from_pandas_edgelist(out, 'source', 'target', 'weight', nx.DiGraph())
  g.edges(data=True)
  labels = dict(set([(G.nodes[u]['location'], G.nodes[u]['Country']) for u in G]))
  weights = [g[u][v]['weight'] for u,v in g.edges()]
  seen = {}
  for k, v in nx.get_node_attributes(G, 'location').items():
    if v not in seen:
      seen[v]=1
    else:
      seen[v]+=1
  seen_sorted = sorted(seen, key=seen.get, reverse=True)  # locations by node count, most frequent first
  l = min(len(seen_sorted), 2)
  labels = {k: v for k, v in labels.items() if k in seen_sorted[:l]}
  return g, labels

def plot_bubbles_continents():

  field = 'continents'
  prop_name='Region'
  out = create_weight_df(edges, field, prop_name=prop_name)
  data = pd.DataFrame(nodes[field].value_counts())
  data.index.name=prop_name
  data.columns=['Count']
  data.reset_index(inplace=True)
  data = data.set_index(prop_name).join(out['degree_df'].set_index(prop_name)).reset_index()
  data.sort_values(by='Count', inplace=True, ascending=False)
  data.reset_index(inplace=True)
  data['Degree to Count'] = data['Degree']/data['Count']
  display(data.head())
  # libraries
  # https://stackoverflow.com/questions/51579215/remove-seaborn-lineplot-legend-title

  # https://www.python-graph-gallery.com/bubble-plot-with-seaborn
  import matplotlib.pyplot as plt
  import seaborn as sns
  sns.set_context('talk')
  sns.set_style("darkgrid")

  # Control figure size for this notebook:
  plt.rcParams['figure.figsize'] = [8, 8]

  # use the scatterplot function to build the bubble map
  g = sns.scatterplot(data=data, x="Count", y='Degree', size="Degree to Count", hue=prop_name, legend=True, 
                      sizes=(10, 1000), alpha=0.4)
  g.set_yscale('log')
  g.set_xscale('log')
  # handles, labels = g.get_legend_handles_labels()
  # plt.legend(handles=handles[:7], labels=labels[:7], bbox_to_anchor=(1, 1), loc='upper left')
  plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
  # g.set_xticks([])
  plt.show()
  return data

def plot_bubbles_countries():
  tuples = nodes[['countries', 'continents']].set_index(['countries', 'continents']).index.unique().values
  country_continent = {k: v for k, v in tuples}

  field = 'countries'
  prop_name='Country'
  out = create_weight_df(edges, field, prop_name=prop_name)
  data = pd.DataFrame(nodes[field].value_counts())
  data.index.name=prop_name
  data.columns=['Count']
  data.reset_index(inplace=True)
  data['Region'] = data['Country'].map(country_continent)
  data = data.set_index(prop_name).join(out['degree_df'].set_index(prop_name)).reset_index()
  data.sort_values(by='Count', inplace=True, ascending=False)
  data.reset_index(inplace=True)
  display(data.head())
  data = data.head(50).copy()
  data['Degree to Count'] = data['Degree']/data['Count']
  # libraries
  # https://stackoverflow.com/questions/51579215/remove-seaborn-lineplot-legend-title

  # https://www.python-graph-gallery.com/bubble-plot-with-seaborn
  import matplotlib.pyplot as plt
  import seaborn as sns
  sns.set_context('talk')
  sns.set_style("darkgrid")

  # Control figure size for this notebook:
  plt.rcParams['figure.figsize'] = [8, 8]


  # use the scatterplot function to build the bubble map
  g = sns.scatterplot(data=data, x="Count", y='Degree', size="Degree to Count", hue='Region', legend=True, 
                      sizes=(1, 5000), alpha=0.4)
  g.set_xscale('log')
  g.set_yscale('log')
  handles, labels = g.get_legend_handles_labels()
  plt.legend(handles=handles[:7], labels=labels[:7], bbox_to_anchor=(1, 1), loc='upper left')
  #plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
  #g.set_xticks([])
  plt.show()
  return g

def create_weight_df(data_frame, field, prop_name='', figsize=(6, 6)):
  source = f'{field}_source'
  target = f'{field}_target'
  if prop_name=='':
    prop_name=field.title()
  out = {}
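  # copy to avoid SettingWithCopyWarning; counting the dummy column per
  # (source, target) pair turns repeated edges into an integer weight
  df = data_frame[[source, target]].copy()
  df['weight'] = 0
  df = df.groupby([source, target]).count()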
  df.reset_index(inplace=True)
  df.rename(columns={source: 'source', target: 'target'}, inplace=True)

  out.update({'df': df})
  G = nx.from_pandas_edgelist(df, source='source', target='target', 
                              edge_attr='weight', 
                              create_using=nx.DiGraph())
  
  out.update({'G': G})
  degree_df = pd.DataFrame(G.degree(weight='weight'), columns=[prop_name, 'Degree'])
  degree_df['In-Degree'] = degree_df[prop_name].map(G.in_degree(weight='weight'))
  degree_df['Out-Degree'] = degree_df[prop_name].map(G.out_degree(weight='weight'))

  out.update({'degree_df': degree_df})

  return out
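
To see what `create_weight_df` computes, here is a toy run of the same steps on a hand-made edge list. The country codes are made up, and `groupby(...).size()` is swapped in for counting a dummy `weight` column; both give the number of rows per (source, target) pair. This is a sketch for illustration, not a call to the notebook's function.

```python
import pandas as pd
import networkx as nx

# Toy edge list with made-up country codes; column names follow the
# f'{field}_source' / f'{field}_target' convention that create_weight_df expects.
toy_edges = pd.DataFrame({
    "countries_source": ["PA", "PA", "PA", "VG", "VG"],
    "countries_target": ["VG", "VG", "HK", "PA", "HK"],
})

# One row per (source, target) pair; the weight is how many times the pair occurs.
weighted = (
    toy_edges.groupby(["countries_source", "countries_target"])
    .size()
    .reset_index(name="weight")
    .rename(columns={"countries_source": "source", "countries_target": "target"})
)

H = nx.from_pandas_edgelist(weighted, source="source", target="target",
                            edge_attr="weight", create_using=nx.DiGraph())

# Weighted degree = sum of edge weights, split into incoming and outgoing parts.
degree_df = pd.DataFrame(list(H.degree(weight="weight")), columns=["Country", "Degree"])
degree_df["In-Degree"] = degree_df["Country"].map(dict(H.in_degree(weight="weight")))
degree_df["Out-Degree"] = degree_df["Country"].map(dict(H.out_degree(weight="weight")))
print(degree_df)
```

The two `PA -> VG` rows collapse into one edge of weight 2, so `PA` ends up with weighted degree 4 even though it has only three distinct neighbours-by-direction.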

In [6]:
from tools import GraphH

In [7]:
def import_single_nodes_edges():
  edges = pd.read_csv(f'{NODE_EDGES_PATH}/edges_single_country.csv')
  nodes = pd.read_csv(f'{NODE_EDGES_PATH}/nodes_single_country.csv')
  nodes['location'] = list(zip(nodes['lon'], nodes['lat']))  # (lon, lat) tuples; avoids positional x[0] indexing on a Series
  edges['start_date'] = pd.to_datetime(edges['start_date'])
  edges['end_date'] = pd.to_datetime(edges['end_date'])
  active_period = edges['end_date'] - edges['start_date']
  edges['active_days'] = active_period.dt.days
  nodes.drop_duplicates(subset=['node_id'], inplace=True)
  print(f'Single nodes shape: {nodes.shape}')
  print(f'Single edges shape: {edges.shape}')
  return nodes, edges


In [8]:
nodes, edges = import_single_nodes_edges()

Single nodes shape: (179914, 18)
Single edges shape: (279475, 6)


In [9]:
# links = edges['link'].value_counts().index.tolist()
# link = 'officer of'
# sub_edges = edges[edges['link'] == link]
# print(f'\n{link}')
# G = GraphH().get_graph(nodes, sub_edges, return_orig = True)


In [10]:
G = GraphH().plot_nodes_locations_individual_links(nodes, edges, figsize=(12, 8), weakly_connected=False, test=False)

In [11]:
G = GraphH().plot_nodes_locations_individual_links(nodes, edges, figsize=(12, 8), weakly_connected=True, test=False)

In [12]:
#GraphH().general_plot_degree_distribution(nodes, edges)

In [13]:
#g, labels = create_location_weight_graph(G)

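```python
# simple paths (no repeated nodes) from node_1 to node_2 in a directed graph;
# the result is a generator, and on large graphs `cutoff=` limits the path length
paths = list(nx.all_simple_paths(G, source=node_1, target=node_2))
```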
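
A toy run shows what the generator yields. The graph below is hypothetical. `cutoff` is an optional argument of `nx.all_simple_paths` that caps the number of edges per path, which matters on a graph of this notebook's size because the number of simple paths can grow exponentially.

```python
import networkx as nx

# Hypothetical directed graph: a -> b -> d, a -> c -> d, plus a shortcut b -> c.
H = nx.DiGraph([("a", "b"), ("b", "d"), ("a", "c"), ("c", "d"), ("b", "c")])

# Each path is a list of nodes; no node repeats within a path.
paths = sorted(nx.all_simple_paths(H, source="a", target="d"))
print(paths)

# cutoff keeps only paths with at most that many edges.
short_paths = sorted(nx.all_simple_paths(H, source="a", target="d", cutoff=2))
print(short_paths)
```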


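If we wanted to block a message from node 1 to node 2 by removing nodes, how many nodes would we need to remove? `node_connectivity` answers with a count, and `minimum_node_cut` returns one smallest set of such nodes: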

```
# fewest nodes whose removal cuts every node_1 -> node_2 path
nx.node_connectivity(G, node_1, node_2)
# one such minimum set of nodes
nx.minimum_node_cut(G, node_1, node_2)
```


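The same question for edges: how many edges must be removed so that no path from node 1 to node 2 remains?

```
# fewest edges whose removal cuts every node_1 -> node_2 path
nx.edge_connectivity(G, node_1, node_2)
# one such minimum set of (u, v) edges
nx.minimum_edge_cut(G, node_1, node_2)
```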
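
On a small graph both versions can be checked by hand. In this hypothetical DiGraph there are two routes from `s` to `t`; the extra edge `a -> b` does not add a route that avoids both `a` and `b`, so both connectivities are 2. Several two-edge sets disconnect `s` from `t`, so the sketch only checks that removing whichever cut is returned leaves no path.

```python
import networkx as nx

# Hypothetical directed graph: routes s -> a -> t and s -> b -> t, plus a -> b.
H = nx.DiGraph([("s", "a"), ("a", "t"), ("s", "b"), ("b", "t"), ("a", "b")])

k_nodes = nx.node_connectivity(H, "s", "t")   # fewest nodes to remove
node_cut = nx.minimum_node_cut(H, "s", "t")   # one such set of nodes
k_edges = nx.edge_connectivity(H, "s", "t")   # fewest edges to remove
edge_cut = nx.minimum_edge_cut(H, "s", "t")   # one such set of (u, v) edges
print(k_nodes, node_cut)
print(k_edges, edge_cut)

# Sanity check: removing either cut leaves no path from s to t.
without_nodes = H.copy()
without_nodes.remove_nodes_from(node_cut)
without_edges = H.copy()
without_edges.remove_edges_from(edge_cut)
print(nx.has_path(without_nodes, "s", "t"), nx.has_path(without_edges, "s", "t"))
```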

```
# distinct sizes of the strongly connected components, largest first
scc_sizes = sorted({len(c) for c in nx.strongly_connected_components(G)}, reverse=True)
print(scc_sizes)
```


```
# distinct sizes of the weakly connected components, largest first
wcc_sizes = sorted({len(c) for c in nx.weakly_connected_components(G)}, reverse=True)
print(wcc_sizes)

[138497, 53, 46, 32, 27, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2]
```
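
The two component types differ only in whether edge direction matters. In the hypothetical graph below, the 3-cycle is one strongly connected component, while the one-way edge `3 -> 4` attaches node 4 to it only weakly. Counting sizes with `collections.Counter` also keeps how many components have each size, which a list of distinct sizes leaves out.

```python
from collections import Counter

import networkx as nx

# Hypothetical directed graph: a 3-cycle, a one-way edge out of it, and a separate pair.
H = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4), (5, 6)])

# size -> number of components of that size
scc_counts = Counter(len(c) for c in nx.strongly_connected_components(H))
wcc_counts = Counter(len(c) for c in nx.weakly_connected_components(H))
print(sorted(scc_counts.items(), reverse=True))
print(sorted(wcc_counts.items(), reverse=True))
```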


```
B = check_if_bipartite()

is bipartite:  False
There are 37,417 common nodes among start and end nodes.
```
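
`check_if_bipartite` is the notebook's own helper and its body is not shown here. A minimal sketch of the two checks its output suggests, on a hypothetical graph: `is_bipartite` asks whether any two-colouring exists (ignoring edge direction), and the overlap counts nodes that appear both as an edge source and as an edge target. An empty overlap would already give a valid split (every edge runs from the source set to the rest); the 37,417 shared nodes rule out that particular split.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical directed graph: "y" is a target of "x" and a source for "z".
H = nx.DiGraph([("x", "y"), ("y", "z"), ("x", "z")])

# Two-colouring test; for a DiGraph both predecessors and successors count as neighbours.
print("is bipartite: ", bipartite.is_bipartite(H))

# Nodes that occur as both an edge source and an edge target.
sources = {u for u, _ in H.edges()}
targets = {v for _, v in H.edges()}
common = sources & targets
print(f"There are {len(common):,} common nodes among start and end nodes.")
```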





```
get_global_clustering_coefficient(G);
Global Clustering Coefficient is  8.25e-03
```


```
get_transitivity(G);
Global Clustering Coefficient - Transitivity is  1.84e-04.
```


```
compare_transitivity_avgclusteringcoef(transitivity, avg_clustering_coef)

Ratio of Average clustering coefficient to Transitivity is 44.77.
	Most nodes have high LCC (local clustering coefficient).
	The high degree node has low LCC (Local clustering coefficient).
```
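
The two "global" numbers summarize the same triangles differently. Judging by the reported ratio (8.25e-03 / 1.84e-04 ≈ 44.8), the first value appears to be the average local clustering coefficient rather than transitivity. Transitivity weights every node by its number of connected triples, so one hub with many unlinked neighbours pulls it down, while the plain average is dominated by the many small nodes. A friendship (windmill) graph shows the effect on a toy scale; it uses `nx.average_clustering` and `nx.transitivity` directly, not the notebook's helpers.

```python
import networkx as nx

# Friendship graph: n triangles that all share the hub node 0.
n = 10
H = nx.Graph()
for i in range(n):
    a, b = 2 * i + 1, 2 * i + 2
    H.add_edges_from([(0, a), (0, b), (a, b)])

avg_clustering = nx.average_clustering(H)  # mean of the local coefficients
transitivity = nx.transitivity(H)          # 3 * triangles / connected triples

print(f"average clustering: {avg_clustering:.3f}")
print(f"transitivity:       {transitivity:.3f}")
print(f"ratio:              {avg_clustering / transitivity:.2f}")
print(f"hub LCC: {nx.clustering(H, 0):.3f}, leaf LCC: {nx.clustering(H, 1):.3f}")
```

Every leaf has LCC 1 while the hub has 1/19, which mirrors the interpretation printed above: most nodes have high LCC and the high-degree node has low LCC.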



```
node "236724" has degree 35805.
Local Clustering Coefficient of node "236724" is  2.39e-06

node "123520" has degree 1.
Local Clustering Coefficient of node "123520" is 0

node "50622" has degree 1338.
Local Clustering Coefficient of node "50622" is  7.43e-04
```
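
A degree-1 node has no pair of neighbours, so its LCC is 0 by definition; for a hub the number of neighbour pairs grows quadratically with degree, so a handful of triangles gives a tiny value, as with node "236724". The sketch below reproduces both cases on a hypothetical undirected graph. Note that for a DiGraph `nx.clustering` uses a directed-triangle definition, so the notebook's values are not directly comparable to this toy.

```python
import networkx as nx

# Hypothetical graph: hub 0 with five neighbours, one edge among them, and a pendant node 6.
H = nx.Graph([(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (5, 6)])

lcc = nx.clustering(H)  # local clustering coefficient of every node

# Highest-degree node first, with its degree and local clustering coefficient.
for node, degree in sorted(H.degree(), key=lambda nd: nd[1], reverse=True):
    print(f'node "{node}" has degree {degree}; LCC = {lcc[node]:.2f}')
```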

```
diameter: Found infinite path length because the digraph is not strongly connected

average shortest path length: Graph is not weakly connected.

eccentricity: Found infinite path length because the digraph is not strongly connected

radius: Found infinite path length because the digraph is not strongly connected

periphery: Found infinite path length because the digraph is not strongly connected

center: Found infinite path length because the digraph is not strongly connected
```
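
These errors are expected: in a directed graph that is not strongly connected, some nodes cannot reach others, so at least some eccentricities are infinite, and diameter, radius, center and periphery are all built on eccentricity. One common workaround, sketched here on a hypothetical graph, is to compute the measures on the largest strongly connected component; another is to ignore direction and use the largest weakly connected component as an undirected graph. Either way, exact eccentricities need a shortest-path search from every node, which is slow on a component with ~138k nodes.

```python
import networkx as nx

# Hypothetical directed graph: the 4-cycle 0 -> 1 -> 2 -> 3 -> 0 plus a dead-end node 4.
H = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)])

try:
    nx.diameter(H)
except nx.NetworkXError as err:
    print("diameter:", err)

# Restrict to the largest strongly connected component, where every node reaches every other.
largest_scc = max(nx.strongly_connected_components(H), key=len)
S = H.subgraph(largest_scc).copy()

print("diameter:", nx.diameter(S))
print("radius:", nx.radius(S))
print("center:", nx.center(S))
print("average shortest path length:", nx.average_shortest_path_length(S))
```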

```
Number of connected components in this graph are 13514.
```

```
Graph has 170,478 nodes and 236,434 edges.
The graph is directed.
The graph is NOT a multigraph.
The graph is NOT strongly connected.
The graph is NOT weakly connected.
diameter: Found infinite path length because the graph is not connected
average shortest path length: Graph is not connected.
eccentricity: Found infinite path length because the graph is not connected
radius: Found infinite path length because the graph is not connected
```

```
Graph has 138,497 nodes and 217,530 edges.
The graph is directed.
The graph is NOT a multigraph.
The graph is NOT strongly connected.
The graph is weakly connected.

```
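
The second summary has 138,497 nodes, the same size as the largest weakly connected component listed earlier, so it appears to describe that component. A minimal sketch of how such a summary could be produced: `summarize` is a hypothetical helper (not the notebook's own) that reproduces only the first lines of the summaries above, and the graph is a toy.

```python
import networkx as nx


def summarize(graph):
    """Print a short structural summary (hypothetical helper)."""
    print(f"Graph has {graph.number_of_nodes():,} nodes and {graph.number_of_edges():,} edges.")
    print("The graph is directed." if graph.is_directed() else "The graph is undirected.")
    print("The graph is a multigraph." if graph.is_multigraph() else "The graph is NOT a multigraph.")
    if graph.is_directed():
        strong = "" if nx.is_strongly_connected(graph) else "NOT "
        weak = "" if nx.is_weakly_connected(graph) else "NOT "
        print(f"The graph is {strong}strongly connected.")
        print(f"The graph is {weak}weakly connected.")


# Hypothetical directed graph with two weakly connected pieces.
H = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c"), ("d", "e")])
summarize(H)

# Keep only the largest weakly connected component and summarize it again.
largest_wcc = H.subgraph(max(nx.weakly_connected_components(H), key=len)).copy()
summarize(largest_wcc)
```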