<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Digraphs-and-Mathematical-Relations" data-toc-modified-id="Digraphs-and-Mathematical-Relations-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Digraphs and Mathematical Relations</a></span><ul class="toc-item"><li><span><a href="#The-Bow-Tie-Structure-of-the-WWW" data-toc-modified-id="The-Bow-Tie-Structure-of-the-WWW-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>The Bow-Tie Structure of the WWW</a></span></li></ul></li><li><span><a href="#Computing-Bow-Tie-Components" data-toc-modified-id="Computing-Bow-Tie-Components-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Computing Bow-Tie Components</a></span><ul class="toc-item"><li><span><a href="#Weakly-Connected-Components" data-toc-modified-id="Weakly-Connected-Components-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Weakly Connected Components</a></span></li></ul></li><li><span><a href="#Computing-WCCs-and-SCCs-in-networkx" data-toc-modified-id="Computing-WCCs-and-SCCs-in-networkx-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Computing WCCs and SCCs in <code>networkx</code></a></span><ul class="toc-item"><li><span><a href="#WCCS" data-toc-modified-id="WCCS-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>WCCS</a></span></li><li><span><a href="#Strongly-Connected-Components" data-toc-modified-id="Strongly-Connected-Components-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Strongly Connected Components</a></span></li><li><span><a href="#$G_{ER}$" data-toc-modified-id="$G_{ER}$-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>$G_{ER}$</a></span></li></ul></li></ul></div>

<h1>CS4423-Networks: Week 12 (2+3 April 2025)</h1>

# Part 1: Bow Tie Components
Niall Madden, 
School of Mathematical and Statistical Sciences  
University of Galway


This Jupyter notebook, and PDF and HTML versions, can be found at https://www.niallmadden.ie/2425-CS4423/#Week12

<div class="rc"><font size="-1"><em>This notebook was adapted by Niall Madden from one developed by Angela Carnevale.</em></font></div>

In [None]:
import networkx as nx
import numpy as np
opts = { "with_labels": True,  "node_color": "#224488", "font_color": "white", "arrowsize":30 } 

import matplotlib.pyplot as plt

np.set_printoptions(precision=2)    # just display arrays to 2 decimal places
np.set_printoptions(suppress=True) 

from collections import Counter

## Digraphs and Mathematical Relations

When a directed graph $G$ is regarded as a **relation** on its set of nodes $X$, strongly connected components can be described as the **equivalence classes** of an equivalence relation, as we'll now explain.

First recall that
* $x \to y$ means that there is a (directed) edge from $x$ to $y$
* $x \leadsto y$ means that there is a path from $x$ to $y$.



We can see (right?) that the relation ${x \leadsto y}$
is the reflexive and transitive closure of the
edge relation $x \to y$.  

Thus, by construction it is reflexive and
transitive.  

However, it need not be symmetric: there might be nodes $x$ and $y$ with both $x \leadsto y$ and $y \leadsto x$, though this won't usually hold for all pairs of nodes.

So this allows us to define a new relation as follows.

**Definition.** We set $x \equiv y$ if $x \leadsto y$ _and_ $y \leadsto x$.

This **is** an equivalence relation, so we get equivalence classes that partition the nodes of our graph. These equivalence classes are the **strongly connected
components** of $G$.  We denote the class of $x \in X$ by $[x]$.
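
As a small illustration of the definition (a sketch, not part of the original notes), the classes $[x]$ can be computed directly from mutual reachability and compared with what `networkx` reports. The helper names `reach` and `equivalence_class` are made up for this example; the graph is the small digraph used later in this notebook.

```python
import networkx as nx

# Small digraph, also used in the networkx section of this notebook
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (2, 5), (4, 5), (6, 7), (7, 6)])

def reach(G, x):
    """All y with x ~> y; x itself is included because the relation is reflexive."""
    return nx.descendants(G, x) | {x}

def equivalence_class(G, x):
    """The class [x]: all y with x ~> y and y ~> x."""
    return {y for y in reach(G, x) if x in reach(G, y)}

# Collect the distinct classes and compare them with the SCCs found by networkx
classes = {frozenset(equivalence_class(G, x)) for x in G}
sccs = {frozenset(c) for c in nx.strongly_connected_components(G)}
print(classes == sccs)
print(sorted(sorted(c) for c in classes))
```

This brute-force approach recomputes reachability for every pair of nodes, so it is only sensible for tiny graphs.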

Moreover, there is a partial order relation
$\leq$ (a relation which is reflexive, transitive and anti-symmetric)
on the set of equivalence classes:

$[x] \leq [y]$ if $x \leadsto y$.
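
One way to see this partial order in practice is the *condensation* of $G$: the graph with one node per SCC and an edge between two classes whenever some edge of $G$ joins them. The sketch below uses `nx.condensation`, which stores the node-to-class map in the graph attribute `"mapping"`; the helper `leq` is a made-up name for this example.

```python
import networkx as nx

G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (2, 5), (4, 5), (6, 7), (7, 6)])

C = nx.condensation(G)          # one node per strongly connected component
is_dag = nx.is_directed_acyclic_graph(C)
print(is_dag)                   # no cycles between classes, hence anti-symmetry

def leq(C, x, y):
    """Test [x] <= [y], i.e. x ~> y, using paths in the condensation C."""
    cx, cy = C.graph["mapping"][x], C.graph["mapping"][y]
    return cx == cy or nx.has_path(C, cx, cy)

print(leq(C, 0, 5))   # 0 ~> 5 via node 2
print(leq(C, 5, 0))   # node 5 has no outgoing edges
print(leq(C, 0, 6))   # different weakly connected components
```

Two distinct classes can never satisfy both $[x] \leq [y]$ and $[y] \leq [x]$: that would put $x$ and $y$ in the same class, which is why the condensation has no directed cycles.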

We can say something about the structure of the WWW in terms of these equivalence classes and of the partial order on them.

### The Bow-Tie Structure of the WWW

Like the giant component in a simple graph, it turns out that
a directed graph with sufficiently many edges
has  a **giant SCC**.

The remainder of the graph consists of four more sets of components of nodes, as follows:

1. IN: upstream components, the set of all components
$C$ with $C <$ SCC.

2. OUT: downstream components,
the set of all components $C$ with $C >$ SCC.


3. tendrils: the set of all components $C$ with either $C >$ IN and $C \not<$ OUT
or $C <$ OUT and $C \not>$ IN; <BR />
and tubes: components $C$ with $C >$ IN, $C <$ OUT but $C \not <$ SCC.

4. disconnected components.

Thus, in any directed graph with a distinguished SCC,
the WCC in which it is contained
necessarily has the following global bow-tie structure:

![bow tie](bowtie.png)
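
The decomposition can be sketched in a few lines, assuming the largest SCC is taken as the distinguished one. The function name `bow_tie` and the extra edges `(8, 0)` and `(8, 9)` in the test graph are choices made for this illustration only; tendrils and tubes are lumped together rather than separated.

```python
import networkx as nx

def bow_tie(G):
    """Split the nodes of a digraph G relative to its largest SCC."""
    scc = max(nx.strongly_connected_components(G), key=len)
    x = next(iter(scc))                     # any representative of the SCC
    out = nx.descendants(G, x) - scc        # reachable from the SCC
    in_ = nx.ancestors(G, x) - scc          # can reach the SCC
    rest = set(G) - scc - in_ - out
    # Nodes of the rest that lie in the same WCC as the SCC are tendrils or tubes
    wcc = nx.node_connected_component(G.to_undirected(), x)
    return {
        "SCC": scc,
        "IN": in_,
        "OUT": out,
        "tendrils+tubes": rest & wcc,
        "disconnected": rest - wcc,
    }

G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (2, 5), (4, 5),
                (6, 7), (7, 6), (8, 0), (8, 9)])
parts = bow_tie(G)
for name, nodes in parts.items():
    print(f"{name}: {sorted(nodes)}")
```

Here node 8 is in IN, nodes 4 and 5 are in OUT, node 9 hangs off IN as a tendril, and nodes 6 and 7 form a disconnected component.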

Research on available data on the Web in 1999 has confirmed this
bow tie structure for the World Wide Web, with a **large Giant SCC**
covering less than $\frac13$ of the vertex set,
and the
three parts IN, OUT and the tendrils and tubes roughly of the same
size.  One can assume that this proportion has not changed much over
time, although the advent of social media
has brought many new types of links and
documents to the Web.


## Computing Bow-Tie Components

**Example.**  Let's start with a reasonably large random **directed graph**,
using the Erdős-Rényi $G(n, m)$ model:

In [None]:
n, m = 100, 120
G = nx.gnm_random_graph(n, m, directed=True)

In [None]:
nx.draw(G)

### Weakly Connected Components

The weakly connected components of a directed graph $G$ can be determined by BFS, as before,
counting as "neighbors" of a node $x$ **both** its _successors_ and its _predecessors_ in the graph.

A single component, the weakly connected component of node $x$, is found as follows.

In [None]:
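def weak_component(G, x):
    """Return the set of nodes in the weakly connected component of x (BFS)."""
    nodes = {x}      # nodes discovered so far
    queue = [x]      # nodes still to be processed; it grows while we iterate over it
    for y in queue:
        G.nodes[y]["seen"] = True    # mark y so that weak_components() skips it
        # successors and predecessors together are the neighbours of y
        for z in set(G.successors(y)) | set(G.predecessors(y)):
            if z not in nodes:
                nodes.add(z)
                queue.append(z)
    return nodes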

The list of all weakly connected components is computed by looping over all the  nodes of `G`,
computing the components of "unseen" nodes and collecting them in a list.
The final result is sorted by decreasing length before it is returned.

In [None]:
def weak_components(G):
    wccs = []     # initialize
    
    # find each node's wcc
    for x in G:
        if not G.nodes[x].get("seen"):
            wccs.append(weak_component(G, x))
            
    # clean up after yourself
    for x in G:
        del G.nodes[x]["seen"]
        
    # return sorted list of wccs
    return sorted(wccs, key=len, reverse=True)

Now let's check the number of weakly connected components, and their orders:

In [None]:
wccs = weak_components(G)
print(f"G has {len(wccs)} weakly connected components")

In [None]:
[len(c) for c in wccs]
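
As a sanity check on the idea behind the BFS above, the weakly connected components of a digraph should coincide with the ordinary connected components of the undirected graph obtained by forgetting edge directions. The sketch below checks this on a fresh random graph; the seed value is an arbitrary choice made here so that the run is reproducible.

```python
import networkx as nx

# A fresh random digraph; the seed is arbitrary and only fixes the sample
G = nx.gnm_random_graph(100, 120, directed=True, seed=4423)

U = G.to_undirected()     # same nodes, edge directions forgotten
via_undirected = {frozenset(c) for c in nx.connected_components(U)}
via_networkx = {frozenset(c) for c in nx.weakly_connected_components(G)}

print(via_undirected == via_networkx)
print(sorted((len(c) for c in via_networkx), reverse=True)[:5])   # five largest orders
```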

## Computing WCCs and SCCs in `networkx`

As you might expect, there are algorithms for doing this in `networkx`. Let's try them, but first recall a variant on an example from last week:

In [None]:
G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3,0), (2,4),(2,5),(4,5), (6,7), (7,6)])
nx.draw_circular(G, **opts)

### WCCS

We should have been able to see there are 2 WCCs:

In [None]:
C = nx.weakly_connected_components(G)   # returns an iterable
print(f"There are {len(list(C))} WCCs:")
for c in nx.weakly_connected_components(G):
    print(c)

### Strongly Connected Components

**Strongly** connected components are efficiently found by DFS.
[Tarjan's Algorithm](https://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) cleverly
uses recursion and an additional stack for this. 

Have a look at the Wiki page. We'll use `networkx`:

In [None]:
C = nx.strongly_connected_components(G)   # returns an iterable
print(f"There are {len(list(C))} SCCs:")
for c in nx.strongly_connected_components(G):
    print(c)
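
For readers who want to see the idea rather than just call the library, here is a minimal recursive sketch of Tarjan's algorithm. It is written for this note and is not how `networkx` implements it; the names `tarjan_sccs` and `visit` are made up. Because it recurses once per node along a DFS path, it can hit Python's recursion limit on large graphs.

```python
import networkx as nx

def tarjan_sccs(G):
    """Return the SCCs of a digraph G as a list of sets (recursive sketch)."""
    index = {}         # DFS discovery number of each visited node
    low = {}           # smallest discovery number reachable from the node's subtree
    stack = []         # visited nodes not yet assigned to an SCC
    on_stack = set()
    sccs = []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in G.successors(v):
            if w not in index:              # unvisited: explore it first
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:             # edge back into the current stack
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:              # v is the root of an SCC: pop it off
            scc = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in G:
        if v not in index:
            visit(v)
    return sccs

G = nx.DiGraph([(0, 1), (1, 2), (2, 3), (3, 0), (2, 4), (2, 5), (4, 5), (6, 7), (7, 6)])
ours = {frozenset(c) for c in tarjan_sccs(G)}
theirs = {frozenset(c) for c in nx.strongly_connected_components(G)}
print(ours == theirs)
```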

### $G_{ER}$

We'll finish by checking the size of the components of a graph in $G_{ER}(n,m)$:

In [None]:
n, m = 100, 120
G = nx.gnm_random_graph(n, m, directed=True)

In [None]:
wccs = sorted(nx.weakly_connected_components(G), key=len, reverse=True)
print(f"There are {len(wccs)} WCCs:")
k = 0      # number of components with more than one node
for c in wccs:
    if len(c) > 1:
        print(f"Component {k} has {len(c)} nodes")
        k += 1
print(f"The other {len(wccs)-k} components have order 1")

For SCCs, we get:

In [None]:
sccs = sorted(nx.strongly_connected_components(G), key=len, reverse=True)
print(f"There are {len(sccs)} SCCs:")
k = 0      # number of components with more than one node
for c in sccs:
    if len(c) > 1:
        print(f"Component {k} has {len(c)} nodes")
        k += 1
print(f"The other {len(sccs)-k} components have order 1")
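
The single sample above can be extended to a small experiment that tracks how the largest WCC and the largest SCC grow as edges are added. This is only a sketch: the values of `n`, the list of `m` values and the seed are arbitrary choices made here, and one sample per `m` is far too few to draw firm conclusions.

```python
import networkx as nx

n = 200
results = []     # (m, order of largest WCC, order of largest SCC)
for m in (100, 200, 300, 400, 600):
    G = nx.gnm_random_graph(n, m, directed=True, seed=1)   # arbitrary seed
    largest_wcc = max(len(c) for c in nx.weakly_connected_components(G))
    largest_scc = max(len(c) for c in nx.strongly_connected_components(G))
    results.append((m, largest_wcc, largest_scc))
    print(f"m = {m:3d}: largest WCC has {largest_wcc:3d} nodes, "
          f"largest SCC has {largest_scc:3d} nodes")
```

Every SCC is contained in a single WCC, so the largest SCC can never be bigger than the largest WCC.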