###### Introduction to Network Analysis 2023/24 (x)

## Node mixing by (not) degree, graphlet degrees

In [None]:
import networkx as nx
import math
from scipy import stats
import utils

### I. Degree assortative and disassortative networks

Consider the following eight networks of different type and origin.

+ Zachary karate club network ([karate_club.net](http://lovro.fri.uni-lj.si/ina/nets/karate_club.net))
+ Java class dependency network ([java.net](http://lovro.fri.uni-lj.si/ina/nets/java.net))
+ Map of Darknet from Tor network ([darknet.net](http://lovro.fri.uni-lj.si/ina/nets/darknet.net))
+ Social network of unknown origin ([social.net](http://lovro.fri.uni-lj.si/ina/nets/social.net))
+ iMDB actors collaboration network ([collaboration_imdb.net](http://lovro.fri.uni-lj.si/ina/nets/collaboration_imdb.net))
+ Gnutella peer-to-peer sharing network ([gnutella.net](http://lovro.fri.uni-lj.si/ina/nets/gnutella.net))
+ Sample of Facebook social network ([facebook.net](http://lovro.fri.uni-lj.si/ina/nets/facebook.net))
+ *nec* overlay map of the Internet ([nec.net](http://lovro.fri.uni-lj.si/ina/nets/nec.net))



Node mixing measures how likely nodes with certain characteristics or attributes are to be connected to each other. These attributes can be, for example, the gender or age of individuals or the degree of nodes. In the labs we focus on node mixing by degree, where we distinguish between:

- Assortative mixing, where nodes are more likely to connect to nodes that have similar attributes. In terms of degree, nodes with similar degrees tend to connect to one another.
- Disassortative mixing, where nodes are more likely to connect to nodes that have different attributes. In terms of degree, nodes with small degree tend to connect to hubs.

Node mixing is often measured using correlation coefficients, such as Pearson's correlation coefficient. These coefficients quantify the strength and direction of the correlation between node attributes.

**NOTE**: When calculating the Pearson coefficient for an undirected network, we add every link to the degree arrays twice, once as $(k_i, k_j)$ and once as $(k_j, k_i)$, so the result does not depend on the order in which the endpoints of a link are stored.
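The effect of counting each link in both directions can be checked on a toy example. The sketch below is only an illustration in plain Python: the helper names `pearson` and `mixing` and the four-node path are assumptions made for this example, not part of the lab code.

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mixing(edges):
    """Degree mixing coefficient computed from an undirected edge list."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    x, y = [], []
    for i, j in edges:
        # Count each link in both directions: (k_i, k_j) and (k_j, k_i).
        x += [degree[i], degree[j]]
        y += [degree[j], degree[i]]
    return pearson(x, y)

# Path 0 - 1 - 2 - 3, stored with two different endpoint orders.
path = [(0, 1), (1, 2), (2, 3)]
flipped = [(1, 0), (1, 2), (3, 2)]

r_path = mixing(path)
r_flipped = mixing(flipped)
print(r_path, r_flipped)
```

Both stored orders give the same coefficient, $r=-0.5$ up to floating-point rounding, because the two degree arrays contain the same pairs regardless of how each link is written.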

1. **(code)** Implement Newman's node degree mixing coefficient $r$ as a sample Pearson correlation coefficient between the linked nodes' degrees $k$ and $k'$.

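	$$r(k,k')=\frac{\sum_i(k_i-\langle k\rangle)(k'_i-\langle k'\rangle)}{\sigma_k\sigma_{k'}}$$

	Here the sum runs over all link ends $i$, $\langle k\rangle$ and $\langle k'\rangle$ are the means of the two degree arrays, and $\sigma_k=\sqrt{\sum_i(k_i-\langle k\rangle)^2}$ (analogously for $\sigma_{k'}$).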

	Treat all networks as undirected graphs and compute their undirected degree mixing coefficient $r$. Are the networks assortative $r>0$, disassortative $r<0$ or neutral $r\approx 0$?



In [None]:
Gs = []
for name in ["karate_club", "java", "darknet", "social", "collaboration_imdb", "gnutella", "facebook", "nec"]:
  G = utils.read_pajek(name)
  print(G)
  Gs.append(G)


MultiGraph named 'karate_club' with 34 nodes and 78 edges
MultiDiGraph named 'java' with 2378 nodes and 14727 edges
MultiGraph named 'darknet' with 7178 nodes and 25104 edges
MultiGraph named 'social' with 10680 nodes and 24316 edges
MultiGraph named 'collaboration_imdb' with 17577 nodes and 287074 edges
MultiDiGraph named 'gnutella' with 62586 nodes and 147892 edges
MultiGraph named 'facebook' with 63731 nodes and 817035 edges
MultiDiGraph named 'nec' with 75885 nodes and 357317 edges


In [None]:
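def degree_mixing(G, source=None, target=None):
  """Pearson correlation between the degrees at the two ends of each link.

  If source and target ('in' or 'out') are given, links are read as directed;
  otherwise every link is counted in both directions.
  """
  x, y = [], []

  for i, j in G.edges():
    if source is not None and target is not None:
      # Directed mixing: chosen degree of the source vs. chosen degree of the target.
      x.append(G.out_degree(i) if source == 'out' else G.in_degree(i))
      y.append(G.in_degree(j) if target == 'in' else G.out_degree(j))
    else:
      # Undirected mixing: add (k_i, k_j) and (k_j, k_i) so the result is symmetric.
      x.append(G.degree(i))
      y.append(G.degree(j))
      x.append(G.degree(j))
      y.append(G.degree(i))

  return stats.pearsonr(x, y)[0]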

print("{:>21s} | {:^7s}".format('Graph', 'r'))

for G in Gs:
  r = degree_mixing(G)
  print("{:>21s} | {:^7.3f}".format("'" + G.name + "'", r))
print()

| Graph                | r      |
| -------------------- | ------ |
| 'karate_club'        | -0.476 |
| 'java'               | -0.307 |
| 'darknet'            | -0.440 |
| 'social'             | 0.238  |
| 'collaboration_imdb' | 0.293  |
| 'gnutella'           | -0.093 |
| 'facebook'           | 0.177  |
| 'nec'                | -0.146 |


As a rule of thumb, social networks are degree assortative, while technological networks (such as railways, roadways, ...) are degree neutral. Other networks are generally degree disassortative. Looking at the Pearson coefficients, we see only degree disassortative and assortative networks, which mostly follow the rule of thumb:
- facebook, social and collaboration_imdb are assortative
- the remaining networks are degree disassortative; all of them are non-social except the small karate_club network
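The sign-based reading of $r$ can be made explicit with a small helper. This is only a sketch: the function name `classify` and the tolerance `tol=0.05` for "approximately zero" are assumptions chosen for illustration, and the values are copied from the table of coefficients reported above.

```python
def classify(r, tol=0.05):
    """Label a mixing coefficient; tol is an assumed cut-off for 'neutral'."""
    if r > tol:
        return "assortative"
    if r < -tol:
        return "disassortative"
    return "neutral"

# Coefficients copied from the table of undirected results.
reported = {
    "karate_club": -0.476,
    "java": -0.307,
    "darknet": -0.440,
    "social": 0.238,
    "collaboration_imdb": 0.293,
    "gnutella": -0.093,
    "facebook": 0.177,
    "nec": -0.146,
}

for name, r in reported.items():
    print("{:>21s} | {:^7.3f} | {}".format("'" + name + "'", r, classify(r)))
```

With a larger tolerance, weakly disassortative networks such as gnutella would be labelled neutral, so the chosen cut-off should be stated alongside any such classification.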

2. **(code)** Generate corresponding Erdős-Rényi random graphs and compute their undirected degree mixing coefficient $r$. Are random graphs assortative $r>0$, disassortative $r<0$ or neutral $r\approx 0$?



In [None]:
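print("{:>21s} | {:^7s}".format('Graph', 'r'))

for G in Gs:
  n = G.number_of_nodes()
  m = G.number_of_edges()
  k = 2 * m / n      # average degree of the original network
  p = k / (n - 1)    # link probability giving the same expected average degree

  ER_G = nx.erdos_renyi_graph(n, p)
  r = nx.degree_assortativity_coefficient(ER_G)

  # Label the row with the original network; the generated graph has no name.
  print("{:>21s} | {:^7.3f}".format("'" + G.name + "'", r))
print()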

| Graph                | r      |
| -------------------- | ------ |
| 'karate_club'        | -0.039 |
| 'java'               | -0.005 |
| 'darknet'            | -0.009 |
| 'social'             | 0.003  |
| 'collaboration_imdb' | -0.001 |
| 'gnutella'           | -0.001 |
| 'facebook'           | -0.002 |
| 'nec'                | 0.002  |

The mixing coefficient of the ER random graphs is neutral ($r\approx 0$). This is to be expected: links are placed uniformly at random, independently of node degrees, so the degrees at the two ends of a link are uncorrelated.
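The neutrality of random graphs can also be reproduced without any library. The following is a minimal sketch in plain Python; the helper names `pearson`, `gnp_edges` and `mixing`, the graph size, the average degree and the seed are all assumptions made for this illustration.

```python
import random
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def gnp_edges(n, p, seed=None):
    """Edge list of a G(n, p) random graph: each node pair is linked with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def mixing(edges):
    """Undirected degree mixing coefficient; every link is counted in both directions."""
    degree = {}
    for i, j in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    x, y = [], []
    for i, j in edges:
        x += [degree[i], degree[j]]
        y += [degree[j], degree[i]]
    return pearson(x, y)

n, avg_k = 1000, 10                      # assumed size and average degree
edges = gnp_edges(n, avg_k / (n - 1), seed=1)
r_er = mixing(edges)
print("r = {:.3f}".format(r_er))
```

For a graph of this size the coefficient should stay close to zero for any seed, with small fluctuations that shrink as the number of links grows.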

3. **(code)** For directed networks, compute all four directed degree mixing coefficients $r_{(in,in)}$, $r_{(in,out)}$, $r_{(out,in)}$ and $r_{(out,out)}$. Are the networks assortative $r_{\cdot}>0$, disassortative $r_{\cdot}<0$ or neutral $r_{\cdot}\approx 0$?

In [None]:
print("{:>21s} | {:^7s} {:^7s} {:^7s} {:^7s} {:^7s}".format('Graph', 'r', 'r(ii)', 'r(io)', 'r(oi)', 'r(oo)'))

for G in Gs:
  r = degree_mixing(G)
  # r = nx.degree_assortativity_coefficient(G)

  # Directed coefficients are defined only for directed networks.
  rii = rio = roi = roo = math.nan
  if G.is_directed():
    rii = degree_mixing(G, 'in', 'in')
    rio = degree_mixing(G, 'in', 'out')
    roi = degree_mixing(G, 'out', 'in')
    roo = degree_mixing(G, 'out', 'out')

  print("{:>21s} | {:^7.3f} {:^7.3f} {:^7.3f} {:^7.3f} {:^7.3f}".format("'" + G.name + "'", r, rii, rio, roi, roo))
print()

| Graph                | r      | r(ii)  | r(io)  | r(oi)  | r(oo)  |
| -------------------- | ------ | ------ | ------ | ------ | ------ |
| 'karate_club'        | -0.476 | nan    | nan    | nan    | nan    |
| 'java'               | -0.307 | -0.027 | -0.017 | -0.321 | 0.065  |
| 'darknet'            | -0.440 | nan    | nan    | nan    | nan    |
| 'social'             | 0.238  | nan    | nan    | nan    | nan    |
| 'collaboration_imdb' | 0.293  | nan    | nan    | nan    | nan    |
| 'gnutella'           | -0.093 | 0.035  | 0.008  | -0.006 | -0.003 |
| 'facebook'           | 0.177  | nan    | nan    | nan    | nan    |
| 'nec'                | -0.146 | -0.072 | -0.009 | -0.102 | -0.012 |

Looking at the Pearson coefficients of the undirected representations, we see only degree disassortative and assortative networks. Taking in- and out-degrees into account gives a more detailed picture. In the java network, most of the disassortativity comes from the out-degree to in-degree coefficient $r_{(out,in)}$: classes that use many other classes tend to depend on classes that are rarely used by others, and vice versa. In the gnutella network, all four directed coefficients are close to zero.
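The interpretation of a negative $r_{(out,in)}$ can be illustrated on a toy dependency graph. This is a hypothetical example in plain Python: the class names and the helper `pearson` are invented for the sketch and do not come from the java network.

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Directed links "source uses target" (hypothetical classes).
edges = [
    ("Big", "P1"), ("Big", "P2"), ("Big", "P3"),       # one class using three rarely used classes
    ("S1", "Util"), ("S2", "Util"), ("S3", "Util"),    # three simple classes using one popular class
]

out_deg, in_deg = {}, {}
for src, dst in edges:
    out_deg[src] = out_deg.get(src, 0) + 1
    in_deg[dst] = in_deg.get(dst, 0) + 1

# Out-degree of the source against in-degree of the target, one pair per link.
x = [out_deg[src] for src, _ in edges]
y = [in_deg[dst] for _, dst in edges]
r_out_in = pearson(x, y)
print(r_out_in)
```

In this extreme case high out-degree sources link only to low in-degree targets and vice versa, so $r_{(out,in)}=-1$. The other three directed coefficients are undefined here, because the in-degree of every source and the out-degree of every target is zero and the corresponding arrays have no variance.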