The aim of this notebook is to show how to compute the analytical expectation of dyadic motifs given the solution of the Directed Configuration Model (DCM).
Tools to assess the statistical significance of the observed number of dyadic motifs are also provided.
Because dyadic motifs are well defined on binary directed networks, the analytical results presented here apply only to the `dcm` and `dcm_exp` models.

## 1 Observed Dyads

First, let's give some context by defining dyadic motifs.
On a directed network $G(N, E)$, given two nodes $i, j \in N$, you can observe three disjoint possibilities:
- a __reciprocated dyad__: $(i,j), (j,i) \in E$, meaning a connection exists both from $i$ to $j$ and vice versa. We denote the total number of reciprocated dyads in the network as $L^{\leftrightarrow}$
$$
L^\leftrightarrow = \sum_{i\neq j}a_{ij}a_{ji}
$$
- a __non-reciprocated dyad__: either $(i,j)\in E$ or $(j,i)\in E$, but not both, meaning only one of the two possible links exists. We denote the total number of non-reciprocated dyads as $L^{\rightarrow}$
$$
L^\rightarrow = \sum_{i\neq j}a_{ij}(1-a_{ji})
$$
- an __empty dyad__: $(i,j), (j,i) \notin E$, meaning no link exists between the two nodes. We denote the total number of empty dyads as $L^{\nleftrightarrow}$
$$
L^\nleftrightarrow = \sum_{i\neq j}(1-a_{ij})(1-a_{ji})
$$


Now we load the module `NEMtropy.matrix_generator` and use it to generate a graph to experiment on:

In [27]:
import NEMtropy.matrix_generator as mg

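# generate a random binary matrix with 5 nodes and link density 0.6;
# sym=False gives a non-symmetric matrix, i.e. a directed network
A = mg.random_binary_matrix_generator_custom_density(5, 0.6, sym=False)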

print(A)

[[0. 1. 0. 1. 1.]
 [0. 0. 1. 1. 1.]
 [0. 1. 0. 1. 1.]
 [1. 0. 0. 0. 1.]
 [1. 1. 1. 1. 0.]]


Now, the first interesting thing we can do is to count the three types of dyads described above.

`NEMtropy` provides functions for this: the functions that compute measures directly on a given network are contained in `network_functions`:
- `NEMtropy.network_functions.dyads_count()`
- `NEMtropy.network_functions.singles_count()`
- `NEMtropy.network_functions.zeros_count()`

In [52]:
import NEMtropy.network_functions as nf

full_dyads = nf.dyads_count(A)
print(f' number of reciprocated dyads = {full_dyads}')
single_dyads = nf.singles_count(A)
print(f' number of non reciprocated dyads = {single_dyads}')
zeros_dyads = nf.zeros_count(A)
print(f' number of empty dyads = {zeros_dyads}')

 number of reciprocated dyads = 12
 number of non reciprocated dyads = 3
 number of empty dyads = 2
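
As a cross-check, the three counts can be recomputed directly from the formulas of section 1 with plain NumPy. The sketch below hard-codes the adjacency matrix printed above; the variable names are only illustrative. Note that the sums run over ordered pairs $i \neq j$, so each reciprocated and each empty dyad is counted twice, while each non-reciprocated dyad is counted once.

```python
import numpy as np

# Adjacency matrix printed earlier in this notebook
A = np.array([
    [0, 1, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
])

# Boolean mask selecting the off-diagonal entries (i != j)
off_diag = ~np.eye(A.shape[0], dtype=bool)

# Direct translation of the three sums over i != j
reciprocated = int(np.sum((A * A.T)[off_diag]))
non_reciprocated = int(np.sum((A * (1 - A.T))[off_diag]))
empty = int(np.sum(((1 - A) * (1 - A.T))[off_diag]))

print(reciprocated, non_reciprocated, empty)  # 12 3 2
```

With this counting convention the three quantities satisfy $L^{\leftrightarrow} + 2L^{\rightarrow} + L^{\nleftrightarrow} = N(N-1)$, here $12 + 6 + 2 = 20$.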


## 2 Expected dyads

Now let's solve the DCM problem associated with `A`, here using the `dcm_exp` model:

In [35]:
import numpy as np  # used in the following cells to load edgelists and store results
import NEMtropy as nem  # we finally need the whole module

# load adjacency matrix A as a Directed Graph object in NEMtropy
G = nem.DirectedGraph(A)

# solve the DCM problem (dcm_exp model)
G.solve_tool(model="dcm_exp",
             method="newton",
             initial_guess="random")



solution error = 4.214967042059925e-09


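Nice! The solution error is close to zero, meaning we have a good solution.

Once an ensemble has been sampled to the `sample/` directory (see section 2.2), a sampled edgelist can be loaded and converted into an adjacency matrix:

In [8]:
import numpy as np

# We read one of the edgelists
edgelist_ens = np.loadtxt("sample/0.txt")

# and build the adjacency matrix
ens_adj = nf.build_adjacency_from_edgelist(edgelist=edgelist_ens,
                                           is_directed=True,
                                           is_sparse=False,
                                           is_weighted=False)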

### 2.1 Analytical z-score

The NEMtropy class `DirectedGraph` offers the method `motifs_2_zscore()` to compute the analytical z-score for each of the three dyad categories.
Recall that the z-score of a random variable $X$ is defined as:

$$
z[X] = \frac{X^*[A] - \mu[X]}{\sigma[X]}
$$

where:
- $X^*[A]$ is the value of $X$ computed on the actual adjacency matrix $A$
- $\mu[X]$ is the expected value of $X$
- $\sigma[X]$ is the standard deviation of $X$

In [38]:
G.motifs_2_zscore()

{'dyads': 0.19844804186930978,
 'singles': -0.36418523782950074,
 'zeros': 0.2940475594491965}
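
To make the ingredients of the z-score concrete, the mean and standard deviation of the three dyad counts can be written in closed form whenever every link $(i,j)$ is drawn independently with probability $p_{ij}$, which is the case for the DCM. The sketch below works under that assumption only: the helper names `dyad_moments` and `dyad_zscores` and the matrix `P` are hypothetical, `P` is filled with arbitrary values rather than the fitted probabilities, and this is not necessarily how `motifs_2_zscore()` is implemented.

```python
import numpy as np

def dyad_moments(P):
    """Mean and std of the dyad counts for independent links with probabilities P."""
    P = np.asarray(P, dtype=float)
    iu = np.triu_indices(P.shape[0], k=1)  # unordered pairs i < j
    p_ij, p_ji = P[iu], P.T[iu]

    # Per-pair probabilities of the three mutually exclusive dyad states
    q_full = p_ij * p_ji
    q_single = p_ij * (1 - p_ji) + p_ji * (1 - p_ij)
    q_empty = (1 - p_ij) * (1 - p_ji)

    # The sums over i != j count full and empty dyads twice, single dyads once
    terms = {"dyads": (q_full, 2), "singles": (q_single, 1), "zeros": (q_empty, 2)}
    moments = {}
    for name, (q, weight) in terms.items():
        mean = weight * q.sum()
        # Each pair contributes an independent Bernoulli(q) variable times weight
        std = weight * np.sqrt(np.sum(q * (1 - q)))
        moments[name] = (mean, std)
    return moments

def dyad_zscores(observed, P):
    """z-score of each observed dyad count with respect to the moments above."""
    return {k: (observed[k] - m) / s for k, (m, s) in dyad_moments(P).items()}

# Hypothetical link probabilities for a 5-node network (NOT the fitted DCM values)
rng = np.random.default_rng(0)
P = rng.uniform(0.2, 0.8, size=(5, 5))
np.fill_diagonal(P, 0.0)

moments = dyad_moments(P)
z = dyad_zscores({"dyads": 12, "singles": 3, "zeros": 2}, P)
print(z)
```

Because `P` is made up, the printed values are not expected to match the output of `G.motifs_2_zscore()`; the point is only to show how $\mu[X]$ and $\sigma[X]$ follow from the link probabilities.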

### 2.2 Empirical z-score

Of course, we can also use the sampling function to compute the dyad z-scores empirically. The results should approach the analytical ones as the sample size increases.

First we sample `n = 1000` graphs from the DCM ensemble generated by the solution:

In [47]:
# sample n graphs from the ensemble and write them to sample/ as edgelists
n = 1000
G.ensemble_sampler(n, cpu_n=2, output_dir="sample/")

Now we count the dyads on each sampled graph; the empirical expected value and standard deviation of the dyad variables are then computed across the sample.

To count the dyads we can resort to the network functions we saw in the first section:

In [None]:
import networkx as nx

# memory pre-allocation
emp_dyads = np.zeros(n)
emp_singles = np.zeros(n)
emp_zeros = np.zeros(n)

for i in range(n):
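    # Read the graph as an edgelist
    edgelist_ens = np.loadtxt(f"sample/{i}.txt")
    # read the graph in networkx; all nodes are added first so that isolated
    # nodes are not dropped (assumes nodes are labelled 0..N-1, as for a graph built from A)
    graph = nx.DiGraph()
    graph.add_nodes_from(range(A.shape[0]))
    graph.add_edges_from(edgelist_ens.astype(int))
    # create adjacency matrix as numpy array
    a = nx.to_numpy_array(graph)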
    # compute dyads numerosity
    emp_dyads[i] = nf.dyads_count(a)
    emp_singles[i] = nf.singles_count(a)
    emp_zeros[i] = nf.zeros_count(a)
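
For intuition about what each sampled graph represents, the following sketch draws graphs in memory from a matrix of independent link probabilities and counts the dyads with NumPy only, without writing edgelists to disk. It is an illustration under stated assumptions, not NEMtropy's sampler: the function `sample_dyad_counts` and the uniform matrix `P` are hypothetical stand-ins for the fitted DCM probabilities.

```python
import numpy as np

def sample_dyad_counts(P, n_samples, seed=None):
    """Draw n_samples directed graphs with independent links and count their dyads."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    off_diag = ~np.eye(P.shape[0], dtype=bool)  # mask selecting i != j
    counts = {name: np.zeros(n_samples) for name in ("dyads", "singles", "zeros")}
    for s in range(n_samples):
        # Link (i, j) is present with probability P[i, j]
        a = (rng.random(P.shape) < P).astype(int)
        np.fill_diagonal(a, 0)  # no self-loops
        counts["dyads"][s] = np.sum((a * a.T)[off_diag])
        counts["singles"][s] = np.sum((a * (1 - a.T))[off_diag])
        counts["zeros"][s] = np.sum(((1 - a) * (1 - a.T))[off_diag])
    return counts

# Hypothetical probabilities: every link present with probability 0.6
P = np.full((5, 5), 0.6)
np.fill_diagonal(P, 0.0)

counts = sample_dyad_counts(P, n_samples=200, seed=0)
print({name: (values.mean(), values.std()) for name, values in counts.items()})
```

The resulting arrays play the same role as `emp_dyads`, `emp_singles` and `emp_zeros` above, so their means and standard deviations can be plugged into the z-score formula in the same way.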

We produce three arrays, one per dyad type, each containing the count computed on every sampled graph.
We are now able to compute the z-score for each dyad category:

In [51]:
z_score = {}
emp_mu = np.mean(emp_dyads)
emp_std = np.std(emp_dyads)
z_score['dyads'] = (nf.dyads_count(A) - emp_mu)/emp_std
emp_mu = np.mean(emp_singles)
emp_std = np.std(emp_singles)
z_score['singles'] = (nf.singles_count(A) - emp_mu)/emp_std
emp_mu = np.mean(emp_zeros)
emp_std = np.std(emp_zeros)
z_score['zeros'] = (nf.zeros_count(A) - emp_mu)/emp_std

print(z_score)

{'dyads': 0.22710883844423013, 'singles': -0.38927849101120254, 'zeros': 0.3173233126331804}


As you can see, while the empirical results do not coincide perfectly with the analytical ones, they are close enough to be considered satisfactory.
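
To put a number on "close enough", the two sets of z-scores printed in this notebook can be compared directly. The snippet below simply copies those values; the dictionary names are illustrative.

```python
# z-scores copied from the outputs shown in sections 2.1 and 2.2
analytical = {"dyads": 0.19844804186930978,
              "singles": -0.36418523782950074,
              "zeros": 0.2940475594491965}
empirical = {"dyads": 0.22710883844423013,
             "singles": -0.38927849101120254,
             "zeros": 0.3173233126331804}

# Absolute difference between empirical and analytical z-score per dyad type
abs_diff = {k: abs(empirical[k] - analytical[k]) for k in analytical}

for name, diff in abs_diff.items():
    print(f"{name}: |empirical - analytical| = {diff:.3f}")
```

For this run all three differences are below 0.03 in absolute value.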