# Simplicial Complex Generators Test

## Erdos-Renyi SC

### 1. 
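$G(N, p_1, p_2)$: edges appear with probability $p_1$ and the expected number of filled triangles (2-simplices) is $p_2 \binom{N}{3}$, subject to the downward closure property: if $\{i, j, k\} \in SC$, then all of its pairs $\{i, j\}$, $\{i, k\}$, $\{j, k\}$ are also in SC.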

Approaches:
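  - Skeleton first: create a $G(N, p_1)$ skeleton, then fill in triangles (add each "open" triangle, i.e. three pairwise connected nodes, as a 2-simplex with probability $p_2$). Since a triangle must already be open to be filled, the expected number of triangles is only $p_2 \, p_1^3 \binom{N}{3}$.

  - Independent, then filter/project: generate edges and triangles independently with probabilities $p_1$ and $p_2$; then, if $\{i, j, k\}$ was picked but one of its edges was not, either discard the triangle or add the missing edges. Discarding makes the actual number of triangles too low; adding edges pushes the edge density above $p_1$.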
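Both approaches can be sketched in plain Python/NumPy. This is a minimal illustration under our own assumptions, not the `generators_sc` implementation; the helper names `sample_simplices`, `faces`, `skeleton_then_fill` and `independent_then_project` are hypothetical. Simplices are stored as sorted tuples of node indices.

```python
import math
import numpy as np

def sample_simplices(N, size, p, rng):
    """Include each `size`-subset of {0, ..., N-1} independently with probability p.

    Equivalent to independent Bernoulli(p) trials: draw the count
    M ~ Binomial(C(N, size), p), then M distinct subsets uniformly at random.
    """
    M = rng.binomial(math.comb(N, size), p)
    out = set()
    while len(out) < M:  # reject duplicates; cheap while M << C(N, size)
        out.add(tuple(sorted(rng.choice(N, size=size, replace=False).tolist())))
    return out

def faces(tri):
    """The three edges of a sorted triangle (i, j, k)."""
    i, j, k = tri
    return [(i, j), (i, k), (j, k)]

def skeleton_then_fill(N, p1, p2, rng):
    """Approach 1: G(N, p1) skeleton, then fill each open triangle with prob p2."""
    edges = sample_simplices(N, 2, p1, rng)
    adj = {v: set() for v in range(N)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    triangles = set()
    for i, j in edges:  # each open triangle i < j < k is visited once, from edge (i, j)
        for k in adj[i] & adj[j]:
            if k > j and rng.random() < p2:
                triangles.add((i, j, k))
    return edges, triangles

def independent_then_project(N, p1, p2, rng, mode="discard"):
    """Approach 2: sample edges and triangles independently, then enforce closure."""
    edges = sample_simplices(N, 2, p1, rng)
    triangles = sample_simplices(N, 3, p2, rng)
    if mode == "discard":   # drop triangles with a missing edge
        triangles = {t for t in triangles if all(f in edges for f in faces(t))}
    elif mode == "add":     # keep every triangle and add its missing edges
        for t in triangles:
            edges.update(faces(t))
    else:
        raise ValueError("mode must be 'discard' or 'add'")
    return edges, triangles

rng = np.random.default_rng(0)
edges, triangles = skeleton_then_fill(60, 0.2, 0.5, rng)
print(len(edges), len(triangles))
```

The Binomial-count trick keeps sampling cheap when $p_2 \binom{N}{3}$ is small, because it never loops over all $\binom{N}{3}$ candidate triples.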


### 2. 
Build the clique complex of an E-R random graph.

- Create $G(N, p_1)$ graph
- Find all 3-cliques and fill in every one of them as a triangle (2-simplex); downward closure is guaranteed by construction. Triangles are no longer sampled independently: a given triangle is present exactly when all three of its edges are, which happens with probability $p_1^3$, so the expected number of triangles is $p_1^3 \binom{N}{3}$ (summing over all possible triangles). Because $p_1$ fixes both densities, the triangle density is hard to tune.
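A minimal NumPy sketch of the clique-complex construction. This is our illustration, and `er_clique_complex` is a hypothetical name. It samples the upper triangle of the adjacency matrix with probability $p_1$, then lists every 3-clique as a 2-simplex.

```python
import numpy as np

def er_clique_complex(N, p1, rng):
    """Return (edges, triangles) of the 2-dimensional clique complex of G(N, p1)."""
    upper = np.triu(rng.random((N, N)) < p1, k=1)  # each pair i < j kept with prob p1
    A = upper | upper.T                             # symmetric boolean adjacency
    edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(upper))]
    triangles = []
    for i, j in edges:
        common = np.nonzero(A[i] & A[j])[0]         # common neighbours of i and j
        # fill every 3-clique i < j < k, listed once from its edge (i, j)
        triangles.extend((i, j, int(k)) for k in common[common > j])
    return edges, triangles

rng = np.random.default_rng(1)
N, p1 = 200, 0.05
edges, triangles = er_clique_complex(N, p1, rng)
print(len(triangles), p1**3 * N * (N - 1) * (N - 2) / 6)  # observed vs expected count
```

For scale, take $N = 1000$ and $p_1 = 10/999$ (mean degree 10). The expected number of filled triangles is then $p_1^3 \binom{N}{3} \approx 167$, so each node sits in about $3 \cdot 167 / 1000 \approx 0.5$ triangles. If $k_2$ denotes the mean number of triangles per node, a target like $k_2 = 3$ at $k_1 = 10$ is out of reach with this construction.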

### 3. 
A random geometric construction, e.g. a Vietoris-Rips complex built from random point placement. Inputs: a distance threshold $\varepsilon$ and a point distribution, e.g. uniform on the unit square. Approach: draw the points in $[0, 1]^{d}$, connect two points if their distance is below $\varepsilon$, and take the clique complex of the resulting graph to get the triangles. This is a less purely combinatorial approach.
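A minimal sketch of this geometric construction, assuming uniform points and Euclidean distance. `vietoris_rips_2skeleton` is a hypothetical helper. It builds the $\varepsilon$-graph from the full distance matrix, then fills every 3-clique, which is exactly the 2-skeleton of the Vietoris-Rips complex.

```python
import numpy as np

def vietoris_rips_2skeleton(N, eps, d=2, rng=None):
    """Edges and triangles of the Vietoris-Rips complex of N uniform points in [0, 1]^d."""
    rng = np.random.default_rng() if rng is None else rng
    X = rng.random((N, d))                                       # uniform points in the unit cube
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    A = D < eps                                                  # connect points closer than eps
    np.fill_diagonal(A, False)                                   # no self-loops
    edges = [(int(i), int(j)) for i, j in zip(*np.nonzero(np.triu(A, k=1)))]
    triangles = []
    for i, j in edges:
        common = np.nonzero(A[i] & A[j])[0]
        triangles.extend((i, j, int(k)) for k in common[common > j])  # clique complex
    return X, edges, triangles

eps = 0.08
X, edges, triangles = vietoris_rips_2skeleton(300, eps, rng=np.random.default_rng(2))
print(len(edges), len(triangles))
```

To tune the density in 2D, note that ignoring boundary effects the expected degree is about $(N-1)\pi\varepsilon^2$. $\varepsilon$ can be chosen from a target $k_1$ that way, while the triangle count then follows from the geometry.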

In [4]:
import sys
sys.path.append('../src/')
sys.path.append('../scripts/')

from Hypergraphs import EmptyHypergraph

import numpy as np
import matplotlib.pyplot as plt
import os

from scipy.stats import zipf # for power-law Zeta distribution
# from itertools import combinations
from scipy.special import comb

## E-R SC

In [3]:
from generators_sc import *

In [19]:
# test ER SC
N = 1000

# candidate target mean degrees (k1: edges, k2: triangles)
k1_k2_list = [(3, 1), (6, 2), (10, 3), (20, 6)]
(k1, k2) = k1_k2_list[2]
print(f"Using (k1, k2) = {(k1, k2)}")

# edge and triangle probabilities for the chosen target degrees
p1, p2 = p1_p2_ER_like_simplicial_complex(k1, k2, N)

print(f"p1 = {p1:.4f}")
print(f"p2 = {p2:.8f}")

_, edges, triangles = ER_like_simplicial_complex(N, p1, p2)

# merge 1-simplices and 2-simplices into one list of tuples for EmptyHypergraph.set_edges
g_edges = [tuple(edge) for edge in edges.tolist() + triangles.tolist()]
print(f"g_edges: {g_edges[:5]}, ..., {g_edges[-5:]}")

Using (k1, k2) = (10, 3)
p1 = 0.0040
p2 = 0.00000602
not connected, but GC has order  975 and size 1997
g_edges: [(0, 48), (0, 325), (0, 353), (1, 244), (1, 390)], ..., [(541, 621, 777), (69, 220, 556), (432, 464, 721), (62, 192, 200), (126, 190, 473)]
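The printed $p_1$ and $p_2$ match the closed forms below. This is our reconstruction from the numbers, not code taken from `generators_sc`, so `p1_p2_from_degrees` is a hypothetical stand-in for `p1_p2_ER_like_simplicial_complex`. The idea is that a node lies in $p_2 \binom{N-1}{2}$ triangles on average, and this is set to $k_2$. Each of those triangles already gives the node about 2 neighbours, so $p_1$ only has to supply the remaining $k_1 - 2k_2$ among the other $N - 1 - 2k_2$ nodes: $p_2 = \frac{2k_2}{(N-1)(N-2)}$ and $p_1 = \frac{k_1 - 2k_2}{N - 1 - 2k_2}$.

```python
def p1_p2_from_degrees(k1, k2, N):
    """Hypothetical reconstruction: edge/triangle probabilities for target mean degrees k1, k2."""
    # k2 = p2 * C(N-1, 2)  =>  p2 = 2 k2 / ((N-1)(N-2))
    p2 = 2 * k2 / ((N - 1) * (N - 2))
    # k1 ~ 2 k2 + p1 (N - 1 - 2 k2): triangles already supply ~2 k2 neighbours per node
    p1 = (k1 - 2 * k2) / ((N - 1) - 2 * k2)
    return p1, p2

p1_chk, p2_chk = p1_p2_from_degrees(10, 3, 1000)
print(f"p1 = {p1_chk:.4f}")  # p1 = 0.0040
print(f"p2 = {p2_chk:.8f}")  # p2 = 0.00000602
```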


In [20]:
inter_order_overlap(edges, triangles)

1.0
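The source does not define `inter_order_overlap`. We assume it reports the fraction of triangles whose three edges are also present as 1-simplices, and under that reading 1.0 means the generated complex is downward closed. Below is a hypothetical helper, `closed_triangle_fraction`, that computes this fraction directly and accepts lists or NumPy arrays of simplices.

```python
import itertools

def closed_triangle_fraction(edges, triangles):
    """Fraction of triangles whose three edges are all present as 1-simplices."""
    edge_set = {frozenset(map(int, e)) for e in edges}
    tris = [tuple(map(int, t)) for t in triangles]
    if not tris:
        return float("nan")
    closed = sum(
        all(frozenset(f) in edge_set for f in itertools.combinations(t, 2))
        for t in tris
    )
    return closed / len(tris)

print(closed_triangle_fraction([(0, 1), (0, 2), (1, 2)], [(0, 1, 2)]))  # 1.0
print(closed_triangle_fraction([(0, 1), (1, 2)], [(0, 1, 2)]))          # 0.0
```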

In [12]:
# E-R connectivity threshold: G(N, p) is connected with high probability when p > ln(N)/N
print(p1, np.log(N) / N)
p1 > np.log(N) / N

0.004028197381671702 0.006907755278982137


np.False_

In [None]:
# TODO: increase p1 so the graph is connected
0.0097 > np.log(N) / N  # p1_est from the 1000-run simulation

np.True_
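A back-of-the-envelope check, under our reading (not confirmed by the source) that the `not connected, but GC has order` messages describe the $G(N, p_1)$ skeleton alone. The reported size of about 2000 edges matches $p_1 \binom{N}{2} \approx 2012$. The skeleton's mean degree is then $c = p_1 (N-1) \approx 4.02$, below $\ln N \approx 6.91$, so it is not expected to be connected. Standard E-R theory gives the giant component as a fraction $S$ of the nodes, where $S = 1 - e^{-cS}$.

```python
import numpy as np

N = 1000
p1 = 0.004028197381671702   # p1 printed above for (k1, k2) = (10, 3)
c = p1 * (N - 1)            # mean degree of the G(N, p1) skeleton, about 4.02
threshold = np.log(N) / N   # E-R connectivity threshold ln(N)/N, about 0.0069

# giant-component fraction: fixed point of S = 1 - exp(-c S), valid for c > 1
S = 1.0
for _ in range(200):
    S = 1.0 - np.exp(-c * S)

print(f"c = {c:.2f} (ln N = {np.log(N):.2f}), p1 > threshold: {p1 > threshold}")
print(f"expected GC order ~ {S * N:.0f} of {N} nodes")
```

This predicts a giant component of roughly 980 nodes, in line with the logged GC orders of about 970 to 990.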

In [21]:
test_p1_p2 = True
inter_order_overlaps = []
if test_p1_p2:
    max_pw_edges = N * (N - 1) / 2            # C(N, 2): number of possible edges
    max_ho_edges = N * (N - 1) * (N - 2) / 6  # C(N, 3): number of possible triangles

    p1_est = []
    p2_est = []
    nsims = 1000
    for _ in range(nsims):
        p1, p2 = p1_p2_ER_like_simplicial_complex(k1,k2,N)
        _, edges, triangles = ER_like_simplicial_complex(N,p1,p2)
        p1_est.append(len(edges) / max_pw_edges)
        p2_est.append(len(triangles) / max_ho_edges)

        inter_order_overlaps.append(inter_order_overlap(edges, triangles))

    print(f"p1_est = {np.mean(p1_est):.4f}")
    print(f"p2_est = {np.mean(p2_est):.8f}")
    # p1_est = 0.0097
    # p2_est = 0.00000568     
    # p1 = 0.0040
    # p2 = 0.00000602
sum(inter_order_overlaps) # == nsims

not connected, but GC has order  985 and size 2008
not connected, but GC has order  971 and size 1954
not connected, but GC has order  988 and size 2079
not connected, but GC has order  980 and size 2041
not connected, but GC has order  982 and size 2010
not connected, but GC has order  982 and size 2031
not connected, but GC has order  980 and size 2005
not connected, but GC has order  981 and size 2018
not connected, but GC has order  980 and size 2011
not connected, but GC has order  983 and size 2026
not connected, but GC has order  978 and size 1989
not connected, but GC has order  980 and size 2064
not connected, but GC has order  987 and size 2058
not connected, but GC has order  989 and size 2044
not connected, but GC has order  975 and size 1929
not connected, but GC has order  981 and size 1955
not connected, but GC has order  983 and size 2012
not connected, but GC has order  987 and size 2003
not connected, but GC has order  979 and size 1959
not connected, but GC has order

1000.0
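Why is `p1_est` (0.0097) more than twice the input `p1` (0.0040)? Assume `edges` also contains the edges of every triangle, as the downward-closure requirement demands. The 1-simplex density is then roughly the skeleton edges plus three edges per triangle, divided by $\binom{N}{2}$. The check below plugs in the recorded numbers and ignores the few edges shared between triangles or with the skeleton.

```python
from math import comb

N = 1000
p1 = 0.004028197381671702   # input p1 printed earlier
p2_est = 0.00000568         # simulated p2_est recorded in the cell above

skeleton_edges = p1 * comb(N, 2)          # about 2012 edges from G(N, p1)
triangle_edges = 3 * p2_est * comb(N, 3)  # about 2832 edges from about 944 triangles
density = (skeleton_edges + triangle_edges) / comb(N, 2)
print(f"{density:.4f}")  # 0.0097
```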

In [None]:
test_k1_k2 = True
if test_k1_k2:
    k1_est = []
    k2_est = []
    nsims = 1000
    for _ in range(nsims):
        p1, p2 = p1_p2_ER_like_simplicial_complex(k1, k2, N)
        _, edges, triangles = ER_like_simplicial_complex(N, p1, p2)

        # build the hyperedge list from THIS realization's edges and triangles
        g_edges = [tuple(edge) for edge in edges.tolist() + triangles.tolist()]

        g_type = "random_ER"
        g = EmptyHypergraph(N)
        g.name = g_type
        g.set_edges(g_edges)

        # mean len(g.neighbors(i, order)) over all nodes, for order 1 and order 2
        nodes = list(g.nodes.keys())
        k1_sim = np.mean([len(g.neighbors(i, 1)) for i in nodes])
        k2_sim = np.mean([len(g.neighbors(i, 2)) for i in nodes])

        k1_est.append(k1_sim)
        k2_est.append(k2_sim)

    print(f"k1_est = {np.mean(k1_est):.4f}")
    print(f"k2_est = {np.mean(k2_est):.4f}")
    # Using (k1, k2) = (10, 3)
    # k1_est = 9.6480
    # k2_est = 2.8440

not connected, but GC has order  980 and size 2041
not connected, but GC has order  981 and size 2013
not connected, but GC has order  970 and size 1925
not connected, but GC has order  981 and size 2014
not connected, but GC has order  987 and size 1944
not connected, but GC has order  984 and size 1984
not connected, but GC has order  987 and size 2014
not connected, but GC has order  986 and size 2037
not connected, but GC has order  985 and size 2046
not connected, but GC has order  980 and size 1941
not connected, but GC has order  985 and size 2063
not connected, but GC has order  969 and size 1949
not connected, but GC has order  975 and size 2002
not connected, but GC has order  982 and size 2031
not connected, but GC has order  980 and size 2067
not connected, but GC has order  977 and size 1943
not connected, but GC has order  977 and size 2076
not connected, but GC has order  986 and size 1993
not connected, but GC has order  982 and size 1987
not connected, but GC has order
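The mean degrees can also be read off the simplex counts directly, without going through `EmptyHypergraph`. Every edge adds 1 to the pairwise degree of 2 nodes, and every triangle adds 1 to the triangle degree of 3 nodes. This sketch uses a hypothetical helper, `mean_degrees_from_counts`, and assumes that `edges` already includes triangle edges and holds no duplicates. It matches `k2_est` only if `len(g.neighbors(i, 2))` counts triangles per node, which we have not verified.

```python
def mean_degrees_from_counts(N, edges, triangles):
    """Mean pairwise degree 2|E|/N and mean triangle degree 3|T|/N."""
    return 2 * len(edges) / N, 3 * len(triangles) / N

# toy complex: one filled triangle plus a pendant edge
k1_toy, k2_toy = mean_degrees_from_counts(4, [(0, 1), (0, 2), (1, 2), (2, 3)], [(0, 1, 2)])
print(k1_toy, k2_toy)  # 2.0 0.75
```

With the recorded densities this gives $k_1 \approx p_{1,\mathrm{est}}(N-1) = 0.0097 \cdot 999 \approx 9.69$ and $k_2 \approx p_{2,\mathrm{est}} \binom{N-1}{2} \approx 2.83$. Both are close to the recorded `k1_est = 9.6480` and `k2_est = 2.8440`.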