In [1]:
import numpy as np
import xgi ## pip install xgi is required
import simplicial as SR
np.random.seed(321)


# Computing simpliciality

Below we compute the three simpliciality measures from https://epjdatascience.springeropen.com/articles/10.1140/epjds/s13688-024-00458-1:
* SF: simplicial fraction
* ES: edit simpliciality
* FES: face edit simpliciality

as well as the measure from https://arxiv.org/abs/2408.11806:
* SR: simplicial ratio

We retain only edges of size 2 to 11, inclusive.
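To make the size filter and the simplicial fraction concrete, here is a minimal brute-force sketch on a toy edge list. It assumes one reading of SF: among edges with more than two nodes, the fraction whose sub-faces of size 2 or more are all present as edges. The helper names `size_filter` and `toy_simplicial_fraction` are hypothetical and are not part of `xgi`, whose implementation may differ in its details.

```python
from itertools import combinations

def size_filter(edges, min_size=2, max_size=11):
    """Deduplicate edges and keep those with min_size <= |e| <= max_size."""
    unique = {frozenset(e) for e in edges}
    return {e for e in unique if min_size <= len(e) <= max_size}

def toy_simplicial_fraction(edges, min_size=2):
    """Fraction of edges larger than min_size whose sub-faces are all edges.

    Brute force over all subsets, so only suitable for small examples.
    """
    edges = set(edges)
    candidates = [e for e in edges if len(e) > min_size]
    if not candidates:
        return float("nan")

    def is_simplex(e):
        # every proper subset of size >= min_size must itself be an edge
        return all(frozenset(f) in edges
                   for k in range(min_size, len(e))
                   for f in combinations(e, k))

    return sum(is_simplex(e) for e in candidates) / len(candidates)

# toy data: {7} is too small and {2, 1} duplicates {1, 2}
toy = [{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {3, 4, 5}, {3, 4}, {7}, {2, 1}]
E_toy = size_filter(toy)
sf = toy_simplicial_fraction(E_toy)
print(len(E_toy), sf)
```

Here `{1, 2, 3}` counts as a simplex because all three of its pairs are edges, while `{3, 4, 5}` does not because `{3, 5}` and `{4, 5}` are missing.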


In [2]:
def compute_all(fn=""):
    print('\033[1m'+'\nResults for',fn,'\033[0m')  ## bold header, then reset formatting
    H = xgi.load_xgi_data(fn).cleanup()
    ## build edge list - keep size 2 to 11 only (duplicates are handled by cleanup() above)
    E = [set(e) for e in H.edges.members() if 2 <= len(e) <= 11]
    H = xgi.hypergraph.Hypergraph(E)
    print('number of edges:',len(E))
    print('SF:  %.2f'%xgi.simplicial_fraction(H),
          '\nES:  %.2f'%xgi.edit_simpliciality(H),
          '\nFES: %.2f'%xgi.face_edit_simpliciality(H))        
    ## Simplicial ratio
    S = SR.Simplicial(E)
    print('SR: %.2f'%S.ratio)
    

## Datasets

These are the 10 datasets used in the above papers.
Uncomment the first line to run all 10 (the last two take a few minutes to run).


In [3]:
## 10 datasets
#Datasets = ["contact-primary-school", "contact-high-school", "hospital-lyon", "email-enron", "email-eu", "diseasome", "disgenenet", "ndc-substances", "congress-bills", "tags-ask-ubuntu"]

## subset - 3 contact hypergraphs
Datasets = ["contact-primary-school", "contact-high-school", "hospital-lyon"]

for fn in Datasets:
    compute_all(fn)
    

Results for contact-primary-school
number of edges: 12704
SF:  0.85 
ES:  0.88 
FES: 0.94
SR: 2.68
Results for contact-high-school
number of edges: 7818
SF:  0.81 
ES:  0.91 
FES: 0.92
SR: 6.70
Results for hospital-lyon
number of edges: 1824
SF:  0.91 
ES:  0.94 
FES: 0.97
SR: 0.95


## Simplicial matrix and counts

We can also look at the simplicial ratio (SR) separately for each combination of edge sizes.
We illustrate this for one of the datasets used above.

The upper triangles of the two matrices shown below contain, respectively, the simplicial ratio and the number of simplicial pairs for edges of sizes $i$ and $j$, where $2 \le i < j \le 5$.
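As a rough illustration of what the counts matrix tallies, the sketch below counts pairs of edges by size, assuming a simplicial pair is a pair of distinct edges where the smaller one is a proper subset of the larger one. The function `count_simplicial_pairs` is a hypothetical helper, not part of the `simplicial` module used in this notebook, and it only produces raw counts: it does not compute the simplicial ratio itself.

```python
from collections import Counter
from itertools import combinations

def count_simplicial_pairs(edges):
    """Count pairs (small, large) with small a proper subset of large.

    Returns a Counter keyed by (len(small), len(large)).
    Quadratic in the number of edges, so meant for toy inputs only.
    """
    unique = {frozenset(e) for e in edges}
    counts = Counter()
    for a, b in combinations(unique, 2):
        small, large = (a, b) if len(a) < len(b) else (b, a)
        # equal-size edges cannot be nested; '<' tests for a proper subset
        if len(small) < len(large) and small < large:
            counts[(len(small), len(large))] += 1
    return counts

toy = [{1, 2}, {2, 3}, {1, 2, 3}, {1, 2, 3, 4}, {4, 5}]
pairs = count_simplicial_pairs(toy)
for (i, j), n in sorted(pairs.items()):
    print(f"sizes ({i}, {j}): {n} pair(s)")
```

In this toy example both 2-edges `{1, 2}` and `{2, 3}` sit inside the 3-edge and the 4-edge, and the 3-edge sits inside the 4-edge, while `{4, 5}` is contained in no other edge.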


In [4]:
from utils import make_matrix_pretty

fn = "hospital-lyon"
H = xgi.load_xgi_data(fn)
E = list(set([tuple(sorted(e)) for e in H.edges.members() if len(e)<=11 and len(e)>=2])) ## remove duplicates; keep only edges of size 2 to 11
E = [set(e) for e in E]

print('Simplicial matrix:')
X = SR.Simplicial(E)
print(make_matrix_pretty(X.matrix))

print('\nSimplicial counts:')
print(make_matrix_pretty(X.counts)) ## reuse X computed above


Simplicial matrix:
[['0.0' '0.9' '0.9' '0.9']
 ['0.0' '0.0' '19.0' '14.3']
 ['0.0' '0.0' '0.0' '0.0']
 ['0.0' '0.0' '0.0' '0.0']]

Simplicial counts:
[['0.0' '>1k' '347' '20.0']
 ['0.0' '0.0' '190' '12.0']
 ['0.0' '0.0' '0.0' '0.0']
 ['0.0' '0.0' '0.0' '0.0']]


## Sample size

Note that **sampling** is used when computing the simplicial ratio, so results can vary slightly from run to run. The sample size can be set by the user, as shown below.


In [5]:
fn = "hospital-lyon"
H = xgi.load_xgi_data(fn).cleanup()
E = [set(sorted(e)) for e in H.edges.members() if len(e)<=11 and len(e)>=2]

#E = list(set([tuple(sorted(e)) for e in H.edges.members() if len(e)<=11 and len(e)>=2])) ## alternative: remove duplicates explicitly; keep only edges of size 2 to 11
#E = [set(e) for e in E]

## using default sample size 1000
S = SR.Simplicial(E)
print('SR: %.2f'%S.ratio)

## using larger sample size 10000
S = SR.Simplicial(E, top_edge_sample_size=10000)
print('SR: %.2f'%S.ratio)


SR: 0.98
SR: 0.97
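
One way to gauge how much a sampled estimate moves between runs is to repeat it several times and report the mean and standard deviation. The sketch below does this for a stand-in estimator (the mean of a random sample from a synthetic population), since the point is only the repetition pattern. The helper names `summarize_runs` and `make_estimator` are hypothetical.

```python
import numpy as np

def summarize_runs(estimate, n_runs=20):
    """Call a zero-argument estimator n_runs times; return (mean, sample std)."""
    values = np.array([estimate() for _ in range(n_runs)], dtype=float)
    return values.mean(), values.std(ddof=1)

# stand-in for a sampled statistic: NOT the simplicial ratio
rng = np.random.default_rng(321)
population = rng.exponential(size=100_000)

def make_estimator(sample_size):
    """Return an estimator that averages a fresh random sample each call."""
    return lambda: rng.choice(population, size=sample_size).mean()

mean_small, sd_small = summarize_runs(make_estimator(1000))
mean_large, sd_large = summarize_runs(make_estimator(10000))
print('sample size  1000: %.3f +/- %.3f' % (mean_small, sd_small))
print('sample size 10000: %.3f +/- %.3f' % (mean_large, sd_large))
```

With the objects from this notebook, the same helper could be called as `summarize_runs(lambda: SR.Simplicial(E, top_edge_sample_size=10000).ratio)`, keeping in mind that each call recomputes the ratio and may be slow on the larger datasets.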
