## Counting extended simplices on a sample of simplices of the full matrix

In this notebook, we count extended simplices, i.e. simplices whose sink has a bidirectional connection to another neuron. We compare the results between Erdős-Rényi (ER) graphs and an actual column graph. For computational reasons, the count is run on a random sample of 10000 simplices at a time, and the results of the different samples are compared.

The algorithm is as follows:
1. Simplices are stored in a dictionary keyed by their last neuron, i.e. their sink (this is slow when there are many simplices, so this step will be removed in more accurate counts).
2. For each key in the dictionary (i.e. each possible simplex sink), the list of neurons with a bidirectional connection to it is retrieved.
3. For each sink k, we add to the count the number of simplices with sink k times the number of neurons with a bidirectional connection to k.
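The three steps above can be sketched on a toy graph. This is only an illustration under stated assumptions: the graph is a plain dictionary of sets, and `bidirectional_targets` is a hypothetical stand-in for `get_element_targets`, whose implementation is not shown in this notebook.

```python
from collections import Counter


def bidirectional_targets(adjacency, node):
    """Return the nodes joined to `node` by edges in both directions.

    Hypothetical helper; `adjacency` maps each source node to its set of targets.
    """
    return {t for t in adjacency.get(node, set()) if node in adjacency.get(t, set())}


def count_extended_simplices(simplices, adjacency):
    # Step 1: group the simplices by sink (last vertex); only group sizes are needed.
    simplices_per_sink = Counter(simplex[-1] for simplex in simplices)
    total = 0
    for sink, n_simplices in simplices_per_sink.items():
        # Step 2: neurons with a bidirectional connection to this sink.
        targets = bidirectional_targets(adjacency, sink)
        # Step 3: each (simplex, target) pair contributes one to the count.
        total += n_simplices * len(targets)
    return total


# Toy directed graph: node 2 is bidirectionally connected to nodes 3 and 4.
toy_adjacency = {0: {1, 2}, 1: {2}, 2: {3, 4}, 3: {2}, 4: {2}}
toy_edges = [(0, 1), (0, 2), (1, 2)]  # three 1-simplices, two of them with sink 2
toy_count = count_extended_simplices(toy_edges, toy_adjacency)
print(toy_count)
```

Since step 3 only uses the number of simplices per sink, the sketch keeps a `Counter` instead of a dictionary of lists; this is a design choice of the sketch, not what the cells below do.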

This way, bi-simplices (i.e. n-node motifs that contain two n-1 simplices with a bidirectional connection between their sinks) are counted twice. This was done for simplicity and for computational reasons. Since the ratio extended_simplices/simplices is more than 4 times higher in the real column, even in the worst-case scenario this overcounting does not explain the overexpression.
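The worst-case argument can be checked with a few lines of arithmetic. The sketch below uses the rounded figures reported further down in this notebook (about 19000 extended simplices per 10000 simplices for ER, and about 82000 as the lowest column value) and assumes, as the worst case, that every column count is doubled by overcounting while no ER count is.

```python
# Rounded counts per 10000 sampled simplices, taken from this notebook.
sample_size = 10000
er_extended = 19000          # ER graphs, dimensions 1 to 3
column_extended_min = 82000  # real column, lowest value (dimension 1)

er_ratio = er_extended / sample_size
column_ratio = column_extended_min / sample_size

# Worst case (assumption): all column counts are doubled, no ER count is.
column_ratio_worst_case = column_ratio / 2

print(f"ER ratio: {er_ratio:.2f}")
print(f"column ratio: {column_ratio:.2f}")
print(f"column ratio, worst case: {column_ratio_worst_case:.2f}")
```

Even after halving, the column ratio stays more than twice the ER ratio.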

For computational reasons, only 10000 random simplices per dimension were taken into consideration. We have done:
1. Ten trials with different simplices on a single ER instance with the same n_nodes and density as the column.
2. Ten trials on ten different ER instances with the same n_nodes and density as the column.
3. Ten trials with different simplices on the same column.

In both cases 1 and 2, the ER instances give on average about 19000 extended simplices per 10000 simplices in every dimension below 4 (the only dimensions where the ER graphs contain enough simplices for the count to make sense). For the real column (case 3), the count grows with the dimension, from a minimum of about 82000 to 84000 extended simplices per 10000 simplices in dimension 1 up to about 133000 to 136000 in dimension 5.
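The ER figure of about 19000 can be sanity-checked analytically. The sketch below assumes that the ER generator draws each directed edge independently with probability equal to the density (the implementation of `save_er_graph` is not shown here, so this is an assumption). Under that assumption, a node has on average (n_nodes - 1) * density**2 bidirectional neighbours, regardless of the dimension of the simplices it is the sink of.

```python
# Node and edge counts of the column matrix, as printed later in this notebook.
n_nodes = 31346
n_edges = 7648079
density = n_edges / n_nodes**2

# Assumption: directed edges are independent, each present with probability `density`,
# so a given pair of nodes is connected in both directions with probability density**2.
expected_bidirectional_neighbours = (n_nodes - 1) * density**2

# Expected count for a sample of 10000 simplices (one sink per simplex).
expected_per_10000 = 10000 * expected_bidirectional_neighbours
print(round(expected_per_10000))
```

The result is close to 19000, in line with the ER counts measured in dimensions 1 to 3.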

NB: only case 1 is annotated; the other cases were run in a similar fashion.

NB2: the results have such low variance that I did not average them.

### Imports 

Core imports

In [1]:
import multiprocessing as mp
from robust_motifs.custom_mp import prepare_shared_memory
from robust_motifs.counting import get_dag2_signature, get_element_targets
from robust_motifs.data import import_connectivity_matrix, save_er_graph, load_sparse_matrix_from_pkl

File tools

In [2]:
from pathlib import Path
import pickle
import h5py

Drawing tools

In [3]:
from robust_motifs.plot import plot_matrices
from robust_motifs.utilities import get_pos

Other tools

In [4]:
import scipy.sparse as sp
import numpy as np
from itertools import product
from time import time
import os
from tqdm import tqdm

In [5]:
pool = mp.Pool()

## Comparison with Erdős-Rényi graph, random simplices

### Creating ER graph

In [6]:
import_connectivity_matrix(dataframe = False, type = 'csr')

100%|██████████| 55/55 [00:12<00:00,  4.53it/s]


<31346x31346 sparse matrix of type '<class 'numpy.bool_'>'
	with 7648079 stored elements in Compressed Sparse Row format>

In [7]:
n_nodes = 31346
density = 7648079/31346/31346

In [8]:
density

0.007783736164455195

In [9]:
path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".flag")
path.parent.mkdir(parents=True, exist_ok = True)
save_er_graph(path, n_nodes, density)
# flagser call
os.system("flagser-count data/extended_simplices/full/ER_" +str(n_nodes)+".flag --out data/extended_simplices/full/ER_"+str(n_nodes)+"-count.h5")

100%|██████████| 31346/31346 [00:00<00:00, 3479730.39it/s]
7647973it [00:11, 674130.10it/s]


0

In [10]:
for _ in range(10):
    result_dictionary = {} # this stores extended simplices counts.
    aux_dictionary = {} # this stores simplex count
    ##### Data import ######
    file_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".flag")
    matrix_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".pkl")
    complex_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + "-count.h5")
    complex_file = h5py.File(complex_path, 'r')
    matrix = load_sparse_matrix_from_pkl(matrix_path)
    arrays, links = prepare_shared_memory(matrix, str(n_nodes))    
    ######## iteration #####
    for dimension in tqdm(range(1, 7)):
        try:
            ####### step 1: compiling the dictionary ######
            random_selection = np.random.choice(complex_file["Cells_" + str(dimension)].shape[0],
                                                min(10000,complex_file["Cells_" + str(dimension)].shape[0]),
                                                replace = False)
            random_selection.sort()
            simplex_iterator = iter(complex_file["Cells_" + str(dimension)][random_selection])
            simplex_dictionary = {}
            for simplex in simplex_iterator:
                simplex_dictionary[simplex[-1]] = simplex_dictionary.get(simplex[-1], []) + [simplex]
            ###### step 2: getting bidirectional targets ######
            mp_iterator = product(simplex_dictionary.keys(), [arrays]) # product with a one-element list, so that imap receives a single (key, arrays) argument.
            results = pool.imap(get_element_targets, mp_iterator)
            ###### step 3: counting extended simplices ######
            for elem, key in zip(results, simplex_dictionary.keys()):
                result_dictionary[dimension] = result_dictionary.get(dimension, 0) + len(elem)*len(simplex_dictionary[key])
        except KeyError: # If there are no simplices in h5 file..
                result_dictionary[dimension] = 0
        try:
            aux_dictionary[dimension] = len(complex_file["Cells_" + str(dimension)])
        except KeyError: # If there are no simplices in h5 file...
            aux_dictionary[dimension] = 0
    print("Extended simplices for 10000 simplices per dimension")
    print(result_dictionary)
    print("Total simplices per dimension")
    print(aux_dictionary)
    # free shared memory and close the HDF5 file
    for link in links:
        link.unlink()
    complex_file.close()

100%|██████████| 6/6 [01:22<00:00, 13.83s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 18843, 2: 19359, 3: 19088, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:26<00:00, 14.38s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19190, 2: 19195, 3: 19018, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:22<00:00, 13.74s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19055, 2: 19106, 3: 19044, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:26<00:00, 14.39s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 18915, 2: 19026, 3: 19060, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:21<00:00, 13.65s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19172, 2: 19074, 3: 18854, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:25<00:00, 14.26s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19107, 2: 19271, 3: 19076, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:24<00:00, 14.16s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 18941, 2: 19127, 3: 18963, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:23<00:00, 13.92s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19352, 2: 19167, 3: 19270, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:24<00:00, 14.13s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices for 10000 simplices per dimension
{1: 19129, 2: 19278, 3: 19090, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}


100%|██████████| 6/6 [01:25<00:00, 14.32s/it]

Extended simplices for 10000 simplices per dimension
{1: 18663, 2: 19261, 3: 18826, 4: 60, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647973, 2: 14517680, 3: 214190, 4: 32, 5: 0, 6: 0}





#### NB

For each instance, we print two dictionaries. The first dictionary contains the number of extended simplices per dimension, counted on the sample of at most 10000 simplices (when a dimension has fewer than 10000 simplices, all of them are used). The second dictionary contains the total number of simplices of the instance.
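To make the low variance concrete, the dimension-1 counts of the ten trials printed above can be summarised with the standard library. This is only a quick check on the values copied from the outputs, not part of the original pipeline.

```python
from statistics import mean, stdev

# Dimension-1 extended simplex counts of the ten trials above (single ER instance).
dim1_counts = [18843, 19190, 19055, 18915, 19172, 19107, 18941, 19352, 19129, 18663]

average = mean(dim1_counts)
spread = stdev(dim1_counts)          # sample standard deviation
relative_spread = spread / average   # spread as a fraction of the mean

print(f"mean = {average:.1f}, std = {spread:.1f}, relative = {relative_spread:.2%}")
```

The relative spread is small compared with the gap between ER graphs and the column.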

## Comparison with ER graph: random simplices, random instances. 

In [12]:
for _ in range(10):
    path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".flag")
    path.parent.mkdir(parents=True, exist_ok = True)
    save_er_graph(path, n_nodes, density)
    os.system("rm data/extended_simplices/full/ER_"+str(n_nodes)+"-count.h5")
    os.system("flagser-count data/extended_simplices/full/ER_" +str(n_nodes)+".flag --out data/extended_simplices/full/ER_"+str(n_nodes)+"-count.h5")
    result_dictionary = {}
    aux_dictionary = {}
    file_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".flag")
    matrix_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + ".pkl")
    complex_path = Path("data/extended_simplices/full/ER_" + str(n_nodes) + "-count.h5")
    complex_file = h5py.File(complex_path, 'r')
    matrix = load_sparse_matrix_from_pkl(matrix_path)
    arrays, links = prepare_shared_memory(matrix, str(n_nodes))    
    for dimension in tqdm(range(1, 7)):
        try:
            random_selection = np.random.choice(complex_file["Cells_" + str(dimension)].shape[0],
                                                min(10000,complex_file["Cells_" + str(dimension)].shape[0]),
                                                replace = False)
            random_selection.sort()
            simplex_iterator = iter(complex_file["Cells_" + str(dimension)][random_selection])
            simplex_dictionary = {}
            for simplex in simplex_iterator:
                simplex_dictionary[simplex[-1]] = simplex_dictionary.get(simplex[-1], []) + [simplex]
            mp_iterator = product(simplex_dictionary.keys(), [arrays])
            results = pool.imap(get_element_targets, mp_iterator)
            for elem, key in zip(results, simplex_dictionary.keys()):
                result_dictionary[dimension] = result_dictionary.get(dimension, 0) + len(elem)*len(simplex_dictionary[key])
        except KeyError:
                result_dictionary[dimension] = 0
        try:
            aux_dictionary[dimension] = len(complex_file["Cells_" + str(dimension)])
        except KeyError:
            aux_dictionary[dimension] = 0      
    print("Extended simplices for 10000 simplices per dimension")
    print(result_dictionary)
    print("Total simplices per dimension")
    print(aux_dictionary)
    # free shared memory and close the HDF5 file before it is regenerated
    for link in links:
        link.unlink()
    complex_file.close()

100%|██████████| 31346/31346 [00:00<00:00, 3417056.17it/s]
7645597it [00:11, 675795.40it/s]
100%|██████████| 6/6 [01:23<00:00, 13.89s/it]


Extended simplices for 10000 simplices per dimension
{1: 18948, 2: 19137, 3: 19193, 4: 39, 5: 0, 6: 0}
Total simplices per dimension
{1: 7645597, 2: 14509140, 3: 214707, 4: 21, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3231844.18it/s]
7648592it [00:11, 665618.29it/s]
100%|██████████| 6/6 [01:22<00:00, 13.73s/it]


Extended simplices for 10000 simplices per dimension
{1: 19108, 2: 18802, 3: 19212, 4: 36, 5: 0, 6: 0}
Total simplices per dimension
{1: 7648592, 2: 14525817, 3: 214282, 4: 22, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3397365.65it/s]
7643476it [00:11, 680546.09it/s]
100%|██████████| 6/6 [01:21<00:00, 13.55s/it]


Extended simplices for 10000 simplices per dimension
{1: 18915, 2: 18858, 3: 19213, 4: 53, 5: 0, 6: 0}
Total simplices per dimension
{1: 7643476, 2: 14500328, 3: 215005, 4: 30, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3371231.40it/s]
7653868it [00:11, 661600.24it/s]
100%|██████████| 6/6 [01:23<00:00, 13.86s/it]


Extended simplices for 10000 simplices per dimension
{1: 19125, 2: 19358, 3: 19164, 4: 50, 5: 0, 6: 0}
Total simplices per dimension
{1: 7653868, 2: 14560590, 3: 216335, 4: 25, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 1264945.62it/s]
7644770it [00:11, 681688.69it/s]
100%|██████████| 6/6 [01:22<00:00, 13.74s/it]


Extended simplices for 10000 simplices per dimension
{1: 18873, 2: 18962, 3: 19726, 4: 27, 5: 0, 6: 0}
Total simplices per dimension
{1: 7644770, 2: 14510014, 3: 214070, 4: 13, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3371317.84it/s]
7649423it [00:11, 684933.64it/s]
100%|██████████| 6/6 [01:22<00:00, 13.82s/it]


Extended simplices for 10000 simplices per dimension
{1: 19028, 2: 19274, 3: 19066, 4: 58, 5: 0, 6: 0}
Total simplices per dimension
{1: 7649423, 2: 14538725, 3: 215936, 4: 26, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3312705.43it/s]
7652972it [00:11, 670265.20it/s]
100%|██████████| 6/6 [01:25<00:00, 14.32s/it]


Extended simplices for 10000 simplices per dimension
{1: 19144, 2: 19403, 3: 19279, 4: 67, 5: 0, 6: 0}
Total simplices per dimension
{1: 7652972, 2: 14548697, 3: 215735, 4: 34, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3423373.34it/s]
7647405it [00:11, 670086.21it/s]
100%|██████████| 6/6 [01:21<00:00, 13.50s/it]


Extended simplices for 10000 simplices per dimension
{1: 19134, 2: 19287, 3: 19449, 4: 46, 5: 0, 6: 0}
Total simplices per dimension
{1: 7647405, 2: 14515822, 3: 215021, 4: 27, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3432400.09it/s]
7653156it [00:11, 659956.47it/s]
100%|██████████| 6/6 [01:21<00:00, 13.55s/it]


Extended simplices for 10000 simplices per dimension
{1: 18951, 2: 19071, 3: 19278, 4: 58, 5: 0, 6: 0}
Total simplices per dimension
{1: 7653156, 2: 14547079, 3: 215439, 4: 27, 5: 0, 6: 0}


100%|██████████| 31346/31346 [00:00<00:00, 3435180.24it/s]
7648561it [00:11, 676944.76it/s]
100%|██████████| 6/6 [01:21<00:00, 13.57s/it]

Extended simplices for 10000 simplices per dimension
{1: 19586, 2: 19351, 3: 19275, 4: 61, 5: 0, 6: 0}
Total simplices per dimension
{1: 7648561, 2: 14532687, 3: 214577, 4: 30, 5: 0, 6: 0}





## Comparison with ER graph: random simplices in column

In [13]:
complex_path = Path("data/tesi/test_instance/column-count.h5")
complex_file = h5py.File(complex_path, 'r')
matrix = import_connectivity_matrix(dataframe = False, type = 'csr')
arrays, links = prepare_shared_memory(matrix, "full")    
for _ in range(10):
    result_dictionary = {}
    aux_dictionary = {}
    for dimension in tqdm(range(1, 7)):
        try:
            random_selection = np.random.choice(complex_file["Cells_" + str(dimension)].shape[0],
                                                min(10000,complex_file["Cells_" + str(dimension)].shape[0]),
                                                replace = False)
            random_selection.sort()
            simplex_iterator = iter(complex_file["Cells_" + str(dimension)][random_selection])
            simplex_dictionary = {}
            for simplex in simplex_iterator:
                simplex_dictionary[simplex[-1]] = simplex_dictionary.get(simplex[-1], []) + [simplex]
            mp_iterator = product(simplex_dictionary.keys(), [arrays])
            results = pool.imap(get_element_targets, mp_iterator)
            for elem, key in zip(results, simplex_dictionary.keys()):
                result_dictionary[dimension] = result_dictionary.get(dimension, 0) + len(elem)*len(simplex_dictionary[key])
        except KeyError:
                result_dictionary[dimension] = 0
        try:
            aux_dictionary[dimension] = len(complex_file["Cells_" + str(dimension)])
        except KeyError:
            aux_dictionary[dimension] = 0
    print("Extended simplices per dimension")
    print(result_dictionary)
    print("Total simplices per dimension")
    print(aux_dictionary)
for link in links:
    link.unlink()

100%|██████████| 55/55 [00:11<00:00,  4.68it/s]
100%|██████████| 6/6 [02:00<00:00, 20.03s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 84111, 2: 100665, 3: 113947, 4: 123106, 5: 133058, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:00<00:00, 20.05s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 83927, 2: 100126, 3: 113933, 4: 124033, 5: 133330, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:02<00:00, 20.42s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 83691, 2: 100423, 3: 115697, 4: 125084, 5: 135657, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:00<00:00, 20.12s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 83725, 2: 101108, 3: 113931, 4: 125780, 5: 134187, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:06<00:00, 21.07s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 82965, 2: 100510, 3: 112990, 4: 126839, 5: 133837, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:14<00:00, 22.34s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 82710, 2: 100134, 3: 110251, 4: 125121, 5: 133185, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:02<00:00, 20.45s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 82520, 2: 100374, 3: 112693, 4: 124216, 5: 134583, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:03<00:00, 20.54s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 81816, 2: 100346, 3: 114954, 4: 122619, 5: 134463, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:01<00:00, 20.29s/it]
  0%|          | 0/6 [00:00<?, ?it/s]

Extended simplices per dimension
{1: 82661, 2: 100529, 3: 112969, 4: 126354, 5: 133303, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}


100%|██████████| 6/6 [02:02<00:00, 20.35s/it]

Extended simplices per dimension
{1: 82980, 2: 99655, 3: 113840, 4: 124214, 5: 134424, 6: 7383}
Total simplices per dimension
{1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}
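The column counts above refer to samples of at most 10000 simplices, and dimension 6 contains only 529 simplices in total. To compare dimensions on the same footing, the counts can be divided by the actual sample size. The sketch below does this for the first trial, with the values copied from its output.

```python
# First column trial: extended simplex counts and total simplices per dimension.
extended = {1: 84111, 2: 100665, 3: 113947, 4: 123106, 5: 133058, 6: 7383}
totals = {1: 7648079, 2: 73036616, 3: 59945205, 4: 6599529, 5: 133115, 6: 529}
max_sample = 10000

# The sample holds min(10000, total) simplices, as in the cell above.
ratios = {dim: extended[dim] / min(max_sample, totals[dim]) for dim in extended}

for dim, ratio in ratios.items():
    print(f"dimension {dim}: {ratio:.2f} extended simplices per simplex")
```

Normalised this way, the ratio keeps increasing with the dimension, including dimension 6.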



