## Generating the combinations

In [1]:
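```python
from itertools import permutations, combinations


def generate_all_permutations(seq, pad=None):
    """Return every ordered tuple of distinct elements of `seq` with length 0 to 4.

    Tuples are grouped by length; for each combination (in `combinations` order)
    all of its orderings are emitted. If `pad` is given, shorter tuples are
    right-padded with `pad` up to length 4.
    """
    res = []
    for i in range(5):  # tuple lengths 0, 1, 2, 3, 4
        for sseq in combinations(seq, i):
            for vseq in permutations(sseq):
                # `is not None` so that a falsy pad value such as 0 still pads
                res.append(vseq if pad is None else vseq + (pad,) * (4 - i))
    return res
```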
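The `combinations` + `permutations` loop enumerates exactly the k-permutations of `seq`, which `itertools.permutations(seq, k)` can also produce directly. A minimal sketch comparing the two (both helper names are hypothetical, and neither pads); they yield the same set of tuples but in a different row order:

```python
from itertools import combinations, permutations


def ordered_subsets(seq, max_len=4):
    """Hypothetical helper: ordered tuples of distinct elements, length 0..max_len,
    generated with itertools.permutations(seq, k)."""
    return [p for k in range(max_len + 1) for p in permutations(seq, k)]


def ordered_subsets_via_combinations(seq, max_len=4):
    """Hypothetical helper: same tuples, enumerated like generate_all_permutations."""
    return [p for k in range(max_len + 1)
            for c in combinations(seq, k)
            for p in permutations(c)]


a = ordered_subsets(range(1, 6))
b = ordered_subsets_via_combinations(range(1, 6))
print(len(a), len(b), set(a) == set(b), a == b)  # 206 206 True False
```

The row order matters whenever rows are addressed by index, as in the `res[1000 : 1200]` slice later in this notebook, so the combinations-then-permutations loop is the one to keep if that layout is relied on.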

## Data Volume

Generating all ordered tuples of distinct elements with length ≤ 4 (including the empty tuple) from a sequence of 128 values (`range(128)`) gives **258,096,641** tuples in total.
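The total follows from counting k-permutations: n distinct values give $n!/(n-k)!$ ordered tuples of length k, so the number of tuples of length 0 to 4 is $\sum_{k=0}^{4} n!/(n-k)!$. A quick check with `math.perm` (Python 3.8+); the function name is illustrative:

```python
from math import perm


def count_ordered_tuples(n, max_len=4):
    """Number of ordered tuples of distinct elements, length 0..max_len, from n values."""
    return sum(perm(n, k) for k in range(max_len + 1))


print(count_ordered_tuples(128))  # 258096641  (range(128))
print(count_ordered_tuples(127))  # 250047380  (range(1, 128), i.e. 1..127)
```

Note that `range(1, 128)`, used in the timing cells below, contains 127 values, so the array saved there has 250,047,380 rows rather than 258,096,641.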

In [2]:
```python
# len(generate_all_permutations(range(128)))  # --> 258096641
```

## Data Masking
Every ordered tuple must be treated as having length 4, so tuples with fewer than 4 elements need padding.

We pad (mask) the remaining positions with a placeholder value p so that every ordered tuple has length 4. For example:

| Ordered Set       | Masked Ordered Set |
| ---               | ---                |
| S = [1, 2]        | mS = [1, 2, p, p]  |
| S = [1, 2, 3]     | mS = [1, 2, 3, p]  |
| S = [1, 2, 3, 4]  | mS = [1, 2, 3, 4]  |


The placeholder p can be any value that does not appear in the initial set. For consistency across models we choose $p = -1$ and restrict the initial set to positive integers. The length-0 (empty) tuple generated by the loop becomes [p, p, p, p].
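Because p never occurs among the positive-integer elements, padding is reversible and a boolean mask recovers each tuple's original length. A minimal sketch, assuming p = -1 and tuples of at most 4 elements; `PAD`, `pad_tuple` and `unpad_tuple` are hypothetical names:

```python
import numpy as np

PAD = -1  # padding value p; real elements are restricted to positive integers


def pad_tuple(t, length=4, pad=PAD):
    """Right-pad an ordered tuple with `pad` up to `length` elements."""
    if len(t) > length:
        raise ValueError(f"tuple longer than {length}: {t}")
    return tuple(t) + (pad,) * (length - len(t))


def unpad_tuple(t, pad=PAD):
    """Drop the padding, recovering the original ordered tuple."""
    return tuple(x for x in t if x != pad)


rows = np.array([pad_tuple((1, 2)), pad_tuple((1, 2, 3)), pad_tuple((1, 2, 3, 4))])
mask = rows != PAD          # True where a real element is present
lengths = mask.sum(axis=1)  # original cardinality of each ordered tuple
print(rows.tolist())        # [[1, 2, -1, -1], [1, 2, 3, -1], [1, 2, 3, 4]]
print(lengths.tolist())     # [2, 3, 4]
```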

In [3]:
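```python
import numpy


def write_all_permutations(start, end):
    """Save every padded ordered tuple over range(start, end) as an (N, 4) .npy array.

    range(start, end) excludes `end`; the data/ directory must already exist.
    """
    with open(f"data/all_s{start}_e{end}_permutations", "wb") as f:
        # numpy.save converts the list of tuples to an integer array
        # (platform default int dtype, int64 on macOS and Linux) before writing.
        numpy.save(f, generate_all_permutations(range(start, end), pad=-1))


def read_all_permutations(filepath):
    """Load an array written by write_all_permutations."""
    with open(filepath, "rb") as f:
        return numpy.load(f)
```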
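`numpy.save` first has to convert the complete Python list of tuples (roughly 250 million of them for `range(1, 128)`) into an array, so peak memory holds both. A sketch of an alternative that fills a preallocated array directly, keeping the same rows in the same order. `generate_padded_array` is a hypothetical name, not part of the original notebook, and it assumes every value fits in `int16` (true for 1..127 and p = -1), which needs a quarter of the memory of `int64`:

```python
from itertools import chain, combinations, permutations
from math import perm

import numpy as np


def generate_padded_array(values, max_len=4, pad=-1, dtype=np.int16):
    """Hypothetical variant of generate_all_permutations(values, pad=pad) that
    writes straight into a preallocated NumPy array (same rows, same order)."""
    values = list(values)
    n = len(values)
    counts = [perm(n, k) for k in range(max_len + 1)]  # rows per tuple length k
    out = np.full((sum(counts), max_len), pad, dtype=dtype)

    row = counts[0]  # the single k = 0 row (empty tuple) stays fully padded
    for k in range(1, max_len + 1):
        rows = counts[k]
        # Flatten combinations -> permutations -> elements, preserving row order.
        flat = chain.from_iterable(
            chain.from_iterable(permutations(c) for c in combinations(values, k))
        )
        block = np.fromiter(flat, dtype=dtype, count=rows * k).reshape(rows, k)
        out[row:row + rows, :k] = block
        row += rows
    return out


arr = generate_padded_array(range(1, 8))
print(arr.shape, arr.dtype)  # (1100, 4) int16
```

The result can be saved the same way as before, e.g. `numpy.save(f, generate_padded_array(range(1, 128)))`.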
    

## Timing

Hardware: Apple M1 Pro Mac, 16 GB RAM, 512 GB SSD

Generating the padded permutations takes around 30.4 s ± 858 ms per loop (mean ± std. dev. of 7 runs, 1 loop each).
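The figure above has the format of IPython's `%timeit` report. Outside Jupyter, a comparable mean and standard deviation can be computed with the standard-library `timeit` module. A sketch in which `time_repeated` and the reduced `small_workload` are hypothetical stand-ins (the real workload would be `generate_all_permutations(range(1, 128), pad=-1)`):

```python
import statistics
import timeit
from itertools import combinations, permutations


def time_repeated(func, repeat=7, number=1):
    """Run `func` `number` times per trial for `repeat` trials.

    Returns (mean, population standard deviation) of the seconds per call.
    """
    trials = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [t / number for t in trials]
    return statistics.mean(per_call), statistics.pstdev(per_call)


def small_workload():
    # Stand-in: same enumeration as generate_all_permutations, on 15 values.
    return [p for k in range(5) for c in combinations(range(1, 16), k) for p in permutations(c)]


mean, std = time_repeated(small_workload)
print(f"{mean:.3f} s ± {std * 1000:.0f} ms per loop (mean ± std. dev. of 7 runs, 1 loop each)")
```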

In [4]:
```python
start, end = 1, 128  # range(1, 128) covers the values 1..127
# %time generate_all_permutations(range(start, end), pad = -1)
```

Generating and writing all padded permutations for `range(1, 128)` (the values 1 to 127) took around 127.1 s.


In [13]:
```python
# import time
# start = time.time()
# write_all_permutations(1, 128)
# print(f"took around {time.time() - start}")
```

Reading the saved array back took around 1.74 s.

In [19]:
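```python
import time  # needed here: the earlier cell that imports time is commented out

t0 = time.perf_counter()
read_all_permutations("data/all_s1_e128_permutations")
print(f"took around {time.perf_counter() - t0}")
```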

```
took around 1.74102783203125
```
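If the whole array does not need to be in RAM at once, `numpy.load` can memory-map a `.npy` file with `mmap_mode="r"`, so data is paged in only when accessed. This requires a path rather than an open file object. A sketch on a small stand-in array (in the notebook the call would be along the lines of `numpy.load("data/all_s1_e128_permutations", mmap_mode="r")`); the load itself then returns almost immediately, so its timing is not directly comparable with the 1.74 s full read:

```python
import os
import tempfile

import numpy as np

# Small stand-in for the saved (N, 4) permutation array.
arr = np.array([[1, 2, -1, -1], [2, 1, -1, -1], [1, 2, 3, 4]], dtype=np.int16)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "perms.npy")
    np.save(path, arr)

    # Read-only memory map: rows are read from disk on access.
    mapped = np.load(path, mmap_mode="r")
    loaded_type = type(mapped).__name__
    loaded_shape = mapped.shape
    same = bool(np.array_equal(mapped, arr))
    del mapped  # release the mapping before the temporary directory is removed

print(loaded_type, loaded_shape, same)  # memmap (3, 4) True
```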


In [20]:
```python
res = read_all_permutations("data/all_s1_e128_permutations")
```

In [24]:
```python
res[1000:1200]
```

```
array([[  4,  66,  -1,  -1],
       [ 66,   4,  -1,  -1],
       [  4,  67,  -1,  -1],
       [ 67,   4,  -1,  -1],
       [  4,  68,  -1,  -1],
       [ 68,   4,  -1,  -1],
       [  4,  69,  -1,  -1],
       [ 69,   4,  -1,  -1],
       [  4,  70,  -1,  -1],
       [ 70,   4,  -1,  -1],
       [  4,  71,  -1,  -1],
       [ 71,   4,  -1,  -1],
       [  4,  72,  -1,  -1],
       [ 72,   4,  -1,  -1],
       [  4,  73,  -1,  -1],
       [ 73,   4,  -1,  -1],
       [  4,  74,  -1,  -1],
       [ 74,   4,  -1,  -1],
       [  4,  75,  -1,  -1],
       [ 75,   4,  -1,  -1],
       [  4,  76,  -1,  -1],
       [ 76,   4,  -1,  -1],
       [  4,  77,  -1,  -1],
       [ 77,   4,  -1,  -1],
       [  4,  78,  -1,  -1],
       [ 78,   4,  -1,  -1],
       [  4,  79,  -1,  -1],
       [ 79,   4,  -1,  -1],
       [  4,  80,  -1,  -1],
       [ 80,   4,  -1,  -1],
       [  4,  81,  -1,  -1],
       [ 81,   4,  -1,  -1],
       [  4,  82,  -1,  -1],
       [ 82,   4,  -1,  -1],
       [  4,  83,  -1,  -1],
       ...
```

2142760440
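Because the row order is deterministic (the empty tuple, then singletons, then each pair in `combinations` order followed by its reversal, and so on), any row can be reproduced without loading the file. A sketch using a hypothetical generator `iter_padded` that mirrors `generate_all_permutations(values, pad=-1)` lazily:

```python
from itertools import combinations, islice, permutations


def iter_padded(values, max_len=4, pad=-1):
    """Hypothetical lazy version of generate_all_permutations(values, pad=pad):
    yields the same padded tuples in the same order without building a list."""
    for k in range(max_len + 1):
        for c in combinations(values, k):
            for p in permutations(c):
                yield p + (pad,) * (max_len - k)


# Rows 1000 and 1001 of the array written for range(1, 128):
print(list(islice(iter_padded(range(1, 128)), 1000, 1002)))
# [(4, 66, -1, -1), (66, 4, -1, -1)]
```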