## `tf.data.Dataset`
In previous notebooks, we had the following code cell, which is a memory hog (because of `X`) and takes a long time to run.
In this notebook, our objective is to construct the same dataset using `tf` operations
instead of `numpy` ones, hoping to reduce both memory usage and running time (i.e. dataset construction time).
```python
%%time
S = set(range(0, 9+1))
index_instance = 0
for length in range(2, max_length+1):    
    n_permutations = factorial(length)
    for c in combinations(S, length):
        for p in permutations(c):
            X[index_instance, :length, :] = one_hot(np.array(p))
            Y[index_instance, :] = np.concatenate((np.argsort(p), np.arange(length, max_length)))
            index_instance += 1
```
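One way to avoid materializing `X` at all is to stream the examples through `tf.data.Dataset.from_generator`. The sketch below is an assumption on my part, not the approach this notebook ends up using: the helper names `generate_examples` and `make_dataset` are hypothetical, and TensorFlow is imported lazily inside `make_dataset` so that the pure-Python generator can be checked without it. The `output_signature` argument requires TF 2.4 or later.

```python
from itertools import combinations, permutations

import numpy as np


def generate_examples(n_classes=10, max_length=10):
    """Hypothetical helper: yield (x, y) pairs one at a time instead of filling a big X."""
    eye = np.eye(n_classes, dtype=np.float32)
    for length in range(2, max_length + 1):
        for c in combinations(range(n_classes), length):
            for p in permutations(c):
                p = np.array(p)
                # Padded one-hot input: only the first `length` rows are non-zero
                x = np.zeros((max_length, n_classes), dtype=np.float32)
                x[:length] = eye[p]
                # Label: indices that sort p, then identity indices for the padding
                y = np.concatenate((np.argsort(p), np.arange(length, max_length)))
                yield x, y.astype(np.float32)


def make_dataset(n_classes=10, max_length=10):
    """Hypothetical helper: wrap the generator in a tf.data.Dataset."""
    import tensorflow as tf  # imported lazily so generate_examples works without TF

    return tf.data.Dataset.from_generator(
        lambda: generate_examples(n_classes, max_length),
        output_signature=(
            tf.TensorSpec(shape=(max_length, n_classes), dtype=tf.float32),
            tf.TensorSpec(shape=(max_length,), dtype=tf.float32),
        ),
    )
```

Because the generator yields one example at a time, peak memory stays at roughly one `(max_length, n_classes)` array plus whatever `tf.data` buffers (for example a `shuffle` buffer), at the cost of running Python code for every example.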

In [1]:
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from functools import reduce
from itertools import combinations, permutations
from math import factorial
import sys

In [2]:
n_classes = 10
max_length = 10
n_instances = sum([reduce(lambda x, y: x*y, range(n_classes,n_classes-length,-1)) for length in range(2, max_length+1)])
n_instances

9864090
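The sum in the previous cell is $\sum_{k=2}^{10} P(10, k)$, where $P(n, k) = n!/(n-k)!$ counts the ordered sequences of $k$ distinct digits. Since Python 3.8, `math.perm` computes $P(n, k)$ directly, so the same count can be cross-checked as follows (a sketch that only restates the `reduce` expression above):

```python
from math import perm

n_classes = 10
max_length = 10
# P(n, k) = n! / (n - k)!: number of ordered length-k sequences of distinct digits
n_instances = sum(perm(n_classes, k) for k in range(2, max_length + 1))
print(n_instances)  # 9864090
```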

The following `X` will be our dataset (including training/validation/test sets).

In [3]:
help(sys.getsizeof)

Help on built-in function getsizeof in module sys:

getsizeof(...)
    getsizeof(object, default) -> int
    
    Return the size of object in bytes.



`3.9` billion bytes! That's more than 3 GB. Let's verify this number.

In [4]:
# bytes = instances * positions * classes * (4 bytes per float32)
n_instances * max_length * n_classes * (32//8)

3945636000

**(?)** To dive even deeper: where did the extra `128` bytes go?

About right: the two numbers agree up to those few extra bytes.
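A likely answer to the extra-bytes question, assuming NumPy's `ndarray.__sizeof__` behaves as in recent releases: for an array that owns its buffer, `sys.getsizeof` returns `nbytes` plus the size of the array object header (which includes storage for shape and strides, so it depends on `ndim` and on the NumPy build). A view, which does not own its buffer, reports only the header. The sketch below checks both properties without assuming the exact header size.

```python
import sys

import numpy as np

a = np.zeros((100, 10, 10), dtype=np.float32)  # owns its 40,000-byte buffer
b = np.zeros((5, 10, 10), dtype=np.float32)    # same ndim, smaller buffer

# Overhead beyond the raw data: the ndarray object header
overhead_a = sys.getsizeof(a) - a.nbytes
overhead_b = sys.getsizeof(b) - b.nbytes
print(overhead_a, overhead_b)  # equal; the exact value depends on the NumPy version

view = a[:50]                                  # a view does not own its data
print(sys.getsizeof(view), view.nbytes)        # header only vs. 20,000 bytes of data
```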

By contrast, `sys.getsizeof` reports only `184` bytes for a tensor created with `tf.zeros`, which suggests that `tf.zeros` does not allocate the memory immediately.

In [8]:
!free -h

              total        used        free      shared  buff/cache   available
Mem:          3.8Gi       808Mi       2.2Gi       125Mi       789Mi       2.7Gi
Swap:         975Mi        55Mi       920Mi


In [7]:
X = tf.zeros((n_instances, max_length, n_classes), dtype=tf.float32)

ResourceExhaustedError: OOM when allocating tensor with shape[9864090,10,10] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator mklcpu [Op:Fill]

**(?)** Why is my 4 GB-RAM ThinkPad X61s still unable to allocate this `X` using `tf`? Isn't that allocation just a couple of hundred bytes, as `sys.getsizeof` suggests?
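A likely explanation, assuming TensorFlow 2.x with eager execution enabled (the default): `tf.zeros` runs the `Fill` op immediately (note the `[Op:Fill]` in the error message), so the full buffer is requested at once, while `sys.getsizeof` only measures the small Python wrapper object, not the buffer behind it. The buffer's real size is the number of elements times the bytes per element (for a `tf.Tensor`, `X.shape.num_elements() * X.dtype.size`), which can be compared with the `free -h` output above:

```python
n_instances, max_length, n_classes = 9864090, 10, 10
bytes_per_float32 = 4

needed = n_instances * max_length * n_classes * bytes_per_float32
needed_gib = needed / 2**30          # free -h reports binary units (Gi)
print(f"{needed:,} bytes = {needed_gib:.2f} GiB")  # 3,945,636,000 bytes = 3.67 GiB

available_gib = 2.7                  # "available" column of the free -h output above
print("fits" if needed_gib < available_gib else "does not fit")  # does not fit
```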

In [None]:
sys.getsizeof(X)

In [None]:
# labels
Y = np.empty((n_instances, max_length), dtype=np.float32)  

In [None]:
%%time
# NOTE: `X` must be a mutable, zero-initialized NumPy array of shape
# (n_instances, max_length, n_classes): a `tf.Tensor` does not support item
# assignment, and only the first `length` rows of each example are written.
# `one_hot` comes from the previous notebook; this definition is an assumed equivalent.
def one_hot(a):
    return np.eye(n_classes, dtype=np.float32)[a]

index_instance = 0
for length in range(2, max_length+1):
    for c in combinations(range(n_classes), length):  # which digits appear
        for p in permutations(c):                     # in which order
            X[index_instance, :length, :] = one_hot(np.array(p))
            # label: indices that sort p, then identity indices for the padding
            Y[index_instance, :] = np.concatenate((np.argsort(p), np.arange(length, max_length)))
            index_instance += 1
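The triple loop above assigns one example at a time. A possibly faster alternative, sketched here with NumPy and not benchmarked, builds all examples of one length in a single vectorized step: `permutations(range(n_classes), length)` yields every ordered sequence of `length` distinct digits, which is the same set the `combinations`/`permutations` pair produces but in a different order (this should not matter once the data are shuffled by the split). The helper name `build_length_block` is hypothetical.

```python
from itertools import permutations

import numpy as np


def build_length_block(length, n_classes=10, max_length=10):
    """Hypothetical helper: return (X_block, Y_block) for all sequences of `length` distinct digits."""
    # All ordered sequences of distinct digits: shape (P(n_classes, length), length)
    perms = np.array(list(permutations(range(n_classes), length)))
    m = perms.shape[0]

    # One-hot encode; positions beyond `length` stay zero (padding)
    X_block = np.zeros((m, max_length, n_classes), dtype=np.float32)
    X_block[:, :length, :] = np.eye(n_classes, dtype=np.float32)[perms]

    # Labels: row-wise argsort, padded with the indices length..max_length-1
    pad = np.broadcast_to(np.arange(length, max_length), (m, max_length - length))
    Y_block = np.concatenate((np.argsort(perms, axis=1), pad), axis=1).astype(np.float32)
    return X_block, Y_block
```

With TensorFlow, `tf.one_hot(perms, depth=n_classes)` and `tf.argsort(perms, axis=-1)` are the corresponding ops. Note that for `length = 10` the block alone holds $10! \times 10 \times 10 \times 4 \approx 1.45$ GB, so this approach targets construction time rather than peak memory.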

### Train/Validation/Test Split

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train_val, X_test, Y_train_val, Y_test = train_test_split(X, Y, test_size=0.2)
X_train_val.shape, X_test.shape

## Model

We might be able to use fewer neurons and still reach similar performance. Since I was running out of time, I did not try to tune the model; instead, I spent most of my time implementing more solutions.

In [None]:
model = keras.models.load_model("vanilla_NN_model.h5")
model.summary()

In [None]:
model.evaluate(X_test, Y_test)

## Evaluation on `X_test`
We would certainly like performance measures such as accuracy and precision/recall, but first we need to write a few convenience functions.
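As a rough sketch of what such measures could look like, assume `Y_true` holds integer position labels of shape `(m, max_length)` and `Y_prob` holds the model's per-position probabilities of shape `(m, max_length, max_length)`, as `Sorter.evaluate` below assumes for `model.predict`. Sequence accuracy counts an example as correct only if every position matches, while per-position accuracy gives partial credit. Both function names are hypothetical.

```python
import numpy as np


def sequence_accuracy(Y_true, Y_prob):
    """Fraction of examples whose whole predicted index sequence is correct."""
    Y_pred = np.argmax(Y_prob, axis=-1)
    return np.mean(np.all(Y_pred == Y_true, axis=-1))


def position_accuracy(Y_true, Y_prob):
    """Fraction of individual positions predicted correctly."""
    Y_pred = np.argmax(Y_prob, axis=-1)
    return np.mean(Y_pred == Y_true)


# Toy check: the second example gets 2 of its 3 positions right
Y_true = np.array([[0, 1, 2], [2, 1, 0]])
Y_prob = np.eye(3)[np.array([[0, 1, 2], [2, 0, 0]])]
print(sequence_accuracy(Y_true, Y_prob))  # 0.5
```

Because the padded positions are easy to predict, `position_accuracy` as written includes them and will likely look optimistic; masking them out would require the sequence lengths.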

In [None]:
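class Sorter:
    """Convenience wrapper around a trained sorting model."""

    def __init__(self, model):
        self.model = model

    def lenlen(self, x):
        """Return the sequence length of one padded example x of shape (max_length, n_classes).

        The length is the index of the last non-zero (non-padding) row plus one;
        an all-zero x is treated as full length.
        """
        row_sums = np.sum(x, axis=-1)
        last_nonzero_index = -1
        for i, s in enumerate(row_sums):
            if s > 1e-6:
                last_nonzero_index = i
        if last_nonzero_index == -1:
            return x.shape[0]
        return last_nonzero_index + 1

    def prettier(self, x, y):
        """Decode one example into readable digits.

        x.shape = (10, 10): padded one-hot input; y.shape = (10,): argsort labels.
        Returns (xx, yy): the input digits and the same digits in sorted order.
        """
        length = self.lenlen(x)
        xx = np.argmax(x[:length], axis=-1)
        sort_indices = y.astype(int)[:length]
        yy = xx[sort_indices]
        return xx, yy

    def evaluate(self, X, Y):
        """Print the exact-match accuracy: every position of a sequence must be correct."""
        Y_pred = self.model.predict(X)              # of shape (n_instances, 10, 10)
        Y_pred_sparse = np.argmax(Y_pred, axis=-1)  # of shape (n_instances, 10)
        Y = np.asarray(Y).astype(int)               # of shape (n_instances, 10)
        acc = np.mean(np.all(Y == Y_pred_sparse, axis=-1))
        print(f"acc = {acc}")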

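To make the decoding in `prettier` concrete, here is a standalone sketch that re-implements its logic rather than calling `Sorter` (the function name `decode_example` is hypothetical): the input digits are recovered with `argmax` over the non-padded one-hot rows, and indexing them with the label `argsort(p)` yields the digits in ascending order.

```python
import numpy as np


def decode_example(x, y):
    """Hypothetical helper: return (digits, sorted_digits) for one padded one-hot example."""
    nonzero_rows = np.flatnonzero(x.sum(axis=-1) > 1e-6)
    length = nonzero_rows[-1] + 1            # padding rows are all zeros
    digits = np.argmax(x[:length], axis=-1)
    return digits, digits[y.astype(int)[:length]]


# Build one example the same way the dataset does, for p = (3, 0, 2)
max_length, n_classes = 10, 10
p = np.array([3, 0, 2])
x = np.zeros((max_length, n_classes), dtype=np.float32)
x[:len(p)] = np.eye(n_classes, dtype=np.float32)[p]
y = np.concatenate((np.argsort(p), np.arange(len(p), max_length))).astype(np.float32)

digits, sorted_digits = decode_example(x, y)
print(digits.tolist(), sorted_digits.tolist())  # [3, 0, 2] [0, 2, 3]
```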

In [None]:
sorter = Sorter(model)

In [None]:
%%time
sorter.evaluate(X_test, Y_test)