# Symbol-lab Dataset Playground
This notebook is a tutorial playground for changing data generation settings and visualizing some samples. As described in the [README](../README.md), Symbol-lab currently implements two environments: grid datasets and controlled decoder datasets.
Note that the root configuration file, [configs/config.yaml](../configs/config.yaml), orchestrates how the configurations of all modules are combined into the final configuration.
The general template for each environment and its datasets is as follows: the root `config.yaml` is read first, and from there any parameter can be overridden, either directly (if it is defined in `config.yaml`) or by descending the hierarchy with dot notation (`.`), e.g. `datamodule.num_rows=2`.
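To make the dot-notation idea concrete, here is a minimal sketch in plain Python of how an override such as `datamodule.num_rows=2` walks down a nested configuration. It is a simplified stand-in for what Hydra/OmegaConf do, not their implementation: the helper `apply_override`, the toy config, and the use of `ast.literal_eval` to parse values are all assumptions for illustration.

```python
import ast
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied.

    Hypothetical helper: a simplified illustration of dot-notation overrides,
    not Hydra's actual override grammar (no `+`/`~` prefixes, no lists).
    """
    key, _, raw_value = override.partition("=")
    try:
        value = ast.literal_eval(raw_value)  # "2" -> 2, "0.5" -> 0.5
    except (ValueError, SyntaxError):
        value = raw_value  # fall back to a plain string, e.g. "grid" or "../"

    new_config = copy.deepcopy(config)
    node = new_config
    *parents, leaf = key.split(".")
    for part in parents:
        node = node[part]  # descend one level per dot; KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(f"'{key}' is not in the config")
    node[leaf] = value
    return new_config


# A toy stand-in for the composed config (hypothetical values).
config = {"data_dir": "./", "datamodule": {"num_rows": 3, "num_cols": 3}}
for override in ["data_dir=../", "datamodule.num_rows=2", "datamodule.num_cols=2"]:
    config = apply_override(config, override)
print(config)
# {'data_dir': '../', 'datamodule': {'num_rows': 2, 'num_cols': 2}}
```

Raising on unknown keys mirrors Hydra, which refuses to override a key that does not exist in the config and requires the `+` prefix to add one.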

## 1.1. Grid Dataset (Constant number of elements)
As mentioned in the [README](../README.md), the grid dataset can contain either grids with a fixed number of objects or grids with a variable number of objects. You can either modify [`config.yaml`](../configs/config.yaml) directly or override the default parameters below (through the `overrides` argument of Hydra's `compose`) and generate samples from all splits.
For instance, here we set both the number of rows and the number of columns to 2 and the number of objects to place to 3. We did not change the default target class of the datasets, so every sample in all splits has the same constant number of objects (3).

In [1]:
import sys
sys.path.append('../')  # make the repository root importable

import hydra

configs_path = "../configs"
config_name = "config.yaml"

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(
        config_name=config_name,
        overrides=[
            "data_dir=../",
            "work_dir=../",
            "datamodule=grid",                    # use configs/datamodule/grid.yaml
            "datamodule.num_rows=2",              # 2 x 2 grid
            "datamodule.num_cols=2",
            "datamodule.num_objects_to_place=3",  # every grid holds exactly 3 objects
            # Uncomment to give the test split a variable number of objects (it also needs the
            # min/max parameters shown in Section 1.2):
            # "datamodule.dataset_parameters.test.dataset._target_=discrete_bottleneck.datamodule.grid_datasets.VariableElementCountGridDataset",
        ],
    )
    # _recursive_=False passes the nested dataset configs to the datamodule uninstantiated
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()

# x[1] is the space-separated string form of each grid sample
n = 5
print(f'{n} samples from train split: {[x[1] for x in datamodule.data_train.data[:n]]}')
print(f'{n} samples from valid split: {[x[1] for x in datamodule.data_val.data[:n]]}')
print(f'{n} samples from test split:  {[x[1] for x in datamodule.data_test.data[:n]]}')


5 samples from train split: ['1 1 0 2', '2 2 1 0', '2 2 0 2', '0 1 2 1', '1 1 1 0']
5 samples from valid split: ['1 0 2 1', '1 2 0 1', '0 2 2 2', '1 1 0 1', '2 1 1 0']
5 samples from test split:  ['2 1 0 2', '0 2 1 2', '0 2 2 2', '1 1 2 0', '2 2 1 0']
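The printed samples are flattened grids. Below is a minimal sketch for inspecting them, assuming (from the outputs above, not from the repository code) that each string lists the cells in row-major order, that `0` marks an empty cell, and that nonzero values are object IDs. `parse_grid` and `count_objects` are hypothetical helpers.

```python
from __future__ import annotations


def parse_grid(sample: str, num_rows: int, num_cols: int) -> list[list[int]]:
    """Reshape a space-separated sample string into a num_rows x num_cols grid (row-major, assumed)."""
    cells = [int(token) for token in sample.split()]
    if len(cells) != num_rows * num_cols:
        raise ValueError(f"expected {num_rows * num_cols} cells, got {len(cells)}")
    return [cells[r * num_cols:(r + 1) * num_cols] for r in range(num_rows)]


def count_objects(sample: str) -> int:
    """Count non-empty cells, assuming 0 denotes an empty cell."""
    return sum(1 for token in sample.split() if int(token) != 0)


train_samples = ['1 1 0 2', '2 2 1 0', '2 2 0 2', '0 1 2 1', '1 1 1 0']  # copied from the output above
print(parse_grid(train_samples[0], num_rows=2, num_cols=2))  # [[1, 1], [0, 2]]
print([count_objects(s) for s in train_samples])              # [3, 3, 3, 3, 3]
```

Every training sample has exactly 3 non-empty cells, matching `num_objects_to_place=3`. In the live notebook the same check can be run directly on `[x[1] for x in datamodule.data_train.data[:n]]`.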


## 1.2. Grid Dataset (Variable number of elements)
Below, instead of giving every split a constant number of objects, we override the target class of the test split. This lets us, for instance, evaluate the generalization capabilities of a model on grids with a different number of objects (the targets of the `train` and `val` splits could be overridden in the same way). To do so, we start from [`grid.yaml`](../configs/datamodule/grid.yaml) and override the target class of the test split. Additionally, since [`variable_element_count_grid.yaml`](../configs/datamodule/variable_element_count_grid.yaml) takes `min_num_objects_to_place` and `max_num_objects_to_place` as parameters, we need to provide them to the test class; because these keys are not in `grid.yaml`, they are added with Hydra's `+` prefix. Note that this time, not all samples from the test split have exactly 3 objects placed in the grid.
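For intuition about what the test split now contains, here is a minimal sketch of how a variable-element-count grid could be sampled: draw the number of objects uniformly from `[min_num_objects_to_place, max_num_objects_to_place]`, then fill that many randomly chosen cells. This is an assumption for illustration, not the logic of `VariableElementCountGridDataset`; the function `sample_variable_grid`, the uniform draws, and the object IDs `{1, 2}` (inferred from the printed samples) are hypothetical.

```python
import random


def sample_variable_grid(num_rows, num_cols, min_objects, max_objects,
                         object_ids=(1, 2), rng=None):
    """Return a flattened grid string with between min_objects and max_objects non-empty cells.

    Hypothetical sketch: 0 marks an empty cell; counts, positions and IDs are drawn uniformly.
    """
    rng = rng or random.Random()
    num_cells = num_rows * num_cols
    if not 0 <= min_objects <= max_objects <= num_cells:
        raise ValueError("need 0 <= min_objects <= max_objects <= num_rows * num_cols")
    k = rng.randint(min_objects, max_objects)          # inclusive on both ends
    cells = [0] * num_cells
    for position in rng.sample(range(num_cells), k):    # k distinct cells
        cells[position] = rng.choice(object_ids)
    return " ".join(str(c) for c in cells)


rng = random.Random(0)
samples = [sample_variable_grid(2, 2, min_objects=0, max_objects=4, rng=rng) for _ in range(5)]
print(samples)
```

With `max_num_objects_to_place=4` on a 2 x 2 grid a sample may be completely full, while `min_num_objects_to_place=0` allows an empty grid; the fixed-count splits correspond to the special case `min_objects == max_objects == 3`.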

In [2]:
import sys
sys.path.append('../')  # make the repository root importable

import hydra

configs_path = "../configs"
config_name = "config.yaml"

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(
        config_name=config_name,
        overrides=[
            "data_dir=../",
            "work_dir=../",
            "datamodule=grid",
            "datamodule.num_rows=2",
            "datamodule.num_cols=2",
            "datamodule.num_objects_to_place=3",  # train and val splits keep exactly 3 objects
            # Swap the test split's dataset class for the variable-count version...
            "datamodule.dataset_parameters.test.dataset._target_=discrete_bottleneck.datamodule.grid_datasets.VariableElementCountGridDataset",
            # ...and add its extra parameters; the leading '+' appends keys absent from grid.yaml
            "+datamodule.dataset_parameters.test.dataset.min_num_objects_to_place=0",
            "+datamodule.dataset_parameters.test.dataset.max_num_objects_to_place=4",
        ],
    )
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()

# x[1] is the space-separated string form of each grid sample
n = 5
print(f'{n} samples from train split: {[x[1] for x in datamodule.data_train.data[:n]]}')
print(f'{n} samples from valid split: {[x[1] for x in datamodule.data_val.data[:n]]}')
print(f'{n} samples from test split:  {[x[1] for x in datamodule.data_test.data[:n]]}')


5 samples from train split: ['1 1 0 2', '2 2 1 0', '2 2 0 2', '0 1 2 1', '1 1 1 0']
5 samples from valid split: ['1 0 2 1', '1 2 0 1', '0 2 2 2', '1 1 0 1', '2 1 1 0']
5 samples from test split:  ['1 0 1 2', '0 2 1 0', '0 0 0 2', '1 0 2 2', '0 2 2 0']


## 2 Controlled Decoder
The properties of the discrete latent and the decoder are described in the [README](../README.md). Below, we instantiate datasets for several combinations of these properties and print a few samples to illustrate the discrete latent structure and its decoding in each scenario. For the combinations of stochastic vs. deterministic (S vs. D) and invertible vs. non-invertible (I vs. NI) decoders, we have created four datamodule config files named after these abbreviations (e.g. `controlled_decoder_DI` for a deterministic, invertible decoder), and we use them below.

### 2.1 $f$ is deterministic, invertible
Where possible, $f$ outputs a single real number that takes one of 2 possible values. When $Z$ has more possible configurations than that, we must increase either the number of possible output values or the output length to keep the decoder invertible.
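The reason for growing the output is a counting (pigeonhole) argument: a deterministic $f$ can only be invertible if it has at least as many distinct outputs as there are latent configurations. With $L$ latent variables (`sequence_length`) of $K$ values each (`num_possible_values`), $Z$ has $K^L$ configurations; an output of length $T$ (`output_length`) with $V$ values per position has $V^T$ possible outputs, so invertibility requires $V^T \ge K^L$. The sketch below computes the shortest output length meeting this bound for $V = 2$. It is a necessary condition only, and `min_output_length` is a hypothetical helper, not part of Symbol-lab.

```python
def num_latent_configurations(sequence_length: int, num_possible_values: int) -> int:
    """Number of distinct values Z can take: num_possible_values ** sequence_length."""
    return num_possible_values ** sequence_length


def min_output_length(sequence_length: int, num_possible_values: int, output_values: int = 2) -> int:
    """Smallest T with output_values ** T >= number of latent configurations (pigeonhole bound)."""
    if output_values < 2:
        raise ValueError("output_values must be at least 2")
    needed = num_latent_configurations(sequence_length, num_possible_values)
    length = 0
    while output_values ** length < needed:
        length += 1
    return length


# (sequence_length, num_possible_values, output_length) used by the DI cells below
for seq_len, num_values, out_len in [(1, 2, 1), (1, 4, 3), (3, 3, 5)]:
    bound = min_output_length(seq_len, num_values)
    print(f"L={seq_len}, K={num_values}: needs T >= {bound}, config uses T={out_len}")
```

All three configurations satisfy the bound; the second one uses more output positions than strictly needed.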

In [3]:
import sys
sys.path.append('../')

import hydra
from omegaconf import OmegaConf
configs_path = "../configs"
config_name = "config.yaml"



"""
Z is one discrete variable with 2 possible values.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=1",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, invertible, and outputs a single real number\nZ is one discrete variable with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')



"""
Z is a sequence of discrete variables each with > 2 possible values. Again for this case to remain invertible, we need to increase the number of possible outputs, either by
increasing the possible output values but keeping its sequence length to 1, or increasing the sequence length but keeping the possible values to 2.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    

    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, invertible, and outputs a sequence\nZ is one discrete variable with > 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


"""
Z is one discrete variable with > 2 possible values. For this case to remain invertible, we need to increase the number of possible outputs, either by
increasing the possible output values but keeping its sequence length to 1, or increasing the sequence length but keeping the possible values to 2.
"""

    
with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=3",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=3",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=5",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=3",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=5",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=3",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=5",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, invertible, and outputs a sequence\nZ is a sequence of discrete variables each with > 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')




Scenario: 
f is deterministic, invertible, and outputs a single real number
Z is one discrete variable with 2 possible values.


4 discrete latent samples from train split:
 [[1], [1], [0], [0]]
4 corresponding samples from train split:
 [['1.00'], ['1.00'], ['0.00'], ['0.00']]

4 discrete latent samples from valid split:
 [[0], [1], [0], [1]]
4 corresponding samples from valid split:
 [['0.00'], ['1.00'], ['0.00'], ['1.00']]

4 discrete latent samples from test split:
 [[1], [0], [1], [1]]
4 corresponding samples from test split: 
 [['1.00'], ['0.00'], ['1.00'], ['1.00']]
-------------------------------------------------------------------------------
Scenario: 
f is deterministic, invertible, and outputs a sequence
Z is one discrete variable with > 2 possible values.


4 discrete latent samples from train split:
 [[0], [2], [3], [2]]
4 corresponding samples from train split:
 [['0.00', '1.00', '1.00'], ['0.00', '0.00', '0.00'], ['1.00', '1.00', '1.00'], ['0.00', '0.00', '0.00']]

4 di
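One simple way to picture a deterministic, invertible decoder is a fixed lookup table that assigns each latent configuration its own distinct binary codeword, so $Z$ can be read back from $f(Z)$. The sketch below builds such a table and its inverse. This is an assumption for illustration, not necessarily how Symbol-lab constructs $f$ (`build_invertible_decoder` is hypothetical), though it is consistent with the samples above, where latents 0, 2 and 3 decode to three different outputs and the repeated latent 2 decodes to the same output both times.

```python
import itertools
import random


def build_invertible_decoder(sequence_length, num_possible_values, output_length,
                             output_values=(0.0, 1.0), seed=0):
    """Map every latent configuration to a distinct output tuple (hypothetical sketch).

    Returns (decode, encode): decode is the lookup table f, encode is its inverse.
    """
    latents = list(itertools.product(range(num_possible_values), repeat=sequence_length))
    codewords = list(itertools.product(output_values, repeat=output_length))
    if len(codewords) < len(latents):
        raise ValueError("output space too small for an invertible decoder")
    rng = random.Random(seed)
    chosen = rng.sample(codewords, len(latents))  # distinct codewords, so f is injective
    decode = dict(zip(latents, chosen))
    encode = {output: latent for latent, output in decode.items()}
    return decode, encode


decode, encode = build_invertible_decoder(sequence_length=1, num_possible_values=4, output_length=3)
for z in [(0,), (2,), (3,), (2,)]:
    x = decode[z]
    assert encode[x] == z  # invertibility: the latent is recovered from the output
    print(z, x)
```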

### 2.2 $f$ is deterministic, non-invertible


In [4]:
import sys
sys.path.append('../')

import hydra
from omegaconf import OmegaConf
configs_path = "../configs"
config_name = "config.yaml"



"""
Z is one discrete variable with 2 possible values.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=1",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, non-invertible, and outputs a single real number\nZ is one discrete variable with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')



"""
Z is a sequence of discrete variables each with > 2 possible values. Here we don't need to keep the output possibilities large because we
do not have invertibility.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    

    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, non-invertible, and outputs a sequence\nZ is one discrete variable with > 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


"""
Z is one discrete variable with > 2 possible values. Again here we don't need to keep the output possibilities large because we
do not have invertibility
"""

    
with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_DNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is deterministic, non-invertible, and outputs a sequence\nZ is a sequence of discrete variables, each with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


Scenario: 
f is deterministic, non-invertible, and outputs a single real number
Z is one discrete variable with 2 possible values.


4 discrete latent samples from train split:
 [[0], [1], [1], [1]]
4 corresponding samples from train split:
 [['1.00'], ['1.00'], ['1.00'], ['1.00']]

4 discrete latent samples from valid split:
 [[0], [0], [0], [0]]
4 corresponding samples from valid split:
 [['1.00'], ['1.00'], ['1.00'], ['1.00']]

4 discrete latent samples from test split:
 [[1], [0], [1], [0]]
4 corresponding samples from test split: 
 [['1.00'], ['1.00'], ['1.00'], ['1.00']]
-------------------------------------------------------------------------------
Scenario: 
f is deterministic, non-invertible, and outputs a sequence
Z is one discrete variable with > 2 possible values.


4 discrete latent samples from train split:
 [[2], [2], [2], [1]]
4 corresponding samples from train split:
 [['1.00', '0.00', '0.00'], ['1.00', '0.00', '0.00'], ['1.00', '0.00', '0.00'], ['0.00', '1.00', '0.00'
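The outputs above show what non-invertibility means in practice: in the first scenario both $z = 0$ and $z = 1$ decode to `1.00`, so $Z$ cannot be recovered from $f(Z)$. A deterministic decoder is invertible exactly when it is injective, i.e. no two latents share an output. The sketch below checks observed (latent, output) pairs for collisions; `find_collisions` is a hypothetical helper, and the pairs are copied from the printed train splits.

```python
from collections import defaultdict


def find_collisions(pairs):
    """Group latents by the output they decode to; return outputs reached by more than one latent.

    For a deterministic decoder, any such collision proves f is not invertible.
    """
    latents_by_output = defaultdict(set)
    for latent, output in pairs:
        latents_by_output[tuple(output)].add(tuple(latent))
    return {out: sorted(lats) for out, lats in latents_by_output.items() if len(lats) > 1}


# (latent, output) pairs from the DNI train split printed above
dni_pairs = [([0], [1.0]), ([1], [1.0]), ([1], [1.0]), ([1], [1.0])]
print(find_collisions(dni_pairs))  # {(1.0,): [(0,), (1,)]}

# The DI train samples from Section 2.1, by contrast, show no collisions
di_pairs = [([1], [1.0]), ([1], [1.0]), ([0], [0.0]), ([0], [0.0])]
print(find_collisions(di_pairs))  # {}
```

Finding no collision in a handful of samples does not prove invertibility; it only fails to refute it.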

### 2.3 $f$ is stochastic, invertible

In [5]:
import sys
sys.path.append('../')

import hydra
from omegaconf import OmegaConf
configs_path = "../configs"
config_name = "config.yaml"



"""
Z is one discrete variable with 2 possible values.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=1",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, invertible, and outputs a single real number\nZ is one discrete variable with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')



"""
Z is a sequence of discrete variables each with > 2 possible values. Again for this case to remain invertible, we need to increase the number of possible outputs, either by
increasing the possible output values but keeping its sequence length to 1, or increasing the sequence length but keeping the possible values to 2.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    

    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, invertible, and outputs a sequence\nZ is one discrete variable with > 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


"""
Z is a sequence of discrete variables, each with 2 possible values. For f to remain invertible when Z can take more than 2 values, we need to
increase the number of possible outputs, either by increasing the number of possible values while keeping the sequence length at 1 (the
configuration above), or by increasing the sequence length while keeping the number of possible values at 2 (the configuration below).
"""

    
with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, invertible, and outputs a sequence\nZ is a sequence of discrete variables, each with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


Scenario: 
f is stochastic, invertible, and outputs a single real number
Z is one discrete variable with 2 possible values.


4 discrete latent samples from train split:
 [[1], [1], [0], [1]]
4 corresponding samples from train split:
 [['0.85'], ['0.95'], ['-0.52'], ['0.61']]

4 discrete latent samples from valid split:
 [[1], [1], [1], [1]]
4 corresponding samples from valid split:
 [['0.50'], ['0.77'], ['0.99'], ['0.93']]

4 discrete latent samples from test split:
 [[1], [1], [1], [1]]
4 corresponding samples from test split: 
 [['0.68'], ['0.92'], ['0.73'], ['0.59']]
-------------------------------------------------------------------------------
Scenario: 
f is stochastic, invertible, and outputs a sequence
Z is one discrete variable with > 2 possible values.


4 discrete latent samples from train split:
 [[3], [2], [0], [0]]
4 corresponding samples from train split:
 [['1.74', '1.88', '-0.68'], ['1.72', '0.74', '-0.94'], ['-0.98', '-0.87', '1.56'], ['-0.55', '-0.73', '1.97']]


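The docstring in the cell above argues that an invertible f must have enough distinguishable outputs for every value of Z. The sketch below illustrates that idea with a hypothetical toy decoder; `latent_configurations`, `config_index`, `toy_invertible_decoder` and `toy_inverse` are illustrative names, not Symbol-lab APIs, and this is not the decoder Symbol-lab actually uses. Under these assumptions, Z has `num_possible_values ** sequence_length` joint configurations, and a stochastic map stays invertible as long as the outputs of each configuration fall in a region that no other configuration can reach. Both configurations used above (sequence length 1 with 4 values, and sequence length 2 with 2 values) give 4 configurations.

```python
import itertools
import random


def latent_configurations(sequence_length, num_possible_values):
    """Enumerate every joint value that a discrete Z = (z_1, ..., z_L) can take."""
    return list(itertools.product(range(num_possible_values), repeat=sequence_length))


def config_index(z, num_possible_values):
    """Read the tuple z as a base-`num_possible_values` integer (0, 1, 2, ...)."""
    index = 0
    for value in z:
        index = index * num_possible_values + value
    return index


def toy_invertible_decoder(z, num_possible_values, rng, width=0.5):
    """Hypothetical stochastic decoder: configuration k -> uniform sample in [k, k + width].

    Since width < 1, the intervals of different configurations never overlap,
    so the output is random but still identifies z uniquely.
    """
    k = config_index(z, num_possible_values)
    return k + rng.uniform(0.0, width)


def toy_inverse(x, sequence_length, num_possible_values):
    """Recover z from an output of toy_invertible_decoder."""
    k = int(x)  # x >= 0, so int() floors it back to the configuration index
    digits = []
    for _ in range(sequence_length):
        k, digit = divmod(k, num_possible_values)
        digits.append(digit)
    return tuple(reversed(digits))


rng = random.Random(0)
# The two configurations used in the cell above: (sequence_length, num_possible_values)
for seq_len, n_values in [(1, 4), (2, 2)]:
    configs = latent_configurations(seq_len, n_values)
    print(f"sequence_length={seq_len}, num_possible_values={n_values}: {len(configs)} configurations")
    for z in configs:
        x = toy_invertible_decoder(z, n_values, rng)
        assert toy_inverse(x, seq_len, n_values) == z  # the round trip succeeds for every z
```

The design choice here is to give each configuration a disjoint output interval: the noise changes the exact value, but never enough to cross into another configuration's interval.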
### 2.2 $f$ is stochastic, non-invertible
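One way a stochastic f can be non-invertible is for different values of Z to produce overlapping output ranges, so that a single output cannot tell you which latent generated it; this is why the configurations in this section do not need to keep the number of output possibilities large. The sketch below illustrates that idea with a hypothetical toy decoder; `toy_non_invertible_decoder` and `possible_latents` are illustrative names, not part of Symbol-lab, and the interval scheme is an assumption made only for this example.

```python
import random


def toy_non_invertible_decoder(z, rng, shift=0.25, width=1.0):
    """Hypothetical stochastic decoder: z -> uniform sample in [shift * z, shift * z + width].

    Because width > shift, the intervals of neighbouring latent values overlap,
    so some outputs could have been produced by more than one z.
    """
    low = shift * z
    return rng.uniform(low, low + width)


def possible_latents(x, num_possible_values, shift=0.25, width=1.0):
    """Return every latent value whose output interval contains x."""
    return [z for z in range(num_possible_values)
            if shift * z <= x <= shift * z + width]


rng = random.Random(0)
for z in range(4):
    x = toy_non_invertible_decoder(z, rng)
    # The true z is always among the candidates, but it is often not the only one.
    print(z, round(x, 2), possible_latents(x, 4))

print(possible_latents(0.8, 4))  # [0, 1, 2, 3]: this output cannot identify z
```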

In [1]:
import sys
sys.path.append('../')

import hydra
from omegaconf import OmegaConf
configs_path = "../configs"
config_name = "config.yaml"



"""
Z is one discrete variable with 2 possible values.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=1",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, non-invertible, and outputs a single real number\nZ is one discrete variable with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')



"""
Z is one discrete variable with > 2 possible values. Here we don't need to keep the number of output possibilities large because
f is not invertible.
"""

with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=1",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=4",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    

    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, non-invertible, and outputs a sequence\nZ is one discrete variable with > 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


"""
Z is a sequence of discrete variables, each with 2 possible values. Again, we don't need to keep the number of output possibilities large
because f is not invertible.
"""

    
with hydra.initialize(config_path=configs_path):
    config = hydra.compose(config_name=config_name,
                           overrides=[f"data_dir=../",
                                      f"work_dir=../",
                                      f"datamodule=controlled_decoder_SNI",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.train.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.train.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.val.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.val.dataset.decoder_parameters.output_length=3",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.sequence_length=2",
                                      f"datamodule.dataset_parameters.test.dataset.discrete_bottleneck_parameters.num_possible_values=2",
                                      f"datamodule.dataset_parameters.test.dataset.decoder_parameters.output_length=3",
                                      ]
                          )
#     print(config.datamodule)
    datamodule = hydra.utils.instantiate(config.datamodule, _recursive_=False)
    datamodule.setup()
    
    
print('-------------------------------------------------------------------------------')
n = 4
# [["%.2f"%x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]
print('Scenario: \nf is stochastic, non-invertible, and outputs a sequence\nZ is a sequence of discrete variables, each with 2 possible values.\n')
print(f'\n{n} discrete latent samples from train split:\n {[[x for x in datamodule.data_train.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from train split:\n {[["%.2f"%x for x in datamodule.data_train.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from valid split:\n {[[x for x in datamodule.data_val.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from valid split:\n {[["%.2f"%x for x in datamodule.data_val.data[i][1]] for i in range(n)]}')
print(f'\n{n} discrete latent samples from test split:\n {[[x for x in datamodule.data_test.sampled_latents[i][1]] for i in range(n)]}')
print(f'{n} corresponding samples from test split: \n {[["%.2f"%x for x in datamodule.data_test.data[i][1]] for i in range(n)]}')


Scenario: 
f is stochastic, non-invertible, and outputs a single real number
Z is one discrete variable with 2 possible values.


4 discrete latent samples from train split:
 [[1], [0], [1], [1]]
4 corresponding samples from train split:
 [['-0.65'], ['-0.58'], ['-0.70'], ['-0.97']]

4 discrete latent samples from valid split:
 [[0], [0], [0], [1]]
4 corresponding samples from valid split:
 [['-0.66'], ['-0.77'], ['-0.52'], ['-0.59']]

4 discrete latent samples from test split:
 [[0], [0], [1], [1]]
4 corresponding samples from test split: 
 [['-0.96'], ['-0.91'], ['-0.82'], ['-0.56']]
-------------------------------------------------------------------------------
Scenario: 
f is stochastic, non-invertible, and outputs a sequence
Z is one discrete variable with > 2 possible values.


4 discrete latent samples from train split:
 [[1], [1], [0], [3]]
4 corresponding samples from train split:
 [['0.70', '0.89', '1.78'], ['0.80', '0.60', '1.68'], ['0.56', '1.54', '-0.73'], ['0.65', '1.52',