# Initializing a Dataset with Tabular Data

1. Initializing a Dataset with Tabular Data:
- Generate random tabular data for multiple scalars.
- Initialize a dataset with the tabular data.

2. Accessing and Manipulating Data in the Dataset:
- Retrieve and print the dataset and specific samples.
- Access and display the value of a particular scalar within a sample.
- Retrieve tabular data from the dataset based on scalar names.

This example demonstrates how to initialize a dataset with tabular data, access specific samples, retrieve scalar values, and extract tabular data based on scalar names.

In [2]:
# Import required libraries
import numpy as np

In [3]:
# Import the dataset initialization helper from plaid
from plaid.utils.init_with_tabular import initialize_dataset_with_tabular_data

In [4]:
# Utility to pretty-print a dictionary, one key per line
def dprint(name: str, dictio: dict):
    print(name, '{')
    for key, value in dictio.items():
        print("    ", key, ':', value)

    print('}')

## Section 1: Initializing a Dataset with Tabular Data

In [5]:
# Generate random tabular data for multiple scalars
nb_scalars = 7
nb_samples = 10
names = [f"scalar_{j}" for j in range(nb_scalars)]

tabular_data = {}
for name in names:
    tabular_data[name] = np.random.randn(nb_samples)

dprint("tabular_data", tabular_data)

tabular_data {
     scalar_0 : [-0.19303611 -0.18965825  0.12534278  0.16366327 -1.06532803  0.16960836
 -0.50639747  0.35251503  1.61444411  0.20107186]
     scalar_1 : [-0.38257908 -0.82167722  1.23050277  1.17466345  1.22704241 -0.17093516
  0.30285162 -0.8562849  -1.27164055  0.34865076]
     scalar_2 : [ 2.26466948  0.77352161  1.82261031  0.08872893  0.39298522 -0.88340464
 -0.29684834  0.48175612 -1.86906676 -0.87729029]
     scalar_3 : [ 0.21884728 -0.7854321  -1.41677387 -0.89415003 -0.59955508 -0.65567448
 -0.98137585 -1.15201304 -1.28867388 -0.33766666]
     scalar_4 : [ 0.76753223  0.14741383  1.08377073  0.15641287 -0.69648491  0.0851449
 -0.64294282  2.56287175  0.52314472 -1.41328651]
     scalar_5 : [ 1.13479088 -0.65772577  0.71878731 -0.33928161  0.45507802 -0.16504924
 -1.05053809 -0.23645522 -2.18759612  1.12057703]
     scalar_6 : [-0.94867932 -0.61500724 -1.61546653  2.35936912 -0.20271597 -1.67890531
  0.45858461  1.73382506  0.71469664 -0.84691252]
}
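
Because `np.random.randn` draws from NumPy's global random state, the values printed above change on every run. If reproducible inputs are wanted, one option is a seeded generator. The sketch below is illustrative only: `make_tabular_data` and its `seed` parameter are hypothetical names, not part of plaid.

```python
import numpy as np

# Hypothetical helper (not part of plaid): builds the same
# "scalar name -> 1D array" structure as above, from a seeded generator.
def make_tabular_data(nb_scalars: int, nb_samples: int, seed: int = 0) -> dict:
    rng = np.random.default_rng(seed)
    return {
        f"scalar_{j}": rng.standard_normal(nb_samples)
        for j in range(nb_scalars)
    }

tabular_data_a = make_tabular_data(7, 10, seed=42)
tabular_data_b = make_tabular_data(7, 10, seed=42)

# Two calls with the same seed produce identical arrays
same = all(
    np.array_equal(tabular_data_a[key], tabular_data_b[key])
    for key in tabular_data_a
)
print("Identical across calls:", same)
```

The resulting dictionary has the same layout as `tabular_data`, so it could be passed to the initialization step in the same way.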


In [6]:
# Initialize a dataset with the tabular data
dataset = initialize_dataset_with_tabular_data(tabular_data)
print("Initialized Dataset: ", dataset)

Initialized Dataset:  Dataset(10 samples, 7 scalars, 0 fields)
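
The reported counts (10 samples, 7 scalars) correspond to the array length and the number of keys in `tabular_data`. The source does not say how the initializer handles columns of unequal length, so a small pre-check can be useful before calling it. The helper below is a minimal sketch; `check_tabular_shape` is a hypothetical name, not a plaid function.

```python
import numpy as np

# Hypothetical pre-check (not part of plaid): confirms every column has the
# same number of entries and reports the implied dataset dimensions.
def check_tabular_shape(tabular_data: dict) -> tuple:
    """Return (nb_samples, nb_scalars); raise ValueError if lengths differ."""
    lengths = {name: len(values) for name, values in tabular_data.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns have inconsistent lengths: {lengths}")
    nb_samples = next(iter(lengths.values()), 0)
    return nb_samples, len(lengths)

# Illustrative input with the same layout as tabular_data
example = {f"scalar_{j}": np.zeros(10) for j in range(7)}
print(check_tabular_shape(example))  # (10, 7)
```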


## Section 2: Accessing and Manipulating Data in the Dataset

In [7]:
# Retrieve and print a specific sample (index 1, i.e. the second sample)
sample_1 = dataset[1]
print(f"{sample_1 = }")


sample_1 = Sample(7 scalars, 0 timestamps, 0 fields, no tree)


In [8]:
# Access and display the value of a particular scalar within a sample
scalar_value = sample_1.get_scalar("scalar_0")
print("Scalar 'scalar_0' in Sample 1:", scalar_value)

Scalar 'scalar_0' in Sample 1: -0.1896582515164902


In [9]:
# Retrieve tabular data from the dataset based on scalar names
scalar_names = ["scalar_1", "scalar_3", "scalar_5"]
tabular_data_subset = dataset.get_scalars_to_tabular(scalar_names)
print("Tabular Data Subset for Scalars 1, 3, and 5:")
dprint("tabular_data_subset", tabular_data_subset)

Tabular Data Subset for Scalars 1, 3, and 5:
tabular_data_subset {
     scalar_1 : [-0.38257908 -0.82167722  1.23050277  1.17466345  1.22704241 -0.17093516
  0.30285162 -0.8562849  -1.27164055  0.34865076]
     scalar_3 : [ 0.21884728 -0.7854321  -1.41677387 -0.89415003 -0.59955508 -0.65567448
 -0.98137585 -1.15201304 -1.28867388 -0.33766666]
     scalar_5 : [ 1.13479088 -0.65772577  0.71878731 -0.33928161  0.45507802 -0.16504924
 -1.05053809 -0.23645522 -2.18759612  1.12057703]
}
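
As printed above, the subset maps each scalar name to a 1D array with one entry per sample. For downstream numerical work it is often convenient to stack those columns into a single 2D array of shape `(nb_samples, nb_scalars)`. The sketch below uses a small stand-in dictionary with made-up values rather than the actual `tabular_data_subset`, and assumes only that each value is a 1D array of equal length.

```python
import numpy as np

# Stand-in for the subset above (made-up values, 4 samples, 3 scalars)
subset = {
    "scalar_1": np.array([0.1, 0.2, 0.3, 0.4]),
    "scalar_3": np.array([1.0, 2.0, 3.0, 4.0]),
    "scalar_5": np.array([-1.0, -2.0, -3.0, -4.0]),
}

# Fix the column order explicitly so that columns can be mapped back to names
column_names = list(subset)
matrix = np.column_stack([subset[name] for name in column_names])

print(column_names)
print(matrix.shape)  # (4, 3): one row per sample, one column per scalar
print(matrix[1])     # values of all selected scalars for sample 1
```

Keeping `column_names` alongside `matrix` preserves the link between each column and its scalar name once the dictionary keys are gone.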
