# Dataset Creation

Package for creating synthetic PIV images:

In [None]:
import synpivimage as spi 
spi.__version__

In [None]:
import h5rdmtoolbox as h5tbx
h5tbx.__version__

For the notebook we retrieve the dataset directory from the dot-env file:

In [None]:
from dotenv import load_dotenv
import os
import numpy as np

load_dotenv('generation.env')
dataset_dir = os.getenv('dataset_dir')
dataset_dir
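
If `generation.env` is missing or does not define `dataset_dir`, `os.getenv` returns `None`, and the problem only shows up later when files are written. Below is a minimal sketch of an early check using only the standard library. The helper name `resolve_dataset_dir`, the demo variable `demo_dataset_dir`, and the choice to create the directory are illustrative assumptions, not part of synpivimage or python-dotenv:

```python
import os
import pathlib
import tempfile


def resolve_dataset_dir(var_name="dataset_dir"):
    """Return the directory named by an environment variable as a Path.

    Hypothetical helper: fails early if the variable is unset or empty.
    """
    value = os.getenv(var_name)
    if not value:
        raise RuntimeError(f"environment variable {var_name!r} is not set")
    path = pathlib.Path(value).expanduser()
    path.mkdir(parents=True, exist_ok=True)  # create the directory if needed
    return path


# Demonstration with a separate variable so the real setting is not touched
_demo_root = tempfile.mkdtemp()
os.environ["demo_dataset_dir"] = os.path.join(_demo_root, "piv_dataset")
demo_dir = resolve_dataset_dir("demo_dataset_dir")
```

In the notebook itself, the call would be `resolve_dataset_dir()` after `load_dotenv('generation.env')`.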

## 1. Define an initial configuration
The initial configuration is the default one of the synpivimage package. From there, we adjust the parameters.<br>
Note that an odd image width/height is recommended when using the auto-correlation feature later on; the cell below sets `nx` and `ny` to 32, so choose odd values instead if you plan to use that feature.<br>
Also note that the default configuration **disables noise generation**, since noise can always be added afterwards.

In [None]:
cfg = spi.DEFAULT_CFG

cfg['bit_depth'] = 8
cfg['nx'] = 32
cfg['ny'] = 32
cfg['sensor_gain'] = 0.6
cfg['particle_size_std'] = 1
image_size = cfg['nx']*cfg['ny']
cfg
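
One caveat, assuming `spi.DEFAULT_CFG` is a plain module-level dictionary (the notebook does not confirm this): `cfg = spi.DEFAULT_CFG` binds a second name to the same object, so the assignments above would also change the package default for the rest of the session. The sketch below shows the effect and a copy-based alternative, using a stand-in dictionary with made-up values rather than the real default:

```python
import copy

# Stand-in for a module-level default; keys and values are illustrative only
DEFAULT_CFG = {"bit_depth": 16, "nx": 16, "ny": 16}

alias = DEFAULT_CFG          # same object, no copy is made
alias["bit_depth"] = 8       # this also changes DEFAULT_CFG

independent = copy.deepcopy(DEFAULT_CFG)
independent["nx"] = 31       # DEFAULT_CFG is left untouched

# Largest grey value representable at the chosen bit depth
max_count = 2 ** independent["bit_depth"] - 1
```

If the default should stay pristine, `cfg = copy.deepcopy(spi.DEFAULT_CFG)` is the corresponding change in the cell above.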

## 2. Defining parameters to be varied

Create tuples consisting of the configuration name and the parameter values. The values can be a single number, a list or a numpy array (passing a single number means the parameter is not varied).<br>
Finally, we specify how many images are generated per combination with `n_per_combination`.

In [None]:
# particle_number_range = ('particle_number', np.arange(1, image_size*0.1, 10).astype(int))
particle_number_range = ('particle_number', int(0.1*30*30))
particle_mean_size_range = ('particle_size_mean', (2, 3))
lasershape_range = ('laser_shape_factor', (1,10))

n_per_combination = 101

## 3. Generate a list of all config combinations:
Before we can write the data to disk, we create a `ConfigManager`. This class is initialized with a list of configuration dictionaries, one per image to be generated. Thus we need `n_per_combination` (here 101) dictionaries for every parameter combination. Using `build_ConfigManager()` we obtain the `ConfigManager` instance by passing the initial config dictionary and the variation tuples from above, so we don't need to build the many config dictionaries ourselves:

In [None]:
CFGs = spi.build_ConfigManager(cfg, [particle_number_range,
                                     particle_mean_size_range,
                                     lasershape_range],
                              per_combination = n_per_combination,
                              shuffle=True)
CFGs
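
To estimate how many configurations to expect, the sketch below expands variation tuples into a full Cartesian product and repeats each combination `per_combination` times. It assumes that `build_ConfigManager` combines the parameters in this way. `expand_configs` is a hypothetical stand-in written for illustration, not the package's implementation:

```python
import itertools
import random


def expand_configs(base_cfg, variations, per_combination=1, shuffle=False, seed=None):
    """Hypothetical expansion of (name, values) tuples into config dicts."""
    names = [name for name, _ in variations]
    # A scalar counts as a single-value variation; iterables are used as given
    value_lists = [
        list(values) if hasattr(values, "__iter__") else [values]
        for _, values in variations
    ]
    configs = []
    for combo in itertools.product(*value_lists):
        for _ in range(per_combination):
            cfg = dict(base_cfg)           # shallow copy of the base config
            cfg.update(zip(names, combo))  # overwrite the varied parameters
            configs.append(cfg)
    if shuffle:
        random.Random(seed).shuffle(configs)
    return configs


base = {"nx": 32, "ny": 32}
variations = [
    ("particle_number", int(0.1 * 30 * 30)),
    ("particle_size_mean", (2, 3)),
    ("laser_shape_factor", (1, 10)),
]
configs = expand_configs(base, variations, per_combination=101, shuffle=True, seed=0)
n_configs = len(configs)  # 1 * 2 * 2 combinations, 101 entries each
```

Under this assumption, the three tuples from section 2 give four combinations and therefore 404 configuration dictionaries in total.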

## 4. Write all data to HDF5 files:
With a large number of parameters, the image generation may produce arrays that exceed the available RAM. Therefore, define the maximal number of images to be created before they are written to file. This parameter is called `n_split`. If `None` is used, all data is written into a single file. If you pass an integer, then that number of files will be created. Filenames look like `<dataset_dir>/ds_000001.hdf` and so on.

In [None]:
%%time
hdf_filename = CFGs.to_hdf(dataset_dir, nproc=1, n_split=1000, overwrite=True)
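
The relation between the number of images and the number of files can be sketched as follows, under the assumption that `n_split` is the maximum number of images per file. The description above can also be read differently, so check the package documentation before relying on this. The helper `plan_files` is hypothetical, and the numbering from 1 is an assumption modelled on the `ds_000001.hdf` pattern:

```python
import math
import os


def plan_files(n_images, n_split, dataset_dir="."):
    """Return (filename, first_index, stop_index) for each planned file."""
    if n_images <= 0:
        return []
    if n_split is None:
        n_split = n_images  # everything goes into a single file
    n_files = math.ceil(n_images / n_split)
    plan = []
    for i in range(n_files):
        start = i * n_split
        stop = min(start + n_split, n_images)  # last file may be shorter
        name = os.path.join(dataset_dir, f"ds_{i + 1:06d}.hdf")
        plan.append((name, start, stop))
    return plan


plan_small = plan_files(404, 1000)   # fits into a single file
plan_split = plan_files(2500, 1000)  # 1000 + 1000 + 500 images
```

Since `to_hdf` returns something that is indexed with `[0]` in the cells below, counting its entries shows how many files were actually written.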

Let's check if all datasets have a correct `standard_name` based on the Standard Name Table on GitLab: https://git.scc.kit.edu/standard_name_tables/snt_particleimagevelocimetry

In [None]:
import pint

In [None]:
ureg = pint.UnitRegistry()

In [None]:
ureg.pixel

In [None]:
snt = h5tbx.conventions.cflike.standard_name.StandardNameTable.from_gitlab(url='https://git.scc.kit.edu',
                                                                           file_path='particle_image_velocimetry-v1.yaml',
                                                                           project_id='35942',
                                                                           ref_name='main')

In [None]:
snt.check_file(hdf_filename[0], raise_error=True)

In [None]:
with h5tbx.H5File(hdf_filename[0], 'r') as h5:
    h5.dump()
    h5['images'][0, :, :].plot(cmap='gray')

In [None]:
with h5tbx.H5File(hdf_filename[0], 'r') as h5:
    h5['images'][0, :, :].plot.hist(bins=40, xlim=[0, 2**8], xscale='linear', yscale='linear')