# Making TFRecords for the IMNN

One way we can feed data to the IMNN is via the TFRecords format, which stores a sequence of binary records.

The IMNN needs two different forms of records, one for the fiducial simulations and one for the derivative simulations.

Here we show how to serialise and save a collection of records, each about 150 MB in size (which is about optimal for reading).

First we need to define a function which grabs a single simulation at a given seed. If the simulation is for the numerical derivative, the function also needs a derivative index (0 or 1 for the negative or positive part of the numerical derivative) as well as an index for the parameter with respect to which the derivative is being taken.

Let's say we had very large numpy files for the fiducial (10000 simulations) and derivative (1000 simulations) simulations of some 3D data (256 voxels per side), with 5 parameters in the simulator model, that wouldn't fit in GPU memory:
```python
n_fid_sims = 10000
n_der_sims = 1000
input_shape = (256, 256, 256, 1)
n_params = 5

fiducial_simulations.shape
  # (10000, 256, 256, 256, 1)
fiducial_validation_simulations.shape
  # (10000, 256, 256, 256, 1)
derivative_simulations.shape
  # (1000, 2, 5, 256, 256, 256, 1)
derivative_validation_simulations.shape
  # (1000, 2, 5, 256, 256, 256, 1)
```
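
To get a feel for the sizes involved, the following is a rough estimate of how many such simulations fit in one 150 MB record. It assumes the data is stored as 32-bit floats and that 1 MB is 10^6 bytes; both are assumptions made for illustration, and the way `TFRecords` itself counts the record size is not shown here.

```python
import math

input_shape = (256, 256, 256, 1)
bytes_per_value = 4      # assumption: float32 data
record_size_mb = 150     # target record size, taken as 10**6 bytes per MB

# Size of one simulation in bytes
bytes_per_sim = math.prod(input_shape) * bytes_per_value

# Whole simulations that fit in a single record of the target size
sims_per_record = (record_size_mb * 10**6) // bytes_per_sim

print(bytes_per_sim, sims_per_record)
```

Under these assumptions each simulation takes 64 MiB, so only a couple of simulations fit in each record and the full data set is spread over many files.
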
We could write two functions that each fetch a single simulation and return it as a numpy array, ready to be serialised as a TFRecord. These functions have to return numpy arrays even if your simulations are stored on file in some other form.
```python
def get_fiducial(seed, data):
    return data[seed]

def get_derivative(seed, derivative, parameter, data):
    return data[seed, derivative, parameter]

get_fiducial(0, fiducial_simulations).shape
  # (256, 256, 256, 1)
get_fiducial(0, fiducial_validation_simulations).shape
  # (256, 256, 256, 1)
get_derivative(0, 0, 0, derivative_simulations).shape
  # (256, 256, 256, 1)
get_derivative(0, 0, 0, derivative_validation_simulations).shape
  # (256, 256, 256, 1)
```
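
If the simulations are too large to hold in memory at once, one option is to memory-map the file and copy out one simulation at a time. The following is a minimal sketch of that idea, assuming the simulations are saved as a single `.npy` file; the helper names `open_simulations` and `get_fiducial_from_memmap` are hypothetical and not part of the IMNN.

```python
import numpy as np

def open_simulations(path):
    # mmap_mode="r" maps the file read-only instead of loading it all into memory
    return np.load(path, mmap_mode="r")

def get_fiducial_from_memmap(seed, data):
    # np.array copies the selected simulation into an ordinary in-memory numpy array
    return np.array(data[seed])
```

A function like this could be passed on in the same way as `get_fiducial`, since it also takes a seed and returns a numpy array.
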

We are going to serialise the input data from a numpy array to a list of bytes, and the indices as integers. This can all be done using `TFRecords`, which is part of `imnn_tf.utils`.

```python
from imnn_tf.utils import TFRecords
```
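
To illustrate what turning a numpy array into bytes involves, here is a small round trip using only numpy. It is not how `TFRecords` is implemented internally; the dictionary layout and the tiny array shape are assumptions chosen for illustration.

```python
import numpy as np

# A tiny stand-in for one simulation (hypothetical shape, float32)
simulation = np.arange(8, dtype=np.float32).reshape((2, 2, 2, 1))
seed = 0

# The array becomes a flat string of bytes; the index stays a plain integer
record = {"data": simulation.tobytes(), "seed": int(seed)}

# The bytes carry no shape or dtype, so both must be known when decoding
decoded = np.frombuffer(record["data"], dtype=np.float32).reshape((2, 2, 2, 1))
```

The point to note is that the shape and data type are lost in the byte string, so they need to be known again when the records are read back.
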

To write all the simulations to records in `data/tfrecords` with the default names `fiducial` and `derivative` (`validation_fiducial` and `validation_derivative` for the validation data) we can run

```python
writer = TFRecords(record_size=150)

writer.write_record(
    n_sims=n_fid_sims, 
    get_simulation=lambda x : get_fiducial(x, fiducial_simulations),
    fiducial=True, 
    directory="data/tfrecords")

writer.write_record(
    n_sims=n_fid_sims, 
    get_simulation=lambda x : get_fiducial(x, fiducial_validation_simulations),
    fiducial=True, 
    validation=True,
    directory="data/tfrecords")

writer.write_record(
    n_sims=n_der_sims,  # the derivative arrays hold n_der_sims simulations
    get_simulation=lambda x, y, z : get_derivative(x, y, z, derivative_simulations),
    fiducial=False,
    n_params=n_params,
    directory="data/tfrecords")

writer.write_record(
    n_sims=n_der_sims,  # the derivative arrays hold n_der_sims simulations
    get_simulation=lambda x, y, z : get_derivative(x, y, z, derivative_validation_simulations),
    fiducial=False,
    n_params=n_params,
    validation=True,
    directory="data/tfrecords")
```