# Notebook 5: Composing Examples

In this notebook, we examine how to combine the elements we have encountered so far to construct examples, which allow us to train machine learning models on data generated in real time. This is a core piece of GravyFlow's functionality.

As usual, we begin by performing the relevant imports:

In [1]:
import os
os.environ['KERAS_BACKEND'] = 'jax'

# Built-in imports
from typing import List, Dict, Iterator
from pathlib import Path

# Dependency imports: 
import numpy as np
import keras
from keras import ops
import jax
import jax.numpy as jnp
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

# Import the GravyFlow module.
import gravyflow as gf


## Creating an example generator through composition

We can combine all the elements we have seen so far (noise generation, waveform generation, and waveform projection) and use GravyFlow's `gf.data` function to create a custom Python generator.

In order to scale the injection with respect to the noise we can use a `gf.ScalingMethod` object. GravyFlow supports scaling with SNR (`gf.ScalingTypes.SNR`), HRSS (`gf.ScalingTypes.HRSS`), and HPEAK (`gf.ScalingTypes.HPEAK`).

- `value` : `Union[gf.Distribution, np.ndarray]`, Required
  > The value, or distribution of values, used to scale the injections. The units depend on the `type_` parameter.

- `type_` : `gf.ScalingTypes`, Required
  > Type of scaling, one of either `gf.ScalingTypes.SNR`, `gf.ScalingTypes.HRSS`, or `gf.ScalingTypes.HPEAK`.

Let us create a `gf.ScalingMethod` object:

In [2]:
# Create a scaling method object in order to scale the injection to the noise:
scaling_method : gf.ScalingMethod = gf.ScalingMethod(
    value=gf.Distribution(
        value=20,
        type_=gf.DistributionType.CONSTANT
    ),
    type_=gf.ScalingTypes.SNR
)
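
To make the three scaling types concrete, the sketch below rescales a toy waveform with plain NumPy. It is not GravyFlow's implementation: the function names are hypothetical, the SNR version assumes white Gaussian noise with a known per-sample standard deviation (for which the optimal SNR reduces to the waveform's L2 norm divided by that standard deviation), and the HRSS version treats a single polarisation only.

```python
import numpy as np

def scale_to_snr(waveform: np.ndarray, target_snr: float, noise_std: float = 1.0) -> np.ndarray:
    """Rescale so the optimal SNR in white Gaussian noise equals target_snr (simplified)."""
    current_snr = np.sqrt(np.sum(waveform ** 2)) / noise_std
    return waveform * (target_snr / current_snr)

def scale_to_hrss(waveform: np.ndarray, target_hrss: float, sample_rate_hertz: float) -> np.ndarray:
    """Rescale so the root-sum-squared strain, sqrt(sum(h^2) * dt), equals target_hrss."""
    current_hrss = np.sqrt(np.sum(waveform ** 2) / sample_rate_hertz)
    return waveform * (target_hrss / current_hrss)

def scale_to_hpeak(waveform: np.ndarray, target_hpeak: float) -> np.ndarray:
    """Rescale so the largest absolute strain value equals target_hpeak."""
    return waveform * (target_hpeak / np.max(np.abs(waveform)))

# Toy sine-Gaussian standing in for a real injection (illustrative values only).
sample_rate_hertz = 2048.0
time_seconds = np.arange(2048) / sample_rate_hertz
toy_waveform = np.sin(2.0 * np.pi * 100.0 * time_seconds) * np.exp(
    -((time_seconds - 0.5) ** 2) / 0.005
)

scaled_snr = scale_to_snr(toy_waveform, target_snr=20.0)
scaled_hrss = scale_to_hrss(toy_waveform, target_hrss=1.0e-21, sample_rate_hertz=sample_rate_hertz)
scaled_hpeak = scale_to_hpeak(toy_waveform, target_hpeak=1.0e-21)
```

In all three cases the waveform shape is unchanged; only its overall amplitude is adjusted so that the chosen measure hits the requested value.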

Next, we can set up all the other elements that we will use to compose our example generator:

In [None]:
# This object will be used to obtain real interferometer data based on specified parameters.
ifo_data_obtainer : gf.IFODataObtainer = gf.IFODataObtainer(
    observing_runs=gf.ObservingRun.O3, # Specify the observing run (e.g., O3).
    data_quality=gf.DataQuality.BEST,  # Choose the quality of the data (e.g., BEST).
    data_labels=[                      # Define the types of data to include.
        gf.DataLabel.NOISE, 
        gf.DataLabel.GLITCHES
    ],
    segment_order=gf.SegmentOrder.RANDOM, # Order of segment retrieval (e.g., RANDOM).
    force_acquisition=True,               # Force the acquisition of new data.
    cache_segments=False                  # Choose not to cache the segments.
)

# Initialize the noise generator wrapper:
# The wrapper holds the ifo_data_obtainer, which supplies real interferometer data when real noise is requested.
noise: gf.NoiseObtainer = gf.NoiseObtainer(
    ifo_data_obtainer=ifo_data_obtainer, # Use the previously set up IFODataObtainer object.
    noise_type=gf.NoiseType.WHITE,       # Specify the type of noise (WHITE in this example).
    ifos=gf.IFO.L1                       # Specify the interferometer (e.g., LIGO Livingston L1).
)

# Define a uniform distribution for the mass of the first object in solar masses.
mass_1_distribution_msun : gf.Distribution = gf.Distribution(
    min_=10.0, 
    max_=60.0, 
    type_=gf.DistributionType.UNIFORM
)

# Define a uniform distribution for the mass of the second object in solar masses.
mass_2_distribution_msun : gf.Distribution = gf.Distribution(
    min_=10.0, 
    max_=60.0, 
    type_=gf.DistributionType.UNIFORM
)

# Define a uniform distribution for the inclination of the binary system in radians.
inclination_distribution_radians : gf.Distribution = gf.Distribution(
    min_=0.0, 
    max_=np.pi, 
    type_=gf.DistributionType.UNIFORM
)
    
# Initialize a RippleGenerator with the defined distributions and specific approximant.
# Available approximants: "IMRPhenomD", "IMRPhenomXAS", "IMRPhenomPv2", "TaylorF2", "IMRPhenomD_NRTidalv2"
ripple_generator : gf.WaveformGenerator = gf.RippleGenerator(
    approximant="IMRPhenomPv2", # Specify the waveform approximant here
    mass_1_msun=mass_1_distribution_msun,
    mass_2_msun=mass_2_distribution_msun,
    inclination_radians=inclination_distribution_radians,
    scaling_method=scaling_method,
    injection_chance=0.5,
    # For IMRPhenomPv2, you might want to specify spins if not using defaults:
    # spin_1_in=(0.0, 0.0, 0.0), 
    # spin_2_in=(0.0, 0.0, 0.0)
)
    
generator : Iterator = gf.data(       
    noise_obtainer=noise,
    waveform_generators=ripple_generator, # Use the ripple generator
    num_examples_per_batch=8,
    input_variables=[
        gf.ReturnVariables.WHITENED_ONSOURCE, 
    ],
    output_variables=[
        gf.ReturnVariables.INJECTIONS, 
        gf.ReturnVariables.WHITENED_INJECTIONS,
        gf.ReturnVariables.INJECTION_MASKS
    ]
)

2025-11-28 22:47:05,477 - INFO - Available GPUs: ['0']
2025-11-28 22:47:05.753006: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
I0000 00:00:1764391625.753333 1748627 gpu_device.cc:2022] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3000 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:06:00.0, compute capability: 7.0
2025-11-28 22:47:05,785 - INFO - Visible GPUs after restriction: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:1', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:2', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:3', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:4', device_type='GPU'), PhysicalDevice(name='/physical_device:GPU:5', device_type='GPU'), PhysicalDevice(name='/physical_devi
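
As a rough mental model of what such a composed generator does, the sketch below builds a toy equivalent with plain NumPy. Everything in it is a simplifying assumption rather than GravyFlow's behaviour: the function name is hypothetical, the "waveform" is a sine-Gaussian instead of a Ripple approximant, the noise is unit-variance white Gaussian noise, and the mask array has shape `(batch,)` with no leading per-generator axis.

```python
from typing import Dict, Iterator, Tuple

import numpy as np

def toy_example_generator(
    num_examples_per_batch: int = 8,
    num_samples: int = 2048,
    injection_chance: float = 0.5,
    target_snr: float = 20.0,
    seed: int = 0,
) -> Iterator[Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]]:
    """Yield (inputs, outputs) dictionaries of noise with optional injections."""
    rng = np.random.default_rng(seed)
    time_seconds = np.linspace(0.0, 1.0, num_samples, endpoint=False)
    while True:
        # Unit-variance white noise with shape (batch, detectors, samples).
        noise = rng.standard_normal((num_examples_per_batch, 1, num_samples))

        # One toy sine-Gaussian per example, with a randomly drawn frequency.
        frequency = rng.uniform(50.0, 200.0, size=(num_examples_per_batch, 1, 1))
        waveform = np.sin(2.0 * np.pi * frequency * time_seconds) * np.exp(
            -((time_seconds - 0.5) ** 2) / 0.005
        )

        # Scale each waveform to the target SNR (white-noise approximation).
        norm = np.sqrt(np.sum(waveform ** 2, axis=-1, keepdims=True))
        waveform = waveform * (target_snr / norm)

        # Keep each injection with probability injection_chance.
        masks = rng.random(num_examples_per_batch) < injection_chance
        injections = waveform * masks[:, None, None]

        inputs = {"WHITENED_ONSOURCE": (noise + injections).astype(np.float16)}
        outputs = {
            "INJECTIONS": injections.astype(np.float32),
            "INJECTION_MASKS": masks.astype(np.float32),
        }
        yield inputs, outputs

toy_inputs, toy_outputs = next(toy_example_generator())
```

The key design point carried over from the real generator is that every batch is produced on demand, so a model trained on it never sees the same noise realisation twice.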

As with the individual elements, we can use this example generator as an iterator to produce a batch of examples for us to examine:

In [4]:
# Generate a batch of examples using the composed generator.
input_data, output_data = next(generator)

2025-11-28 22:47:09,427 - INFO - Using previously created distribution strategy.


We can print the output of the generator to examine its format. Which values are returned depends on the variables we requested in the `input_variables` and `output_variables` arguments of the `gf.data` function. Both are returned as dictionaries, which can easily be fed into a Keras model if the inputs of the model are named to match the variable names. We will show an example of this later in the notebook.

In [5]:
# This is the data we will use as the input examples to our model:
print(f"Dictionary to feed the model: \n {input_data}")

# This is the target data we will use as labels to train our model:
print(f"Dictionary to use as the model labels: \n {output_data}")

Dictionary to feed the model: 
 {'WHITENED_ONSOURCE': <jax.Array: shape=(8, 1, 2048), dtype=float16, numpy=
array([[[ 1.583  , -0.4795 , -0.369  , ..., -0.342  ,
         -0.684  ,  1.166  ]],

       [[ 1.047  ,  0.667  ,  2.09   , ..., -1.096  ,
          0.1973 , -0.3337 ]],

       [[-0.0856 , -0.6196 , -1.855  , ...,  0.2218 ,
         -1.561  , -1.305  ]],

       ...,

       [[ 0.761  , -0.1691 , -0.2466 , ..., -1.129  ,
          0.5244 ,  0.2922 ]],

       [[-1.113  ,  0.2135 ,  3.441  , ..., -0.311  ,
          1.305  , -0.7114 ]],

       [[-2.357  ,  1.281  , -1.204  , ..., -0.08777,
         -0.6196 ,  0.3792 ]]], dtype=float16)>}
Dictionary to use as the model labels: 
 {'INJECTIONS': <jax.Array: shape=(8, 1, 2048), dtype=float32, numpy=
array([[[ 0.45970765,  0.45380932,  0.4461255 , ...,
          0.        ,  0.        ,  0.        ]],

       [[ 0.        ,  0.        ,  0.        , ...,
          0.        ,  0.        ,  0.        ]],

       [[ 0.        ,  0.   
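
Printing whole arrays quickly becomes hard to read, so it can help to summarise each dictionary by key, shape, and dtype instead. The helper below is a hypothetical convenience function, not part of GravyFlow; it is demonstrated on a stand-in dictionary, but it should work equally on `input_data` and `output_data`, assuming their values can be converted with `np.asarray`.

```python
from typing import Dict, Tuple

import numpy as np

def summarise_batch(batch: Dict[str, np.ndarray]) -> Dict[str, Tuple[Tuple[int, ...], str]]:
    """Map each variable name to its (shape, dtype) without printing the values."""
    return {
        name: (tuple(np.shape(array)), str(np.asarray(array).dtype))
        for name, array in batch.items()
    }

# Stand-in for the generator output, using the shape and dtype printed above.
example_batch = {"WHITENED_ONSOURCE": np.zeros((8, 1, 2048), dtype=np.float16)}

for name, (shape, dtype) in summarise_batch(example_batch).items():
    print(f"{name}: shape={shape}, dtype={dtype}")
# prints "WHITENED_ONSOURCE: shape=(8, 1, 2048), dtype=float16"
```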

We can then print some examples from this dataset to examine the output:

In [6]:
# Extract the data from the generator output: 
onsource: jax.Array = input_data[gf.ReturnVariables.WHITENED_ONSOURCE.name]
injections: jax.Array = output_data[gf.ReturnVariables.INJECTIONS.name]
whitened_injections: jax.Array = output_data[gf.ReturnVariables.WHITENED_INJECTIONS.name]
injection_masks: jax.Array = output_data[gf.ReturnVariables.INJECTION_MASKS.name][0]

# Initialize an empty layout for the strain plots.
generated_data_layout: List = []

# Iterate over the examples in the batch and their corresponding injection masks.
for onsource_, injection, whitened_injection, masks_ in zip(
        onsource,
        injections,
        whitened_injections,
        injection_masks
    ):
    # Create a strain plot for each example, with a title stating whether it contains an injection.
    generated_data_layout.append([
        gf.generate_strain_plot(
            {
                "Onsource": onsource_,
                "Whitened Injection": whitened_injection,
                "Injection": injection,
            },
            height=400,
            width=800,
            title=f"Example Output. Has Injection: {bool(masks_)}"
        )
    ])

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(generated_data_layout)
output_notebook()
show(grid)
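
A quick numerical check can complement the plots. The sketch below tests whether every example flagged by the mask contains a non-zero injection and every unflagged example is all zeros. It assumes, based on the printed output above rather than on documented behaviour, that examples without an injection are returned as all-zero arrays; the function name and the toy arrays are hypothetical.

```python
import numpy as np

def masks_match_injections(injections: np.ndarray, masks: np.ndarray) -> bool:
    """Return True if non-zero injections occur exactly where the mask is set."""
    # An example "has signal" if any sample in any detector channel is non-zero.
    has_signal = np.any(np.asarray(injections) != 0.0, axis=(1, 2))
    return bool(np.array_equal(has_signal, np.asarray(masks).astype(bool)))

# Toy batch of four examples: the first and third contain a signal.
toy_injections = np.zeros((4, 1, 2048), dtype=np.float32)
toy_injections[0, 0, 1000:1100] = 1.0
toy_injections[2, 0, 1000:1100] = 1.0
toy_masks = np.array([1.0, 0.0, 1.0, 0.0])

print(masks_match_injections(toy_injections, toy_masks))
```

With the arrays extracted in the cell above, the equivalent call would be `masks_match_injections(injections, injection_masks)`.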