# Notebook 5: Composing Examples

In this notebook we examine how to combine the elements we have encountered so far into examples that can be used to train machine learning models on data generated in real time. This is a core function of GravyFlow.

As usual, we begin by performing the relevant imports:

In [7]:
# Built-in imports
from typing import List, Dict, Iterator
from pathlib import Path

# Import the GravyFlow module.
import gravyflow as gf

# Dependency imports: 
import jax
import numpy as np
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

## Creating an example generator through composition

We can combine all the elements we have seen so far (noise generation, waveform generation, and waveform projection) and use GravyFlow to create a custom Python generator.

To scale the injection with respect to the noise we can use a `gf.ScalingMethod` object. GravyFlow supports scaling by signal-to-noise ratio (`gf.ScalingTypes.SNR`), root-sum-squared strain (`gf.ScalingTypes.HRSS`), and peak strain (`gf.ScalingTypes.HPEAK`). A `gf.ScalingMethod` takes the following arguments:

- `value` : `Union[gf.Distribution, np.ndarray]`, Required
  > The value, or distribution of values, used to scale the injections. Units depend on the `type_` parameter.

- `type_` : `gf.ScalingTypes`, Required
  > Type of scaling, one of either `gf.ScalingTypes.SNR`, `gf.ScalingTypes.HRSS`, or `gf.ScalingTypes.HPEAK`.
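
To make the three scaling types concrete, here is a minimal sketch in plain Python. It is not GravyFlow's implementation: every function name below is hypothetical, and the SNR version assumes the waveform has already been whitened against unit-variance noise, so that the optimal SNR reduces to the root-sum-square of the samples.

```python
import math
from typing import List


def rescale(samples: List[float], current: float, target: float) -> List[float]:
    """Multiply every sample so an amplitude measure moves from current to target."""
    if current == 0.0:
        raise ValueError("Cannot rescale an all-zero waveform.")
    factor = target / current
    return [factor * s for s in samples]


def whitened_snr(samples: List[float]) -> float:
    # Assumption: samples are whitened against unit-variance noise, so the
    # optimal SNR is the root-sum-square of the samples.
    return math.sqrt(sum(s * s for s in samples))


def hrss(samples: List[float], sample_rate_hertz: float) -> float:
    # Root-sum-squared strain: sqrt of the time integral of h(t)^2.
    return math.sqrt(sum(s * s for s in samples) / sample_rate_hertz)


def hpeak(samples: List[float]) -> float:
    # Peak absolute strain amplitude.
    return max(abs(s) for s in samples)


def toy_sample(i: int, sample_rate_hertz: float) -> float:
    # A 50 Hz sine under a Gaussian envelope, standing in for an injection.
    envelope = math.exp(-(((i - 512) / 64.0) ** 2))
    return envelope * math.sin(2.0 * math.pi * 50.0 * i / sample_rate_hertz)


sample_rate_hertz = 1024.0
waveform = [toy_sample(i, sample_rate_hertz) for i in range(1024)]

# Each measure is linear in amplitude, so one multiplicative factor suffices.
scaled_to_snr = rescale(waveform, whitened_snr(waveform), 20.0)
scaled_to_hrss = rescale(waveform, hrss(waveform, sample_rate_hertz), 1.0e-22)
scaled_to_hpeak = rescale(waveform, hpeak(waveform), 1.0e-21)
```

Because all three measures scale linearly with amplitude, the same `rescale` helper serves each type; only the function used to measure the current amplitude changes.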

Let us create a `gf.ScalingMethod` object:

In [8]:
# Create a scaling method object in order to scale the injection to the noise:
scaling_method : gf.ScalingMethod = gf.ScalingMethod(
    value=gf.Distribution(
        value=20,
        type_=gf.DistributionType.CONSTANT
    ),
    type_=gf.ScalingTypes.SNR
)

Next, we can set up all the other elements that we will use to compose our example generator:

In [9]:
# This object will be used to obtain real interferometer data based on specified parameters.
ifo_data_obtainer : gf.IFODataObtainer = gf.IFODataObtainer(
    observing_runs=gf.ObservingRun.O3, # Specify the observing run (e.g., O3).
    data_quality=gf.DataQuality.BEST,  # Choose the quality of the data (e.g., BEST).
    data_labels=[                      # Define the types of data to include.
        gf.DataLabel.NOISE, 
        gf.DataLabel.GLITCHES
    ],
    segment_order=gf.SegmentOrder.RANDOM, # Order of segment retrieval (e.g., RANDOM).
    force_acquisition=True,               # Force the acquisition of new data.
    cache_segments=False                  # Choose not to cache the segments.
)

# Initialize the noise generator wrapper:
# This wrapper will use the ifo_data_obtainer to generate real noise based on the specified parameters.
noise: gf.NoiseObtainer = gf.NoiseObtainer(
    ifo_data_obtainer=ifo_data_obtainer, # Use the previously set up IFODataObtainer object.
    noise_type=gf.NoiseType.REAL,        # Specify the type of noise as REAL.
    ifos=[gf.IFO.L1, gf.IFO.H1] # Specify the interferometers (here LIGO Livingston, L1, and LIGO Hanford, H1).
)

# Define a uniform distribution for the mass of the first object in solar masses.
mass_1_distribution_msun : gf.Distribution = gf.Distribution(
    min_=5.0, 
    max_=90.0, 
    type_=gf.DistributionType.UNIFORM
)

# Define a uniform distribution for the mass of the second object in solar masses.
mass_2_distribution_msun : gf.Distribution = gf.Distribution(
    min_=5.0, 
    max_=90.0, 
    type_=gf.DistributionType.UNIFORM
)

# Define a uniform distribution for the inclination of the binary system in radians.
inclination_distribution_radians : gf.Distribution = gf.Distribution(
    min_=0.0, 
    max_=np.pi, 
    type_=gf.DistributionType.UNIFORM
)
    
# Initialize a CBCGenerator with the defined distributions and specific approximant.
# Available approximants: "IMRPhenomD", "IMRPhenomXAS", "IMRPhenomPv2", "TaylorF2", "IMRPhenomD_NRTidalv2"
cbc_generator : gf.WaveformGenerator = gf.CBCGenerator(
    approximant="IMRPhenomD", # Specify the waveform approximant here
    mass_1_msun=mass_1_distribution_msun,
    mass_2_msun=mass_2_distribution_msun,
    inclination_radians=inclination_distribution_radians,
    scaling_method=scaling_method,
    injection_chance=0.5,
    # For IMRPhenomPv2, you might want to specify spins if not using defaults:
    # spin_1_in=(0.0, 0.0, 0.0), 
    # spin_2_in=(0.0, 0.0, 0.0)
)
    
generator : Iterator = gf.data(       
    noise_obtainer=noise,
    waveform_generators=cbc_generator, # Use the CBC generator defined above.
    num_examples_per_batch=8,
    input_variables=[
        gf.ReturnVariables.WHITENED_ONSOURCE, 
    ],
    output_variables=[
        gf.ReturnVariables.INJECTIONS, 
        gf.ReturnVariables.WHITENED_INJECTIONS,
        gf.ReturnVariables.INJECTION_MASKS
    ]
)
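
To illustrate what `injection_chance=0.5` and the `(inputs, outputs)` layout imply, the following toy generator mimics the structure in plain Python. It is a sketch under stated assumptions, not GravyFlow's logic: `toy_example_generator` is a hypothetical name, Gaussian samples stand in for whitened noise, a constant offset stands in for a scaled waveform, and each example is assumed to receive an injection independently with the given probability.

```python
import random
from typing import Dict, Iterator, List, Tuple

Batch = Tuple[Dict[str, List[List[float]]], Dict[str, List[int]]]


def toy_example_generator(
    num_examples_per_batch: int,
    num_samples: int,
    injection_chance: float,
    seed: int = 0,
) -> Iterator[Batch]:
    """Yield (inputs, outputs) batches forever, mimicking the dictionary layout."""
    rng = random.Random(seed)
    while True:
        onsource: List[List[float]] = []
        masks: List[int] = []
        for _ in range(num_examples_per_batch):
            # Stand-in for whitened noise: unit-variance Gaussian samples.
            example = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
            # Decide independently for each example whether it gets an injection.
            has_injection = rng.random() < injection_chance
            if has_injection:
                # Stand-in for a scaled, whitened waveform: a constant offset.
                example = [x + 1.0 for x in example]
            onsource.append(example)
            masks.append(int(has_injection))
        yield {"WHITENED_ONSOURCE": onsource}, {"INJECTION_MASKS": masks}


toy_generator = toy_example_generator(
    num_examples_per_batch=8, num_samples=16, injection_chance=0.5
)
toy_inputs, toy_outputs = next(toy_generator)
```

With a chance of 0.5, roughly half of the examples in a long run carry an injection, which gives a classifier a balanced mix of signal and noise-only cases.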

As with the individual elements, we can use this example generator as an iterator and produce a batch of examples for us to examine:

In [10]:
# Generate a batch of examples using the composed generator.
input_data, output_data = next(generator)
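
If more than one batch is needed, `itertools.islice` from the standard library can draw a fixed number from any iterator. The sketch below uses a hypothetical stand-in generator rather than the real one, and assumes only that the iterator can yield at least as many batches as are requested.

```python
from itertools import islice
from typing import Dict, Iterator, List, Tuple


def stand_in_generator() -> Iterator[Tuple[Dict[str, List[int]], Dict[str, List[int]]]]:
    """Hypothetical endless generator yielding (inputs, outputs) dictionaries."""
    batch_index = 0
    while True:
        yield (
            {"WHITENED_ONSOURCE": [batch_index]},
            {"INJECTION_MASKS": [batch_index % 2]},
        )
        batch_index += 1


# islice stops after three batches, so the endless loop above is never exhausted.
batches = list(islice(stand_in_generator(), 3))

for inputs, outputs in batches:
    print(inputs, outputs)
```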



We can print the output of the generator to examine its format. Which values are returned depends on the variables we requested in the `input_variables` and `output_variables` arguments of `gf.data`. Both are returned as dictionaries, which can easily be fed into a Keras model if the model's inputs are named to match the variable fields. We will show an example of this later in the notebook.
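
The name matching can be checked before training. The sketch below is illustrative only: `StandInReturnVariables` is a hypothetical stand-in for `gf.ReturnVariables`, `missing_inputs` is a hypothetical helper, and `"PSD"` is an invented input name. It assumes that the dictionary keys are the enum member names, which is how this notebook indexes the dictionaries (via `.name`).

```python
from enum import Enum, auto
from typing import Dict, List


class StandInReturnVariables(Enum):
    """Hypothetical stand-in for gf.ReturnVariables, for illustration only."""
    WHITENED_ONSOURCE = auto()
    INJECTION_MASKS = auto()


def missing_inputs(model_input_names: List[str], batch: Dict[str, object]) -> List[str]:
    """Return the model input names that have no matching key in the batch."""
    return [name for name in model_input_names if name not in batch]


# Keys are the enum member names, obtained with the standard Enum .name attribute.
stand_in_batch: Dict[str, object] = {
    StandInReturnVariables.WHITENED_ONSOURCE.name: [[0.0, 1.0]]
}

# An empty list means every named model input can be fed from the dictionary.
print(missing_inputs(["WHITENED_ONSOURCE"], stand_in_batch))
print(missing_inputs(["WHITENED_ONSOURCE", "PSD"], stand_in_batch))
```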

In [11]:
# This is the data we will use as input examples for our model:
print(f"Dictionary to feed the model: \n {input_data}")

# This is the target data we will use as labels to train our model:
print(f"Dictionary to use as the model labels: \n {output_data}")

Dictionary to feed the model: 
 {'WHITENED_ONSOURCE': Array([[[ 2.883e-01,  2.094e+00,  5.552e-01, ..., -2.744e-01,
          1.589e+00,  7.827e-01],
        [-4.292e-01, -3.125e-01, -4.731e-01, ..., -1.019e+00,
         -3.135e-01, -3.740e-01]],

       [[ 1.004e+00,  5.762e-01,  2.196e-01, ...,  9.521e-03,
         -1.136e+00,  2.410e+00],
        [ 7.397e-01,  3.215e-01,  7.383e-01, ...,  1.082e+00,
          1.837e+00, -1.605e+00]],

       [[-7.939e-01,  9.713e-04,  4.111e-01, ...,  1.804e+00,
          3.484e-01,  2.986e-01],
        [ 1.485e+00,  8.105e-02, -3.025e-01, ...,  6.953e-01,
         -1.334e-01, -8.264e-02]],

       ...,

       [[-1.737e+00,  1.360e+00, -1.945e-01, ..., -1.222e+00,
         -1.427e+00,  1.918e-01],
        [ 5.547e-01,  1.061e+00, -2.219e+00, ..., -8.511e-01,
         -3.689e-01,  1.469e+00]],

       [[ 7.871e-01, -2.008e-01,  2.336e+00, ...,  3.379e-01,
         -1.704e-01, -2.715e-01],
        [ 7.183e-01, -2.377e+00,  5.327e-01, ..., -9.468e-01,

We can then plot some examples from this dataset to examine the output:

In [12]:
# Extract the data from the generator output: 
onsource: jax.Array = input_data[gf.ReturnVariables.WHITENED_ONSOURCE.name]
injections: jax.Array = output_data[gf.ReturnVariables.INJECTIONS.name][0]
whitened_injections: jax.Array = output_data[gf.ReturnVariables.WHITENED_INJECTIONS.name][0]
injection_masks: jax.Array = output_data[gf.ReturnVariables.INJECTION_MASKS.name][0]

# Initialize an empty layout for the strain plots.
generated_data_layout: List = []


# Iterate over each example in the batch: the onsource data, its injection, the whitened injection, and the injection mask.
for onsource_, injection, whitened_injection, masks_ in zip(
        onsource,
        injections,
        whitened_injections,
        injection_masks
    ):
    
    # Create a strain plot for each example, with a title showing whether it contains an injection.
    generated_data_layout.append([
        gf.generate_strain_plot(
            {
                "Onsource": onsource_,
                "Whitened Injection": whitened_injection,
                "Injection": injection,
            },
            height=400,
            width=800,
            title=f"Example Output. Has Injection: {bool(masks_)}"
        )
    ])

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(generated_data_layout)
output_notebook()
show(grid)