# Notebook 5: Composing Examples

In this notebook we will examine how the elements we have encountered so far can be combined to construct examples, allowing us to train machine learning models on data generated in real time. This is a core feature of GravyFlow.

As usual, we begin by performing the relevant imports:

In [5]:
# Built-in imports
from typing import List, Dict, Iterator
from pathlib import Path

# Dependency imports: 
import numpy as np
import tensorflow as tf
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

# Import the GravyFlow module.
import gravyflow as gf

We then create a TensorFlow GPU strategy:

In [6]:
# Set up the environment using gf.env() and return a tf.distribute.Strategy object.
env : tf.distribute.Strategy = gf.env()

INFO:root:TensorFlow version: 2.12.1, CUDA version: 11.8
INFO:root:[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Creating an example generator through composition

We can combine all the elements we have seen so far (noise generation, waveform generation, and waveform projection) and use GravyFlow to create a custom Python generator.
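To make the idea of composition concrete, here is a minimal toy sketch in plain NumPy. It is not GravyFlow's implementation: the function `toy_example_generator`, the dictionary keys, and the sine-Gaussian signal are all assumptions chosen for illustration. It only shows the general pattern of an endless Python generator that yields `(inputs, outputs)` dictionary pairs, one batch at a time.

```python
from typing import Dict, Iterator, Tuple

import numpy as np


def toy_example_generator(
    num_examples_per_batch: int = 8,
    num_samples: int = 2048,
    sample_rate_hertz: float = 2048.0,
    seed: int = 0,
) -> Iterator[Tuple[Dict[str, np.ndarray], Dict[str, np.ndarray]]]:
    """Hypothetical stand-in for a composed generator: noise plus injection."""
    rng = np.random.default_rng(seed)
    time_seconds = np.arange(num_samples) / sample_rate_hertz

    while True:
        # Noise element: unit-variance Gaussian noise for each example.
        noise = rng.normal(size=(num_examples_per_batch, num_samples))

        # Waveform element: a sine-Gaussian burst with a random frequency.
        frequency_hertz = rng.uniform(
            50.0, 200.0, size=(num_examples_per_batch, 1)
        )
        envelope = np.exp(-((time_seconds - 0.5) ** 2) / (2.0 * 0.05 ** 2))
        injections = envelope * np.sin(
            2.0 * np.pi * frequency_hertz * time_seconds
        )

        # Composition: add the injection to the noise and yield named arrays.
        yield {"ONSOURCE": noise + injections}, {"INJECTIONS": injections}


toy_generator = toy_example_generator()
toy_inputs, toy_outputs = next(toy_generator)
```

Each call to `next(toy_generator)` produces a fresh batch, so nothing needs to be stored on disk beforehand.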

To scale the injection relative to the noise, we can use a `gf.ScalingMethod` object. GravyFlow supports scaling by SNR (`gf.ScalingTypes.SNR`), HRSS (`gf.ScalingTypes.HRSS`), and HPEAK (`gf.ScalingTypes.HPEAK`). A `gf.ScalingMethod` takes the following arguments:

- `value` : `Union[gf.Distribution, np.ndarray]`, Required
  > The value or distribution used to scale the injections. The units depend on the `type_` parameter.

- `type_` : `gf.ScalingTypes`, Required
  > The type of scaling: one of `gf.ScalingTypes.SNR`, `gf.ScalingTypes.HRSS`, or `gf.ScalingTypes.HPEAK`.
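As a rough illustration of the two time-domain measures, the sketch below computes HRSS (root-sum-square strain) and a peak amplitude with NumPy, then rescales a waveform to a target HRSS. The helper names `hrss`, `hpeak`, and `scale_to_hrss` are hypothetical, and taking HPEAK as the largest absolute strain value is an assumption; GravyFlow's own definitions may differ in detail.

```python
import numpy as np


def hrss(plus: np.ndarray, cross: np.ndarray, sample_rate_hertz: float) -> float:
    """Square root of the time integral of h_plus^2 + h_cross^2."""
    return float(np.sqrt(np.sum(plus ** 2 + cross ** 2) / sample_rate_hertz))


def hpeak(plus: np.ndarray, cross: np.ndarray) -> float:
    """Peak amplitude, taken here as the largest absolute strain (assumption)."""
    return float(np.max(np.abs(np.stack([plus, cross]))))


def scale_to_hrss(plus, cross, sample_rate_hertz, target_hrss):
    """Rescale both polarisations so that their HRSS equals target_hrss."""
    factor = target_hrss / hrss(plus, cross, sample_rate_hertz)
    return plus * factor, cross * factor


# One second of a unit-amplitude, circularly polarised 100 Hz signal.
sample_rate_hertz = 2048.0
time_seconds = np.arange(2048) / sample_rate_hertz
plus = np.sin(2.0 * np.pi * 100.0 * time_seconds)
cross = np.cos(2.0 * np.pi * 100.0 * time_seconds)

scaled_plus, scaled_cross = scale_to_hrss(
    plus, cross, sample_rate_hertz, target_hrss=1.0e-21
)
```

Because both measures are linear in the strain amplitude, scaling reduces to multiplying the waveform by the ratio of the target value to the current value.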

Let us create a `gf.ScalingMethod` object:

In [7]:
# Create a scaling method object in order to scale the injection to the noise:
scaling_method : gf.ScalingMethod = gf.ScalingMethod(
    value=gf.Distribution(
        value=20,
        type_=gf.DistributionType.CONSTANT
    ),
    type_=gf.ScalingTypes.SNR
)
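To give a feel for what scaling to a constant SNR of 20 involves, the following sketch computes an approximate optimal SNR against a one-sided noise power spectral density (PSD) and rescales a signal to reach the target. This is a simplified NumPy illustration under stated assumptions, not GravyFlow's implementation: the functions `optimal_snr` and `scale_to_snr`, the flat PSD, and the test signal are all hypothetical.

```python
import numpy as np


def optimal_snr(signal: np.ndarray, psd: np.ndarray, sample_rate_hertz: float) -> float:
    """Approximate rho = sqrt(4 * integral |h(f)|^2 / S_n(f) df)."""
    num_samples = signal.shape[-1]
    delta_f = sample_rate_hertz / num_samples
    # Dividing the DFT by the sample rate approximates the continuous transform.
    signal_fft = np.fft.rfft(signal) / sample_rate_hertz
    return float(np.sqrt(4.0 * delta_f * np.sum(np.abs(signal_fft) ** 2 / psd)))


def scale_to_snr(signal, psd, sample_rate_hertz, target_snr):
    """Rescale the signal so that its optimal SNR equals target_snr."""
    return signal * (target_snr / optimal_snr(signal, psd, sample_rate_hertz))


sample_rate_hertz = 2048.0
num_samples = 2048
time_seconds = np.arange(num_samples) / sample_rate_hertz
signal = 1.0e-21 * np.sin(2.0 * np.pi * 100.0 * time_seconds)

# Hypothetical flat one-sided PSD with one value per rfft frequency bin.
flat_psd = np.full(num_samples // 2 + 1, 1.0e-46)

scaled_signal = scale_to_snr(signal, flat_psd, sample_rate_hertz, target_snr=20.0)
```

With real detector noise the PSD is strongly frequency dependent, so the same waveform can need a very different amplitude to reach a given SNR depending on where its power lies.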

Next, we can set up all the other elements that we will use to compose our example generator:

In [8]:
with env:
    # This object will be used to obtain real interferometer data based on specified parameters.
    ifo_data_obtainer : gf.IFODataObtainer = gf.IFODataObtainer(
        observing_runs=gf.ObservingRun.O3, # Specify the observing run (e.g., O3).
        data_quality=gf.DataQuality.BEST,  # Choose the quality of the data (e.g., BEST).
        data_labels=[                      # Define the types of data to include.
            gf.DataLabel.NOISE, 
            gf.DataLabel.GLITCHES
        ],
        segment_order=gf.SegmentOrder.RANDOM, # Order of segment retrieval (e.g., RANDOM).
        force_acquisition=True,               # Force the acquisition of new data.
        cache_segments=False                  # Choose not to cache the segments.
    )

    # Initialize the noise generator wrapper:
    # This wrapper will use the ifo_data_obtainer to generate real noise based on the specified parameters.
    noise: gf.NoiseObtainer = gf.NoiseObtainer(
        ifo_data_obtainer=ifo_data_obtainer, # Use the previously set up IFODataObtainer object.
        noise_type=gf.NoiseType.REAL,        # Specify the type of noise as REAL.
        ifos=gf.IFO.L1 # Specify the interferometer (e.g., LIGO Livingston L1).
    )

    # Define a uniform distribution for the mass of the first object in solar masses.
    mass_1_distribution_msun : gf.Distribution = gf.Distribution(
        min_=10.0, 
        max_=60.0, 
        type_=gf.DistributionType.UNIFORM
    )

    # Define a uniform distribution for the mass of the second object in solar masses.
    mass_2_distribution_msun : gf.Distribution = gf.Distribution(
        min_=10.0, 
        max_=60.0, 
        type_=gf.DistributionType.UNIFORM
    )

    # Define a uniform distribution for the inclination of the binary system in radians.
    inclination_distribution_radians : gf.Distribution = gf.Distribution(
        min_=0.0, 
        max_=np.pi, 
        type_=gf.DistributionType.UNIFORM
    )

    # Initialize a PhenomD waveform generator with the defined distributions.
    # This generator will produce waveforms with randomly varied masses and inclination angles.
    phenom_d_generator : gf.WaveformGenerator = gf.cuPhenomDGenerator(
        mass_1_msun=mass_1_distribution_msun,
        mass_2_msun=mass_2_distribution_msun,
        inclination_radians=inclination_distribution_radians,
        scaling_method=scaling_method,
        injection_chance=0.5 # Each example has a 50% chance of containing this signal.
    )
    
    generator : Iterator = gf.data(       
        noise_obtainer=noise,
        waveform_generators=phenom_d_generator,
        num_examples_per_batch=8,
        input_variables=[
            gf.ReturnVariables.WHITENED_ONSOURCE, 
        ],
        output_variables=[
            gf.ReturnVariables.INJECTIONS, 
            gf.ReturnVariables.WHITENED_INJECTIONS,
            gf.ReturnVariables.INJECTION_MASKS
        ]
    )

INFO:root:Loading event times from cache.
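The effect of the uniform distributions and of `injection_chance` can be pictured with a short NumPy sketch. This is an illustration under assumptions rather than GravyFlow's sampling code: it draws one set of parameters per example and uses a random Boolean mask to decide which examples keep their injection. The placeholder waveforms are random numbers standing in for real PhenomD output.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
num_examples_per_batch = 8
num_samples = 2048

# One independent draw per example, matching the ranges used above.
mass_1_msun = rng.uniform(10.0, 60.0, size=num_examples_per_batch)
mass_2_msun = rng.uniform(10.0, 60.0, size=num_examples_per_batch)
inclination_radians = rng.uniform(0.0, np.pi, size=num_examples_per_batch)

# Each example independently contains an injection with probability 0.5.
injection_chance = 0.5
injection_masks = rng.random(num_examples_per_batch) < injection_chance

# Placeholder waveforms (hypothetical); masked-out examples become all zeros.
waveforms = rng.normal(size=(num_examples_per_batch, num_samples))
waveforms = waveforms * injection_masks[:, np.newaxis]
```

Examples whose mask is `False` contain only noise, which gives the model negative examples to learn from.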


As with the individual elements, we can use this example generator as an iterator and produce a batch of examples for us to examine:

In [10]:
# Use the TensorFlow environment 'env' created earlier with gf.env()
with env:
    # Generate a batch of examples using the composed generator.
    input_data, output_data = next(generator)

2024-08-20 09:35:25.555437: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: No function library is provided.
	 [[{{node PartitionedCall_1}}]]

We can also profile a call to the generator to see where time is spent when producing a batch:

In [24]:
import cProfile
import pstats
from pstats import SortKey

def function_to_profile():
    # Generate a batch of examples using the composed generator.
    with env:
        input_data, output_data = next(generator)

# Run the profiler
cProfile.run('function_to_profile()', 'profile_output')

# Analyze the results
p = pstats.Stats('profile_output')
p.strip_dirs().sort_stats(SortKey.TIME).print_stats(344)

Tue Aug 20 09:37:46 2024    profile_output

         65282 function calls (62997 primitive calls) in 0.097 seconds

   Ordered by: internal time
   List reduced from 985 to 344 due to restriction <344>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       13    0.019    0.001    0.019    0.001 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
       83    0.002    0.000    0.002    0.000 {built-in method tensorflow.python.client._pywrap_tf_session.TF_FinishOperation}
       15    0.002    0.000    0.002    0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
     6377    0.002    0.000    0.004    0.000 {built-in method builtins.isinstance}
       48    0.001    0.000    0.004    0.000 tensor_util.py:421(make_tensor_proto)
       83    0.001    0.000    0.009    0.000 ops.py:1912(_create_c_op)
       83    0.001    0.000    0.024    0.000 ops.py:3755(_create_op_internal)
       10    0.001    0.000    0.001    0.000 constan

<pstats.Stats at 0x7fda08500040>

We can print the output of the generator to examine its format. Which values are returned depends on the variables we requested in the `input_variables` and `output_variables` arguments of `gf.data`. Both are returned as dictionaries, which can easily be fed into a Keras model if the model's inputs are named to match the variable fields. We will show an example of this later in the notebook.

In [21]:
# This is the data we will use as input examples to our model:
print(f"Dictionary to feed the model: \n {input_data}")

# This is the target data we will use as labels to train our model:
print(f"Dictionary to use as the model labels: \n {output_data}")

Dictionary to feed the model: 
 {'WHITENED_ONSOURCE': <tf.Tensor: shape=(8, 1, 2048), dtype=float16, numpy=
array([[[-1.214  , -0.03455,  0.1855 , ..., -0.949  ,
          0.271  ,  1.201  ]],

       [[-0.3955 , -1.844  ,  1.588  , ..., -0.7656 ,
          1.777  , -0.4202 ]],

       [[-1.105  , -0.3062 , -0.868  , ...,  0.01317,
          0.1548 ,  1.038  ]],

       ...,

       [[-0.369  ,  3.172  ,  1.591  , ..., -0.01865,
          0.724  , -0.4272 ]],

       [[ 1.556  ,  0.9805 , -1.746  , ..., -1.03   ,
         -0.2825 , -1.093  ]],

       [[ 0.4404 , -0.2177 ,  2.229  , ...,  0.8154 ,
         -0.441  , -1.399  ]]], dtype=float16)>}
Dictionary to use as the model labels: 
 {'INJECTIONS': <tf.Tensor: shape=(1, 8, 1, 2048), dtype=float32, numpy=
array([[[[ 0.25261962,  0.2676883 ,  0.279508  , ...,
           0.        ,  0.        ,  0.        ]],

        [[ 0.        ,  0.        ,  0.        , ...,
           0.        ,  0.        ,  0.        ]],

        [[ 0.        
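The structure of the printed dictionaries can be mimicked with a small stand-alone sketch. The enum `ToyReturnVariables` is a hypothetical stand-in for `gf.ReturnVariables`, and the arrays are zero-filled placeholders with the shapes and dtypes shown in the output above. Reading the leading axis of the injections as one entry per waveform generator is an assumption based on that output, not something confirmed here.

```python
from enum import Enum, auto

import numpy as np


class ToyReturnVariables(Enum):
    """Hypothetical stand-in for gf.ReturnVariables."""
    WHITENED_ONSOURCE = auto()
    INJECTIONS = auto()


# Keys are the enum member names; shapes and dtypes mirror the printed output.
toy_input_data = {
    ToyReturnVariables.WHITENED_ONSOURCE.name: np.zeros(
        (8, 1, 2048), dtype=np.float16
    )
}
toy_output_data = {
    ToyReturnVariables.INJECTIONS.name: np.zeros(
        (1, 8, 1, 2048), dtype=np.float32
    )
}

# Look entries up by name, then drop the leading axis of the injections.
toy_onsource = toy_input_data[ToyReturnVariables.WHITENED_ONSOURCE.name]
toy_injections = toy_output_data[ToyReturnVariables.INJECTIONS.name][0]
```

After indexing with `[0]`, the injections have the same shape as the onsource data, so the two can be iterated over together example by example.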

We can then print some examples from this dataset to examine the output:

In [25]:
# Extract the data from the generator output: 
onsource: tf.Tensor = input_data[gf.ReturnVariables.WHITENED_ONSOURCE.name]
injections: tf.Tensor = output_data[gf.ReturnVariables.INJECTIONS.name][0]
whitened_injections: tf.Tensor = output_data[gf.ReturnVariables.WHITENED_INJECTIONS.name][0]
injection_masks: tf.Tensor = output_data[gf.ReturnVariables.INJECTION_MASKS.name][0]

# Initialize an empty layout for the strain plots.
generated_data_layout: List = []

# Iterate over the examples in the batch with their corresponding injections and masks.
for onsource_, injection, whitened_injection, masks_ in zip(
        onsource,
        injections,
        whitened_injections,
        injection_masks
    ):
    # Create a strain plot for each example, with a title showing whether it contains an injection.
    generated_data_layout.append([
        gf.generate_strain_plot(
            {
                "Onsource": onsource_,
                "Whitened Injection": whitened_injection,
                "Injection": injection,
            },
            height=400,
            width=800,
            title=f"Example Output. Has Injection: {bool(masks_)}"
        )
    ])

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(generated_data_layout)
output_notebook()
show(grid)