# Notebook 5: Composing Examples

In this notebook we will examine how to combine the elements we have encountered so far to construct examples that allow us to train machine learning models with data generated in real time. This is a core functionality of GravyFlow.

As usual, we begin by performing the relevant imports:

In [1]:
# Built-in imports
from typing import List, Dict, Iterator
from pathlib import Path

# Dependency imports: 
import numpy as np
import tensorflow as tf
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

# Import the GravyFlow module.
import gravyflow as gf

2024-08-20 10:04:42.448662: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


And creating a TensorFlow GPU strategy:

In [2]:
# Set up the environment using gf.env() and return a tf.distribute.Strategy object.
env : tf.distribute.Strategy = gf.env()

INFO:root:TensorFlow version: 2.12.1, CUDA version: 11.8
2024-08-20 10:05:03.301082: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-08-20 10:05:03.301159: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1635] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3000 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:0b:00.0, compute capability: 7.0
INFO:root:[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Creating an example generator through composition

We can combine all the elements we have seen so far (noise generation, waveform generation, and waveform projection) and use the `gf.data` function to create a custom Python generator.

To scale the injection with respect to the noise, we can use a `gf.ScalingMethod` object. GravyFlow supports scaling by signal-to-noise ratio (`gf.ScalingTypes.SNR`), root-sum-squared strain amplitude (`gf.ScalingTypes.HRSS`), and peak strain amplitude (`gf.ScalingTypes.HPEAK`). A `gf.ScalingMethod` takes two parameters:

- `value` : `Union[gf.Distribution, np.ndarray]`, Required
  > The value or distribution used to scale the injections. The units depend on the `type_` parameter.

- `type_` : `gf.ScalingTypes`, Required
  > Type of scaling, one of either `gf.ScalingTypes.SNR`, `gf.ScalingTypes.HRSS`, or `gf.ScalingTypes.HPEAK`.

Let us create a `gf.ScalingMethod` object:

In [3]:
# Create a scaling method object in order to scale the injection to the noise:
scaling_method : gf.ScalingMethod = gf.ScalingMethod(
    value=gf.Distribution(
        value=20,
        type_=gf.DistributionType.CONSTANT
    ),
    type_=gf.ScalingTypes.SNR
)
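
To make the three scaling types concrete, the following is a minimal NumPy sketch of what each quantity measures and how a waveform can be rescaled to hit a target value. It is an illustration under stated assumptions, not GravyFlow's implementation: the helper names (`hrss`, `hpeak`, `optimal_snr`, `scale_to`), the toy sine-Gaussian waveform, and the flat PSD are all hypothetical.

```python
import numpy as np

def hrss(strain, sample_rate_hertz):
    """Root-sum-squared strain amplitude: sqrt(sum(h^2) * dt)."""
    return np.sqrt(np.sum(strain ** 2) / sample_rate_hertz)

def hpeak(strain):
    """Peak absolute strain amplitude."""
    return np.max(np.abs(strain))

def optimal_snr(strain, psd, sample_rate_hertz):
    """Optimal SNR: rho^2 = 4 * sum(|h(f)|^2 / S_n(f)) * df, with a one-sided PSD."""
    num_samples = strain.shape[-1]
    delta_f = sample_rate_hertz / num_samples
    # Approximate the continuous Fourier transform by dt * FFT.
    strain_f = np.fft.rfft(strain) / sample_rate_hertz
    return np.sqrt(4.0 * delta_f * np.sum(np.abs(strain_f) ** 2 / psd))

def scale_to(strain, target, measure):
    """Rescale strain so that measure(scaled) equals target.

    This relies on the measure being linear in the waveform amplitude,
    which holds for all three quantities above.
    """
    return strain * (target / measure(strain))

sample_rate_hertz = 2048.0
time = np.arange(2048) / sample_rate_hertz

# Toy sine-Gaussian standing in for a real waveform (hypothetical).
waveform = np.sin(2 * np.pi * 100.0 * time) * np.exp(-((time - 0.5) ** 2) / 0.005)

# Flat one-sided PSD, purely illustrative; real detector noise is coloured.
psd = np.full(2048 // 2 + 1, 1.0e-3)

scaled_snr = scale_to(waveform, 20.0, lambda h: optimal_snr(h, psd, sample_rate_hertz))
scaled_hrss = scale_to(waveform, 1.0e-21, lambda h: hrss(h, sample_rate_hertz))
scaled_hpeak = scale_to(waveform, 1.0e-21, hpeak)
```

Because each measure scales linearly with amplitude, a single multiplicative factor is enough in all three cases; only the definition of the measure changes.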

Next, we can set up all the other elements that we will use to compose our example generator:

In [4]:
with env:
    # This object will be used to obtain real interferometer data based on specified parameters.
    ifo_data_obtainer : gf.IFODataObtainer = gf.IFODataObtainer(
        observing_runs=gf.ObservingRun.O3, # Specify the observing run (e.g., O3).
        data_quality=gf.DataQuality.BEST,  # Choose the quality of the data (e.g., BEST).
        data_labels=[                      # Define the types of data to include.
            gf.DataLabel.NOISE, 
            gf.DataLabel.GLITCHES
        ],
        segment_order=gf.SegmentOrder.RANDOM, # Order of segment retrieval (e.g., RANDOM).
        force_acquisition=True,               # Force the acquisition of new data.
        cache_segments=False                  # Choose not to cache the segments.
    )

    # Initialize the noise generator wrapper:
    # This wrapper will use the ifo_data_obtainer to generate real noise based on the specified parameters.
    noise: gf.NoiseObtainer = gf.NoiseObtainer(
        ifo_data_obtainer=ifo_data_obtainer, # Use the previously set up IFODataObtainer object.
        noise_type=gf.NoiseType.REAL,        # Specify the type of noise as REAL.
        ifos=gf.IFO.L1 # Specify the interferometer (e.g., LIGO Livingston L1).
    )

    # Define a uniform distribution for the mass of the first object in solar masses.
    mass_1_distribution_msun : gf.Distribution = gf.Distribution(
        min_=10.0, 
        max_=60.0, 
        type_=gf.DistributionType.UNIFORM
    )

    # Define a uniform distribution for the mass of the second object in solar masses.
    mass_2_distribution_msun : gf.Distribution = gf.Distribution(
        min_=10.0, 
        max_=60.0, 
        type_=gf.DistributionType.UNIFORM
    )

    # Define a uniform distribution for the inclination of the binary system in radians.
    inclination_distribution_radians : gf.Distribution = gf.Distribution(
        min_=0.0, 
        max_=np.pi, 
        type_=gf.DistributionType.UNIFORM
    )

    # Initialize a PhenomD waveform generator with the defined distributions.
    # This generator will produce waveforms with randomly varied masses and inclination angles.
    phenom_d_generator : gf.WaveformGenerator = gf.cuPhenomDGenerator(
        mass_1_msun=mass_1_distribution_msun,
        mass_2_msun=mass_2_distribution_msun,
        inclination_radians=inclination_distribution_radians,
        scaling_method=scaling_method,
        injection_chance=1.0 # Probability that an example contains this signal (1.0 means every example does).
    )
    
    generator : Iterator = gf.data(       
        noise_obtainer=noise,
        waveform_generators=phenom_d_generator,
        num_examples_per_batch=8,
        input_variables=[
            gf.ReturnVariables.WHITENED_ONSOURCE, 
        ],
        output_variables=[
            gf.ReturnVariables.INJECTIONS, 
            gf.ReturnVariables.WHITENED_INJECTIONS,
            gf.ReturnVariables.INJECTION_MASKS
        ]
    )

2024-08-20 10:05:03.485856: I tensorflow/compiler/xla/service/service.cc:169] XLA service 0x56088f220350 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2024-08-20 10:05:03.485897: I tensorflow/compiler/xla/service/service.cc:177]   StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2024-08-20 10:05:03.491292: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
2024-08-20 10:05:03.529079: I tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:424] Loaded cuDNN version 8600
2024-08-20 10:05:03.695335: I ./tensorflow/compiler/jit/device_compiler.h:180] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.
INFO:root:Loading event times from cache.
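
The composition performed in the cell above can be pictured with a small NumPy-only sketch: draw noise, draw waveforms, decide per example whether to inject, and yield an input dictionary and an output dictionary. Everything here is a simplified assumption for illustration. `toy_example_generator` is a hypothetical name, white Gaussian noise stands in for whitened detector noise, a sine-Gaussian stands in for a PhenomD waveform, and the output shapes are chosen only to mirror those printed later in this notebook.

```python
import numpy as np

def toy_example_generator(
        num_examples_per_batch=8,
        num_samples=2048,
        injection_chance=0.5,
        seed=0
    ):
    """Yield (inputs, outputs) dictionaries indefinitely. Hypothetical sketch."""
    rng = np.random.default_rng(seed)
    time = np.arange(num_samples) / num_samples

    while True:
        # 1. Noise: unit-variance white Gaussian noise, one detector per example.
        noise = rng.standard_normal((num_examples_per_batch, 1, num_samples))

        # 2. Waveforms: sine-Gaussians with a randomly drawn frequency per example.
        frequency = rng.uniform(50.0, 200.0, size=(num_examples_per_batch, 1, 1))
        envelope = np.exp(-((time - 0.5) ** 2) / 0.005)
        waveforms = np.sin(2 * np.pi * frequency * time) * envelope

        # 3. Masks: each example contains a signal with probability injection_chance.
        masks = rng.random(num_examples_per_batch) < injection_chance
        injections = waveforms * masks[:, None, None]

        # 4. Combine noise and (masked) injections into the onsource data.
        onsource = noise + injections

        inputs = {"WHITENED_ONSOURCE": onsource}
        outputs = {
            # Leading axis of size one: a single waveform generator (assumed layout).
            "INJECTIONS": injections[None],
            "INJECTION_MASKS": masks[None],
        }
        yield inputs, outputs

toy_generator = toy_example_generator(injection_chance=1.0)
inputs, outputs = next(toy_generator)
```

With `injection_chance=1.0` every mask is `True`; lowering it produces a mixture of signal and noise-only examples, which is what a detection model would typically be trained on.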


Similarly to the individual elements, we can use this example generator as an iterator, and produce N examples for us to examine:

In [6]:
# Use the TensorFlow environment 'env' created earlier with gf.env()
with env:
    # Generate a batch of examples using the composed generator.
    input_data, output_data = next(generator)

2024-08-20 10:05:17.688792: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: No function library is provided.
	 [[{{node PartitionedCall_1}}]]
2024-08-20 10:05:17.689392: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: No function library is provided.
	 [[{{node PartitionedCall_2}}]]
2024-08-20 10:05:17.694703: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you can ignore this message): INTERNAL: No function library is provided.
	 [[{{node PartitionedCall_1}}]]
2024-08-20 10:05:17.694985: I tensorflow/core/common_runtime/executor.cc:1197] [/device:CPU:0] (DEBUG INFO) Executor start aborting (this does not indicate an error and you ca
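
If more than one batch is needed, a generator that yields indefinitely can be bounded with `itertools.islice`. The sketch below assumes a generator yielding `(inputs, outputs)` dictionary pairs, as above; `toy_batches` is a hypothetical stand-in so that the example runs without GravyFlow.

```python
from itertools import islice

import numpy as np

def toy_batches(batch_size=8, num_samples=2048, seed=0):
    """Hypothetical endless generator of (inputs, outputs) dictionaries."""
    rng = np.random.default_rng(seed)
    while True:
        inputs = {"WHITENED_ONSOURCE": rng.standard_normal((batch_size, 1, num_samples))}
        outputs = {"INJECTION_MASKS": np.ones((1, batch_size))}
        yield inputs, outputs

# Take exactly three batches, then stop; the generator itself never ends.
num_batches = 3
batches = list(islice(toy_batches(), num_batches))

# Stack the batches along the example axis for inspection.
onsource = np.concatenate(
    [inputs["WHITENED_ONSOURCE"] for inputs, _ in batches], axis=0
)
```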

In [7]:
import cProfile
import pstats
from pstats import SortKey

def function_to_profile():
    # Generate one batch of examples inside the TensorFlow environment.
    with env:
        input_data, output_data = next(generator)

# Run the profiler
cProfile.run('function_to_profile()', 'profile_output')

# Analyze the results
p = pstats.Stats('profile_output')
p.strip_dirs().sort_stats(SortKey.TIME).print_stats(344)

Tue Aug 20 10:05:22 2024    profile_output

         14544 function calls (14004 primitive calls) in 0.021 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       13    0.004    0.000    0.004    0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_Execute}
       15    0.002    0.000    0.002    0.000 {built-in method tensorflow.python._pywrap_tfe.TFE_Py_FastPathExecute}
       10    0.001    0.000    0.002    0.000 constant_op.py:75(convert_to_eager_tensor)
        3    0.001    0.000    0.002    0.001 math_ops.py:938(cast)
       77    0.001    0.000    0.001    0.000 typing.py:1408(_get_protocol_attrs)
       39    0.001    0.000    0.001    0.000 inspect.py:3050(_bind)
       13    0.000    0.000    0.004    0.000 function_type.py:368(canonicalize_to_monomorphic)
     1143    0.000    0.000    0.002    0.000 {built-in method builtins.isinstance}
       91    0.000    0.000    0.000    0.000 {built-in method tenso

<pstats.Stats at 0x7fc360763cd0>
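
As an alternative to writing the profile to a file on disk, the same standard-library tools can keep everything in memory. The sketch below uses `cProfile.Profile` as a context manager (available from Python 3.8) and sends the report to a string buffer; `work` is a hypothetical placeholder for the call being profiled, such as fetching a batch from the generator.

```python
import cProfile
import io
import pstats
from pstats import SortKey

def work():
    """Hypothetical stand-in for the code under test."""
    return sum(i * i for i in range(10_000))

# Profile only the statements inside the with block.
with cProfile.Profile() as profiler:
    result = work()

# Write the report into a string buffer rather than stdout or a file.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats(SortKey.TIME).print_stats(10)

report = stream.getvalue()
print(report)
```

Note that the first call to a generator often includes one-off setup costs, so profiling a later call, as the cell above does, gives a better picture of steady-state performance.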

We can print the output of the generator to examine its format. Which values are returned depends on the variables requested in the `input_variables` and `output_variables` arguments of the `gf.data` function. Both are returned as dictionaries, which can easily be fed into a Keras model if the model's inputs are named to match the variable fields; we will show an example of this later in the notebook.

In [8]:
# This is the data we will use as input examples to our model:
print(f"Dictionary to feed the model: \n {input_data}")

# This is the target data we will use as labels to train our model:
print(f"Dictionary to use as the model labels: \n {output_data}")

Dictionary to feed the model: 
 {'WHITENED_ONSOURCE': <tf.Tensor: shape=(8, 1, 2048), dtype=float16, numpy=
array([[[-1.215  , -0.03406,  0.1852 , ..., -0.949  ,
          0.271  ,  1.201  ]],

       [[-1.134  , -2.58   ,  0.8687 , ..., -0.765  ,
          1.777  , -0.42   ]],

       [[-1.368  , -0.594  , -1.178  , ...,  0.02081,
          0.1614 ,  1.042  ]],

       ...,

       [[-0.2957 ,  3.229  ,  1.634  , ..., -0.01342,
          0.727  , -0.4263 ]],

       [[ 1.3125 ,  0.7666 , -1.929  , ..., -1.03   ,
         -0.2837 , -1.098  ]],

       [[ 0.9756 ,  0.3188 ,  2.764  , ...,  0.8154 ,
         -0.441  , -1.4    ]]], dtype=float16)>}
Dictionary to use as the model labels: 
 {'INJECTIONS': <tf.Tensor: shape=(1, 8, 1, 2048), dtype=float32, numpy=
array([[[[ 0.25261962,  0.2676883 ,  0.279508  , ...,
           0.        ,  0.        ,  0.        ]],

        [[-0.2885979 , -0.29481485, -0.2930275 , ...,
           0.        ,  0.        ,  0.        ]],

        [[-0.10134438
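
Printing whole tensors quickly becomes hard to read. A compact alternative is to print only the key, shape, and dtype of each entry. The helper below is a hypothetical sketch, not part of GravyFlow; it is shown with NumPy arrays whose shapes mirror the output above, and it assumes the dictionary values can be converted with `np.asarray`.

```python
import numpy as np

def summarise_batch(batch):
    """Map each key to a (shape, dtype) pair. Hypothetical helper."""
    return {
        key: (tuple(np.shape(value)), str(np.asarray(value).dtype))
        for key, value in batch.items()
    }

# Stand-in dictionaries with the same shapes and dtypes as the printed output.
toy_input_data = {"WHITENED_ONSOURCE": np.zeros((8, 1, 2048), dtype=np.float16)}
toy_output_data = {"INJECTIONS": np.zeros((1, 8, 1, 2048), dtype=np.float32)}

input_summary = summarise_batch(toy_input_data)
output_summary = summarise_batch(toy_output_data)

for name, (shape, dtype) in {**input_summary, **output_summary}.items():
    print(f"{name}: shape={shape}, dtype={dtype}")
```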

We can then print some examples from this dataset to examine the output:

In [23]:
# Use the TensorFlow environment 'env' created earlier with gf.env()
with env:
    # Generate a batch of examples using the composed generator.
    input_data, output_data = next(generator)


# Extract the data from the generator output: 
onsource: tf.Tensor = input_data[gf.ReturnVariables.WHITENED_ONSOURCE.name]
injections: tf.Tensor = output_data[gf.ReturnVariables.INJECTIONS.name][0]
whitened_injections: tf.Tensor = output_data[gf.ReturnVariables.WHITENED_INJECTIONS.name][0]
injection_masks: tf.Tensor = output_data[gf.ReturnVariables.INJECTION_MASKS.name][0]

# Initialize an empty layout for the strain plots.
generated_data_layout: List = []

# Iterate over the examples in the batch, pairing each onsource with its injection and mask.
for onsource_, injection, whitened_injection, masks_ in zip(
        onsource,
        injections,
        whitened_injections,
        injection_masks
    ):
    # Create a strain plot for each example, with a title indicating whether it contains an injection.
    generated_data_layout.append([
        gf.generate_strain_plot(
            {
                "Onsource": onsource_,
                "Whitened Injection": whitened_injection,
                "Injection": injection,
            },
            height=400,
            width=800,
            title=f"Example Output. Has Injection: {bool(masks_)}"
        )
    ])

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(generated_data_layout)
output_notebook()
show(grid)
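
The extraction step in the cell above relies on two details that are easy to miss: the dictionaries are keyed by the `.name` string of each enum member, and the output tensors carry a leading axis that is removed with `[0]`. The sketch below reproduces both with a hypothetical enum and NumPy arrays. Treating the leading axis as one entry per waveform generator is an assumption based on the shape `(1, 8, 1, 2048)` printed earlier.

```python
from enum import Enum, auto

import numpy as np

class ToyReturnVariables(Enum):
    """Hypothetical stand-in for gf.ReturnVariables."""
    WHITENED_ONSOURCE = auto()
    INJECTIONS = auto()

# The .name attribute of an enum member is its identifier as a string.
key = ToyReturnVariables.INJECTIONS.name

# Assumed layout: (generators, examples, detectors, samples).
toy_output_data = {key: np.zeros((1, 8, 1, 2048), dtype=np.float32)}

# Select the first (and only) generator, leaving one entry per example.
injections = toy_output_data[key][0]

# Iterating over the first axis now yields one (detectors, samples) array per example.
per_example_shapes = [injection.shape for injection in injections]
```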