# Notebook 2: Data Acquisition

In this notebook, we will explore how to generate noise and acquire real interferometer output data. In GravyFlow, datasets are constructed through composition. By combining various elements, such as noise, injections, and conditioning, we can create customized datasets tailored to our specific needs. 

**Note:** The iterators demonstrated in this notebook are not necessarily recommended for use in training machine learning models. Even if you only require noise, without any injected signals, it is probably more convenient to use `gf.Dataset`, which will be explained in a later notebook. However, instances of `gf.Dataset` are composed by combining iterators of the type produced in this notebook with iterators shown in subsequent notebooks. Therefore, understanding the function of these iterators is useful.

We will begin by performing the necessary imports:

In [1]:
# Built-in imports
from typing import List
from itertools import islice

# Dependency imports: 
import tensorflow as tf
from bokeh.io import show, output_notebook
from bokeh.layouts import gridplot

# Import the GravyFlow module.
import gravyflow as gf



## Set Up the Environment

As described in the first notebook, we should set up the environment using `gf.env` to ensure we work on an available GPU. This is crucial because GravyFlow is optimized for GPU-based computations, significantly accelerating data processing and analysis. Ensuring that a GPU is available and properly configured allows us to fully leverage this computational power, which is especially important when working with large datasets or complex machine learning models.

In [2]:
# Set up the environment using gf.env() and return a tf.distribute.Strategy object.
env : tf.distribute.Strategy = gf.env()

INFO:root:TensorFlow version: 2.15.0, CUDA version: 12.2
2024-02-28 05:39:24.120264: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:47] Overriding orig_value setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2024-02-28 05:39:24.120693: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 2000 MB memory:  -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:85:00.0, compute capability: 7.0
INFO:root:[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]


## Set GravyFlow Global Defaults

Often, when working within a single notebook or Python script, some variables remain constant throughout our analysis. To accommodate these scenarios, GravyFlow allows us to set a number of defaults in a global defaults class. The values that can be set in this way include:

- `seed` : int = 1000
  > This is the default random seed that GravyFlow uses to initialize the TensorFlow and NumPy random operations behind dataset shuffling, random noise generation, and waveform parameter randomization. Setting a consistent seed should result in consistent, repeatable outputs. The default value is 1000.
- `num_examples_per_generation_batch` : int = 2048
  > When acquiring real interferometer data, GravyFlow downloads data in batches. This is done for efficiency, to reduce the number of overall download requests and the overhead that comes with that. This parameter determines the number of training examples generated for each of those batches. When generating waveforms, they are also generated in batches of this number. The default value is 2048.
- `num_examples_per_batch` : int = 32
  > This parameter determines the number of examples output by each iteration of the GravyFlow generators. When GravyFlow is used to train a machine learning model, which is its primary design goal, this number should be set to your desired training batch size. The default value is 32.
- `sample_rate_hertz` : float = 2048.0
  > The default sample rate of the data input and output by GravyFlow in Hertz. The default value is 2048.0 Hz.
- `onsource_duration_seconds` : float = 1.0
    > The default duration of onsource data provided by GravyFlow iterators, in seconds. In GravyFlow, the onsource data is the data being analysed by your method, which may contain a gravitational wave signal. This is in contrast to the offsource data, which is assumed not to contain any significant data features and can be used as an example of uncontaminated noise for data conditioning purposes such as whitening. The default value is 1.0 s.
- `offsource_duration_seconds` : float = 16.0
    > The default duration of offsource data provided by GravyFlow iterators, in seconds. Offsource data is data that is assumed not to contain any significant features, and can be used as an example of uncontaminated noise for data conditioning purposes such as whitening. The default value is 16.0 s.
- `crop_duration_seconds` : float = 0.5
    > Some data conditioning operations (currently only whitening) create edge effects, which must be cropped before data analysis is performed. GravyFlow does this automatically. `crop_duration_seconds` defines how much data is cropped from each side of the onsource segment, in seconds, so the total cropped duration is 2 × `crop_duration_seconds`. The default value is 0.5 s.
- `scale_factor` : float = 1.0E21
    > When gathering data for use in machine learning applications, we want our values to be close to one, as activation functions such as ReLU, Softmax, and sigmoid are designed around this assumption. For that reason, we often want to scale our input data, which can be very small in the case of gravitational wave data. This value is used to scale both approximants and noise. The default value is 1.0E21.
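
To make the duration parameters concrete, the short calculation below converts the default durations into sample counts. It is only a sketch: it assumes that each onsource example carries a crop margin of `crop_duration_seconds` on both sides before conditioning, as described above, and the variable names are chosen for illustration rather than taken from GravyFlow.

```python
# Values taken from the defaults listed above.
sample_rate_hertz = 2048.0
onsource_duration_seconds = 1.0
offsource_duration_seconds = 16.0
crop_duration_seconds = 0.5

# Assumption: the onsource segment is padded by the crop duration on each side.
padded_onsource_seconds = onsource_duration_seconds + 2.0 * crop_duration_seconds

# Convert each duration into a number of samples.
num_padded_onsource_samples = int(padded_onsource_seconds * sample_rate_hertz)
num_cropped_onsource_samples = int(onsource_duration_seconds * sample_rate_hertz)
num_offsource_samples = int(offsource_duration_seconds * sample_rate_hertz)

print(
    num_padded_onsource_samples,
    num_cropped_onsource_samples,
    num_offsource_samples
)
# prints: 4096 2048 32768
```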

### Important Consideration

Setting global variables like this can be problematic if you plan to use different values within the same Python script or notebook. There is a risk of forgetting to reset variables for some functions, which may lead to errors. If you intend to work with data that varies in any of these parameters, it is recommended to pass them as arguments to the corresponding function, instead of relying on the default values.

Below we set some of these values to illustrate how they can be defined:

In [3]:
# Here we set the default GravyFlow values. In this example, they are not changed from the defaults,
# but this illustrates how you can set them.
gf.Defaults.set(
    sample_rate_hertz=2048.0,
    onsource_duration_seconds=1.0,
    offsource_duration_seconds=16.0,
    crop_duration_seconds=0.5,
    scale_factor=1.0E21
)

## Types of Noise

GravyFlow currently supports four types of noise. Each type of noise has an associated GravyFlow enum. These are:

1. White Gaussian Noise: `gf.NoiseType.WHITE`
   > Simple white noise with a Gaussian distribution.

2. Coloured Gaussian Noise: `gf.NoiseType.COLORED`
   > White Gaussian noise coloured by the specified detector's design PSD (Power Spectral Density).

3. Pseudo-Real Noise: `gf.NoiseType.PSEUDO_REAL`
   > White Gaussian noise coloured by a PSD of data drawn from the detector. This type of noise can simulate the variance and non-stationary nature of real detector noise without including as many non-linearities.

4. Real Noise: `gf.NoiseType.REAL`
   > Real noise acquired from the specified interferometer. Evidently, this has the advantage of being the most realistic type of noise, but the disadvantage that it can lead to greater overfitting to the specific noise characteristics.
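
The difference between white and coloured Gaussian noise can be sketched with plain NumPy. This is not GravyFlow's implementation: the `toy_psd` function below is a made-up spectrum used purely for illustration, not a detector design PSD, and the normalisation is not in physical units. The idea is to draw white noise, move to the frequency domain, multiply by the square root of the PSD, and transform back.

```python
import numpy as np

def toy_psd(frequencies_hertz):
    # Hypothetical spectrum: flat at low frequency, falling off above ~50 Hz.
    return 1.0 / (1.0 + (frequencies_hertz / 50.0) ** 2)

def colour_noise(white, sample_rate_hertz, psd_function):
    # Frequencies corresponding to each bin of the real-valued FFT.
    frequencies = np.fft.rfftfreq(white.size, d=1.0 / sample_rate_hertz)
    spectrum = np.fft.rfft(white)
    # Scale each frequency bin by the amplitude spectral density (sqrt of PSD).
    shaped = spectrum * np.sqrt(psd_function(frequencies))
    return np.fft.irfft(shaped, n=white.size)

rng = np.random.default_rng(1000)
sample_rate_hertz = 2048.0

# One second of white Gaussian noise with unit variance.
white = rng.standard_normal(2048)
coloured = colour_noise(white, sample_rate_hertz, toy_psd)

# Compare the average power in a low and a high frequency band.
power = np.abs(np.fft.rfft(coloured)) ** 2
frequencies = np.fft.rfftfreq(coloured.size, d=1.0 / sample_rate_hertz)
low_band_power = power[frequencies < 50.0].mean()
high_band_power = power[frequencies > 500.0].mean()
print(low_band_power > high_band_power)
```

With this toy spectrum the coloured series keeps most of its power at low frequencies, whereas the white input has roughly equal power in every band.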

## Generating Simulated Noise

The first two types of noise, White Gaussian Noise and Coloured Gaussian Noise, can be generated using only a `gf.NoiseObtainer` object. `gf.NoiseObtainer` takes several arguments:

- `data_directory_path` : Path, default = `Path("./generator_data")`
  > Specifies the directory path where the NoiseObtainer will cache downloaded noise, applicable for Real or Pseudo-Real noise. The default value is `Path("./generator_data")`.

- `ifo_data_obtainer` : Union[None, gf.IFODataObtainer], default = `None`
  > Required for real or pseudo-real noise. A `gf.IFODataObtainer` object manages the acquisition of interferometer data, which will be detailed later in this notebook. The default value is `None`.

- `ifos` : Union[gf.IFO, List[gf.IFO]], default = `gf.IFO.L1`
  > An interferometer, or a list of interferometers, from which to simulate or acquire noise. GravyFlow currently supports three IFOs, each represented by an enum: LIGO Livingston (`gf.IFO.L1`), LIGO Hanford (`gf.IFO.H1`), and Virgo (`gf.IFO.V1`). The default value is `gf.IFO.L1`.

- `noise_type` : gf.NoiseType, default = `gf.NoiseType.REAL`
  > Determines the type of noise to simulate, as discussed above. Options include `gf.NoiseType.WHITE`, `gf.NoiseType.COLORED`, `gf.NoiseType.PSEUDO_REAL`, and `gf.NoiseType.REAL`. The default value is `gf.NoiseType.REAL`.

- `groups` : Union[dict, None], default = `{"train" : 0.98, "validate" : 0.01, "test" : 0.01}`
  > Allows the creation of distinct groups within real data segments. This is useful for separating training and testing data to ensure no overlap. By default, this parameter sets up train, validate, and test groups. Note that changes to this dictionary will affect the consistency of group assignments across different analyses. This parameter has no effect for `gf.NoiseType.WHITE` or `gf.NoiseType.COLORED`. The default value, which is generated after object initialization, is `{"train" : 0.98, "validate" : 0.01, "test" : 0.01}`.
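
To illustrate why a fixed `groups` dictionary matters for reproducibility, the sketch below assigns hypothetical segments to groups using a seeded random draw. It is an assumption-laden illustration and not GravyFlow's actual assignment scheme: the function name `assign_groups` and the use of `numpy.random.Generator.choice` are choices made only for this example.

```python
import numpy as np

def assign_groups(num_segments, groups, seed):
    # Normalise the fractions so that they sum to one.
    names = list(groups)
    fractions = np.array([groups[name] for name in names], dtype=float)
    fractions = fractions / fractions.sum()

    # A seeded generator gives the same assignment on every run.
    rng = np.random.default_rng(seed)
    indices = rng.choice(len(names), size=num_segments, p=fractions)
    return [names[index] for index in indices]

groups = {"train": 0.98, "validate": 0.01, "test": 0.01}

first_run = assign_groups(1000, groups, seed=1000)
second_run = assign_groups(1000, groups, seed=1000)

# The same dictionary and seed reproduce the same split.
print(first_run == second_run)
# prints: True
```

Changing the fractions or the seed generally changes which segment lands in which group, which is why a split should be kept fixed across analyses that are meant to be compared.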

We can initialize a `gf.NoiseObtainer` object like so:

In [4]:
# Initialize the white noise generator:
white_noise : gf.NoiseObtainer = gf.NoiseObtainer(
    # In white noise generation, the only parameter we need to set is the noise type.
    noise_type=gf.NoiseType.WHITE
)

We can then create a noise generator by calling the `white_noise` object. An initialized `gf.NoiseObtainer` accepts the following arguments when called:

- `sample_rate_hertz` : `Union[float, None]` = `None`
    > The sample rate of the output noise. If `None`, which is the default, this value reverts to the default set in `gf.Defaults`.
- `onsource_duration_seconds` : `Union[float, None]` = `None`
    > The duration of the onsource noise, in seconds. If `None`, which is the default, this value reverts to the default set in `gf.Defaults`.
- `crop_duration_seconds` : `Union[float, None]` = `None`
    > A crop duration can also be added, for consistency with the rest of the pipeline. This provides extra noise equivalent to 2 × `crop_duration_seconds` that can be cropped after data conditioning. If `None`, which is the default, this value reverts to the default set in `gf.Defaults`.
- `offsource_duration_seconds` : `Union[float, None]` = `None`
    > The duration of offsource noise, in seconds. If `None`, which is the default, this value reverts to the default set in `gf.Defaults`.
- `num_examples_per_batch` : `Union[int, None]` = `None`
    > The number of noise examples provided each time this iterator is called. If `None`, which is the default, this value reverts to the default set in `gf.Defaults`.
- `scale_factor` : `float` = 1.0
    > The scale factor to multiply the noise. Unlike the other values, this is 1.0 by default, as we usually scale at another point in our pipeline.
- `group` : `str` = `"train"`
    > This parameter designates which group to draw real data from in the real or pseudo-real case. See the description of the `groups` parameter above. This parameter will do nothing when supplied for `gf.NoiseType.WHITE` or `gf.NoiseType.COLORED` noise.
- `seed` : `Union[int, None]` = `None`
    > This sets the random seed for any random values used in the noise generation or data acquisition process. Note that if this is left as `None`, the seed set in `gf.Defaults` is used, which is 1000 by default, so results will still be deterministic. If pseudo-stochastic results are desired, the default seed in `gf.Defaults` should be set to `None`.
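
The "`None` reverts to the global default" behaviour described above is a common Python pattern, sketched below with a stand-in class. `ToyDefaults` and `resolve_settings` are hypothetical names invented for this example and are not part of GravyFlow. The sketch also shows the risk noted earlier: changing a global default silently changes the result of every later call that relies on it.

```python
from typing import Optional

class ToyDefaults:
    # Hypothetical stand-in for a global defaults store.
    sample_rate_hertz: float = 2048.0
    num_examples_per_batch: int = 32

    @classmethod
    def set(cls, **kwargs) -> None:
        # Update only the defaults that already exist.
        for name, value in kwargs.items():
            if not hasattr(cls, name):
                raise AttributeError(f"Unknown default: {name}")
            setattr(cls, name, value)

def resolve_settings(
        sample_rate_hertz: Optional[float] = None,
        num_examples_per_batch: Optional[int] = None
    ) -> dict:
    # Any argument left as None falls back to the global default.
    if sample_rate_hertz is None:
        sample_rate_hertz = ToyDefaults.sample_rate_hertz
    if num_examples_per_batch is None:
        num_examples_per_batch = ToyDefaults.num_examples_per_batch
    return {
        "sample_rate_hertz": sample_rate_hertz,
        "num_examples_per_batch": num_examples_per_batch
    }

implicit = resolve_settings()
explicit = resolve_settings(num_examples_per_batch=1)

# Changing a global default affects all later calls that rely on it.
ToyDefaults.set(sample_rate_hertz=4096.0)
after_change = resolve_settings()

print(implicit["sample_rate_hertz"], after_change["sample_rate_hertz"])
# prints: 2048.0 4096.0
```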

Next, we will demonstrate some examples of GravyFlow generating some noise:

### White Noise Example

Since we have previously defined all our parameters, we can generate white noise without setting any additional parameters. However, since we only want to generate one example, we set `num_examples_per_batch=1`.

Since a call to a `gf.NoiseObtainer` returns a Python iterator, as it's primarily designed for use in a loop, we cannot use the object as is, or index into the object directly. We can use Python's built-in `next` function to retrieve the next item returned from the generator.
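
As a reminder of how Python iterators behave, here is a minimal sketch using a toy generator in place of a `gf.NoiseObtainer` call. The generator `toy_noise_iterator` is hypothetical, and its array shapes are illustrative only; they are not claimed to match the shapes GravyFlow returns.

```python
import numpy as np

def toy_noise_iterator(num_examples_per_batch=32, num_samples=2048, seed=1000):
    # Hypothetical generator yielding (onsource, offsource, gps_times) batches.
    rng = np.random.default_rng(seed)
    while True:
        onsource = rng.standard_normal((num_examples_per_batch, num_samples))
        offsource = rng.standard_normal((num_examples_per_batch, 16 * num_samples))
        # Simulated noise has no GPS time, so None is yielded in its place.
        yield onsource, offsource, None

iterator = toy_noise_iterator(num_examples_per_batch=1)

# A generator cannot be indexed, but next() retrieves one batch at a time.
onsource, offsource, gps_times = next(iterator)
print(onsource.shape, offsource.shape, gps_times)
# prints: (1, 2048) (1, 32768) None
```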

**Note:** Currently, TensorFlow raises an error the first time you attempt to initialize an environment in a Jupyter notebook. If this occurs, please run the cell again.

In [5]:
# Using the environment 'env' created earlier with gf.env()
with env:
    # Generate white noise by calling the white_noise object with one example per 
    # batch. The next() function is used to retrieve the generated noise data.
    white_onsource, white_offsource, _ = next(
        white_noise(num_examples_per_batch=1)
    )
    
    # In the case of real and pseudo-real noise, the third element returned would 
    # be the GPS start time of the real noise segment. For simulated noise, such 
    # as white noise, this simply returns None.

We can then plot these results. GravyFlow includes a number of plotting functions to aid visualisation. These functions utilise Bokeh, a plotting library that can create interactive HTML plots.

In [6]:
# Generate a plot for the onsource white noise strain.
# The first element of white_onsource (white_onsource[0]) is used for plotting.
white_onsource_strain_plot = gf.generate_strain_plot(
    {"Onsource Noise": white_onsource[0]},
    title="Onsource White Noise"
)

# Generate a plot for the offsource white noise strain.
# The first element of white_offsource (white_offsource[0]) is used for plotting.
white_offsource_strain_plot = gf.generate_strain_plot(
    {"Offsource Noise": white_offsource[0]},
    title="Offsource White Noise"
)

# Create a layout for the plots. 
# This layout arranges the onsource and offsource strain plots side by side.
white_plot_layout: List = [
    [white_onsource_strain_plot, white_offsource_strain_plot]
]

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(white_plot_layout)
output_notebook()
show(grid)

### Coloured Noise Example

Generating coloured noise is very similar to generating white noise. However, in this case, we must specify which interferometer we wish to simulate. GravyFlow currently uses the O3 detector PSD specification to colour the noise based on the chosen detector.

In [8]:
# Initialize the colored noise generator:
# For colored noise, we need to specify an interferometer.
colored_noise: gf.NoiseObtainer = gf.NoiseObtainer(
    noise_type=gf.NoiseType.COLORED,
    ifos=gf.IFO.L1  # Specify the interferometer, e.g., LIGO Livingston (L1).
)

# Using the environment 'env' created earlier with gf.env()
with env:
    # Generate colored noise by calling the colored_noise object with one example 
    # per batch. The next() function is used to retrieve the generated noise data.
    colored_onsource, colored_offsource, _ = next(
        colored_noise(num_examples_per_batch=1)
    )
    
    # In the case of real and pseudo-real noise, the third element returned would 
    # be the GPS start time of the real noise segment. For simulated noise, such 
    # as colored noise, this simply returns None.



Again, we can plot these results:

In [9]:
# Generate a plot for the onsource colored noise strain.
# The first element of colored_onsource (colored_onsource[0]) is used for plotting.
colored_onsource_strain_plot = gf.generate_strain_plot(
    {"Onsource Noise": colored_onsource[0]},
    title="Onsource Coloured Noise"
)

# Generate a plot for the offsource colored noise strain.
# The first element of colored_offsource (colored_offsource[0]) is used for plotting.
colored_offsource_strain_plot = gf.generate_strain_plot(
    {"Offsource Noise": colored_offsource[0]},
    title="Offsource Coloured Noise"
)

# Create a layout for the plots. 
# This layout arranges the onsource and offsource strain plots side by side.
colored_plot_layout: List = [
    [colored_onsource_strain_plot, colored_offsource_strain_plot]
]

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(colored_plot_layout)
output_notebook()
show(grid)

## Obtaining Real Noise

If we want to generate pseudo-real noise from real data or obtain samples of real data, we must create an additional object to pass to the noise generator: an instance of `gf.IFODataObtainer`. This object takes various parameters that specify which data to collect:

- `observing_runs` : Union[`gf.ObservingRun`, List[`gf.ObservingRun`]]:
  > Specify which observing run you would like to sample data from (multiple observing runs not yet supported). By default, the data obtainer will pull a random sample from a random time during the chosen observing run, which satisfies the other selection criteria given by data quality and data labels. If you want to retrieve data from a custom GPS range, this can be achieved using the overrides dictionary.

- `data_quality` : `gf.DataQuality`:
  > Specify what kind of data to acquire. Currently, only supports `gf.DataQuality.BEST`, which retrieves the cleaned output channels with lines removed.

- `data_labels` : Union[`gf.DataLabel`, List[`gf.DataLabel`]]:
  > The data labels parameter specifies which features to include or exclude from our sample pool. The three types of data labels are `gf.DataLabel.NOISE`, `gf.DataLabel.EVENT`, and `gf.DataLabel.GLITCHES`. Glitches are mapped using the Gravity Spy glitch database, and events are any event or candidate event listed in a GWTC catalogue. For example, to return only noise and glitches, use this list: `[gf.DataLabel.NOISE, gf.DataLabel.GLITCHES]`, which excludes known and possible event times from the returned data. If you wish to exclude glitches, use just `gf.DataLabel.NOISE`. Note that excluding glitches will slightly increase preprocessing time, as they are numerous. Currently, the noise obtainer does not support extracting features only, such as only events or only glitches.

- `segment_order` : gf.SegmentOrder = `gf.SegmentOrder.RANDOM`:
  > This parameter specifies the order in which segments are retrieved by the iterator. Options are `gf.SegmentOrder.RANDOM`, where the order is randomized deterministically based on the current seed, `gf.SegmentOrder.CHRONOLOGICAL`, where segments are returned in order of their GPS times, and `gf.SegmentOrder.SHORTEST_FIRST`, where the shortest segment is returned first. This is primarily used for debugging. `gf.SegmentOrder.RANDOM` is recommended for most use cases. The default value is `gf.SegmentOrder.RANDOM`.

- `max_segment_duration_seconds` : float = 2048.0:
  > This parameter determines the maximum length of downloaded data segments. GravyFlow downloads data in segments and then divides each segment into smaller examples until that segment is exhausted, at which point it downloads the next segment. This approach means that consecutive examples will be drawn from similar GPS times, no more than this value apart, which reduces mixing. If a greater mix of data from across your input range is desired, use a lower value for this number. Note that this will increase data acquisition overhead. If less mixing is necessary, experiment with larger values. Be aware that larger values will result in higher memory usage. The default value is 2048.0 s.

- `saturation` : float = 1.0:
  > This parameter determines how many examples to create from every downloaded segment. If this is one, then one second of example data will be generated for every second of segment data. The default value is 1.0.

- `force_acquisition` : bool = False:
  > If true, this parameter forces the data_obtainer to acquire and save new segment data even if it finds cached segment data. The default is False.

- `cache_segments` : bool = True:
  > If true, this parameter will cause the IFODataObtainer object to save downloaded segments to an HDF5 file in a location specified by its containing DataObtainer. When running the iterator for a second time with the same parameters, the IFODataObtainer will load the saved data rather than downloading it again, unless `force_acquisition` is True. The default is True.

- `overrides` : dict = None:
  > This parameter lets you set more specific GPS time ranges by overriding the values implied by the supplied `gf.ObservingRun` enum. For example, this override dictionary could be passed to restrict the GPS times to a specific range: `{"start_gps_times": ...}`

- `logging_level` : int = `logging.WARNING`:
  > Specifies the logging level for the data acquisition process.
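
The effect of `max_segment_duration_seconds` and `segment_order` can be sketched with plain Python. This is an illustration under stated assumptions, not GravyFlow's code: the helper names, the string order labels, and the GPS-like times are all made up for the example.

```python
import random

def split_segment(start, end, max_duration):
    # Break one long segment into chunks no longer than max_duration.
    chunks = []
    while start < end:
        chunk_end = min(start + max_duration, end)
        chunks.append((start, chunk_end))
        start = chunk_end
    return chunks

def order_segments(segments, order, seed=1000):
    if order == "chronological":
        return sorted(segments, key=lambda segment: segment[0])
    if order == "shortest_first":
        return sorted(segments, key=lambda segment: segment[1] - segment[0])
    if order == "random":
        # A seeded shuffle is random-looking but repeatable.
        shuffled = list(segments)
        random.Random(seed).shuffle(shuffled)
        return shuffled
    raise ValueError(f"Unknown order: {order}")

# Hypothetical (start, end) times in seconds.
raw_segments = [(1000.0, 6000.0), (100.0, 600.0)]

segments = []
for start, end in raw_segments:
    segments.extend(split_segment(start, end, max_duration=2048.0))

chronological = order_segments(segments, "chronological")
shortest_first = order_segments(segments, "shortest_first")
randomized = order_segments(segments, "random")

print(chronological)
# prints: [(100.0, 600.0), (1000.0, 3048.0), (3048.0, 5096.0), (5096.0, 6000.0)]
```

A smaller maximum duration produces more, shorter chunks, so a random ordering mixes data from across the time range more thoroughly at the cost of more acquisition requests.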

In [10]:
# Setup the IFODataObtainer object:
# This object will be used to obtain real interferometer data based on specified parameters.
ifo_data_obtainer: gf.IFODataObtainer = gf.IFODataObtainer(
    observing_runs=gf.ObservingRun.O3, # Specify the observing run (e.g., O3).
    data_quality=gf.DataQuality.BEST,  # Choose the quality of the data (e.g., BEST).
    data_labels=[                      # Define the types of data to include.
        gf.DataLabel.NOISE, 
        gf.DataLabel.GLITCHES
    ],
    segment_order=gf.SegmentOrder.RANDOM, # Order of segment retrieval (e.g., RANDOM).
    force_acquisition=True,               # Force the acquisition of new data.
    cache_segments=False                  # Choose not to cache the segments.
)

With the `gf.IFODataObtainer` object set up, we can now proceed to initialize our noise generator. The `gf.NoiseObtainer` will use the `gf.IFODataObtainer` instance to pull real noise data based on our specified criteria. This approach allows us to draw noise samples directly from observational data.

In this step, we also specify the type of noise we wish to generate (`gf.NoiseType.REAL`) and select the interferometer (`gf.IFO.L1`). By choosing `gf.NoiseType.REAL`, we ensure that the noise samples are pulled from real interferometric data, providing us with authentic noise characteristics for our analysis.

In [11]:
# Initialize the noise generator wrapper:
# This wrapper will use the ifo_data_obtainer to generate real noise based on the specified parameters.
noise: gf.NoiseObtainer = gf.NoiseObtainer(
    ifo_data_obtainer=ifo_data_obtainer, # Use the previously set up IFODataObtainer object.
    noise_type=gf.NoiseType.REAL,        # Specify the type of noise as REAL.
    ifos=gf.IFO.L1                       # Specify the interferometer (e.g., LIGO Livingston L1).
)

We can then use this object to create a Python generator, just as before:

In [12]:
# Use the TensorFlow environment 'env' created earlier with gf.env()
with env:
    # Generate noise by calling the noise object with one example per batch.
    # The next() function retrieves the generated noise data.
    # This returns onsource noise, offsource noise, and GPS time of the noise segment.
    onsource, offsource, gps_times = next(noise(num_examples_per_batch=1))



Similarly, we can plot the resultant real noise:

In [13]:
# Generate a plot for the onsource noise strain.
# The first element of onsource (onsource[0]) is used for plotting,
# and the corresponding GPS time is included in the title.
onsource_gps_time = gps_times[0][0].numpy()
onsource_strain_plot = gf.generate_strain_plot(
    {"Onsource Noise": onsource[0]},
    title=f"Onsource Real Noise at {onsource_gps_time:.0f} s"
)

# Generate a plot for the offsource noise strain.
# The first element of offsource (offsource[0]) is used for plotting,
# and the corresponding GPS time is included in the title.
offsource_gps_time = gps_times[0][0].numpy() - 16.5
offsource_strain_plot = gf.generate_strain_plot(
    {"Offsource Noise": offsource[0]},
    title=f"Offsource Real Noise at {offsource_gps_time:.0f} s"
)

# Create a layout for the plots.
# This layout arranges the onsource and offsource strain plots side by side.
layout: List = [[onsource_strain_plot, offsource_strain_plot]]

# Arrange the plots in a grid layout and display them in the notebook.
grid = gridplot(layout)
output_notebook()
show(grid)

The process for acquiring pseudo-real noise is identical to that for real noise, using `gf.NoiseType.PSEUDO_REAL` rather than `gf.NoiseType.REAL`.
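
The idea behind pseudo-real noise, colouring white Gaussian noise with a PSD measured from data, can be sketched in NumPy as follows. This is a simplified illustration and not GravyFlow's method: the "reference" series is synthetic white noise standing in for detector data, the PSD is a basic averaged periodogram with a Hann window, and it is normalised so that unit-variance white noise gives a flat value of one rather than being expressed in physical units.

```python
import numpy as np

def estimate_psd(data, segment_length):
    # Split the data into equal, non-overlapping segments.
    num_segments = data.size // segment_length
    segments = data[: num_segments * segment_length].reshape(
        num_segments, segment_length
    )
    # Window each segment, then average the periodograms.
    window = np.hanning(segment_length)
    spectra = np.fft.rfft(segments * window, axis=1)
    return np.mean(np.abs(spectra) ** 2, axis=0) / np.sum(window ** 2)

def colour_with_psd(white, psd):
    # Scale each frequency bin of the white noise by the sqrt of the PSD.
    return np.fft.irfft(np.fft.rfft(white) * np.sqrt(psd), n=white.size)

rng = np.random.default_rng(1000)
num_samples = 2048

# Synthetic stand-in for 16 s of detector data, with standard deviation 3.
reference = 3.0 * rng.standard_normal(16 * num_samples)

psd = estimate_psd(reference, num_samples)
pseudo_real = colour_with_psd(rng.standard_normal(num_samples), psd)

# The new noise should have a standard deviation close to the reference.
print(round(float(pseudo_real.std())))
```

Because a fresh PSD is measured for each stretch of reference data, noise generated this way follows slow changes in the reference spectrum, while the samples themselves remain Gaussian.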

## Using Noise Objects as Iterators

Now that we have set up our noise generator, we can use it as an iterator to generate multiple samples of noise. This is particularly useful when we need to process or analyze a series of noise examples, such as for training machine learning models or conducting statistical analyses. In this section, we will demonstrate how to iterate over the `gf.NoiseObtainer` object to generate a specified number of noise samples.

By iterating over the noise() object within a for loop, we can efficiently generate multiple sets of onsource and offsource noise, along with their corresponding GPS times. This approach allows us to handle each noise sample individually, providing flexibility for a wide range of applications.

Let's go through a simple example where we iterate over the noise generator to retrieve multiple noise samples:

In [14]:
# Set the number of iterations - the number of batches of noise samples we want 
# to generate:
num_iterations: int = 16

# Using the TensorFlow environment 'env' created earlier with gf.env()
with env: 
    # Iterate over the noise generator using islice to limit the number of iterations
    for onsource, offsource, gps_times in islice(noise(), num_iterations):
        # For each iteration, we receive a batch of onsource and offsource noise 
        # data samples along with the corresponding GPS times.
        # Here, we simply print a message indicating the reception of new noise 
        # examples.
        print(
            (f"Got {onsource.shape[0]} more noise examples! Should probably do "
             "something useful with them!")
        )



Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise examples! Should probably do something useful with them!
Got 32 more noise example

## Acquiring Data from Multiple Interferometers

In addition to generating noise samples from a single interferometer, GravyFlow provides the capability to acquire data from multiple interferometers simultaneously. By specifying multiple interferometers in the `gf.NoiseObtainer` object, we can seamlessly retrieve noise data from each of them within the same workflow.

When assigning segment times, GravyFlow creates splits that are present in all detectors and pairs valid segments together. These paired segments are then retrieved together when outputs from multiple detectors are requested.
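
Finding times that are present in all detectors amounts to intersecting lists of time intervals. The sketch below shows one standard way to do this for two detectors. It is not taken from GravyFlow: the function name and the segment times are hypothetical, and it assumes that the segments within each list do not overlap one another.

```python
def intersect_segments(first, second):
    # Both lists hold (start, end) tuples; sort them by start time.
    first = sorted(first)
    second = sorted(second)

    common = []
    i = j = 0
    while i < len(first) and j < len(second):
        # The overlap of the two current segments, if there is one.
        start = max(first[i][0], second[j][0])
        end = min(first[i][1], second[j][1])
        if start < end:
            common.append((start, end))

        # Advance whichever segment finishes first.
        if first[i][1] < second[j][1]:
            i += 1
        else:
            j += 1
    return common

# Hypothetical valid segments for two detectors, in seconds.
l1_segments = [(0.0, 100.0), (150.0, 300.0)]
h1_segments = [(50.0, 200.0), (250.0, 400.0)]

common_segments = intersect_segments(l1_segments, h1_segments)
print(common_segments)
# prints: [(50.0, 100.0), (150.0, 200.0), (250.0, 300.0)]
```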

Let's set up a `gf.NoiseObtainer` object to acquire real noise data from both LIGO Livingston (L1) and LIGO Hanford (H1) interferometers:

In [15]:
# Initialize a NoiseObtainer object to acquire data from multiple interferometers
multi_ifo_noise: gf.NoiseObtainer = gf.NoiseObtainer(
    ifo_data_obtainer=ifo_data_obtainer,  # Reuse the previously set up IFODataObtainer object.
    noise_type=gf.NoiseType.REAL,         # Specify the type of noise as REAL.
    ifos=[gf.IFO.L1, gf.IFO.H1]           # Specify multiple interferometers (LIGO Livingston and LIGO Hanford).
)

We then call the `gf.NoiseObtainer` object to create a Python generator and retrieve one element from that generator:

In [16]:
# Using the multi_ifo_noise object to generate a batch of noise samples
# from multiple interferometers (LIGO Livingston and LIGO Hanford).
# The next() function is used to retrieve the generated noise data.
multi_onsource, multi_offsource, multi_gps_times = next(
    multi_ifo_noise(num_examples_per_batch=1)
)

# This operation will yield onsource noise, offsource noise, and GPS times
# for the noise segment from both interferometers.



And plot the multi-detector noise:

In [17]:
# Generate a strain plot for the onsource noise obtained from multiple interferometers.
# The first element of multi_onsource (multi_onsource[0]) is used for plotting,
# and the corresponding GPS time is included in the title.
multi_onsource_strain_plot = gf.generate_strain_plot(
    {"Onsource Noise": multi_onsource[0]},
    title=[
        f"Onsource Real L1 Noise at {multi_gps_times[0][0]:.0f} s",
        f"Onsource Real H1 Noise at {multi_gps_times[0][1]:.0f} s",
    ]
)

# Generate a strain plot for the offsource noise obtained from multiple interferometers.
# The first element of multi_offsource (multi_offsource[0]) is used for plotting,
# and the corresponding GPS time is included in the title.
multi_offsource_strain_plot = gf.generate_strain_plot(
    {"Offsource Noise": multi_offsource[0]},
    title=[
        f"Offsource Real L1 Noise at {multi_gps_times[0][0]:.0f} s",
        f"Offsource Real H1 Noise at {multi_gps_times[0][1]:.0f} s",
    ]
)

# Create a layout for the plots. 
# This layout arranges the onsource and offsource strain plots from multiple interferometers 
# side by side.
multi_layout: List = [[multi_onsource_strain_plot, multi_offsource_strain_plot]]

# Arrange the plots in a grid layout and display them in the notebook.
multi_grid = gridplot(multi_layout)
output_notebook()
show(multi_grid)

## Summary of Notebook 2: Data Acquisition

In this notebook, we have explored various functionalities of GravyFlow, particularly in the context of noise acquisition and processing. Key highlights include:

1. **Global Defaults**: Setting global defaults in GravyFlow for consistent parameters across our analysis.

2. **Noise Generation**: We learned how to generate different types of noise, including White and Coloured Gaussian Noise, using `gf.NoiseObtainer`.

3. **Iterating Over Noise Samples**: Demonstrated the use of `gf.NoiseObtainer` as an iterator to generate multiple noise samples, ideal for batch processing or model training.

4. **Multi-Interferometer Data Acquisition**: Showcased the capability to acquire noise data from multiple interferometers simultaneously, enhancing the depth of analysis.

5. **Visualization with Strain Plots**: Utilized GravyFlow's plotting capabilities to visualize onsource and offsource noise data.

6. **Real Noise Data Handling**: Explored the use of `gf.IFODataObtainer` for acquiring real noise data, along with the handling of GPS times and segment pairing.

The insights gained from this notebook lay the foundation for our next steps in gravitational wave data analysis.

In the next notebook, we will delve into the generation of gravitational waveforms. Building on the noise data acquired and processed in this notebook, we will focus on creating and injecting waveforms into the background noise. This process is crucial for simulating realistic gravitational wave signals and preparing datasets for tasks like signal detection and parameter estimation.

We will explore various waveform models available in GravyFlow, understand how to parameterize these waveforms, and learn techniques for effectively injecting them into noise data. The goal is to create a robust and realistic dataset that mimics actual gravitational wave signals, paving the way for advanced analyses and model training.

## Appendix: Exploring The Data Acquisition Process

In [None]:
# Import the standard library logging module, used to set the logging level.
import logging

# Set up the IFODataObtainer object:
# This object will be used to obtain real interferometer data based on specified parameters.
ifo_data_obtainer: gf.IFODataObtainer = gf.IFODataObtainer(
    observing_runs=gf.ObservingRun.O3,         # Specify the observing run (e.g., O3).
    data_quality=gf.DataQuality.BEST,          # Choose the quality of the data (e.g., BEST).
    data_labels=[                              # Define the types of data to include.
        gf.DataLabel.NOISE, 
        gf.DataLabel.GLITCHES
    ],
    segment_order=gf.SegmentOrder.RANDOM,      # Order of segment retrieval (e.g., RANDOM).
    force_acquisition=True,                    # Force the acquisition of new data.
    cache_segments=False,                      # Choose not to cache the segments.
    logging_level=logging.INFO                 # Log informational messages during acquisition.
)