

![KLE TU](https://pbs.twimg.com/media/CEvKZ8CUsAAj3Ll.jpg)
<h2 style="text-align:center;">Data Mining and Analysis Course Project ( 2021 )</h2>

***
<h3 style="text-align:center;"> Team - 5D03</h3>

- [Avantika Shrivastava](https://www.kaggle.com/avantikashrivastava) *- 01FE19BCS253* 
- [Tanmayi Shurpali](https://www.kaggle.com/t01fe19bcs238)     *- 01FE19BCS238*
- [Shrinidhi Kulkarni](https://www.kaggle.com/shrinidhi05)   *- 01FE19BCS241* 
- [Bhavana Kumbar](https://www.kaggle.com/bhavanakumbar)        *- 01FE19BCS244*

***

## Contents of this Notebook
1. Imports
2. Reading DataSet
3. Explorative Data Analysis
4. Preprocessing
5. Model Training
6. Evaluation
7. Results and Conclusion

***
***Kaggle Challenge Name:*** [G2Net Gravitational Wave Detection](https://www.kaggle.com/c/g2net-gravitational-wave-detection) - (Submission Deadline : 30th September, 2021)
***
***Introduction*** : 
*Gravitational waves have been discussed since the beginning of the 20th century and studied scientifically since Einstein's General Theory of Relativity. They are produced when massive celestial bodies, such as **neutron stars** or **black holes**, accelerate, and they propagate through the curvature of space-time at the speed of light. These disturbances can travel across the observable universe, but they become extremely weak as they spread out from their source. A useful analogy is a pebble thrown into a pond: the point where the pebble hits the water is the source of the disturbance, and the outgoing ripples, which get weaker as they move away from the source, are the gravitational waves. In February 2016, the Laser Interferometer Gravitational-Wave Observatory **(LIGO) Scientific Collaboration and the Virgo Collaboration** announced the first observation of a gravitational-wave (GW) signal from a **stellar-mass Compact Binary Coalescence (CBC) system**. Despite these initial successes, GW astronomy still faces many challenges. Because machine learning (ML) algorithms are effective at identifying patterns in data, ML techniques may be harnessed to make GW searches more sensitive and robust. Applications of ML to GW searches range from automated data analysis methods for low-latency pipelines to distinguishing terrestrial noise from astrophysical signals and extending the reach of searches.*


![KLE TU](https://media2.giphy.com/media/xT9IgoYWAh5lYliiYM/giphy.gif)


***Problem Statement*** : *To preprocess the data, then build, train and evaluate a binary classification model that predicts whether a given set of signals contains a gravitational wave.*

***Objectives :*** 
1. *To understand and visualize the raw data using EDA methods*
2. *To build and train a model using deep learning techniques*
3. *To evaluate the model using metrics such as ROC AUC (area under the receiver operating characteristic curve)*
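ROC AUC can be read as the probability that a randomly chosen positive sample (GW present) receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition with NumPy, so the metric is concrete before it is used. The function name `pairwise_roc_auc` and the toy scores are our own. In practice, `sklearn.metrics.roc_auc_score` or Keras' `AUC` metric compute the same quantity far more efficiently.

```python
import numpy as np

def pairwise_roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Uses O(P * N) memory, so it is only meant for small illustrative inputs.
    """
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score via broadcasting
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return correct / (len(pos) * len(neg))

# Toy example: 3 of the 4 (positive, negative) pairs are ordered correctly
auc = pairwise_roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

A score of 0.5 corresponds to random ranking and 1.0 to perfect separation. This is why AUC, rather than accuracy alone, is used to judge the probabilistic predictions.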

***
Note: This notebook was developed and run in the Kaggle notebook environment.

# 1. Imports

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing
import matplotlib.pyplot as plt # plotting tools
%matplotlib inline
import seaborn as sns
sns.set()
plt.rcParams["axes.grid"] = False

import matplotlib.mlab as mlab
from scipy import signal
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt, iirdesign, zpk2tf, freqz
# Train test split
from sklearn.model_selection import train_test_split

from glob import glob
from tqdm import tqdm

# Import tensorflow
import tensorflow as tf

# Model & compile arguments
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

# Get the layers
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.layers import Flatten

# Import the Efficientnet models
from tensorflow.keras.applications import EfficientNetB0

# TF model metrics
from tensorflow.keras.metrics import AUC

# Audio utilities (librosa) and PyTorch (required by nnAudio)
import librosa
import torch

# (Install &) Import the nnAudio library for the Constant Q-Transform
try:
    from nnAudio.Spectrogram import CQT1992v2
except ImportError:
    !pip install -q nnAudio
    from nnAudio.Spectrogram import CQT1992v2
# Note: nnAudio's CQT1992v2 is used instead of GWpy (used in the EDA section) to transform the wave data into Constant Q-Transform spectrograms, because it is much faster and GPU compatible.
    
from IPython.display import HTML

# 2. Reading Datasets

   1. **train/** - the training set files, one npy file per observation; labels are provided in `training_labels.csv` (see below)
   2. **test/** - the test set files; you must predict the probability that the observation contains a gravitational wave
   3. **training_labels.csv** - target values of whether the associated signal contains a gravitational wave 
   4. **sample_submission.csv** - a sample submission file in the correct format

In [None]:
## Get the training ids
train = pd.read_csv('../input/g2net-gravitational-wave-detection/training_labels.csv')

# Get the submission file
sample_sub = pd.read_csv('../input/g2net-gravitational-wave-detection/sample_submission.csv')

# 3. Explorative Data Analysis


In [None]:
print(f'Training labels: {train.shape[0]} | Test dataset: {sample_sub.shape[0]}')

There are **5,60,000** records in train data and **2,26,000** records in Test data

In [None]:
#Checking train data contents
train.head()

In [None]:
train.shape

- There are **5,60,000 rows** and **2 columns.**
- The attributes are **'id'** and **'target'**

In [None]:
train['target'].value_counts()

**'target'** has value either *'0' or '1'*.

In [None]:
train['id'].value_counts()

**'id'** has 5,60,000 unique values.

In [None]:
#Checking for any Null Values
train.isnull().sum()

There are no *null* values in the file

In [None]:
sns.countplot(data=train, x="target")


**Inference**
- The data is evenly distributed with 50-50 division between the samples with and without gravitational waves signal.

**Insights**
- This is a binary classification task.
- target *'0'* indicates the absence of a GW signal, i.e. **noise only**, whereas *'1'* indicates the presence of a GW signal, i.e. **GW signal + noise.**

## 3.2 Raw Data Visualization
To visualize the raw signal, we load a single npy file as an example.

In [None]:
#Obtain path of all 5,60,000 files in ./train
train_path = glob('../input/g2net-gravitational-wave-detection/train/*/*/*/*')

In [None]:
print("The total number of files in the training set:", len(train_path))

In [None]:
ids = [path.split("/")[-1].split(".")[0] for path in train_path]
paths_df = pd.DataFrame({"path":train_path, "id": ids})
train_data = pd.merge(left=train, right=paths_df, on="id")
train_data.head()

In [None]:
# Pick the first training sample whose target is 1 (GW signal present)
file_path = train_data[train_data['target'] == 1].iloc[0]

#Loading any one file
example_strain = np.load(file_path.path)

In [None]:
#View file contents
print(example_strain)

#Shape of the array
print (example_strain.shape)

**Inference :**
- There are **3** rows.
- Each index(row) has **4096** columns.


**Insights :**
- According to the competition description, the observations are recorded by 3 gravitational-wave interferometers (LIGO Hanford, LIGO Livingston, and Virgo).
- The quantity in this time series is strain, which is of the order of ~$10^{-20}$, recorded over 2 s periods sampled at 2048 Hz, i.e. 4096 data points.
- The output of a GW detector is a temporal series of the detector strain, h(t).
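The stated sampling parameters fix the time span and frequency resolution of every sample. Later spectral plots depend on these values. A quick check with NumPy, using only the numbers given above (2048 Hz sampling, 4096 points):

```python
import numpy as np

sample_rate = 2048        # Hz, as stated in the competition description
n_samples = 4096          # points per detector per sample

duration = n_samples / sample_rate          # seconds covered by one sample
nyquist = sample_rate / 2                   # highest representable frequency
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)  # one-sided FFT grid
resolution = freqs[1] - freqs[0]            # bin spacing = 1 / duration

print(duration, nyquist, resolution, len(freqs))  # 2.0 1024.0 0.5 2049
```

So each FFT bin is 0.5 Hz wide, and frequencies above 1024 Hz cannot be represented in this data.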

In [None]:
#Plotting Raw Data from the three detectors

def plot_graph(example_strain):
    plt.figure(figsize=(20,5))

    plt.plot(example_strain[0,:], c="firebrick", label="LIGO Hanford")
    plt.plot(example_strain[1,:], c="mediumseagreen", label="LIGO Livingston")
    plt.plot(example_strain[2,:], c="slateblue", label="Virgo")
    plt.title("Id: "+file_path.id);
    plt.grid(True)
    plt.xlabel("Timestamp");
    plt.legend();

In [None]:
#Plotting graph for strain with target 1
plot_graph(example_strain)

In [None]:
# Pick the first training sample whose target is 0 (noise only)
file_path = train_data[train_data['target'] == 0].iloc[0]

#Loading any one file
example_strain_negative = np.load(file_path.path)

#Plotting graph for strain with target 0
plot_graph(example_strain_negative)

**Insights**
- The three signals, originating from different detectors, all look somewhat different.
- It is difficult to tell by eye whether a given wave contains a GW signal, as there is no notable difference between the plots for *target - 0* and *target - 1.*
- The strain is of the order $10^{-20}$, which is extremely small and can be affected by many external factors. However, as both sample plots show, the strain data is a combination of many frequencies, so analysing the signals in the frequency domain instead of the time domain might give us better insights.
- The recorded amplitude varies with the detector location (smaller amplitudes indicating *weak signal* detection).
- Astrophysical signals have typical amplitudes comparable to the detector background noise. Therefore, characterization and reduction of detector noise is essential to GW searches.
- The interferometer is sensitive to gravitational waves, but unfortunately also to terrestrial forces and displacements, including vibrations of the instruments themselves. These forces stretch the interferometer arms, which changes the interference pattern and produces the waves we can see above.

### Typical signal processing workflow
Next, we implement the steps from this LIGO paper, following this workflow:
- Plot the raw signal
- Window the signal
- Whiten the signal
- Bandpass the signal

## Raw Data Visualization - ASD
A Fourier Transform is one of the most commonly used methods in mathematics and signal processing to decompose a signal into its constituent frequencies. The resulting spectrum can be analyzed in terms of the power or energy of the signal to obtain a spectral density plot. One way to visualize a raw signal in the frequency domain is therefore to plot its amplitude spectral density (ASD), which is the square root of the power spectral density (PSD).
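As a self-contained illustration, separate from the notebook's gwpy-based `plot_asd`, the sketch below estimates an ASD with `scipy.signal.welch` on a synthetic 2 s signal. The signal is a 60 Hz sinusoid in white noise, chosen by us to mimic a power-line feature. The sketch then checks that the ASD peak lands at the injected frequency. The signal amplitudes and `nperseg=1024` are illustrative assumptions.

```python
import numpy as np
from scipy.signal import welch

fs = 2048                          # Hz, same sampling rate as the competition data
t = np.arange(0, 2.0, 1.0 / fs)    # 2 s -> 4096 samples
rng = np.random.default_rng(0)

# Synthetic "detector" output: a 60 Hz line in white noise (illustrative only)
x = 0.5 * np.sin(2 * np.pi * 60.0 * t) + rng.standard_normal(t.size)

# Welch's method averages periodograms of overlapping segments (Hann window by default)
f, psd = welch(x, fs=fs, nperseg=1024)
asd = np.sqrt(psd)                 # ASD = sqrt(PSD)

peak_freq = f[np.argmax(asd)]
print(peak_freq)  # 60.0
```

Averaging over segments trades frequency resolution (2 Hz here) for a less noisy estimate. This is why narrow spectral lines stand out clearly in ASD plots.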

In [None]:
# define some signal parameters
sample_rate = 2048  # Hz (samples per second)
time_span = 2  # each signal lasts 2 s
signal_length = time_span
samples_total = time_span * sample_rate  # 4096 points in total
dt = 1 / sample_rate  # sampling interval in seconds
dt

channel = 1  # index 1 = LIGO Livingston (second detector)

In [None]:
# set of observatories
obs_list = ('LIGO Hanford', 'LIGO Livingston', 'Virgo')

In [None]:
# Gravitational wave analysis python library
try:
    import gwpy
except ImportError:
    !pip install -q --user gwpy
    import gwpy
from gwpy.timeseries import TimeSeries

In [None]:
# function to plot the amplitude spectral density (ASD) plot
def plot_asd(sample_id, sample):
    # sample: array of shape (3, 4096), one row per detector
    # we convert the data to gwpy's TimeSeries for analysis
    for i in range(sample.shape[0]):
        ts = TimeSeries(sample[i], sample_rate=sample_rate)
        ax = ts.asd(signal_length).plot(figsize=(12, 5)).gca()
        ax.set_xlim(10, 1024);
        ax.set_title(f"ASD plots for sample: {sample_id} from {obs_list[i]}");

In [None]:
# plot ASD for sample w/ GW
plot_asd(train_data[train_data['target'] == 1].iloc[0].id, example_strain)

These plots use a log scale on the x-axis, ranging from 10 Hz to about 1000 Hz. Although these limits are for visualization purposes only, they help us see some peaks for each observatory. A particular frequency can stand out in one measurement, but remember that a GW signal has to be detected in all three waves to be confirmed. The data still looks noisy. As shown in the tutorial, sampling for longer periods of time (on real data) can give valuable insights. However, the data in this competition is simulated, so we try other ways to visualize it.
***
### Power Spectral Density Plots

In [None]:
# Computing the Power Spectral Density
plt.figure(figsize=(20,5))

fhat = np.fft.fft(example_strain[channel,:], samples_total)
PSD = np.abs(fhat) ** 2 / samples_total  # real-valued periodogram
freq = 1/(dt*samples_total) * np.arange(samples_total)  # bin frequencies in Hz

# positive-frequency bins only (skip DC, stop below Nyquist)
L = np.arange(1, np.floor(samples_total/2), dtype="int")
plt.plot(freq[L],PSD[L], '.-')
plt.grid(True)
plt.xlabel("Frequency Hz");
plt.title("Power spectral density");
plt.yscale("log")

**Insights**
- The steep shape at low frequencies is dominated by noise related to ground motion. 
- Above roughly 100 Hz, the Advanced LIGO detectors are currently quantum noise limited, and their noise curves are dominated by shot noise. 
- High amplitude noise features are also present in the data at certain frequencies, including lines due to the AC power grid (harmonics of 60 Hz in the U.S. and 50 Hz in Europe), mechanical resonances of the mirror suspensions, injected calibration lines, and noise entering through the detector control systems. 

In [None]:
# breaking the signal into FFT components grouped by PSD magnitude
show_side_effects = True
fig, ax = plt.subplots(6,1,figsize=(20,15))

ax[0].plot(example_strain[channel])
# inverse FFT of each PSD band; .real drops the negligible imaginary round-off
ax[1].plot(np.fft.ifft((PSD>1e-38)*fhat).real)
ax[2].plot(np.fft.ifft(((PSD>1e-40) & (PSD <= 1e-38))*fhat).real)
ax[3].plot(np.fft.ifft(((PSD>1e-42) & (PSD <= 1e-40))*fhat).real)
ax[4].plot(np.fft.ifft(((PSD>0.5e-42) & (PSD <= 1e-42))*fhat).real)
ax[5].plot(np.fft.ifft((PSD<=0.5e-42)*fhat).real)

if not show_side_effects:
    for n in range(3,6):
        ax[n].set_xlim(20,2000)

- The first step in many LIGO-Virgo Collaboration (LVC) analyses is to Fourier transform the time-domain data using a fast Fourier transform (FFT).
- After the inverse Fourier transform, interesting side effects appear at the beginning and the end of each wave.
- Since the FFT implicitly assumes that the stretch of data being transformed is periodic in time, window functions (e.g. a Tukey, or cosine-tapered, window) have to be applied to the data to suppress spectral leakage.
- Failing to window the data leads to spectral leakage and spurious correlations in the phase between bins.
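Spectral leakage is easy to reproduce on a synthetic tone whose frequency does not fall on an FFT bin. The sketch below is our own toy example: a 100.25 Hz tone sampled at 2048 Hz for 2 s, giving 0.5 Hz bins, so the tone sits halfway between two bins. It measures how much power lands more than 50 Hz away from the tone. Without a window the energy spreads far across the spectrum. A Tukey window with the same `alpha = 0.125` used later in this notebook suppresses that spread. The 50 Hz cut-off is an arbitrary choice.

```python
import numpy as np
from scipy.signal.windows import tukey

fs, n = 2048, 4096                      # 2 s at 2048 Hz -> 0.5 Hz frequency bins
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 100.25 * t)      # tone halfway between two FFT bins

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
far = np.abs(freqs - 100.25) > 50       # bins more than 50 Hz away from the tone

def leakage_fraction(signal_in):
    """Fraction of spectral power that lands more than 50 Hz away from the tone."""
    power = np.abs(np.fft.rfft(signal_in)) ** 2
    return power[far].sum() / power.sum()

rect_leak = leakage_fraction(x)                     # no window (rectangular)
tukey_leak = leakage_fraction(x * tukey(n, 0.125))  # cosine-tapered edges
print(f"rectangular: {rect_leak:.2e}, tukey: {tukey_leak:.2e}")
```

The tapered edges remove the jump that the FFT sees when it wraps the segment around. That wrap-around jump is what spreads power into distant bins.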

### Apply Window Functions - Reducing Spectral Leakage

In [None]:
from scipy import signal
from scipy.interpolate import interp1d
from scipy.signal import butter, filtfilt, iirdesign, zpk2tf, freqz

In [None]:
hp_window = 1
hp_tukey_alpha = 0.125
fband = [35.0, 200.0]

In [None]:
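# window functions live in scipy.signal.windows (the old scipy.signal.* aliases were removed)
blackman_window = signal.windows.blackman(int(samples_total*hp_window))
tukey_window = signal.windows.tukey(int(samples_total*hp_window), hp_tukey_alpha)  # alpha = tapered fraction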

In [None]:
fig, ax = plt.subplots(3,1,figsize=(20,15))

#plotting raw data
ax[0].plot(example_strain[channel])
ax[0].set_title("Original data")

#plotting data with blackman 
ax[1].plot(example_strain[channel]*blackman_window)
ax[1].set_title("With blackman window applied")

#plotting data with tukey window
ax[2].plot(example_strain[channel]*tukey_window)
ax[2].set_title("With tukey window applied");

For the analysis of transient data, **Tukey windows** are advantageous: they are flat in the middle and tapered only at the edges, so signals suffer less modification than with **Blackman windows**, which taper the whole segment.
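One way to make the comparison concrete is to count how many samples each window leaves almost unchanged. Below is a minimal sketch for the 4096-sample segments used here, with `alpha = 0.125` as in `hp_tukey_alpha`. The 0.99 threshold is an arbitrary choice of ours.

```python
import numpy as np
from scipy.signal.windows import blackman, tukey

n = 4096                                     # samples per detector in one G2Net example
windows = {
    "tukey (alpha=0.125)": tukey(n, 0.125),  # flat top, cosine-tapered edges
    "blackman": blackman(n),                 # tapers over the whole segment
}

# Fraction of samples multiplied by at least 0.99, i.e. left essentially untouched
untouched = {name: float(np.mean(w >= 0.99)) for name, w in windows.items()}
for name, frac in untouched.items():
    print(f"{name}: {frac:.1%} of samples kept within 1%")
```

Roughly seven eighths of the segment passes through the Tukey window unchanged. The Blackman window leaves only a small region around the centre intact.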

In [None]:
windowed_strain = example_strain[channel]*tukey_window

### Whitening - Making signal more Uniform

- Whitening the data suppresses the excess noise at low frequencies and at the spectral lines, so that weak signals in the most sensitive band become easier to see.
- It is always one of the first steps in astrophysical data analysis (searches, parameter estimation).
- It requires no prior knowledge of spectral lines, etc.; only the data are needed.
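The notebook's `whiten` function divides the spectrum by the periodogram of the same 2 s segment. A common alternative estimates the PSD with a Welch-style average over segments and interpolates it onto the FFT grid before dividing. The LIGO open-data (GWOSC) event tutorial, for example, does this with `matplotlib.mlab.psd` and `interp1d`. The sketch below is our own illustration of that alternative, not the method used in this notebook. The AR(1) "red noise" test signal, `nperseg=512`, and the 20-100 Hz / 500-900 Hz comparison bands are arbitrary assumptions.

```python
import numpy as np
from scipy.signal import lfilter, welch
from scipy.signal.windows import tukey

def whiten_welch(strain, fs, nperseg=512, alpha=0.1):
    """Whiten `strain` using a Welch PSD estimate (illustrative sketch)."""
    n = len(strain)
    # Welch: average periodograms of overlapping Hann-windowed segments
    f_psd, psd = welch(strain, fs=fs, nperseg=nperseg, detrend=False)
    # Interpolate the coarse PSD onto the rFFT grid of the full segment
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd_interp = np.interp(freqs, f_psd, psd)
    # Taper the edges, move to the frequency domain and divide by the ASD
    spectrum = np.fft.rfft(strain * tukey(n, alpha)) / np.sqrt(psd_interp)
    spectrum[0] = 0.0  # drop the DC bin
    return np.fft.irfft(spectrum, n=n)

def band_power_ratio(x, fs, low=(20, 100), high=(500, 900)):
    """Mean spectral power in the `low` band divided by that in the `high` band."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_low = (freqs >= low[0]) & (freqs <= low[1])
    in_high = (freqs >= high[0]) & (freqs <= high[1])
    return power[in_low].mean() / power[in_high].mean()

fs = 2048
rng = np.random.default_rng(0)
# Low-frequency-dominated noise: white noise passed through an AR(1) filter
colored = lfilter([1.0], [1.0, -0.95], rng.standard_normal(4096))

ratio_before = band_power_ratio(colored, fs)
ratio_after = band_power_ratio(whiten_welch(colored, fs), fs)
print(f"low/high band power: before {ratio_before:.1f}, after {ratio_after:.2f}")
```

After whitening, the two bands carry roughly equal power. Dividing by an averaged PSD keeps some of the segment's own structure, whereas dividing by the segment's own periodogram (as `whiten` below does) flattens every bin to the same magnitude.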

In [None]:
# Whitening data to make the signal more uniform
def whiten(strain, samples_total, dt):
    # TODO: normalization 
    
    fhat = np.fft.fft(strain, samples_total)
    # periodogram of this segment, used as the noise PSD estimate
    PSD = np.abs(fhat) ** 2 / samples_total
    freq = 1/(dt*samples_total) * np.arange(samples_total)
    
    # scipy interp1d interpolation (here onto the same frequency grid)
    interp_psd = interp1d(freq, PSD, "nearest")
    
    # divide each Fourier coefficient by the ASD, then return to the time domain
    w_fhat = fhat/np.sqrt(interp_psd(freq))
    w_strain = np.fft.ifft(w_fhat).real
    return w_strain, interp_psd(freq)

In [None]:
w_strain, ip = whiten(windowed_strain, samples_total, dt)

In [None]:
#plotting whitened data
fig, ax = plt.subplots(2,1,figsize=(20,10))
ax[0].plot(np.log(ip[0:1024]), '-o')
ax[0].set_title("Interpolated PSD (log scale)")
ax[0].set_xlabel("Frequency bin (0.5 Hz per bin)")
ax[0].set_ylabel("log Sn(f)")
ax[1].plot(w_strain, '-.')
ax[1].set_ylabel("dw(t)")
ax[1].set_xlabel("Timestamp")


This is the whitened signal. Since the simulated signals come from binary black hole mergers, their frequency content lies in the lower range, so we next apply a band-pass filter that passes signals between 35 and 200 Hz (the `fband` defined above).
***
### Bandpass Filter - Filtering signal for certain bandwidth

In [None]:
#Defining bandpass filter
def bandpass(strain, fband, fs):
    """Bandpasses strain data using a butterworth filter.
    
    Args:
        strain (ndarray): strain data to bandpass
        fband (ndarray): low and high-pass filter values to use
        fs (float): sample rate of data
    
    Returns:
        ndarray: array of bandpassed strain data
    """
    bb, ab = butter(4, [fband[0]*2./fs, fband[1]*2./fs], btype='band')
    normalization = np.sqrt((fband[1]-fband[0])/(fs/2))
    strain_bp = filtfilt(bb, ab, strain) / normalization
    return strain_bp

# Applying the bandpass filter (fs is the sampling rate, 2048 Hz)
bandpassed_strain = bandpass(w_strain, fband, sample_rate)

#plotting bandpass filtered data
plt.figure(figsize=(20,5))
plt.plot(bandpassed_strain, '-')
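A quick way to check that the Butterworth design passes the intended band is to evaluate its frequency response with `scipy.signal.freqz`. The sketch below reuses the design from `bandpass` with `fband = [35.0, 200.0]` and the 2048 Hz sampling rate. The probe frequencies 5, 84 and 500 Hz are our own choice; 84 Hz is roughly the geometric centre of the band. Because `filtfilt` runs the filter forwards and backwards, the effective gain is the square of the single-pass magnitude.

```python
import numpy as np
from scipy.signal import butter, freqz

fs = 2048.0                 # Hz, sampling rate of the G2Net data
fband = [35.0, 200.0]       # pass band used in this notebook

# Same design as in `bandpass`: 4th-order Butterworth, edges normalised to Nyquist
b, a = butter(4, [fband[0] * 2.0 / fs, fband[1] * 2.0 / fs], btype="band")

probe = np.array([5.0, 84.0, 500.0])    # Hz: below, inside, above the band
_, h = freqz(b, a, worN=probe, fs=fs)   # complex response at the probe frequencies
single_pass = np.abs(h)
two_pass = single_pass ** 2             # what filtfilt applies (forward + backward)

for f, g in zip(probe, two_pass):
    print(f"{f:6.1f} Hz: gain {g:.2e}")
```

Passing `sample_rate` (rather than the number of samples) as `fs` matters here. With the wrong `fs`, the normalised edges, and hence the real pass band, shift.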

**Constant Q-Transform**
- The signal analysis did not provide much insight, so let's try the second method in signal processing: transforming the waves into spectrogram images (time-frequency representations) and visualizing them. This technique is widely used in audio analysis, and since our data is a wave made up of many frequencies, we can use it as well.
- The advantage of a spectrogram over a direct Fourier Transform, which loses timing information, is that it captures how the frequency content shifts over time. This helps separate persistent noise frequencies from the transient signals of interest. The Constant Q-Transform is one way to compute such a spectrogram.
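What makes the transform "constant Q" is that its frequency bins are spaced geometrically. As a result, every bin has the same ratio Q of centre frequency to bin spacing. The sketch below computes the first few bin centres for `fmin = 20` Hz with 12 bins per octave; we assume the latter is nnAudio's `CQT1992v2` default. It illustrates the spacing only and is not a CQT implementation.

```python
import numpy as np

fmin = 20.0            # Hz, lowest analysed frequency (as used later in this notebook)
bins_per_octave = 12   # assumed default of nnAudio's CQT1992v2

k = np.arange(6)
centres = fmin * 2.0 ** (k / bins_per_octave)                 # factor 2**(1/12) per bin
spacing = centres * (2.0 ** (1.0 / bins_per_octave) - 1.0)    # gap to the next bin
q = centres / spacing                                         # identical for every bin

print(np.round(centres, 2))
print(np.round(q, 2))
```

Low frequencies therefore get narrow bins and high frequencies wide ones. This suits a chirp, whose frequency sweeps upward over a wide range.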

In [None]:
# function to plot the Q-transform spectrogram
def plot_q_transform(sample_id,example_strain):
    # Get the data
    sample = example_strain
    
    # we convert the data to gwpy's TimeSeries for analysis
    for i in range(sample.shape[0]):
        ts = TimeSeries(sample[i], sample_rate=sample_rate)
        ax = ts.q_transform(whiten=True).plot().gca()
        ax.set_xlabel('')
        ax.set_title(f"Spectrogram plots for sample: {sample_id} from {obs_list[i]}")
        ax.grid(False)
        ax.set_yscale('log');

In [None]:
# plot the Q-transform for sample w/ GW
plot_q_transform(file_path.id,example_strain)

- The three detectors show visibly different features. The plots above come from a sample that contains a gravitational wave and show the characteristic 'chirp', indicating the presence of gravitational waves.
- This transformation suppresses much of the unwanted noise, but some of it remains; moreover, a signal has to be detected in all three waves to be predicted as a gravitational wave.
- Next, we compare the Q-Transforms of samples with and without gravitational-wave signals.

In [None]:
sample_gw_id = train_data[train_data['target'] == 1].iloc[0].id
sample_no_gw_id = train_data[train_data['target'] == 0].iloc[0].id

# function to plot the Q-transform spectrogram side-by-side
def plot_q_transform_sbs(sample_gw_id, sample_no_gw_id,example_strain,example_strain_negative ):
    # Get the data
    sample_gw = example_strain
    sample_no_gw = example_strain_negative
    
    for i in range(len(obs_list)):
        # get the timeseries
        ts_gw = TimeSeries(sample_gw[i], sample_rate=sample_rate)
        ts_no_gw = TimeSeries(sample_no_gw[i], sample_rate=sample_rate)
        
        # get the Q-transform
        image_gw = ts_gw.q_transform(whiten=True)
        image_no_gw = ts_no_gw.q_transform(whiten=True)

        plt.figure(figsize=(20, 10))
        plt.subplot(131)
        plt.imshow(image_gw)
        plt.title(f"id: {sample_gw_id} | Target=1")
        plt.grid(False)

        plt.subplot(132)
        plt.imshow(image_no_gw)
        plt.title(f"id: {sample_no_gw_id} | Target=0")
        plt.grid(False)
        
        plt.show()


In [None]:
# let's plot two spectrograms for sample w/ and w/o GW signal side-by-side
plot_q_transform_sbs(sample_gw_id, sample_no_gw_id,example_strain,example_strain_negative )

- Apart from a few hints, the difference between waves with and without GW signals is not obvious.
- Some cleaning or filtering could remove more of the noise, but this is where deep learning shines.
- Patterns that we cannot detect visually may still be learned by a model. In the following sections, we build data pipelines, transform the data into spectrograms, and build models to make the predictions.
***

# 4. Preprocessing Methods
Astrophysical signals have typical amplitudes comparable to the detector background noise. Therefore, characterization and reduction of detector noise is essential to GW searches. 
We follow a signal processing methodology to preprocess the signals: converting the time-domain data to the frequency domain, producing Constant Q-Transform images, and using these as input to our model training step.

There are mainly two ways in which we can preprocess this type of data to train our models:

1. **Using the time series data** and performing cleaning steps to enhance the signal and remove noise, as described in publications by B P Abbott et al. and Daniel George et al. Following the typical signal processing workflow from the paper referenced above, this means:
    - Plot the raw signal
    - Window the signal
    - Whiten the signal
    - Bandpass the signal
2. **Getting the Constant Q-Transformed spectrogram image,** a frequency-domain (Fourier-based) representation of the sample, obtained by treating it as a wave.

**Creating TF Data pipeline :**
Next, we create the TensorFlow input data pipeline. This is crucial, as loading such a huge dataset at once can create a bottleneck in the entire workflow and cause memory overload.

In [None]:
# function to return the npy file path corresponding to the id
# (files are nested by the first three characters of the id)
def get_npy_filepath(id_, is_train=True):
    split = 'train' if is_train else 'test'
    return f'../input/g2net-gravitational-wave-detection/{split}/{id_[0]}/{id_[1]}/{id_[2]}/{id_}.npy'

In [None]:
# let's define some signal parameters
sample_rate = 2048 # data is provided at 2048 Hz
signal_length = 2 # each signal lasts 2 s
fmin, fmax = 20, 1024 # filter above 20 Hz, and max 1024 Hz (Nyquist freq = sample_rate/2)
hop_length = 64 # hop length (in samples) between successive CQT frames

# model compile params
batch_size = 250 # number of samples processed per training step
epochs = 3 # number of epochs (kept low because the dataset is large; 3~5 was observed to be enough)

In [None]:
# Define the Constant Q-Transform
transform = CQT1992v2(sr=sample_rate, fmin=fmin, fmax=fmax, hop_length=hop_length)

# Optionally run the transform on the GPU for faster execution; the input
# tensors in preprocess_function_cqt would then also need .to('cuda')
# if torch.cuda.is_available():
#     transform = transform.to('cuda')

In [None]:
# function to load the file, preprocess it, and return its Constant Q-Transform image
def preprocess_function_cqt(path):
    strain = np.load(path.numpy())
    # there are 3 signals, one per interferometer, as explained before
    for i in range(strain.shape[0]):
        # normalize each signal by its maximum value
        strain[i] /= np.max(strain[i])
    # stack the three signals horizontally into one 1-D array
    strain = np.hstack(strain)
    # tensor conversion
    strain = torch.from_numpy(strain).float()
    # getting the image from the CQT transform, shape (1, n_bins, n_frames)
    image = transform(strain)
    # converting to array from tensor
    image = np.array(image)
    # transpose the image to (n_bins, n_frames, 1)
    image = np.transpose(image,(1,2,0))
    
    # convert the image to tf.Tensor and return
    return tf.convert_to_tensor(image)

In [None]:
image = preprocess_function_cqt(tf.convert_to_tensor(train_data['path'][2]))
print(image.shape)
plt.imshow(image)

In [None]:
# From the Constant Q-Transform that we got, get the shape
input_shape = (69, 193, 1)
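The hard-coded `input_shape` can be cross-checked with a little arithmetic, assuming nnAudio's defaults: 12 bins per octave, and `center=True` padding so that the number of frames is `floor(length / hop_length) + 1`. Under those assumptions, the 69 frequency rows follow from `fmin`/`fmax`. The 193 time columns follow from the three stacked 4096-sample detector signals and `hop_length = 64`. This is a back-of-the-envelope sketch, not a call into nnAudio.

```python
import math

fmin, fmax = 20, 1024        # Hz, as passed to CQT1992v2 above
bins_per_octave = 12         # assumed nnAudio default
hop_length = 64              # samples between successive CQT frames
n_samples = 3 * 4096         # three detectors stacked with np.hstack

n_bins = math.ceil(bins_per_octave * math.log2(fmax / fmin))  # octaves spanned x bins/octave
n_frames = n_samples // hop_length + 1                        # frames with centre padding

print((n_bins, n_frames, 1))  # (69, 193, 1)
```

If `fmin`, `fmax` or `hop_length` are changed, `input_shape` has to change with them. Otherwise `tf.ensure_shape` in the data pipeline will fail.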

***
# 5. Modeling
**Strategy**

- This is essentially a signal processing problem with a classification task. There are two ways to build models around this data, as also mentioned in the [LIGO research paper](https://arxiv.org/pdf/1908.11170.pdf): using "raw" signals with minimal pre-processing, or using "images" obtained by transforming the waves into spectrograms.
- However, building models on raw signal data, following the cleaning steps from the respective publications, did not yield acceptable results. Note that only part of the data was used during the strategy selection process, and we concluded that more (or rather, proper) pre-processing would be necessary to use the raw signal.
- Eventually, we went with the second method in this project, transforming the waves into spectrogram images. We train two models to evaluate the results:
1. Simple CNN: a simple CNN architecture, a modified version of the model usually used in MNIST Digit Recognizer tutorials. This acts as our baseline model.
2. EfficientNet: an EfficientNetB7 model pre-trained on the ImageNet dataset. This model is chosen because it is known for excellent performance with significantly fewer parameters, which can drastically improve computational efficiency.

In [None]:
# Get the feature ids and target
X = train_data['id']
y = train_data['target'].astype('int8').values

In [None]:
x_train, x_valid, y_train, y_valid = train_test_split(X, y, random_state = 51, stratify = y)

In [None]:
def preprocess_function_parse_tf(path, y=None):
    [x] = tf.py_function(func=preprocess_function_cqt, inp=[path], Tout=[tf.float32])
    x = tf.ensure_shape(x, input_shape)
    if y is None:
        return x
    else:
        return x,y

In [None]:
train_dataset = tf.data.Dataset.from_tensor_slices((x_train.apply(get_npy_filepath).values, y_train))
# shuffle the dataset
train_dataset = train_dataset.shuffle(len(x_train))
train_dataset = train_dataset.map(preprocess_function_parse_tf, num_parallel_calls=tf.data.AUTOTUNE)
train_dataset = train_dataset.batch(batch_size)
train_dataset = train_dataset.prefetch(tf.data.AUTOTUNE)

In [None]:
# valid dataset
# Get the data filepaths as tensor_slices
valid_dataset = tf.data.Dataset.from_tensor_slices((x_valid.apply(get_npy_filepath).values, y_valid))

# apply the map method to tf_parse_function()
valid_dataset = valid_dataset.map(preprocess_function_parse_tf, num_parallel_calls=tf.data.AUTOTUNE)

# set batch size of the dataset
valid_dataset = valid_dataset.batch(batch_size)

# prefetch the data
valid_dataset = valid_dataset.prefetch(tf.data.AUTOTUNE)


## 1. Baseline Model : Simple CNN

In [None]:
# CNN Modeling
# Instantiate the Sequential model
model_cnn = Sequential(name='CNN_model')

# Add the first Conv2D layer w/ input_shape & a MaxPooling2D layer after it
model_cnn.add(Conv2D(filters=16,
                     kernel_size=3,
                     input_shape=input_shape,
                     activation='relu',
                     name='Conv_01'))
model_cnn.add(MaxPooling2D(pool_size=2, name='Pool_01'))

# Second pair of Conv2D and MaxPooling2D layers (input shape is inferred)
model_cnn.add(Conv2D(filters=32,
                     kernel_size=3,
                     activation='relu',
                     name='Conv_02'))
model_cnn.add(MaxPooling2D(pool_size=2, name='Pool_02'))

# Third pair of Conv2D and MaxPooling2D layers
model_cnn.add(Conv2D(filters=64,
                     kernel_size=3,
                     activation='relu',
                     name='Conv_03'))
model_cnn.add(MaxPooling2D(pool_size=2, name='Pool_03'))

# Add the Flatten layer
model_cnn.add(Flatten(name='Flatten'))

# Add the Dense layers
model_cnn.add(Dense(units=512,
                activation='relu',
                name='Dense_01'))
model_cnn.add(Dense(units=64,
                activation='relu',
                name='Dense_02'))

# Add the final Output layer
model_cnn.add(Dense(1, activation='sigmoid', name='Output'))

In [None]:
# Display the CNN model architecture
model_cnn.summary()

In [None]:
# compile the model with following parameters
# Optimizer: Adam (learning_rate=0.0001)
# loss: binary_crossentropy
# metrics: AUC and accuracy
model_cnn.compile(optimizer=Adam(learning_rate=0.0001),
                  loss='binary_crossentropy',
                  metrics=[AUC(name='auc'), 'accuracy'])

# Fit the data (the batch size is already set by the tf.data pipeline)
history_cnn = model_cnn.fit(x=train_dataset,
                            epochs=epochs,
                            validation_data=valid_dataset,
                            verbose=1)

- The baseline model seems to converge well after only about 3 epochs.
- However, since each epoch takes almost an hour to train, we can only say that the model could be improved with further training and fine-tuning of its structure.
- At the end of the 3rd epoch, we see an AUC score of 0.83 and an accuracy of 0.76 on the training dataset, and an AUC score of 0.84 and an accuracy of 0.77 on the validation dataset.

In [None]:
# save the model
model_cnn.save('./model_CNN.h5')

In [None]:
sub = pd.read_csv('../input/g2net-gravitational-wave-detection/sample_submission.csv')
x_test = sub[['id']]

In [None]:
x_test.tail()

In [None]:
# test dataset
test_dataset = tf.data.Dataset.from_tensor_slices((x_test['id'].apply(get_npy_filepath, is_train=False).values))
test_dataset = test_dataset.map(preprocess_function_parse_tf, num_parallel_calls=tf.data.AUTOTUNE)
test_dataset = test_dataset.batch(batch_size)
test_dataset = test_dataset.prefetch(tf.data.AUTOTUNE)

The Simple CNN model has been trained and can be found in this version of our [Notebook](https://www.kaggle.com/t01fe19bcs238/g2net-eda-preprocessing-model) 

In [None]:
# load the trained Simple CNN model 
saved_cnn_model = tf.keras.models.load_model('../input/g2netfinal/full_cnn_model.h5')

In [None]:
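# Continue training the loaded model on the validation split.
# Note: after this, valid_dataset no longer gives an unbiased performance estimate.
saved_cnn_model.fit(x=valid_dataset, epochs=3, verbose=1)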

In [None]:
import os
os.makedirs('./model', exist_ok=True)  # make sure the target directory exists
saved_cnn_model.save('./model/full_cnn_model.h5')

In [None]:
full_cnn_model = tf.keras.models.load_model('./model/full_cnn_model.h5')

In [None]:
# predict the test dataset using CNN
preds_cnn = saved_cnn_model.predict(test_dataset)

In [None]:
# Function to save Kaggle submissions for test prediction probabilities
def get_kaggle_format(prediction_probs, model='base'):
    # fill the sample submission template with the predicted probabilities
    # (predict() returns shape (n, 1), so flatten to one value per id)
    sample_sub['target'] = np.ravel(prediction_probs)
    
    # Output filename for kaggle submission
    filename = f"kaggle_sub_{model}.csv"
    
    # Save the DataFrame to a file
    sample_sub.to_csv(filename, index=False)
    print(f'File name: {filename}')

In [None]:
#save the kaggle submission file
get_kaggle_format(preds_cnn, model='cnn')

***
## 2. Advanced Model - EfficientNet B7 Model

- The baseline model actually performed quite well, but it was a very simple model that we trained from scratch for our particular dataset.
- However, there are more advanced and pre-trained state-of-the-art models that we can try to use for our classification task. 
- EfficientNet is one such model architecture that has been researched extensively recently, and has achieved state-of-the-art level accuracy as compared to other models on ImageNet data with significantly fewer number of parameters, which means faster training times. 
- As we have a large dataset, we can use these models, with and without pretrained weights to see if we get better results than our baseline. 

In [None]:
# Install the EfficientNetV2 Keras implementation
!pip install -U git+https://github.com/leondgarse/keras_efficientnet_v2

# Import libraries
import math
import os
import random
import re
import warnings
from random import shuffle
from typing import Optional, Tuple

import matplotlib.pyplot as plt  # plotting tools
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_addons as tfa
import keras_efficientnet_v2
from kaggle_datasets import KaggleDatasets
from scipy.signal import get_window
from sklearn import metrics
from sklearn.model_selection import KFold, StratifiedKFold
from tensorflow.keras import backend as K
from tensorflow.keras import mixed_precision
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

In [None]:
# Function to get hardware strategy
def get_hardware_strategy():
    try:
        # TPU detection. No parameters necessary if TPU_NAME environment variable is
        # set: this is always the case on Kaggle.
        tpu = tf.distribute.cluster_resolver.TPUClusterResolver()
        print('Running on TPU ', tpu.master())
    except ValueError:
        tpu = None

    if tpu:
        tf.config.experimental_connect_to_cluster(tpu)
        tf.tpu.experimental.initialize_tpu_system(tpu)
        strategy = tf.distribute.TPUStrategy(tpu)
        # use bfloat16 mixed precision and XLA compilation on TPU
        policy = mixed_precision.Policy('mixed_bfloat16')
        mixed_precision.set_global_policy(policy)
        tf.config.optimizer.set_jit(True)
    else:
        # Default distribution strategy in Tensorflow. Works on CPU and single GPU.
        strategy = tf.distribute.get_strategy()

    print("REPLICAS: ", strategy.num_replicas_in_sync)
    return tpu, strategy

tpu, strategy = get_hardware_strategy()

In [None]:
# For tf.dataset
AUTO = tf.data.experimental.AUTOTUNE

# Data access (Train tf records)
GCS_PATH1 = KaggleDatasets().get_gcs_path('g2net-tf-records-tr-bp-filter-1')
GCS_PATH2 = KaggleDatasets().get_gcs_path('g2net-tf-records-tr-bp-filter-2')
GCS_PATH3 = KaggleDatasets().get_gcs_path('g2net-tf-records-tr-bp-filter-3')
# Data access (Test tf records)
GCS_PATH4 = KaggleDatasets().get_gcs_path('g2net-tf-records-ts-bp-filter-1')
GCS_PATH5 = KaggleDatasets().get_gcs_path('g2net-tf-records-ts-bp-filter-2')

# Configuration
EPOCHS = 30
BATCH_SIZE = 16 * strategy.num_replicas_in_sync
IMAGE_SIZE = [512, 512]
# Seed
SEED = 2021
# Learning rate
LR = 0.0001
# Verbosity
VERBOSE = 1

# Training filenames directory
TRAINING_FILENAMES = tf.io.gfile.glob(GCS_PATH1 + '/train*.tfrec') + tf.io.gfile.glob(GCS_PATH2 + '/train*.tfrec') + tf.io.gfile.glob(GCS_PATH3 + '/train*.tfrec')
# Testing filenames directory
TESTING_FILENAMES = tf.io.gfile.glob(GCS_PATH4 + '/test*.tfrec') + tf.io.gfile.glob(GCS_PATH5 + '/test*.tfrec')

In [None]:
from sklearn.model_selection import train_test_split

train_names, valid_names = train_test_split(TRAINING_FILENAMES, test_size = 0.20, random_state = 51)
valid_names

In [None]:
# Function to create cqt kernel
def create_cqt_kernels(
    q: float,
    fs: float,
    fmin: float,
    n_bins: int = 84,
    bins_per_octave: int = 12,
    norm: float = 1,
    window: str = "tukey",
    fmax: Optional[float] = None,
    topbin_check: bool = True
) -> Tuple[np.ndarray, int, np.ndarray, float]:
    fft_len = 2 ** _nextpow2(np.ceil(q * fs / fmin))
    
    if (fmax is not None) and (n_bins is None):
        n_bins = np.ceil(bins_per_octave * np.log2(fmax / fmin))
        freqs = fmin * 2.0 ** (np.r_[0:n_bins] / float(bins_per_octave))
    elif (fmax is None) and (n_bins is not None):
        freqs = fmin * 2.0 ** (np.r_[0:n_bins] / float(bins_per_octave))
    else:
        # both fmax and n_bins were given: fmax wins and n_bins is recomputed
        warnings.warn("If fmax is given, n_bins will be ignored", SyntaxWarning)
        n_bins = np.ceil(bins_per_octave * np.log2(fmax / fmin))
        freqs = fmin * 2.0 ** (np.r_[0:n_bins] / float(bins_per_octave))
        
    if np.max(freqs) > fs / 2 and topbin_check:
        raise ValueError(f"The top bin {np.max(freqs)} Hz has exceeded the Nyquist frequency, "
                         "please reduce the `n_bins`")
    
    kernel = np.zeros((int(n_bins), int(fft_len)), dtype=np.complex64)
    
    length = np.ceil(q * fs / freqs)
    for k in range(0, int(n_bins)):
        freq = freqs[k]
        l = np.ceil(q * fs / freq)
        
        if l % 2 == 1:
            start = int(np.ceil(fft_len / 2.0 - l / 2.0)) - 1
        else:
            start = int(np.ceil(fft_len / 2.0 - l / 2.0))

        sig = get_window(window, int(l), fftbins=True) * np.exp(
            np.r_[-l // 2:l // 2] * 1j * 2 * np.pi * freq / fs) / l
        
        if norm:
            kernel[k, start:start + int(l)] = sig / np.linalg.norm(sig, norm)
        else:
            kernel[k, start:start + int(l)] = sig
    return kernel, fft_len, length, freqs


def _nextpow2(a: float) -> int:
    return int(np.ceil(np.log2(a)))

# Function to prepare cqt kernel
def prepare_cqt_kernel(
    sr=22050,
    hop_length=512,
    fmin=30,
    fmax=1024,
    n_bins=84,
    bins_per_octave=12,
    norm=1,
    filter_scale=1,
    window="hann"
):
    q = float(filter_scale) / (2 ** (1 / bins_per_octave) - 1)
    print(q)
    return create_cqt_kernels(q, sr, fmin, n_bins, bins_per_octave, norm, window, fmax)

# Function to create cqt image
def create_cqt_image(wave, hop_length=16):
    CQTs = []
    for i in range(3):
        x = wave[i]
        x = tf.expand_dims(tf.expand_dims(x, 0), 2)
        x = tf.pad(x, PADDING, "REFLECT")

        CQT_real = tf.nn.conv1d(x, CQT_KERNELS_REAL, stride=hop_length, padding="VALID")
        CQT_imag = -tf.nn.conv1d(x, CQT_KERNELS_IMAG, stride=hop_length, padding="VALID")
        CQT_real *= tf.math.sqrt(LENGTHS)
        CQT_imag *= tf.math.sqrt(LENGTHS)

        CQT = tf.math.sqrt(tf.pow(CQT_real, 2) + tf.pow(CQT_imag, 2))
        CQTs.append(CQT[0])
    return tf.stack(CQTs, axis=2)

HOP_LENGTH = 6
cqt_kernels, KERNEL_WIDTH, lengths, _ = prepare_cqt_kernel(
    sr=2048,
    hop_length=HOP_LENGTH,
    fmin=20,
    fmax=1024,
    bins_per_octave=9)
LENGTHS = tf.constant(lengths, dtype=tf.float32)
CQT_KERNELS_REAL = tf.constant(np.swapaxes(cqt_kernels.real[:, np.newaxis, :], 0, 2))
CQT_KERNELS_IMAG = tf.constant(np.swapaxes(cqt_kernels.imag[:, np.newaxis, :], 0, 2))
PADDING = tf.constant([[0, 0],
                        [KERNEL_WIDTH // 2, KERNEL_WIDTH // 2],
                        [0, 0]])
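As a sanity check on the CQT settings above, the pure-Python sketch below repeats the arithmetic of `prepare_cqt_kernel`, `create_cqt_kernels` and `create_cqt_image` for sr=2048, fmin=20, fmax=1024, bins_per_octave=9 and HOP_LENGTH=6. It assumes 4096-sample inputs (2 s per detector at 2048 Hz, as reshaped in `prepare_image`); the variable names are our own and it is illustrative only. Because `prepare_cqt_kernel` passes both `fmax` and the default `n_bins=84`, `create_cqt_kernels` takes its warning branch and recomputes the number of bins from the frequency range.

```python
import math

# Values taken from the prepare_cqt_kernel(...) call above
SR = 2048            # sampling rate of the strain series (Hz)
FMIN, FMAX = 20, 1024
BINS_PER_OCTAVE = 9
HOP = 6
N_SAMPLES = 4096     # assumed: 2 s per detector at 2048 Hz

# Constant Q: ratio of centre frequency to bandwidth, identical for every bin
q = 1.0 / (2 ** (1 / BINS_PER_OCTAVE) - 1)

# With fmax given, n_bins is recomputed from the octave span
n_bins = math.ceil(BINS_PER_OCTAVE * math.log2(FMAX / FMIN))
freqs = [FMIN * 2.0 ** (k / BINS_PER_OCTAVE) for k in range(n_bins)]

# Kernel length shrinks as frequency grows: l_k = ceil(q * sr / f_k)
lengths = [math.ceil(q * SR / f) for f in freqs]

# FFT length: next power of two above the longest (lowest-frequency) kernel
fft_len = 2 ** math.ceil(math.log2(math.ceil(q * SR / FMIN)))

# Reflect padding of fft_len // 2 per side, then a VALID conv1d with stride HOP
padded = N_SAMPLES + 2 * (fft_len // 2)
n_frames = (padded - fft_len) // HOP + 1

print(f"q={q:.3f}, n_bins={n_bins}, top bin={freqs[-1]:.1f} Hz, fft_len={fft_len}")
print(f"kernel length: {lengths[0]} samples at {FMIN} Hz, {lengths[-1]} at the top bin")
print(f"CQT frames per detector: {n_frames}")
```

With these settings each detector yields a 683 x 52 (time x frequency) CQT map, which `prepare_image` then resizes to 512 x 512; the top bin (about 1016 Hz) stays just below the 1024 Hz Nyquist limit, so the `topbin_check` does not fire.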

In [None]:
# Function to seed everything
def seed_everything(seed):
    random.seed(seed)
    np.random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    tf.random.set_seed(seed)

# Function to prepare image
def prepare_image(wave):
    # Decode raw
    wave = tf.reshape(tf.io.decode_raw(wave, tf.float64), (3, 4096))
    normalized_waves = []
    # Normalize
    for i in range(3):
        normalized_wave = wave[i] / tf.math.reduce_max(wave[i])
        normalized_waves.append(normalized_wave)
    # Stack and cast
    wave = tf.stack(normalized_waves)
    wave = tf.cast(wave, tf.float32)
    # Create image
    image = create_cqt_image(wave, HOP_LENGTH)
    # Resize image
    image = tf.image.resize(image, [*IMAGE_SIZE])
    # Reshape
    image = tf.reshape(image, [*IMAGE_SIZE, 3])
    return image

# This function parses a labeled record into the image, its id and the target
def read_labeled_tfrecord(example):
    LABELED_TFREC_FORMAT = {
        'wave': tf.io.FixedLenFeature([], tf.string),
        'wave_id': tf.io.FixedLenFeature([], tf.string),
        'target': tf.io.FixedLenFeature([], tf.int64)
    }
    example = tf.io.parse_single_example(example, LABELED_TFREC_FORMAT)
    image = prepare_image(example['wave'])
    image_id = example['wave_id']
    target = tf.cast(example['target'], tf.float32)
    return image, image_id, target

# This function parses an unlabeled (test) record into the image and its id
def read_unlabeled_tfrecord(example):
    UNLABELED_TFREC_FORMAT = {
        'wave': tf.io.FixedLenFeature([], tf.string),
        'wave_id': tf.io.FixedLenFeature([], tf.string)
    }
    example = tf.io.parse_single_example(example, UNLABELED_TFREC_FORMAT)
    image = prepare_image(example['wave'])
    image_id = example['wave_id']
    return image, image_id

# This function loads TF Records and parses them into tensors
def load_dataset(filenames, ordered = False, labeled = True):
    ignore_order = tf.data.Options()
    if not ordered:
        ignore_order.experimental_deterministic = False 
        
    dataset = tf.data.TFRecordDataset(filenames, num_parallel_reads = AUTO)
    dataset = dataset.with_options(ignore_order)
    dataset = dataset.map(read_labeled_tfrecord if labeled else read_unlabeled_tfrecord, num_parallel_calls = AUTO) 
    return dataset

# This function is to get our training dataset
def get_training_dataset(filenames, ordered = False, labeled = True):
    dataset = load_dataset(filenames, ordered = ordered, labeled = labeled)
    dataset = dataset.repeat()
    dataset = dataset.shuffle(2048)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

# This function is to get our validation dataset (batched, but neither repeated nor shuffled)
def get_validation_dataset(filenames, ordered = False, labeled = True):
    dataset = load_dataset(filenames, ordered = ordered, labeled = labeled)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO)
    return dataset

# This function is to get our validation and test dataset
def get_val_test_dataset(filenames, ordered = True, labeled = True):
    dataset = load_dataset(filenames, ordered = ordered, labeled = labeled)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(AUTO) 
    return dataset

# Function to count how many examples (waves) the .tfrec files contain
def count_data_items(filenames):
    # The number of data items is written in the name of the .tfrec files, i.e. flowers00-230.tfrec = 230 data items
    n = [int(re.compile(r"-([0-9]*)\.").search(filename).group(1)) for filename in filenames]
    return np.sum(n)

NUM_TRAINING_IMAGES = count_data_items(train_names)
NUM_VALID_IMAGES = count_data_items(valid_names)
NUM_TESTING_IMAGES = count_data_items(TESTING_FILENAMES)
print(f'Dataset: {NUM_TRAINING_IMAGES} training images')
print(f'Dataset: {NUM_VALID_IMAGES} valid images')
print(f'Dataset: {NUM_TESTING_IMAGES} testing images')
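`count_data_items` relies on the record count being encoded in each file name. A slightly stricter variant is sketched below; the name `count_records_from_names` is hypothetical, and it assumes every file name ends in `-<count>.tfrec`, as the `*.tfrec` globs above suggest. It matches only the file's basename, requires at least one digit, and raises a descriptive error when a name does not follow the convention.

```python
import os
import re

# Hypothetical helper: record count taken from names like 'train00-230.tfrec'
_COUNT_PATTERN = re.compile(r"-(\d+)\.tfrec$")


def count_records_from_names(filenames):
    total = 0
    for filename in filenames:
        # look only at the file name, not at the bucket/directory part of the path
        match = _COUNT_PATTERN.search(os.path.basename(filename))
        if match is None:
            raise ValueError(f"No record count found in {filename!r}")
        total += int(match.group(1))
    return total


print(count_records_from_names(["gs://bucket-1/train00-230.tfrec",
                                "gs://bucket-1/train01-100.tfrec"]))  # 330
```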

In [None]:
# Learning rate callback function
def get_lr_callback():
    lr_start   = 0.0001
    lr_max     = 0.000015 * BATCH_SIZE
    lr_min     = 0.0000001
    lr_ramp_ep = 3
    lr_sus_ep  = 0
    lr_decay   = 0.7
   
    def lrfn(epoch):
        if epoch < lr_ramp_ep:
            lr = (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start   
        elif epoch < lr_ramp_ep + lr_sus_ep:
            lr = lr_max    
        else:
            lr = (lr_max - lr_min) * lr_decay**(epoch - lr_ramp_ep - lr_sus_ep) + lr_min    
        return lr

    lr_callback = tf.keras.callbacks.LearningRateScheduler(lrfn, verbose = VERBOSE)
    return lr_callback

# Function to create our EfficientNet model (this code builds an EfficientNetV2-XL backbone via keras_efficientnet_v2)
def get_model():
    with strategy.scope():
        inp = tf.keras.layers.Input(shape = (*IMAGE_SIZE, 3))
        x = keras_efficientnet_v2.EfficientNetV2XL(drop_connect_rate=0.2, num_classes=0, pretrained="imagenet21k-ft1k")(inp)
        x = tf.keras.layers.GlobalAveragePooling2D()(x)
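        # keep the output layer in float32 for a numerically stable sigmoid/loss under mixed precision
        output = tf.keras.layers.Dense(1, activation = 'sigmoid', dtype = 'float32')(x)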
        model = tf.keras.models.Model(inputs = [inp], outputs = [output])
        opt = tf.keras.optimizers.Adam(learning_rate = LR)
        opt = tfa.optimizers.SWA(opt)
        model.compile(
            optimizer = opt,
            loss = [tf.keras.losses.BinaryCrossentropy()],
            metrics = [tf.keras.metrics.AUC()]
        )
        return model
    
# Function to train on the training files, validate on the held-out files and predict the test set
def train_and_evaluate():
    print('\n')
    print('-'*50)
    if tpu:
        tf.tpu.experimental.initialize_tpu_system(tpu)
    train_dataset = get_training_dataset(train_names, ordered = False, labeled = True)
    train_dataset = train_dataset.map(lambda image, image_id, target: (image, target))
    
    valid_dataset = get_validation_dataset(valid_names, ordered = False, labeled = True)
    valid_dataset = valid_dataset.map(lambda image, image_id, target: (image, target))
    
    # the training dataset repeats, so each "epoch" here covers a quarter of the training data
    STEPS_PER_EPOCH = NUM_TRAINING_IMAGES // (BATCH_SIZE * 4)
    K.clear_session()
    # Seed everything
    seed_everything(SEED)
    model = get_model()
    es = EarlyStopping(patience = 5, restore_best_weights=True,verbose=1)
    history = model.fit(train_dataset,
                        validation_data = valid_dataset,
                        steps_per_epoch = STEPS_PER_EPOCH,
                        epochs = EPOCHS,
                        callbacks = [get_lr_callback(), es], 
                        verbose = VERBOSE)
        
    print('\n')
    print('-'*50)
    print('Test inference...')
    # Predict the test set 
    dataset = get_val_test_dataset(TESTING_FILENAMES, ordered = True, labeled = False)
    image = dataset.map(lambda image, image_id: image)
    test_predictions = model.predict(image).astype(np.float32).reshape(-1)
    # Get the test set image_id
    image_id = dataset.map(lambda image, image_id: image_id).unbatch()
    image_id = next(iter(image_id.batch(NUM_TESTING_IMAGES))).numpy().astype('U')
    # Create dataframe output
    test_df = pd.DataFrame({'id': image_id, 'target': test_predictions})
    # Save test dataframe to disk
    test_df.to_csv(f'submission_efn_{IMAGE_SIZE[0]}_{SEED}.csv', index = False)
    
train_and_evaluate()
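To see what `get_lr_callback` feeds the optimizer, the sketch below re-evaluates the same `lrfn` formula outside Keras. It assumes `BATCH_SIZE = 128` (16 per replica on an 8-core Kaggle TPU); on a single GPU or CPU `BATCH_SIZE` would be 16 and `lr_max` correspondingly smaller. The name `sketch_lrfn` is our own.

```python
# Assumption: 8 TPU replicas x 16 examples per replica
ASSUMED_BATCH_SIZE = 16 * 8


def sketch_lrfn(epoch, batch_size=ASSUMED_BATCH_SIZE):
    """Same warm-up + exponential-decay formula as lrfn in get_lr_callback."""
    lr_start, lr_min = 0.0001, 0.0000001
    lr_max = 0.000015 * batch_size
    lr_ramp_ep, lr_sus_ep, lr_decay = 3, 0, 0.7
    if epoch < lr_ramp_ep:
        # linear warm-up from lr_start towards lr_max
        return (lr_max - lr_start) / lr_ramp_ep * epoch + lr_start
    if epoch < lr_ramp_ep + lr_sus_ep:
        # plateau at lr_max (skipped here because lr_sus_ep = 0)
        return lr_max
    # exponential decay from lr_max towards lr_min
    return (lr_max - lr_min) * lr_decay ** (epoch - lr_ramp_ep - lr_sus_ep) + lr_min


for epoch in range(8):
    print(epoch, f"{sketch_lrfn(epoch):.6f}")
```

Under this assumption the rate warms up linearly from 1e-4 (the same value as `LR`) to 0.00192 at epoch 3 and then shrinks by roughly a factor of 0.7 per epoch towards `lr_min`.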

# 6. Evaluation
- We compiled both models with ROC AUC as a metric.
- We focused on obtaining a high AUC, which indicates that the model separates the two classes well.
- We then compared the two models and checked the Kaggle submission scores of their test-set predictions.
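ROC AUC can be read as the probability that a randomly chosen positive sample (a wave containing a signal) receives a higher score than a randomly chosen negative one, with ties counted as one half. The numpy-only sketch below computes it exactly that way, via the Mann-Whitney rank statistic; `rank_auc` is our own helper, not part of the notebook. Note that `tf.keras.metrics.AUC` approximates the curve with a fixed number of thresholds (200 by default), so its values can differ slightly from this exact computation.

```python
import numpy as np


def rank_auc(y_true, y_score):
    """Exact ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    n_pos = int(y_true.sum())
    n_neg = y_true.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative sample")

    # 1-based average ranks: a value occupying ranks (upper - count + 1 .. upper)
    # gets the mean of that range, i.e. upper - (count - 1) / 2
    _, inverse, counts = np.unique(y_score, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)
    ranks = (upper - (counts - 1) / 2.0)[inverse.ravel()]

    # U counts the (positive, negative) pairs ordered correctly, ties as 1/2
    u = ranks[y_true].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


print(rank_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

For this toy input scikit-learn's `metrics.roc_auc_score` also returns 0.75: three of the four (positive, negative) pairs are ordered correctly.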

The most basic direct comparison between the two models, our simple CNN and the EfficientNet, is summarized in the following table.

<table>
    <tr>
        <th>Model</th>
        <th>Train AUC</th>
        <th>Val AUC</th>
        <th>Test AUC (Kaggle)</th>
        <th>Avg. time per epoch</th>
    </tr>
    <tr>
        <td>Simple CNN</td>
        <td>0.8363</td>
        <td>0.8388</td>
        <td>0.8435</td>
        <td>3300 s (55 min)</td>
    </tr>
    <tr>
        <td>EfficientNetB7</td>
        <td>0.8952</td>
        <td>0.8800</td>
        <td>0.8754</td>
        <td>860 s (~14 min)</td>
    </tr>
</table>

- Since the model outputs a predicted probability for the positive class, we can inspect these values to judge how confidently it classified each test sample.
- The closer a predicted probability is to 0 or 1, the more confident the model is in that prediction.
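The counts computed in the cells below can be wrapped in a small helper. In this sketch `confident_share` is a hypothetical name; it uses the distance to the nearest class, min(p, 1 - p), so a margin of 0.1 corresponds to the `>= 0.9 | <= 0.1` filter and a margin of 0.2 to the `>= 0.8 | <= 0.2` filter (up to floating-point rounding for values lying exactly on a threshold). It returns both the count and the fraction of all predictions.

```python
import numpy as np


def confident_share(preds, margin):
    """Return (count, fraction) of probabilities within `margin` of 0 or 1."""
    p = np.asarray(preds, dtype=float)
    # distance to the nearest class label, min(p, 1 - p)
    confident = np.minimum(p, 1.0 - p) <= margin
    return int(confident.sum()), float(confident.mean())


demo = [0.02, 0.15, 0.5, 0.85, 0.97]
print(confident_share(demo, 0.1))  # (2, 0.4)
print(confident_share(demo, 0.2))  # (4, 0.8)
```

With the notebook's predictions this would be called as, for example, `confident_share(df_preds_cnn['target'], 0.2)`.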

In [None]:
# load the CNN predictions into a dataframe
df_preds_cnn = pd.read_csv('../input/g2netassets/kaggle_sub_cnn.csv')
df_preds_cnn.head()

In [None]:
df_preds_cnn.shape

In [None]:
df_preds_cnn[(df_preds_cnn['target'] >= 0.9) | (df_preds_cnn['target'] <= 0.1)]['target'].count()

In [None]:
df_preds_cnn[(df_preds_cnn['target'] >= 0.8) | (df_preds_cnn['target'] <= 0.2)]['target'].count()

- Out of the 226,000 test predictions, 74,524 (~33%) were made by the CNN model with high confidence (a probability of at least 80% for either class).
- 48,979 (~22%) were predicted with a probability of at least 90%.
- High confidence does not by itself translate into good performance, since we do not have the true test labels; with further training, regularization and architectural changes, we hope to improve these values in the future.

In [None]:
# load the EFNet predictions into a dataframe
df_preds_efn = pd.read_csv('./submission_efn_512_2021.csv')
df_preds_efn.head()

In [None]:
df_preds_efn.shape

In [None]:
df_preds_efn[(df_preds_efn['target'] >= 0.9) | (df_preds_efn['target'] <= 0.1)]['target'].count()

In [None]:
df_preds_efn[(df_preds_efn['target'] >= 0.8) | (df_preds_efn['target'] <= 0.2)]['target'].count()

- Here, out of the 226,000 test predictions, ~61% were made by the EfficientNet model with high confidence (a probability of at least 80% for either class).
- 70,680 (~31%) were predicted with a probability of at least 90%.

***
# 7. Results & Conclusions

- Gravitational waves are NOT EASY to detect: their signals are extremely weak and buried in detector noise.
- After trying a variety of preprocessing steps, we transformed the original strain wave data into frequency spectrograms, i.e. images, which we then used to train deep learning models.
- One of the biggest challenges in this project was handling such a large dataset. We solved it with TensorFlow's `tf.data` API, streamlining the entire workflow from data import to model training and prediction. This let us meet the project's goal of building a flexible pipeline that can be reused in the future.
- Our simple CNN architecture performed better than expected after just 3 epochs, reaching a Kaggle test AUC of 0.8435.
- The EfficientNet B7 model performed best, with a Kaggle test AUC of 0.8754.
- We evaluated the models with the ROC AUC score, since we wanted them to separate the two classes well, but also tracked accuracy for comparison.

***
**References**
1. [Enhancing gravitational-wave science with machine learning](https://iopscience.iop.org/article/10.1088/2632-2153/abb93a/pdf)
2. [Improving significance of binary black hole mergers in Advanced LIGO data using deep learning](https://arxiv.org/abs/2010.08584)
3. [GW Tutorials](https://www.gw-openscience.org/LVT151012data/LOSC_Event_tutorial_LVT151012.html#Intro-to-signal-processing)
4. [tf.data: Build TensorFlow input pipelines](https://www.tensorflow.org/guide/data)
5. [Image classification via fine-tuning with EfficientNet](https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/)