<h1 style="text-align:center; font-size:300%; background-color:#000000; color: #ffffff; padding:5%;">Gravitational Waves</h1>

<h3 style="text-align:center; font-style:italic;">"Sticks in a bundle are strong", unlike black holes, which don't agree with the statement!</h3>

<img style="width:70%; margin:0 auto; border-radius:15px;" src="https://i.ibb.co/4gLcXM7/d41586-019-00573-4-16480554.jpg">

<p style="text-align: center; font-size: 100%;">Gravitational waves are ripples in the curvature of
spacetime, generated by violent events involving massive objects, that propagate outward from their source at the
speed of light. They were first directly detected in 2015, from the merger of two black holes like the one illustrated in the figure
above. The signals produced by such collisions can be picked up
by detectors such as LIGO Hanford, LIGO Livingston, and Virgo, the three detectors used in the dataset of this competition.
Each data example contains three signals, one from each of the three detectors.</p>

<div>
    <h2 style="font-size:140%;">Contributors (Kaggle Profile)</h2>
    <a style="padding:1.5% 2.4%; display:inline-block;background-color: #0a0037; color:white;font-size:85%; border-radius:10px;" href="https://www.kaggle.com/mohammadasimbluemoon">Mohammad Asim</a>
    <a style="padding:1.5%; display:inline-block;background-color: #2d2c2c; color:white;font-size:85%; border-radius:10px;" href="https://www.kaggle.com/khhassan">Muhammad Hassan</a>
</div>

<div>
    <h2 style="font-size:200%;">Purpose and Description of Notebook</h2>
    <p style="text-align:justify;">The dataset for <a href="https://www.kaggle.com/c/g2net-gravitational-wave-detection">G2Net Gravitational Wave Detection</a> consists of raw gravitational wave signals recorded by the detectors,
    and in that form it does not show much about how the signals vary across the dataset.
    To make the data easier to inspect, it is converted into image form. This notebook generates
    image data from the given gravitational wave (GW) signals and performs exploratory data analysis (EDA)
    on the given training data.</p>
</div>

<hr>

**Installing the required package 'pycbc', which is used later to convert the data to time series format.**

In [None]:
!pip -q install pycbc

# Importing the Required Libraries
**These are the libraries needed throughout the notebook.**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import pylab
import pycbc.types
from PIL import Image
from multiprocessing import Pool
from multiprocessing import cpu_count
import os
import glob
from tqdm.notebook import tqdm

# Reading the Training and Submission Data
**Reading the training labels and the sample submission to get an overview of them.**
* The following block loads both files as data frames and shows the first rows of the training labels.

In [None]:
submission_df = pd.read_csv('../input/g2net-gravitational-wave-detection/sample_submission.csv')
submission_df.head()
train_df = pd.read_csv('../input/g2net-gravitational-wave-detection/training_labels.csv')
print("No. of Training Examples : ", len(train_df))
train_df.head()

* The following block plots a histogram of the target labels in the training data.

In [None]:
train_df.hist(bins=3)
plt.title("Distribution of Training Data")
plt.xlabel('target')
plt.ylabel('count')  # the y-axis is the number of examples per bin, not the id
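
A histogram shows the class balance visually. The same information can be read off as numbers. The sketch below uses a small made-up data frame (`toy_df`, a hypothetical stand-in for `train_df`) with the same `id` and `target` columns, so it runs without the competition files.

```python
import pandas as pd

# Hypothetical stand-in for training_labels.csv: an id column and a 0/1 target.
toy_df = pd.DataFrame({"id": ["a1", "b2", "c3", "d4"], "target": [0, 1, 1, 0]})

# Absolute number of examples per class.
counts = toy_df["target"].value_counts().sort_index()
# Share of each class in the data.
fractions = toy_df["target"].value_counts(normalize=True).sort_index()

print(counts)
print(fractions)
```

Replacing `toy_df` with `train_df` should give the counts for the real training labels.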

# Visualizing Training Data
**The following block reads a given ".npy" file and visualizes the training data.**
* The "convert_id_to_path" function builds the file path from the example id.
* The "visualize_signals" function loads the file at that path and plots the three signals, first one per subplot and then overlaid in a fourth subplot.
* It also prints the shape of the loaded example.

In [None]:
main_path = '../input/g2net-gravitational-wave-detection/'
train_folder = 'train'
test_folder = 'test'

def convert_id_to_path(_id):
    # Files are nested in folders named after the first three characters of the id.
    # main_path already ends with '/', so no extra separator is added after it.
    return f"{main_path}{train_folder}/{_id[0]}/{_id[1]}/{_id[2]}/{_id}.npy"

def visualize_signals(_id, target, colors=("black","red","green"), signal_names=("LIGO Hanford","LIGO Livingston","Virgo")):
    path = convert_id_to_path(_id)
    data = np.load(path)  # Load the array stored in the '.npy' file.
    print(data.shape)
    plt.figure(figsize=(16,7))
    for i in range(3):
        # One subplot per detector.
        plt.subplot(4,1,i+1)
        plt.plot(data[i], color=colors[i])
        # Wrap the name in a list; a bare string would be split into characters.
        plt.legend([signal_names[i]], fontsize=12, loc="lower right")
        # Overlay all three signals in the fourth subplot.
        plt.subplot(4,1,4)
        plt.plot(data[i], color=colors[i])
    plt.subplot(4,1,4)
    plt.legend(signal_names, fontsize=12, loc="lower right")
    plt.suptitle(f"id: {_id} target: {target}", fontsize=16)
    return data
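
The folder layout that "convert_id_to_path" relies on can be checked in isolation. The sketch below is a standalone variant (the name `id_to_path`, its `root` and `split` parameters, and the example id are all hypothetical) that builds the same kind of path with `pathlib`, which takes care of the separators.

```python
from pathlib import PurePosixPath

def id_to_path(example_id, root="../input/g2net-gravitational-wave-detection", split="train"):
    """Build the nested .npy path for an example id (illustrative sketch)."""
    # The first three characters of the id name the three nested folders.
    a, b, c = example_id[0], example_id[1], example_id[2]
    return str(PurePosixPath(root) / split / a / b / c / f"{example_id}.npy")

example_path = id_to_path("abc1234567")  # made-up id, for illustration only
print(example_path)
```

Passing `split="test"` would point at the test folder instead, assuming it follows the same nesting.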

**This block calls the function above and plots the signals for the selected rows of the training data frame.**

In [None]:
for i in range(2):   # We are visualizing the first two examples only. You can visualize...
                     # ...more by changing the value passed to range.
    _id = train_df.iloc[i]["id"]  # Get the example id from the training dataframe.
    target = train_df.iloc[i]["target"]  # Get the target label from the training dataframe.
    data = visualize_signals(_id, target)


# Applying the Q_Transform to the Signals
**To see the variation in the data clearly, we apply the q_transform to each signal. The following block does the following:**
* It first converts the data into time series instances.
* After that, it whitens the data, i.e. normalizes the noise power across frequencies.
* Then, it takes the q_transform of the time series instances.
* After that, it normalizes the result to the 0-255 scale.
* At the end, it saves the data as an image.
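
The normalization step in the list above can be tried on its own. This is a minimal sketch, assuming the q_transform output is a 2-D array of power values. The helper name `to_uint8_image` and the random `fake_power` array are made up for illustration.

```python
import numpy as np

def to_uint8_image(power, eps=1e-6):
    """Standardize a 2-D array, then min-max scale it to the 0-255 range."""
    power = np.asarray(power, dtype=np.float64)
    # Zero mean, unit variance (eps guards against a zero standard deviation).
    z = (power - power.mean()) / (power.std() + eps)
    lo, hi = z.min(), z.max()
    if hi == lo:
        # A constant input carries no contrast; return an all-black image.
        return np.zeros(power.shape, dtype=np.uint8)
    scaled = 255.0 * (z - lo) / (hi - lo)
    return scaled.astype(np.uint8)

rng = np.random.default_rng(0)
fake_power = rng.random((100, 1000))  # stand-in for a q_transform plane
img = to_uint8_image(fake_power)
print(img.shape, img.dtype, img.min(), img.max())
```

Min-max scaling is unaffected by shifting or positively rescaling its input, so the standardization step does not change the final image beyond floating-point rounding. It is kept here only to mirror the steps listed above.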


In [None]:
def get_constant_q_transform(file_names):
    eps = 1e-6
    normalize = True
    os.makedirs("train_cqt", exist_ok=True)  # make sure the output folder exists

    for idx in tqdm(range(len(file_names)), desc='Progress'):
        example_id = file_names[idx].split('/')[-1].split('.')[0]
        # load the specific 2s sample
        data = np.load(file_names[idx])
        channels = []
        for ch in range(3):  # one channel per detector
            # convert the data to a TimeSeries instance (this is where pycbc is used)
            ts = pycbc.types.TimeSeries(data[ch, :], epoch=0, delta_t=1.0/2048)
            # whiten the data (i.e. normalize the noise power at different frequencies)
            ts = ts.whiten(0.125, 0.125)
            # calculate the qtransform
            time, freq, power = ts.qtransform(.002, logfsteps=100, qrange=(10, 10), frange=(20, 512))
            # normalize and scale to 0-255
            if normalize:
                power = (power - power.mean()) / (power.std() + eps)
                _min, _max = power.min(), power.max()
                power = 255 * (power - _min) / (_max - _min)
                power = power.astype(np.uint8)
            channels.append(power)
        # stack the three detector planes as the channels of one image
        vstacked = np.stack(channels, axis=-1)
        im = Image.fromarray(vstacked)
        im.save(f"train_cqt/{example_id}.png")
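
The notebook imports `Pool` and `cpu_count`, which suggests the conversion is meant to run over several processes. One possible way to do that is sketched below, assuming the worker accepts a list of file names as `get_constant_q_transform` does. The helpers `split_into_chunks` and `run_in_parallel` are hypothetical names introduced here, not part of the original notebook.

```python
from multiprocessing import Pool, cpu_count

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for k in range(n_chunks):
        # The first `extra` chunks receive one additional item.
        end = start + size + (1 if k < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def run_in_parallel(worker, file_names, n_workers=None):
    """Call worker(chunk) for each chunk of file_names in a separate process."""
    n_workers = n_workers or cpu_count()
    with Pool(n_workers) as pool:
        pool.map(worker, split_into_chunks(file_names, n_workers))

# Demonstrate the chunking on made-up file names.
demo_chunks = split_into_chunks([f"file_{k}.npy" for k in range(10)], 4)
print([len(chunk) for chunk in demo_chunks])
```

With these helpers, a call such as `run_in_parallel(get_constant_q_transform, file_names)` would hand each process its own share of the files. Each process would then show its own progress bar.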



# Plotting Q_Transform
**The following blocks plot the q_transform and the time series instances generated from the given training data.**
* The following block defines the function that plots the q_transform.

In [None]:
import cv2
def q_transform_plot(ts, name):
    eps = 1e-6
    # whiten the data, then compute the q_transform
    ts = ts.whiten(0.125, 0.125)
    time, freq, power = ts.qtransform(.002, logfsteps=100, qrange=(10, 10), frange=(20, 512))
    # standardize, then scale to 0-255
    power = (power - power.mean()) / (power.std() + eps)
    _min, _max = power.min(), power.max()
    power = 255 * (power - _min) / (_max - _min)
    power = power.astype(np.uint8)
    fig, ax = plt.subplots(1, 1, figsize=(15, 7))
    ax.imshow(power)
    ax.set_title('Q-Transformed (' + name + ')')
    return power

The following block plots three graphs for each detector in the data:
* The time series before whitening.
* The time series after whitening.
* The q_transform of the signal.

In [None]:
names = ['LIGO Hanford', 'LIGO Livingston', 'Virgo']
color = ['r', 'g', 'b']
path = convert_id_to_path(train_df.iloc[0]['id'])
data = np.load(path)
for i in range(data.shape[0]):
    # convert the detector signal to a TimeSeries instance
    ts = pycbc.types.TimeSeries(data[i, :], epoch=0, delta_t=1.0/2048)
    plt.figure(figsize=(16,3))
    plt.subplot(2,2,1)
    plt.plot(ts, color[i])
    # Wrap the name in a list; a bare string would be split into characters.
    plt.legend([names[i]], fontsize=12, loc="lower right")
    plt.title("Before Whitening ("+names[i]+")")
    plt.subplot(2,2,2)
    plt.plot(ts.whiten(0.125, 0.125), color[i])
    plt.legend([names[i]], fontsize=12, loc="lower right")
    plt.title("After Applying Whitening ("+names[i]+")")
    q_transform_plot(ts, names[i])
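
For intuition about what whitening does to a signal, the sketch below flattens the spectrum of synthetic colored noise using NumPy only. It is a simplified illustration, not PyCBC's implementation: the function `whiten_sketch`, the crude averaged-periodogram PSD estimate, and the AR(1) test noise are all assumptions made for this example, and the output scale will not match `ts.whiten`.

```python
import numpy as np

def whiten_sketch(x, sample_rate, seg_seconds=0.125):
    """Divide the spectrum of x by a rough estimate of its amplitude spectrum."""
    x = np.asarray(x, dtype=np.float64)
    seg_len = int(seg_seconds * sample_rate)
    n_segs = len(x) // seg_len
    # Average the periodograms of Hann-windowed, non-overlapping segments.
    segs = x[: n_segs * seg_len].reshape(n_segs, seg_len) * np.hanning(seg_len)
    psd_coarse = np.mean(np.abs(np.fft.rfft(segs, axis=1)) ** 2, axis=0)
    # Interpolate the coarse PSD onto the frequency grid of the full signal.
    f_coarse = np.fft.rfftfreq(seg_len, d=1.0 / sample_rate)
    f_full = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    psd_full = np.interp(f_full, f_coarse, psd_coarse)
    # Flatten the spectrum and transform back to the time domain.
    return np.fft.irfft(np.fft.rfft(x) / np.sqrt(psd_full), n=len(x))

def band_power_ratio(x):
    """Mean power in the lowest quarter of the band over the highest quarter."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    quarter = len(spec) // 4
    return spec[1:quarter].mean() / spec[-quarter:].mean()

# Synthetic colored noise: an AR(1) process with most power at low frequencies.
rng = np.random.default_rng(0)
white = rng.standard_normal(4096)
colored = np.empty_like(white)
colored[0] = white[0]
for n in range(1, len(white)):
    colored[n] = 0.9 * colored[n - 1] + white[n]

whitened = whiten_sketch(colored, sample_rate=2048)
ratio_before = band_power_ratio(colored)
ratio_after = band_power_ratio(whitened)
print(ratio_before, ratio_after)
```

A ratio close to 1 means low and high frequencies carry similar power, so the ratio should move toward 1 after whitening.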