## Q1) Given the normalized dataset "SiesmicEventsClassification_Normalized", the goal is to separate noise waveforms from seismic-event waveforms by clustering in a self-supervised/unsupervised manner. Do the following:


1. Divide the dataset into an 80% training set and a 20% testing set.

2. Determine the ideal U-Net architecture that minimizes the reconstruction error on the testing dataset.

3. Use the latent representation from the bottleneck layer to cluster seismic and noise waveforms, for instance with the K-means algorithm. If the latent representation is high-dimensional, you can apply PCA for dimensionality reduction before K-means.

4. Repeat all the preceding steps using frequency-domain or time-frequency-transformed data as the network input.
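Step 3 can be sketched with scikit-learn as shown below. This is a minimal sketch under stated assumptions: `latent` is a synthetic stand-in for the flattened bottleneck features (in practice it would come from the trained encoder), and the sizes, the two PCA components, and the seed are illustrative choices rather than values from the assignment. The 80/20 split is applied to the latents here only for brevity; in the full exercise the split happens before the network is trained.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical stand-in for bottleneck features: 200 waveforms, 64 features each.
# The two synthetic groups are offset so the example has structure to find.
latent = np.concatenate([
    rng.normal(loc=0.0, scale=1.0, size=(100, 64)),
    rng.normal(loc=5.0, scale=1.0, size=(100, 64)),
])

# 80/20 split on shuffled indices.
perm = rng.permutation(len(latent))
n_train = int(0.8 * len(latent))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Fit PCA and K-means on the training latents only, then assign the test samples.
pca = PCA(n_components=2).fit(latent[train_idx])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(pca.transform(latent[train_idx]))
test_clusters = kmeans.predict(pca.transform(latent[test_idx]))

print(len(train_idx), len(test_idx))
print(np.bincount(test_clusters, minlength=2))  # Cluster sizes on the test set
```

K-means cluster indices are arbitrary, so cluster 0 may correspond to either noise or events; comparing against any available labels has to allow for that permutation.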

## Q2) Denoising and Interpolating Marmousi2 Data

Given the Marmousi2 data, the goal is to denoise and interpolate the seismic data simultaneously. To prepare the network input and target, use the following code:


In [None]:
import torch
import random

# Small value to prevent division by zero in SNR
epsilon = 1e-10

# `data` must already be loaded; example shape (30, 1000, 200)
dat = torch.tensor(data)  # Convert to tensor
num_shots = dat.shape[0]
num_noise_levels = 10  # Each shot is repeated once per noise level

# Define different levels of noise (standard deviations)
noise_levels = torch.linspace(0.5, 1, num_noise_levels)

# Initialize a list to collect shots with noise and SNR
noisy_shots = []
clean_shots = []
snr_values = []

# Fraction of columns zeroed out in each shot (despite the name, exactly this fraction is used)
max_gap_percentage = 0.2

# Add Gaussian noise to each shot and calculate SNR
for i in range(num_shots):
    for noise_level in noise_levels:
        clean_shot = dat[i, :, :]
        # Pass std as a Python float so the (mean, std, size) overload of torch.normal applies
        noise = torch.normal(mean=0.0, std=float(noise_level), size=clean_shot.shape)
        noisy_shot = clean_shot + noise
        
        # Create random gaps in the noisy shot
        mask_size = noisy_shot.shape[1]  # Last axis of a single shot: 200 in (30, 1000, 200)
        num_gaps = int(mask_size * max_gap_percentage)
        gap_indices = random.sample(range(mask_size), num_gaps)  # Randomly choose indices to mask

        # Apply the gaps to the noisy shot by zeroing the selected columns in one step
        noisy_shot[:, gap_indices] = 0

        # Calculate SNR (additive noise only; the zeroed gaps are not counted)
        signal_power = torch.mean(dat[i, :, :]**2)  # Power of the signal
        noise_power = torch.mean(noise**2) + epsilon  # Power of the noise with epsilon
        snr = 10 * torch.log10(signal_power / noise_power)  # SNR in dB   

        snr_values.append(snr.item())
        noisy_shots.append(noisy_shot)
        clean_shots.append(clean_shot)

# Convert list to a PyTorch tensor
data_noisy = torch.stack(noisy_shots)
data_clean = torch.stack(clean_shots)

# Check the shape and SNR values
print('Shape of the noisy data:', data_noisy.shape)  # (300, 1000, 200) for the example shape
print(snr_values[0:10])  # SNR (dB) of the first 10 samples, i.e. the first shot at each noise level
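The nested loops above append ten noise levels per shot, so the stacked tensors are ordered shot by shot and sample `k` comes from shot `k // 10`. One possible way to do an 80/20 split, sketched below, is to hold out whole shots so that no clean target appears in both sets. This is an assumption about how to split, not a requirement of the exercise; `toy_clean` and `toy_noisy` are small hypothetical stand-ins for `data_clean` and `data_noisy`.

```python
import torch

# Toy stand-ins with the same ordering as the stacked tensors:
# 30 shots x 10 noise levels, grouped shot by shot (tiny shot size for speed).
num_shots, levels = 30, 10
toy_clean = torch.randn(num_shots, 8, 4).repeat_interleave(levels, dim=0)
toy_noisy = toy_clean + 0.5 * torch.randn_like(toy_clean)

# Sample k belongs to shot k // levels.
shot_id = torch.arange(num_shots * levels) // levels

# Hold out 20% of the shots (not of the samples) for testing.
g = torch.Generator().manual_seed(0)
shot_perm = torch.randperm(num_shots, generator=g)
test_shots = shot_perm[: int(0.2 * num_shots)]
test_mask = torch.isin(shot_id, test_shots)

x_train, y_train = toy_noisy[~test_mask], toy_clean[~test_mask]
x_test, y_test = toy_noisy[test_mask], toy_clean[test_mask]
print(x_train.shape, x_test.shape)
```

A plain random split over the 300 samples is simpler, but it lets the same clean shot appear in both training and testing at different noise levels, which can make test results look better than they are.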

## After preparing the data, do the following:

1. Divide the data into 80% for training and 20% for testing.
2. Plot the input and target of your problem to fully understand what you are trying to accomplish.
3. Design a U-Net to denoise and interpolate the seismic data simultaneously.
4. Plot some samples from the test set, compute the SNR of each test sample, and calculate the SNR improvement after applying the U-Net.
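The SNR improvement in step 4 can be sketched as the output SNR minus the input SNR, both measured against the clean target. The helper `snr_db` and the tensors below are hypothetical: `denoised` is a placeholder for the U-Net output, not a real prediction. Note one assumption that differs from the preparation code: here the error is `estimate - clean`, so zeroed gaps in the input also count as error, whereas the preparation code measures the additive noise only.

```python
import torch

def snr_db(clean, estimate, eps=1e-10):
    """Per-sample SNR in dB of `estimate` relative to `clean` (hypothetical helper)."""
    signal_power = torch.mean(clean ** 2, dim=(-2, -1))
    error_power = torch.mean((estimate - clean) ** 2, dim=(-2, -1)) + eps
    return 10 * torch.log10(signal_power / error_power)

torch.manual_seed(0)

# Synthetic stand-ins for five test samples.
clean = torch.randn(5, 100, 20)
noisy = clean + 0.5 * torch.randn_like(clean)     # Network input
denoised = clean + 0.1 * torch.randn_like(clean)  # Placeholder for model(noisy)

snr_in = snr_db(clean, noisy)
snr_out = snr_db(clean, denoised)
improvement = snr_out - snr_in  # Positive values mean the output is closer to the target

for k in range(len(improvement)):
    print(f"sample {k}: {snr_in[k].item():.2f} dB -> {snr_out[k].item():.2f} dB "
          f"(+{improvement[k].item():.2f} dB)")
```

Reporting the per-sample improvement alongside its mean over the test set makes it easier to see whether the network helps uniformly or mainly at particular noise levels.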