## Starting Over: Building, Training, and Evaluating a Conv2D Autoencoder

This section starts over and walks through the entire process step by step.

### Mount Google Drive

Mount your Google Drive to access files stored there.

In [37]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Define Paths for Training and Testing Data

Specify the paths for the zip files containing the training and testing data in your Google Drive.

In [38]:
import os

# Define the base directory for zip files
zip_base_dir = '/content/drive/My Drive/ZIP/'

# Path to the test (anomalous) data zip file; note the filename is spelled 'anomolous'
test_zip_name = 'spectrograms_anomolous.zip'
test_zip_file = os.path.join(zip_base_dir, test_zip_name)

# Every other .zip file in the directory is used as training data
all_zip_files = os.listdir(zip_base_dir)
training_zip_files = [os.path.join(zip_base_dir, f) for f in all_zip_files
                      if f.endswith('.zip') and f != test_zip_name]

print(f"Test data zip file: {test_zip_file}")
print("Training data zip files:")
for f in training_zip_files:
    print(f)

Test data zip file: /content/drive/My Drive/ZIP/spectrograms_anomolous.zip
Training data zip files:
/content/drive/My Drive/ZIP/spectrograms_full_extended.zip
/content/drive/My Drive/ZIP/spectrograms_full_ext2.zip
/content/drive/My Drive/ZIP/spectrograms_full_ext1.zip
/content/drive/My Drive/ZIP/spectrograms_collapsed.zip
/content/drive/My Drive/ZIP/spectrograms_45_deg.zip
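The selection logic in this cell can be wrapped in a small function that works on a plain list of names, which makes it easy to check without mounting Drive. This is a sketch: `split_zip_files` is a hypothetical helper, and it assumes, as the cell does, that every `.zip` other than the anomalous one is training data.

```python
import os

def split_zip_files(file_names, base_dir, test_name):
    """Return (test_path, train_paths) for the .zip files in file_names.

    Hypothetical helper: every .zip except `test_name` is treated as training data.
    """
    test_path = os.path.join(base_dir, test_name)
    train_paths = [
        os.path.join(base_dir, name)
        for name in file_names
        if name.endswith('.zip') and name != test_name
    ]
    return test_path, train_paths

# Usage with names like those in the Drive folder (non-.zip files are ignored)
names = ['spectrograms_anomolous.zip', 'spectrograms_45_deg.zip', 'notes.txt']
test_path, train_paths = split_zip_files(names, '/content/drive/My Drive/ZIP/',
                                         'spectrograms_anomolous.zip')
print(test_path)    # /content/drive/My Drive/ZIP/spectrograms_anomolous.zip
print(train_paths)  # ['/content/drive/My Drive/ZIP/spectrograms_45_deg.zip']
```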


### Create Extraction Directories

Create separate directories to store the extracted files for training and testing data.

In [39]:
import os
import shutil

# Define directories for extracting training and testing data
train_extract_dir = '/tmp/extracted_training_data'
test_extract_dir = '/tmp/extracted_testing_data'

# Clean up the directories before extraction to ensure a clean slate
print(f"Cleaning up extraction directories: {train_extract_dir} and {test_extract_dir}")
if os.path.exists(train_extract_dir):
    shutil.rmtree(train_extract_dir)
os.makedirs(train_extract_dir, exist_ok=True)

if os.path.exists(test_extract_dir):
    shutil.rmtree(test_extract_dir)
os.makedirs(test_extract_dir, exist_ok=True)

print(f"Re-created training data extraction directory: {train_extract_dir}")
print(f"Re-created testing data extraction directory: {test_extract_dir}")

Cleaning up extraction directories: /tmp/extracted_training_data and /tmp/extracted_testing_data
Re-created training data extraction directory: /tmp/extracted_training_data
Re-created testing data extraction directory: /tmp/extracted_testing_data
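The remove-then-recreate pattern above can be condensed into one helper. `shutil.rmtree(..., ignore_errors=True)` makes the `os.path.exists` check unnecessary, with the caveat that it also silences genuine errors such as permission problems. A sketch, with `reset_dir` as a hypothetical name:

```python
import os
import shutil
import tempfile

def reset_dir(path):
    """Delete `path` if it exists, then re-create it empty (hypothetical helper)."""
    shutil.rmtree(path, ignore_errors=True)  # no error if the directory is missing
    os.makedirs(path, exist_ok=True)
    return path

# Usage on a throwaway directory that already contains a stale file
demo_dir = os.path.join(tempfile.mkdtemp(), 'extracted_demo')
os.makedirs(demo_dir)
open(os.path.join(demo_dir, 'stale.npy'), 'wb').close()
reset_dir(demo_dir)
print(os.listdir(demo_dir))  # []
```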


### Extract Training Data

Extract the contents of the zip files designated for training into the training data directory.

In [40]:
import zipfile
import os
import shutil

# Use the training_zip_files list defined earlier
# Use the train_extract_dir defined earlier

# Clean up any previous anomalous data extraction in the training directory
anomalous_dir_in_train = os.path.join(train_extract_dir, 'spectrograms_anomolous')
if os.path.exists(anomalous_dir_in_train):
    print(f"Removing anomalous data from training directory: {anomalous_dir_in_train}")
    shutil.rmtree(anomalous_dir_in_train)

print("Extracting training data...")
for zip_file in training_zip_files:
    try:
        with zipfile.ZipFile(zip_file, 'r') as zip_ref:
            print(f"Extracting {zip_file} to {train_extract_dir}...")
            zip_ref.extractall(train_extract_dir)
            print("Extraction complete.")
    except FileNotFoundError:
        print(f"Error: Training zip file not found at {zip_file}. Skipping extraction.")
    except Exception as e:
        print(f"Error extracting {zip_file}: {e}")

print("\nTraining data extraction complete.")

Extracting training data...
Extracting /content/drive/My Drive/ZIP/spectrograms_full_extended.zip to /tmp/extracted_training_data...
Extraction complete.
Extracting /content/drive/My Drive/ZIP/spectrograms_full_ext2.zip to /tmp/extracted_training_data...
Extraction complete.
Extracting /content/drive/My Drive/ZIP/spectrograms_full_ext1.zip to /tmp/extracted_training_data...
Extraction complete.
Extracting /content/drive/My Drive/ZIP/spectrograms_collapsed.zip to /tmp/extracted_training_data...
Extraction complete.
Extracting /content/drive/My Drive/ZIP/spectrograms_45_deg.zip to /tmp/extracted_training_data...
Extraction complete.

Training data extraction complete.
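The extraction loop and the later `os.walk` file collection can be combined into one function. This is a sketch: `extract_zips` is a hypothetical helper, and the final `sorted` is an addition. `os.walk` order depends on the filesystem, so sorting makes the file order, and therefore the sample indices used later for anomaly lookup, reproducible across runs.

```python
import os
import tempfile
import zipfile

def extract_zips(zip_paths, dest_dir):
    """Extract each zip into dest_dir and return the sorted .npy paths found there.

    Hypothetical helper; missing or corrupt archives are reported and skipped.
    """
    for zip_path in zip_paths:
        try:
            with zipfile.ZipFile(zip_path) as zf:
                zf.extractall(dest_dir)  # creates dest_dir and subfolders as needed
        except (FileNotFoundError, zipfile.BadZipFile) as e:
            print(f"Skipping {zip_path}: {e}")
    npy_paths = []
    for root, _dirs, files in os.walk(dest_dir):
        npy_paths.extend(os.path.join(root, f) for f in files if f.endswith('.npy'))
    return sorted(npy_paths)

# Usage: a tiny archive with one subfolder, plus a path that does not exist
work = tempfile.mkdtemp()
zip_path = os.path.join(work, 'demo.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    zf.writestr('spectrograms_demo/spec_100MHz.npy', b'placeholder bytes')
    zf.writestr('spectrograms_demo/readme.txt', 'not a spectrogram')
files = extract_zips([zip_path, os.path.join(work, 'missing.zip')], os.path.join(work, 'out'))
print([os.path.basename(p) for p in files])  # ['spec_100MHz.npy']
```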


### Extract Testing Data

Extract the contents of `spectrograms_anomolous.zip` into the testing data directory.

In [41]:
import zipfile
import os

# Use the test_zip_file variable defined earlier
# Use the test_extract_dir variable defined earlier

print("Extracting testing data...")
try:
    with zipfile.ZipFile(test_zip_file, 'r') as zip_ref:
        print(f"Extracting {test_zip_file} to {test_extract_dir}...")
        zip_ref.extractall(test_extract_dir)
        print("Extraction complete.")
except FileNotFoundError:
    print(f"Error: Test zip file not found at {test_zip_file}. Skipping extraction.")
except Exception as e:
    print(f"Error extracting {test_zip_file}: {e}")

print("\nTesting data extraction complete.")

Extracting testing data...
Extracting /content/drive/My Drive/ZIP/spectrograms_anomolous.zip to /tmp/extracted_testing_data...
Extraction complete.

Testing data extraction complete.


### Verify Extracted Files

List the contents of both the training and testing directories to confirm the files have been extracted correctly.
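Besides printing a listing, it can help to return the file counts as data so they can be checked with a plain `assert`, for example that every training folder holds the expected number of spectrograms. A sketch with a hypothetical `count_npy_by_folder` helper:

```python
import os
import tempfile

def count_npy_by_folder(base_dir):
    """Map each folder under base_dir (relative path) to the number of .npy files it holds."""
    counts = {}
    for root, _dirs, files in os.walk(base_dir):
        n = sum(1 for f in files if f.endswith('.npy'))
        if n:
            counts[os.path.relpath(root, base_dir)] = n
    return counts

# Usage on a throwaway tree with two folders
base = tempfile.mkdtemp()
for folder, n in [('spectrograms_a', 3), ('spectrograms_b', 2)]:
    os.makedirs(os.path.join(base, folder))
    for i in range(n):
        open(os.path.join(base, folder, f'spec_{i}.npy'), 'wb').close()
print(sorted(count_npy_by_folder(base).items()))  # [('spectrograms_a', 3), ('spectrograms_b', 2)]
```

In the notebook, `count_npy_by_folder(train_extract_dir)` would return one entry per extracted folder.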

In [42]:
import os

def summarize_extraction_dir(path, label):
    """Print the top-level entries of `path` and the .npy files found in each subdirectory."""
    print(f"Contents of {label} directory ({path}):")
    if not os.path.isdir(path):
        print(f"Error: {label.capitalize()} directory not found at {path}")
        return
    contents = os.listdir(path)
    if not contents:
        print(f"{label.capitalize()} directory is empty.")
        return
    for item in contents:
        print(f"- {item}")
    print("\nChecking subdirectories for .npy files:")
    found_npy = False
    for root, _dirs, files in os.walk(path):
        npy_files = [f for f in files if f.endswith('.npy')]
        if npy_files:
            print(f"  Found {len(npy_files)} .npy files in {root} (showing first 5):")
            for fname in npy_files[:5]:
                print(f"    {fname}")
            found_npy = True
    if not found_npy:
        print(f"  No .npy files found in any subdirectories of the {label} extraction path.")

summarize_extraction_dir(train_extract_dir, "training")
print("\n" + "="*30 + "\n")
summarize_extraction_dir(test_extract_dir, "testing")

Contents of training directory (/tmp/extracted_training_data):
- spectrograms_full_ext2
- spectrograms_collapsed
- spectrograms_full_extended
- spectrograms_45_deg
- spectrograms_full_ext1

Checking subdirectories for .npy files:
  Found 726 .npy files in /tmp/extracted_training_data/spectrograms_full_ext2 (showing first 5):
    spec_643MHz_20251025_080800.npy
    spec_259MHz_20251025_080730.npy
    spec_809MHz_20251025_080813.npy
    spec_413MHz_20251025_080742.npy
    spec_62MHz_20251025_080703.npy
  Found 726 .npy files in /tmp/extracted_training_data/spectrograms_collapsed (showing first 5):
    spec_742MHz_20251025_075521.npy
    spec_1439MHz_20251025_075616.npy
    spec_1141MHz_20251025_075552.npy
    spec_965MHz_20251025_075538.npy
    spec_1275MHz_20251025_075603.npy
  Found 726 .npy files in /tmp/extracted_training_data/spectrograms_full_extended (showing first 5):
    spec_872MHz_20251025_074950.npy
    spec_1705MHz_20251025_075056.npy
    spec_1174MHz_20251025_075014.npy
   

### Implement Data Generator for Conv2D

Implement a custom data generator to load 2D spectrogram `.npy` files in batches to manage memory usage for the Conv2D model.

In [43]:
import numpy as np
import tensorflow as tf
import os
from tensorflow.keras.utils import Sequence

class SpectrogramDataGenerator(Sequence):
    """
    Data Generator for loading 2D spectrograms from .npy files in batches for Conv2D input.
    """
    def __init__(self, file_paths, batch_size=32, shuffle=True, input_shape=(1024, 292, 1), **kwargs): # Expected Conv2D input shape
        # Call the base constructor: in Keras 3, Sequence is an alias of PyDataset, which
        # warns ("_warn_if_super_not_called", seen during training below) when this is skipped
        super().__init__(**kwargs)
        self.file_paths = file_paths
        self.batch_size = batch_size
        self.shuffle = shuffle
        # Input shape includes the channel dimension for Conv2D
        self.input_shape = input_shape
        # Expected 2D shape of each file, without the channel dimension
        self._loading_shape = input_shape[:2]
        self.indexes = np.arange(len(self.file_paths))
        if self.shuffle:
            self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch"""
        # Floor division drops the final partial batch, so up to batch_size - 1
        # files are never served; switch to np.ceil to include them
        return int(np.floor(len(self.file_paths) / self.batch_size))


    def __getitem__(self, index):
        """Generate one batch of data"""
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]

        # List of file paths for the batch
        batch_file_paths = [self.file_paths[k] for k in indexes]

        # Load and collect data for the batch
        batch_data = []
        for file_path in batch_file_paths:
            try:
                data = np.load(file_path, mmap_mode='r')
                # Ensure data matches the expected loading shape (without channel)
                if data.shape == self._loading_shape:
                    # Add channel dimension (grayscale)
                    batch_data.append(np.expand_dims(data, axis=-1))
                else:
                    print(f"Warning: Spectrogram shape mismatch for {file_path}. Expected {self._loading_shape}, got {data.shape}. Skipping.")
            except Exception as e:
                print(f"Error loading file {file_path} in generator: {e}")

        if not batch_data:
             # If no data was loaded for this batch, return empty arrays
             # This can happen if all files in the batch had errors or shape mismatches
             # Returning empty arrays with the expected output shape
             return np.empty((0, *self.input_shape)), np.empty((0, *self.input_shape))


        # Stack the per-file arrays into one batch (each file holds one 2D spectrogram)
        try:
             batch_data = np.stack(batch_data, axis=0) # Stack along the new batch dimension
        except ValueError as e:
             print(f"Error stacking data for batch index {index}: {e}")
             # This might happen if loaded data chunks have inconsistent shapes
             # Returning empty arrays on error
             return np.empty((0, *self.input_shape)), np.empty((0, *self.input_shape))


        # Assuming the autoencoder input and output are the same
        return batch_data, batch_data

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        if self.shuffle:
            np.random.shuffle(self.indexes)

print("SpectrogramDataGenerator class defined for Conv2D.")

SpectrogramDataGenerator class defined for Conv2D.
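`__len__` uses floor division, so when the file count is not a multiple of `batch_size`, the last partial batch is never requested. Up to `batch_size - 1` files are skipped every epoch, and at evaluation time those files get no reconstruction error at all. The arithmetic as a small standalone check (`batch_counts` is a hypothetical name):

```python
import math

def batch_counts(num_files, batch_size):
    """Return (batches with floor, batches with ceil, files dropped by floor)."""
    floor_batches = num_files // batch_size
    ceil_batches = math.ceil(num_files / batch_size)
    dropped = num_files - floor_batches * batch_size
    return floor_batches, ceil_batches, dropped

print(batch_counts(100, 32))  # (3, 4, 4)
print(batch_counts(704, 32))  # (22, 22, 0)
```

Using a ceiling in `__len__` would keep every file at the cost of a smaller final batch. `__getitem__` already copes with that case, because the index slice for the last batch simply returns fewer entries.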


### Determine Spectrogram Shape and Create Generators

Determine the consistent shape of the spectrograms and create data generator instances for training and testing.

In [44]:
import numpy as np
import os

# Determine spectrogram shape and define consistent shape for the generator
print("Determining spectrogram shape...")
train_extract_base_dir = '/tmp/extracted_training_data'
all_training_files = []
for dirpath, dirnames, filenames in os.walk(train_extract_base_dir):
    for filename in filenames:
        if filename.endswith('.npy'):
            all_training_files.append(os.path.join(dirpath, filename))

spectrogram_shape = None
if all_training_files:
    try:
        # Load an example to get the shape
        example_data = np.load(all_training_files[0], mmap_mode='r')
        spectrogram_shape = example_data.shape
        print(f"Inferred spectrogram shape: {spectrogram_shape}")
        # Assuming all spectrograms have the same shape for simplicity.
        # If shapes vary, padding or resizing would be needed.
        consistent_shape = spectrogram_shape
        print(f"Using consistent shape: {consistent_shape}")
    except Exception as e:
        print(f"Error inferring spectrogram shape: {e}")
        consistent_shape = (1024, 292) # Fallback to a known shape if inference fails
        print(f"Using fallback consistent shape: {consistent_shape}")
else:
    consistent_shape = (1024, 292) # Fallback if no training files found
    print(f"No training files found to infer shape. Using fallback consistent shape: {consistent_shape}")

# Create Data Generator Instances with updated input shape and correct paths
print("\nCreating data generator instances with 2D input shape and correct paths...")
test_extract_base_dir = '/tmp/extracted_testing_data' # Base dir for testing

all_testing_files = []
for dirpath, dirnames, filenames in os.walk(test_extract_base_dir):
    for filename in filenames:
        if filename.endswith('.npy'):
            all_testing_files.append(os.path.join(dirpath, filename))


if consistent_shape is not None:
    # Input shape for the generator and model, including the channel dimension
    generator_input_shape = (*consistent_shape, 1)
    batch_size = 32 # You can adjust this batch size

    train_generator = SpectrogramDataGenerator(all_training_files, batch_size=batch_size, shuffle=True, input_shape=generator_input_shape)
    test_generator = SpectrogramDataGenerator(all_testing_files, batch_size=batch_size, shuffle=False, input_shape=generator_input_shape)

    print(f"Train generator created with 2D input shape {generator_input_shape} and {len(train_generator)} batches.")
    print(f"Test generator created with 2D input shape {generator_input_shape} and {len(test_generator)} batches.")
else:
    train_generator = None
    test_generator = None
    print("Cannot create data generators without a valid consistent shape.")

Determining spectrogram shape...
Inferred spectrogram shape: (1024, 292)
Using consistent shape: (1024, 292)

Creating data generator instances with 2D input shape and correct paths...
Train generator created with 2D input shape (1024, 292, 1) and 113 batches.
Test generator created with 2D input shape (1024, 292, 1) and 22 batches.
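The cell above infers the shape from a single file and assumes the rest match. Any mismatched file is then skipped inside the generator with only a printed warning. Checking every file up front is cheap with `np.load(..., mmap_mode='r')`, since the array data is not read into memory until it is accessed. A sketch (`shape_histogram` is a hypothetical helper name):

```python
import os
import tempfile
from collections import Counter

import numpy as np

def shape_histogram(file_paths):
    """Count the array shapes of .npy files without loading their data into memory."""
    shapes = Counter()
    for path in file_paths:
        shapes[np.load(path, mmap_mode='r').shape] += 1
    return shapes

# Usage: two files of one shape and one outlier
work = tempfile.mkdtemp()
paths = []
for i, shape in enumerate([(4, 3), (4, 3), (4, 2)]):
    path = os.path.join(work, f'spec_{i}.npy')
    np.save(path, np.zeros(shape, dtype=np.float32))
    paths.append(path)
print(shape_histogram(paths))  # Counter({(4, 3): 2, (4, 2): 1})
```

In the notebook this could be called as `shape_histogram(all_training_files)`. A single key in the result confirms that `consistent_shape` holds for the whole set.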


### Define and build the Conv2D autoencoder model

Define the architecture of a 2D convolutional autoencoder using Conv2D, MaxPooling2D, UpSampling2D, and Cropping2D layers.
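The `Cropping2D` amount in the decoder follows from the pooling arithmetic. With `padding='same'`, each 2×2 max-pooling step rounds the size up, and each 2×2 upsampling step doubles it. A sketch that computes the decoded size and the crop for one spatial axis, assuming the three-level layout used here (`autoencoder_crop` is a hypothetical helper):

```python
import math

def autoencoder_crop(size, levels=3):
    """Return (decoded_size, (crop_before, crop_after)) for one spatial axis.

    Assumes MaxPooling2D((2, 2), padding='same'), which rounds up (ceil),
    followed by the same number of UpSampling2D((2, 2)) layers.
    """
    pooled = size
    for _ in range(levels):
        pooled = math.ceil(pooled / 2)
    decoded = pooled * 2 ** levels
    excess = decoded - size
    return decoded, (excess // 2, excess - excess // 2)

print(autoencoder_crop(1024))  # (1024, (0, 0))
print(autoencoder_crop(292))   # (296, (2, 2))
```

Passing the two crop tuples as `Cropping2D(cropping=(crop_h, crop_w))` would keep the decoder in step with the input if the spectrogram size or the number of pooling levels changes.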

In [45]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, Cropping2D
from tensorflow.keras.models import Model

# Define and build the 2D convolutional autoencoder model with shape adjustments
print("\nDefining and building the 2D convolutional autoencoder model...")

# Use the generator_input_shape determined in the previous step
# It should be in the format (height, width, channels)
if 'generator_input_shape' in locals() and generator_input_shape is not None:
    input_shape = generator_input_shape
    print(f"Model input shape: {input_shape}")

    # Encoder
    input_layer = Input(shape=input_shape)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_layer)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = MaxPooling2D((2, 2), padding='same')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    encoded = MaxPooling2D((2, 2), padding='same')(x)

    # Decoder
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)
    x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
    x = UpSampling2D((2, 2))(x)

    # Crop the decoder output back to the input width.
    # With padding='same', each MaxPooling2D((2, 2)) rounds up: width 292 -> 146 -> 73 -> 37,
    # and three UpSampling2D((2, 2)) layers give 37 * 8 = 296. Height 1024 divides evenly,
    # so it returns to 1024. Cropping 2 columns from each side restores (1024, 292).
    # Recompute these values if the input shape or number of pooling levels changes.
    x = Cropping2D(cropping=((0, 0), (2, 2)))(x)

    # Final Conv2D layer to get to 1 channel
    decoded = Conv2D(1, (3, 3), activation='linear', padding='same')(x)

    autoencoder = Model(input_layer, decoded)

    # Compile the autoencoder
    print("\nCompiling the autoencoder model...")
    autoencoder.compile(optimizer='adam', loss='mse')

    # Print model summary
    print("\nAutoencoder model summary:")
    autoencoder.summary()
else:
    print("Input shape not determined. Cannot define Conv2D autoencoder model.")
    autoencoder = None


Defining and building the 2D convolutional autoencoder model...
Model input shape: (1024, 292, 1)

Compiling the autoencoder model...

Autoencoder model summary:


### Train the Conv2D autoencoder

Train the defined Conv2D autoencoder model using the training data generator.
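The training cell below passes the anomalous test set as `validation_data`, so `val_loss` is measured on data expected to contain anomalies rather than on held-out normal spectrograms. If the goal is to monitor generalization on normal data, one option is to hold out a fraction of the training files. This is a sketch under that assumption: `split_train_val`, the default fraction, and the seed are illustrative choices.

```python
import random

def split_train_val(file_paths, val_fraction=0.1, seed=42):
    """Shuffle a copy of file_paths with a fixed seed and split off a validation subset."""
    paths = sorted(file_paths)          # fixed starting order, independent of os.walk
    random.Random(seed).shuffle(paths)  # reproducible shuffle; the input list is untouched
    n_val = int(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]

train_files, val_files = split_train_val([f'spec_{i}.npy' for i in range(50)], val_fraction=0.2)
print(len(train_files), len(val_files))  # 40 10
```

Each list could then back its own `SpectrogramDataGenerator` (with `shuffle=False` for the validation one) and be passed to `fit(..., validation_data=...)`.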

In [46]:
# Train the autoencoder using the generator
if autoencoder is not None and train_generator is not None:
    print("\nTraining the autoencoder with the data generator...")
    # Use .fit() with the generator
    history = autoencoder.fit(train_generator,
                              epochs=10, # Number of epochs
                              # steps_per_epoch is automatically inferred from the generator's __len__()
                              # validation_data can be a generator
                              validation_data=test_generator, # Use test generator for validation during training
                              shuffle=False # Shuffling is handled within the generator
                             )
    print("Autoencoder training with data generator complete.")
else:
    print("\nAutoencoder model or train/test generator not available. Skipping training.")


Training the autoencoder with the data generator...


  self._warn_if_super_not_called()


Epoch 1/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 76s 632ms/step - loss: 1370.3004 - val_loss: 33.4864
Epoch 2/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 69s 606ms/step - loss: 31.6034 - val_loss: 30.7766
Epoch 3/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 67s 590ms/step - loss: 30.6756 - val_loss: 30.7789
Epoch 4/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 69s 610ms/step - loss: 30.6092 - val_loss: 30.5959
Epoch 5/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 68s 599ms/step - loss: 30.6180 - val_loss: 30.5401
Epoch 6/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 83s 610ms/step - loss: 30.4690 - val_loss: 30.5917
Epoch 7/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 67s 588ms/step - loss: 30.6249 - val_loss: 30.4611
Epoch 8/10
113/113 ━━━━━━━━━━━━━━━━━━━━ 69s 613ms/step - loss: 30.5446 - val_loss: 30.339

### Evaluate the Conv2D autoencoder

Evaluate the trained Conv2D autoencoder using the testing data generator and analyze reconstruction errors.

In [47]:
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the autoencoder using the test generator
if autoencoder is not None and test_generator is not None:
    print("\nEvaluating the autoencoder with the data generator...")

    # Predict using the test generator
    reconstructions = autoencoder.predict(test_generator)

    # Get original test data from the generator for MSE calculation
    # Note: This will load the test data again in batches.
    # A more memory-efficient approach for evaluation might be needed for very large test sets.
    # For simplicity here, we will iterate through the test generator to get original data
    original_test_data = []
    for i in range(len(test_generator)):
        batch_data, _ = test_generator[i]
        original_test_data.append(batch_data)

    if original_test_data:
        original_test_data = np.concatenate(original_test_data, axis=0)
        print(f"Shape of original test data from generator: {original_test_data.shape}")
        print(f"Shape of reconstructions: {reconstructions.shape}")

        # Ensure shapes match before calculating MSE
        if original_test_data.shape == reconstructions.shape:
            # Calculate MSE for each sample (across height, width, and channel dimensions)
            mse = np.mean(np.power(original_test_data - reconstructions, 2), axis=(1, 2, 3))
            print(f"Shape of reconstruction errors (MSE per sample): {mse.shape}")

            # Plot the distribution of reconstruction errors
            plt.figure(figsize=(10, 6))
            plt.hist(mse, bins=50, density=True, alpha=0.7, color='skyblue')
            plt.title('Distribution of Reconstruction Errors (MSE) on Testing Data (from Generator)')
            plt.xlabel('Reconstruction Error (MSE)')
            plt.ylabel('Density')
            plt.grid(True)
            plt.show()  # Render the figure; display(plt) only shows the pyplot module object
            plt.close() # Close the figure to free up memory

            # Print statistics about the reconstruction errors
            print(f"Mean Reconstruction Error (MSE): {np.mean(mse)}")
            print(f"Median Reconstruction Error (MSE): {np.median(mse)}")
            print(f"Standard Deviation of Reconstruction Error (MSE): {np.std(mse)}")
            print(f"Maximum Reconstruction Error (MSE): {np.max(mse)}")
            print(f"Minimum Reconstruction Error (MSE): {np.min(mse)}")
        else:
            print("Shape mismatch between original test data and reconstructions. Cannot calculate MSE.")
            print(f"Original test data shape: {original_test_data.shape}")
            print(f"Reconstructions shape: {reconstructions.shape}")

    else:
        print("No original test data loaded from generator.")

else:
     print("\nAutoencoder model or test generator not available. Skipping evaluation.")


Evaluating the autoencoder with the data generator...
[1m22/22[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m7s[0m 199ms/step
Shape of original test data from generator: (704, 1024, 292, 1)
Shape of reconstructions: (704, 1024, 292, 1)
Shape of reconstruction errors (MSE per sample): (704,)



Mean Reconstruction Error (MSE): 30.221345901489258
Median Reconstruction Error (MSE): 30.622276306152344
Standard Deviation of Reconstruction Error (MSE): 1.8326497077941895
Maximum Reconstruction Error (MSE): 40.88224411010742
Minimum Reconstruction Error (MSE): 26.259309768676758
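The evaluation cell holds the full original and reconstructed test arrays in memory at the same time. At shape (704, 1024, 292, 1), each is about 0.84 GB even as float32. A streaming alternative keeps only one batch pair at a time. This sketch assumes the pairs are produced batch by batch, for instance by calling `autoencoder.predict` on each `test_generator[i]`. `per_sample_mse` is a hypothetical helper, and its output order matches the order in which batches are supplied.

```python
import numpy as np

def per_sample_mse(batch_pairs):
    """Concatenate per-sample MSE over an iterable of (original, reconstruction) batches.

    Only one batch pair needs to be in memory at a time.
    """
    errors = []
    for original, reconstruction in batch_pairs:
        diff = original.astype(np.float32) - reconstruction.astype(np.float32)
        # Average over every axis except the batch axis (height, width, channel)
        errors.append(np.mean(diff ** 2, axis=tuple(range(1, diff.ndim))))
    return np.concatenate(errors) if errors else np.empty(0, dtype=np.float32)

# Usage with two tiny batches of shape (batch, height, width, channels)
batches = [
    (np.full((3, 4, 5, 1), 2.0), np.zeros((3, 4, 5, 1))),  # error of 2 everywhere -> MSE 4
    (np.ones((2, 4, 5, 1)), np.ones((2, 4, 5, 1))),         # perfect reconstruction -> MSE 0
]
print(per_sample_mse(batches))  # [4. 4. 4. 0. 0.]
```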


### Visualize Reconstructions

Visualize some original and reconstructed spectrograms from the testing data generator.
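The cell below plots random samples from the first batch. Plotting the highest-error samples instead shows what the model struggles with most. This sketch assumes a per-sample error array such as `mse` from the evaluation step; the helper name is hypothetical.

```python
import numpy as np

def worst_reconstructions(errors, k=5):
    """Return the indices of the k largest errors, highest first."""
    errors = np.asarray(errors)
    k = min(k, errors.size)
    return np.argsort(errors)[::-1][:k]

mse_demo = np.array([30.1, 40.9, 29.5, 33.9, 31.0])
print(worst_reconstructions(mse_demo, k=3))  # [1 3 4]
```

The returned indices refer to the concatenated test array. The spectrograms to plot would therefore come from `original_test_data[idx]` rather than from a single generator batch.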

In [48]:
import numpy as np
import matplotlib.pyplot as plt

# Assuming test_generator and autoencoder are available

if autoencoder is not None and test_generator is not None and len(test_generator) > 0:
    print("\nVisualizing reconstructions...")
    # Get a batch of test data from the generator
    original_batch, _ = test_generator[0] # Get the first batch

    # Get reconstructions for this batch
    reconstructed_batch = autoencoder.predict(original_batch)

    # Select a few examples from the batch to visualize
    num_examples = min(5, original_batch.shape[0]) # Visualize up to 5 examples or fewer if batch is smaller
    example_indices = np.random.choice(original_batch.shape[0], num_examples, replace=False)

    plt.figure(figsize=(15, 6 * num_examples))

    for i, idx in enumerate(example_indices):
        # Original spectrogram (remove batch and channel dimensions for plotting)
        original_spectrogram = np.squeeze(original_batch[idx])

        # Reconstructed spectrogram (remove batch and channel dimensions for plotting)
        reconstructed_spectrogram = np.squeeze(reconstructed_batch[idx])

        # Plot original
        plt.subplot(num_examples, 2, 2 * i + 1)
        plt.imshow(original_spectrogram, aspect='auto', origin='lower', cmap='viridis') # Use imshow for 2D data
        plt.title(f'Original Example {idx+1}')
        plt.xlabel('Time') # Assuming x-axis is time
        plt.ylabel('Frequency') # Assuming y-axis is frequency


        # Plot reconstruction
        plt.subplot(num_examples, 2, 2 * i + 2)
        plt.imshow(reconstructed_spectrogram, aspect='auto', origin='lower', cmap='viridis') # Use imshow for 2D data
        plt.title(f'Reconstruction Example {idx+1}')
        plt.xlabel('Time') # Assuming x-axis is time
        plt.ylabel('Frequency') # Assuming y-axis is frequency


    plt.tight_layout()
    plt.show()  # Render the figure; display(plt) only shows the pyplot module object
    plt.close() # Close the figure to free up memory

else:
    print("\nTest generator or autoencoder not available, or test generator is empty. Skipping visualization.")


Visualizing reconstructions...
1/1 ━━━━━━━━━━━━━━━━━━━━ 1s 1s/step



### Identify Potential Anomalies based on Reconstruction Error

Use the calculated reconstruction errors to identify potential anomalies in the testing data based on a threshold.

In [49]:
import numpy as np

# Assuming mse (reconstruction errors) is available from the evaluation cell above
# Assuming test_generator is available from previous steps

if 'mse' in locals() and test_generator is not None:
    print("Identifying potential anomalies...")

    # Calculate a simple threshold based on the mean and standard deviation of the MSE
    mean_mse = np.mean(mse)
    std_mse = np.std(mse)
    # A common threshold is Mean + a multiple of StdDev (e.g., 2 or 3)
    # You can adjust the multiplier based on how sensitive you want the anomaly detection to be
    anomaly_threshold = mean_mse + 2 * std_mse

    print(f"\nCalculated Anomaly Threshold (Mean + 2*StdDev): {anomaly_threshold}")

    # Identify samples in the testing data with a reconstruction error above the threshold
    potential_anomaly_indices = np.where(mse > anomaly_threshold)[0]

    print(f"\nNumber of potential anomalies detected in the testing data: {len(potential_anomaly_indices)}")

    # These indices refer to positions in the concatenated test array. Mapping them back to
    # file names (and the center frequency in each name) requires tracking which file each
    # sample came from, which the current generator does not do.
    if len(potential_anomaly_indices) > 0:
        print("Indices of potential anomalies (in the concatenated testing data array):")
        # Print only the first 10 indices if there are many
        print(potential_anomaly_indices[:10])
        if len(potential_anomaly_indices) > 10:
            print("...")
    else:
        print("No potential anomalies detected above the calculated threshold.")

else:
    print("Reconstruction errors (mse) or test generator not available. Cannot identify anomalies.")

Identifying potential anomalies...

Calculated Anomaly Threshold (Mean + 2*StdDev): 33.88664627075195

Number of potential anomalies detected in the testing data: 21
Indices of potential anomalies (in the concatenated testing data array):
[ 18 162 167 190 222 231 272 276 303 341]
...
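The threshold used above is mean + 2 × std of the per-sample errors. With the printed statistics, 30.2213 + 2 × 1.8326 ≈ 33.887, which matches the reported value. Two caveats apply. First, the statistics are computed on the same (anomalous) test set that is being screened. Second, a single extreme value inflates the standard deviation. A percentile cut, which flags a fixed fraction of samples, is one common alternative. The sketch below implements both rules; `anomaly_threshold` is a hypothetical helper, and like the notebook it uses `np.std`'s default population standard deviation.

```python
import numpy as np

def anomaly_threshold(errors, k=2.0, percentile=None):
    """Return mean + k * std of errors, or their given percentile if one is provided."""
    errors = np.asarray(errors, dtype=np.float64)
    if percentile is not None:
        return float(np.percentile(errors, percentile))
    return float(errors.mean() + k * errors.std())

# Toy errors: nine "normal" values and one extreme outlier
errors = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=np.float64)
t_std = anomaly_threshold(errors, k=2.0)          # about 71.7
t_pct = anomaly_threshold(errors, percentile=80)  # about 8.2
print(np.flatnonzero(errors > t_std))  # [9]
print(np.flatnonzero(errors > t_pct))  # [8 9]
```

On the toy data, the outlier inflates the standard deviation so much that only the outlier itself is flagged. The percentile rule also flags the next-largest value. Which behaviour is preferable depends on the expected anomaly rate.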


# Task
Modify the code to identify which files in the `spectrograms_anomolous` test folder contain the detected anomalies, and list those file names.

## Modify testing data loading

### Subtask:
Modify the testing data loading process within the `SpectrogramDataGenerator` to store the original file path along with the loaded data.


**Reasoning**:
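The subtask is to modify the data generator to return file paths and then update the evaluation loop to capture them. I will start by modifying the `SpectrogramDataGenerator` class so that `__getitem__` also returns the paths of the files it loaded. Because Keras interprets a third element returned by a generator as sample weights, this version should be used for evaluation only, not passed to `fit()`.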
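Because `test_generator` is created with `shuffle=False`, there is also a lighter option that needs no change to the return value. The file behind the n-th sample of the concatenated test array can be computed from the generator's `indexes`, `batch_size`, and `len()`. This sketch assumes no file is skipped in `__getitem__`; a skipped file would shift every later index. `sample_paths` is a hypothetical helper.

```python
def sample_paths(file_paths, indexes, batch_size, num_batches):
    """File path for every sample a Sequence-style generator yields, in yield order.

    Mirrors the slicing in SpectrogramDataGenerator.__getitem__ and assumes no
    file is skipped because of a load error or shape mismatch.
    """
    ordered = []
    for b in range(num_batches):
        batch_idx = indexes[b * batch_size:(b + 1) * batch_size]
        ordered.extend(file_paths[k] for k in batch_idx)
    return ordered

files = [f'spec_{mhz}MHz.npy' for mhz in (62, 259, 413, 643, 809)]
paths = sample_paths(files, indexes=list(range(len(files))), batch_size=2, num_batches=2)
print(paths)                        # ['spec_62MHz.npy', 'spec_259MHz.npy', 'spec_413MHz.npy', 'spec_643MHz.npy']
print([paths[i] for i in [1, 3]])  # ['spec_259MHz.npy', 'spec_643MHz.npy']
```

In the notebook this would be called as `sample_paths(test_generator.file_paths, test_generator.indexes, test_generator.batch_size, len(test_generator))`. `[paths[i] for i in potential_anomaly_indices]` would then list the flagged files.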



In [50]:
import numpy as np
import tensorflow as tf
import os
from tensorflow.keras.utils import Sequence

class SpectrogramDataGenerator(Sequence):
    """
    Data Generator for loading 2D spectrograms from .npy files in batches for Conv2D input.
    Includes functionality to return corresponding file paths.
    """
    def __init__(self, file_paths, batch_size=32, shuffle=True, input_shape=(1024, 292, 1), **kwargs): # Expected Conv2D input shape
        # Call the base constructor: in Keras 3, Sequence is an alias of PyDataset, which
        # warns ("_warn_if_super_not_called") when this is skipped
        super().__init__(**kwargs)
        self.file_paths = file_paths
        self.batch_size = batch_size
        self.shuffle = shuffle
        # Input shape includes the channel dimension for Conv2D
        self.input_shape = input_shape
        # Expected 2D shape of each file, without the channel dimension
        self._loading_shape = input_shape[:2]
        self.indexes = np.arange(len(self.file_paths))
        if self.shuffle:
            self.on_epoch_end()

    def __len__(self):
        """Denotes the number of batches per epoch"""
        # Floor division drops the final partial batch, so up to batch_size - 1
        # files are never served; switch to np.ceil to include them
        return int(np.floor(len(self.file_paths) / self.batch_size))


    def __getitem__(self, index):
        """Generate one batch of data and corresponding file paths"""
        # Generate indexes of the batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]

        # List of file paths for the batch
        batch_file_paths = [self.file_paths[k] for k in indexes]

        # Load and collect data for the batch
        batch_data = []
        loaded_batch_file_paths = [] # Store paths for successfully loaded data
        for file_path in batch_file_paths:
            try:
                data = np.load(file_path, mmap_mode='r')
                # Ensure data matches the expected loading shape (without channel)
                if data.shape == self._loading_shape:
                    # Add channel dimension (grayscale)
                    batch_data.append(np.expand_dims(data, axis=-1))
                    loaded_batch_file_paths.append(file_path)
                else:
                    print(f"Warning: Spectrogram shape mismatch for {file_path}. Expected {self._loading_shape}, got {data.shape}. Skipping.")
            except Exception as e:
                print(f"Error loading file {file_path} in generator: {e}")

        if not batch_data:
            # If no data was loaded for this batch, return empty arrays and an empty list of paths
            return np.empty((0, *self.input_shape)), np.empty((0, *self.input_shape)), []

        # Stack the per-file arrays into a single batch array
        try:
            batch_data = np.stack(batch_data, axis=0)  # adds a new leading batch dimension
        except ValueError as e:
            print(f"Error stacking data for batch index {index}: {e}")
            # Return empty arrays and an empty list on error
            return np.empty((0, *self.input_shape)), np.empty((0, *self.input_shape)), []


        # Assuming the autoencoder input and output are the same
        # Return batch_data twice (for input and target) and the list of file paths
        return batch_data, batch_data, loaded_batch_file_paths

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        if self.shuffle:
            np.random.shuffle(self.indexes)

print("SpectrogramDataGenerator class modified to return file paths.")

SpectrogramDataGenerator class modified to return file paths.
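Returning a third element from `__getitem__` works for the manual evaluation loop below, but if the same class is also passed to `model.fit`, Keras generally interprets a 3-tuple as `(x, y, sample_weight)`, and a list of path strings cannot act as sample weights. One alternative, sketched here as an assumption rather than the notebook's approach, keeps `__getitem__` returning `(x, x)` and exposes the loaded paths through a separate method. The class `PathAwareBatches` and its `load_batch` method are hypothetical; the sketch uses NumPy only, so in the notebook you would subclass `keras.utils.Sequence` (or `PyDataset`). It also uses `ceil` instead of `floor`, so the final partial batch is evaluated rather than dropped.

```python
import numpy as np


class PathAwareBatches:
    """Hypothetical loader: __getitem__ returns (x, x) for model.fit,
    load_batch() also returns the paths of the files it actually loaded."""

    def __init__(self, file_paths, batch_size=32, loading_shape=(1024, 292)):
        self.file_paths = list(file_paths)
        self.batch_size = batch_size
        self.loading_shape = tuple(loading_shape)

    def __len__(self):
        # ceil keeps the final partial batch instead of dropping it
        return int(np.ceil(len(self.file_paths) / self.batch_size))

    def load_batch(self, index):
        start = index * self.batch_size
        arrays, loaded_paths = [], []
        for path in self.file_paths[start:start + self.batch_size]:
            data = np.load(path)
            if data.shape != self.loading_shape:
                continue  # skip files with an unexpected shape
            arrays.append(data[..., np.newaxis])  # add the channel axis
            loaded_paths.append(path)
        if not arrays:
            return np.empty((0, *self.loading_shape, 1), dtype=np.float32), []
        return np.stack(arrays, axis=0), loaded_paths

    def __getitem__(self, index):
        x, _ = self.load_batch(index)
        return x, x  # autoencoder: the input is also the target
```

In an evaluation loop, `batch_data, batch_paths = loader.load_batch(i)` replaces the three-value unpacking, while `model.fit` sees only `(x, x)`.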


**Reasoning**:
Now that the `SpectrogramDataGenerator` is modified to return file paths, I need to update the evaluation loop in the next code block to capture these file paths when iterating through the test generator and store them in a list that corresponds to the order of the concatenated data and reconstruction arrays.



In [51]:
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the autoencoder using the test generator
if autoencoder is not None and test_generator is not None:
    print("\nEvaluating the autoencoder with the data generator...")

    # Predict using the test generator
    # The predict method of the model does not directly give access to the generator's file paths
    # We will iterate through the generator to get original data and file paths,
    # and then predict on those batches.

    original_test_data = []
    all_test_file_paths = [] # List to store all file paths from the test generator
    reconstructions_list = [] # List to store reconstructions for each batch

    print("Loading test data and getting reconstructions batch by batch...")
    for i in range(len(test_generator)):
        try:
            # Get a batch of original data and file paths from the generator
            batch_data, _, batch_file_paths = test_generator[i]

            if batch_data.shape[0] > 0: # Check if the batch is not empty
                # Predict reconstructions for this batch
                batch_reconstructions = autoencoder.predict(batch_data, verbose=0) # verbose=0 to reduce output

                # Append batch data, reconstructions, and file paths to the lists
                original_test_data.append(batch_data)
                reconstructions_list.append(batch_reconstructions)
                all_test_file_paths.extend(batch_file_paths) # Extend with the list of file paths

        except Exception as e:
            print(f"Error processing batch {i}: {e}")
            # Continue to the next batch if there's an error


    if original_test_data and reconstructions_list:
        # Concatenate all batches to get the full arrays
        original_test_data = np.concatenate(original_test_data, axis=0)
        reconstructions = np.concatenate(reconstructions_list, axis=0)

        print(f"Shape of original test data from generator: {original_test_data.shape}")
        print(f"Shape of reconstructions: {reconstructions.shape}")
        print(f"Number of test file paths collected: {len(all_test_file_paths)}")


        # Ensure shapes match before calculating MSE and file paths match the data length
        if original_test_data.shape == reconstructions.shape and len(all_test_file_paths) == original_test_data.shape[0]:
            # Calculate MSE for each sample (across height, width, and channel dimensions)
            mse = np.mean(np.power(original_test_data - reconstructions, 2), axis=(1, 2, 3))
            print(f"Shape of reconstruction errors (MSE per sample): {mse.shape}")

            # Plot the distribution of reconstruction errors
            plt.figure(figsize=(10, 6))
            plt.hist(mse, bins=50, density=True, alpha=0.7, color='skyblue')
            plt.title('Distribution of Reconstruction Errors (MSE) on Testing Data')
            plt.xlabel('Reconstruction Error (MSE)')
            plt.ylabel('Density')
            plt.grid(True)
            plt.show()  # render the figure; display(plt) would only print the pyplot module object
            plt.close()  # close the figure to free up memory

            # Print statistics about the reconstruction errors
            print(f"Mean Reconstruction Error (MSE): {np.mean(mse)}")
            print(f"Median Reconstruction Error (MSE): {np.median(mse)}")
            print(f"Standard Deviation of Reconstruction Error (MSE): {np.std(mse)}")
            print(f"Maximum Reconstruction Error (MSE): {np.max(mse)}")
            print(f"Minimum Reconstruction Error (MSE): {np.min(mse)}")
        else:
            print("Shape mismatch between original test data, reconstructions, or file paths count. Cannot calculate MSE or proceed.")
            print(f"Original test data shape: {original_test_data.shape}")
            print(f"Reconstructions shape: {reconstructions.shape}")
            print(f"Number of test file paths collected: {len(all_test_file_paths)}")

    else:
        print("No data loaded from test generator for evaluation.")


else:
    print("\nAutoencoder model or test generator not available. Skipping evaluation.")


Evaluating the autoencoder with the data generator...
Loading test data and getting reconstructions batch by batch...
Error processing batch 0: not enough values to unpack (expected 3, got 2)
Error processing batch 1: not enough values to unpack (expected 3, got 2)
... (the same error repeats for batches 2 to 11; output truncated)

**Reasoning**:
The error "not enough values to unpack (expected 3, got 2)" shows that `test_generator[i]` still returns 2 values instead of the expected 3 (batch data, targets, file paths). Redefining `SpectrogramDataGenerator` in the previous cell created a new class object, but the existing `test_generator` instance is still bound to the old class and its old `__getitem__`. The `test_generator` instance therefore needs to be recreated from the updated class definition.
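A toy illustration of this behaviour, using a hypothetical class `Gen` rather than the notebook's generator: re-executing a `class` statement binds the name to a new class, while instances created earlier keep pointing at the old one.

```python
class Gen:
    def __getitem__(self, i):
        return "x", "y"  # old version: two values


old = Gen()


class Gen:  # redefinition binds the name Gen to a NEW class object
    def __getitem__(self, i):
        return "x", "y", ["path"]  # new version: three values


new = Gen()
print(len(old[0]), len(new[0]))  # prints: 2 3
print(type(old) is Gen)          # prints: False
```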



In [52]:
# Recreate the data generator instances with the updated class definition
print("\nRecreating data generator instances with the updated SpectrogramDataGenerator class...")

if consistent_shape is not None:
    # Input shape for the generator and model, including the channel dimension
    # Use variables defined in previous steps
    # all_training_files, all_testing_files, batch_size, generator_input_shape
    train_generator = SpectrogramDataGenerator(all_training_files, batch_size=batch_size, shuffle=True, input_shape=generator_input_shape)
    test_generator = SpectrogramDataGenerator(all_testing_files, batch_size=batch_size, shuffle=False, input_shape=generator_input_shape)

    print(f"Train generator recreated with 2D input shape {generator_input_shape} and {len(train_generator)} batches.")
    print(f"Test generator recreated with 2D input shape {generator_input_shape} and {len(test_generator)} batches.")
else:
    train_generator = None
    test_generator = None
    print("Consistent shape not determined. Cannot recreate data generators.")



Recreating data generator instances with the updated SpectrogramDataGenerator class...
Train generator recreated with 2D input shape (1024, 292, 1) and 113 batches.
Test generator recreated with 2D input shape (1024, 292, 1) and 22 batches.


**Reasoning**:
The `test_generator` has been recreated with the updated class definition. Re-running the evaluation code block should now unpack the three returned values and proceed with the evaluation, capturing the file paths.



In [53]:
import numpy as np
import matplotlib.pyplot as plt

# Evaluate the autoencoder using the test generator
if autoencoder is not None and test_generator is not None:
    print("\nEvaluating the autoencoder with the data generator...")

    # Predict using the test generator
    # The predict method of the model does not directly give access to the generator's file paths
    # We will iterate through the generator to get original data and file paths,
    # and then predict on those batches.

    original_test_data = []
    all_test_file_paths = [] # List to store all file paths from the test generator
    reconstructions_list = [] # List to store reconstructions for each batch

    print("Loading test data and getting reconstructions batch by batch...")
    for i in range(len(test_generator)):
        try:
            # Get a batch of original data and file paths from the generator
            batch_data, _, batch_file_paths = test_generator[i]

            if batch_data.shape[0] > 0: # Check if the batch is not empty
                # Predict reconstructions for this batch
                batch_reconstructions = autoencoder.predict(batch_data, verbose=0) # verbose=0 to reduce output

                # Append batch data, reconstructions, and file paths to the lists
                original_test_data.append(batch_data)
                reconstructions_list.append(batch_reconstructions)
                all_test_file_paths.extend(batch_file_paths) # Extend with the list of file paths

        except Exception as e:
            print(f"Error processing batch {i}: {e}")
            # Continue to the next batch if there's an error


    if original_test_data and reconstructions_list:
        # Concatenate all batches to get the full arrays
        original_test_data = np.concatenate(original_test_data, axis=0)
        reconstructions = np.concatenate(reconstructions_list, axis=0)

        print(f"Shape of original test data from generator: {original_test_data.shape}")
        print(f"Shape of reconstructions: {reconstructions.shape}")
        print(f"Number of test file paths collected: {len(all_test_file_paths)}")


        # Ensure shapes match before calculating MSE and file paths match the data length
        if original_test_data.shape == reconstructions.shape and len(all_test_file_paths) == original_test_data.shape[0]:
            # Calculate MSE for each sample (across height, width, and channel dimensions)
            mse = np.mean(np.power(original_test_data - reconstructions, 2), axis=(1, 2, 3))
            print(f"Shape of reconstruction errors (MSE per sample): {mse.shape}")

            # Plot the distribution of reconstruction errors
            plt.figure(figsize=(10, 6))
            plt.hist(mse, bins=50, density=True, alpha=0.7, color='skyblue')
            plt.title('Distribution of Reconstruction Errors (MSE) on Testing Data')
            plt.xlabel('Reconstruction Error (MSE)')
            plt.ylabel('Density')
            plt.grid(True)
            plt.show()  # render the figure; display(plt) would only print the pyplot module object
            plt.close()  # close the figure to free up memory

            # Print statistics about the reconstruction errors
            print(f"Mean Reconstruction Error (MSE): {np.mean(mse)}")
            print(f"Median Reconstruction Error (MSE): {np.median(mse)}")
            print(f"Standard Deviation of Reconstruction Error (MSE): {np.std(mse)}")
            print(f"Maximum Reconstruction Error (MSE): {np.max(mse)}")
            print(f"Minimum Reconstruction Error (MSE): {np.min(mse)}")
        else:
            print("Shape mismatch between original test data, reconstructions, or file paths count. Cannot calculate MSE or proceed.")
            print(f"Original test data shape: {original_test_data.shape}")
            print(f"Reconstructions shape: {reconstructions.shape}")
            print(f"Number of test file paths collected: {len(all_test_file_paths)}")

    else:
        print("No data loaded from test generator for evaluation.")


else:
    print("\nAutoencoder model or test generator not available. Skipping evaluation.")


Evaluating the autoencoder with the data generator...
Loading test data and getting reconstructions batch by batch...
Shape of original test data from generator: (704, 1024, 292, 1)
Shape of reconstructions: (704, 1024, 292, 1)
Number of test file paths collected: 704
Shape of reconstruction errors (MSE per sample): (704,)



Mean Reconstruction Error (MSE): 30.221345901489258
Median Reconstruction Error (MSE): 30.622276306152344
Standard Deviation of Reconstruction Error (MSE): 1.8326497077941895
Maximum Reconstruction Error (MSE): 40.88224411010742
Minimum Reconstruction Error (MSE): 26.259309768676758
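The evaluation above keeps two full arrays of shape (704, 1024, 292, 1) in memory; at 4 bytes per value (float32) each is roughly 842 MB, which can be tight on a Colab instance. A minimal sketch, assuming batches arrive as `(x, paths)` pairs and that `predict_fn` (a hypothetical stand-in for `autoencoder.predict`) returns an array of the same shape, computes the per-sample MSE batch by batch and keeps only the errors and paths.

```python
import numpy as np


def streaming_mse(batches, predict_fn):
    """Per-sample MSE without retaining the full data/reconstruction arrays.

    batches: iterable of (x, paths) with x shaped (n, H, W, C)  [assumed format]
    predict_fn: callable mapping x -> reconstruction of the same shape
    """
    errors, paths = [], []
    for x, batch_paths in batches:
        if x.shape[0] == 0:
            continue  # skip empty batches
        recon = predict_fn(x)
        # mean over every axis except the batch axis -> one value per sample
        per_sample = np.mean((x - recon) ** 2, axis=tuple(range(1, x.ndim)))
        errors.append(per_sample)
        paths.extend(batch_paths)
    if not errors:
        return np.empty(0), []
    return np.concatenate(errors), paths
```

With the notebook's generator, `batches` could be `((b[0], b[2]) for b in (test_generator[i] for i in range(len(test_generator))))` and `predict_fn` could be `lambda x: autoencoder.predict(x, verbose=0)`; the returned errors match `axis=(1, 2, 3)` for 4-D input.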


## Create an index-to-file mapping

### Subtask:
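Create a mapping (e.g., a list or array) that stores the original file path for each sample in the concatenated testing data array (`original_test_data`).

The `all_test_file_paths` list built in the evaluation loop above already serves as this mapping: the generator records a path only for each file it actually loads, and the paths are appended in the same batch order as the data, so entry `i` is the source file of `original_test_data[i]` and of `mse[i]`.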
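A small sketch of turning such a path list into lookups in both directions, assuming paths are unique. The helper name `build_index_maps` is hypothetical. An object-dtype array allows fancy indexing with an array of anomaly indices, and the dict gives the row for a known file.

```python
import numpy as np


def build_index_maps(file_paths):
    """Return (row index -> path as an array, path -> row index as a dict)."""
    index_to_path = np.array(file_paths, dtype=object)
    path_to_index = {path: i for i, path in enumerate(file_paths)}
    if len(path_to_index) != len(file_paths):
        raise ValueError("duplicate file paths: reverse lookup would be ambiguous")
    return index_to_path, path_to_index
```

For example, `index_to_path[potential_anomaly_indices].tolist()` would give the flagged files in one step.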


## Update anomaly identification

### Subtask:
Update the anomaly identification code to use the calculated anomaly indices and the `all_test_file_paths` list to determine which original files contain the potentially anomalous samples.


**Reasoning**:
Identify the indices of potential anomalies based on the calculated threshold and retrieve the corresponding file paths from the all_test_file_paths list.
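The cell below uses the threshold τ = μ + kσ with k = 2, where μ and σ are computed from the test errors themselves, so it flags the upper tail of whatever data is evaluated. A minimal sketch of the same rule fitted on a separate set of errors from data assumed to be normal (`reference_errors` is hypothetical, e.g. errors on held-out normal spectrograms), then applied to the test errors:

```python
import numpy as np


def fit_std_threshold(reference_errors, k=2.0):
    """tau = mean + k * std (population std, as np.std) of errors on normal data."""
    reference_errors = np.asarray(reference_errors, dtype=float)
    return reference_errors.mean() + k * reference_errors.std()


def flag_anomalies(test_errors, file_paths, threshold):
    """Return indices and paths of samples whose error exceeds the threshold."""
    test_errors = np.asarray(test_errors, dtype=float)
    indices = np.flatnonzero(test_errors > threshold)
    return indices, [file_paths[i] for i in indices]
```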



In [54]:
# Assumes `mse` (per-sample reconstruction errors) and `all_test_file_paths`
# are available from the evaluation cell above

if 'mse' in locals() and 'all_test_file_paths' in locals() and all_test_file_paths:
    print("Identifying potential anomalies and their corresponding files...")

    # Calculate a simple threshold based on the mean and standard deviation of the MSE
    mean_mse = np.mean(mse)
    std_mse = np.std(mse)
    anomaly_threshold = mean_mse + 2 * std_mse # Using Mean + 2*StdDev as threshold

    print(f"\nCalculated Anomaly Threshold (Mean + 2*StdDev): {anomaly_threshold}")

    # Identify samples in the testing data with a reconstruction error above the threshold
    potential_anomaly_indices = np.where(mse > anomaly_threshold)[0]

    # Retrieve the corresponding file paths using the indices
    anomalous_file_paths = [all_test_file_paths[i] for i in potential_anomaly_indices]

    print(f"\nNumber of potential anomalies detected in the testing data: {len(potential_anomaly_indices)}")

    if len(anomalous_file_paths) > 0:
        print("File paths of potential anomalies:")
        # Print at most the first 10 file paths
        for file_path in anomalous_file_paths[:10]:
            print(f"- {file_path}")
        if len(anomalous_file_paths) > 10:
            print("...")
    else:
        print("No potential anomalies detected above the calculated threshold.")

else:
    print("Reconstruction errors (mse) or all_test_file_paths not available. Cannot identify anomaly files.")

Identifying potential anomalies and their corresponding files...

Calculated Anomaly Threshold (Mean + 2*StdDev): 33.88664627075195

Number of potential anomalies detected in the testing data: 21
File paths of potential anomalies:
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_773MHz_20251025_082303.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_103MHz_20251025_082148.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_792MHz_20251025_082304.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_800MHz_20251025_082305.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_934MHz_20251025_082316.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_153MHz_20251025_082155.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_783MHz_20251025_082304.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_93MHz_20251025_082145.npy
- /tmp/extracted_testing_data/spectrograms_anomolous/spec_802MHz_20251025_082305.npy
- /tm

## List anomalous files

### Subtask:
Present a list of the file names from the `spectrograms_anomolous` folder that contain samples identified as potential anomalies.


**Reasoning**:
Extract the base file names from the anomalous file paths and print them with a clear heading.



In [55]:
import os

if 'anomalous_file_paths' in locals() and anomalous_file_paths:
    # Extract the base file names
    anomalous_base_file_names = [os.path.basename(f) for f in anomalous_file_paths]

    print("\nPotential Anomaly Files:")
    for file_name in anomalous_base_file_names:
        print(file_name)
else:
    print("No anomalous file paths found to list.")


Potential Anomaly Files:
spec_773MHz_20251025_082303.npy
spec_103MHz_20251025_082148.npy
spec_792MHz_20251025_082304.npy
spec_800MHz_20251025_082305.npy
spec_934MHz_20251025_082316.npy
spec_153MHz_20251025_082155.npy
spec_783MHz_20251025_082304.npy
spec_93MHz_20251025_082145.npy
spec_802MHz_20251025_082305.npy
spec_809MHz_20251025_082306.npy
spec_927MHz_20251025_082316.npy
spec_88MHz_20251025_082142.npy
spec_932MHz_20251025_082316.npy
spec_96MHz_20251025_082146.npy
spec_804MHz_20251025_082305.npy
spec_941MHz_20251025_082317.npy
spec_91MHz_20251025_082143.npy
spec_929MHz_20251025_082316.npy
spec_105MHz_20251025_082150.npy
spec_434MHz_20251025_082230.npy
spec_98MHz_20251025_082148.npy
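The file names above appear to encode a frequency and a capture time. Assuming the pattern `spec_<N>MHz_<YYYYMMDD>_<HHMMSS>.npy` (inferred from the listed names, not documented in the notebook), a sketch like this parses them and counts flagged files per 100 MHz band, which can help when inspecting what the anomalies have in common. The helper names are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed naming pattern, inferred from the listed files
NAME_RE = re.compile(r"^spec_(\d+)MHz_(\d{8})_(\d{6})\.npy$")


def parse_spectrogram_name(file_name):
    """Return (frequency_mhz, capture_time), or None if the name doesn't match."""
    match = NAME_RE.match(file_name)
    if match is None:
        return None
    freq_mhz = int(match.group(1))
    captured = datetime.strptime(match.group(2) + match.group(3), "%Y%m%d%H%M%S")
    return freq_mhz, captured


def count_by_band(file_names, band_width_mhz=100):
    """Count parsed files per band, keyed by band start (0, 100, 200, ... MHz)."""
    counts = Counter()
    for name in file_names:
        parsed = parse_spectrogram_name(name)
        if parsed is not None:
            counts[(parsed[0] // band_width_mhz) * band_width_mhz] += 1
    return dict(sorted(counts.items()))
```

Applied to the 21 names listed above (for example via `count_by_band(anomalous_base_file_names)`), this would return `{0: 5, 100: 3, 400: 1, 700: 3, 800: 4, 900: 5}`.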


### Rank Anomalies by Reconstruction Error

Rank the identified potential anomalies based on their reconstruction error (MSE) and list the top anomalies.
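The ranking step can be expressed as a small function. This sketch (the name `rank_by_error` is hypothetical) sorts the flagged indices by descending error; sorting the negated errors with `kind="stable"` keeps tied errors in their original order, whereas reversing an ascending `argsort` with `[::-1]` also reverses the order of ties.

```python
import numpy as np


def rank_by_error(errors, indices, file_paths, top_k=20):
    """Return (rank, error, path) rows for the given indices, highest error first."""
    errors = np.asarray(errors, dtype=float)
    indices = np.asarray(indices, dtype=int)
    # argsort of the negated errors gives descending order; 'stable' preserves tie order
    order = np.argsort(-errors[indices], kind="stable")
    ranked = indices[order][:top_k]
    return [(rank + 1, float(errors[i]), file_paths[i]) for rank, i in enumerate(ranked)]
```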

In [56]:
import numpy as np
import os

# Assumes these variables exist from the cells above:
#   mse                       - per-sample reconstruction errors (evaluation)
#   potential_anomaly_indices - indices above the threshold (anomaly identification)
#   all_test_file_paths       - file path for each test sample (evaluation)

if 'mse' in locals() and 'potential_anomaly_indices' in locals() and 'all_test_file_paths' in locals():
    print("Ranking potential anomalies by reconstruction error...")

    # Get the MSE values for the potential anomalies
    anomalous_mse_values = mse[potential_anomaly_indices]

    # Sort the anomaly indices based on their MSE values in descending order
    # Get the indices that would sort the anomalous_mse_values in descending order
    ranked_indices_in_anomalous_list = np.argsort(anomalous_mse_values)[::-1]

    # Use these sorted indices to get the original indices of the anomalies, ranked
    ranked_anomaly_indices = potential_anomaly_indices[ranked_indices_in_anomalous_list]

    print(f"\nFound {len(ranked_anomaly_indices)} potential anomalies to rank.")

    if len(ranked_anomaly_indices) > 0:
        print("\nRanked Potential Anomalies (by MSE):")
        print("Rank | MSE | File Name")
        print("-----|--------------------|-----------")
        # Display the top anomalies (up to 20, or fewer if fewer were detected)
        num_anomalies_to_show = min(20, len(ranked_anomaly_indices))

        for rank, original_index in enumerate(ranked_anomaly_indices[:num_anomalies_to_show]):
            corresponding_mse = mse[original_index]
            corresponding_file = all_test_file_paths[original_index]
            # Extract just the base file name for cleaner display
            file_name = os.path.basename(corresponding_file)
            print(f"{rank + 1:<4} | {corresponding_mse:<18.6f} | {file_name}")

        if len(ranked_anomaly_indices) > num_anomalies_to_show:
            print("...")
    else:
        print("No potential anomalies were identified to rank.")

else:
    print("Required variables (mse, potential_anomaly_indices, all_test_file_paths) not available. Cannot rank anomalies.")

Ranking potential anomalies by reconstruction error...

Found 21 potential anomalies to rank.

Ranked Potential Anomalies (by MSE):
Rank | MSE | File Name
-----|--------------------|-----------
1    | 40.882244          | spec_932MHz_20251025_082316.npy
2    | 38.116680          | spec_929MHz_20251025_082316.npy
3    | 37.925835          | spec_96MHz_20251025_082146.npy
4    | 37.772583          | spec_88MHz_20251025_082142.npy
5    | 37.692722          | spec_804MHz_20251025_082305.npy
6    | 37.340153          | spec_941MHz_20251025_082317.npy
7    | 36.620926          | spec_802MHz_20251025_082305.npy
8    | 35.902058          | spec_105MHz_20251025_082150.npy
9    | 35.669212          | spec_927MHz_20251025_082316.npy
10   | 35.273869          | spec_91MHz_20251025_082143.npy
11   | 35.176708          | spec_800MHz_20251025_082305.npy
12   | 34.982578          | spec_809MHz_20251025_082306.npy
13   | 34.649902          | spec_103MHz_20251025_082148.npy
14   | 34.535919          | s

## Summary:

### Data Analysis Key Findings

*   The `SpectrogramDataGenerator` class was successfully modified to return the original file paths along with the batch data.
*   The evaluation loop was updated to collect the original test data, reconstructions, and corresponding file paths for each sample.
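*   A total of 704 test samples (22 batches of 32) were evaluated, and a file path was collected for each one. Because `__len__` uses `floor`, any test files left over after the last full batch (up to 31) were not evaluated.
*   An anomaly threshold of mean + 2 × standard deviation of the reconstruction errors (MSE) was calculated, giving approximately 33.89.
*   Using this threshold, 21 potential anomalies were identified among the 704 test samples, all of which come from the `spectrograms_anomolous` archive.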
*   The file paths corresponding to these 21 anomalies were successfully retrieved and listed.

### Insights or Next Steps

*   Further analysis could involve investigating the characteristics of the identified anomalous spectrograms to understand the types of anomalies the autoencoder is detecting.
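*   The anomaly threshold is currently computed from the test-set errors themselves, so it will flag roughly the upper tail of whatever data is evaluated. Deriving it from reconstruction errors on normal (training or held-out) data, or using a percentile-based cutoff, would make the precision and recall of the detector easier to reason about.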
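A percentile-based cutoff, as suggested above, makes no assumption about the shape of the error distribution. A minimal sketch, assuming `reference_errors` (hypothetical) are reconstruction errors on data believed to be normal: a threshold at the p-th percentile flags about (100 - p)% of such normal samples, and `flagged_fraction` reports how much of another set exceeds it.

```python
import numpy as np


def percentile_threshold(reference_errors, percentile=99.0):
    """Threshold at the given percentile of errors from data assumed to be normal."""
    return float(np.percentile(np.asarray(reference_errors, dtype=float), percentile))


def flagged_fraction(test_errors, threshold):
    """Fraction of samples whose error exceeds the threshold."""
    return float(np.mean(np.asarray(test_errors, dtype=float) > threshold))
```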
