# Ultrasound Microrobot Preprocessing and Dataset

This notebook demonstrates a modular approach to:

1. **Preprocess the Ultrasound Data**: Convert ultrasound images (1920×1080) and their bounding-box label files into `.amat` files, one per split (train and test) for each microrobot type.
2. **Create a PyTorch Dataset**: Define a dataset class (`USMicroMagDataset`) that loads the `.amat` file, reshapes the image data, and makes it available to your model.

The folder structure is assumed to be as follows:

```
microrobot_folder/
├── sample.yaml
├── images/
│   ├── train/   (contains many .png images)
│   ├── val/     (contains many .png images)
│   └── test/    (contains many .png images)
└── labels/
    ├── train/   (each image has a corresponding .txt file with bounding box info)
    ├── val/
    └── test/
```
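Because each image is expected to have a label file with the same stem, it can help to check the pairing before preprocessing. The following is a minimal sketch, assuming the layout above; `find_unpaired_images` is a hypothetical helper and not part of the notebook.

```python
import os


def find_unpaired_images(microrobot_folder, split):
    """Return the .png files in images/<split> that lack a .txt file in labels/<split>.

    Hypothetical helper: assumes the images/ and labels/ layout shown above.
    """
    img_dir = os.path.join(microrobot_folder, "images", split)
    label_dir = os.path.join(microrobot_folder, "labels", split)
    missing = []
    for name in sorted(os.listdir(img_dir)):
        if not name.lower().endswith(".png"):
            continue  # skip anything that is not a PNG image
        label_name = os.path.splitext(name)[0] + ".txt"
        if not os.path.isfile(os.path.join(label_dir, label_name)):
            missing.append(name)
    return missing
```

Calling `find_unpaired_images("microrobot_folder", "train")` should return an empty list when every training image has a label file.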

The `sample.yaml` might look like:

```yaml
path: ../dataset/cylinder
train:
  - images/train
val:
  - images/val
test:
  - images/test
nc: 1 
names: ["cylinder"]
```
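Once parsed (for example with `yaml.safe_load`), this file becomes a plain dictionary. The sketch below shows one way to turn its split entries into directories, merging `train` and `val` into a single training split. The helper name `resolve_split_dirs` and the inlined `config` dictionary are illustrative assumptions.

```python
import os

# What yaml.safe_load would return for the sample.yaml shown above
config = {
    "path": "../dataset/cylinder",
    "train": ["images/train"],
    "val": ["images/val"],
    "test": ["images/test"],
    "nc": 1,
    "names": ["cylinder"],
}


def resolve_split_dirs(config, microrobot_folder):
    """Map the YAML split entries to directories under microrobot_folder.

    Hypothetical helper: "train" and "val" are merged into one training
    split, and "test" stays separate. Missing or empty keys are skipped.
    """
    splits = {"train": [], "test": []}
    for key, target in (("train", "train"), ("val", "train"), ("test", "test")):
        for rel_dir in config.get(key) or []:
            splits[target].append(os.path.join(microrobot_folder, rel_dir))
    return splits


splits = resolve_split_dirs(config, "microrobot_folder")
print(splits["train"])
print(splits["test"])
```

In this sketch the `path` entry is ignored and every directory is resolved against the folder passed as an argument.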


In [1]:
import os
import yaml
import numpy as np
from PIL import Image

print('Modules imported successfully.')

Modules imported successfully.


## Part 1. Preprocess Ultrasound Data

The function below reads each ultrasound image and its label file, then writes one space-delimited `.amat` file per split. Each row in an `.amat` file holds the flattened pixel values of one 1920×1080 grayscale image (2,073,600 values), followed by the label values (for example, 5 numbers representing a bounding box).
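To make the row layout concrete, the sketch below packs a tiny synthetic image and a label list into one row, then splits the row back apart. The helper names `pack_row` and `unpack_row`, the 4×6 image, and the label values are assumptions for illustration; for the real data the height and width would be 1080 and 1920.

```python
import numpy as np


def pack_row(image, labels):
    """Flatten a 2-D image row by row and append the label values."""
    pixels = np.asarray(image, dtype=np.float32).flatten()
    return np.concatenate([pixels, np.asarray(labels, dtype=np.float32)])


def unpack_row(row, height, width):
    """Split a row back into a (height, width) image and its label values."""
    n_pixels = height * width
    image = row[:n_pixels].reshape(height, width)
    labels = row[n_pixels:]
    return image, labels


# Tiny stand-in for an ultrasound frame: 4 rows (height) by 6 columns (width)
image = np.arange(24, dtype=np.float32).reshape(4, 6)
labels = [0.5, 0.25, 0.125, 0.0625]  # made-up label values for illustration

row = pack_row(image, labels)
restored_image, restored_labels = unpack_row(row, height=4, width=6)

print(row.shape)  # (28,)
```

NumPy stores an image as (height, width), so a 1920×1080 frame becomes an array of shape (1080, 1920) before it is flattened.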

In [2]:
def preprocess_ultrasound_data(microrobot_folder, max_images=None):
    """
    Preprocess ultrasound images and labels for a given microrobot type.

    Args:
        microrobot_folder (str): Path to the folder containing sample.yaml, images/, and labels/
        max_images (int or None): Maximum number of images to include per split (train/test).
                                  If None, includes all available images.
    
    Expects the following folder structure inside `microrobot_folder`:
    
        sample.yaml
        images/
            train/   -- training images (.png)
            test/    -- testing images (.png)
            val/     -- validation images (.png)
        labels/
            train/   -- training label files (.txt)
            test/    -- testing label files (.txt)
            val/     -- validation label files (.txt)
    
    The sample.yaml file is assumed to contain, for example:
    
        path: ../dataset/cylinder
        train:
          - images/train
        val:
          - images/val
        test:
          - images/test
        nc: 1 
        names: ["cylinder"]
    
    For training, we combine images from both "train" and "val" splits.
    Each output .amat file will have one row per image:
       [ flattened_pixels (1920x1080)  label_values ]
    """
    # Log the working directory, since relative paths are resolved against it
    print("Current working directory:", os.getcwd())

    # Load the dataset configuration from sample.yaml
    sample_yaml_path = os.path.join(microrobot_folder, "sample.yaml")
    with open(sample_yaml_path, "r") as f:
        config = yaml.safe_load(f)
    
    # Define splits: combine "train" and "val" for training; test remains separate
    splits = {"train": [], "test": []}
    
    # Combine train and validation splits
    for key in ["train", "val"]:
        if key in config and config[key]:
            for rel_dir in config[key]:
                splits["train"].append(os.path.join(microrobot_folder, rel_dir))
    
    # Test split
    if "test" in config and config["test"]:
        for rel_dir in config["test"]:
            splits["test"].append(os.path.join(microrobot_folder, rel_dir))

    # Process each split
    for mode, img_dirs in splits.items():
        data_rows = []
        count = 0  # Track how many images have been added
        for img_dir in img_dirs:
            # Derive the labels directory by swapping the last occurrence of "images"
            # for "labels" (a plain str.replace would also alter parent folder names)
            label_dir = "labels".join(img_dir.rsplit("images", 1))
            # List all PNG files in this directory
            image_files = sorted([f for f in os.listdir(img_dir) if f.lower().endswith(".png")])
            for img_file in image_files:
                if max_images is not None and count >= max_images:
                    break  # Stop if limit reached
                # Full path to the image
                img_path = os.path.join(img_dir, img_file)
                # Open the image and convert it to mode 'F' (32-bit float grayscale)
                image = Image.open(img_path).convert('F')
                # Flatten row by row; a 1920x1080 image yields 2,073,600 values
                img_array = np.array(image, dtype=np.float32).flatten()
                
                # Find the corresponding label file
                label_filename = os.path.splitext(img_file)[0] + ".txt"
                label_path = os.path.join(label_dir, label_filename)
                with open(label_path, "r") as lf:
                    # Only the first line is used, e.g. "0 0.569076 0.381246 0.115152 0.130603";
                    # the [1:] slice below skips the first token on that line
                    label_line = lf.readline().strip()
                    label_values = [float(x) for x in label_line.split()[1:]]
                
                # Concatenate flattened image and label values
                row = np.concatenate([img_array, np.array(label_values, dtype=np.float32)])
                data_rows.append(row)
                count += 1
            
            if max_images is not None and count >= max_images:
                break  # Don't process more folders if limit is reached
        
        # If any data is found, stack and save as .amat
        if data_rows:
            data_matrix = np.vstack(data_rows)
            microrobot_type = config["names"][0]  # e.g., "cylinder"
            suffix = f"_{max_images}" if max_images is not None else ""
            # Create the output directory if needed; np.savetxt does not create it
            os.makedirs("USMicroMagSet_processed", exist_ok=True)
            amat_filename = f"USMicroMagSet_processed/ultrasound_{microrobot_type}_{mode}{suffix}.amat"
            np.savetxt(amat_filename, data_matrix, fmt="%.6f")
            print(f"Saved {amat_filename} with shape {data_matrix.shape}")
        else:
            print(f"No images found for split {mode} in {microrobot_folder}.")

# Example usage (set the path to your microrobot folder):
preprocess_ultrasound_data("UsMicroMagSet-main/flagella")

Current working directory: /Users/hibrahim/Documents/Class/Machine Learning/research_coding/code
Saved USMicroMagSet_processed/ultrasound_flagella_train.amat with shape (1956, 2073605)
Saved USMicroMagSet_processed/ultrasound_flagella_test.amat with shape (1054, 2073605)
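
The column count in the logged shapes can be checked by hand: a 1920×1080 frame contributes 2,073,600 pixel columns, and whatever remains holds label values. The short calculation below also estimates the in-memory size of the training matrix as `float32`. It uses only the numbers printed above and assumes every frame is exactly 1920×1080.

```python
# Shapes reported in the output above
n_train_rows, n_cols = 1956, 2073605

n_pixels = 1920 * 1080            # pixel columns per flattened frame
n_label_cols = n_cols - n_pixels  # columns left over for label values

# Size of the training matrix held in memory as float32 (4 bytes per value)
train_bytes = n_train_rows * n_cols * 4

print(n_pixels)                     # 2073600
print(n_label_cols)                 # 5
print(round(train_bytes / 1e9, 1))  # 16.2
```

Under that assumption the log implies 5 label columns per row, whereas `split()[1:]` applied to a five-token label line would give 4, so it is worth checking the image size and label format of your own data. The text file written by `np.savetxt` will typically be larger than the in-memory figure, since each value is stored as decimal text.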
