<a href="https://colab.research.google.com/github/martintmv-git/RB-IBDM/blob/main/Experiments/Generating%20Masks%20with%20SAM/Success%20with%20test%20dataset/sam_test_dataset_generate_masks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Segment Anything by Meta AI - Dataset Segmentation Test
### Generating object masks with SAM for RB-IBDM and saving them in Google Drive
> The test was run on 273 images taken from the full dataset of 27,649 images.

## Before starting

Make sure you are connected to a GPU (I used a V100).

In [1]:
!nvidia-smi

Wed Mar 27 14:24:15 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-16GB           Off | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0              24W / 300W |      0MiB / 16384MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
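
The same check can be done from Python, which is handy if a later cell should stop early when no GPU is attached. This is a minimal sketch, not part of the original notebook: the helper names `parse_gpu_query` and `list_gpus` are made up for illustration, and it assumes the installed `nvidia-smi` accepts the `--query-gpu` and `--format=csv,noheader` options.

```python
import shutil
import subprocess

def parse_gpu_query(text):
    """Parse 'name, memory' CSV lines into (name, memory) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus

def list_gpus():
    """Return the GPUs reported by nvidia-smi, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:  # tool not installed: no NVIDIA GPU runtime
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True, text=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)

gpus = list_gpus()
print("GPUs found:", gpus if gpus else "none (switch the runtime type to GPU)")
```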
                                                                    

**NOTE:** To make it easier to manage datasets, images, and models, we create a `HOME` constant.

In [2]:
import os
HOME = os.getcwd()
print("HOME:", HOME)

HOME: /content


# Install Segment Anything Model (SAM) and dependencies

In [3]:
!pip install -q 'git+https://github.com/facebookresearch/segment-anything.git'

  Preparing metadata (setup.py) ... done
  Building wheel for segment-anything (setup.py) ... done


# Download SAM weights

In [4]:
!mkdir -p {HOME}/weights
!wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -P {HOME}/weights

In [5]:
import os

CHECKPOINT_PATH = os.path.join(HOME, "weights", "sam_vit_h_4b8939.pth")
print(CHECKPOINT_PATH, "; exist:", os.path.isfile(CHECKPOINT_PATH))

/content/weights/sam_vit_h_4b8939.pth ; exist: True
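
`wget -q` prints nothing on failure, so an interrupted download can leave a truncated file that still passes `os.path.isfile`. A minimal sketch of a slightly stricter check follows. `checkpoint_looks_complete` and its `min_bytes` threshold are hypothetical, and the 1 GB default is only a rough lower bound chosen on the assumption that the ViT-H checkpoint is a multi-gigabyte file.

```python
import os

def checkpoint_looks_complete(path, min_bytes=1_000_000_000):
    """Return True if `path` is a file of at least `min_bytes` bytes.

    min_bytes is an assumed lower bound, not an official file size.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes

# Example with the checkpoint location used in this notebook
path = os.path.join(os.getcwd(), "weights", "sam_vit_h_4b8939.pth")
print(path, "; looks complete:", checkpoint_looks_complete(path))
```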


# Load Insect Data from Google Drive

In [11]:
!mkdir -p {HOME}/data

import os
from google.colab import drive

drive.mount('/content/drive')

# Dataset of images
dataset_path = '/content/drive/MyDrive/images_clean_test'

# Where the generated masks will be saved
save_path = '/content/drive/MyDrive/diopsis_masks_test'

# Ensure the directory exists
if not os.path.exists(save_path):
    os.makedirs(save_path)
    print(f"Directory created for saving masks: {save_path}")
else:
    print(f"Save directory already exists: {save_path}")

# Counting the number of images in the dataset
num_images = len(os.listdir(dataset_path))
print(f"Number of images read in dataset: {num_images}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Directory created for saving masks: /content/drive/MyDrive/diopsis_masks_test
Number of images read in dataset: 273
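
`os.listdir` counts every entry in the folder, including subfolders and stray non-image files, so the number above is an upper bound on the usable images. A minimal sketch that counts only image files is shown here; the helper name `list_image_files` and the extension set are assumptions, not part of the original notebook.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set; extend as needed

def list_image_files(folder):
    """Return sorted file names in `folder` whose extension looks like an image."""
    return sorted(
        name for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )

# Example: count the image files in the current working directory
print("Image files here:", len(list_image_files(os.getcwd())))
```

In this notebook the call would be `len(list_image_files(dataset_path))`.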


# Load Model

In [7]:
import torch

DEVICE = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
MODEL_TYPE = "vit_h"

In [8]:
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor

sam = sam_model_registry[MODEL_TYPE](checkpoint=CHECKPOINT_PATH).to(device=DEVICE)

# Automated Mask Generation

To run automatic mask generation, pass a loaded SAM model to the `SamAutomaticMaskGenerator` class. The model was already loaded above from `CHECKPOINT_PATH`. Running on CUDA with the default model is recommended.

In [9]:
mask_generator = SamAutomaticMaskGenerator(sam)
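
The generator above runs with its default settings. If the masks turn out too fragmented or too sparse, the constructor also accepts tuning parameters. The sketch below collects a few of them in a dictionary; the values are, to the best of my knowledge, the library defaults, and `GENERATOR_KWARGS` is a name introduced here for illustration only.

```python
# Keyword arguments accepted by SamAutomaticMaskGenerator. The values are
# believed to be the library defaults, so they change nothing until edited.
GENERATOR_KWARGS = {
    "points_per_side": 32,           # density of the point grid sampled over the image
    "pred_iou_thresh": 0.88,         # drop masks whose predicted quality is below this
    "stability_score_thresh": 0.95,  # drop masks that are unstable under threshold changes
    "min_mask_region_area": 0,       # >0 removes small disconnected regions and holes
}

# In the notebook, where `sam` is already loaded, this would be used as:
# mask_generator = SamAutomaticMaskGenerator(sam, **GENERATOR_KWARGS)
print(GENERATOR_KWARGS)
```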

### Generate masks with SAM

In [12]:
import os

import cv2
import numpy as np
from PIL import Image

def save_masks_to_drive(masks, save_path, image_name):
    """Save only the first mask as a PNG. Return True if a file was written."""
    if not masks:  # SAM produced no masks for this image
        print(f"No masks generated for {image_name}")
        return False
    try:
        # Convert the boolean mask to an 8-bit image (0 = background, 255 = mask)
        img = Image.fromarray(masks[0].astype(np.uint8) * 255)
        mask_file_path = os.path.join(save_path, f'mask_{image_name}_0.png')  # Name for the first mask
        img.save(mask_file_path)
        print(f"Successfully saved mask to drive: {mask_file_path}")
        return True
    except Exception as e:
        print(f"Failed saving mask to drive for {image_name}: {e}")
        return False

def process_images_in_batches(dataset_path, save_path, batch_size=100):
    image_paths = [os.path.join(dataset_path, f) for f in os.listdir(dataset_path) if os.path.isfile(os.path.join(dataset_path, f))]
    for i in range(0, len(image_paths), batch_size):
        batch_paths = image_paths[i:i+batch_size]
        for path in batch_paths:
            try:
                print(f"Processing: {path}")
                image_bgr = cv2.imread(path)
                if image_bgr is None:  # cv2.imread returns None for unreadable files
                    print(f"Error processing {path}: could not read image")
                    continue
                image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)  # SAM expects RGB input
                sam_result = mask_generator.generate(image_rgb)
                # Sort masks by area, largest first
                masks = [mask['segmentation'] for mask in sorted(sam_result, key=lambda x: x['area'], reverse=True)]

                # Saving only the first (largest) mask as it's the most valuable for later training
                image_name = os.path.splitext(os.path.basename(path))[0]
                if save_masks_to_drive(masks, save_path, image_name):
                    print(f"Successfully processed and saved masks for: {path}")
            except Exception as e:
                print(f"Error processing {path}: {e}")

# Process the images in batches
process_images_in_batches(dataset_path, save_path, batch_size=100)

Processing: /content/drive/MyDrive/images_clean_test/220_20210901234955_215.jpg
Successfully saved mask to drive: /content/drive/MyDrive/diopsis_masks_test/mask_220_20210901234955_215_0.png
Successfully processed and saved masks for: /content/drive/MyDrive/images_clean_test/220_20210901234955_215.jpg
Processing: /content/drive/MyDrive/images_clean_test/196_20200808233933_761.jpg
Successfully saved mask to drive: /content/drive/MyDrive/diopsis_masks_test/mask_196_20200808233933_761_0.png
Successfully processed and saved masks for: /content/drive/MyDrive/images_clean_test/196_20200808233933_761.jpg
Processing: /content/drive/MyDrive/images_clean_test/148_20200719233209_824.jpg
Successfully saved mask to drive: /content/drive/MyDrive/diopsis_masks_test/mask_148_20200719233209_824_0.png
Successfully processed and saved masks for: /content/drive/MyDrive/images_clean_test/148_20200719233209_824.jpg
Processing: /content/drive/MyDrive/images_clean_test/220_20210925013602_1385.jpg
Successfully 
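
`mask_generator.generate` returns a list of dictionaries, one per mask, with keys such as `segmentation`, `area`, and `bbox`. The cell above sorts that list by `area` and keeps the largest entry. A minimal sketch of the selection step on hand-made records follows; the `fake_result` values are invented for illustration, and `largest_mask` is a hypothetical helper rather than part of the SAM API.

```python
def largest_mask(sam_result):
    """Return the record with the largest 'area', or None if the list is empty."""
    if not sam_result:
        return None
    return max(sam_result, key=lambda record: record["area"])

# Invented records: real ones also carry a boolean 'segmentation' array
fake_result = [
    {"area": 120, "bbox": [10, 10, 12, 10]},
    {"area": 5400, "bbox": [0, 0, 90, 60]},
    {"area": 860, "bbox": [40, 20, 43, 20]},
]
print(largest_mask(fake_result)["bbox"])  # prints [0, 0, 90, 60]
```

The largest region is not guaranteed to be the insect. On images with a uniform background, the background itself may come out as the largest mask, so it is worth spot-checking a few saved files.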

## Checking that the number of images in the dataset matches the number of saved masks

In [14]:
num_images = len(os.listdir(dataset_path))
print(f"Number of images read in dataset: {num_images}")

# Count the mask files written to the save directory
num_masks = len(os.listdir(save_path))
print(f"Number of images read in processed images: {num_masks}")

Number of images read in dataset: 273
Number of images read in processed images: 273
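
Matching counts do not prove that every image has its own mask. An extra file in the masks folder, for instance, could hide an image that failed. A minimal sketch that compares names instead is given here, assuming the `mask_<image name>_0.png` naming convention used when saving; `images_without_masks` is a hypothetical helper, not part of the original notebook.

```python
import os

def images_without_masks(dataset_dir, mask_dir):
    """Return image file names that have no 'mask_<stem>_0.png' in mask_dir."""
    existing = set(os.listdir(mask_dir))
    missing = []
    for name in sorted(os.listdir(dataset_dir)):
        if not os.path.isfile(os.path.join(dataset_dir, name)):
            continue  # skip subfolders
        stem = os.path.splitext(name)[0]
        if f"mask_{stem}_0.png" not in existing:
            missing.append(name)
    return missing
```

Calling `images_without_masks(dataset_path, save_path)` returns an empty list when every image has a mask; otherwise it names the images to re-run.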
