Feature request
The SAM image processor takes images as input and resizes them so that the longest edge is 1024 (using default values). This is the size expected as input by the SAM model.
For inference this works fine, as only the images need resizing, but for fine-tuning as per this tutorial you need to resize both your images and your masks, because the SAM model produces `pred_masks` of size 256x256. If I don't resize my masks I get `ground truth has different shape (torch.Size([2, 1, 768, 1024])) from input (torch.Size([2, 1, 256, 256]))` when trying to calculate loss.
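For reference, here is a quick way to see that behaviour (a minimal sketch assuming the `facebook/sam-vit-base` checkpoint with its default processor settings):

```python
from transformers import SamProcessor
from PIL import Image
import requests

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(image, return_tensors="pt")
# longest edge resized to 1024, then padded to a square 1024x1024
print(inputs["pixel_values"].shape)  # torch.Size([1, 3, 1024, 1024])
```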
To fix this, I've currently written a resize-and-pad function, `process_mask`, into my code:
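The function itself isn't reproduced here, but a minimal sketch of what it could look like (assuming a 2D uint8 mask, and mirroring the processor's longest-edge resize followed by bottom/right zero padding, at the 256 scale of `pred_masks`):

```python
import numpy as np
from PIL import Image

def process_mask(mask, target_size=256):
    """Resize a mask so its longest edge is target_size, then pad to target_size x target_size."""
    mask = np.array(mask).astype(np.uint8)
    h, w = mask.shape
    scale = target_size / max(h, w)
    new_h, new_w = int(h * scale + 0.5), int(w * scale + 0.5)
    # nearest-neighbour resize keeps label values intact (no interpolation artefacts)
    resized = np.array(Image.fromarray(mask).resize((new_w, new_h), resample=Image.NEAREST))
    # pad on the bottom/right, matching how the SAM image processor pads images
    padded = np.zeros((target_size, target_size), dtype=np.uint8)
    padded[:new_h, :new_w] = resized
    return padded
```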
and then added this to my definition of `SAMDataset`:
```python
from torch.utils.data import Dataset

class SAMDataset(Dataset):
    def __init__(self, dataset, processor, transform=None):
        self.dataset = dataset
        self.processor = processor
        self.transform = transform

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        item = self.dataset[idx]
        if self.transform:
            image = self.transform(item["pixel_values"])
        else:
            image = item["pixel_values"]
        # resize and pad the mask to 256x256, then get the bounding box prompt
        padded_mask = process_mask(item["label"])
        prompt = get_bounding_box(padded_mask)
        # prepare image and prompt for the model
        inputs = self.processor(image, input_boxes=[[prompt]], return_tensors="pt")
        # remove batch dimension which the processor adds by default
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}
        # add ground truth segmentation
        inputs["ground_truth_mask"] = padded_mask
        return inputs
```
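For completeness, `get_bounding_box` is the helper from the tutorial notebook, which derives a box prompt from the ground truth mask; a rough sketch of that logic:

```python
import numpy as np

def get_bounding_box(ground_truth_map):
    # find the extent of the foreground pixels
    y_indices, x_indices = np.where(ground_truth_map > 0)
    x_min, x_max = np.min(x_indices), np.max(x_indices)
    y_min, y_max = np.min(y_indices), np.max(y_indices)
    # perturb the box slightly so the model doesn't see a perfect box every time
    H, W = ground_truth_map.shape
    x_min = max(0, x_min - np.random.randint(0, 20))
    x_max = min(W, x_max + np.random.randint(0, 20))
    y_min = max(0, y_min - np.random.randint(0, 20))
    y_max = min(H, y_max + np.random.randint(0, 20))
    return [x_min, y_min, x_max, y_max]
```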
This seems to work fine.
What I think would be good is to allow masks as input to the SAM image processor. For example, the Segformer image processor takes images and masks as inputs and resizes both to the size expected by the Segformer model.
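Hypothetically, the call could then look something like this (`segmentation_maps` is borrowed from the Segformer processor's API and is not an existing SAM parameter; this is just an illustration of the proposal):

```python
# hypothetical API: segmentation_maps is NOT currently accepted by the SAM processor
inputs = processor(
    image,
    segmentation_maps=mask,   # would be resized/padded to match pred_masks (256x256)
    input_boxes=[[prompt]],
    return_tensors="pt",
)
ground_truth_mask = inputs["labels"]
```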
I have also seen there is a `post_process_masks` method in the SAM image processor, but I am unsure how to use it in the tutorial I'm following. If you think that is a better approach than what I am suggesting, could you please explain where I would add it in the code from the tutorial notebook.
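If post-processing is the better route, my understanding is that `post_process_masks` goes the other way: instead of shrinking the ground truth to 256x256, it upscales `pred_masks` back to the original image size, so the loss could be computed at full resolution. A rough sketch of where that might sit in the tutorial's training loop (assuming the batch still carries the `original_sizes` and `reshaped_input_sizes` returned by the processor, and passing `binarize=False` so the logits stay usable for a loss):

```python
outputs = model(pixel_values=batch["pixel_values"],
                input_boxes=batch["input_boxes"],
                multimask_output=False)

# upscale the 256x256 predicted masks back to each image's original size
upscaled = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    original_sizes=batch["original_sizes"],
    reshaped_input_sizes=batch["reshaped_input_sizes"],
    binarize=False,  # keep logits rather than thresholded booleans
)
# returns a list with one tensor per image, since original sizes can differ,
# so the loss would then be computed per image against the unresized ground truth
```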
Motivation
Easier fine-tuning of the SAM model.
Your contribution
I could try to write a PR for this and/or make a PR to update the notebook instead.