
Multiple Points, Labels, and Boxes while Batch Prompting #111

Open
Jordan-Pierce opened this issue Apr 8, 2023 · 15 comments
Labels
how-to Request for help on how to do something specific

Comments

@Jordan-Pierce

Thanks again for the release, very useful!

I'm currently trying to do batch prompting by providing bounding boxes (creating masks for objects that are already annotated), and I've noticed that sometimes the boxes alone do not produce complete masks. My idea was to provide each bounding box together with multiple points sampled within it, in the hope that the combination would produce a better mask for those edge cases.

The notebook provides a clear example of how to perform a prediction for a single point, a single bounding box, and multiple bounding boxes, but not for multiple points (with labels) combined with bounding boxes. When I try to do this, I keep running into an error, and it's not clear whether I'm doing it incorrectly or whether it simply isn't supported. Below is an example of what I thought would work:

import cv2
import torch

image = cv2.imread(cocodict['images'][0]['path'])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

input_boxes = torch.tensor([b['bbox'] for b in cocodict['annotations']], device=predictor.device)

# Returns 10 sampled points per bounding box, as a numpy array
input_points = sample_points(input_boxes, 10)
input_points = torch.tensor(input_points, device=predictor.device)

transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])
transformed_points = predictor.transform.apply_points_torch(input_points, image.shape[:2])
transformed_labels = torch.tensor([1 for _ in range(len(input_points))], device=predictor.device)

predictor.set_image(image)

masks, _, _ = predictor.predict_torch(
    point_coords=transformed_points,
    point_labels=transformed_labels,
    boxes=transformed_boxes,
    multimask_output=False,
)

I understand that I can get the results I'm looking for by breaking the process up, running it one object at a time, and joining the masks, but I'd like to know if there is a working solution for this batched approach. Thanks.

@zdhernandez

@Jordan-Pierce Is this the same issue? #115

@JordanMakesMaps

Hi @zdhernandez, not quite. Thanks for the response though.

@HannaMao
Contributor

HannaMao commented Apr 9, 2023

Did you check the shape of your transformed_points and transformed_labels? transformed_points should be a BxNx2 tensor and transformed_labels should be a BxN tensor. N is the number of sampled points here.
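
For anyone hitting shape errors, here is a minimal sketch of those shapes with random data (B and N are arbitrary values, and the coordinate transform is omitted; this only illustrates the expected layout):

import torch

B, N = 8, 10  # B prompt groups (one per box), N sampled points per group

point_coords = torch.rand(B, N, 2)                # (B, N, 2) xy coordinates
point_labels = torch.ones(B, N, dtype=torch.int)  # (B, N), 1 = foreground
boxes = torch.rand(B, 4)                          # (B, 4) XYXY boxes

assert point_coords.shape == (B, N, 2)
assert point_labels.shape == (B, N)
assert boxes.shape == (B, 4)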

nikhilaravi added the how-to label on Apr 12, 2023
@5663015

5663015 commented Apr 14, 2023

@HannaMao @Jordan-Pierce I have the same issue. I found that the number of points and the number of boxes must be the same, otherwise an error occurs (here, 8 boxes and 7 points):

Traceback (most recent call last):
  File "D:\python\lib\site-packages\gradio\routes.py", line 395, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\python\lib\site-packages\gradio\blocks.py", line 1193, in process_api
    result = await self.call_function(
  File "D:\python\lib\site-packages\gradio\blocks.py", line 916, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\python\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\python\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\python\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "F:\code\segment_anything_webui\inference.py", line 183, in run_inference
    return predictor_inference(device, model_type, input_x, input_text, selected_points, owl_vit_threshold)
  File "F:\code\segment_anything_webui\inference.py", line 145, in predictor_inference
    masks, scores, logits = predictor.predict_torch(
  File "D:\python\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\python\lib\site-packages\segment_anything\predictor.py", line 222, in predict_torch
    sparse_embeddings, dense_embeddings = self.model.prompt_encoder(
  File "D:\python\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\python\lib\site-packages\segment_anything\modeling\prompt_encoder.py", line 162, in forward
    sparse_embeddings = torch.cat([sparse_embeddings, box_embeddings], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 7 but got size 8 for tensor number 1 in the list.
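
In other words, the batch dimension of the point prompts has to match the batch dimension of the boxes before calling predict_torch. A quick sanity check, as a sketch with made-up shapes that reproduce the mismatch:

import torch

boxes = torch.rand(8, 4)              # 8 boxes
point_coords = torch.rand(7, 10, 2)   # 7 point groups -> batch mismatch

# the prompt encoder concatenates point and box embeddings along dim=1,
# so their batch dimensions (dim 0) must be equal
if point_coords.shape[0] != boxes.shape[0]:
    raise ValueError(
        f"{point_coords.shape[0]} point groups but {boxes.shape[0]} boxes"
    )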

@Jordan-Pierce
Author

@5663015 thanks for double checking; @HannaMao can you confirm whether what we're trying to do is possible given the current implementation of SAM?

Alternatively, we can always run it twice, once with points and once with boxes, and find a way to join the masks.

@akashAD98

@nikhilaravi @Jordan-Pierce I'm looking for exactly the same thing. Have you solved this issue? My object detector detects two objects, but my SAM model only produces a mask for one of them. How can we do this for multiple objects in a single image?

@VladMVLX

VladMVLX commented Apr 30, 2023

I guess at the moment you need to run the decoder on every object separately.
I am trying the same approach: the detector (YOLO) detects multiple objects, and then the coordinates of those boxes are passed to the decoder as:

point = index * 2
point_coords[0, point, 0] = rect.Left
point_coords[0, point, 1] = rect.Top
point_coords[0, point + 1, 0] = rect.Right
point_coords[0, point + 1, 1] = rect.Bottom

point_labels[0, point]     = 2
point_labels[0, point + 1] = 3

When it's only one object, everything works fine, but it returns a complete mess when the object detector finds more than one object.

Please tell me I'm wrong about this and am simply making mistakes when running inference with SAM.
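
For reference, here is the same idea as a rough Python sketch, assuming the ONNX-exported decoder, where a box is passed as its two corner points with labels 2 and 3 (box_to_onnx_points is just an illustrative helper, not part of SAM):

import numpy as np

def box_to_onnx_points(box_xyxy):
    """Encode one XYXY box as its two corners with the ONNX box-corner labels 2 and 3."""
    x1, y1, x2, y2 = box_xyxy
    coords = np.array([[[x1, y1], [x2, y2]]], dtype=np.float32)  # (1, 2, 2)
    labels = np.array([[2, 3]], dtype=np.float32)                # (1, 2)
    return coords, labels

# Packing several boxes' corners into one point list makes the decoder treat them
# as a single prompt, which would explain the mess; one decoder run per detected
# box avoids that.
coords, labels = box_to_onnx_points([10, 20, 110, 220])
print(coords.shape, labels.shape)  # (1, 2, 2) (1, 2)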

@kjtheron

kjtheron commented May 5, 2023

Has apply_points_torch() been removed? I am unable to find it. Using the code above gives me an AttributeError.

AttributeError: 'ResizeLongestSide' object has no attribute 'apply_points_torch'

What should be used instead to perform batch prompting using points?
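
Looking at the transform class, the closest method seems to be apply_coords_torch, so presumably that is what the snippet above intended. A sketch with made-up sizes (1024 is the default SAM input resolution):

import torch
from segment_anything.utils.transforms import ResizeLongestSide

transform = ResizeLongestSide(target_length=1024)
original_size = (768, 1024)             # (H, W) of the original image
points = torch.rand(3, 5, 2) * 500      # (B, N, 2) pixel coordinates

# apply_coords_torch rescales point coordinates into the resized image frame
transformed_points = transform.apply_coords_torch(points, original_size)
print(transformed_points.shape)         # torch.Size([3, 5, 2])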

@stharajkiran

stharajkiran commented May 15, 2023

Did you check the shape of your transformed_points and transformed_labels? transformed_points should be a BxNx2 tensor and transformed_labels should be a BxN tensor. N is the number of sampled points here.

From SAM github,
point_coords (torch.Tensor or None): BxNx2
point_labels (torch.Tensor or None): A BxN array of labels
boxes (np.ndarray or None): A Bx4 array given a box prompt to the model, in XYXY format.

Can I ask what B and N mean here in BxNx2 tensor size? Also, points and boxes share B as a common parameter. So, do I need the same number of points and boxes?

Thank you.

@JasseurHadded1

Did you check the shape of your transformed_points and transformed_labels? transformed_points should be a BxNx2 tensor and transformed_labels should be a BxN tensor. N is the number of sampled points here.

From SAM github, point_coords (torch.Tensor or None): BxNx2 point_labels (torch.Tensor or None): A BxN array of labels boxes (np.ndarray or None): A Bx4 array given a box prompt to the model, in XYXY format.

Can I ask what B and N mean here in BxNx2 tensor size? Also, points and boxes share B as a common parameter. So, do I need the same number of points and boxes?

Thank you.

The first question:
B: number of boxes
N: number of point samples for each box

The second question: no, but for each box you need a set of points (I think you need a fixed number of points per box, meaning you can't use 4 points for the first box and 5 points for the second box, but I'm not sure about that).
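
A rough end-to-end sketch of what that looks like with N = 1 point per box, using the box centers as a stand-in for whatever sampling you actually use (`predictor` and `image` are assumed to already exist as in the notebook, and the box values are made up):

import torch

# boxes in XYXY pixel coordinates, one row per object (B = 2 here)
boxes = torch.tensor([[ 50,  60, 200, 220],
                      [300, 100, 420, 260]],
                     dtype=torch.float, device=predictor.device)

# a fixed N points per box keeps the shapes rectangular; here N = 1 (the center)
centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                       (boxes[:, 1] + boxes[:, 3]) / 2], dim=-1)    # (B, 2)
point_coords = centers.unsqueeze(1)                                 # (B, 1, 2)
point_labels = torch.ones(point_coords.shape[:2],
                          dtype=torch.int, device=predictor.device) # (B, 1)

predictor.set_image(image)  # image: HxWx3 RGB numpy array

masks, scores, logits = predictor.predict_torch(
    point_coords=predictor.transform.apply_coords_torch(point_coords, image.shape[:2]),
    point_labels=point_labels,
    boxes=predictor.transform.apply_boxes_torch(boxes, image.shape[:2]),
    multimask_output=False,
)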

@HobbitArmy

That is correct, each box needs a sequence of point coords.

@ByungKwanLee

https://github.com/ByungKwanLee/Full-Segment-Anything addresses the critical issues of SAM: it supports batched input on the full-grid prompt (automatic mask generation) with post-processing (removing duplicated or small regions and holes), under flexible input image sizes.

@Jordan-Pierce
Author

Thanks @ByungKwanLee

@stharajkiran

stharajkiran commented Oct 28, 2023

Did you check the shape of your transformed_points and transformed_labels? transformed_points should be a BxNx2 tensor and transformed_labels should be a BxN tensor. N is the number of sampled points here.

From SAM github, point_coords (torch.Tensor or None): BxNx2 point_labels (torch.Tensor or None): A BxN array of labels boxes (np.ndarray or None): A Bx4 array given a box prompt to the model, in XYXY format.
Can I ask what B and N mean here in BxNx2 tensor size? Also, points and boxes share B as a common parameter. So, do I need the same number of points and boxes?
Thank you.

The first question: B: number of boxes N: number of point samples for each box

The second question: no, but for each box you need a set of points (I think you need a fixed number of points per box, meaning you can't use 4 points for the first box and 5 points for the second box, but I'm not sure about that).

(screenshot of the predict_torch setup code omitted)

Here, I have met all the size requirements, but I still get an error saying "too many indices for tensor of dimension 3" for the points. I have no idea how to figure out the issue. If I use only the boxes, it runs fine though!

Here, B = 4, N = 1

@heyoeyo

heyoeyo commented Oct 29, 2023

Here, I have met all the size requirements, but I still get an error saying "too many indices for tensor of dimension 3" for the points. I have no idea how to figure out the issue. If I use only the boxes, it runs fine though!

It's hard to say without seeing the error, but one possible issue is that the predict_torch(...) function is being given point_labels=input_label instead of transformed_labels. Otherwise, the input shapes seem fine (I can run this without errors using randomly generated data of the same shape, at least).
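
For reference, this is roughly the random-data check, written out as a sketch (`predictor` is assumed to be a SamPredictor with an image already set, and the coordinates are random, so this only checks shapes):

import torch

B, N = 4, 1  # same shapes as in the screenshot above

point_coords = torch.rand(B, N, 2, device=predictor.device) * 500
point_labels = torch.ones(B, N, dtype=torch.int, device=predictor.device)
boxes = torch.rand(B, 4, device=predictor.device) * 500

masks, _, _ = predictor.predict_torch(
    point_coords=point_coords,   # (B, N, 2)
    point_labels=point_labels,   # (B, N) -- pass this tensor, not a leftover 1-D input_label
    boxes=boxes,                 # (B, 4)
    multimask_output=False,
)
print(masks.shape)               # (B, 1, H, W)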
