Finetuning #5
Comments
Information on fine-tuning would be great.
This thread is referenced as the answer for similar questions, but I don't think there is an answer here for transfer learning?
Looking forward to fine-tuning.
I would love to be able to fine-tune the model for specific datasets as well.
Do we wait for Meta to provide a training/fine-tuning script? Or should the open-source hivemind write it?
Has anyone tried the idea of what might be called "point prompt engineering"? That is, training a separate model that learns how to place positive and negative prompt points, such that these points prompt SAM to select target objects from a custom dataset. Or we could just summarize strategies and best practices for placing positive/negative prompt points and prompt boxes, similar to how GPT/DALL·E users summarize the best ways to write prompts. I wonder if this could be one way to fine-tune the SAM model when only a limited amount of annotations is available. Happy to discuss more if anyone wants to work together and try it out.
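One cheap baseline for the point-placement idea above can be sketched in numpy (the function name and heuristic are my own, for illustration only): take a positive point at the ground-truth mask's centroid and sample negative points from the background, using the (x, y) coordinate and 1/0 label convention that SAM's predictor expects.

```python
import numpy as np

def points_from_mask(mask, n_neg=2, seed=0):
    """Heuristic prompt placement: one positive point at the mask centroid
    (falling back to a random foreground pixel for concave masks) and
    n_neg negative points sampled from the background. Returns (coords,
    labels) with coords as (x, y) pairs and labels 1=positive, 0=negative."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)
    cy, cx = int(round(ys.mean())), int(round(xs.mean()))
    if not mask[cy, cx]:                 # centroid fell outside a concave mask
        i = int(rng.integers(len(ys)))   # fall back to a random foreground pixel
        cy, cx = int(ys[i]), int(xs[i])
    bg_ys, bg_xs = np.nonzero(~mask)
    neg = rng.choice(len(bg_ys), size=n_neg, replace=False)
    coords = [[cx, cy]] + [[int(bg_xs[i]), int(bg_ys[i])] for i in neg]
    labels = [1] + [0] * n_neg
    return np.array(coords), np.array(labels)

# toy 8x8 mask with a 4x4 square object
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
coords, labels = points_from_mask(mask)
```

The resulting arrays could then be fed to a predictor's point-prompt arguments, or used as supervision for a model that learns where to place points.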
+1, looking forward to fine-tuning the SAM model on a custom dataset. :)
I am attempting some fine-tuning in this repo. Perhaps people can find use in it. The biggest thing I figured out is that you have to break up the
Could you please recommend the minimum hardware configuration for fine-tuning SAM? e.g., 4090 × 4?
I can get the smallest pre-trained model (
I have access to 4 × A100 w/ 80 GB if you want me to test something.
Hi @hu-po, thanks very much for sharing the fine-tuning code. Would it be possible for you to give guidance on how to prepare a customized dataset (e.g., data format and folder structure)?
Thank me when I get it to work 😭 this is more complicated than anticipated.
+1, interested in fine-tuning it for coral reef images.
+1, interested in fine-tuning it for crack detection on roads.
+1 🙌
+1, interested in fine-tuning!
+1, I'd like to do some vehicle detection on low-quality images!
+1, interested in fine-tuning the prompt encoder or mask decoder!
+1! I would be interested in fine-tuning the model for medical image analysis.
I'm curious whether it is possible to point out an unknown object that has not been learned (as in anomaly detection) via a text prompt, if I fine-tune with custom data.
+1!
@hu-po
Thanks for sharing. From the code in the link, do you mean that in order to further improve the fine-tuned SAM we need to retrain the model on the SAM dataset with an architecture close to the original SAM? Currently, is there any way to fine-tune the encoder in the original SAM?
The blog that @alex-encord shared works fine if you have one bounding box per image. But in reality you will have multiple bounding boxes in an image. The code below fails if you have multiple bounding boxes.
Am I missing something?
Can you explain how it fails, @nahidalam? What is the error or stack trace?
@nhhung1810 @alex-encord this is the error I get:
I get this error because I have 19 bounding boxes in my first image. Full stack trace:
Hey @nahidalam, check this line and this line; you will see that your … I suggest that you remove that …
Hi @nahidalam
Question: Can we fine-tune the decoder to infer without a prompt?
Possible solution (just thinking out loud): fine-tune without a prompt, or with a fixed static prompt, and hence infer without a prompt using the fine-tuned model?
Hence, are there any experiments where the prompt was set to some pre-fixed static value, or skipped completely, and the SAM decoder was fine-tuned to segment only the object of interest?
That is what I did: fine-tune without any prompt, because I do not have one at inference time.
@bertinma Thanks for your input. I was actually planning to try this out over the weekend. If possible, can you please share some insights on how you fine-tuned the decoder? As in:
Thanks in advance, and thanks again for sharing your initial input.
@yogendra-yatnalkar I do not give any prompt. All my prompt values (points/masks/bboxes) are None.
Yes, as you said, the SAM model takes N point prompts and converts them to shape N×256. Later, the encoder embedding and prompt embedding go through two cross-attention blocks in the decoder. So, since your model did not converge and fine-tuning did not throw an error either, guessing from my current shallow knowledge:
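For readers asking how decoder-only fine-tuning is typically wired up, here is a hedged sketch. TinySam is a stand-in module, not the real model; only the top-level submodule names (image_encoder / prompt_encoder / mask_decoder) mirror Meta's Sam class, and the layers inside are made up. The idea is simply to toggle requires_grad by parameter-name prefix and hand only the trainable parameters to the optimizer.

```python
import torch
from torch import nn

# Stand-in with the same top-level submodule names as Meta's Sam class;
# the actual layers here are placeholders for illustration.
class TinySam(nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = nn.Linear(8, 8)
        self.prompt_encoder = nn.Linear(8, 8)
        self.mask_decoder = nn.Linear(8, 1)

sam = TinySam()

# Freeze everything except the mask decoder.
for name, param in sam.named_parameters():
    param.requires_grad = name.startswith("mask_decoder")

trainable = [n for n, p in sam.named_parameters() if p.requires_grad]

# Only the decoder's parameters reach the optimizer.
optimizer = torch.optim.Adam(
    (p for p in sam.parameters() if p.requires_grad), lr=1e-4
)
```

The same prefix filter works on the real model because Meta's Sam registers its submodules under exactly these names.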
Thanks again for your insights; this will surely help me.
@yogendra-yatnalkar I plan to do something similar. Could you share your results with me? Did you succeed? If you did, may I ask how you implemented it? Thank you very much.
Hi @TKPhuong, yes, I tried it on a very small sample and the results are very promising. I tried it on some data which I cannot share, but I will replicate it on some Kaggle or open-source dataset over the weekend and share it here.
Hi @yogendra-yatnalkar, interested to hear if you ended up replicating it with an open-source dataset? Thanks.
Hi @TKPhuong @bertinma @AdrienneBerghAMAT, sorry for the late response. I had meant to open-source this notebook along with a small blog post, but I did not find time for that, hence sharing the Kaggle notebook right away:
How does SAM work (high-level):
What I tried with the model:
Training:
Results:
Future:
I will convert this long text into a blog post soon. Please let me know your thoughts on this experiment as well.
FYI, here is one tutorial for fine-tuning SAM on your data: https://auto.gluon.ai/stable/tutorials/multimodal/image_segmentation/beginner_semantic_seg.html
This is very useful, thanks @yogendra-yatnalkar
Hi @wm-Githuber, @nahidalam, did you find an answer for this? Can one fine-tune with a mask containing multiple objects and multiple boxes, or how should one do this? I created a training entry for each mask with one box and it seems to work fine, but I wonder whether using multiple boxes/objects is possible, since it would make training faster. So far, nobody seems to have done this.
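The one-box-per-entry workaround described above can be sketched in plain Python (the function name and dict fields are made up for illustration): flatten each (image, boxes, masks) sample into one training entry per box, so the decoder always receives exactly one box prompt and the multi-box shape mismatch never arises.

```python
def flatten_examples(samples):
    """Turn each sample with N boxes/masks into N single-box entries,
    so every training example carries exactly one box prompt."""
    entries = []
    for s in samples:
        for box, mask in zip(s["boxes"], s["masks"]):
            entries.append({"image": s["image"], "box": box, "mask": mask})
    return entries

# two images, one with two annotated objects and one with a single object
samples = [
    {"image": "img0.png", "boxes": [[0, 0, 4, 4], [5, 5, 8, 8]], "masks": ["m0", "m1"]},
    {"image": "img1.png", "boxes": [[1, 1, 3, 3]], "masks": ["m2"]},
]
entries = flatten_examples(samples)  # 3 entries: 2 from img0, 1 from img1
```

The trade-off is that the image embedding is recomputed once per box instead of once per image, which is what makes true multi-box batching attractive for speed.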
Hi readers, just 2 small updates from my previous message.
The Kaggle NB: https://www.kaggle.com/code/yogendrayatnalkar/promptless-taskspecific-finetuning-of-metaai-sam
@yogendra-yatnalkar Thanks for the nice work. Can you specify the GPU memory usage of fine-tuning either SAM's original mask decoder or your custom decoder without prompts? I will be fine-tuning only the SAM mask decoder on my few-shot medical images (at most 50). But I have only 8 GB of GPU memory allocated by my university, so I am unsure whether 8 GB would be enough to fine-tune only the mask decoder with a few medical images. Thanks!
Hi, I have fine-tuned a SAM model on a custom dataset; however, I want to use the AutomaticMaskGenerator. When loading my fine-tuned model I get the following error: AttributeError: 'SamModel' object has no attribute 'image_encoder'. Does anyone know why this is, and are there any solutions?
@williamhoole I do not know the answer, but I encounter the same error.
Where do you import the MaskDecoder from? Because I get this error when using
I am currently out of office, but I figured out what caused my error. I fine-tuned my model using the Segment Anything model from Hugging Face. The issue is that the Hugging Face model's mask decoder is named differently, so when fine-tuning with Hugging Face's SAM model and weights you cannot use Meta's SAM automatic mask generator. I resolved my issue by using Hugging Face's automatic mask generation pipeline. At the beginning of next week I am back in office and can take a closer look at your issue. Hopefully this helps you for now.
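The naming mismatch behind that AttributeError can be illustrated with stand-in classes (these are not the real models; only the attribute names are taken from the two libraries — Meta's segment_anything Sam exposes the backbone as image_encoder, while Hugging Face's SamModel exposes it as vision_encoder):

```python
# Meta's SamAutomaticMaskGenerator accesses `model.image_encoder`, which
# exists on Meta's Sam but not on Hugging Face's SamModel (whose backbone
# is named `vision_encoder`) — hence the AttributeError reported above.
class MetaSamLike:
    image_encoder = "backbone"

class HFSamLike:
    vision_encoder = "backbone"

def automatic_generator_setup(model):
    # mimics the attribute access that fails in the report above
    return model.image_encoder

automatic_generator_setup(MetaSamLike())  # works: attribute exists

try:
    automatic_generator_setup(HFSamLike())
except AttributeError as err:
    failure = str(err)  # mentions the missing 'image_encoder' attribute
```

As the comment above notes, the practical fix when staying in the Hugging Face ecosystem is to use its mask-generation pipeline rather than Meta's generator class.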
I squeezed the mask variable to have a shape of (# of batches, 1, size of mask, size of mask) rather than (# of batches, 1, 1, size of mask, size of mask).
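A small numpy sketch of that shape fix (the sizes are illustrative): squeeze only the extra singleton axis, so that a batch dimension that happens to equal 1 is never collapsed by accident.

```python
import numpy as np

masks = np.zeros((2, 1, 1, 256, 256))  # (batches, 1, 1, H, W)
masks = masks.squeeze(axis=2)          # drop only the spurious axis
# A plain .squeeze() with no axis argument would also collapse a batch
# dimension of 1, so naming the axis explicitly is the safer choice.
```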
Are there any plans to release scripts for fine-tuning the model?
Also, you did such great work! Thank you very much!