For OWL-ViT, is there a demo showing how to use an image patch as a query for one-shot detection? #325
Comments
Hi, we're actively working on this demo and will let you know when it's available, hopefully some time next week.
@mjlm And what prompts are used in the COCO evaluation? The paper says it uses the seven best prompts, so what are those seven text prompts? Thanks.
The prompts can be found in the CLIP repository. During inference we used the 7 ensembling prompts from the Colab.
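The prompt-ensembling idea can be sketched as follows: embed each templated prompt, L2-normalize, average the embeddings, and renormalize to get one query vector per class. The encoder stub and template strings below are illustrative assumptions, not the actual CLIP/OWL-ViT code or the real seven prompts.

```python
import hashlib
import numpy as np

# Stand-in for a text encoder: deterministic pseudo-random vectors keyed on the
# prompt string. In practice this would be the real CLIP text tower.
def encode_text(prompt, dim=8):
    seed = int(hashlib.md5(prompt.encode()).hexdigest()[:8], 16)
    return np.random.default_rng(seed).normal(size=dim)

def ensemble_query(class_name, templates):
    """Embed each templated prompt, L2-normalize, average, renormalize."""
    embs = [encode_text(t.format(class_name)) for t in templates]
    embs = [e / np.linalg.norm(e) for e in embs]
    mean = np.mean(embs, axis=0)
    return mean / np.linalg.norm(mean)

# Illustrative templates only; the actual ensembling prompts are in the
# CLIP repository and the OWL-ViT Colab.
templates = ["a photo of a {}.", "a photo of the {}.", "a close-up photo of a {}."]
q = ensemble_query("cat", templates)
print(q.shape)  # (8,)
```

Averaging after normalization keeps each prompt's contribution equally weighted regardless of its raw embedding magnitude.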
Is this still in the works? I've been interested in seeing how image input queries could be used as well.
Hi, is the one-shot detection demo finished? I'm also very interested and would like to try it.
We're still working on this and will let you know here when the demo is ready. I re-opened the issue to keep track.
That would be very nice, thank you!
We just added a Playground Colab with an interactive demo of both text-conditioned and image-conditioned detection. The underlying code illustrates how to extract an embedding for a given image patch, specifically here: https://github.com/google-research/scenic/blob/main/scenic/projects/owl_vit/notebooks/inference.py#L110-L131 Let us know if you have any questions!
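The gist of extracting a query embedding for an image patch can be sketched like this: run the query image through the detector, find the predicted box that best overlaps the user-drawn patch (here by IoU), and take that box's class embedding as the query. This is a simplified numpy sketch under that assumption; the function and variable names are illustrative, not the actual Scenic API.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one [x0, y0, x1, y1] box against an (N, 4) array of boxes."""
    x0 = np.maximum(box[0], boxes[:, 0])
    y0 = np.maximum(box[1], boxes[:, 1])
    x1 = np.minimum(box[2], boxes[:, 2])
    y1 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x1 - x0, 0, None) * np.clip(y1 - y0, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def embed_image_query(query_box, pred_boxes, pred_embeddings):
    """Use the embedding of the predicted box that best matches the drawn patch."""
    best = int(np.argmax(iou(np.asarray(query_box, float), pred_boxes)))
    e = pred_embeddings[best]
    return e / np.linalg.norm(e)

# Toy example: two predicted boxes with made-up embeddings.
pred_boxes = np.array([[0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 3.0, 3.0]])
pred_embeddings = np.random.default_rng(0).normal(size=(2, 8))
query = embed_image_query([1.9, 1.9, 3.1, 3.1], pred_boxes, pred_embeddings)
print(query.shape)  # (8,)
```

The resulting vector can then be scored against the target image's per-box embeddings exactly like a text query.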
Thanks for your reply! I don't have any problems now.
Hi @mjlm, thanks for your great work! I wonder if there are any plans to implement multi-query image-conditioned detection. A single query image is often unable to capture all the features of an object, and using multiple query images to represent it can yield better results. Thanks again!
You can simply average the embeddings of multiple boxes to get a query embedding. This is how we implemented few-shot (i.e. more than one-shot) detection in the paper. #890 will add example code for image-conditioned detection to the colab. The example shows how to get a …
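The averaging step described above can be sketched in a few lines: normalize each query-box embedding, take the mean, and renormalize. This is a minimal numpy sketch; the function name and the random placeholder embeddings are illustrative.

```python
import numpy as np

def fewshot_query(embeddings):
    """Average several query-box embeddings into one few-shot query vector."""
    embs = np.asarray(embeddings, float)
    # L2-normalize each embedding so every example contributes equally.
    embs = embs / np.linalg.norm(embs, axis=-1, keepdims=True)
    mean = embs.mean(axis=0)
    return mean / np.linalg.norm(mean)

# e.g. embeddings extracted from 3 query patches of the same object.
examples = np.random.default_rng(1).normal(size=(3, 8))
q = fewshot_query(examples)
print(q.shape)  # (8,)
```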
Hi, thanks for your great work, and the text-based zero-shot demo is amazing.
For OWL-ViT, is there a demo showing how to use an image patch as a query for one-shot detection?
Thanks.