Add Grounding DINO #2114

Open
innat opened this issue Oct 23, 2023 · 3 comments

@innat
Contributor

innat commented Oct 23, 2023

Short Description

Zero-shot object detection model.

Papers

https://arxiv.org/abs/2303.05499

Existing Implementations

https://github.com/IDEA-Research/GroundingDINO

Other Information

  • Combined with (a) Stable Diffusion or (b) Segment Anything, etc., the range of possible applications is huge.
  • Prerequisites:
    • image backbone: Swin Transformer
    • text backbone: BERT
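
To make the role of the two backbones concrete, here is a rough sketch of the open-vocabulary scoring idea: features for candidate boxes (from the image side) are compared against features for prompt tokens (from the text side), and only matches above a threshold are kept. Everything here is an assumption for illustration: the `score_queries` helper, the toy feature values, and the `box_threshold` value are hypothetical, and the real model fuses both modalities through additional layers and also predicts box coordinates, which this sketch omits.

```python
import math


def sigmoid(x):
    """Map a raw similarity score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def score_queries(query_feats, token_feats, tokens, box_threshold=0.35):
    """Match each image query to its best text token (hypothetical helper).

    Returns a list of (query_index, token, score) for every query whose
    best score reaches box_threshold.
    """
    detections = []
    for q_idx, q in enumerate(query_feats):
        # Dot-product similarity between one image query and every text token.
        scores = [sigmoid(sum(a * b for a, b in zip(q, t))) for t in token_feats]
        best = max(range(len(tokens)), key=lambda i: scores[i])
        if scores[best] >= box_threshold:
            detections.append((q_idx, tokens[best], scores[best]))
    return detections


# Toy values standing in for backbone outputs (not real model features).
tokens = ["cat", "dog"]
token_feats = [[1.0, 0.0], [0.0, 1.0]]
query_feats = [[3.0, -3.0], [-3.0, 2.0], [-4.0, -4.0]]

detections = score_queries(query_feats, token_feats, tokens)
for q_idx, token, score in detections:
    print(f"query {q_idx} -> {token} ({score:.2f})")
```

With these toy values the first query matches "cat", the second matches "dog", and the third falls below the threshold and is dropped. Because the labels come from the text prompt rather than a fixed class list, changing the prompt changes what can be detected.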
@innat
Contributor Author

innat commented Jan 15, 2024

TODO
Components

@tirthasheshpatel
Contributor

tirthasheshpatel commented Feb 1, 2024

@innat Are you planning/volunteering to work on this or any of the components?

I see you proposed #2319, which seems to replicate the Swin Transformer from the Grounding DINO implementation. Thanks for the PR!

This is next on my TODO list. Let me know if you have time and want to take something up; I can help review and test! I can take the rest of the components and the weights transfer. BTW, the list of components with references is very useful, thanks!

@innat
Contributor Author

innat commented Feb 2, 2024

@tirthasheshpatel
#2319 is about Video Swin modelling, and I think Grounding DINO (g-dino) needs the image Swin model, so this issue needs to be addressed first as a prerequisite of the current issue. Here is one reimplementation of the image Swin model in Keras 2.

The g-dino components listed above are only the high-level ones. Like DETR, it also has custom CUDA operations, which might make it complicated to add. The other components can be added one by one to start with. If you are currently working on it, please continue. If I can find some time, I will contribute the rest of the components. This kind of model (zero-shot detection) is quite useful and will surely add value to keras-cv.
