3D object detection #369

Open
hoangsep opened this issue Jan 28, 2024 · 5 comments
@hoangsep

What do you guys think about using multiple cameras with DINOv2 for 3D object detection in robotics? Does it make sense?

@ccharest93

The model takes one image as input. You can process your multiple images sequentially, but then they wouldn't share any information. There are probably better models out there for that, but it could still be interesting to try.
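A minimal sketch of the sequential option, with random arrays standing in for real camera frames and a stub `encode` standing in for the single-image backbone (both hypothetical): each view is encoded independently, so no information flows between cameras.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image):
    """Stub for a single-image backbone (e.g. DINOv2 would go here):
    it just squashes pixels into a fake 384-dim embedding for illustration."""
    return np.resize(image.mean(axis=-1), 384)

# Four camera views, processed one at a time; each embedding is computed
# without ever seeing the other views.
views = [rng.random((224, 224, 3)) for _ in range(4)]
embeddings = np.stack([encode(v) for v in views])

print(embeddings.shape)  # (4, 384)
```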

@dingkwang

That's certainly possible. @hoangsep We can work on this together.

@hoangsep
Author

@ccharest93 are you aware of any better model for this task? I am a total noob so I am not sure how this can be done. I wonder how companies like Tesla do 3D object detection.

I am thinking of something like stitching multiple camera images together (maybe side by side) and running them through the network? Or having multiple networks running in parallel, then taking all the outputs (from one of the top layers) and passing them through a second network.
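The second idea (parallel encoders feeding a second network, i.e. late fusion) can be sketched roughly like this. All shapes are illustrative assumptions (4 cameras, 256 patches per image, 384-dim tokens, the ViT-S size), random arrays stand in for real backbone outputs, and the "second network" is just a single linear layer for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cams, n_patches, dim = 4, 256, 384

# Stand-in for per-camera backbone outputs: (n_cams, n_patches, dim).
patch_tokens = rng.standard_normal((n_cams, n_patches, dim))

# Late fusion: flatten all views into one token sequence, then apply a
# shared layer so every patch can be mixed with every other downstream.
fused_input = patch_tokens.reshape(1, n_cams * n_patches, dim)  # (1, 1024, 384)
w = rng.standard_normal((dim, dim)) * 0.02
fused = fused_input @ w                                          # (1, 1024, 384)

print(fused.shape)  # (1, 1024, 384)
```

In a real system the linear layer would be replaced by something with cross-token interaction (e.g. attention), since a per-token linear map alone still doesn't let the views exchange information.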

@hoangsep
Author

@dingkwang I would love to. I am a total noob so I probably won't be able to do much, but I would love to explore this.

@ccharest93

ccharest93 commented Jan 31, 2024

I haven't looked at 3D models; you would probably need something more than stitching. Models are great at learning, but you want to give them as much prior information as possible. Stitching two images together kinda defeats that purpose, since the model would have to learn to unstitch them first (not to mention the poor scaling as the image count increases; transformer networks don't scale linearly with input size). I do like the idea of first passing each image through a normal model like DINO and then doing something with the resulting patch embeddings to create information channels between similar patches. As for the exact architecture, that's something you'd have to figure out yourself. A good starting point would be setting up this model in inference mode, passing your image sets through it, and then doing statistical analysis on the resulting patch embeddings.
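One simple form that statistical analysis could take is cross-view cosine similarity between patch embeddings: for each patch in one view, find the most similar patch in another view. Random arrays stand in for the real embeddings here, and the shapes (256 patches, 384 dims) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_patches, dim = 256, 384

emb_a = rng.standard_normal((n_patches, dim))  # view A patch embeddings
emb_b = rng.standard_normal((n_patches, dim))  # view B patch embeddings

# L2-normalize each patch vector; a matrix product then gives all
# pairwise cosine similarities between the two views.
a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
sim = a @ b.T  # (256, 256), values in [-1, 1]

# For each patch in view A, the index of the most similar patch in view B.
best_match = sim.argmax(axis=1)
print(sim.shape, best_match.shape)  # (256, 256) (256,)
```

With real embeddings of the same scene, histograms of these similarities would show whether corresponding patches actually cluster across views, which is what an information channel between them would need to exploit.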
