-
Notifications
You must be signed in to change notification settings - Fork 682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
3D object detection #369
Comments
the model takes one image as an input. You can process your multiple image sequentially, but then they wouldn't share any information. There is probably better models out there for that, but it could still be interesting to try |
That's certainly possible. @hoangsep We can work on this together. |
@ccharest93 are you aware of any better model for this task? I am a total noob so I am not sure how this can be done. I wonder how companies like Tesla do 3D object detection. I am thinking of something like stitching multiple camera image together (maybe side by side) and run them through the network? Or have multiple networks running in parallel, then take all the output (from 1 of the top layers) and pass them though a second network. |
@dingkwang I would love to. I am a total noob so I probably won't be able to do much, but I would love to explore this. |
I haven't looked at 3D models, you would probably need something more than stitching. Models are great at learning but you want to give them as much prior information as possible. Stitching two images together kinda defeats that purpose, since the model would have to learn to unstitch them first (not to mention the poor scaling as image number increases; transformer networks dont scale linearly with input size). I do like the idea of first passing each image through a normal model like dino and then doing something with the resulting patch embeddings so to create information channels between similar patches. As for the exact architecture, thats something youd have to figure out yourself. I good starting point would be setting up this model in inference mode, passing your image sets through it and then doing statistical analysis on the resulting patch embeddings |
What do you guys think about using multiple cameras with dinov2 for 3D object detection for robotics? Does it make sense?
The text was updated successfully, but these errors were encountered: