Possibility of reducing the acquisition of embeddings of objects. #1430
This is a valid point @ozayr. It highly depends on the deployment environment: whether shadows are cast, whether the appearance from one angle looks different from another (let's say somebody carries a backpack), and how objects move within the environment (they may move away from the camera), which changes the amount of detail the camera can pick up at its resolution. Let's take a specific example. If somebody enters the field of view of the camera where a heavy shadow is cast, only half a body may be visible; let's also assume that this person carries a backpack. After walking a bit, this person, now with full body visible, may turn around to face the camera. The first captured embedding would then not be representative of this person at all.
I'm thinking, once an object, say the person with the backpack, has been assigned an ID...
Dropping some ideas here:

1. For every n frames, rely entirely on motion-based tracking (Kalman filter, optical flow, or another predictive model) to estimate the object's location; after n frames, generate a new embedding to revalidate the object's identity. This can significantly reduce the computational load, since embedding generation is usually the most resource-intensive part of tracking. The risk is that if the object's appearance changes significantly within those n frames (due to occlusion, lighting changes, or orientation changes), the tracker may lose accuracy.
2. Introduce a lightweight neural network to perform quick re-identification checks. This network can be less complex than the main embedding generator but sufficient to catch obvious mismatches.
3. Maintain a buffer of recent embeddings and motion vectors, and use this history to check that the object's identity remains consistent over time, reducing the need for constant re-embedding.

Each of these approaches comes with trade-offs in complexity, computational savings, and potential loss of accuracy. It's important to benchmark them in the specific deployment environment to understand their impact fully; experimenting with a combination of these strategies might yield the best balance of efficiency and accuracy.
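Ideas 1 and 3 above could be combined in a track object that carries its own re-embedding schedule and a small embedding buffer. This is only a minimal sketch: the class name, the constant-velocity motion model (a stand-in for a full Kalman filter), and the parameter choices are all assumptions, not code from this repository.

```python
# Hypothetical sketch: re-embed only every n frames, rely on a
# constant-velocity motion model in between, and keep a small
# buffer of recent embeddings for identity checks.
from collections import deque


class IntervalTrack:
    def __init__(self, track_id, box, embedding, n=10, buffer_size=5):
        self.track_id = track_id
        self.box = box                      # (cx, cy, w, h)
        self.velocity = (0.0, 0.0)          # constant-velocity model
        self.n = n                          # re-embed every n frames
        self.age = 0
        self.embeddings = deque(maxlen=buffer_size)
        self.embeddings.append(embedding)

    def predict(self):
        """Motion-only update used on the cheap frames."""
        cx, cy, w, h = self.box
        vx, vy = self.velocity
        self.box = (cx + vx, cy + vy, w, h)
        self.age += 1
        return self.box

    def needs_embedding(self):
        """True on every n-th frame: time to revalidate identity."""
        return self.age % self.n == 0

    def update(self, box, embedding=None):
        """Measurement update; an embedding is only passed every n frames."""
        old_cx, old_cy = self.box[0], self.box[1]
        self.velocity = (box[0] - old_cx, box[1] - old_cy)
        self.box = box
        if embedding is not None:
            self.embeddings.append(embedding)
```

A real implementation would replace the velocity update with a proper Kalman state, but the control flow (predict cheaply, embed rarely, buffer history) is the point of the sketch.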
I have also seen that Nvidia have released SV3DT. This is something I have been thinking about for a while, since occlusions are my worst enemy. If one estimates the camera parameters the way UCMC track does, and then tracks based on projections onto the ground plane, this should also help with tracker accuracy, assuming all objects one would like to track are confined to the same ground plane, which most of the time is the case.
Yes, it would be interesting to provide the option to feed in a camera configuration file in order to convert the 2D objects to 3D. Then it would be possible to do motion tracking on the ground plane, which according to UCMC is more reliable.
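The 2D-to-ground-plane conversion discussed above boils down to applying a planar homography to the point where an object touches the ground. The sketch below assumes a 3x3 image-to-ground homography `H` is available (e.g. derived from the proposed camera configuration file); the function names are illustrative and not part of this repository's API.

```python
# Hypothetical sketch: project the bottom-center of a 2D bounding box
# onto the ground plane with a 3x3 homography H (image -> ground).


def image_to_ground(point, H):
    """Apply a 3x3 homography (list of rows) to an image point (u, v)."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # perspective divide


def box_foot_point(box):
    """Bottom-center of an (x1, y1, x2, y2) box: where the object
    touches the ground, assuming it stands on the ground plane."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)
```

Tracking the projected foot points instead of image-space boxes is what makes the ground-plane motion model independent of perspective distortion.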
Question
Just a thought: I wonder if it's possible to avoid generating embeddings for objects on every frame when we have some kind of certainty that an object is the same one tracked from the previous frame. Not sure if what I'm saying makes sense, but this could significantly increase the speed of the tracking part of the pipeline.
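One concrete way to read this: skip the embedding step whenever the motion-predicted box and the new detection overlap so strongly that the identity is not in doubt. A minimal sketch, where the IoU gate and the threshold value are assumptions, not project code:

```python
# Hypothetical sketch: only compute a new embedding when the match
# between prediction and detection is ambiguous (low IoU).


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def should_embed(predicted_box, detected_box, iou_threshold=0.8):
    """Only pay for a new embedding when the overlap is below the gate."""
    return iou(predicted_box, detected_box) < iou_threshold
```

The threshold would need tuning per deployment: too low and identity switches slip through, too high and the gate never saves any embedding calls.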