
Possibility of reducing the acquisition of embeddings of objects. #1430

Open
ozayr opened this issue May 12, 2024 · 6 comments
Labels
question Further information is requested

Comments


ozayr commented May 12, 2024

Search before asking

  • I have searched the Yolo Tracking issues and found no similar bug report.

Question

Just a thought: I wonder if it's possible to avoid generating embeddings for objects on every frame when we already have some degree of certainty that an object is the same one tracked in the previous frame. I'm not sure if what I'm saying makes sense, but this could significantly increase the speed of the tracking part of the pipeline.

@ozayr ozayr added the question Further information is requested label May 12, 2024

mikel-brostrom commented May 12, 2024

This is a valid point @ozayr. It highly depends on the deployment environment: whether shadows are cast in the scene, the fact that appearance from one angle may look different from another (let's say somebody carries a backpack), and how objects move within the environment (they may move away from the camera, changing the amount of detail the camera can capture due to resolution)...

Let's take a specific example. If somebody enters the field of view of the camera where a heavy shadow is cast, only half of that person's body may be visible; let's also assume this person carries a backpack. After walking a bit, the person, now fully visible, may turn around to face the camera. The first captured embedding would then not be representative of this person at all.


ozayr commented May 18, 2024

I'm thinking that once an object, say the person with the backpack, has been assigned an ID:

  • Could there be a faster way to reaffirm that ID (e.g. a smaller network)?
  • Is it possible to make an assumption and skip generating embeddings every n frames, i.e. rely
    entirely on motion for every n-th frame?


👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

@github-actions github-actions bot added the Stale label May 29, 2024

mikel-brostrom commented May 29, 2024

Dropping some ideas here:

  • Periodic Embeddings:

For n frames at a time, we rely entirely on motion-based tracking (such as a Kalman filter, optical flow, or other predictive models) to estimate the object's location. After n frames, we generate a new embedding to revalidate the object's identity. This approach can significantly reduce the computational load, as embedding generation is usually the most resource-intensive part of the tracking pipeline. The risk here is that if the object's appearance changes significantly within those n frames (due to occlusion, lighting changes, or orientation changes), the tracker might lose accuracy.
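A minimal sketch of the idea, assuming a per-track counter and a heavy ReID model behind a placeholder `extract_embedding` function (the names, the constant-velocity stand-in for the Kalman filter, and the `CALLS` counter are all hypothetical, not the actual boxmot API):

```python
import numpy as np

EMBED_EVERY_N = 5            # re-embed interval; tune per deployment
CALLS = {"embed": 0}         # counts expensive model invocations (for illustration)

def extract_embedding(crop):
    """Placeholder for the heavy ReID forward pass (e.g. an OSNet model)."""
    CALLS["embed"] += 1
    return crop.mean(axis=(0, 1))  # stand-in feature vector

class PeriodicEmbedTrack:
    """Track that re-embeds only every EMBED_EVERY_N frames and relies on
    motion in between (a crude constant-velocity estimate stands in for
    a real Kalman filter here)."""

    def __init__(self, box, crop):
        self.box = np.asarray(box, dtype=float)   # [x, y, w, h]
        self.velocity = np.zeros(4)
        self.embedding = extract_embedding(crop)  # embed once at track birth
        self.frames_since_embed = 0

    def update(self, box, crop):
        box = np.asarray(box, dtype=float)
        self.velocity = box - self.box            # motion-only update, cheap
        self.box = box
        self.frames_since_embed += 1
        if self.frames_since_embed >= EMBED_EVERY_N:
            # periodic revalidation: pay the embedding cost only here
            self.embedding = extract_embedding(crop)
            self.frames_since_embed = 0
```

With `EMBED_EVERY_N = 5`, ten updates trigger the heavy model only twice after the initial embedding, which is where the speedup comes from.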

  • Lightweight Reaffirmation Network:

Introduce a lightweight neural network to perform quick re-identification checks. This network can be less complex than the main embedding generator but sufficient to catch obvious mismatches.
Fallback mechanism: if the lightweight model signals a potential mismatch, the system can fall back to generating a full embedding for a more thorough check.
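A sketch of the cheap-check-then-fallback flow, where a coarse color histogram stands in for the lightweight network and a flattened crop stands in for the full ReID embedding (both stand-ins and the thresholds are illustrative assumptions):

```python
import numpy as np

def cheap_descriptor(crop):
    """Lightweight stand-in for a small reaffirmation network:
    an 8-bin intensity histogram, normalized."""
    hist, _ = np.histogram(crop, bins=8, range=(0, 256))
    return hist / max(hist.sum(), 1)

def heavy_embedding(crop):
    """Placeholder for the full ReID network forward pass."""
    return crop.reshape(-1).astype(float)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def reaffirm(track_desc, track_emb, crop, cheap_thresh=0.9, full_thresh=0.7):
    """Return True if the crop still matches the track.
    Cheap check first; only on a potential mismatch do we fall back
    to the expensive full embedding."""
    if cosine(track_desc, cheap_descriptor(crop)) >= cheap_thresh:
        return True  # cheap check passed, heavy model skipped
    return cosine(track_emb, heavy_embedding(crop)) >= full_thresh
```

The savings come from the first branch firing on most frames; the fallback bounds the damage when the cheap descriptor is fooled.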

  • Temporal Consistency Checks:

Maintain a buffer of recent embeddings and motion vectors. Use these historical data points to ensure that the object’s identity remains consistent over time, reducing the need for constant re-embedding.
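One way such a buffer could look, assuming cosine similarity against the running mean as the consistency criterion (the class name and drift threshold are made up for illustration):

```python
from collections import deque
import numpy as np

class EmbeddingBuffer:
    """Keep the last `maxlen` embeddings for a track and flag identity
    drift when a new embedding strays too far from the buffered mean."""

    def __init__(self, maxlen=10, drift_thresh=0.6):
        self.buf = deque(maxlen=maxlen)
        self.drift_thresh = drift_thresh

    def is_consistent(self, emb):
        """True if `emb` agrees with the track's recent history,
        so a fresh re-embedding can be skipped."""
        if not self.buf:
            return True  # no history yet; accept
        mean = np.mean(self.buf, axis=0)
        sim = float(emb @ mean /
                    (np.linalg.norm(emb) * np.linalg.norm(mean) + 1e-12))
        return sim >= self.drift_thresh

    def push(self, emb):
        self.buf.append(np.asarray(emb, dtype=float))
```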

Each of these approaches comes with trade-offs in terms of complexity, computational savings, and potential loss of accuracy. It’s important to benchmark these methods in the specific deployment environment to understand their impact fully. Experimenting with a combination of these strategies might yield the best results in balancing efficiency and accuracy.


ozayr commented May 29, 2024

I have also seen that Nvidia has dropped SV3DT. This is something I have been thinking about for a while; occlusions are my worst enemy.

If one estimates the camera parameters the way UCMC track does and then tracks based on projections to the ground plane, this should also help with tracker accuracy, assuming all objects one would like to track are confined to the same ground plane, which most of the time is the case.


mikel-brostrom commented May 29, 2024

Yes, it would be interesting to provide the option to feed a camera configuration file in order to convert the 2D object to 3D. Then it would be possible to do motion tracking on the ground plane, which according to UCMC is more reliable.
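For a ground plane (Z = 0), that conversion reduces to a plane-to-image homography built from the camera intrinsics K and extrinsics R, t. A minimal sketch, assuming the camera configuration file would supply calibrated K, R, and t (the function names are hypothetical):

```python
import numpy as np

def ground_homography(K, R, t):
    """Homography mapping ground-plane points (X, Y, Z=0) into the image:
    x_img ~ K [r1 r2 t] [X Y 1]^T, with r1, r2 the first two columns of R."""
    return K @ np.column_stack((R[:, 0], R[:, 1], t))

def image_to_ground(H, u, v):
    """Back-project a pixel (e.g. a bounding-box foot point) onto the
    ground plane by inverting the homography and dehomogenizing."""
    p = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return p[:2] / p[2]
```

Motion tracking would then run on the `image_to_ground` coordinates, where object motion is closer to linear than in pixel space.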
