
Cosine similarity between CLIP-Reid features of this repo and original repo #1288

Closed · sourabh-patil opened this issue Feb 6, 2024 · 2 comments
Labels: question (Further information is requested)

Comments

@sourabh-patil

Search before asking

  • I have searched the Yolo Tracking issues and found no similar bug report.

Question

Hi! Thanks for sharing this awesome work. I want to improve the features of the CLIP-ReID model. The reason is that it is trained on the Market1501 dataset, which has similar lighting conditions across different instances of the same person. When we try to re-identify a person under different lighting conditions (the camera setup is in a different area), it sometimes fails. So I wanted to fine-tune the CLIP model on our dataset. Before doing so, as a sanity check, I compared features from the CLIP model used in your repo with features from the original CLIP repo. There was a huge difference in terms of cosine similarity (calculated between different IDs): the features from this repo were much better (a considerable gap between same-ID and different-ID similarities) compared to the features from the original repo (the gap was very small). So my question is: did you retrain or fine-tune the original CLIP model, or use it directly as is? (I assumed you used it as is.) Also, do you have any suggestions or comments on the real-world problem we face (different lighting conditions for the same person ID) when doing ReID on people?
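For reference, the sanity check was roughly of the following form (a minimal sketch; `feats` is assumed to map person IDs to lists of feature vectors produced by whichever model is being tested, and `cosine`/`similarity_gap` are illustrative helper names, not functions from either repo):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def similarity_gap(feats: dict) -> float:
    """Mean same-ID cosine similarity minus mean different-ID similarity.

    feats: dict mapping person ID -> list of feature vectors extracted
    from that person's crops by the ReID model under test.
    """
    same, diff = [], []
    ids = list(feats)
    for pid in ids:
        vecs = feats[pid]
        # pairs of crops belonging to the same identity
        for i in range(len(vecs)):
            for j in range(i + 1, len(vecs)):
                same.append(cosine(vecs[i], vecs[j]))
        # pairs of crops belonging to different identities
        for other in ids:
            if other == pid:
                continue
            for v in vecs:
                for w in feats[other]:
                    diff.append(cosine(v, w))
    # a larger gap means the model separates identities better
    return float(np.mean(same) - np.mean(diff))
```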

sourabh-patil added the question label on Feb 6, 2024
@mikel-brostrom (Owner) commented Feb 6, 2024

So my question is: did you retrain or fine-tune the original CLIP model, or use it directly as is?

I used it as is.

There was a huge difference in terms of cosine similarity (calculated between different IDs): the features from this repo were much better (a considerable gap between same-ID and different-ID similarities) compared to the features from the original repo (the gap was very small)

The only thing that I can think of is that the features are always normalized when inferring with the ReID models in this repo, as seen here:

https://github.com/mikel-brostrom/yolo_tracking/blob/df424189f658dfeecac16eb67a816bc987271dfa/boxmot/appearance/reid_multibackend.py#L310

but this should not affect cosine similarity, since pre-normalizing features to unit length does not change the result. Cosine similarity measures the cosine of the angle between two vectors, so it depends only on their orientation in space, not on their magnitude.
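A quick way to check this yourself (a minimal sketch assuming PyTorch; the feature vectors here are just random placeholders):

```python
import torch
import torch.nn.functional as F

# two arbitrary, unnormalized feature vectors
a = torch.randn(512)
b = torch.randn(512)

# cosine similarity on the raw features
raw = F.cosine_similarity(a, b, dim=0)

# cosine similarity after L2-normalizing each feature to unit length
normed = F.cosine_similarity(F.normalize(a, dim=0), F.normalize(b, dim=0), dim=0)

print(torch.allclose(raw, normed))  # True: normalization does not change the angle
```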

Also, do you have any suggestions or comments on the real-world problem we face (different lighting conditions for the same person ID) when doing ReID on people?

The best results are always achieved by fine-tuning for your specific use case.

@sourabh-patil (Author)

Thanks for the reply. As you rightly said, cosine similarity should not be affected by normalization, and in any case I am comparing normalized features from both models. I suppose I need to check the way I am getting the embeddings.
