# How to use CLIP-ReID as feature extractor? #10
Thanks for the example code @awarebayes! Really appreciate it.

---
We are working on some fixes in the repo. We have found it makes the metrics worse, but separation with cosine distance is way better. I can show you graphs of the average distances between same-identity and other-identity pairs.

---
Have you visualized the feature learning process? It would make sense if the features tend to cluster like beams shooting out of the origin. It would be enlightening if you could share the graphs ✨

---
Let me know when this is in a usable state 😄. I have been trying to get the feature extraction working, but get:

```
Traceback (most recent call last):
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 52, in <module>
    main()
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 40, in main
    model.load_param(model_path)
  File "/home/mikel.brostrom/CLIP-ReID/model/make_model.py", line 121, in load_param
    self.state_dict()[i.replace('module.', '')].copy_(param_dict[i])
RuntimeError: The size of tensor a (2) must match the size of tensor b (15) at non-singleton dimension 0
```

You can reproduce this with:

```python
import argparse

import torch

from config import cfg_base as cfg
from model.make_model import make_model


def forward_override(self, x: torch.Tensor, cv_emb=None, old_forward=None):
    # Concatenate the CLS token of the pre- and post-projection image features
    _, image_features, image_features_proj = old_forward(x, cv_emb)
    return torch.cat([image_features[:, 0], image_features_proj[:, 0]], dim=1)


def main():
    parser = argparse.ArgumentParser(description="ReID Baseline Training")
    parser.add_argument(
        "--config_file",
        default="/home/mikel.brostrom/CLIP-ReID/MSMT17_clipreid_12x12sie_ViT-B-16_60_test_log.yml",
        help="path to config file",
        type=str,
    )
    parser.add_argument(
        "opts",
        help="Modify config options using the command-line",
        default=None,
        nargs=argparse.REMAINDER,
    )
    args = parser.parse_args()

    if args.config_file != "":
        cfg.merge_from_file(args.config_file)
    cfg.merge_from_list(args.opts)
    cfg.freeze()

    # camera_num=2 vs. the checkpoint's cameras is the likely source of the
    # 2-vs-15 mismatch above (MSMT17 has 15 cameras)
    model = make_model(cfg, num_class=1501, camera_num=2, view_num=1)
    model_path = 'MSMT17_clipreid_12x12sie_ViT-B-16_60.pth'
    batch_size = 1
    print(f"Loading model for eval from {model_path}. Batch size = {batch_size}")
    model.load_param(model_path)

    # Keep only the image encoder and patch its forward to return one flat
    # feature vector per image
    model = model.image_encoder
    old_forward = model.forward
    model.forward = lambda *args, **kwargs: forward_override(
        model, *args, old_forward=old_forward, **kwargs
    )

    device = torch.device('cuda:0')
    model = model.eval().to(device)


main()
```

Had to delete some stuff from the configs as well to get to the weight loading part...
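For completeness, a minimal sketch of how the patched encoder might then be used on a person crop (e.g. appended at the end of `main()`); the 256×128 input size matches the config discussion further down, while the normalization constants are assumed ImageNet statistics:

```python
from PIL import Image
import torchvision.transforms as T

preprocess = T.Compose([
    T.Resize((256, 128)),  # (height, width), per the working config below
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open('crop.jpg').convert('RGB')).unsqueeze(0).to(device)
with torch.no_grad():
    feat = model(img)  # concatenated CLS features from the override
feat = torch.nn.functional.normalize(feat, dim=1)  # unit norm for cosine distance
```

---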
You can override config parameters as sketched below.
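Since the script forwards trailing command-line tokens to `cfg.merge_from_list`, overrides can presumably be passed as `KEY VALUE` pairs, either on the command line (e.g. `python feature_extraction_test.py MODEL.NAME ViT-B-16`) or in code. A sketch with illustrative values, assuming the config object is a yacs `CfgNode` as the `merge_from_*` calls suggest:

```python
from config import cfg_base as cfg

# Apply overrides without editing config/defaults.py; values are illustrative
cfg.merge_from_file('MSMT17_clipreid_12x12sie_ViT-B-16_60_test_log.yml')
cfg.merge_from_list(['MODEL.NAME', 'ViT-B-16', 'INPUT.SIZE_TEST', '[256, 128]'])
cfg.freeze()
```

---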
They have a bottleneck of size 2, which they visualize. I don't think we can have that with person ReID.

---
Thanks, but it seems that the loaded …

---
Tested with the ViT-CLIP-ReID Market checkpoint. With that you can keep the default config.

---
I changed the following content in config/defaults.py:

```diff
-_C.MODEL.NAME = 'resnet50'
+_C.MODEL.NAME = 'ViT-B-16'
```

and adapted …, but get the following error:

```
Loading model for eval from /home/mikel.brostrom/CLIP-ReID/MSMT17_clipreid_12x12sie_ViT-B-16_60.pth. Batch size = 1
Traceback (most recent call last):
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 73, in <module>
    main()
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 48, in main
    model.load_param(model_path)
  File "/home/mikel.brostrom/CLIP-ReID/model/make_model.py", line 121, in load_param
    self.state_dict()[i.replace('module.', '')].copy_(param_dict[i])
KeyError: 'cv_embed'
```

---
I changed `load_param` to be …
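A sketch of how such a tolerant `load_param` could look; this is a guess at the change, not the poster's actual snippet:

```python
import torch

def load_param(self, trained_path):
    # Skip checkpoint entries the current model lacks,
    # e.g. 'cv_embed' when the SIE branch is disabled
    param_dict = torch.load(trained_path)
    state = self.state_dict()
    for i in param_dict:
        key = i.replace('module.', '')
        if key not in state:
            print(f'skipping {key}: not present in model')
            continue
        state[key].copy_(param_dict[i])
```

---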
OK, I think I am loading the SIE-OLP model. Let me try the one you linked.

---
Loading the linked checkpoint gives:

```
Loading model for eval from /home/mikel.brostrom/CLIP-ReID/Market1501_clipreid_ViT-B-16_60.pth. Batch size = 1
Traceback (most recent call last):
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 73, in <module>
    main()
  File "/home/mikel.brostrom/CLIP-ReID/feature_extraction_test.py", line 48, in main
    model.load_param(model_path)
  File "/home/mikel.brostrom/CLIP-ReID/model/make_model.py", line 121, in load_param
    self.state_dict()[i.replace('module.', '')].copy_(param_dict[i])
RuntimeError: The size of tensor a (193) must match the size of tensor b (129) at non-singleton dimension 0
```

---
Still the same issue after updating.

---
Hmm, which key is it?

---
Sorry, no Telegram, only WhatsApp 😞

---
Added a print in `load_param`.

It stops at …
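The kind of diagnostic print meant here might look like this hypothetical helper (`compare_checkpoint` is not part of the repo):

```python
import torch

def compare_checkpoint(model, trained_path):
    # Print every checkpoint key whose shape disagrees with the model;
    # the first line printed is the tensor load_param trips over
    param_dict = torch.load(trained_path)
    state = model.state_dict()
    for i in param_dict:
        key = i.replace('module.', '')
        ckpt_shape = tuple(param_dict[i].shape)
        model_shape = tuple(state[key].shape) if key in state else None
        if model_shape != ckpt_shape:
            print(key, 'checkpoint:', ckpt_shape, 'model:', model_shape)
```

---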
I believe the config is to blame. The config I load with is https://gist.github.com/awarebayes/271beb52cabc9cf0bc77f592764e1b62, maybe change to it.

---
Stride is 16 in both configs...

---
At the end of the day I have these values here: …

---
By the way, I noticed that ViT with TensorRT + FP16 is no slower than resnet50.

---
Ok, fixed. Using

```python
# Size of the image during training
_C.INPUT.SIZE_TRAIN = [256, 128]
# Size of the image during test
_C.INPUT.SIZE_TEST = [256, 128]
```

instead of

```python
# Size of the image during training
_C.INPUT.SIZE_TRAIN = [384, 128]
# Size of the image during test
_C.INPUT.SIZE_TEST = [384, 128]
```
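For reference, the earlier 193-vs-129 mismatch follows directly from this setting: ViT-B-16 splits the image into 16×16 patches and adds one class token, so the positional-embedding length is tied to the input size. A quick check:

```python
# Token count for a ViT with 16x16 patches: one token per patch plus CLS
def num_tokens(height: int, width: int, patch: int = 16) -> int:
    return (height // patch) * (width // patch) + 1

print(num_tokens(384, 128))  # 193 -> what the [384, 128] config builds
print(num_tokens(256, 128))  # 129 -> what the Market1501 checkpoint contains
```

---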
Thanks for your patience 😄.

---
I am kind of surprised by the inference time. For a …

---
Are you responsible for this repo now, @awarebayes? If so, would you be okay with me implementing these models here: https://github.com/mikel-brostrom/yolo_tracking? @Syliz517? I intend to use them for associating detections across frames, both by visual means only and by motion plus appearance information.

---
Any slower than the regular …

---
Clearly cosine distance separates the embeddings better. I am, however, not entirely sure why there are such large tails on the distributions.
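In case it is useful, this is roughly how such same-vs-other histograms can be produced; a sketch assuming `feats` is an `(N, D)` array of extracted embeddings and `pids` the matching identity labels:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_same_vs_other(feats: np.ndarray, pids: np.ndarray) -> None:
    # Pairwise cosine distances between all embeddings
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    dist = 1.0 - normed @ normed.T
    same = pids[:, None] == pids[None, :]
    off_diag = ~np.eye(len(pids), dtype=bool)
    # Same-identity pairs (excluding self-pairs) vs. different-identity pairs
    plt.hist(dist[same & off_diag], bins=50, density=True, alpha=0.5, label='same id')
    plt.hist(dist[~same], bins=50, density=True, alpha=0.5, label='other ids')
    plt.xlabel('cosine distance')
    plt.legend()
    plt.show()
```

---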
I can help you with implementing CLIP-ReID into yolo_tracking.

---
I have it working in real-time now.

---
```bash
git clone https://github.com/mikel-brostrom/yolo_tracking
git checkout clip-reid
python examples/track.py --reid-model clip_market1501.pt
# or
python examples/track.py --reid-model clip_duke.pt
```

Haven't evaluated on any MOT dataset yet.

---
I don't know if keeping this guy's repo's issues section as a communication medium is a good idea, but whatever. https://paperswithcode.com/task/video-based-person-re-identification These basically work the same way, but during learning they take an average of the tracklet's features instead of the individual images'.

---
I think I have Telegram, I just uninstalled it because I didn't use it. Let me check.

---
Interesting, never heard of this! It could be quite heavy for real-time, though, as the crop-outs are stacked. This is, however, handled to some extent in modern multi-object trackers, as individual feature maps are averaged over several frames. Like here: …
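For context, a minimal sketch of the per-track feature averaging mentioned above, as done in many modern trackers; the smoothing factor `alpha` is an assumption:

```python
import torch

def update_track_feature(track_feat: torch.Tensor, new_feat: torch.Tensor,
                         alpha: float = 0.9) -> torch.Tensor:
    # Exponential moving average of a track's appearance embedding,
    # renormalized so cosine matching stays well-behaved
    smoothed = alpha * track_feat + (1 - alpha) * new_feat
    return smoothed / smoothed.norm()
```

---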
Of course. I noticed you had some problems before, have you resolved them now?

---
Thanks for your response, @Syliz517! @awarebayes helped me get it working 😄

---
Hi @awarebayes, …

---
I would suggest changing the size, seeing where it fails, and debugging from there.

---
Hi! First of all, I would like to know whether you are okay with me implementing these models here: https://github.com/mikel-brostrom/yolo_tracking. I would also like to know if there is an easy way of extracting features with these models. Keep up the great work!