TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #2
Comments
Have you solved this problem?
Same here when running on CUDA on Linux.
Sorry, cannot reproduce this error on Linux.
I'm working on an AWS EC2 instance. I ran:
python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280 --device 0
And I got the TypeError from the title.
P.S. It works if I run on CPU, and it also works on this VM with YOLOv5-StrongSORT. Thanks!
@mikel-brostrom Same error. Did you check it with AWS EC2 g4dn? (If you want, we can schedule half an hour on Zoom to try to fix it.)
Have not tried to deploy this on any cloud platform. I am available 11-12AM CET tomorrow. Otherwise, Wednesday 8-12. |
I solved the problem: modify self.numpy() to self.cpu().numpy(). You can try it.
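The fix above can be sketched as follows (a minimal standalone sketch, not the repo's exact code): calling .numpy() directly on a CUDA tensor raises the TypeError from the title, while copying to host memory with .cpu() first works on any device.

```python
import torch

# Pick whatever device is available; on a GPU box this reproduces the
# cuda:0 case from the issue title.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.arange(4, device=device)

# t.numpy() raises TypeError for a CUDA tensor; copying to host first
# works everywhere (detach() also drops any autograd history).
arr = t.detach().cpu().numpy()
print(arr.tolist())  # [0, 1, 2, 3]
```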
@Zhengzhiyang0000 Yeah! Now it works. @mikel-brostrom Do you know how to fix it in the code? (If you want, I'm available tomorrow - you can set half an hour in Google Calendar - yagelgen@gmail.com)
Your fix is within torch, @Zhengzhiyang0000? That is weird.
instead of
But it's too slow.
I fixed it by changing stride = model.stride.max() to stride = int(model.stride.max()) in track.py line 105, and by removing the .cpu().numpy() in the same file. StrongSORT is still very slow in itself, so I see no application for it in real-time scenarios (~0.1 seconds for just StrongSORT per frame on a 1660 Ti mobile, while my custom-trained YOLOv7-tiny needs an order of magnitude less than that).
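For context, a NumPy-only sketch of why the int() cast matters (wh_padding is a hypothetical stand-in, not the repo's code): letterbox() calls np.mod(dw, stride), and np.mod tries to convert a torch tensor argument to a NumPy array, which fails for CUDA tensors. Casting the stride to a plain Python int once, up front, sidesteps the conversion entirely.

```python
import numpy as np

# Hypothetical stand-in for the wh-padding line in letterbox()
# (yolov7/utils/datasets.py). The fix in track.py is to cast once:
#     stride = int(model.stride.max())
# so that np.mod never sees a (possibly CUDA) torch tensor.
def wh_padding(dw, dh, stride):
    return np.mod(dw, stride), np.mod(dh, stride)

dw, dh = wh_padding(44.0, 20.0, 32)
print(dw, dh)  # 12.0 20.0
```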
I achieve the following inference times on my webcam with a modest Quadro P2000, which is way below a 1660 Ti in terms of specs, @Jimmeimetis.
Yolov5s.pt + mobilenetv2_x1_0_msmt17.pt:
0: 480x640 1 person, 3 cars, Done. YOLO:(0.024s), StrongSORT:(0.047s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.031s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.018s), StrongSORT:(0.032s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.030s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.019s), StrongSORT:(0.025s)
Yolov5s.engine + mobilenetv2_x1_0_msmt17.engine:
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.018s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.017s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.016s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.017s), StrongSORT:(0.017s)
Note that my main work is in my Yolov5StrongSORT repo, which is currently ahead of Yolov7StrongSORT.
These look much more reasonable given the GFLOPS of the models used in StrongSORT. Lots of weird behavior on my Turing GPU (1660 Ti) compared to my Pascal one (1070): CUDA 11 makes my 1660 Ti detect nothing on YOLOv7, and on CUDA 10.2, which I'm running as a workaround, FP16 is significantly slower than FP32. Also, thanks for letting me know about your work on the yolov5 repo. Will test it later!
OK, tested it: StrongSORT runtime is proper on the yolov5 repo, so I will use that implementation or port it to v7. Lastly, just disabling half precision in my CUDA 11 environment with the 1660 Ti seems to do the trick inference-wise (it now detects). Will test it on a 3090 soon enough in an attempt to find the culprit. Thanks!
Notice that the more detections you have, the longer StrongSORT's association step will take. Btw, I don't think the 1660 Ti supports half-precision inference...
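To illustrate the scaling (a toy sketch, not StrongSORT's actual cost function, which combines appearance and motion terms): the association step builds a tracks-by-detections cost matrix and solves an assignment over it, so its work grows with the number of detections per frame.

```python
import numpy as np

def cost_matrix(tracks, detections):
    # Toy Euclidean-distance cost between track and detection centers.
    # The real StrongSORT cost differs, but the matrix shape scales the
    # same way: N_tracks x N_detections.
    tracks = np.asarray(tracks, dtype=float)
    detections = np.asarray(detections, dtype=float)
    return np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)

c = cost_matrix([[0, 0], [10, 10]], [[0, 0], [3, 4], [10, 10]])
print(c.shape)  # (2, 3): 2 tracks x 3 detections
```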
It does, and the issue is likely some poor interaction with PyTorch/CUDA. Even if it didn't support FP16 accelerated at 2x the rate of FP32, performance should have been roughly the same, not degraded ~10x like it is on my side. I will get to the bottom of this eventually, but it's not a priority right now. Thanks and have a good night.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
@Jimmeimetis Have you found the culprit of this issue? I am using a 1660 too, and StrongSORT takes 0.2 s per frame, which is pretty slow.
@NQHuy1905 I ported the StrongSORT tracker from the v5 repo to the v7 one, and the execution times lined up with the v5 ones. That said, while it was able to run in real time with a very fast inference model, I did not consider it worth using over DeepSORT due to the higher execution time as-is (even with significantly smaller models for StrongSORT, it wasn't good enough for my standards). The porting and testing actually took place the day after my last post here. I did it as fast as possible to get the results I needed, so the changes are somewhat poorly made. Either way, if you want to try it, I can try uploading the project somewhere this weekend.
@Jimmeimetis So you mean the reason for the high execution time is StrongSORT itself. I haven't tried DeepSORT with YOLOv7, but have you tried it, and was the execution time lower? I tried trackers with the v5 and v7 repos using smaller YOLO and StrongSORT models, and it wasn't good enough for my standards either.
@NQHuy1905 Yes, I have been running YOLO v7 and v8 with DeepSORT. It has its own problems, but at this point I don't have the time to dive into other trackers. There are public repos out there that pair v7 with DeepSORT if you want to try.
Search before asking
Yolov7_StrongSORT_OSNet Component
Tracking
Bug
(pytorch1.7) PS D:\Github\Yolov7_StrongSORT_OSNet> python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt
D:\Github\Yolov7_StrongSORT_OSNet\strong_sort/deep/reid\torchreid\metrics\rank.py:11: UserWarning: Cython evaluation (very fast so highly recommended) is unavailable, now use python evaluation.
warnings.warn(
Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model: osnet_x0_25
Successfully loaded pretrained weights from "osnet_x0_25_market1501.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
(1, 256, 128, 3)
img = letterbox(img0, self.img_size, stride=self.stride)[0]
File "D:\Github\Yolov7_StrongSORT_OSNet\yolov7\utils\datasets.py", line 1000, in letterbox
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
File "C:\Users\xgx\Anaconda3\envs\pytorch1.7\lib\site-packages\torch\tensor.py", line 630, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Environment
v1.0
osnet_x0_25_market1501
windows 10 64bit
python 3.8
pytorch 1.7.1 + cu101
Minimal Reproducible Example
python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt