TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #2
Comments
Have you solved this problem?
Same here when running on CUDA on Linux.
Sorry, cannot reproduce this error on Linux.
I'm working on an AWS EC2 instance. I ran:
python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280 --device 0
And I got the TypeError from the title.
P.S. It works if I run on CPU, and it also works on this VM with YOLOv5-StrongSORT. Thanks!
@mikel-brostrom Same error. Did you check it with AWS EC2 g4dn? (If you want, we can schedule half an hour on Zoom to try to fix it.)
Have not tried to deploy this on any cloud platform. I am available 11-12AM CET tomorrow. Otherwise, Wednesday 8-12. |
I solved the problem: modify self.numpy() to self.cpu().numpy(). You can try it.
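The fix above can be sketched as follows (a minimal standalone sketch, not the repo's exact code): calling .numpy() directly on a CUDA tensor raises the TypeError from the title, while copying to host memory with .cpu() first works on any device.

```python
import torch

# Pick whatever device is available; on a GPU box this reproduces the
# cuda:0 case from the issue title.
device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.arange(4, device=device)

# t.numpy() raises TypeError for a CUDA tensor; copying to host first
# works everywhere (detach() also drops any autograd history).
arr = t.detach().cpu().numpy()
print(arr.tolist())  # [0, 1, 2, 3]
```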
@Zhengzhiyang0000 Yeah! Now it works. @mikel-brostrom Do you know how to fix it in the code? (If you want, I'm available tomorrow - you can set half an hour in Google Calendar - yagelgen@gmail.com)
Your fix is within torch, @Zhengzhiyang0000? That is weird.
instead of
But it's too slow.
I fixed it by changing stride = model.stride.max() to stride = int(model.stride.max()) in track.py line 105, and by removing the .cpu().numpy() in the same file. StrongSORT is still very slow in itself, so I see no application for it in real-time scenarios (~0.1 seconds for just StrongSORT per frame on a 1660 Ti mobile, while my custom-trained YOLOv7-tiny needs an order of magnitude less than that).
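For context, a NumPy-only sketch of why the int() cast matters (wh_padding is a hypothetical stand-in, not the repo's code): letterbox() calls np.mod(dw, stride), and np.mod tries to convert a torch tensor argument to a NumPy array, which fails for CUDA tensors. Casting the stride to a plain Python int once, up front, sidesteps the conversion entirely.

```python
import numpy as np

# Hypothetical stand-in for the wh-padding line in letterbox()
# (yolov7/utils/datasets.py). The fix in track.py is to cast once:
#     stride = int(model.stride.max())
# so that np.mod never sees a (possibly CUDA) torch tensor.
def wh_padding(dw, dh, stride):
    return np.mod(dw, stride), np.mod(dh, stride)

dw, dh = wh_padding(44.0, 20.0, 32)
print(dw, dh)  # 12.0 20.0
```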
I achieve the following inference times on my webcam with a modest Quadro P2000, which is way below a 1660 Ti in terms of specs, @Jimmeimetis.
Yolov5s.pt + mobilenetv2_x1_0_msmt17.pt:
0: 480x640 1 person, 3 cars, Done. YOLO:(0.024s), StrongSORT:(0.047s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.031s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.018s), StrongSORT:(0.032s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.030s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.019s), StrongSORT:(0.025s)
Yolov5s.engine + mobilenetv2_x1_0_msmt17.engine:
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.018s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.017s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.016s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.017s), StrongSORT:(0.017s)
Note that my main work is in my Yolov5StrongSORT repo, which is currently ahead of Yolov7StrongSORT.
These look much more reasonable given the GFLOPS of the models used in StrongSORT. Lots of weird behavior on my Turing GPU (1660 Ti) compared to my Pascal one (1070): CUDA 11 makes my 1660 Ti detect nothing on YOLOv7, and on CUDA 10.2, which I'm running as a workaround, FP16 is significantly slower than FP32. Also, thanks for letting me know about your work on the yolov5 repo. Will test it later!
OK, tested it: StrongSORT runtime is proper on the yolov5 repo, so I will use that implementation or port it to v7. Lastly, just disabling half precision in my CUDA 11 environment with the 1660 Ti seems to do the trick inference-wise (it now detects). Will test it on a 3090 soon enough in an attempt to find the culprit. Thanks!
Notice that the more detections you have, the longer StrongSORT's association step will take. Btw, I don't think the 1660 Ti supports half-precision inference...
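To illustrate the scaling (a toy sketch, not StrongSORT's actual cost function, which combines appearance and motion terms): the association step builds a tracks-by-detections cost matrix and solves an assignment over it, so its work grows with the number of detections per frame.

```python
import numpy as np

def cost_matrix(tracks, detections):
    # Toy Euclidean-distance cost between track and detection centers.
    # The real StrongSORT cost differs, but the matrix shape scales the
    # same way: N_tracks x N_detections.
    tracks = np.asarray(tracks, dtype=float)
    detections = np.asarray(detections, dtype=float)
    return np.linalg.norm(tracks[:, None, :] - detections[None, :, :], axis=2)

c = cost_matrix([[0, 0], [10, 10]], [[0, 0], [3, 4], [10, 10]])
print(c.shape)  # (2, 3): 2 tracks x 3 detections
```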
It does, and the issue is likely some poor interaction with PyTorch/CUDA. Even if it didn't support FP16 accelerated at 2x the rate of FP32, performance should have been roughly the same, not degraded ~10x like it is on my side. I will get to the bottom of this eventually, but it's not a priority right now. Thanks and have a good night.
👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
@Jimmeimetis Have you found the culprit of this issue? I am using a 1660 too, and StrongSORT takes 0.2 s per frame, which is pretty slow.
@NQHuy1905 I ported the StrongSORT tracker from the v5 repo to the v7 one, and the execution times lined up with the v5 ones. That said, while it was able to run in real time with a very fast inference model, I did not consider it worth using over DeepSORT due to the higher execution time as-is (even with significantly smaller models for StrongSORT, it wasn't good enough for my standards). The porting and testing actually took place the day after my last post here. I did it as fast as possible to get the results I needed, so the changes are somewhat poorly made. Either way, if you want to try it, I can try uploading the project somewhere this weekend.
@Jimmeimetis So you mean the reason for the high execution time is StrongSORT itself. I haven't tried DeepSORT with YOLOv7, but have you tried it, and was the execution time lower? I tried trackers with the v5 and v7 repos using smaller YOLO and StrongSORT models, and it wasn't good enough for my standards either.
@NQHuy1905 Yes, I have been running YOLO v7 and v8 with DeepSORT. It has its own problems, but at this point I don't have the time to dive into other trackers. There are public repos out there that pair v7 with DeepSORT if you want to try.
Search before asking
Yolov7_StrongSORT_OSNet Component
Tracking
Bug
(pytorch1.7) PS D:\Github\Yolov7_StrongSORT_OSNet> python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt
D:\Github\Yolov7_StrongSORT_OSNet\strong_sort/deep/reid\torchreid\metrics\rank.py:11: UserWarning: Cython evaluation (very fast so highly recommended) is unavailable, now use python evaluation.
warnings.warn(
Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model: osnet_x0_25
Successfully loaded pretrained weights from "osnet_x0_25_market1501.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
(1, 256, 128, 3)
img = letterbox(img0, self.img_size, stride=self.stride)[0]
File "D:\Github\Yolov7_StrongSORT_OSNet\yolov7\utils\datasets.py", line 1000, in letterbox
dw, dh = np.mod(dw, stride), np.mod(dh, stride) # wh padding
File "C:\Users\xgx\Anaconda3\envs\pytorch1.7\lib\site-packages\torch\tensor.py", line 630, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Environment
v1.0
osnet_x0_25_market1501
windows 10 64bit
python 3.8
pytorch 1.7.1 + cu101
Minimal Reproducible Example
python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt