
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first. #2

Closed
xugaoxiang opened this issue Aug 11, 2022 · 24 comments
Labels
bug (Something isn't working), Stale

Comments

@xugaoxiang

Search before asking

  • I have searched the Yolov7_StrongSORT_OSNet issues and discussions and found no similar questions.

Yolov7_StrongSORT_OSNet Component

Tracking

Bug

(pytorch1.7) PS D:\Github\Yolov7_StrongSORT_OSNet> python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt
D:\Github\Yolov7_StrongSORT_OSNet\strong_sort/deep/reid\torchreid\metrics\rank.py:11: UserWarning: Cython evaluation (very fast so highly recommended) is unavailable, now use python evaluation.
warnings.warn(
Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model: osnet_x0_25

- params: 203,568
- flops: 82,316,000
Successfully loaded pretrained weights from "osnet_x0_25_market1501.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
(1, 256, 128, 3)
    img = letterbox(img0, self.img_size, stride=self.stride)[0]
  File "D:\Github\Yolov7_StrongSORT_OSNet\yolov7\utils\datasets.py", line 1000, in letterbox
    dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
  File "C:\Users\xgx\Anaconda3\envs\pytorch1.7\lib\site-packages\torch\tensor.py", line 630, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Environment

v1.0
osnet_x0_25_market1501
windows 10 64bit
python 3.8
pytorch 1.7.1 + cu101

Minimal Reproducible Example

python track.py --source .\test.mp4 --strong-sort-weights osnet_x0_25_market1501.pt

@xugaoxiang added the bug label Aug 11, 2022
@Zhengzhiyang0000

Have you solved this problem?

@yagelgen

same here when running on cuda on linux

@mikel-brostrom
Owner

Sorry, cannot reproduce this error on Linux

@yagelgen

I'm working on AWS EC2 type g4dn.xlarge.

I ran:

python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280

And I got:

Downloading https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7-e6e.pt to yolov7-e6e.pt...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 290M/290M [00:19<00:00, 15.4MB/s]

Fusing layers... 
Downloading...
From: https://drive.google.com/uc?id=1Kkx2zW89jq_NETu4u42CFZTMVD5Hwm6e
To: /home/ec2-user/Yolov7_StrongSORT_OSNet/weights/osnet_x0_25_msmt17.pt
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9.34M/9.34M [00:00<00:00, 17.9MB/s]
Model: osnet_x0_25
- params: 203,568
- flops: 82,316,000
Successfully loaded pretrained weights from "/home/ec2-user/Yolov7_StrongSORT_OSNet/weights/osnet_x0_25_msmt17.pt"
** The following layers are discarded due to unmatched keys or layer size: ['classifier.weight', 'classifier.bias']
(1, 256, 128, 3)
video 1/1 (1/1100) /home/ec2-user/Yolov7_StrongSORT_OSNet/v.mp4: Traceback (most recent call last):
  File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 332, in <module>
    main(opt)
  File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 327, in main
    run(**vars(opt))
  File "/home/ec2-user/.local/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/Yolov7_StrongSORT_OSNet/track.py", line 149, in run
    for frame_idx, (path, im, im0s, vid_cap) in enumerate(dataset):
  File "/home/ec2-user/Yolov7_StrongSORT_OSNet/yolov7/utils/datasets.py", line 191, in __next__
    img = letterbox(img0, self.img_size, stride=self.stride)[0]
  File "/home/ec2-user/Yolov7_StrongSORT_OSNet/yolov7/utils/datasets.py", line 1000, in letterbox
    dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
  File "/home/ec2-user/.local/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__
    return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

P.S. It works if I run on CPU, and it also works on this VM with YOLOv5-StrongSORT.

THX (:

@mikel-brostrom
Owner

git pull and try again, please @yagelgen. Let's see if I managed to fix it now. I still can't reproduce this behavior on a freshly cloned repo with:

python track.py --source v.mp4 --yolo-weights yolov7-e6e.pt --img 1280 --device 0

@yagelgen

@mikel-brostrom Same error. Did you check it on an AWS EC2 g4dn?

(If you want, we can schedule a half-hour Zoom call to try to fix it.)

@mikel-brostrom
Owner

I have not tried to deploy this on any cloud platform. I am available 11-12 AM CET tomorrow; otherwise, Wednesday 8-12.

@Zhengzhiyang0000

I solved the problem.
You can try it in the file "/home/ec2-user/.local/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__:
return self.numpy()
Modify self.numpy() to self.cpu().numpy()

@Zhengzhiyang0000

modify self.numpy() to self.cpu().numpy()
After I revised it, no error was reported.

@Zhengzhiyang0000

you can try it

@yagelgen

@Zhengzhiyang0000 yeah! now it works.

@mikel-brostrom do you know how to fix it in the code?

(If you want I'm available tomorrow - you can set half hour in google calendar - yagelgen@gmail.com)

@mikel-brostrom
Owner

> I solved the problem.
> You can try it in the file "/home/ec2-user/.local/lib/python3.9/site-packages/torch/_tensor.py", line 732, in __array__:
> return self.numpy()
> Modify self.numpy() to self.cpu().numpy()

Your fix is within torch @Zhengzhiyang0000? That is weird.

@xugaoxiang
Author

xugaoxiang commented Aug 15, 2022

@mikel-brostrom

dataset = LoadImages(source, img_size=imgsz, stride=stride.cpu().numpy())

instead of

dataset = LoadImages(source, img_size=imgsz, stride=stride)

But it's too slow.

@Jimmeimetis

I fixed it by changing

stride = model.stride.max()

to

stride = int(model.stride.max())

in track.py, line 105, and also removing the .cpu().numpy() in the same file.
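The failure mode and the int(stride) fix can be sketched without torch: np.mod coerces non-array arguments through their __array__ hook, which a CUDA tensor refuses, so the stride must reach letterbox as a host-side number. A minimal sketch, assuming only numpy; letterbox_padding is a hypothetical stand-in for the padding step in yolov7's letterbox, not the repo's actual code:

```python
import numpy as np

def letterbox_padding(dw, dh, stride):
    # np.mod coerces non-array arguments via __array__; a CUDA tensor
    # raises TypeError there, so convert stride to a host-side int
    # first (mirroring stride = int(model.stride.max()) in track.py).
    stride = int(stride)
    return float(np.mod(dw, stride)), float(np.mod(dh, stride))  # wh padding

print(letterbox_padding(8.0, 12.0, 32))  # (8.0, 12.0)
```

Doing the conversion once, when the stride is read from the model, also avoids any per-frame device-to-host copy.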

StrongSORT is still very slow in itself, so I see no application for it in real-time scenarios (~0.1 s per frame for StrongSORT alone on a 1660 Ti mobile, while my custom-trained YOLOv7-tiny needs an order of magnitude less than that).

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2022

I achieve the following inference times on my webcam with a modest Quadro P2000, which is way below a 1660 Ti in terms of specs, @Jimmeimetis.

Yolov5s.pt + mobilenetv2_x1_0_msmt17.pt

0: 480x640 1 person, 3 cars, Done. YOLO:(0.024s), StrongSORT:(0.047s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.031s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.018s), StrongSORT:(0.032s)
0: 480x640 1 person, 5 cars, Done. YOLO:(0.019s), StrongSORT:(0.030s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.018s), StrongSORT:(0.027s)
0: 480x640 1 person, 4 cars, Done. YOLO:(0.019s), StrongSORT:(0.025s)

~20FPS

Yolov5s.engine + mobilenetv2_x1_0_msmt17.engine

0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.018s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.017s), StrongSORT:(0.020s)
0: 640x640 1 class0, 3 class2s, Done. YOLO:(0.019s), StrongSORT:(0.020s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.016s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.018s), StrongSORT:(0.017s)
0: 640x640 1 class0, 2 class2s, Done. YOLO:(0.017s), StrongSORT:(0.017s)

~27FPS

Notice that my main work is in my Yolov5StrongSORT repo, which is currently ahead of Yolov7StrongSORT.

@Jimmeimetis

These look much more reasonable given the GFLOPs of the models used in StrongSORT. There's lots of weird behavior on my Turing GPU (1660 Ti) compared to my Pascal one (1070): CUDA 11 makes my 1660 Ti detect nothing on YOLOv7, and on CUDA 10.2, which I'm running as a workaround, FP16 is significantly slower than FP32.

Also thanks for letting me know about your work on the yolov5 repo. Will test it later!

@Jimmeimetis

OK, tested it. StrongSORT runtime is proper on the yolov5 repo, so I will use that implementation or port it to v7. Lastly, just disabling half precision in my CUDA 11 environment with the 1660 Ti seems to do the trick inference-wise (it now detects).

Will test it on a 3090 soon enough in an attempt to find the culprit. Thanks!

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2022

Notice that the more detections you have, the longer it will take StrongSORT to finish the association process. Btw, I don't think the 1660 Ti supports half-precision inference...
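The association cost scaling mentioned above can be illustrated with a small numpy sketch: the association step builds a cost matrix with one row per track and one column per detection, so its size (and the matching work on top of it) grows with the detection count. This is illustrative only; appearance_cost_matrix and the cosine-distance choice are assumptions for the sketch, not StrongSORT's actual code:

```python
import numpy as np

def appearance_cost_matrix(track_feats, det_feats):
    # Cosine-distance cost matrix between track and detection embeddings:
    # shape (n_tracks, n_dets), so work scales with n_tracks * n_dets.
    t = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    d = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    return 1.0 - t @ d.T  # 0 = identical direction, up to 2 = opposite

tracks = np.eye(3)       # 3 tracks with toy unit features
dets = np.eye(3)[:2]     # 2 detections
print(appearance_cost_matrix(tracks, dets).shape)  # (3, 2)
```

More detections per frame means a larger matrix to fill and a larger assignment problem to solve, which is why crowded scenes make the tracker's per-frame time climb.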

@Jimmeimetis

Btw, I don't think 1660ti supports half precision inference...

It does, and the issue is likely some poor interaction between PyTorch and CUDA. Even if it didn't support accelerated FP16 at 2x the FP32 rate, performance should have been roughly the same, not degraded ~10x as it is on my side. I will get to the bottom of this eventually, but it's not a priority right now.

https://www.nvidia.com/en-us/geforce/news/geforce-gtx-1660-ti-advanced-shaders-streaming-multiprocessor/

Thanks and have a good night

@github-actions

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

@NQHuy1905

@Jimmeimetis Have you found the culprit of this issue? I am using a 1660 too, and StrongSORT takes 0.2 s per frame, which is pretty slow.

@Jimmeimetis

@NQHuy1905 I ported the StrongSORT tracker from the v5 repo to the v7 one, and the execution times lined up with the v5 ones. That being said, while it was able to run in real time with a very fast inference model, I did not consider it worth using over DeepSORT due to its higher execution time as-is (even with significantly smaller StrongSORT models, it wasn't good enough for my standards).

The porting of the code plus testing actually took place the day after my last post here. I did it as fast as possible to get the results I needed, so the changes are somewhat poorly made.

Either way, if you want to try it, I can try uploading the project somewhere this weekend.

@NQHuy1905

NQHuy1905 commented Feb 3, 2023

@Jimmeimetis So you mean the high execution time is because of StrongSORT? I haven't tried DeepSORT with YOLOv7, but have you tried it, and was the execution time lower? I tried trackers from the v5 and v7 repos with smaller YOLO and StrongSORT models, and it wasn't good enough for my standards either.

@Jimmeimetis

@NQHuy1905 Yes, I have been running YOLO v7 and v8 with DeepSORT. It has its own problems, but at this point I don't have the time to dive into other trackers. There are public repos out there that pair v7 with DeepSORT if you want to try.
