mmdeploy - ERROR - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #682

fengzfcany · 2022-06-30T07:32:45Z

model settings

model = dict(
    type='CascadeRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=7,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=7,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=7,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))

# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(512, 400), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(512, 400),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

classes = ("pokong", "duanlu", "duanlu1", "dukong", "quekou", "zhenkong", "cantong")

data = dict(
    samples_per_gpu=8,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=data_root + 'annotations/instances_val2017.json',
        img_prefix=data_root + 'val2017/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

# default_runtime
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1), ('val', 1)]
# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
# Default setting for scaling LR automatically
#   - `enable` means enable scaling LR automatically
#       or not by default.
#   - `base_batch_size` = (8 GPUs) x (2 samples per GPU).
auto_scale_lr = dict(enable=False, base_batch_size=32)

# schedule_1x
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)

pth2onnx

import os.path as osp
from mmdeploy.apis import torch2onnx
from mmdeploy.utils import get_root_logger

logger = get_root_logger(log_level='INFO')

deploy_cfg_path = '/root/fengzf/jupyter_workdir/mmdeploy/mmdeploy/configs/mmdet/detection/detection_onnxruntime_dynamic.py'
model_cfg_path = "/root/fengzf/mmdetection/work_dirs/cascade_rcnn_r50_fpn_1x_coco/cascade_rcnn_r50_fpn_1x_coco.py"
checkpoint_path = '/root/fengzf/mmdetection/work_dirs/cascade_rcnn_r50_fpn_1x_coco/epoch_4.pth'
img = '/root/fengzf/jupyter_workdir/mmdeploy/mmdeploy/demo/demo.jpg'
output_path = '/tmp/cascade_model/1.onnx'
work_dir, save_file = osp.split(output_path)
device = 'cuda:0'

logger.info(f'torch2onnx: \n\tmodel_cfg: {model_cfg_path} '
            f'\n\tdeploy_cfg: {deploy_cfg_path}')
try:
    torch2onnx(
        img,
        work_dir,
        save_file,
        deploy_cfg=deploy_cfg_path,
        model_cfg=model_cfg_path,
        model_checkpoint=checkpoint_path,
        device=device)
    logger.info('torch2onnx success.')
except Exception as e:
    logger.error(e)
    logger.error('torch2onnx failed.')

[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
[W shape_type_inference.cpp:425] Warning: Constant folding in symbolic shape inference fails: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (function ComputeConstantFolding)
2022-06-30 15:30:01,609 - mmdeploy - ERROR - Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
2022-06-30 15:30:01,609 - mmdeploy - ERROR - torch2onnx failed.

Process finished with exit code 0


grimoire · 2022-07-01T04:35:23Z

Please post your environment with https://github.com/open-mmlab/mmdeploy/blob/master/tools/check_env.py.

fengzfcany · 2022-07-01T09:39:51Z

When the device is set to CPU, the .pth checkpoint converts to ONNX successfully.

fengzfcany · 2022-07-01T09:42:16Z

ssh://root@10.6.5.8:22/usr/local/anaconda3/envs/mmdeploy/bin/python -u /root/fengzf/mmdeploy/tools/check_env.py
2022-07-01 17:40:55,776 - mmdeploy - INFO - 

2022-07-01 17:40:55,776 - mmdeploy - INFO - **********Environmental information**********
2022-07-01 17:40:56,153 - mmdeploy - INFO - sys.platform: linux
2022-07-01 17:40:56,153 - mmdeploy - INFO - Python: 3.7.0 (default, Oct  9 2018, 10:31:47) [GCC 7.3.0]
2022-07-01 17:40:56,153 - mmdeploy - INFO - CUDA available: True
2022-07-01 17:40:56,154 - mmdeploy - INFO - GPU 0,1: NVIDIA GeForce RTX 3090
2022-07-01 17:40:56,154 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2022-07-01 17:40:56,154 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.3, V11.3.58
2022-07-01 17:40:56,154 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2022-07-01 17:40:56,154 - mmdeploy - INFO - PyTorch: 1.12.0+cu113
2022-07-01 17:40:56,154 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.3.2  (built against CUDA 11.5)
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

2022-07-01 17:40:56,154 - mmdeploy - INFO - TorchVision: 0.13.0+cu113
2022-07-01 17:40:56,154 - mmdeploy - INFO - OpenCV: 4.6.0
2022-07-01 17:40:56,154 - mmdeploy - INFO - MMCV: 1.5.3
2022-07-01 17:40:56,154 - mmdeploy - INFO - MMCV Compiler: GCC 9.4
2022-07-01 17:40:56,154 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2022-07-01 17:40:56,154 - mmdeploy - INFO - MMDeploy: 0.5.0+0cac515
2022-07-01 17:40:56,154 - mmdeploy - INFO - 

2022-07-01 17:40:56,154 - mmdeploy - INFO - **********Backend information**********
2022-07-01 17:40:56,700 - mmdeploy - INFO - onnxruntime: 1.11.1	ops_is_avaliable : True
2022-07-01 17:40:56,702 - mmdeploy - INFO - tensorrt: None	ops_is_avaliable : False
2022-07-01 17:40:56,722 - mmdeploy - INFO - ncnn: None	ops_is_avaliable : False
2022-07-01 17:40:56,723 - mmdeploy - INFO - pplnn_is_avaliable: False
2022-07-01 17:40:56,724 - mmdeploy - INFO - openvino_is_avaliable: False
2022-07-01 17:40:56,724 - mmdeploy - INFO - 

2022-07-01 17:40:56,724 - mmdeploy - INFO - **********Codebase information**********
2022-07-01 17:40:56,726 - mmdeploy - INFO - mmdet:	2.24.1
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmseg:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmcls:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmocr:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmedit:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmdet3d:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmpose:	None
2022-07-01 17:40:56,727 - mmdeploy - INFO - mmrotate:	None

Process finished with exit code 0

grimoire · 2022-07-04T08:52:53Z

I cannot reproduce the error.
Please try setting the input device and the model device explicitly in https://github.com/open-mmlab/mmdeploy/blob/master/mmdeploy/apis/onnx/export.py.
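Before editing export.py, it can help to see which tensors are actually on which device. The following is a minimal diagnostic sketch, not part of mmdeploy: `group_by_device` is a hypothetical helper, and the tensor names in the demo are made up. It relies only on each object exposing a `.device` attribute, as torch tensors, parameters, and buffers do.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_device(named_tensors):
    """Map each device string to the sorted names of the tensors on it.

    Accepts any iterable of (name, obj) pairs where obj has a `.device`
    attribute (hypothetical helper, not an mmdeploy or PyTorch API).
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return {device: sorted(names) for device, names in groups.items()}


# Stand-in objects so the sketch runs without a GPU; the names are invented.
fake = [
    ("backbone.conv1.weight", SimpleNamespace(device="cuda:0")),
    ("input_img", SimpleNamespace(device="cuda:0")),
    ("bbox_coder_stds", SimpleNamespace(device="cpu")),
]
report = group_by_device(fake)
print(report)
```

With a real model, the same function could be fed `model.named_parameters()`, `model.named_buffers()`, and the input tensors. More than one key in the result means something was left behind. Tensors created inside `forward` during tracing will not appear in such a report, so a clean result does not rule out a mismatch.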

grimoire · 2022-09-07T08:06:00Z

Close due to no reply. Please feel free to reopen if you still need help.

ahmedmustahid · 2023-03-07T08:06:54Z

I am facing the same issue with the ncnn backend:
Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper___cat)
I am trying to convert RetinaNet to ncnn according to this, using the command:

python ./tools/deploy.py configs/mmdet/detection/single-stage_ncnn_static-800x1344.py $PATH_TO_MMDET/configs/retinanet/retinanet_r18_fpn_1x_coco.py $PATH_TO_MMDET/checkpoints/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth $PATH_TO_MMDET/demo/demo.jpg     --work-dir work_dir     --show     --device cuda:0     --dump-info

The output of this command:

root@ahmed:~/workspace/mmdeploy# python ./tools/deploy.py configs/mmdet/detection/single-stage_ncnn_static-800x1344.py $PATH_TO_MMDET/configs/retinanet/retinanet_r18_fpn_1x_coco.py $PATH_TO_MMDET/checkpoints/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth $PATH_TO_MMDET/demo/demo.jpg     --work-dir work_dir     --show     --device cuda:0     --dump-info
/root/workspace/mmdetection/mmdet/datasets/utils.py:66: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
  warnings.warn(
2023-03-07 07:57:11,025 - mmdeploy - INFO - Start pipeline mmdeploy.apis.pytorch2onnx.torch2onnx in subprocess
load checkpoint from local path: /root/workspace/mmdetection/checkpoints/retinanet_r18_fpn_1x_coco_20220407_171055-614fd399.pth
/root/workspace/mmdetection/mmdet/datasets/utils.py:66: UserWarning: "ImageToTensor" pipeline is replaced by "DefaultFormatBundle" for batch inference. It is recommended to manually replace it in the test data pipeline in your config file.
  warnings.warn(
2023-03-07 07:57:13,877 - mmdeploy - WARNING - DeprecationWarning: get_onnx_config will be deprecated in the future. 
2023-03-07 07:57:13,877 - mmdeploy - INFO - Export PyTorch model to ONNX: work_dir/end2end.onnx.
2023-03-07 07:57:13,986 - mmdeploy - WARNING - Can not find torch._C._jit_pass_onnx_autograd_function_process, function rewrite will not be applied
/root/workspace/mmdeploy/mmdeploy/pytorch/functions/getattribute.py:18: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ret = torch.Size([int(s) for s in ret])
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py:24: TracerWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  img_shape = [int(val) for val in img_shape]
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py:24: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  img_shape = [int(val) for val in img_shape]
/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py:381: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  vars = torch.tensor(self.bbox_coder.stds)
/root/workspace/mmdeploy/mmdeploy/pytorch/functions/size.py:22: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  ret = [int(r) for r in ret]
Process Process-2:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/pytorch2onnx.py", line 98, in torch2onnx
    export(
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 356, in _wrap
    return self.call_function(func_name_, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 326, in call_function
    return self.call_function_local(func_name, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 275, in call_function_local
    return pipe_caller(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/onnx/export.py", line 122, in export
    torch.onnx.export(
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/__init__.py", line 316, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 107, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 724, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/apis/onnx/optimizer.py", line 10, in model_to_graph__custom_optimizer
    graph, params_dict, torch_out = ctx.origin_func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 493, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 437, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/opt/conda/lib/python3.8/site-packages/torch/onnx/utils.py", line 388, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, strict=False, _force_outplace=False, _return_inputs_states=True)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 1166, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/opt/conda/lib/python3.8/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1090, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 70, in base_detector__forward
    return __forward_impl(ctx, self, img, img_metas=img_metas, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/core/optimizers/function_marker.py", line 261, in g
    rets = f(*args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/base.py", line 26, in __forward_impl
    return self.simple_test(img, img_metas, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/detectors/single_stage.py", line 28, in single_stage_detector__simple_test
    return self.bbox_head.simple_test(feat, img_metas, **kwargs)
  File "/root/workspace/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 360, in simple_test
    return self.simple_test_bboxes(feats, img_metas, rescale=rescale)
  File "/root/workspace/mmdetection/mmdet/models/dense_heads/dense_test_mixins.py", line 37, in simple_test_bboxes
    results_list = self.get_bboxes(
  File "/root/workspace/mmdeploy/mmdeploy/core/rewriters/rewriter_utils.py", line 402, in wrapper
    return self.func(self, *args, **kwargs)
  File "/root/workspace/mmdeploy/mmdeploy/codebase/mmdet/models/dense_heads/base_dense_head.py", line 491, in base_dense_head__get_bboxes__ncnn
    batch_mlvl_priors = torch.cat([batch_mlvl_priors, batch_mlvl_vars], dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument tensors in method wrapper___cat)
2023-03-07 07:57:15,187 - mmdeploy - ERROR - `mmdeploy.apis.pytorch2onnx.torch2onnx` with Call id: 0 failed. exit.

I am running my code inside the mmdeploy Docker container (master, GPU).
My environment:

root@ahmed:~/workspace/mmdeploy# python tools/check_env.py 
2023-03-07 08:02:18,410 - mmdeploy - INFO - 

2023-03-07 08:02:18,410 - mmdeploy - INFO - **********Environmental information**********
2023-03-07 08:02:18,544 - mmdeploy - INFO - sys.platform: linux
2023-03-07 08:02:18,545 - mmdeploy - INFO - Python: 3.8.16 (default, Jan 17 2023, 23:13:24) [GCC 11.2.0]
2023-03-07 08:02:18,545 - mmdeploy - INFO - CUDA available: True
2023-03-07 08:02:18,545 - mmdeploy - INFO - GPU 0: NVIDIA GeForce RTX 3060 Laptop GPU
2023-03-07 08:02:18,545 - mmdeploy - INFO - CUDA_HOME: /usr/local/cuda
2023-03-07 08:02:18,545 - mmdeploy - INFO - NVCC: Cuda compilation tools, release 11.6, V11.6.124
2023-03-07 08:02:18,545 - mmdeploy - INFO - GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
2023-03-07 08:02:18,545 - mmdeploy - INFO - PyTorch: 1.10.0
2023-03-07 08:02:18,545 - mmdeploy - INFO - PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

2023-03-07 08:02:18,545 - mmdeploy - INFO - TorchVision: 0.11.0
2023-03-07 08:02:18,545 - mmdeploy - INFO - OpenCV: 4.7.0
2023-03-07 08:02:18,545 - mmdeploy - INFO - MMCV: 1.5.3
2023-03-07 08:02:18,545 - mmdeploy - INFO - MMCV Compiler: GCC 7.3
2023-03-07 08:02:18,545 - mmdeploy - INFO - MMCV CUDA Compiler: 11.3
2023-03-07 08:02:18,545 - mmdeploy - INFO - MMDeploy: 0.13.0+7de413a
2023-03-07 08:02:18,545 - mmdeploy - INFO - 

2023-03-07 08:02:18,545 - mmdeploy - INFO - **********Backend information**********
2023-03-07 08:02:18,582 - mmdeploy - INFO - tensorrt:	8.2.4.2
2023-03-07 08:02:18,583 - mmdeploy - INFO - tensorrt custom ops:	Available
2023-03-07 08:02:18,601 - mmdeploy - INFO - ONNXRuntime:	None
2023-03-07 08:02:18,601 - mmdeploy - INFO - ONNXRuntime-gpu:	1.8.1
2023-03-07 08:02:18,601 - mmdeploy - INFO - ONNXRuntime custom ops:	Available
2023-03-07 08:02:18,602 - mmdeploy - INFO - pplnn:	None
2023-03-07 08:02:18,604 - mmdeploy - INFO - ncnn:	None
2023-03-07 08:02:18,606 - mmdeploy - INFO - snpe:	None
2023-03-07 08:02:18,607 - mmdeploy - INFO - openvino:	None
2023-03-07 08:02:18,608 - mmdeploy - INFO - torchscript:	1.10.0
2023-03-07 08:02:18,608 - mmdeploy - INFO - torchscript custom ops:	NotAvailable
2023-03-07 08:02:18,628 - mmdeploy - INFO - rknn-toolkit:	None
2023-03-07 08:02:18,628 - mmdeploy - INFO - rknn2-toolkit:	None
2023-03-07 08:02:18,630 - mmdeploy - INFO - ascend:	None
2023-03-07 08:02:18,631 - mmdeploy - INFO - coreml:	None
2023-03-07 08:02:19,256 - mmdeploy - INFO - tvm:	None
2023-03-07 08:02:19,256 - mmdeploy - INFO - 

2023-03-07 08:02:19,256 - mmdeploy - INFO - **********Codebase information**********
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmdet:	2.28.2
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmseg:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmcls:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmocr:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmedit:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmdet3d:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmpose:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmrotate:	None
2023-03-07 08:02:19,257 - mmdeploy - INFO - mmaction:	None

lvhan028 assigned grimoire Jun 30, 2022

grimoire closed this as completed Sep 7, 2022
