Describe the bug (Mandatory)
A clear and concise description of what the bug is.
Distributed training fails for both hrnet_w32 and hrnet_w48 when run in static-graph (GRAPH) mode.
Hardware Environment (Ascend/GPU/CPU):
/device ascend
Software Environment (Mandatory):
-- MindSpore version: mindspore_v2.2.1, mindcv_0.2.2
-- Python version: 3.7.5
-- OS platform and distribution: EulerOS 2.8
-- GCC/Compiler version (if compiled from source): 7.3.0
Execute Mode (Mandatory) (PyNative/Graph):
Graph
To Reproduce (Mandatory)
Steps to reproduce the behavior:
Expected behavior (Mandatory)
Static-graph distributed training runs to completion.
Screenshots / Logs (Mandatory)
If applicable, add screenshots to help explain your problem.
[2023-11-19 10:29:13] mindcv.scheduler.scheduler_factory WARNING - warmup_epochs + decay_epochs > num_epochs. Please check and reduce decay_epochs!
[2023-11-19 10:29:16] mindcv.train INFO - Essential Experiment Configurations:
MindSpore mode[GRAPH(0)/PYNATIVE(1)]: 0
Distributed mode: True
Number of devices: 8
Number of training samples: 800000
Number of validation samples: None
Number of classes: 1000
Number of batches: 781
Batch size: 128
Auto augment: randaug-m7-mstd0.5
MixUp: 0.2
CutMix: 1.0
Model: hrnet_w32
Model parameters: 41303464
Number of epochs: 5
Optimizer: adamw
Learning rate: 0.001
LR Scheduler: cosine_decay
Momentum: 0.9
Weight decay: 0.05
Auto mixed precision: O2
Loss scale: 1024(fixed)
[2023-11-19 10:29:16] mindcv.train INFO - Start training
[ERROR] PIPELINE(171895,ffff914f2190,python):2023-11-19-10:29:53.881.102 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171893,ffffbe9fb190,python):2023-11-19-10:29:54.378.528 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171887,ffff9b3ad190,python):2023-11-19-10:29:54.825.669 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171889,ffff87cee190,python):2023-11-19-10:29:55.189.347 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171890,ffff91938190,python):2023-11-19-10:29:55.439.711 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171894,ffff929f0190,python):2023-11-19-10:29:55.738.301 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171888,ffff8a2c7190,python):2023-11-19-10:29:56.666.323 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[ERROR] PIPELINE(171891,ffffb5509190,python):2023-11-19-10:29:57.019.842 [mindspore/ccsrc/pipeline/jit/ps/fallback.cc:464] GeneratePyExecuteNodeWithScriptSrc] Not found PyExecute input. script: x[i] = self.branches[i](x[i])
[WARNING] MD(171895,fffc8ffff1e0,python):2023-11-19-10:30:19.682.318 [mindspore/ccsrc/minddata/dataset/engine/datasetops/data_queue_op.cc:1168] DetectPerBatchTime] Bad performance attention, it takes more than 25 seconds to fetch a batch of data from dataset pipeline, which might result in GetNext timeout problem. You may test dataset processing performance (with creating dataset iterator) and optimize it.
Traceback (most recent call last):
File "/data3/zl/jenkins/workspace/Kits/source_code/mindcv//train.py", line 323, in <module>
train(args)
File "/data3/zl/jenkins/workspace/Kits/source_code/mindcv//train.py", line 309, in train
trainer.train(args.epoch_size, loader_train, callbacks=callbacks, dataset_sink_mode=args.dataset_sink_mode)
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 1068, in train
self._train(epoch,
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 114, in wrapper
func(self, *args, **kwargs)
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 623, in _train
self._train_dataset_sink_process(epoch, train_dataset, list_callback,
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 708, in _train_dataset_sink_process
outputs = train_network(*inputs)
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 680, in __call__
out = self.compile_and_run(*args, **kwargs)
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 1020, in compile_and_run
self.compile(*args, **kwargs)
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 997, in compile
_cell_graph_executor.compile(self, phase=self.phase,
File "/root/archiconda3/envs/Python380/lib/python3.8/site-packages/mindspore/common/api.py", line 1547, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
RuntimeError: For operation 'setitem', current input arguments types are <Tuple, Number, Tensor>. The 1-th argument type 'Tuple' is not supported now.
the support arguments types of 'setitem' operation as follows:
<List, Number, Number>
<List, Number, String>
<List, Number, List>
<List, Number, Tuple>
<List, Number, Tensor>
<List, Slice, Number>
<List, Slice, List>
<List, Slice, Tuple>
<List, Slice, Tensor>
<Tensor, None, Number>
<Tensor, None, List>
<Tensor, None, Tuple>
<Tensor, None, Tensor>
<Tensor, Ellipsis, Number>
<Tensor, Ellipsis, List>
<Tensor, Ellipsis, Tuple>
<Tensor, Ellipsis, Tensor>
<Tensor, Number, Number>
<Tensor, Number, List>
<Tensor, Number, Tuple>
<Tensor, Number, Tensor>
<Tensor, List, Number>
<Tensor, List, List>
<Tensor, List, Tuple>
<Tensor, List, Tensor>
<Tensor, Tuple, Number>
<Tensor, Tuple, List>
<Tensor, Tuple, Tuple>
<Tensor, Tuple, Tensor>
<Tensor, Slice, Number>
<Tensor, Slice, List>
<Tensor, Slice, Tuple>
<Tensor, Slice, Tensor>
<Tensor, Tensor, Number>
<Tensor, Tensor, List>
<Tensor, Tensor, Tuple>
<Tensor, Tensor, Tensor>
<Dictionary, Number, Number>
<Dictionary, Number, List>
<Dictionary, Number, Tuple>
<Dictionary, Number, Tensor>
<Dictionary, Number, Dictionary>
<Dictionary, String, Number>
<Dictionary, String, List>
<Dictionary, String, Tuple>
<Dictionary, String, Tensor>
<Dictionary, String, Dictionary>
<Dictionary, Tuple, Number>
<Dictionary, Tuple, List>
<Dictionary, Tuple, Tuple>
<Dictionary, Tuple, Tensor>
<Dictionary, Tuple, Dictionary>
<Dictionary, Tensor, Number>
<Dictionary, Tensor, List>
<Dictionary, Tensor, Tuple>
<Dictionary, Tensor, Tensor>
<Dictionary, Tensor, Dictionary>
<MapTensor, Tensor, Tensor>
For more details with 'setitem', please refer to https://mindspore.cn/search/en?inputValue=Index%20value%20assignment
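For reference, the error matches HRNet's per-branch loop, where the parallel feature maps arrive as a tuple and graph mode rejects item assignment (`setitem`) on a Tuple. Below is a minimal pure-Python sketch of the failing pattern and one possible workaround (function names are hypothetical, not the actual mindcv fix):

```python
# Sketch of the failing pattern: `x` is a tuple of per-branch feature
# maps, so `x[i] = ...` is exactly the <Tuple, Number, Tensor> `setitem`
# the compiler rejects (in plain Python it raises TypeError, since
# tuples are immutable).
def forward_branches_buggy(branches, x):
    for i in range(len(branches)):
        x[i] = branches[i](x[i])  # fails when x is a tuple
    return x

# Possible workaround: build a fresh list instead of mutating the
# input, so no tuple item assignment ever happens.
def forward_branches_fixed(branches, x):
    return [branch(xi) for branch, xi in zip(branches, x)]
```

An equivalent in-graph alternative would be converting the tuple once with `x = list(x)` before the loop, since `<List, Number, Tensor>` is in the supported `setitem` signatures listed above.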
Additional context (Optional)
Add any other context about the problem here.