
CUDA kernel failed : no kernel image is available for execution on the device #4

Closed
Zixin-Tang opened this issue Jan 10, 2021 · 9 comments


@Zixin-Tang

Hello. My machine has two GPUs, a 2080 Ti and a 1080. The driver is NVIDIA-SMI 450.66 and CUDA is 10.1.

I want to run the test on the 1080, so I changed CUDA_VISIBLE_DEVICES=0 to CUDA_VISIBLE_DEVICES=1 in command_test.sh.

But when I run sh command_test.sh, I get the following error:

CUDA kernel failed : no kernel image is available for execution on the device
void furthest_point_sampling_kernel_wrapper(int, int, int, const float*, float*, int*) at L:233 in /home/agent/grasp/graspnet-baseline/pointnet2/_ext_src/src/sampling_gpu.cu

The error seems to occur at this line in inference():

for batch_idx, batch_data in enumerate(TEST_DATALOADER)

What could be causing this?
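Switching GPUs with CUDA_VISIBLE_DEVICES renumbers them: the process sees only the listed devices, re-indexed from 0, so with CUDA_VISIBLE_DEVICES=1 the selected card shows up as cuda:0. The listed indices follow CUDA's enumeration order, which by default is fastest-first and can differ from the order nvidia-smi shows; setting CUDA_DEVICE_ORDER=PCI_BUS_ID makes the two agree. The helpers below (parse_visible_devices, physical_device_for) are hypothetical and form a simplified model of that mapping: real CUDA also accepts GPU UUIDs and stops at the first invalid entry, which this sketch does not model.

```python
import os
from typing import List, Optional


def parse_visible_devices(value: Optional[str]) -> Optional[List[str]]:
    """Return the CUDA_VISIBLE_DEVICES entries in order, or None if the variable is unset.

    None means no masking: every GPU the driver enumerates is visible.
    An empty string yields [], i.e. no GPU is visible.
    """
    if value is None:
        return None
    # Entries are comma-separated; surrounding whitespace is ignored in this sketch.
    return [entry.strip() for entry in value.split(",") if entry.strip()]


def physical_device_for(logical_index: int, value: Optional[str]) -> str:
    """Map the index a program uses (cuda:N) to the CUDA device it actually refers to."""
    entries = parse_visible_devices(value)
    if entries is None:
        # No masking: logical and CUDA indices coincide.
        return str(logical_index)
    if not 0 <= logical_index < len(entries):
        raise IndexError(f"cuda:{logical_index} is not visible with CUDA_VISIBLE_DEVICES={value!r}")
    return entries[logical_index]


if __name__ == "__main__":
    current = os.environ.get("CUDA_VISIBLE_DEVICES")
    print("CUDA_VISIBLE_DEVICES =", current, "->", parse_visible_devices(current))
    # With CUDA_VISIBLE_DEVICES=1 the only visible GPU is addressed as cuda:0.
    print(f"cuda:0 -> CUDA device {physical_device_for(0, '1')}")
```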

@chenxi-wang
Collaborator

I'm not sure whether it is caused by the PyTorch installation.
For CUDA 10.1, the official instruction (https://pytorch.org/get-started/previous-versions/) is:
pip install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
You may try reinstalling PyTorch if you installed it using pip install -r requirements.txt.

By the way, does the same error occur on the 2080 Ti, or with other repos that use PyTorch?
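After reinstalling with the pip command in this comment, one way to confirm that the wheel really targets CUDA 10.1 is to inspect torch.__version__ (wheels from download.pytorch.org carry a local tag such as "+cu101") and torch.version.cuda. Both attributes exist in PyTorch; the helper cuda_from_wheel_tag is hypothetical and only parses that "+cuXYZ" suffix. Wheels installed without a tag (for example from a plain requirements file) return None here, in which case torch.version.cuda is the more reliable source.

```python
import re
from typing import Optional, Tuple


def cuda_from_wheel_tag(torch_version: str) -> Optional[Tuple[int, int]]:
    """Extract the CUDA version from a PyTorch version string like '1.6.0+cu101'.

    Returns None for CPU-only builds ('+cpu') and for versions without a local tag.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: cu101 -> 10.1, cu113 -> 11.3, cu92 -> 9.2.
    return int(digits[:-1]), int(digits[-1])


if __name__ == "__main__":
    print(cuda_from_wheel_tag("1.6.0+cu101"))  # (10, 1)
    try:
        import torch  # optional: only used if PyTorch is installed
    except ImportError:
        torch = None
    if torch is not None:
        print(torch.__version__, torch.version.cuda, cuda_from_wheel_tag(torch.__version__))
```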

@Zixin-Tang
Author

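@chenxi-wang I'm currently training graspnet on the 2080 Ti without any problems, and testing on it also worked before. The problem only appears when I try to use the 1080. This card was installed only recently; is there something else I need to do?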

@chenxi-wang
Collaborator

@wozxfdha Does the 1080 work fine when running other programs? If not, you may want to check your NVIDIA driver and CUDA installation.

@GouMinghao

@wozxfdha I tested on a GTX 1080 with python=3.6, cuda=10.2, pytorch=1.6.0, and it works well. Try setting up the environment again, and check your driver and GPU memory usage.

@GouMinghao

@wozxfdha I have reproduced your problem. The solution is to recompile the pointnet2 and knn modules. The cause may be the change of hardware or a change of the CUDA driver.
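Recompiling fits one plausible explanation, which is an assumption and not confirmed in this thread: when TORCH_CUDA_ARCH_LIST is unset, older releases of torch.utils.cpp_extension compile an extension only for the GPU that is current at build time. A pointnet2/knn build made while the 2080 Ti (compute capability 7.5) was device 0 would then contain no kernel image that the GTX 1080 (compute capability 6.1) can run. The sketch below models the CUDA compatibility rules: machine code built for X.y runs only on devices X.z with z >= y, while PTX built for X.y can be JIT-compiled by the driver for any newer device. kernel_available and arch_list_for are hypothetical helpers, and architecture-specific variants such as the "a" suffix are ignored.

```python
from typing import List, Tuple

Capability = Tuple[int, int]


def parse_arch_list(arch_list: str) -> List[Tuple[Capability, bool]]:
    """Parse a TORCH_CUDA_ARCH_LIST-style string such as '6.1;7.5+PTX'.

    Returns (capability, has_ptx) pairs; entries may be separated by ';' or spaces.
    """
    result = []
    for entry in arch_list.replace(";", " ").split():
        has_ptx = entry.endswith("+PTX")
        base = entry[:-4] if has_ptx else entry
        major, minor = base.split(".")
        result.append(((int(major), int(minor)), has_ptx))
    return result


def kernel_available(device: Capability, arch_list: str) -> bool:
    """Return True if a build for `arch_list` contains code that `device` can execute."""
    for (major, minor), has_ptx in parse_arch_list(arch_list):
        # Machine code (cubin/SASS) for X.y runs only on devices X.z with z >= y.
        if device[0] == major and device[1] >= minor:
            return True
        # PTX for X.y can be JIT-compiled for any device with capability >= X.y.
        if has_ptx and device >= (major, minor):
            return True
    return False


def arch_list_for(devices: List[Capability]) -> str:
    """Build a TORCH_CUDA_ARCH_LIST value covering every given GPU, with PTX for the newest."""
    if not devices:
        raise ValueError("need at least one device capability")
    entries = [f"{major}.{minor}" for major, minor in sorted(set(devices))]
    entries[-1] += "+PTX"
    return ";".join(entries)


if __name__ == "__main__":
    gtx_1080, rtx_2080_ti = (6, 1), (7, 5)
    built_on_2080_ti = "7.5"  # hypothetical arch list of a build made on the 2080 Ti only
    print(kernel_available(gtx_1080, built_on_2080_ti))  # False
    both = arch_list_for([rtx_2080_ti, gtx_1080])
    print(both)  # 6.1;7.5+PTX
    print(kernel_available(gtx_1080, both))  # True
```

Under the same assumption, exporting TORCH_CUDA_ARCH_LIST="6.1;7.5+PTX" before rerunning the setup.py of each module (and checking that the setup script does not overwrite the variable itself) should produce binaries that cover both cards.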

@Qidian213

I have the same problem, with cuda11.0, pytorch1.7, python3.8

@Fang-Haoshu
Member

More information is needed, @Qidian213.
You can check whether PyTorch works properly with your GPU driver and CUDA version.

@Qidian213

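@Fang-Haoshu The problem is solved. My GPU is an RTX 3060, whose compute capability is 'compute_86', but the CUDA 11.0 and the matching PyTorch I had installed only support up to 'compute_80' and 'compute_75'. After upgrading CUDA to 11.3 and installing the corresponding PyTorch, everything works.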
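The explanation above amounts to a toolkit lookup: nvcc can only emit code for architectures it knows about, and compute_86 first appeared in CUDA 11.1, so CUDA 11.0 cannot target an RTX 3060 directly while CUDA 11.3 can. The table below is deliberately partial (it covers only the architectures mentioned in this thread), FIRST_TOOLKIT_FOR_ARCH and toolkit_supports are hypothetical names, and the sketch ignores the opposite problem of newer toolkits eventually dropping very old architectures.

```python
from typing import Dict, Tuple

Version = Tuple[int, int]     # CUDA toolkit version, e.g. (11, 0)
Capability = Tuple[int, int]  # GPU compute capability, e.g. (8, 6)

# First CUDA toolkit release whose nvcc can compile for each compute capability.
# Partial on purpose: GTX 1080 (6.1), RTX 2080 Ti (7.5), compute_80, RTX 3060 (8.6).
FIRST_TOOLKIT_FOR_ARCH: Dict[Capability, Version] = {
    (6, 1): (8, 0),
    (7, 5): (10, 0),
    (8, 0): (11, 0),
    (8, 6): (11, 1),
}


def toolkit_supports(toolkit: Version, device: Capability) -> bool:
    """Return True if a CUDA toolkit of version `toolkit` can compile for `device`."""
    first = FIRST_TOOLKIT_FOR_ARCH.get(device)
    if first is None:
        raise KeyError(f"compute capability {device} is not in this sketch's table")
    # Tuples compare element by element, so (11, 3) >= (11, 1) is True.
    return toolkit >= first


if __name__ == "__main__":
    rtx_3060 = (8, 6)
    print(toolkit_supports((11, 0), rtx_3060))  # False
    print(toolkit_supports((11, 3), rtx_3060))  # True
```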

@little-little-mnstr

Hey, I've run into this problem too and haven't been able to solve it. Is there a good way to fix it?
Could not import cythonized box intersection. Consider compiling box_intersection.pyx for faster training.
2024-03-22 20:21:16,477 - pretrain_JD_pc2img - INFO - Copy the Config file from cfgs/pretrain_JD_pc2img.yaml to ./experiments/pretrain_JD_pc2img/cfgs/pimae/config.yaml
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.wandb : False
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.design : None
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.model : mae_vit_base_patch16
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.mask_ratio : 0.75
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.norm_pix_loss : False
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.start_ckpts_img : None
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.img_size : (256, 352)
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.config : cfgs/pretrain_JD_pc2img.yaml
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.launcher : none
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.local_rank : 0
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.num_workers : 4
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.seed : 0
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.deterministic : False
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.sync_bn : False
2024-03-22 20:21:16,479 - pretrain_JD_pc2img - INFO - args.exp_name : pimae
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.loss : cd1
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.start_ckpts : None
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.ckpts : None
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.val_freq : 1
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.vote : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.resume : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.test : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.finetune_model : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.scratch_model : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.mode : None
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.way : -1
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.shot : -1
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.fold : -1
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.experiment_path : ./experiments/pretrain_JD_pc2img/cfgs/pimae
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.tfboard_path : ./experiments/pretrain_JD_pc2img/cfgs/TFBoard/pimae
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.log_name : pretrain_JD_pc2img
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.use_gpu : True
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - args.distributed : False
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.optimizer = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.optimizer.type : AdamW
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.optimizer.kwargs = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.optimizer.kwargs.lr : 6.25e-05
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.optimizer.kwargs.weight_decay : 0.05
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.scheduler = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.scheduler.type : CosLR
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.scheduler.kwargs = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.scheduler.kwargs.epochs : 400
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.scheduler.kwargs.initial_epochs : 15
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.base = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.base.NAME : ShapeNet
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.base.DATA_PATH : /home/lyh/Anton/data/ShapeNet55-34/ShapeNet-55
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.base.N_POINTS : 8192
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.base.PC_PATH : /home/lyh/Anton/data/ShapeNet55-34/shapenet_pc
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.others = edict()
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.others.subset : train
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.others.npoints : 1024
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.train.others.bs : 8
2024-03-22 20:21:16,480 - pretrain_JD_pc2img - INFO - config.dataset.val = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.base = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.base.NAME : ShapeNet
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.base.DATA_PATH : /home/lyh/Anton/data/ShapeNet55-34/ShapeNet-55
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.base.N_POINTS : 8192
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.base.PC_PATH : /home/lyh/Anton/data/ShapeNet55-34/shapenet_pc
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.others = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.others.subset : test
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.others.npoints : 1024
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.val.others.bs : 16
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.base = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.base.NAME : ShapeNet
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.base.DATA_PATH : /home/lyh/Anton/data/ShapeNet55-34/ShapeNet-55
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.base.N_POINTS : 8192
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.base.PC_PATH : /home/lyh/Anton/data/ShapeNet55-34/shapenet_pc
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.others = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.others.subset : test
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.others.npoints : 1024
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.dataset.test.others.bs : 8
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.NAME : MAE
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.patch_size : 16
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.encoder = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.encoder.trans_dim : 256
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.encoder.depth : 3
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.encoder.num_heads : 4
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.encoder.mlp_ratio : 4.0
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.decoder = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.decoder.trans_dim : 192
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.decoder.depth : 2
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.decoder.num_heads : 3
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.img_model.decoder.mlp_ratio : 4.0
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.NAME : Point_MAE
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.group_size : 32
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.num_group : 128
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.loss : cdl2
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config = edict()
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.mask_ratio : 0.6
2024-03-22 20:21:16,481 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.mask_type : rand
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.trans_dim : 256
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.encoder_dims : 256
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.depth : 3
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.drop_path_rate : 0.1
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.num_heads : 4
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.dim_feedforward : 128
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.dropout : 0.1
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.decoder_trans_dim : 192
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.decoder_depth : 2
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.pc_model.transformer_config.decoder_num_heads : 3
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model = edict()
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.NAME : MultiMAE
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.hp_overlap_ratio : 0.0
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.mask_ratio : 0.6
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.token_fusion : False
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.pc2img = edict()
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.pc2img.rgb : False
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.pc2img.feat : True
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder = edict()
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.trans_dim : 256
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.depth : 3
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.num_heads : 4
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.mlp_ratio : 4.0
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.dim_feedforward : 128
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.encoder.dropout : 0.1
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.decoder = edict()
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.decoder.trans_dim : 192
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.decoder.depth : 1
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.decoder.num_heads : 3
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.joint_model.decoder.mlp_ratio : 4.0
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.npoints : 2048
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.total_bs : 8
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.step_per_update : 2
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.max_epoch : 400
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.train_aug : False
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - config.description : 3+3 structure with smaller decoder settings and no augmenation, with none align and joint decoder(2+1+1)
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - Distributed training: False
2024-03-22 20:21:16,482 - pretrain_JD_pc2img - INFO - Set random seed to 0, deterministic: False
2024-03-22 20:21:16,655 - Point_MAE - INFO - [Point_MAE]
2024-03-22 20:21:16,655 - Transformer - INFO - [args] {'mask_ratio': 0.6, 'mask_type': 'rand', 'trans_dim': 256, 'encoder_dims': 256, 'depth': 3, 'drop_path_rate': 0.1, 'num_heads': 4, 'dim_feedforward': 128, 'dropout': 0.1, 'decoder_trans_dim': 192, 'decoder_depth': 2, 'decoder_num_heads': 3}
2024-03-22 20:21:16,679 - Point_MAE - INFO - [Point_MAE] divide point cloud into G128 x S32 points ...
2024-03-22 20:21:19,173 - pretrain_JD_pc2img - INFO - Using Data parallel ...
CUDA kernel failed : no kernel image is available for execution on the device
void furthest_point_sampling_kernel_wrapper(int, int, int, const float*, float*, int*) at L:228 in /media/wywd/PSSD/PiMAE/Pretrain/Pointnet2_PyTorch/pointnet2_ops_lib/pointnet2_ops/_ext-src/src/sampling_gpu.cu

driver 535
cuda 12.2
pytorch 1.7.1

6 participants