ASSERT FAILED at /opt/conda/conda-bld/pytorch-nightly_1539602533843/work/aten/src/ATen/core/blob.h:79 #13304
Comments
I have the same problem. Have you managed to fix it?
@gf19880710, the same error also occurs with "e2e_mask_rcnn_R-50-FPN_1x.yaml".
The problem is that an already-trained model in OUTPUT_DIR somehow conflicts with test_model. After moving the already-trained model elsewhere, it works fine.
Hey, I'm experiencing the same error at inference time, so there is no test model that could conflict. Any clue what could be going wrong?
@tleers, check whether you have trained the same model with any other configuration. If so, move the files from the output directory to another directory.
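The workaround suggested in the comments above (moving previously trained model files out of OUTPUT_DIR before running again) could be scripted roughly as follows. This is only a sketch: the helper name `move_old_checkpoints`, the backup location, and the `*.pkl` / `*.pth` file patterns are assumptions for illustration, not something the thread specifies, and other commenters report that the workaround does not help in every case.

```python
import shutil
from pathlib import Path


def move_old_checkpoints(output_dir, backup_dir, patterns=("*.pkl", "*.pth")):
    """Move files matching `patterns` out of output_dir into backup_dir.

    The directory layout below output_dir is preserved under backup_dir.
    backup_dir should live outside output_dir. The default patterns are an
    assumption about how the checkpoints are named.
    Returns the list of destination paths.
    """
    output_dir = Path(output_dir)
    backup_dir = Path(backup_dir)

    # Collect candidates first so that moving files does not disturb the walk.
    candidates = sorted(
        {p for pattern in patterns for p in output_dir.rglob(pattern) if p.is_file()}
    )

    moved = []
    for src in candidates:
        dst = backup_dir / src.relative_to(output_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

A call such as `move_old_checkpoints("/tmp/detectron-output", "/tmp/detectron-output-backup")` (both paths hypothetical) leaves logs and other files in place and moves only the matching model files.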
I am experiencing the same issue at inference time. I have not trained the same model with any other configuration, and there is nothing in OUTPUT_DIR (I made a new one). Does anyone know how to solve this?
Hi @arjun-kava, @qvks, @tleers,
I am using e2e_faster_rcnn_R-50-FPN_1x.yaml. I have trained Faster R-CNN with FPN on a custom dataset with 12 classes. There is no other trained model in OUTPUT_DIR. I am using Google Colab, so the environment has CUDA 10.0 with cuDNN 7.501.
Hi. Not sure whether this helps, but I'm getting the same error when running Detectron inference with a pretrained model from the Model Zoo and --cfg configs/12_2017_baselines/e2e_keypoint_rcnn_R-50-FPN_1x.yaml
I'm using Python 2.7.14 (Anaconda) on Ubuntu 18.04, and I use this same machine for another ML training project, so CUDA and cuDNN should be working properly.
🐛 Bug
Hello, great programmers:
While training with FAIR's Detectron platform using the e2e_mask_rcnn_R-101-FPN_3x_gn.yaml config file, I ran into this issue, which indicated that I should report a bug to PyTorch.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
No exceptions or bugs during training and testing.
Environment
Please copy and paste the output from our environment collection script (or fill out the checklist below manually).
You can get the script and run it with:
gengfeng@ai-work-4:~/Downloads$ python collect_env.py
Collecting environment information...
PyTorch version: 1.0.0.dev20181015
Is debug build: No
CUDA used to build PyTorch: 9.0.176
OS: Ubuntu 18.04.1 LTS
GCC version: (Ubuntu 5.5.0-12ubuntu1) 5.5.0 20171010
CMake version: version 3.11.4
Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration: GPU 0: GeForce GTX 1080
Nvidia driver version: 390.87
cuDNN version: Probably one of the following:
/usr/local/cuda-9.0/lib64/libcudnn.so
/usr/local/cuda-9.0/lib64/libcudnn.so.7
/usr/local/cuda-9.0/lib64/libcudnn.so.7.2.1
/usr/local/cuda-9.0/lib64/libcudnn_static.a
/usr/local/cuda-9.1/lib64/libcudnn.so
/usr/local/cuda-9.1/lib64/libcudnn.so.7.1.3
/usr/local/cuda-9.1/lib64/libcudnn_static.a
Versions of relevant libraries:
[pip] Could not collect
[conda] cuda91 1.0 h4c16780_0 pytorch
[conda] pytorch-nightly 1.0.0.dev20181015 py3.6_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torch 0.4.0
[conda] torchvision 0.2.1
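The package list above shows two PyTorch entries in the same conda environment (`pytorch-nightly 1.0.0.dev20181015` and `torch 0.4.0`). Whether that is related to the assertion is not established in this thread, but such duplicates are easy to overlook. A minimal sketch for spotting them in pasted `collect_env.py` output might look like this; the helper name `torch_packages` and the set of package names it checks are assumptions for illustration.

```python
import re

# Package lines copied from the environment report in this issue.
ENV_REPORT = """\
[pip] Could not collect
[conda] cuda91 1.0 h4c16780_0 pytorch
[conda] pytorch-nightly 1.0.0.dev20181015 py3.6_cuda9.0.176_cudnn7.1.2_0 pytorch
[conda] torch 0.4.0
[conda] torchvision 0.2.1
"""

# Names treated as "a PyTorch build" here; this set is an assumption.
TORCH_NAMES = {"torch", "pytorch", "pytorch-nightly"}


def torch_packages(report):
    """Return (name, version) pairs for lines that name a PyTorch build."""
    found = []
    for line in report.splitlines():
        # Expect lines of the form "[pip] name version ..." or "[conda] name version ...".
        match = re.match(r"\[(?:pip|conda)\]\s+(\S+)\s+(\S+)", line)
        if match and match.group(1) in TORCH_NAMES:
            found.append((match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    packages = torch_packages(ENV_REPORT)
    if len(packages) > 1:
        print("More than one PyTorch package listed:", packages)
```

If more than one entry is reported, checking which build Python actually imports (for example via `torch.__version__`) is a reasonable next step before digging further.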
Additional context
By the way, the e2e_mask_rcnn_R-50-FPN_1x.yaml config works fine for me.
Looking forward to your response. Thank you.