-
Notifications
You must be signed in to change notification settings - Fork 7.2k
Description
🐛 Describe the bug
Hi,
I am using your training procedure for object detection https://github.com/pytorch/vision/blob/main/references/detection/train.py
with a custom dataset. When I evaluate my model, the output seems correct for the very first epoch but for the following epochs, the metrics fall to 0.
However, this is not related to the model performance as I can use a checkpoint and evaluate it in another process, which gives back expected values.
From the code, the difference between COCO dataset and a custom dataset happens here:
vision/references/detection/coco_utils.py
Line 203 in f4fd193
return convert_to_coco_api(dataset) |
I suppose that the current behavior is not expected. Have you ever faced a similar issue and how can I correct it ?
Renaud
Versions
Collecting environment information...
PyTorch version: 1.10.1
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: CentOS Linux release 8.2.2004 (Core) (x86_64)
GCC version: (GCC) 8.3.1 20191121 (Red Hat 8.3.1-5)
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.28
Python version: 3.9.7 (default, Sep 16 2021, 13:09:58) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-193.6.3.el8_2.x86_64-x86_64-with-glibc2.28
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 2080 Ti
Nvidia driver version: 450.57
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.10.1
[pip3] torchvision==0.11.2
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.3.1 h2bc3f7f_2
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py39h7f8727e_0
[conda] mkl_fft 1.3.1 py39hd3c417c_0
[conda] mkl_random 1.2.2 py39h51133e4_0
[conda] numpy 1.21.2 py39h20f2e39_0
[conda] numpy-base 1.21.2 py39h79a1101_0
[conda] pytorch 1.10.1 py3.9_cuda11.3_cudnn8.2.0_0 pytorch
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torchvision 0.11.2 py39_cu113 pytorch
cc @datumbox