RuntimeError: CUDA error: out of memory -- although GPU is empty #288

Open
Masum06 opened this issue Jul 19, 2022 · 0 comments

The only reason I can think of is that my CUDA version is 11.7, while the latest available PyTorch build targets CUDA 11.6. Could that be the issue?
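A minimal sketch of how the version concern could be checked. It rests on one general fact: the "CUDA Version" shown by `nvidia-smi` is the highest runtime version the driver supports, and PyTorch binaries ship their own CUDA runtime, so a CUDA 11.6 build on a driver reporting 11.7 is normally fine. The helper names `parse_cuda_version` and `driver_covers_build` are hypothetical, and the comparison is deliberately conservative.

```python
def parse_cuda_version(text):
    """Turn a version string such as '11.6' into a comparable tuple (11, 6)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_covers_build(build_cuda, driver_cuda):
    """Conservative check: the driver's reported CUDA version is >= the build's."""
    return parse_cuda_version(build_cuda) <= parse_cuda_version(driver_cuda)


# Values from this report: PyTorch build for CUDA 11.6, driver reporting CUDA 11.7.
print("driver covers build:", driver_covers_build("11.6", "11.7"))

try:
    import torch  # optional: only used to read the values from the live install
except ImportError:
    torch = None

if torch is not None:
    print("PyTorch:", torch.__version__)
    print("built for CUDA:", torch.version.cuda)  # None on CPU-only builds
    print("CUDA available:", torch.cuda.is_available())
```

If the check passes and `torch.cuda.is_available()` is true, the 11.6 versus 11.7 difference is unlikely to be the cause on its own.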


Log:

```
python tools/test.py \
    --cfg experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml \
    TEST.MODEL_FILE models/pytorch/pose_hrnet_w48_384x288.pth \
    TEST.USE_GT_BBOX False
=> creating log/coco/pose_hrnet/w48_384x288_adam_lr1e-3_2022-07-19-07-56
Namespace(cfg='experiments/coco/hrnet/w48_384x288_adam_lr1e-3.yaml', opts=['TEST.MODEL_FILE', 'models/pytorch/pose_hrnet_w48_384x288.pth', 'TEST.USE_GT_BBOX', 'False'], modelDir='', logDir='', dataDir='', prevModelDir='')
AUTO_RESUME: True
CUDNN:
  BENCHMARK: True
  DETERMINISTIC: False
  ENABLED: True
DATASET:
  COLOR_RGB: True
  DATASET: coco
  DATA_FORMAT: jpg
  FLIP: True
  HYBRID_JOINTS_TYPE: 
  NUM_JOINTS_HALF_BODY: 8
  PROB_HALF_BODY: 0.3
  ROOT: data/coco/
  ROT_FACTOR: 45
  SCALE_FACTOR: 0.35
  SELECT_DATA: False
  TEST_SET: val2017
  TRAIN_SET: train2017
DATA_DIR: 
DEBUG:
  DEBUG: True
  SAVE_BATCH_IMAGES_GT: True
  SAVE_BATCH_IMAGES_PRED: True
  SAVE_HEATMAPS_GT: True
  SAVE_HEATMAPS_PRED: True
GPUS: (0, 1, 2, 3)
LOG_DIR: log
LOSS:
  TOPK: 8
  USE_DIFFERENT_JOINTS_WEIGHT: False
  USE_OHKM: False
  USE_TARGET_WEIGHT: True
MODEL:
  EXTRA:
    FINAL_CONV_KERNEL: 1
    PRETRAINED_LAYERS: ['conv1', 'bn1', 'conv2', 'bn2', 'layer1', 'transition1', 'stage2', 'transition2', 'stage3', 'transition3', 'stage4']
    STAGE2:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4]
      NUM_BRANCHES: 2
      NUM_CHANNELS: [48, 96]
      NUM_MODULES: 1
    STAGE3:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4]
      NUM_BRANCHES: 3
      NUM_CHANNELS: [48, 96, 192]
      NUM_MODULES: 4
    STAGE4:
      BLOCK: BASIC
      FUSE_METHOD: SUM
      NUM_BLOCKS: [4, 4, 4, 4]
      NUM_BRANCHES: 4
      NUM_CHANNELS: [48, 96, 192, 384]
      NUM_MODULES: 3
  HEATMAP_SIZE: [72, 96]
  IMAGE_SIZE: [288, 384]
  INIT_WEIGHTS: True
  NAME: pose_hrnet
  NUM_JOINTS: 17
  PRETRAINED: models/pytorch/imagenet/hrnet_w48-8ef0771d.pth
  SIGMA: 3
  TAG_PER_JOINT: True
  TARGET_TYPE: gaussian
OUTPUT_DIR: output
PIN_MEMORY: True
PRINT_FREQ: 100
RANK: 0
TEST:
  BATCH_SIZE_PER_GPU: 24
  BBOX_THRE: 1.0
  COCO_BBOX_FILE: data/coco/person_detection_results/COCO_val2017_detections_AP_H_56_person.json
  FLIP_TEST: True
  IMAGE_THRE: 0.0
  IN_VIS_THRE: 0.2
  MODEL_FILE: models/pytorch/pose_hrnet_w48_384x288.pth
  NMS_THRE: 1.0
  OKS_THRE: 0.9
  POST_PROCESS: True
  SHIFT_HEATMAP: True
  SOFT_NMS: False
  USE_GT_BBOX: False
TRAIN:
  BATCH_SIZE_PER_GPU: 24
  BEGIN_EPOCH: 0
  CHECKPOINT: 
  END_EPOCH: 210
  GAMMA1: 0.99
  GAMMA2: 0.0
  LR: 0.001
  LR_FACTOR: 0.1
  LR_STEP: [170, 200]
  MOMENTUM: 0.9
  NESTEROV: False
  OPTIMIZER: adam
  RESUME: False
  SHUFFLE: True
  WD: 0.0001
WORKERS: 24
=> loading model from models/pytorch/pose_hrnet_w48_384x288.pth
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
=> classes: ['__background__', 'person']
=> num_images: 5000
=> Total boxes: 104125
=> Total boxes after fliter low score@0.0: 104125
=> load 104125 samples
Traceback (most recent call last):
  File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/test.py", line 130, in <module>
    main()
  File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/test.py", line 125, in main
    validate(cfg, valid_loader, valid_dataset, model, criterion,
  File "/mnt/e/hi_5/deep-high-resolution-net.pytorch/tools/../lib/core/function.py", line 118, in validate
    for i, (input, target, target_weight, meta) in enumerate(val_loader):
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 652, in __next__
    data = self._next_data()
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1347, in _next_data
    return self._process_data(data)
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1373, in _process_data
    data.reraise()
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/_utils.py", line 461, in reraise
    raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 34, in _pin_memory_loop
    data = pin_memory(data, device)
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 65, in pin_memory
    return type(data)([pin_memory(sample, device) for sample in data])  # type: ignore[call-arg]
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 65, in <listcomp>
    return type(data)([pin_memory(sample, device) for sample in data])  # type: ignore[call-arg]
  File "/home/roc-hci/anaconda3/envs/hi5/lib/python3.10/site-packages/torch/utils/data/_utils/pin_memory.py", line 50, in pin_memory
    return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

terminate called without an active exception
```
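The traceback above fails inside `data.pin_memory(device)` in the DataLoader's pin-memory thread. Pinning allocates page-locked host memory rather than GPU memory, so a nearly empty GPU in `nvidia-smi` does not rule this failure out. As a rough sketch, the size of one pinned batch can be estimated from the config dump. The assumptions are that the loader batch size is `BATCH_SIZE_PER_GPU * len(GPUS)`, that the tensors are float32, and that the small `target_weight` tensor can be ignored; the helper name `batch_bytes` is purely illustrative.

```python
from math import prod

BYTES_PER_FLOAT32 = 4


def batch_bytes(batch_size, sample_shape):
    """Bytes needed for one float32 batch of samples with the given shape."""
    return batch_size * prod(sample_shape) * BYTES_PER_FLOAT32


# Values taken from the config dump in the log.
gpus = (0, 1, 2, 3)
batch_size_per_gpu = 24
batch_size = batch_size_per_gpu * len(gpus)  # assumption: how the loader is sized

# IMAGE_SIZE is [288, 384] and HEATMAP_SIZE is [72, 96], with 17 joints.
input_bytes = batch_bytes(batch_size, (3, 384, 288))
target_bytes = batch_bytes(batch_size, (17, 96, 72))

total_mib = (input_bytes + target_bytes) / 2**20
print(f"input batch:  {input_bytes} bytes")
print(f"target batch: {target_bytes} bytes")
print(f"pinned per batch (approx.): {total_mib:.1f} MiB")
```

Under these assumptions a single batch is modest in size, which points toward a limit on page-locked allocations rather than the batch being too large. One low-cost experiment is to rerun with pinning disabled (the dump shows a top-level `PIN_MEMORY: True`) or with fewer `WORKERS`, and see whether the error changes.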


### --------------------------------


> nvidia-smi
```
Tue Jul 19 07:54:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 516.59       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:0A:00.0  On |                  N/A |
|  0%   49C    P8    24W / 370W |    683MiB / 24576MiB |      2%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:0B:00.0 Off |                  N/A |
|  0%   39C    P8    14W / 370W |      0MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
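One detail worth cross-checking: the config dump lists `GPUS: (0, 1, 2, 3)`, while the `nvidia-smi` output shows only GPUs 0 and 1. Whether this is related to the error is not established here; the sketch below only illustrates the comparison. The helper name `missing_gpu_ids` is hypothetical, and the visible count of 2 is read off the table above.

```python
def missing_gpu_ids(configured_ids, visible_count):
    """Return the configured GPU ids that do not exist on this machine."""
    return [gpu_id for gpu_id in configured_ids if gpu_id >= visible_count]


configured_gpus = (0, 1, 2, 3)  # GPUS from the config dump
visible_count = 2               # GPUs listed by nvidia-smi in this report

print("configured but not visible:", missing_gpu_ids(configured_gpus, visible_count))

try:
    import torch  # optional: ask the live install instead of the hard-coded count
except ImportError:
    torch = None

if torch is not None:
    live_count = torch.cuda.device_count()
    print("visible to PyTorch:", live_count)
    print("configured but not visible:", missing_gpu_ids(configured_gpus, live_count))
```

If any ids come back as missing, overriding `GPUS` to match the installed devices would remove one variable before investigating further.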