dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory #8

Closed
vaslamp opened this issue Jul 7, 2020 · 2 comments

vaslamp commented Jul 7, 2020

Could you please help me interpret the following error? Is it a problem with the CUDA version? I am not very experienced, and I would like to understand it so that I can fix it and continue.

WARNING: NVIDIA binaries may not be bound with --writable
�[32m[0706 13:49:52 @voc.py:279]�[0m Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval']
�[32m[0706 13:49:52 @coco.py:271]�[0m Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100']
�[32m[0706 13:49:52 @coco.py:205]�[0m Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100', 'coco_train2017.1@1', 'coco_train2017.1@1-unlabeled', 'coco_train2017.1@2', 'coco_train2017.1@2-unlabeled', 'coco_train2017.1@5', 'coco_train2017.1@5-unlabeled', 'coco_train2017.1@10', 'coco_train2017.1@10-unlabeled', 'coco_train2017.1@20', 'coco_train2017.1@20-unlabeled', 'coco_train2017.1@30', 'coco_train2017.1@30-unlabeled', 'coco_train2017.1@40', 'coco_train2017.1@40-unlabeled', 'coco_train2017.1@50', 'coco_train2017.1@50-unlabeled', 'coco_train2017.2@1', 'coco_train2017.2@1-unlabeled', 'coco_train2017.2@2', 'coco_train2017.2@2-unlabeled', 'coco_train2017.2@5', 'coco_train2017.2@5-unlabeled', 'coco_train2017.2@10', 'coco_train2017.2@10-unlabeled', 'coco_train2017.2@20', 'coco_train2017.2@20-unlabeled', 'coco_train2017.2@30', 'coco_train2017.2@30-unlabeled', 'coco_train2017.2@40', 'coco_train2017.2@40-unlabeled', 'coco_train2017.2@50', 'coco_train2017.2@50-unlabeled', 'coco_train2017.3@1', 'coco_train2017.3@1-unlabeled', 'coco_train2017.3@2', 'coco_train2017.3@2-unlabeled', 'coco_train2017.3@5', 'coco_train2017.3@5-unlabeled', 'coco_train2017.3@10', 'coco_train2017.3@10-unlabeled', 'coco_train2017.3@20', 'coco_train2017.3@20-unlabeled', 'coco_train2017.3@30', 'coco_train2017.3@30-unlabeled', 'coco_train2017.3@40', 'coco_train2017.3@40-unlabeled', 'coco_train2017.3@50', 'coco_train2017.3@50-unlabeled', 'coco_train2017.4@1', 'coco_train2017.4@1-unlabeled', 'coco_train2017.4@2', 'coco_train2017.4@2-unlabeled', 'coco_train2017.4@5', 'coco_train2017.4@5-unlabeled', 'coco_train2017.4@10', 'coco_train2017.4@10-unlabeled', 'coco_train2017.4@20', 'coco_train2017.4@20-unlabeled', 'coco_train2017.4@30', 'coco_train2017.4@30-unlabeled', 'coco_train2017.4@40', 
'coco_train2017.4@40-unlabeled', 'coco_train2017.4@50', 'coco_train2017.4@50-unlabeled', 'coco_train2017.5@1', 'coco_train2017.5@1-unlabeled', 'coco_train2017.5@2', 'coco_train2017.5@2-unlabeled', 'coco_train2017.5@5', 'coco_train2017.5@5-unlabeled', 'coco_train2017.5@10', 'coco_train2017.5@10-unlabeled', 'coco_train2017.5@20', 'coco_train2017.5@20-unlabeled', 'coco_train2017.5@30', 'coco_train2017.5@30-unlabeled', 'coco_train2017.5@40', 'coco_train2017.5@40-unlabeled', 'coco_train2017.5@50', 'coco_train2017.5@50-unlabeled', 'coco_train2017.0@100-extra', 'coco_train2017.0@100-extra-unlabeled', 'coco_unlabeled2017']
�[32m[0706 13:49:52 @coco.py:260]�[0m Register dataset ['VOC2007/instances_trainval', 'VOC2007/instances_test', 'VOC2012/instances_trainval', 'train2017', 'val2017', 'coco_train2017', 'coco_val2017', 'coco_train2014', 'coco_val2014', 'coco_valminusminival2014', 'coco_minival2014', 'coco_val2017_100', 'coco_train2017.1@1', 'coco_train2017.1@1-unlabeled', 'coco_train2017.1@2', 'coco_train2017.1@2-unlabeled', 'coco_train2017.1@5', 'coco_train2017.1@5-unlabeled', 'coco_train2017.1@10', 'coco_train2017.1@10-unlabeled', 'coco_train2017.1@20', 'coco_train2017.1@20-unlabeled', 'coco_train2017.1@30', 'coco_train2017.1@30-unlabeled', 'coco_train2017.1@40', 'coco_train2017.1@40-unlabeled', 'coco_train2017.1@50', 'coco_train2017.1@50-unlabeled', 'coco_train2017.2@1', 'coco_train2017.2@1-unlabeled', 'coco_train2017.2@2', 'coco_train2017.2@2-unlabeled', 'coco_train2017.2@5', 'coco_train2017.2@5-unlabeled', 'coco_train2017.2@10', 'coco_train2017.2@10-unlabeled', 'coco_train2017.2@20', 'coco_train2017.2@20-unlabeled', 'coco_train2017.2@30', 'coco_train2017.2@30-unlabeled', 'coco_train2017.2@40', 'coco_train2017.2@40-unlabeled', 'coco_train2017.2@50', 'coco_train2017.2@50-unlabeled', 'coco_train2017.3@1', 'coco_train2017.3@1-unlabeled', 'coco_train2017.3@2', 'coco_train2017.3@2-unlabeled', 'coco_train2017.3@5', 'coco_train2017.3@5-unlabeled', 'coco_train2017.3@10', 'coco_train2017.3@10-unlabeled', 'coco_train2017.3@20', 'coco_train2017.3@20-unlabeled', 'coco_train2017.3@30', 'coco_train2017.3@30-unlabeled', 'coco_train2017.3@40', 'coco_train2017.3@40-unlabeled', 'coco_train2017.3@50', 'coco_train2017.3@50-unlabeled', 'coco_train2017.4@1', 'coco_train2017.4@1-unlabeled', 'coco_train2017.4@2', 'coco_train2017.4@2-unlabeled', 'coco_train2017.4@5', 'coco_train2017.4@5-unlabeled', 'coco_train2017.4@10', 'coco_train2017.4@10-unlabeled', 'coco_train2017.4@20', 'coco_train2017.4@20-unlabeled', 'coco_train2017.4@30', 'coco_train2017.4@30-unlabeled', 'coco_train2017.4@40', 
'coco_train2017.4@40-unlabeled', 'coco_train2017.4@50', 'coco_train2017.4@50-unlabeled', 'coco_train2017.5@1', 'coco_train2017.5@1-unlabeled', 'coco_train2017.5@2', 'coco_train2017.5@2-unlabeled', 'coco_train2017.5@5', 'coco_train2017.5@5-unlabeled', 'coco_train2017.5@10', 'coco_train2017.5@10-unlabeled', 'coco_train2017.5@20', 'coco_train2017.5@20-unlabeled', 'coco_train2017.5@30', 'coco_train2017.5@30-unlabeled', 'coco_train2017.5@40', 'coco_train2017.5@40-unlabeled', 'coco_train2017.5@50', 'coco_train2017.5@50-unlabeled', 'coco_train2017.0@100-extra', 'coco_train2017.0@100-extra-unlabeled', 'coco_unlabeled2017', 'coco_unlabeledtrainval20class']
[0706 13:49:52 @logger.py:138] Directory '/home/vlamp/Documents/STAC/RESULTS' backuped to '/home/vlamp/Documents/STAC/RESULTS0706-134952'
[0706 13:49:52 @logger.py:92] Argv: /home/vlamp/Documents/STAC/detection/train_stg1_bdd.py --logdir /home/vlamp/Documents/STAC/RESULTS/ --simple_path --config BACKBONE.WEIGHTS=/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz DATA.BASEDIR=/home/vlamp/Documents/STAC/DATA_STAC/coco MODE_MASK=False FRCNN.BATCH_PER_IM=64 PREPROC.TRAIN_SHORT_EDGE_SIZE=[500,800] TRAIN.EVAL_PERIOD=20 TRAIN.AUGTYPE_LAB=default
[0706 13:49:54 @train_stg1_bdd.py:87] Environment Information:


sys.platform linux
Python 3.6.9 (default, Apr 18 2020, 01:56:04) [GCC 8.4.0]
Tensorpack v0.10.1-9-g9c1b1b7b-dirty
Numpy 1.16.4
TensorFlow 1.14.0/v1.14.0-rc1-22-gaf24dc91b5
TF Compiler Version 4.8.5
TF CUDA support True
TF MKL support False
TF XLA support False
Nvidia Driver /.singularity.d/libs/libnvidia-ml.so
CUDA /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243
CUDNN /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
NCCL
CUDA_VISIBLE_DEVICES 0,1
GPU 0,1 Tesla T4
Free RAM 369.15/376.54 GB
CPU Count 40
cv2 4.2.0
msgpack 1.0.0
python-prctl False
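
The environment table above reports TensorFlow 1.14.0 next to a CUDA 10.1 runtime (`libcudart.so.10.1.243`), while the error in the issue title asks for `libcublas.so.10.0`. One plausible reading is that this TensorFlow build expects CUDA 10.0 libraries and the container only provides 10.1. The sketch below just compares the version numbers embedded in the two names; `soname_version` and `same_cuda_version` are hypothetical helper names, not part of any library.

```python
import re

def soname_version(name):
    """Return the (major, minor) version that follows '.so.' in a library name, or None."""
    match = re.search(r"\.so\.(\d+)\.(\d+)", name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def same_cuda_version(requested, installed):
    """True if both names carry a version and the (major, minor) pairs are equal."""
    req, inst = soname_version(requested), soname_version(installed)
    return req is not None and req == inst

# Library requested in the issue title vs. the runtime path from the log.
requested = "libcublas.so.10.0"
installed = "/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.243"

print(soname_version(requested), soname_version(installed))
print("versions agree:", same_cuda_version(requested, installed))
```

If the two versions disagree, as they do for the names above, a CUDA version mismatch becomes a likely explanation for the missing library.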


list(_C.DATA.TRAIN) = ['train2017']
list(_C.DATA.VAL) = ('val2017',)
datasets = ['train2017', 'val2017']
_C.DATA.CLASS_NAMES = ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle']
[0706 13:49:54 @config.py:352] Config: ------------------------------------------
{'BACKBONE': {'FREEZE_AFFINE': False,
'FREEZE_AT': 2,
'NORM': 'FreezeBN',
'RESNET_NUM_BLOCKS': [3, 4, 6, 3],
'STRIDE_1X1': False,
'TF_PAD_MODE': False,
'WEIGHTS': '/home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz'},
'CASCADE': {'BBOX_REG_WEIGHTS': [[10.0, 10.0, 5.0, 5.0], [20.0, 20.0, 10.0, 10.0],
[30.0, 30.0, 15.0, 15.0]],
'IOUS': [0.5, 0.6, 0.7]},
'DATA': {'ABSOLUTE_COORD': True,
'BASEDIR': '/home/vlamp/Documents/STAC/DATA_STAC/coco',
'CLASS_NAMES': ['BG', 'car', 'pedestrian', 'big vehicle', 'bicycle', 'motorcycle'],
'NUM_CATEGORY': 5,
'NUM_WORKERS': 24,
'TRAIN': ('train2017',),
'UNLABEL': ('',),
'VAL': ('val2017',)},
'EVAL': {'PSEUDO_INFERENCE': False},
'FPN': {'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDES': (4, 8, 16, 32, 64),
'CASCADE': False,
'FRCNN_CONV_HEAD_DIM': 256,
'FRCNN_FC_HEAD_DIM': 1024,
'FRCNN_HEAD_FUNC': 'fastrcnn_2fc_head',
'MRCNN_HEAD_FUNC': 'maskrcnn_up4conv_head',
'NORM': 'None',
'NUM_CHANNEL': 256,
'PROPOSAL_MODE': 'Level',
'RESOLUTION_REQUIREMENT': 32},
'FRCNN': {'BATCH_PER_IM': 64,
'BBOX_REG_WEIGHTS': [10.0, 10.0, 5.0, 5.0],
'FG_RATIO': 0.25,
'FG_THRESH': 0.5},
'MODE_FPN': True,
'MODE_MASK': False,
'MRCNN': {'ACCURATE_PASTE': True, 'HEAD_DIM': 256},
'PREPROC': {'MAX_SIZE': 1344.0,
'PIXEL_MEAN': [123.675, 116.28, 103.53],
'PIXEL_STD': [58.395, 57.12, 57.375],
'TEST_SHORT_EDGE_SIZE': 800,
'TRAIN_SHORT_EDGE_SIZE': [500, 800]},
'RPN': {'ANCHOR_RATIOS': (0.5, 1.0, 2.0),
'ANCHOR_SIZES': (32, 64, 128, 256, 512),
'ANCHOR_STRIDE': 16,
'BATCH_PER_IM': 256,
'CROWD_OVERLAP_THRESH': 9.99,
'FG_RATIO': 0.5,
'HEAD_DIM': 1024,
'MIN_SIZE': 0,
'NEGATIVE_ANCHOR_THRESH': 0.3,
'NUM_ANCHOR': 15,
'POSITIVE_ANCHOR_THRESH': 0.7,
'PROPOSAL_NMS_THRESH': 0.7,
'TEST_PER_LEVEL_NMS_TOPK': 1000,
'TEST_POST_NMS_TOPK': 1000,
'TEST_PRE_NMS_TOPK': 6000,
'TRAIN_PER_LEVEL_NMS_TOPK': 2000,
'TRAIN_POST_NMS_TOPK': 2000,
'TRAIN_PRE_NMS_TOPK': 12000},
'TEST': {'FRCNN_NMS_THRESH': 0.5,
'RESULTS_PER_IM': 100,
'RESULT_SCORE_THRESH': 0.05,
'RESULT_SCORE_THRESH_VIS': 0.5},
'TRAIN': {'AUGTYPE': 'strong',
'AUGTYPE_LAB': 'default',
'BASE_LR': 0.01,
'CHECKPOINT_PERIOD': 20,
'CONFIDENCE': 0.9,
'EVAL_PERIOD': 20,
'GAMMA': 0.1,
'LR_SCHEDULE': [120000, 160000, 180000],
'NO_PRN_LOSS': False,
'NUM_GPUS': 2,
'STAGE': 1,
'STARTING_EPOCH': 1,
'STEPS_PER_EPOCH': 500,
'WARMUP': 1000,
'WARMUP_INIT_LR': 0.0033000000000000004,
'WEIGHT_DECAY': 0.0001,
'WU': 2.0},
'TRAINER': 'replicated'}
[0706 13:49:54 @train_stg1_bdd.py:106] Warm Up Schedule (steps, value): [(0, 0.0033000000000000004), (1000, 0.01)]
[0706 13:49:54 @train_stg1_bdd.py:107] LR Schedule (epochs, value): [(2, 0.01), (960.0, 0.001), (1280.0, 0.00010000000000000002)]
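
The two schedule lines above appear to follow from the config values printed earlier (`BASE_LR`, `WARMUP`, `WARMUP_INIT_LR`, `STEPS_PER_EPOCH`, `LR_SCHEDULE`, `GAMMA`, `NUM_GPUS`). The sketch below reproduces the logged epochs under two assumptions inferred from the numbers rather than confirmed from the training script: the step counts in `LR_SCHEDULE` are defined for 8 GPUs and rescaled by `8 / NUM_GPUS`, and the last entry of `LR_SCHEDULE` marks the end of training rather than a decay point.

```python
# Values copied from the config dump in this log.
BASE_LR = 0.01
WARMUP = 1000                      # warm-up length in steps
WARMUP_INIT_LR = 0.0033
STEPS_PER_EPOCH = 500
LR_SCHEDULE = [120000, 160000, 180000]
NUM_GPUS = 2
GAMMA = 0.1

def warmup_schedule():
    """(step, lr) pairs: start at the initial warm-up rate, reach BASE_LR after WARMUP steps."""
    return [(0, WARMUP_INIT_LR), (WARMUP, BASE_LR)]

def epoch_schedule():
    """(epoch, lr) pairs after warm-up; each decay point multiplies the rate by GAMMA."""
    factor = 8.0 / NUM_GPUS        # assumption: schedule is specified for 8 GPUs
    first_epoch = int(WARMUP / STEPS_PER_EPOCH + 0.5)
    schedule = [(first_epoch, BASE_LR)]
    # Assumption: the last entry of LR_SCHEDULE is the total step count, not a decay point.
    for idx, steps in enumerate(LR_SCHEDULE[:-1]):
        epoch = steps * factor // STEPS_PER_EPOCH
        schedule.append((epoch, BASE_LR * GAMMA ** (idx + 1)))
    return schedule

print(warmup_schedule())
print(epoch_schedule())
```

With these inputs the decay epochs come out as 2, 960.0 and 1280.0, which matches the logged schedule; the learning rates agree with the log up to floating-point rounding.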
loading annotations into memory...
Done (t=5.18s)
creating index...
index created!
�[32m[0706 13:49:59 @coco.py:60]�[0m Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_train2017.json.

100%|##########| 69403/69403 [00:03<00:00, 20915.84it/s]
[0706 13:50:03 @timer.py:45] Load annotations for instances_train2017.json finished, time:3.3659 sec.
[0706 13:50:05 @data.py:79] Ground-Truth category distribution:
| class | #box | class | #box | class | #box |
|:-------:|:-------|:----------:|:-------|:-----------:|:-------|
| car | 713210 | pedestrian | 91349 | big vehicle | 41643 |
| bicycle | 7210 | motorcycle | 3002 | | |
| total | 856414 | | | | |
[0706 13:50:05 @data.py:416] Filtered 0 images which contain no non-crowd groudtruth boxes. Total #images for training: 69403
[0706 13:50:05 @augmentation.py:171] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @augmentation.py:172] Augmentation type default: []
[0706 13:50:05 @augmentation.py:173] ----------------------------------------------------------------------------------------------------
[0706 13:50:05 @data.py:107] Use affine-enabled TrainingDataPreprocessor_aug
[0706 13:50:05 @train_stg1_bdd.py:112] Total passes of the training set is: 20.748
[0706 13:50:05 @sessinit.py:294] Loading dictionary from /home/vlamp/Documents/STAC/DATA_STAC/coco/ImageNet-R50-AlignPadding.npz ...
[0706 13:50:06 @training.py:48] [DataParallel] Training a model of 2 towers.
[0706 13:50:06 @interface.py:41] Automatically applying StagingInput on the DataFlow.
[0706 13:50:06 @input_source.py:221] Setting up the queue 'QueueInput/input_queue' for CPU prefetching ...
[0706 13:50:06 @training.py:108] Building graph for training tower 0 on device /gpu:0 ...
[0706 13:50:06 @argtools.py:138] WRN Some BatchNorm layer uses moving_mean/moving_variance in training.
�[32m[0706 13:50:06 @registry.py:90]�[0m 'conv0': [1, 3, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'pool0': [1, 64, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block0/conv1': [1, 64, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block0/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block0/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block0/convshortcut': [1, 64, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block1/conv1': [1, 256, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block1/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block1/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block2/conv1': [1, 256, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block2/conv2': [1, 64, ?, ?] --> [1, 64, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group0/block2/conv3': [1, 64, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:06 @registry.py:90]�[0m 'group1/block0/conv1': [1, 256, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block0/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block0/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block0/convshortcut': [1, 256, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block1/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block1/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block1/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block2/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block2/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block2/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block3/conv1': [1, 512, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block3/conv2': [1, 128, ?, ?] --> [1, 128, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group1/block3/conv3': [1, 128, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block0/conv1': [1, 512, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block0/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block0/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block0/convshortcut': [1, 512, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block1/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block1/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block1/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block2/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block2/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block2/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block3/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block3/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block3/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block4/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block4/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block4/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block5/conv1': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block5/conv2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group2/block5/conv3': [1, 256, ?, ?] --> [1, 1024, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block0/conv1': [1, 1024, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block0/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block0/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block0/convshortcut': [1, 1024, ?, ?] --> [1, 2048, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block1/conv1': [1, 2048, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block1/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block1/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block2/conv1': [1, 2048, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block2/conv2': [1, 512, ?, ?] --> [1, 512, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'group3/block2/conv3': [1, 512, ?, ?] --> [1, 2048, ?, ?]
�[32m[0706 13:50:07 @registry.py:80]�[0m 'fpn' input: [1, 256, ?, ?], [1, 512, ?, ?], [1, 1024, ?, ?], [1, 2048, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/lateral_1x1_c2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/lateral_1x1_c3': [1, 512, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/lateral_1x1_c4': [1, 1024, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/lateral_1x1_c5': [1, 2048, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/upsample_lat5': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:07 @registry.py:90]�[0m 'fpn/upsample_lat4': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/upsample_lat3': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/posthoc_3x3_p2': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/posthoc_3x3_p3': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/posthoc_3x3_p4': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/posthoc_3x3_p5': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'fpn/maxpool_p6': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:93]�[0m 'fpn' output: [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?], [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:80]�[0m 'rpn' input: [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'rpn/conv0': [1, 256, ?, ?] --> [1, 256, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'rpn/class': [1, 256, ?, ?] --> [1, 3, ?, ?]
�[32m[0706 13:50:08 @registry.py:90]�[0m 'rpn/box': [1, 256, ?, ?] --> [1, 12, ?, ?]
�[32m[0706 13:50:08 @registry.py:93]�[0m 'rpn' output: [?, ?, 3], [?, ?, 3, 4]
�[32m[0706 13:50:09 @registry.py:80]�[0m 'fastrcnn' input: [?, 256, 7, 7]
�[32m[0706 13:50:10 @registry.py:90]�[0m 'fastrcnn/fc6': [?, 256, 7, 7] --> [?, 1024]
�[32m[0706 13:50:10 @registry.py:90]�[0m 'fastrcnn/fc7': [?, 1024] --> [?, 1024]
�[32m[0706 13:50:10 @registry.py:93]�[0m 'fastrcnn' output: [?, 1024]
�[32m[0706 13:50:10 @registry.py:80]�[0m 'fastrcnn/outputs' input: [?, 1024]
�[32m[0706 13:50:10 @registry.py:90]�[0m 'fastrcnn/outputs/class': [?, 1024] --> [?, 6]
�[32m[0706 13:50:10 @registry.py:90]�[0m 'fastrcnn/outputs/box': [?, 1024] --> [?, 24]
�[32m[0706 13:50:10 @registry.py:93]�[0m 'fastrcnn/outputs' output: [?, 6], [?, 6, 4]
�[32m[0706 13:50:10 @regularize.py:97]�[0m regularize_cost() found 57 variables to regularize.
�[32m[0706 13:50:10 @regularize.py:21]�[0m The following tensors will be regularized: group1/block0/conv1/W:0, group1/block0/conv2/W:0, group1/block0/conv3/W:0, group1/block0/convshortcut/W:0, group1/block1/conv1/W:0, group1/block1/conv2/W:0, group1/block1/conv3/W:0, group1/block2/conv1/W:0, group1/block2/conv2/W:0, group1/block2/conv3/W:0, group1/block3/conv1/W:0, group1/block3/conv2/W:0, group1/block3/conv3/W:0, group2/block0/conv1/W:0, group2/block0/conv2/W:0, group2/block0/conv3/W:0, group2/block0/convshortcut/W:0, group2/block1/conv1/W:0, group2/block1/conv2/W:0, group2/block1/conv3/W:0, group2/block2/conv1/W:0, group2/block2/conv2/W:0, group2/block2/conv3/W:0, group2/block3/conv1/W:0, group2/block3/conv2/W:0, group2/block3/conv3/W:0, group2/block4/conv1/W:0, group2/block4/conv2/W:0, group2/block4/conv3/W:0, group2/block5/conv1/W:0, group2/block5/conv2/W:0, group2/block5/conv3/W:0, group3/block0/conv1/W:0, group3/block0/conv2/W:0, group3/block0/conv3/W:0, group3/block0/convshortcut/W:0, group3/block1/conv1/W:0, group3/block1/conv2/W:0, group3/block1/conv3/W:0, group3/block2/conv1/W:0, group3/block2/conv2/W:0, group3/block2/conv3/W:0, fpn/lateral_1x1_c2/W:0, fpn/lateral_1x1_c3/W:0, fpn/lateral_1x1_c4/W:0, fpn/lateral_1x1_c5/W:0, fpn/posthoc_3x3_p2/W:0, fpn/posthoc_3x3_p3/W:0, fpn/posthoc_3x3_p4/W:0, fpn/posthoc_3x3_p5/W:0, rpn/conv0/W:0, rpn/class/W:0, rpn/box/W:0, fastrcnn/fc6/W:0, fastrcnn/fc7/W:0, fastrcnn/outputs/class/W:0, fastrcnn/outputs/box/W:0
[0706 13:50:12 @training.py:108] Building graph for training tower 1 on device /gpu:1 ...
[0706 13:50:14 @regularize.py:97] regularize_cost() found 57 variables to regularize.
[0706 13:50:16 @collection.py:152] Size of these collections were changed in tower1: (tf.GraphKeys.MODEL_VARIABLES: 161->194)
[0706 13:50:16 @collection.py:165] These collections were modified but restored in tower1: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:20 @training.py:350] 'sync_variables_from_main_tower' includes 607 operations.
[0706 13:50:20 @model_utils.py:67] List of Trainable Variables:
name shape #elements


group1/block0/conv1/W [1, 1, 256, 128] 32768
group1/block0/conv1/bn/gamma [128] 128
group1/block0/conv1/bn/beta [128] 128
group1/block0/conv2/W [3, 3, 128, 128] 147456
group1/block0/conv2/bn/gamma [128] 128
group1/block0/conv2/bn/beta [128] 128
group1/block0/conv3/W [1, 1, 128, 512] 65536
group1/block0/conv3/bn/gamma [512] 512
group1/block0/conv3/bn/beta [512] 512
group1/block0/convshortcut/W [1, 1, 256, 512] 131072
group1/block0/convshortcut/bn/gamma [512] 512
group1/block0/convshortcut/bn/beta [512] 512
group1/block1/conv1/W [1, 1, 512, 128] 65536
group1/block1/conv1/bn/gamma [128] 128
group1/block1/conv1/bn/beta [128] 128
group1/block1/conv2/W [3, 3, 128, 128] 147456
group1/block1/conv2/bn/gamma [128] 128
group1/block1/conv2/bn/beta [128] 128
group1/block1/conv3/W [1, 1, 128, 512] 65536
group1/block1/conv3/bn/gamma [512] 512
group1/block1/conv3/bn/beta [512] 512
group1/block2/conv1/W [1, 1, 512, 128] 65536
group1/block2/conv1/bn/gamma [128] 128
group1/block2/conv1/bn/beta [128] 128
group1/block2/conv2/W [3, 3, 128, 128] 147456
group1/block2/conv2/bn/gamma [128] 128
group1/block2/conv2/bn/beta [128] 128
group1/block2/conv3/W [1, 1, 128, 512] 65536
group1/block2/conv3/bn/gamma [512] 512
group1/block2/conv3/bn/beta [512] 512
group1/block3/conv1/W [1, 1, 512, 128] 65536
group1/block3/conv1/bn/gamma [128] 128
group1/block3/conv1/bn/beta [128] 128
group1/block3/conv2/W [3, 3, 128, 128] 147456
group1/block3/conv2/bn/gamma [128] 128
group1/block3/conv2/bn/beta [128] 128
group1/block3/conv3/W [1, 1, 128, 512] 65536
group1/block3/conv3/bn/gamma [512] 512
group1/block3/conv3/bn/beta [512] 512
group2/block0/conv1/W [1, 1, 512, 256] 131072
group2/block0/conv1/bn/gamma [256] 256
group2/block0/conv1/bn/beta [256] 256
group2/block0/conv2/W [3, 3, 256, 256] 589824
group2/block0/conv2/bn/gamma [256] 256
group2/block0/conv2/bn/beta [256] 256
group2/block0/conv3/W [1, 1, 256, 1024] 262144
group2/block0/conv3/bn/gamma [1024] 1024
group2/block0/conv3/bn/beta [1024] 1024
group2/block0/convshortcut/W [1, 1, 512, 1024] 524288
group2/block0/convshortcut/bn/gamma [1024] 1024
group2/block0/convshortcut/bn/beta [1024] 1024
group2/block1/conv1/W [1, 1, 1024, 256] 262144
group2/block1/conv1/bn/gamma [256] 256
group2/block1/conv1/bn/beta [256] 256
group2/block1/conv2/W [3, 3, 256, 256] 589824
group2/block1/conv2/bn/gamma [256] 256
group2/block1/conv2/bn/beta [256] 256
group2/block1/conv3/W [1, 1, 256, 1024] 262144
group2/block1/conv3/bn/gamma [1024] 1024
group2/block1/conv3/bn/beta [1024] 1024
group2/block2/conv1/W [1, 1, 1024, 256] 262144
group2/block2/conv1/bn/gamma [256] 256
group2/block2/conv1/bn/beta [256] 256
group2/block2/conv2/W [3, 3, 256, 256] 589824
group2/block2/conv2/bn/gamma [256] 256
group2/block2/conv2/bn/beta [256] 256
group2/block2/conv3/W [1, 1, 256, 1024] 262144
group2/block2/conv3/bn/gamma [1024] 1024
group2/block2/conv3/bn/beta [1024] 1024
group2/block3/conv1/W [1, 1, 1024, 256] 262144
group2/block3/conv1/bn/gamma [256] 256
group2/block3/conv1/bn/beta [256] 256
group2/block3/conv2/W [3, 3, 256, 256] 589824
group2/block3/conv2/bn/gamma [256] 256
group2/block3/conv2/bn/beta [256] 256
group2/block3/conv3/W [1, 1, 256, 1024] 262144
group2/block3/conv3/bn/gamma [1024] 1024
group2/block3/conv3/bn/beta [1024] 1024
group2/block4/conv1/W [1, 1, 1024, 256] 262144
group2/block4/conv1/bn/gamma [256] 256
group2/block4/conv1/bn/beta [256] 256
group2/block4/conv2/W [3, 3, 256, 256] 589824
group2/block4/conv2/bn/gamma [256] 256
group2/block4/conv2/bn/beta [256] 256
group2/block4/conv3/W [1, 1, 256, 1024] 262144
group2/block4/conv3/bn/gamma [1024] 1024
group2/block4/conv3/bn/beta [1024] 1024
group2/block5/conv1/W [1, 1, 1024, 256] 262144
group2/block5/conv1/bn/gamma [256] 256
group2/block5/conv1/bn/beta [256] 256
group2/block5/conv2/W [3, 3, 256, 256] 589824
group2/block5/conv2/bn/gamma [256] 256
group2/block5/conv2/bn/beta [256] 256
group2/block5/conv3/W [1, 1, 256, 1024] 262144
group2/block5/conv3/bn/gamma [1024] 1024
group2/block5/conv3/bn/beta [1024] 1024
group3/block0/conv1/W [1, 1, 1024, 512] 524288
group3/block0/conv1/bn/gamma [512] 512
group3/block0/conv1/bn/beta [512] 512
group3/block0/conv2/W [3, 3, 512, 512] 2359296
group3/block0/conv2/bn/gamma [512] 512
group3/block0/conv2/bn/beta [512] 512
group3/block0/conv3/W [1, 1, 512, 2048] 1048576
group3/block0/conv3/bn/gamma [2048] 2048
group3/block0/conv3/bn/beta [2048] 2048
group3/block0/convshortcut/W [1, 1, 1024, 2048] 2097152
group3/block0/convshortcut/bn/gamma [2048] 2048
group3/block0/convshortcut/bn/beta [2048] 2048
group3/block1/conv1/W [1, 1, 2048, 512] 1048576
group3/block1/conv1/bn/gamma [512] 512
group3/block1/conv1/bn/beta [512] 512
group3/block1/conv2/W [3, 3, 512, 512] 2359296
group3/block1/conv2/bn/gamma [512] 512
group3/block1/conv2/bn/beta [512] 512
group3/block1/conv3/W [1, 1, 512, 2048] 1048576
group3/block1/conv3/bn/gamma [2048] 2048
group3/block1/conv3/bn/beta [2048] 2048
group3/block2/conv1/W [1, 1, 2048, 512] 1048576
group3/block2/conv1/bn/gamma [512] 512
group3/block2/conv1/bn/beta [512] 512
group3/block2/conv2/W [3, 3, 512, 512] 2359296
group3/block2/conv2/bn/gamma [512] 512
group3/block2/conv2/bn/beta [512] 512
group3/block2/conv3/W [1, 1, 512, 2048] 1048576
group3/block2/conv3/bn/gamma [2048] 2048
group3/block2/conv3/bn/beta [2048] 2048
fpn/lateral_1x1_c2/W [1, 1, 256, 256] 65536
fpn/lateral_1x1_c2/b [256] 256
fpn/lateral_1x1_c3/W [1, 1, 512, 256] 131072
fpn/lateral_1x1_c3/b [256] 256
fpn/lateral_1x1_c4/W [1, 1, 1024, 256] 262144
fpn/lateral_1x1_c4/b [256] 256
fpn/lateral_1x1_c5/W [1, 1, 2048, 256] 524288
fpn/lateral_1x1_c5/b [256] 256
fpn/posthoc_3x3_p2/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p2/b [256] 256
fpn/posthoc_3x3_p3/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p3/b [256] 256
fpn/posthoc_3x3_p4/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p4/b [256] 256
fpn/posthoc_3x3_p5/W [3, 3, 256, 256] 589824
fpn/posthoc_3x3_p5/b [256] 256
rpn/conv0/W [3, 3, 256, 256] 589824
rpn/conv0/b [256] 256
rpn/class/W [1, 1, 256, 3] 768
rpn/class/b [3] 3
rpn/box/W [1, 1, 256, 12] 3072
rpn/box/b [12] 12
fastrcnn/fc6/W [12544, 1024] 12845056
fastrcnn/fc6/b [1024] 1024
fastrcnn/fc7/W [1024, 1024] 1048576
fastrcnn/fc7/b [1024] 1024
fastrcnn/outputs/class/W [1024, 6] 6144
fastrcnn/outputs/class/b [6] 6
fastrcnn/outputs/box/W [1024, 24] 24576
fastrcnn/outputs/box/b [24] 24
Number of trainable variables: 156
Number of parameters (elements): 41147437
Storage space needed for all trainable variables: 156.97MB
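
The storage figure above is consistent with the parameter count if each element is stored as a 4-byte float32 and "MB" means 2^20 bytes. Both are assumptions inferred from the numbers, and the helper names below are hypothetical.

```python
def num_elements(shape):
    """Number of elements in a tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count

def storage_mb(elements, bytes_per_element=4):
    """Storage in MB (2**20 bytes), assuming float32 unless told otherwise."""
    return elements * bytes_per_element / 1024 ** 2

# Largest single variable in the table: fastrcnn/fc6/W with shape [12544, 1024].
print(num_elements([12544, 1024]))

# Total parameter count taken from the log.
print(f"{storage_mb(41147437):.2f}MB")
```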
[0706 13:50:20 @base.py:207] Setup callbacks graph ...

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_util.py:93: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
[0706 13:50:27 @argtools.py:138] WRN "import prctl" failed! Install python-prctl so that processes can be cleaned with guarantee.
[0706 13:50:29 @prof.py:291] [HostMemoryTracker] Free RAM in setup_graph() is 364.27 GB.
[0706 13:50:29 @tower.py:135] Building graph for predict tower 'tower-pred-0' on device /gpu:0 ...
[0706 13:50:30 @collection.py:152] Size of these collections were changed in tower-pred-0: (tf.GraphKeys.MODEL_VARIABLES: 194->227)
[0706 13:50:30 @collection.py:165] These collections were modified but restored in tower-pred-0: (tf.GraphKeys.SUMMARIES: 76->77)
[0706 13:50:30 @tower.py:135] Building graph for predict tower 'tower-pred-1' on device /gpu:1 with variable scope 'tower1'...
[0706 13:50:31 @collection.py:152] Size of these collections were changed in tower-pred-1: (tf.GraphKeys.MODEL_VARIABLES: 227->260)
[0706 13:50:31 @collection.py:165] These collections were modified but restored in tower-pred-1: (tf.GraphKeys.SUMMARIES: 76->77)
loading annotations into memory...
Done (t=0.75s)
creating index...
index created!
[0706 13:50:31 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.

0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 725119.19it/s]
[0706 13:50:31 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0151 sec.
[0706 13:50:31 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.83s)
creating index...
index created!
[0706 13:50:32 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.

0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 739211.43it/s]
[0706 13:50:32 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0150 sec.
[0706 13:50:32 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.82s)
creating index...
index created!
[0706 13:50:33 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.

0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 744062.40it/s]
[0706 13:50:33 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0149 sec.
[0706 13:50:33 @data.py:456] Found 9921 images for inference.
loading annotations into memory...
Done (t=0.77s)
creating index...
index created!
[0706 13:50:34 @coco.py:60] Instances loaded from /home/vlamp/Documents/STAC/DATA_STAC/coco/annotations/instances_val2017.json.

0%| | 0/9921 [00:00<?, ?it/s]
100%|##########| 9921/9921 [00:00<00:00, 713481.88it/s]
[0706 13:50:34 @timer.py:45] Load annotations for instances_val2017.json finished, time:0.0153 sec.
[0706 13:50:34 @data.py:456] Found 9921 images for inference.
[0706 13:50:34 @summary.py:47] [MovingAverageSummary] 73 operations in collection 'MOVING_SUMMARY_OPS' will be run with session hooks.
[0706 13:50:34 @summary.py:94] Summarizing collection 'summaries' of size 76.
[0706 13:50:34 @base.py:228] Creating the session ...
2020-07-06 13:50:34.737615: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-07-06 13:50:34.743032: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-07-06 13:50:34.887781: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14c78d20 executing computations on platform CUDA. Devices:
2020-07-06 13:50:34.887822: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.887827: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): Tesla T4, Compute Capability 7.5
2020-07-06 13:50:34.890055: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2494125000 Hz
2020-07-06 13:50:34.893901: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x14a0c4f0 executing computations on platform Host. Devices:
2020-07-06 13:50:34.893919: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-07-06 13:50:34.896069: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:3b:00.0
2020-07-06 13:50:34.896771: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 1 with properties:
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:d8:00.0
2020-07-06 13:50:34.897783: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898069: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898242: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898401: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898538: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.898705: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/cm/shared/apps/slurm/current/lib64:/cm/shared/apps/slurm/current/lib64/slurm:/.singularity.d/libs:/usr/local/cuda-10.0/lib64/
2020-07-06 13:50:34.901746: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-07-06 13:50:34.901764: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-07-06 13:50:34.901834: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-07-06 13:50:34.901840: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0 1
2020-07-06 13:50:34.901845: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N Y
2020-07-06 13:50:34.901848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 1: Y N

MultiProcessMapDataZMQ successfully cleaned-up.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1339, in _run_fn
self._extend_graph()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1374, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by {{node AllReduceGrads/NcclAllReduce}}with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'

 [[AllReduceGrads/NcclAllReduce]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/vlamp/Documents/STAC/detection/train_stg1_bdd.py", line 180, in <module>
launch_train_with_config(traincfg, trainer)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/interface.py", line 99, in launch_train_with_config
extra_callbacks=config.extra_callbacks)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 342, in train_with_defaults
steps_per_epoch, starting_epoch, max_epoch)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 313, in train
self.initialize(session_creator, session_init)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/tower.py", line 147, in initialize
super(TowerTrainer, self).initialize(session_creator, session_init)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/utils/argtools.py", line 168, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorpack/train/base.py", line 230, in initialize
self.sess = session_creator.create_session()
File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 88, in create_session
run(tf.global_variables_initializer())
File "/usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/sesscreate.py", line 86, in run
sess.run(op)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 950, in run
run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1173, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1350, in _do_run
run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1370, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'NcclAllReduce' used by node AllReduceGrads/NcclAllReduce (defined at usr/local/lib/python3.6/dist-packages/tensorpack/graph_builder/utils.py:154) with these attrs: [shared_name="c0", T=DT_FLOAT, num_devices=2, reduction="sum"]
Registered devices: [CPU, XLA_CPU, XLA_GPU]
Registered kernels:
device='GPU'

 [[AllReduceGrads/NcclAllReduce]]

Errors may have originated from an input operation.
Input Source operations connected to node AllReduceGrads/NcclAllReduce:
tower0/gradients/AddN_126 (defined at usr/local/lib/python3.6/dist-packages/tensorpack/tfutils/optimizer.py:29)
/cm/local/apps/slurm/var/spool/job18434303/slurm_script: line 29: t: command not found

@zizhaozhang
Collaborator

This looks like a CUDA version mismatch. Please check that your TensorFlow version is 1.14 and that the installed CUDA version is compatible with it.
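
One way to confirm which libraries are missing, independently of TensorFlow, is to try loading them directly. This is only a sketch: the library names are copied from the "Could not dlopen library" lines in the log above, `find_missing_libs` is a hypothetical helper, and it assumes Linux, where `ctypes.CDLL` goes through the same dynamic loader search (including `LD_LIBRARY_PATH`) that produced the `dlerror` messages.

```python
import ctypes

# Library names taken from the "Could not dlopen library" lines in the log.
CUDA_10_0_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
]


def find_missing_libs(names):
    """Return the subset of `names` that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library cannot be loaded
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing_libs(CUDA_10_0_LIBS):
        print("cannot load:", name)
```

It is probably most useful when run in the same environment as the training script (same container, same Slurm job), since `LD_LIBRARY_PATH` can differ between shells.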

@sisrfeng

sisrfeng commented Dec 23, 2020

I ran into this as well:

2020-12-23 10:18:39.085280: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/cuda/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64

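But
[image]

From https://www.tensorflow.org/install/gpu:
The Chinese version of the official site recommends CUDA 10.1
[image]
The English version of the official site recommends CUDA 11
[image]
(Is the Chinese site lagging behind the English one?)
TensorFlow > 1.13 should go with CUDA 10.1.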
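
If the loader asks for `libcudart.so.10.0` but a different CUDA release is installed, listing the runtime libraries found on `LD_LIBRARY_PATH` should make the mismatch visible. A rough sketch, assuming Linux and the usual `libcudart.so.<version>` file naming; `cudart_versions_on_path` is a hypothetical helper and not part of TensorFlow.

```python
import glob
import os


def cudart_versions_on_path(ld_library_path):
    """Map each directory in an LD_LIBRARY_PATH string to the libcudart files in it."""
    found = {}
    seen = set()
    for directory in ld_library_path.split(":"):
        # Skip empty entries (e.g. from a leading ':') and repeated directories.
        if not directory or directory in seen:
            continue
        seen.add(directory)
        matches = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(directory, "libcudart.so*"))
        )
        if matches:
            found[directory] = matches
    return found


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for directory, libs in cudart_versions_on_path(current).items():
        print(directory, libs)
```

If the output shows only, say, a 10.1 runtime while TensorFlow is looking for 10.0, then either the CUDA toolkit or the TensorFlow build needs to change so that the two agree.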
