What is your environment for testing your model? #64

Closed
lxtGH opened this issue Oct 7, 2021 · 11 comments

@lxtGH

lxtGH commented Oct 7, 2021

Dear authors,
Hi! Thanks for open-sourcing this repo.
I ran into several problems when running it.

I got stuck when running the evaluation (see issue #58).
image

I made the following change to run your model.
9052315f1a86bd8764c659f0fc32726
I downloaded the Motion-DeepLab checkpoint from this repo and ran the evaluation,
but the results are nearly zero.
image

Has anyone successfully run this repo?

I suspect it may be an environment problem.

I use an RTX 3090 with TF 2.5 and CUDA 11.1.
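
When comparing two setups like this, it can help to print exactly what TensorFlow was built against and which GPUs it sees. The following is a minimal sketch, not part of the repo; the function name `environment_report` is hypothetical, and it assumes a TensorFlow 2.x version that provides `tf.sysconfig.get_build_info()` and `tf.config.experimental.get_device_details()`.

```python
import platform
import sys


def environment_report():
    """Collect version info that is useful when comparing two setups."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "tensorflow": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter; report that and stop.
        return report

    report["tensorflow"] = tf.__version__
    # CUDA/cuDNN versions the installed TensorFlow binary was built against
    # (these keys are absent on CPU-only builds, hence .get()).
    build = tf.sysconfig.get_build_info()
    report["built_with_cuda"] = build.get("cuda_version")
    report["built_with_cudnn"] = build.get("cudnn_version")
    # GPUs that TensorFlow can actually see at runtime.
    report["gpus"] = [
        tf.config.experimental.get_device_details(gpu)
        for gpu in tf.config.list_physical_devices("GPU")
    ]
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Posting this output next to the full evaluation log makes it easier to tell whether the CUDA version TensorFlow was built with differs from the one installed on the machine.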

@markweberdev
Collaborator

Dear @lxtGH,

My setup is CUDA 11.2.2 with TF 2.5.0. I don't need to make any changes. I get the same layout error, but everything still runs fine.

The following example evaluates the Motion-DeepLab checkpoint that we provide (trained on KITTI) on KITTI:

python trainer/train.py --config_file="./configs/kitti/motion_deeplab/resnet50_os32.textproto" --num_gpus=1 --mode=eval --model_dir="/**retracted**/models/deeplab2/kitti/"

I don't have access to an RTX 3090, so I am unable to verify that it runs there too.

2021-10-08 09:12:02.213287: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
I1008 09:12:05.060556 139644910475072 train.py:65] Reading the config file.
I1008 09:12:05.065931 139644910475072 train.py:69] Starting the experiment.
2021-10-08 09:12:05.068325: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-10-08 09:12:05.160309: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2021-10-08 09:12:05.160363: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-08 09:12:05.346183: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-10-08 09:12:05.346345: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-10-08 09:12:05.393284: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-10-08 09:12:05.518453: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-10-08 09:12:05.607084: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.11
2021-10-08 09:12:05.683005: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-10-08 09:12:05.686791: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-08 09:12:05.693328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-10-08 09:12:05.694196: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-10-08 09:12:05.699542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:83:00.0 name: NVIDIA TITAN X (Pascal) computeCapability: 6.1
coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 11.91GiB deviceMemoryBandwidth: 447.48GiB/s
2021-10-08 09:12:05.706127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-10-08 09:12:05.706191: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-10-08 09:12:06.506011: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-08 09:12:06.506057: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-10-08 09:12:06.506064: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-10-08 09:12:06.513654: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11436 MB memory) -> physical GPU (device: 0, name: NVIDIA TITAN X (Pascal), pci bus id: 0000:83:00.0, compute capability: 6.1)
I1008 09:12:06.516476 139644910475072 train_lib.py:105] Using strategy <class 'tensorflow.python.distribute.one_device_strategy.OneDeviceStrategy'> with 1 replicas
I1008 09:12:06.715466 139644910475072 motion_deeplab.py:53] Synchronized Batchnorm is used.
I1008 09:12:06.718576 139644910475072 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 4, 6, 3], 'backbone_layer_multiplier': 1.0, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 32, 'classification_mode': True, 'backbone_type': 'resnet', 'use_axial_beyond_stride': 0, 'backbone_use_transformer_beyond_stride': 0, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 1.0, 'drop_path_beyond_stride': 16, 'drop_path_schedule': 'constant', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': -1, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': False, 'axial_use_recompute_grad': True, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 
'tensorflow.python.keras.layers.normalization_v2.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
I1008 09:12:06.923306 139644910475072 motion_deeplab.py:109] Setting pooling size to (13, 40)
I1008 09:12:06.923628 139644910475072 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
I1008 09:12:06.923738 139644910475072 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
2021-10-08 09:12:11.757189: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
I1008 09:12:11.760688 139644910475072 controller.py:362] restoring or initializing model...
restoring or initializing model...
I1008 09:12:12.072492 139644910475072 controller.py:368] initialized model.
initialized model.
I1008 09:12:12.073319 139644910475072 controller.py:252] eval | step: 0 | running complete evaluation...
eval | step: 0 | running complete evaluation...
2021-10-08 09:12:12.287843: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:176] None of the MLIR Optimization Passes are enabled (registered 2)
2021-10-08 09:12:12.308308: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 3399950000 Hz
WARNING:tensorflow:From /usr/wiss/webermar/anaconda3/envs/deeplab_pip/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
W1008 09:12:19.992389 139644910475072 deprecation.py:534] From /usr/wiss/webermar/anaconda3/envs/deeplab_pip/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
2021-10-08 09:12:25.851721: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:808] layout failed: Invalid argument: Size of values 3 does not match size of permutation 4 @ fanin shape inMotionDeepLab/PostProcessor/StatefulPartitionedCall/while/body/_231/while/SelectV2_1-1-TransposeNHWCToNCHW-LayoutOptimizer
2021-10-08 09:12:27.338178: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-10-08 09:12:28.743844: I tensorflow/stream_executor/cuda/cuda_dnn.cc:359] Loaded cuDNN version 8201
2021-10-08 09:12:30.845198: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-10-08 09:12:31.859844: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
I1008 09:37:38.998472 139644910475072 api.py:446] Creating COCO objects for AP eval...
creating index...
index created!
Loading and preparing results...
DONE (t=12.72s)
creating index...
index created!
I1008 09:37:54.079124 139644910475072 api.py:446] Running COCO evaluation...
Running per image evaluation...
Evaluate annotation type segm
DONE (t=98.34s).
Accumulating evaluation results...
DONE (t=3.74s).
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.375
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.651
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.356
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.150
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.481
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.676
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.141
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.438
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.439
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.195
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.562
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.764
I1008 09:39:38.485969 139644910475072 controller.py:261] eval | step: 0 | eval time: 1646.4 sec | output:
{'evaluation/ap/AP_Mask': 0.3751787,
'evaluation/iou/IoU': 0.63153327,
'evaluation/pq/FN': 734.8947,
'evaluation/pq/FP': 622.7895,
'evaluation/pq/PQ': 0.4207694,
'evaluation/pq/RQ': 0.52304685,
'evaluation/pq/SQ': 0.77444,
'evaluation/pq/TP': 1501.0526,
'evaluation/step/AQ': 0.5277102021206063,
'evaluation/step/IoU': 0.6308144803396004,
'evaluation/step/STQ': 0.5769638090215155,
'losses/eval_center_loss': 0.06295311,
'losses/eval_motion_loss': 0.0750779,
'losses/eval_regression_loss': 0.016829815,
'losses/eval_semantic_loss': 2.098444,
'losses/eval_total_loss': 2.2533038}
eval | step: 0 | eval time: 1646.4 sec | output:
{'evaluation/ap/AP_Mask': 0.3751787,
'evaluation/iou/IoU': 0.63153327,
'evaluation/pq/FN': 734.8947,
'evaluation/pq/FP': 622.7895,
'evaluation/pq/PQ': 0.4207694,
'evaluation/pq/RQ': 0.52304685,
'evaluation/pq/SQ': 0.77444,
'evaluation/pq/TP': 1501.0526,
'evaluation/step/AQ': 0.5277102021206063,
'evaluation/step/IoU': 0.6308144803396004,
'evaluation/step/STQ': 0.5769638090215155,
'losses/eval_center_loss': 0.06295311,
'losses/eval_motion_loss': 0.0750779,
'losses/eval_regression_loss': 0.016829815,
'losses/eval_semantic_loss': 2.098444,
'losses/eval_total_loss': 2.2533038}

Could you provide your full config as well as the full log from an evaluation run with unchanged code?
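
As a quick sanity check when comparing logs, the STQ value can be recomputed from the other two `evaluation/step/*` entries, since STQ is defined as the geometric mean of the association quality (AQ) and the segmentation quality (IoU). A minimal sketch using the numbers from the log above; the helper name `stq_from` is hypothetical and not part of the repo:

```python
import math

# Values copied from the evaluation log above.
aq = 0.5277102021206063          # 'evaluation/step/AQ'
iou = 0.6308144803396004         # 'evaluation/step/IoU'
logged_stq = 0.5769638090215155  # 'evaluation/step/STQ'


def stq_from(aq, iou):
    """STQ as the geometric mean of association and segmentation quality."""
    return math.sqrt(aq * iou)


stq = stq_from(aq, iou)
print(f"recomputed STQ: {stq:.6f}")
print("matches log:", math.isclose(stq, logged_stq, abs_tol=1e-6))
```

If a run reports near-zero scores, checking which of the two factors collapses (AQ or IoU) can hint at whether the problem lies in tracking or in the semantic prediction.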

@lxtGH
Author

lxtGH commented Oct 12, 2021

@markweberdev Hi! I still cannot reproduce the results. Could you report the class-wise IoU or PQ for us to use as a reference?

@markweberdev
Collaborator

That's unfortunate. I can evaluate that for you. Could you please specify whether you would like per-class PQ scores from Panoptic-DeepLab or Motion-DeepLab on KITTI-STEP?

@lxtGH
Author

lxtGH commented Oct 12, 2021

Hi @markweberdev! Thanks for your reply. I would like the results of both Panoptic-DeepLab and Motion-DeepLab. Thanks for that.

@markweberdev
Collaborator

Please find the class-wise scores attached. Please note that the results were obtained with a ResNet50 os32 backbone.
classwise_scores_kitti_step.csv

@lxtGH
Author

lxtGH commented Oct 20, 2021

@markweberdev Hi Mark! I found that there are 11095 images in the KITTI-STEP test set, but in your paper the number is 10173.
So the numbers do not seem to be consistent.
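
A count like this can be reproduced with a short script. The following is a minimal sketch; the function name `count_images`, the suffix list and the directory path in the usage note are assumptions, since the thread does not state how the images were counted or how the dataset is laid out on disk.

```python
from pathlib import Path


def count_images(root, suffixes=(".png", ".jpg", ".jpeg")):
    """Count image files under `root`, including all subdirectories."""
    root = Path(root)
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

For example, `count_images("/path/to/kitti_step/images/test")` (a hypothetical path) returns the number of image files below that directory. Point it at the image folder only, so that label maps stored as PNG files elsewhere are not counted.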

@markweberdev
Collaborator

@lxtGH Thanks a lot for pointing this out. You are right, it's 11095. I will correct it in the paper!

@lxtGH
Author

lxtGH commented Nov 11, 2021

@markweberdev Hi Mark! In Tab. 3, what is the window size for the VPQ calculation? According to the results in your CSV file, I believe VPQ = PQ where k = 1.

@markweberdev
Collaborator

Hi,

I am unsure how you arrived at this conclusion. PQ scores are naturally higher than VPQ scores (by design, VPQ can't be higher than PQ). With a window size of k=0, all baselines (B1-B3) would have had the same result, which is not what we reported in the paper.

We used the default setting of VPQ, as introduced in their paper. VPQ is averaged over K=4 different window sizes (0, 1, 2, 3 labelled images). As Cityscapes-VPS has only every 5th frame labelled, this corresponds to their (0, 5, 10, 15) setting.

Hope that helps.

Best,
Mark
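
The correspondence between the two window-size conventions described above can be written out as a small sketch. The function names are hypothetical and this is not the evaluation code of either paper; it only illustrates how window sizes counted in labelled images map to window sizes counted in raw frames, and how the final score is a plain mean over the K windows.

```python
def window_sizes_in_frames(labelled_windows, annotation_stride):
    """Convert window sizes counted in labelled images to raw-frame counts.

    `annotation_stride` is the number of raw frames between two labelled
    frames (5 when only every 5th frame is labelled).
    """
    return [k * annotation_stride for k in labelled_windows]


def mean_over_windows(vpq_per_window):
    """Average per-window VPQ scores into a single VPQ value."""
    return sum(vpq_per_window.values()) / len(vpq_per_window)


# Window sizes in labelled images, as described in the comment above.
labelled_windows = [0, 1, 2, 3]
print(window_sizes_in_frames(labelled_windows, annotation_stride=5))
```

With every 5th frame labelled, the four windows map to 0, 5, 10 and 15 raw frames. With a window size of 0, each frame is evaluated on its own, which is why that setting alone cannot distinguish baselines that differ only in tracking.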

@lxtGH
Author

lxtGH commented Nov 12, 2021

@markweberdev Thanks for your reply! I found that VPQ on KITTI-STEP changes less than on Cityscapes-VPS. Is that because STEP has fewer thing classes (2 vs. 8 in Cityscapes)?

@aquariusjay
Contributor

Hi @lxtGH,

KITTI-STEP builds on top of KITTI-MOTS (which contains two thing classes for tracking) by additionally annotating the semantic segmentation.
VPQ is sensitive to the window size and stride (i.e., k and lambda in their paper), while our proposed metric STQ can be evaluated directly on a whole video sequence.
For videos with a high annotation frame rate, you may need to try different values of window size and stride to see how VPQ varies.
For the discussion of the KITTI-STEP dataset and the metric, please refer to our paper.

I am closing the issue, but please feel free to reopen it if you have any more questions.

Cheers,
