
reproducing TSM_R50_1x1x16_50e_sthv2 issue #182

Closed
youngwanLEE opened this issue Sep 16, 2020 · 3 comments

@youngwanLEE

Notice

There are several common situations in reimplementation issues, as listed below:

  1. Reimplement a model in the model zoo using the provided configs

Checklist

  1. I have searched related issues but cannot get the expected help.

Describe the issue

When I tested tsm_r50_1x1x16_50e_sthv2_rgb with this checkpoint, the result is lower than the reported accuracy (57.68/83.65).

I used the sthv2 dataset in its original webm video format.


Reproduction

  1. What command or script did you run?
 bash tools/dist_test.sh configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb_20200621-60ff441a.pth 8 --eval top_k_accuracy mean_class_accuracy
  2. Which config did you run?
configs/recognition/tsm/tsm_r50_1x1x16_50e_sthv2_rgb.py
  3. Did you make any modifications to the code or config? Do you understand what you modified?

To use the original Something-Something-V2 video dataset, I just made sthv2_{train, val}_list_videos.txt files.
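
For reference, each line in these list files follows the "<relative video path> <label index>" format that VideoDataset expects. A minimal sketch of how such a list can be generated from the official Something-Something-V2 annotation JSONs (the paths and the bracket-stripping rule are assumptions about the local layout, not something from this repo):

# Sketch only: build sthv2_train_list_videos.txt from the official annotations.
# All paths below are illustrative; adjust them to the local layout.
import json

with open('data/sthv2/annotations/something-something-v2-labels.json') as f:
    # maps template string -> class index (stored as a string in the JSON)
    label_map = {name: int(idx) for name, idx in json.load(f).items()}

with open('data/sthv2/annotations/something-something-v2-train.json') as f:
    train_anns = json.load(f)

with open('data/sthv2/sthv2_train_list_videos.txt', 'w') as f:
    for ann in train_anns:
        # the 'template' field carries "[something]" placeholders; the label-map
        # keys are the same strings with the brackets stripped
        template = ann['template'].replace('[', '').replace(']', '')
        f.write(f"{ann['id']}.webm {label_map[template]}\n")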

I also modified the config file to use this video format.

# model settings
model = dict(
    type='Recognizer2D',
    backbone=dict(
        type='ResNetTSM',
        pretrained='torchvision://resnet50',
        depth=50,
        norm_eval=False,
        shift_div=8),
    cls_head=dict(
        type='TSMHead',
        num_classes=339,
        in_channels=2048,
        spatial_type='avg',
        consensus=dict(type='AvgConsensus', dim=1),
        dropout_ratio=0.5,
        init_std=0.001,
        is_shift=True))
# model training and testing settings
train_cfg = None
test_cfg = dict(average_clips=None)
# dataset settings
# dataset_type = 'RawframeDataset'
# data_root = 'data/sthv2/rawframes'
# data_root_val = 'data/sthv2/rawframes'
# ann_file_train = 'data/sthv2/sthv2_train_list_rawframes.txt'
# ann_file_val = 'data/sthv2/sthv2_val_list_rawframes.txt'
# ann_file_test = 'data/sthv2/sthv2_val_list_rawframes.txt'
dataset_type = 'VideoDataset'
data_root = 'data/sthv2/videos'
data_root_val = 'data/sthv2/videos'
ann_file_train = 'data/sthv2/sthv2_train_list_videos.txt'
ann_file_val = 'data/sthv2/sthv2_val_list_videos.txt'
ann_file_test = 'data/sthv2/sthv2_val_list_videos.txt'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
train_pipeline = [
    dict(type='DecordInit'),
    dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=16),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13),
    dict(type='Resize', scale=(224, 224), keep_ratio=False),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs', 'label'])
]
val_pipeline = [
    dict(type='DecordInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
test_pipeline = [
    dict(type='DecordInit'),    
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    # dict(type='RawFrameDecode'),
    dict(type='DecordDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]
data = dict(
    videos_per_gpu=6,
    workers_per_gpu=4,
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
# optimizer
optimizer = dict(
    type='SGD',
    constructor='TSMOptimizerConstructor',
    paramwise_cfg=dict(fc_lr5=True),
    lr=0.0075,  # this lr is used for 8 gpus
    momentum=0.9,
    weight_decay=0.0005)
optimizer_config = dict(grad_clip=dict(max_norm=20, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[20, 40])
total_epochs = 50
checkpoint_config = dict(interval=1)
evaluation = dict(
    interval=2, metrics=['top_k_accuracy', 'mean_class_accuracy'], topk=(1, 5))
log_config = dict(
    interval=20,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook'),
    ])
# runtime settings
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb/'
load_from = None
resume_from = None
workflow = [('train', 1)]

  4. What dataset did you use?

--> Something-Something-V2

Environment

  1. Please run PYTHONPATH=${PWD}:$PYTHONPATH python mmaction/utils/collect_env.py to collect necessary environment information and paste it here.

sys.platform: linux
Python: 3.7.9 (default, Aug 31 2020, 12:42:55) [GCC 7.3.0]
CUDA available: True
GPU 0,1,2,3,4,5,6,7: TITAN Xp
CUDA_HOME: /usr/local/cuda
NVCC:
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.2
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.5
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.4.0
MMCV: 1.1.2
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMAction2: 0.6.0+7dc58b3

  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
      --> by conda

Results

If applicable, paste the related results here, e.g., what you expect and what you get.

Evaluating top_k_accuracy...

top1_acc        0.4162
top5_acc        0.7047

Evaluating mean_class_accuracy...

mean_acc        0.3648
top1_acc: 0.4162
top5_acc: 0.7047
mean_class_accuracy: 0.3648

Issue fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

@SuX97
Collaborator

SuX97 commented Sep 16, 2020

Thank you for playing with MMAction2!

For troubleshooting, let's first sort out some information.

  1. Which decord did you use? The decord package from PyPI is not the latest one and has some bugs in decoding; check this out. We recommend building it from source, or trying another decoder such as PyAV (see the pipeline sketch after this list).

  2. All of our sthv2 models have a potential bug: they were trained with a classification head of fc->339 rather than 174. We are retraining them now (Fix TSN sthv2 config. #174).

  3. The metrics we report all come from models trained and tested on the same data source, e.g., trained on rawframes and tested on rawframes. We are not sure whether using a different source causes a loss in accuracy.
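
To rule the decoder out quickly, you can swap the decord steps in the test pipeline for the PyAV ones; a rough sketch, assuming PyAV is installed (pip install av) and that the PyAVInit/PyAVDecode transforms are available in your MMAction2 version:

# Sketch: same test pipeline as in the config above, but decoded with PyAV
# instead of decord (assumes `pip install av` and PyAVInit/PyAVDecode exist).
test_pipeline = [
    dict(type='PyAVInit'),
    dict(
        type='SampleFrames',
        clip_len=1,
        frame_interval=1,
        num_clips=16,
        test_mode=True),
    dict(type='PyAVDecode'),
    dict(type='Resize', scale=(-1, 256)),
    dict(type='CenterCrop', crop_size=224),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='FormatShape', input_format='NCHW'),
    dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
    dict(type='ToTensor', keys=['imgs'])
]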

@youngwanLEE
Author

@SuX97
Thanks for your quick response.

  1. Which decord did you use? The decord package from PyPI is not the latest one and has some bugs in decoding; check this out. We recommend building it from source, or trying another decoder such as PyAV.

--> I just followed your install.md and installed decord with pip install decord.
The installed decord version is 0.4.0.

  3. The metrics we report all come from models trained and tested on the same data source, e.g., trained on rawframes and tested on rawframes. We are not sure whether using a different source causes a loss in accuracy.

--> I first tried to use rawframes, but building them by following the instructions ran into this error (#180).

@innerlee
Contributor

All of our sthv2 models have a potential bug: they were trained with a classification head of fc->339 rather than 174.

From our experience in image classification, this does not affect the result.
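
If you want to confirm what the released checkpoint actually contains, a quick check along these lines should show the head size (this assumes the head weights sit under cls_head.fc_cls, which is how the TSM head names its classifier; adjust the path to wherever the .pth was downloaded):

# Rough check of the classification head size in the downloaded checkpoint.
import torch

ckpt = torch.load(
    'work_dirs/tsm_r50_1x1x16_50e_sthv2_rgb_20200621-60ff441a.pth',
    map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)
# per the discussion above, this is expected to print torch.Size([339, 2048])
print(state_dict['cls_head.fc_cls.weight'].shape)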

The installed decord version is 0.4.0

Seeking in v0.4.0 is not exact for some videos. You may try the master branch.
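
A rough way to see whether seeking is the problem on your data: decode a clip sequentially and compare the same frames obtained by direct seeking; large pixel differences point at the inexact-seek issue. A sketch (the clip path is only an example, and the assumption is that decoding a contiguous index range avoids real seeking):

# Rough seek-accuracy probe for decord; the clip path is an example only.
import numpy as np
from decord import VideoReader

path = 'data/sthv2/videos/12345.webm'

# reference: decode every frame in order (contiguous indices)
reference = VideoReader(path).get_batch(
    list(range(len(VideoReader(path))))).asnumpy()

# probe: open a fresh reader and jump straight to a few indices
vr = VideoReader(path)
probe_idx = [0, len(vr) // 4, len(vr) // 2, len(vr) - 1]
probed = vr.get_batch(probe_idx).asnumpy()

# with exact seeking the two should match frame-for-frame
diff = np.abs(probed.astype(np.int32) -
              reference[probe_idx].astype(np.int32)).mean()
print('mean abs pixel diff:', diff)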
