Data Augmentation for Video #1064

Closed
wenjun90 opened this issue Aug 2, 2021 · 14 comments

Comments

@wenjun90

wenjun90 commented Aug 2, 2021

Hello @dreamerlin,

Thank you and your team for your contribution.
Could I ask you a question about data augmentation, please?

How can we set up data augmentation in the config during training?

Thank you very much!

@dreamerlin
Collaborator

Ref: https://github.com/open-mmlab/mmaction2/blob/master/configs/recognition/tsn/tsn_fp16_r50_1x1x3_100e_kinetics400_rgb.py#L15-L31

You can set your data augmentation in the config like this.
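
For reference, the augmentation section of that file looks roughly like the sketch below (a paraphrase of the linked config, not a verbatim copy; the exact lines may shift between versions, so check the file itself):

    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_bgr=False)
    train_pipeline = [
        # TSN-style sampling: 3 segments, 1 frame each
        dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3),
        dict(type='RawFrameDecode'),
        dict(type='Resize', scale=(-1, 256)),
        # the augmentation steps: random multi-scale crop + horizontal flip
        dict(
            type='MultiScaleCrop',
            input_size=224,
            scales=(1, 0.875, 0.75, 0.66),
            random_crop=False,
            max_wh_scale_gap=1),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='FormatShape', input_format='NCHW'),
        dict(type='Collect', keys=['imgs', 'label'], meta_keys=[]),
        dict(type='ToTensor', keys=['imgs', 'label'])
    ]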

@dreamerlin
Collaborator

dreamerlin commented Aug 3, 2021

And if you find this repo helpful, you can give us a star :) !

@irvingzhang0512
Contributor

irvingzhang0512 commented Aug 3, 2021

There are 4 kinds of data augmentation for training:

  1. Native MMAction2 data augmentation pipelines here, such as flip/resize/crop/colorjitter...
  2. The third-party library Imgaug, such as RandAugment. Demos can be found in [Feature] Support Imgaug for augmentations in the data pipeline. #492 and [Improvement] Set RandAugment as Imgaug default transforms. #585
  3. The third-party library PytorchVideo, such as RandAugment/AugMix. Demos can be found in [Feature] Support Pytorchvideo Transforms #1008
  4. Mixup and Cutmix; demos can be found in [Feature] Support Mixup and Cutmix for Recognizers. #681 (a sketch follows this list)
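
To make option 4 concrete, here is a minimal sketch, assuming MMAction2 0.x config inheritance (the base config path and num_classes are placeholders): Mixup/Cutmix from #681 are enabled through the model's train_cfg as a batch "blending" op, not as a pipeline transform, while the Imgaug wrapper from #492/#585 is a pipeline step.

    # Mixup/Cutmix (PR #681): configured on the model, not in the data pipeline.
    _base_ = ['./tsm_r50_1x1x8_50e_sthv1_rgb.py']  # placeholder base config

    model = dict(
        train_cfg=dict(
            # or CutmixBlending with the same fields
            blending=dict(type='MixupBlending', num_classes=174, alpha=0.2)))

    # Imgaug (PRs #492/#585) goes into train_pipeline instead, e.g. the
    # default preset, which applies RandAugment:
    imgaug_step = dict(type='Imgaug', transforms='default')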

@irvingzhang0512
Contributor

irvingzhang0512 commented Aug 3, 2021

Besides, you can refer to the configs of tsm-r50/sthv1, which contain a lot of different data augmentation pipelines.
(screenshots of the tsm-r50/sthv1 config files)

@wenjun90
Author

wenjun90 commented Aug 3, 2021

Thank you @dreamerlin and @irvingzhang0512

Could I ask you one more question, please, @dreamerlin?
I still don't quite understand these parameters: dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3)

  • Does clip_len=1 mean that a clip of length 1 s is extracted randomly from the total length of the video (for example, a 10 s video)?
  • What do frame_interval=1 and num_clips=3 mean?

Thank you very much!

@irvingzhang0512
Contributor

@wenjun90 #655 (comment)
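
In short (a paraphrase of the sampling semantics explained there; note the units are frames, not seconds):

    # SampleFrames parameters:
    #   num_clips      - how many clips (segments) to sample from the video
    #   clip_len       - how many frames each clip contains
    #   frame_interval - temporal stride between adjacent frames within a clip
    # So clip_len=1, frame_interval=1, num_clips=3 is TSN-style sampling:
    # the video is split into 3 segments and 1 frame is drawn from each.
    sample_step = dict(type='SampleFrames', clip_len=1, frame_interval=1, num_clips=3)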

@wenjun90
Author

Hey @dreamerlin and @irvingzhang0512,
Could I ask you a question about the learning rate in SlowFast, please?
I do my training with 1 GPU and 8 videos/GPU, so does the learning rate need to be set to 0.01?
Because lr=0.01 for 4 GPUs x 2 videos/gpu and lr=0.08 for 16 GPUs x 4 videos/gpu.
Thank you very much.

@wenjun90
Author

Hi @irvingzhang0512,
I did training with pytorchvideo.AugMix, and the training time is longer than when I don't use augmentation. Is that normal?

@irvingzhang0512
Contributor

Yes, it's normal; data augmentation requires a lot of CPU resources.

@wenjun90
Author

Thank you @irvingzhang0512,

Could I ask you 2 questions, please?

  • How can I use the config slowfast_r152_r50_4x16x1_256e_kinetics400_rgb with the dataset type "VideoDataset"?
  • Can I use MultiScaleCrop for SlowFast, like this?
    dict(
        type='MultiScaleCrop',
        input_size=224,
        scales=(1, 0.875, 0.75, 0.66),
        random_crop=False,
        max_wh_scale_gap=1,
        num_fixed_crops=13)

Thank you very much.

@kennymckormick
Member

  1. You should (see the sketch below):
    1. set dataset_type to VideoDataset;
    2. use a data list of videos (each line is like video.mp4 label);
    3. modify the data pipeline to use DecordInit and DecordDecode.
  2. That's OK, but the performance might be inferior.
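
A hedged sketch of those three changes together (MMAction2 0.x names; the annotation-file path is a placeholder, and the sampling parameters follow the SlowFast 4x16 recipe):

    dataset_type = 'VideoDataset'
    ann_file_train = 'data/kinetics400/kinetics400_train_list_videos.txt'  # each line: "video.mp4 label"

    train_pipeline = [
        dict(type='DecordInit'),    # open the video file with decord
        dict(type='SampleFrames', clip_len=32, frame_interval=2, num_clips=1),
        dict(type='DecordDecode'),  # decode only the sampled frames
        dict(type='Resize', scale=(-1, 256)),
        dict(type='RandomResizedCrop'),
        dict(type='Resize', scale=(224, 224), keep_ratio=False),
        dict(type='Flip', flip_ratio=0.5),
        # Normalize / FormatShape / Collect / ToTensor as in the base config
    ]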

@bit-scientist
Contributor

I am finding it hard to calculate a value for the learning rate when a different number of GPUs is available. Since lr is set to 0.01 for 4 GPUs x 2 videos/gpu and lr=0.08 for 16 GPUs x 4 videos/gpu, what exact value should I choose for my lr if I have one GPU (32 GB) with batch size = 64?

@kennymckormick
Member

Since the total batch size is still 64, you can use 0.08.
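
Both answers in this thread are consistent with the usual linear scaling rule: the learning rate scales with the total batch size (GPUs × videos per GPU). A quick sanity check using the reference setting quoted above (scaled_lr is a hypothetical helper for illustration):

    # Linear scaling rule: lr is proportional to the total batch size.
    # Reference point from the SlowFast config: lr=0.01 at 4 GPUs x 2 videos/GPU.
    def scaled_lr(gpus, videos_per_gpu, base_lr=0.01, base_batch=8):
        return base_lr * (gpus * videos_per_gpu) / base_batch

    print(scaled_lr(4, 2))   # 0.01 - reference setting (4 GPUs x 2 videos/GPU)
    print(scaled_lr(16, 4))  # 0.08 - matches the other reference setting
    print(scaled_lr(1, 8))   # 0.01 - wenjun90's case (total batch is still 8)
    print(scaled_lr(1, 64))  # 0.08 - bit-scientist's case (total batch of 64)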

@WEIZHIHONG720

Is "cutout" also used as data augmentation for video? If so, what is the typical cutout ratio setting? Thanks!
