Configs for HowTo100M #9

Hi,

Thanks for this great work and repo! I'd like to know whether you used different training parameters / processing for the HowTo100M task. I did a straightforward adaptation of the code and config used for Kinetics (just changing the number of classes to 1059), but it doesn't seem to work (the loss doesn't decrease), both when fine-tuning from ImageNet directly and when fine-tuning from Kinetics.

Best,
Antoine Yang

Comments
Hi Antoine, I used the same training hyperparameters for HowTo100M as for the other experiments. However, I used a different loader, because the original loader would often fail to load HowTo100M videos properly. So, as a first step, I would suggest looking into data loading and making sure everything works properly. Furthermore, I used a 32- or 64-GPU setup for these experiments and haven't tested it with a smaller number of GPUs. I don't suspect that this would cause issues, since the model is forced to accumulate the gradient when fewer GPUs are used. However, as I said, I haven't tested it with a different hardware setup. What infrastructure are you using? Lastly, could you share your training logs with me so that I can take a look?
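For anyone unsure what that accumulation amounts to, here is a minimal PyTorch sketch of gradient accumulation in general (toy model and data; `accum_steps` is an illustrative name, and this is not necessarily how the repo implements it):

```python
import torch
from torch import nn

# Toy stand-ins; in practice these would be the video model, loss, and loader.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(2, 16), torch.randint(0, 4, (2,))) for _ in range(16)]

accum_steps = 8  # assumed: effective batch = per-step batch * accum_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # scale so accumulated grads match one large batch
    loss.backward()                              # gradients sum across iterations until step()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

With fewer GPUs, raising `accum_steps` keeps the effective batch size (and thus the learning-rate schedule's assumptions) roughly constant.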
Indeed, it seems to work better using decoding based on ffmpeg-python instead of pyav / torchvision. I am using one node with 4 32GB GPUs, allowing a train batch size of 64 for divided space-time 8x32. Here are the training logs (TensorBoard) after 5-6 epochs (approx. 32% train accuracy): https://drive.google.com/drive/folders/1qz3Nk4aroLCfNgiTo42foWRG91AmrCJW?usp=sharing Also, I did not quite understand the "single clip coverage" in Table 8: if I am not mistaken, videos are 25 FPS, and you sample one frame every 32 frames, so one 8-frame clip covers 32*8/25 = 10.24 s, not the 8.5 s mentioned in Table 8?
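For reference, a sketch of what such a decoder could look like with the ffmpeg-python package (the helper name, defaults, and the seek-then-decode structure are my assumptions, not the authors' loader):

```python
import ffmpeg  # pip install ffmpeg-python
import numpy as np

def load_clip(path, start_sec=0.0, num_frames=8, clip_sec=8.5):
    """Decode `num_frames` evenly spaced RGB frames covering `clip_sec` seconds.

    Hypothetical helper: clip_sec=8.5 mimics 8 frames at sampling rate 32
    from a 30 FPS decode (32*8/30 ~= 8.5 s).
    """
    stream = next(s for s in ffmpeg.probe(path)['streams']
                  if s['codec_type'] == 'video')
    w, h = int(stream['width']), int(stream['height'])
    out, _ = (
        ffmpeg
        .input(path, ss=start_sec)  # seek first: avoids decoding the whole long video
        .filter('fps', fps=num_frames / clip_sec)  # one frame every clip_sec/num_frames seconds
        .output('pipe:', format='rawvideo', pix_fmt='rgb24', vframes=num_frames)
        .run(capture_stdout=True, capture_stderr=True)
    )
    return np.frombuffer(out, np.uint8).reshape(num_frames, h, w, 3)
```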
Clip-level accuracy of 32% with random temporal/spatial sampling after 5 epochs sounds roughly right to me. I believe that the full inference procedure with 48 temporal clips should yield roughly the same results as in our paper once training is done. Regarding the "single clip coverage": the target FPS of the decoder I used was set to 30, so 32*8/30 ≈ 8.5 s.
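In other words, single-clip coverage is frames per clip × sampling rate / decode FPS:

```python
frames, sampling_rate = 8, 32
print(frames * sampling_rate / 30)  # 8.53 s: the ~8.5 s in Table 8 (30 FPS target decode)
print(frames * sampling_rate / 25)  # 10.24 s: what you get assuming the native 25 FPS
```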
@gberta, @antoyang - I guess that if you use the Kinetics data loader, you also have to change the number of sampled clips, since by default the Kinetics loader samples a single clip in train / val and allows sampling multiple clips only in test mode.
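Something like the following, paraphrasing the pattern in SlowFast-style Kinetics loaders (exact attribute and config names may differ in this repo; the defaults here are illustrative):

```python
def clips_per_video(mode, num_ensemble_views=10, num_spatial_crops=3):
    """Paraphrase of SlowFast-style Kinetics loader logic: train/val draw a
    single random clip; only test mode samples multiple views per video."""
    if mode in ("train", "val"):
        return 1
    return num_ensemble_views * num_spatial_crops  # e.g. 10 temporal x 3 spatial = 30

print(clips_per_video("train"), clips_per_video("test"))  # 1 30
```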
Hi, I am trying to reproduce the results for HowTo100M video classification, but there seems to be a problem with how pyav handles HowTo100M's long videos. However small a batch size I choose, there's a memory error in the dataloader. I am not able to train the network on HowTo100M videos, or even evaluate the given checkpoint. Can someone provide an ffmpeg snippet that would help me train and test on this task? Thanks
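Not the authors' pipeline, but a minimal PyTorch `Dataset` sketch built on a seek-then-decode helper like the `load_clip` sketched above: each `__getitem__` decodes only a few seconds, so full-length HowTo100M videos never sit in memory:

```python
import random
import torch
from torch.utils.data import Dataset

class HowTo100MClips(Dataset):
    """Hypothetical sketch: one short random clip per (video, label) sample."""

    def __init__(self, samples, clip_sec=8.5, num_frames=8):
        self.samples = samples        # list of (path, duration_sec, label) tuples
        self.clip_sec = clip_sec
        self.num_frames = num_frames

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, duration, label = self.samples[idx]
        start = random.uniform(0.0, max(duration - self.clip_sec, 0.0))
        frames = load_clip(path, start, self.num_frames, self.clip_sec)  # (T, H, W, 3) uint8
        # .copy() because frames decoded from a raw pipe buffer may be read-only
        clip = torch.from_numpy(frames.copy()).permute(3, 0, 1, 2).float() / 255.0
        return clip, label  # clip: (3, T, H, W) in [0, 1]
```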