
Configs for HowTo100M #9

Closed
antoyang opened this issue Apr 27, 2021 · 5 comments
Comments

@antoyang

Hi,

Thanks for this great work and repo! I'd like to know whether you used different training parameters / processing for the HowTo100M task. I did a straightforward adaptation of the code and config used for Kinetics (just changing the number of classes to 1059), but it doesn't seem to work (the loss doesn't decrease), both when fine-tuning directly from ImageNet and when fine-tuning from Kinetics.

Best,
Antoine Yang

@gberta
Contributor

gberta commented Apr 27, 2021

Hi Antoine,

I used the same training hyperparameters for HowTo100M as for the other experiments. However, I used a different loader because the original loader would often fail to load HowTo100M videos properly. As a first step, I would therefore suggest looking into data loading and making sure that everything works properly.

Furthermore, I used a 32- or 64-GPU setup for these experiments and haven't tested it with a smaller number of GPUs. I don't suspect that this would cause issues, since the model is forced to accumulate the gradient when fewer GPUs are used. However, as I said, I haven't tested it with a different hardware setup. What infrastructure are you using?
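A minimal sketch of why gradient accumulation makes a smaller GPU count equivalent to a larger per-step batch (this is a generic illustration, not code from the repo): for a loss that averages over samples, averaging the per-micro-batch gradients over the accumulation steps reproduces the gradient of the full batch.

```python
# Gradient of mean squared error of y ≈ w * x with respect to w.
def grad_mse(w, xs, ys):
    n = len(xs)
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Full-batch gradient (e.g. one step on many GPUs).
full = grad_mse(w, xs, ys)

# Two accumulation steps over micro-batches of 2 samples each
# (e.g. the same effective batch on fewer GPUs).
micro = [grad_mse(w, xs[i:i + 2], ys[i:i + 2]) for i in (0, 2)]
accumulated = sum(micro) / len(micro)
```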

Lastly, could you share your training logs with me so that I could take a look?

@antoyang
Author

antoyang commented Apr 28, 2021

Indeed it seems to work better by using a decoding based on ffmpeg-python instead of pyav / torchvision.
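A minimal sketch of decoding along these lines, piping raw RGB frames out of the ffmpeg CLI resampled to a fixed FPS (the path, fps, and resolution are illustrative; this is not necessarily the loader used here):

```python
import subprocess

def ffmpeg_decode_cmd(path, fps=30, width=224, height=224):
    """Build the ffmpeg argv that resamples to `fps` and emits raw rgb24 frames."""
    return [
        "ffmpeg", "-nostdin", "-i", path,
        "-vf", f"fps={fps},scale={width}:{height}",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "pipe:1",
    ]

def decode_frames(path, fps=30, width=224, height=224):
    """Return a list of raw rgb24 frames (bytes objects), one per decoded frame."""
    out = subprocess.run(
        ffmpeg_decode_cmd(path, fps, width, height),
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True,
    ).stdout
    frame_size = width * height * 3  # rgb24: 3 bytes per pixel
    return [out[i:i + frame_size] for i in range(0, len(out), frame_size)]

cmd = ffmpeg_decode_cmd("video.mp4")
```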

I am using one node with four 32 GB GPUs, which allows a training batch size of 64 for the divided space-time 8x32 model.

Here are the training logs (TensorBoard) after 5-6 epochs (approximately 32% train accuracy): https://drive.google.com/drive/folders/1qz3Nk4aroLCfNgiTo42foWRG91AmrCJW?usp=sharing

Also, I did not quite understand the "single clip coverage" of Table 8: if I am not mistaken, videos are 25 fps, and you sample one frame every 32 frames, so one 8-frame clip covers 32*8/25 = 10.24 s, not 8.5 s as mentioned in Table 8?

@gberta
Contributor

gberta commented Apr 28, 2021

Clip-level accuracy of 32% with random temporal/spatial sampling after 5 epochs sounds roughly right to me. I believe that the full inference procedure with 48 temporal clips should yield roughly the same results as in our paper once the training is done.

Regarding the "single clip coverage": the target FPS of the decoder that I used was set to 30. Therefore, 32*8/30 would be ~8.5 s.
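The clip-coverage arithmetic from this exchange in one place: a clip of `num_frames` frames, sampled every `sampling_rate` frames from a video decoded at `fps` frames per second, covers `num_frames * sampling_rate / fps` seconds.

```python
def clip_coverage_seconds(num_frames, sampling_rate, fps):
    """Seconds of video spanned by one sampled clip."""
    return num_frames * sampling_rate / fps

# At 25 FPS an 8x32 clip covers 10.24 s; at the 30 FPS decoder target it
# covers about 8.53 s, matching the ~8.5 s reported in Table 8.
at_25 = clip_coverage_seconds(8, 32, 25)
at_30 = clip_coverage_seconds(8, 32, 30)
```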

@gberta gberta closed this as completed Apr 30, 2021
@r-kellerm

r-kellerm commented Dec 21, 2021

@gberta , @antoyang - I guess that if you use the Kinetics data loader, you also have to change the number of sampled clips, since by default the Kinetics loader samples a single clip in train / val and allows sampling multiple clips only in test mode, through the TEST.NUM_ENSEMBLE_VIEWS config key. For long videos a single sample won't be enough, and that might explain why the loss doesn't decrease.
See timesformer/datasets/kinetics.py, line 64:

```python
if self.mode in ["train", "val"]:
    self._num_clips = 1
elif self.mode in ["test"]:
    self._num_clips = (
        cfg.TEST.NUM_ENSEMBLE_VIEWS * cfg.TEST.NUM_SPATIAL_CROPS
    )
```
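A hedged sketch of one possible fix along these lines (the function and its parameters are illustrative, not part of the repo): draw several random clip start indices per long video at train time, instead of the single random clip the stock loader draws.

```python
import random

def sample_clip_starts(num_video_frames, clip_len, sampling_rate, num_clips, rng=None):
    """Pick `num_clips` random start frames, each leaving room for a full clip."""
    rng = rng or random.Random()
    span = clip_len * sampling_rate              # frames spanned by one clip
    max_start = max(num_video_frames - span, 0)  # last valid start index
    return [rng.randint(0, max_start) for _ in range(num_clips)]

# E.g. four 8x32 clips from a 9000-frame (several-minute) video.
starts = sample_clip_starts(9000, 8, 32, num_clips=4, rng=random.Random(0))
```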

@thechargedneutron

Hi,

I am trying to reproduce the results for HowTo100M video classification, but there seems to be a problem with how pyav handles HowTo100M's long videos. No matter how small a batch size I choose, there's a memory error in the dataloader. I am not able to train the network on HowTo100M videos, or even evaluate the given checkpoint on them. Can someone provide an ffmpeg snippet that would help me train and test for this task? Thanks
