
Confirm results of pretrained models #8

Closed
escorciav opened this issue Dec 16, 2020 · 11 comments
@escorciav

Hi!

I was testing the pretrained TSM RGB model and got odd results on the validation set.

For action@1, I got 28.23 while you reported 35.75:

all_action_accuracy_at_1: 28.237484484898633
all_action_accuracy_at_5: 47.6934215970211
all_noun_accuracy_at_1: 39.68762929251138
all_noun_accuracy_at_5: 65.98055440628879
all_verb_accuracy_at_1: 57.03351261894911
all_verb_accuracy_at_5: 86.38808440215143
tail_action_accuracy_at_1: 12.045088566827697
tail_noun_accuracy_at_1: 20.157894736842106
tail_verb_accuracy_at_1: 28.40909090909091

commit: d58e695

Steps

  • I generated the results on the validation set with this repo.
  • Then, I evaluated those with the corresponding evaluation code (a sketch of the metric follows below).
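
For reference, here is the metric being checked. A minimal sketch, assuming the per-task scores and labels are already loaded as NumPy arrays (the names are hypothetical stand-ins for what the evaluation code loads):

import numpy as np

def topk_accuracy(scores, labels, k=1):
    # scores: (N, C) class scores; labels: (N,) ground-truth indices.
    # Take the indices of the k highest-scoring classes per example.
    topk = np.argsort(scores, axis=-1)[:, -k:]
    # A hit is when the ground-truth label appears among the top k.
    hits = (topk == labels[:, None]).any(axis=-1)
    return 100.0 * hits.mean()

# e.g. topk_accuracy(verb_scores, verb_labels, k=5) -> verb_accuracy_at_5
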
@willprice
Member

Hi @escorciav,

Thanks for raising this. You're right, this is what you'll get from the currently released models 😧.
I reverted d58e695 and obtained the following results:

all_action_accuracy_at_1: 35.74679354571783
all_action_accuracy_at_5: 57.271410839884155
all_noun_accuracy_at_1: 47.4244931733554
all_noun_accuracy_at_5: 74.38973934629706
all_verb_accuracy_at_1: 62.97062474141498
all_verb_accuracy_at_5: 88.93256102606537
tail_action_accuracy_at_1: 18.26086956521739
tail_noun_accuracy_at_1: 26.63157894736842
tail_verb_accuracy_at_1: 35.90909090909091
unseen_action_accuracy_at_1: 26.572769953051644
unseen_noun_accuracy_at_1: 37.370892018779344
unseen_verb_accuracy_at_1: 54.55399061032864

I must have originally trained the models with the erroneous configuration.

I have reverted the change in 731db0d and replaced the checkpoints on Dropbox. You should now be able to replicate the results in the README following the same steps you did previously.

Apologies for the inconvenience; I should have checked this yesterday when I made the change.

Here are the results for the other reverted checkpoints:

TRN RGB

all_action_accuracy_at_1: 32.550682664460076
all_action_accuracy_at_5: 52.55482002482417
all_noun_accuracy_at_1: 43.618121638394705
all_noun_accuracy_at_5: 70.63508481588747
all_verb_accuracy_at_1: 60.5709557302441
all_verb_accuracy_at_5: 88.14646255688871
tail_action_accuracy_at_1: 15.942028985507244
tail_noun_accuracy_at_1: 22.42105263157895
tail_verb_accuracy_at_1: 33.52272727272727
unseen_action_accuracy_at_1: 24.88262910798122
unseen_noun_accuracy_at_1: 35.117370892018776
unseen_verb_accuracy_at_1: 51.45539906103287

TSN RGB

all_action_accuracy_at_1: 27.399669011170875
all_action_accuracy_at_5: 50.23789822093504
all_noun_accuracy_at_1: 43.9387670666115
all_noun_accuracy_at_5: 71.46255688870501
all_verb_accuracy_at_1: 50.5378568473314
all_verb_accuracy_at_5: 87.08109226313611
tail_action_accuracy_at_1: 14.428341384863124
tail_noun_accuracy_at_1: 24.57894736842105
tail_verb_accuracy_at_1: 29.48863636363636
unseen_action_accuracy_at_1: 18.87323943661972
unseen_noun_accuracy_at_1: 34.647887323943664
unseen_verb_accuracy_at_1: 40.751173708920184

@escorciav
Author

escorciav commented Dec 17, 2020

I updated the std manually last night but got the same result 😅, so I assume the std is loaded from the ckpt (a quick check is sketched below).

  • Should I download new weights?

In the meantime, I will launch the run again after syncing my repo with origin/master (origin = this repo).
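
For what it's worth, here is a quick way to check, assuming the file is a PyTorch Lightning checkpoint that stores its training-time config under a hyper_parameters key (the path and the key name are both assumptions):

import torch

# Load the checkpoint on CPU and print whatever config it carries.
ckpt = torch.load("tsm_rgb.ckpt", map_location="cpu")  # path hypothetical
print(ckpt.get("hyper_parameters", {}))  # look for the baked-in mean/std
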

@willprice
Member

willprice commented Dec 17, 2020 via email

@escorciav
Author

escorciav commented Dec 17, 2020

Hi Will!

As I said before, it seems that the std is loaded from the ckpt.

I'm using commit: 731db0d. Take a look at the config below:

INFO:test:Disabling distributed backend
INFO:test:Number of GPUs 1
INFO:test:Config:
modality: RGB
seed: 42
data:
  frame_count: 8
  segment_length: 1
  train_gulp_dir: ${data._root_gulp_dir}/rgb_train
  val_gulp_dir: /fast_scratch/datasets/epic-kitchens-100/data/processed/gulp/rgb_validation
  test_gulp_dir: ${data._root_gulp_dir}/rgb_test
  worker_count: 8
  pin_memory: true
  preprocessing:
    bgr: false
    rescale: true
    input_size: 224
    scale_size: 256
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
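
If the checkpointed config does override the one on disk, one workaround is to patch the stored values before evaluating. A rough sketch, where the checkpoint path and the hyper_parameters layout are assumptions mirroring the config structure above:

import torch

ckpt = torch.load("tsm_rgb.ckpt", map_location="cpu")
cfg = ckpt["hyper_parameters"]  # assumed location of the stored config
# Force the conventional ImageNet std in place of whatever was baked in.
cfg["data"]["preprocessing"]["std"] = [0.229, 0.224, 0.225]
torch.save(ckpt, "tsm_rgb_patched.ckpt")
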

@willprice
Member

willprice commented Dec 17, 2020 via email

@escorciav
Author

What was the conclusion?

  • I trained a model with std == mean, and the results were lower than your checkpoint's on the validation set.

  • I didn't debug whether it's the std, or whether your checkpoint was trained on train+val ;)

@willprice
Member

willprice commented Feb 20, 2021

I have trained some new RGB models with the std set to the ImageNet std, and I get the following results (action top-1 accuracy on the test set):

  • TSN: 23.72
  • TRN: 28.77
  • TSM: 31.99

These are all lower than the models I have released, which were trained with the std set to the ImageNet mean (results are available in the README).
I retrained TSM again with the std set to the ImageNet mean and got 32.91... which is 0.1 percentage points off what is in the README.

I will leave the original model checkpoints available since they have better performance despite their odd training regime.
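
For clarity, the two regimes differ only in the std used for input normalization. A minimal sketch with torchvision's Normalize (illustrative; not necessarily how the repo wires up its preprocessing):

from torchvision import transforms

imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# Conventional regime: normalize with the ImageNet std (the retrained models).
conventional = transforms.Normalize(mean=imagenet_mean, std=imagenet_std)

# Released checkpoints: the std was set to the ImageNet mean instead.
released = transforms.Normalize(mean=imagenet_mean, std=imagenet_mean)
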

@willprice
Member

I trained these models on the training set only, not on train+val.

willprice added a commit that referenced this issue Feb 20, 2021
#8 has more details about why we have kept our unconventional training strategy of setting the std in preprocessing to the ImageNet mean (better results across all models)
@escorciav
Author

escorciav commented Feb 21, 2021

Cool, thanks for letting me know.

BTW, I haven't read the instructions for the evaluation server yet.
Should people report whether they trained on "train" or "train+val"?
If not, please consider tracking that. I mention it because I noticed you have an arXiv paper with fine-print details of the last challenges. I like that preprint!

@dimadamen
Contributor

Thanks for your note. It would be helpful if you read the instructions before raising a question.
The leaderboards only collect the results on the "Test" set, using any amount of training data the users wish to utilise in training the model. We even welcome submissions with few-shot and weakly-supervised learning for the various challenges.

@escorciav
Author

escorciav commented Feb 21, 2021

Noted with thanks. Have a beautiful day and week ☀️ !
