
Confirm results of pretrained models #8

Closed
escorciav opened this issue Dec 16, 2020 · 11 comments
@escorciav

Hi!

I was testing the pretrained TSM RGB model and got odd results on the validation set.

For action@1, I got 28.23 while you reported 35.75:

all_action_accuracy_at_1: 28.237484484898633
all_action_accuracy_at_5: 47.6934215970211
all_noun_accuracy_at_1: 39.68762929251138
all_noun_accuracy_at_5: 65.98055440628879
all_verb_accuracy_at_1: 57.03351261894911
all_verb_accuracy_at_5: 86.38808440215143
tail_action_accuracy_at_1: 12.045088566827697
tail_noun_accuracy_at_1: 20.157894736842106
tail_verb_accuracy_at_1: 28.40909090909091

commit: d58e695

Steps

  • I generated the results on the validation set with this repo.
  • Then, I evaluated those with the corresponding evaluation code (a sketch of the metric follows below).
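
For reference, here is the metric being checked. A minimal sketch, assuming the per-task scores and labels are already loaded as NumPy arrays (the names are hypothetical stand-ins for what the evaluation code loads):

import numpy as np

def topk_accuracy(scores, labels, k=1):
    # scores: (N, C) class scores; labels: (N,) ground-truth indices.
    # Take the indices of the k highest-scoring classes per example.
    topk = np.argsort(scores, axis=-1)[:, -k:]
    # A hit is when the ground-truth label appears among the top k.
    hits = (topk == labels[:, None]).any(axis=-1)
    return 100.0 * hits.mean()

# e.g. topk_accuracy(verb_scores, verb_labels, k=5) -> verb_accuracy_at_5
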
@willprice
Member

Hi @escorciav,

Thanks for raising this. You're right, this is what you'll get from the currently released models 😧.
I reverted d58e695 and obtained the following results:

all_action_accuracy_at_1: 35.74679354571783
all_action_accuracy_at_5: 57.271410839884155
all_noun_accuracy_at_1: 47.4244931733554
all_noun_accuracy_at_5: 74.38973934629706
all_verb_accuracy_at_1: 62.97062474141498
all_verb_accuracy_at_5: 88.93256102606537
tail_action_accuracy_at_1: 18.26086956521739
tail_noun_accuracy_at_1: 26.63157894736842
tail_verb_accuracy_at_1: 35.90909090909091
unseen_action_accuracy_at_1: 26.572769953051644
unseen_noun_accuracy_at_1: 37.370892018779344
unseen_verb_accuracy_at_1: 54.55399061032864

I must have originally trained the models with the erroneous configuration.

I have reverted the change in 731db0d and replaced the checkpoints on Dropbox. You should now be able to replicate the results in the README following the same steps you did previously.

Apologies for the inconvenience; I should have checked this yesterday when I made the change.

Here are the results for the other reverted checkpoints:

TRN RGB

all_action_accuracy_at_1: 32.550682664460076
all_action_accuracy_at_5: 52.55482002482417
all_noun_accuracy_at_1: 43.618121638394705
all_noun_accuracy_at_5: 70.63508481588747
all_verb_accuracy_at_1: 60.5709557302441
all_verb_accuracy_at_5: 88.14646255688871
tail_action_accuracy_at_1: 15.942028985507244
tail_noun_accuracy_at_1: 22.42105263157895
tail_verb_accuracy_at_1: 33.52272727272727
unseen_action_accuracy_at_1: 24.88262910798122
unseen_noun_accuracy_at_1: 35.117370892018776
unseen_verb_accuracy_at_1: 51.45539906103287

TSN RGB

all_action_accuracy_at_1: 27.399669011170875
all_action_accuracy_at_5: 50.23789822093504
all_noun_accuracy_at_1: 43.9387670666115
all_noun_accuracy_at_5: 71.46255688870501
all_verb_accuracy_at_1: 50.5378568473314
all_verb_accuracy_at_5: 87.08109226313611
tail_action_accuracy_at_1: 14.428341384863124
tail_noun_accuracy_at_1: 24.57894736842105
tail_verb_accuracy_at_1: 29.48863636363636
unseen_action_accuracy_at_1: 18.87323943661972
unseen_noun_accuracy_at_1: 34.647887323943664
unseen_verb_accuracy_at_1: 40.751173708920184

@escorciav
Author

escorciav commented Dec 17, 2020

I updated the std manually last night but got the same result 😅, so I assume the std is loaded from the ckpt (a quick check is sketched below).

  • Should I download new weights?

In the meantime, I will launch the run again after syncing my repo with origin/master (origin = this repo).
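
For what it's worth, here is a quick way to check, assuming the file is a PyTorch Lightning checkpoint that stores its training-time config under a hyper_parameters key (the path and the key name are both assumptions):

import torch

# Load the checkpoint on CPU and print whatever config it carries.
ckpt = torch.load("tsm_rgb.ckpt", map_location="cpu")  # path hypothetical
print(ckpt.get("hyper_parameters", {}))  # look for the baked-in mean/std
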

@willprice
Member

willprice commented Dec 17, 2020 via email

@escorciav
Author

escorciav commented Dec 17, 2020

Hi Will!

As I said before, it seems that the std is loaded from the ckpt.

I'm using commit: 731db0d. Take a look at the config below:

INFO:test:Disabling distributed backend
INFO:test:Number of GPUs 1
INFO:test:Config:
modality: RGB
seed: 42
data:
  frame_count: 8
  segment_length: 1
  train_gulp_dir: ${data._root_gulp_dir}/rgb_train
  val_gulp_dir: /fast_scratch/datasets/epic-kitchens-100/data/processed/gulp/rgb_validation
  test_gulp_dir: ${data._root_gulp_dir}/rgb_test
  worker_count: 8
  pin_memory: true
  preprocessing:
    bgr: false
    rescale: true
    input_size: 224
    scale_size: 256
    mean:
    - 0.485
    - 0.456
    - 0.406
    std:
    - 0.229
    - 0.224
    - 0.225
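
If the checkpointed config does override the one on disk, one workaround is to patch the stored values before evaluating. A rough sketch, where the checkpoint path and the hyper_parameters layout are assumptions mirroring the config structure above:

import torch

ckpt = torch.load("tsm_rgb.ckpt", map_location="cpu")
cfg = ckpt["hyper_parameters"]  # assumed location of the stored config
# Force the conventional ImageNet std in place of whatever was baked in.
cfg["data"]["preprocessing"]["std"] = [0.229, 0.224, 0.225]
torch.save(ckpt, "tsm_rgb_patched.ckpt")
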

@willprice
Member

willprice commented Dec 17, 2020 via email

@escorciav
Author

What was the conclusion?

  • I trained a model with std == mean, and the results were lower than your checkpoint's on the validation set.

  • I didn't debug whether it's the std, or whether your checkpoint was trained on train+val ;)

@willprice
Member

willprice commented Feb 20, 2021

I have trained some new RGB models with the std set to the ImageNet std, and I get the following results (action top-1 accuracy on the test set):

  • TSN: 23.72
  • TRN: 28.77
  • TSM: 31.99

These are all lower than the models I have released, which were trained with the std set to the ImageNet mean (results are available in the README).
I retrained TSM again with the std set to the ImageNet mean and got 32.91... which is 0.1 percentage points off what is in the README.

I will leave the original model checkpoints available since they have better performance despite their odd training regime.
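
For clarity, the two regimes differ only in the std used for input normalization. A minimal sketch with torchvision's Normalize (illustrative; not necessarily how the repo wires up its preprocessing):

from torchvision import transforms

imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]

# Conventional regime: normalize with the ImageNet std (the retrained models).
conventional = transforms.Normalize(mean=imagenet_mean, std=imagenet_std)

# Released checkpoints: the std was set to the ImageNet mean instead.
released = transforms.Normalize(mean=imagenet_mean, std=imagenet_mean)
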

@willprice
Member

I trained these models on the training set only, not on train+val.

willprice added a commit that referenced this issue Feb 20, 2021
#8 has more details about why we have kept our unconventional training strategy of setting the std in preprocessing to the ImageNet mean (better results across all models)
@escorciav
Author

escorciav commented Feb 21, 2021

Cool, thanks for letting me know.

BTW, I haven't read the instructions for the evaluation server yet.
Should people report whether they trained on "train" or "train+val"?
If not, please consider tracking that. I mention it because I noticed you have an arXiv paper with fine-print details of the last challenges. I like that preprint!

@dimadamen
Contributor

Thanks for your note. It would be helpful if you read the instructions before raising a question.
The leaderboards only collect the results on the "Test" set, using any amount of training data the users wish to utilise in training the model. We even welcome submissions with few-shot and weakly-supervised learning for the various challenges.

@escorciav
Author

escorciav commented Feb 21, 2021

Noted with thanks. Have a beautiful day and week ☀️ !
