Problems on result reproduction on MEGC2021 Benchmark #6

Closed
xjtupanda opened this issue Nov 19, 2022 · 5 comments

@xjtupanda

xjtupanda commented Nov 19, 2022

I've had a hard time trying to reproduce the results. Listed below is what I've tried.

  1. I re-organized the code in the way I'm used to and ran experiments on CASME_sq using features extracted by myself as instructed. The overall F1-score is somewhere around 0.23, so I suspected something was wrong with my feature extraction procedure and turned to the preprocessed features offered in the repo.
  2. I ran experiments on CASME_sq using the features you provided in the repo.
    Results:
    Final result: TP:101, FP:290, FN:256
    Precision = 0.2583
    Recall = 0.185
    F1-Score = 0.2156
    The results are still not good, so I finally tried to run the code in the Jupyter notebook provided in Code for evaluation on MEGC2021 benchmark? #3
  3. I ran experiments on CASME_sq & SAMMLV using the notebook & features you provided in the repo. Here are the results.

Reproduction ipynb:

CASME:

Micro result: TP:3 FP:137 FN:54 F1_score:0.0305
Macro result: TP:100 FP:206 FN:200 F1_score:0.3300
Overall result: TP:103 FP:343 FN:254 F1_score:0.2565

SAMMLV:

Cumulative result until subject 30:
Micro result: TP:10 FP:169 FN:149 F1_score:0.0592
Macro result: TP:97 FP:277 FN:246 F1_score:0.2706
Overall result: TP:107 FP:446 FN:395 F1_score:0.2028

Orig ipynb:

CASME:
Micro result: TP:5 FP:77 FN:52 F1_score:0.0719
Macro result: TP:108 FP:166 FN:192 F1_score:0.3763
Overall result: TP:113 FP:243 FN:244 F1_score:0.3170

SAMM:

Micro result: TP:12 FP:104 FN:147 F1_score:0.0873
Macro result: TP:88 FP:198 FN:255 F1_score:0.2798
Overall result: TP:100 FP:302 FN:402 F1_score:0.2212

As reported above, there's a huge gap between the reproduction result and the original performance on CASME_sq, while the gap for the SAMMLV dataset is much smaller.
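
For reference, the F1 scores above follow the standard precision/recall formulation over the TP/FP/FN counts; a minimal sketch (plain Python, not the repo's evaluation code) that reproduces, e.g., the CASME micro and macro lines from the reproduction notebook:

```python
# Minimal sketch (not the repo's evaluation code): precision, recall and F1
# computed from TP/FP/FN counts, checked against the CASME lines above.
def f1_from_counts(tp, fp, fn):
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(f1_from_counts(3, 137, 54), 4))     # 0.0305 (CASME micro, reproduction ipynb)
print(round(f1_from_counts(100, 206, 200), 4))  # 0.33   (CASME macro, reproduction ipynb)
```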

I've also tried fixing the random seed to 1 (see the seeding sketch below), which does not improve the result, while replacing the mix of hard & soft label loss with a pure hard label loss does improve it. Moreover, I noticed there are many subtle differences between the original code and the Jupyter notebook; using the spotting method from the original code produces very bad results:
Final result: TP:53, FP:320, FN:304
Precision = 0.1421
Recall = 0.0849
F1-Score = 0.1063
Replacing it with the spotting method from the Jupyter notebook works out better, with results:
Final result: TP:102, FP:299, FN:255
Precision = 0.2544
Recall = 0.1841
F1-Score = 0.2136
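
For reference, a minimal seed-fixing sketch (assuming a PyTorch-based setup; the framework calls would need adjusting if the repo uses TensorFlow/Keras):

```python
# Minimal sketch (assumed PyTorch setup): fix the common sources of randomness.
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 1) -> None:
    random.seed(seed)                      # Python's built-in RNG
    np.random.seed(seed)                   # NumPy RNG
    torch.manual_seed(seed)                # CPU RNG
    torch.cuda.manual_seed_all(seed)       # all GPU RNGs
    os.environ["PYTHONHASHSEED"] = str(seed)
    torch.backends.cudnn.deterministic = True   # trade speed for reproducibility
    torch.backends.cudnn.benchmark = False

set_seed(1)
```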

I also found a typo in the original code:

if end-start > macro_min and end-start < macro_max and ( score_plot_micro[peak] > 0.95 or (score_plot_macro[peak] > score_plot_macro[start] and score_plot_macro[peak] > score_plot_macro[end])):

I believe score_plot_micro[peak] > 0.95 should be score_plot_macro[peak] > 0.95
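
For clarity, a sketch of the intended check after that one-symbol fix (keeping the repo's variable names; only the score plot used in the 0.95 threshold test changes):

```python
# Hedged sketch of the corrected macro interval check, not the repo's exact code:
# both the 0.95 threshold and the peak-prominence test use score_plot_macro.
def is_macro_interval(start, peak, end, score_plot_macro, macro_min, macro_max):
    length_ok = macro_min < end - start < macro_max
    peak_ok = (score_plot_macro[peak] > 0.95
               or (score_plot_macro[peak] > score_plot_macro[start]
                   and score_plot_macro[peak] > score_plot_macro[end]))
    return length_ok and peak_ok
```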

I'm trying to build some improvements on top of your work and use it as a baseline model, but I'm very frustrated by the reproduction results. Any insight/help would be greatly appreciated.

@xjtupanda
Author

I've also uploaded my reproduction notebook: https://drive.google.com/file/d/1L-8EHmPC3HIj5Swd_aArrcCjTGZ99Wc6/view?usp=share_link

The only modification is an evaluation detail:
if len(preds_micro) == 0: preds_micro.append([0, 0, 0, 0, 0, -1])#, 0]) # -1 to bypass the count of additional fp
Because the evaluation function requires each prediction item to have length 6 and each GT item length 7, I removed the last element in each item. The original notebook uses lengths 7 and 8, respectively.

@xjtupanda
Author

xjtupanda commented Nov 19, 2022

After closer inspection, I think the main reason for the poor result is the low F1-score of micro-expression spotting (I still don't understand why I can't reproduce the results, though). Many of my runs produce few TPs and many FPs for micro-expressions, which then drags down the macro-expression result as well. There are also many spotting hyper-parameters to tune, which I believe are crucial to the final result. Is there any search strategy, or do you just tune by viewing the score plot?

@genbing99
Owner

Thanks for your investigation, especially the typo in the original code.

Please refer to the Jupyter notebook, as I might have wrongly copied some parts when converting to the .py files.

For the modification that you made:
preds_micro.append([0, 0, 0, 0, 0, -1])#, 0]) # -1 to bypass the count of additional fp
You should use the line from the original code:
preds_micro.append([0, 0, 0, 0, 0, -1, 0]) # -1 to bypass the count of additional fp
Otherwise, an extra FP is added whenever no prediction is made on a particular video. Note that your evaluation of CASME_sq Subject 1 reports FP:7; in fact, it should be FP:0.
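
A minimal sketch of that guard (the field layout beyond the -1 sentinel is an assumption based on this discussion, not the repo's documented format):

```python
# Hedged sketch: pad an empty prediction list with a sentinel entry so that the
# evaluation does not count an extra FP for a video with no spotted intervals.
def pad_if_empty(preds_micro):
    if len(preds_micro) == 0:
        # 7-element item as in the original notebook; the -1 marks "no prediction"
        # so the evaluator bypasses the additional-FP count for this video.
        preds_micro.append([0, 0, 0, 0, 0, -1, 0])
    return preds_micro
```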

@genbing99
Owner

The hyperparameters were set using a loop, which takes some time. I think this is normal for the spotting task; other methods also use many hyperparameters while processing the "signal" graph.
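
A rough sketch of what such a loop could look like (the hyperparameter names, search ranges, and the evaluate_f1 callback are illustrative assumptions, not the repo's actual values or API):

```python
import itertools

# Illustrative grid search over spotting hyperparameters; evaluate_f1 stands in
# for the spotting + evaluation pipeline and is an assumed callback, not a real API.
def grid_search(evaluate_f1):
    best_f1, best_params = -1.0, None
    peak_thresholds = [0.80, 0.85, 0.90, 0.95]   # threshold on the score plot
    min_lengths = [10, 15, 20]                   # minimum interval length (frames)
    max_lengths = [60, 80, 100]                  # maximum interval length (frames)
    for thr, lo, hi in itertools.product(peak_thresholds, min_lengths, max_lengths):
        f1 = evaluate_f1(peak_threshold=thr, min_length=lo, max_length=hi)
        if f1 > best_f1:
            best_f1, best_params = f1, (thr, lo, hi)
    return best_params, best_f1
```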

@xjtupanda
Author

> The hyperparameters were set using a loop, which takes some time. I think this is normal for the spotting task; other methods also use many hyperparameters while processing the "signal" graph.

I see. Thank you again for your patience.
