Problems on result reproduction on MEGC2021 Benchmark #6
I've also uploaded my reproduction notebook: https://drive.google.com/file/d/1L-8EHmPC3HIj5Swd_aArrcCjTGZ99Wc6/view?usp=share_link. The only modification is to some evaluation details.
After a closer inspection, I think the main reason for the poor result is the low F1-score of micro-expression detection (I still don't understand why I can't reproduce the results, though). Many of my runs produce few TPs and many FPs for micro-expressions, which drags down the overall result relative to macro-expressions. There are also many spotting hyperparameters to tune, which I believe is crucial to the final result. Is there any search strategy, or is tuning done just by inspecting the score plot?
Thanks for your investigation, especially for finding the typo in the original code. Please refer to the Jupyter notebook, as I might have wrongly copied some parts when converting to .py files. Regarding the modification that you made:
The hyperparameters were set using a loop, which takes some time. I think this is normal in the spotting task; other methods also tune many hyperparameters while processing the score ("signal") plot.
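The loop described above can be sketched as a simple grid search: sweep one spotting threshold and keep the value that maximizes F1. This is only an illustration — `spot_intervals` is a stand-in for the repository's actual spotting routine, and the threshold grid is hypothetical.

```python
# Hypothetical sketch of the hyperparameter loop described above:
# sweep a peak-score threshold and keep the value with the best F1.
# `spot_intervals` is a placeholder for the repo's spotting routine.

def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Spotting-style F1 computed directly from interval counts."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def grid_search_threshold(score_plot, ground_truth, spot_intervals, thresholds):
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        tp, fp, fn = spot_intervals(score_plot, ground_truth, threshold=t)
        f1 = f1_from_counts(tp, fp, fn)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Toy usage with a fake spotter whose FP count is lowest at threshold 0.5.
def fake_spotter(score_plot, ground_truth, threshold):
    return 10, int(40 * abs(threshold - 0.5) + 5), 20

best_t, best_f1 = grid_search_threshold(None, None, fake_spotter,
                                        [0.3, 0.4, 0.5, 0.6, 0.7])
```

In practice the real code sweeps several hyperparameters at once, so the loop becomes nested and correspondingly slower.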
I see. Thank you again for your patience.
I've had a hard time trying to reproduce the results. Listed below is what I've tried.
Results:
Final result: TP:101, FP:290, FN:256
Precision = 0.2583
Recall = 0.185
F1-Score = 0.2156
The results are still not so good, so I finally tried to run the code in the Jupyter notebook provided in "Code for evaluation on MEGC2021 benchmark?" #3
Reproduction ipynb:
CASME:
Micro result: TP:3 FP:137 FN:54 F1_score:0.0305
Macro result: TP:100 FP:206 FN:200 F1_score:0.3300
Overall result: TP:103 FP:343 FN:254 F1_score:0.2565
SAMMLV:
Cumulative result until subject 30:
Micro result: TP:10 FP:169 FN:149 F1_score:0.0592
Macro result: TP:97 FP:277 FN:246 F1_score:0.2706
Overall result: TP:107 FP:446 FN:395 F1_score:0.2028
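The F1 scores in these tables are consistent with the usual spotting formula F1 = 2·TP / (2·TP + FP + FN), which can be checked directly against the CAS(ME)^2 reproduction numbers above:

```python
# Verify the reported F1 scores from the TP/FP/FN counts above,
# using F1 = 2*TP / (2*TP + FP + FN).

def f1_score(tp: int, fp: int, fn: int) -> float:
    return 2 * tp / (2 * tp + fp + fn)

# CASME reproduction numbers from the tables above.
micro   = f1_score(3, 137, 54)     # reported as 0.0305
macro   = f1_score(100, 206, 200)  # reported as 0.3300
overall = f1_score(103, 343, 254)  # reported as 0.2565
```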
Orig ipynb:
CASME:
Micro result: TP:5 FP:77 FN:52 F1_score:0.0719
Macro result: TP:108 FP:166 FN:192 F1_score:0.3763
Overall result: TP:113 FP:243 FN:244 F1_score:0.3170
SAMM:
Micro result: TP:12 FP:104 FN:147 F1_score:0.0873
Macro result: TP:88 FP:198 FN:255 F1_score:0.2798
Overall result: TP:100 FP:302 FN:402 F1_score:0.2212
As reported above, there's a huge gap between the reproduction result and the original performance on CASME_sq, while the gap for the SAMMLV dataset is much smaller.
I've also tried fixing the random seed to 1; the result does not improve. Replacing the mix of hard & soft label loss with a pure hard-label loss improves results. Moreover, I notice there are many subtle differences between the original code and the Jupyter notebook; using the spotting method from the original code produces very bad results:
Final result: TP:53, FP:320, FN:304
Precision = 0.1421
Recall = 0.0849
F1-Score = 0.1063
Replacing it with the spotting method from the Jupyter notebook works better, with results:
Final result: TP:102, FP:299, FN:255
Precision = 0.2544
Recall = 0.1841
F1-Score = 0.2136
I also found a typo in the original code:
MTSN-Spot-ME-MaE/spot_interval.py, line 116 (commit 42c4c35):
I believe `score_plot_micro[peak] > 0.95` should be `score_plot_macro[peak] > 0.95`.
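A minimal sketch of what the corrected macro branch would look like. This is illustrative only — `score_plot_macro`, `peaks`, and the function name are placeholders, not the repository's actual variables:

```python
# Illustrative sketch of the corrected peak filtering: in the macro branch,
# peaks must be checked against the *macro* score plot, not the micro one
# (the fixed form of `score_plot_micro[peak] > 0.95`).

def filter_macro_peaks(score_plot_macro, peaks, threshold=0.95):
    # Keep only peak indices whose macro score exceeds the threshold.
    return [p for p in peaks if score_plot_macro[p] > threshold]

scores = [0.1, 0.99, 0.3, 0.96, 0.2, 0.5]
kept = filter_macro_peaks(scores, [1, 3, 5])
```

With the typo in place, a macro peak would pass or fail based on the micro score at the same frame, which would explain part of the FP inflation described above.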
I'm trying to build on your work and use it as a baseline model, but I'm very frustrated by the reproduction results. Any insight/help would be very precious to me.