
An experiment about meta-test #46

Closed
wjczf123 opened this issue Sep 20, 2022 · 8 comments

Comments

@wjczf123

Hi. I deleted lines 384-385 and line 447 of learner.py to avoid fine-tuning on the support set during meta-test. Is this right? Thanks.

@iofu728
Contributor

iofu728 commented Sep 21, 2022

Hi @wjczf123, yes. If you remove #384-385 and #447 of learner.py, the code will skip fine-tuning on the meta-test support set.

@wjczf123
Author

Thanks for your reply.
I ran it once under the inter 5-way 1-shot setting and the results looked very bad.

2022-09-20 22:57:35 INFO: - span_f1 = 0.7218073781712385
2022-09-20 22:57:35 INFO: - span_p = 0.7370060346505719
2022-09-20 22:57:35 INFO: - span_r = 0.7072229140722269
2022-09-20 22:57:35 INFO: - type_f1 = 0.156973848019738
2022-09-20 22:57:35 INFO: - type_p = 0.156973848069738
2022-09-20 22:57:35 INFO: - type_r = 0.156973848069738
2022-09-20 22:57:35 INFO: - 9.445,9.063,9.250,73.701,70.722,72.181,15.697,15.697,15.697,0.000,0.000,0.000

@wjczf123
Author

I understand the performance will drop, but it performs much worse than expected.

@iofu728
Contributor

iofu728 commented Sep 22, 2022

> I ran it once under the inter 5-way 1-shot setting and the results looked very bad. […]

Sorry, I made a mistake earlier. You can't directly remove #447 in the type-classification stage, because it contains logic that generates the type embedding.
The solution is to keep #447 and change #165 to `self.model.eval()`. You may also need to remove #191-192.
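For anyone following along, the ablation above amounts to: keep the support-set forward pass (it still builds the type embeddings), but put the model in eval mode and skip the inner-loop updates. Here is a minimal, hypothetical sketch of that control flow; `ToyMetaLearner` and `meta_test_episode` are illustrative stand-ins, not names from learner.py:

```python
# Hypothetical sketch: skipping support-set fine-tuning in a meta-test episode.
# ToyMetaLearner is a toy stand-in for the real learner in learner.py.

class ToyMetaLearner:
    """Tracks train/eval mode and how many inner-loop updates were run."""

    def __init__(self):
        self.training = True
        self.inner_updates = 0

    def eval(self):
        self.training = False

    def train(self):
        self.training = True

    def finetune_on_support(self, support_set, steps=1):
        # Inner-loop adaptation on the episode's support set.
        for _ in range(steps):
            self.inner_updates += 1

    def predict(self, query_set):
        # Forward pass only; still needed to build type embeddings.
        return ["O"] * len(query_set)


def meta_test_episode(learner, support_set, query_set, finetune=True):
    """Run one meta-test episode, optionally skipping support-set fine-tuning."""
    if finetune:
        learner.train()
        learner.finetune_on_support(support_set)
    else:
        # Ablation: evaluation-mode forward pass only, no inner-loop updates
        # (the equivalent of changing #165 to self.model.eval()).
        learner.eval()
    return learner.predict(query_set)
```

With `finetune=False`, the learner stays in eval mode and performs zero inner-loop updates, which is the behavior the ablation is after.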

@wjczf123
Author

Thanks. The new result seems to be correct.

2022-09-24 20:43:14 INFO: - ***** Eval results inter-test *****
2022-09-24 20:43:14 INFO: - f1 = 0.6104350036041772
2022-09-24 20:43:14 INFO: - f1_threshold = 0.6133144703132174
2022-09-24 20:43:14 INFO: - loss = tensor(4.1757, device='cuda:0')
2022-09-24 20:43:14 INFO: - precision = 0.6232885601193933
2022-09-24 20:43:14 INFO: - precision_threshold = 0.6340790479672884
2022-09-24 20:43:14 INFO: - recall = 0.5981008717310069
2022-09-24 20:43:14 INFO: - recall_threshold = 0.5938667496886657
2022-09-24 20:43:14 INFO: - span_f1 = 0.7218073781712385
2022-09-24 20:43:14 INFO: - span_p = 0.7370060346505719
2022-09-24 20:43:14 INFO: - span_r = 0.7072229140722269
2022-09-24 20:43:14 INFO: - type_f1 = 0.8474159401741568
2022-09-24 20:43:14 INFO: - type_p = 0.8474159402241568
2022-09-24 20:43:14 INFO: - type_r = 0.8474159402241568
2022-09-24 20:43:14 INFO: - 62.329,59.810,61.044,73.701,70.722,72.181,84.742,84.742,84.742,63.408,59.387,61.331

@wjczf123 wjczf123 reopened this Sep 27, 2022
@wjczf123
Author

I am very sorry to interrupt again. Why is the 5-shot performance worse than 1-shot after ablating fine-tuning in the meta-test?
For example, the F1 under inter 5-way 5-shot is about 54, while under 1-shot it is 61.04. Have you observed this phenomenon before? It doesn't seem normal. Thanks.

@wjczf123 wjczf123 reopened this Sep 30, 2022
@iofu728
Contributor

iofu728 commented Sep 30, 2022

Hi @wjczf123, this may be reasonable, although we have not run the corresponding ablation on 5-shot. First, the 5-shot and 1-shot datasets cannot be compared directly; each is just a sampled subset of Few-NERD. That said, according to our experimental results on inter 5-1 and inter 5-5, the 5-shot results do come out better. Second, we found in our experiments that inter 5-5 and inter 10-5 need more fine-tuning steps in the meta-test, so removing fine-tuning may have a larger impact on 5-shot. Hope this helps.

@wjczf123
Copy link
Author

Thanks. Hope you have a good day.
