
Did you try without LM pretrained weights? #1

Closed

apoorvumang opened this issue Mar 27, 2022 · 5 comments

Comments

@apoorvumang

Hi

Very interesting work! I had a similar submission to ACL 2022 (https://github.com/apoorvumang/kgt5) and wanted to ask the following question: did you try training SimKGC from scratch, i.e., without the pretrained LM weights? In our KGT5 experiments, we found that pretraining had almost no impact (although we obtain worse results than SimKGC).

My intuition says that SimKGC might still work if trained from scratch. Have you tried it?

Thanks
Apoorv

@intfloat
Owner

Hi,

Thanks for your interest in our work.

Since it is now standard practice to use pre-trained LMs, we did not thoroughly compare the results with and without pre-trained weights. My best guess is that training from scratch would produce decent results, but not as good as those with pre-trained language models.

I ran one experiment on the WN18RR dataset this morning using randomly initialized weights, without changing any hyper-parameters. Here are the results:

| | MRR | H@1 | H@3 | H@10 |
| --- | --- | --- | --- | --- |
| with pre-trained (paper reported) | 66.6 | 58.7 | 71.7 | 80.0 |
| w/o pre-trained | 56.9 | 50.7 | 59.8 | 68.8 |

With pre-trained LMs, the model converges to much better results.
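
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch (not the actual SimKGC training code) of toggling between pre-trained and randomly initialized weights with HuggingFace transformers, assuming a `bert-base-uncased` encoder:

```python
from transformers import AutoConfig, AutoModel

MODEL_NAME = "bert-base-uncased"  # assumed encoder; swap in whichever checkpoint you use


def build_encoder(use_pretrained: bool):
    if use_pretrained:
        # Load weights from the pre-trained checkpoint.
        return AutoModel.from_pretrained(MODEL_NAME)
    # Same architecture, but all parameters are randomly initialized.
    config = AutoConfig.from_pretrained(MODEL_NAME)
    return AutoModel.from_config(config)
```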

In your case, it could be that the training dataset is large enough to overshadow the benefits of pre-training.

Best,
Liang

@apoorvumang
Author

Hi Liang,

Thanks for such a prompt response! Those are very cool results on WN18RR; they look like good evidence of transfer learning on the link prediction task.

I agree; for WikiKG90Mv2 it might be that the training dataset is large enough. I will also try running KGT5 on WN18RR with and without pretraining and see the difference.

However, I suspect that the KGT5 training methodology - which is plain seq2seq and does not use any explicit negatives - could be suffering from the issues outlined in your paper, and that some sort of contrastive training could help. Do you think InfoNCE (or any contrastive loss/training) could be applied to seq2seq models? If so, what would you recommend as a starting point?
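
For concreteness, what I have in mind by InfoNCE with in-batch negatives is roughly the sketch below; the names, shapes, and temperature are illustrative only and not taken from either codebase.

```python
import torch
import torch.nn.functional as F


def info_nce(query_emb, cand_emb, temperature=0.05):
    """query_emb: [B, d] (head, relation) embeddings; cand_emb: [B, d] tail embeddings.
    The i-th candidate is the positive for the i-th query; the other
    in-batch candidates serve as negatives."""
    query_emb = F.normalize(query_emb, dim=-1)
    cand_emb = F.normalize(cand_emb, dim=-1)
    logits = query_emb @ cand_emb.t() / temperature  # [B, B] scaled cosine similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```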

Thanks
Apoorv

@intfloat
Owner

I think the idea of KGT5 is very cool. By formulating KGC as a seq2seq task, it implicitly treats all sequences except the ground-truth as negatives.

On combining contrastive learning with seq2seq models, I can recommend the two papers listed below. But there does not seem to be a widely accepted method yet, and I am not sure how much gain it would bring.

  1. Contrastive Learning with Adversarial Perturbations for Conditional Text Generation, ICLR 2021
  2. A Contrastive Framework for Neural Text Generation
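
If you do experiment with this, one generic (untested) option is to add an InfoNCE-style term on pooled encoder/decoder representations on top of the usual token-level cross-entropy. The sketch below is only illustrative, does not reproduce either paper above, and the weight `alpha` is a placeholder.

```python
import torch
import torch.nn.functional as F


def joint_loss(ce_loss, src_repr, tgt_repr, alpha=0.1, temperature=0.05):
    """ce_loss: the standard token-level seq2seq cross-entropy.
    src_repr / tgt_repr: [B, d] pooled encoder / decoder representations of
    matching source-target pairs; other pairs in the batch act as negatives."""
    src_repr = F.normalize(src_repr, dim=-1)
    tgt_repr = F.normalize(tgt_repr, dim=-1)
    logits = src_repr @ tgt_repr.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, labels)
    return ce_loss + alpha * contrastive
```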

Best,
Liang

@apoorvumang
Author

Thank you so much for the pointers! I will definitely look into those :)

@apoorvumang
Author

apoorvumang commented Mar 31, 2022

Just an update: I tried KGT5 on WN18RR with pretrained weights, and there was a significant improvement in MRR: from 0.508 without pretrained weights to 0.532 with them.
