
Did you try without LM pretrained weights? #1

Closed

apoorvumang opened this issue Mar 27, 2022 · 5 comments

Comments

@apoorvumang

Hi

Very interesting work! I had a similar submission to ACL 2022 (https://github.com/apoorvumang/kgt5) and wanted to ask the following question: did you try training SimKGC from scratch, i.e., without the pretrained LM weights? In our KGT5 experiments, we found that pretraining had almost no impact (although we obtain worse results than SimKGC).

My intuition says that SimKGC might still work if trained from scratch. Have you tried it?

Thanks
Apoorv

@intfloat
Owner

Hi,

Thanks for your interest in our work.

Since it is now standard practice to use pre-trained LMs, we did not thoroughly compare the results with and without pre-trained weights. My best guess is that training from scratch would produce decent results, but not as good as those with pre-trained language models.

I ran one experiment on the WN18RR dataset this morning using randomly initialized weights, without changing any hyper-parameters. Here are the results:

| | MRR | H@1 | H@3 | H@10 |
| --- | --- | --- | --- | --- |
| with pre-trained (paper reported) | 66.6 | 58.7 | 71.7 | 80.0 |
| w/o pre-trained | 56.9 | 50.7 | 59.8 | 68.8 |

With pre-trained LMs, the model converges to much better results.
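
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch (not the actual SimKGC training code) of toggling between pre-trained and randomly initialized weights with HuggingFace transformers, assuming a `bert-base-uncased` encoder:

```python
from transformers import AutoConfig, AutoModel

MODEL_NAME = "bert-base-uncased"  # assumed encoder; swap in whichever checkpoint you use


def build_encoder(use_pretrained: bool):
    if use_pretrained:
        # Load weights from the pre-trained checkpoint.
        return AutoModel.from_pretrained(MODEL_NAME)
    # Same architecture, but all parameters are randomly initialized.
    config = AutoConfig.from_pretrained(MODEL_NAME)
    return AutoModel.from_config(config)
```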

In your case, it could be that the training dataset is large enough to overshadow the benefits of pre-training.

Best,
Liang

@apoorvumang
Author

Hi Liang,

Thanks for such a prompt response! Those are very cool results on WN18RR; they look like good evidence of transfer learning on the link prediction task.

I agree; for WikiKG90Mv2 it might be that the training dataset is large enough. I will also try running KGT5 on WN18RR with and without pretraining and see the difference.

However, I suspect that the KGT5 training methodology - which is plain seq2seq and does not use any explicit negatives - could be suffering from the issues outlined in your paper, and that some sort of contrastive training could help. Do you think InfoNCE (or any contrastive loss/training) could be applied to seq2seq models? If so, what would you recommend as a starting point?
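
For concreteness, what I have in mind by InfoNCE with in-batch negatives is roughly the sketch below; the names, shapes, and temperature are illustrative only and not taken from either codebase.

```python
import torch
import torch.nn.functional as F


def info_nce(query_emb, cand_emb, temperature=0.05):
    """query_emb: [B, d] (head, relation) embeddings; cand_emb: [B, d] tail embeddings.
    The i-th candidate is the positive for the i-th query; the other
    in-batch candidates serve as negatives."""
    query_emb = F.normalize(query_emb, dim=-1)
    cand_emb = F.normalize(cand_emb, dim=-1)
    logits = query_emb @ cand_emb.t() / temperature  # [B, B] scaled cosine similarities
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```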

Thanks
Apoorv

@intfloat
Owner

I think the idea of KGT5 is very cool. By formulating KGC as a seq2seq task, it implicitly treats all sequences except the ground-truth as negatives.

On combining contrastive learning with seq2seq models, I can recommend the two papers listed below. But there does not seem to be a widely accepted method yet, and I am not sure how much gain it would bring.

  1. Contrastive Learning with Adversarial Perturbations for Conditional Text Generation, ICLR 2021
  2. A Contrastive Framework for Neural Text Generation
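
If you do experiment with this, one generic (untested) option is to add an InfoNCE-style term on pooled encoder/decoder representations on top of the usual token-level cross-entropy. The sketch below is only illustrative, does not reproduce either paper above, and the weight `alpha` is a placeholder.

```python
import torch
import torch.nn.functional as F


def joint_loss(ce_loss, src_repr, tgt_repr, alpha=0.1, temperature=0.05):
    """ce_loss: the standard token-level seq2seq cross-entropy.
    src_repr / tgt_repr: [B, d] pooled encoder / decoder representations of
    matching source-target pairs; other pairs in the batch act as negatives."""
    src_repr = F.normalize(src_repr, dim=-1)
    tgt_repr = F.normalize(tgt_repr, dim=-1)
    logits = src_repr @ tgt_repr.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive = F.cross_entropy(logits, labels)
    return ce_loss + alpha * contrastive
```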

Best,
Liang

@apoorvumang
Author

Thank you so much for the pointers! I will definitely look into those :)

@apoorvumang
Author

apoorvumang commented Mar 31, 2022

Just an update: I tried KGT5 on WN18RR with pretrained weights, and there was a significant improvement in MRR: from 0.508 without pretrained weights to 0.532 with them.
