
Issues on the performance #3
Open · VoidHaruhi opened this issue Aug 16, 2021 · 23 comments

@VoidHaruhi

I used the hyperparameters provided, but the performance is actually bad. Taking DBLP and Twitter as examples, each result is about 20% below the numbers reported in the paper. Here are the command lines I used and the results.
DBLP:
```
nohup python3 main.py --base_embedding_dim=128 --batch_size=128 --beam_width=4 --checkpoint=15 --d_ff=512 --d_k=64 --d_model=128 --d_v=64 --data_name='dblp' --false_edge_gen='double' --ft_batch_size=100 --ft_checkpoint=500 --ft_d_ff=512 --ft_drop_rate=0.1 --ft_input_option='last4_cat' --ft_layer='ffn' --ft_lr=5e-05 --ft_n_epochs=10 --gcn_option='no_gcn' --get_bert_encoder_embeddings=False --lr=0.0001 --max_length=12 --n_epochs=10 --n_heads=4 --n_layers=4 --node_edge_composition_func='mult' --num_gcn_layers=2 --num_walks_per_node=5 --outdir='./output/dblp/' --path_option='shortest' --pretrained_method='node2vec' --walk_type='dfs' 2>&1 &
```
[screenshot: DBLP result]
Twitter:
```
nohup python3 main.py --base_embedding_dim=128 --batch_size=128 --beam_width=4 --checkpoint=15 --d_ff=512 --d_k=64 --d_model=128 --d_v=64 --data_name='twitter' --emb_dir=None --false_edge_gen='double' --ft_batch_size=100 --ft_checkpoint=500 --ft_d_ff=512 --ft_drop_rate=0.1 --ft_input_option='last4_cat' --ft_layer='ffn' --ft_lr=5e-05 --ft_n_epochs=10 --gcn_option='no_gcn' --get_bert_encoder_embeddings=False --lr=0.0001 --max_length=12 --n_epochs=10 --n_heads=4 --n_layers=4 --node_edge_composition_func='mult' --num_gcn_layers=2 --num_walks_per_node=10 --path_option='shortest' --pretrained_embeddings='./embed/twitter/twitter.emd' --outdir='./output/twitter/' --pretrained_method='node2vec' --walk_type='dfs' 2>&1 &
```
[screenshot: Twitter result]

@Shreyas-Bhat

Shreyas-Bhat commented Jun 29, 2022

@VoidHaruhi what did you use for 'data/twitter/ent2id.txt' and 'twitter.emd' please?

@VoidHaruhi
Author

> @VoidHaruhi what did you use for 'data/twitter/ent2id.txt' and 'twitter.emd' please?

@Shreyas-Bhat "ent2id.txt" is an original data file, data preprocess use this file to convert entity number to id in the graph. The command line "--pretrained_embeddings='./embed/twitter/twitter.emd'" means that you save the pretrained embedding into which directory

@Shreyas-Bhat

@VoidHaruhi Thanks! Where can I get this "ent2id.txt" and the pre-trained embeddings?

@VoidHaruhi
Author

@Shreyas-Bhat I downloaded the dataset from https://github.com/THUDM/GATNE and put the data directory "./GATNE/data/" into "./SLiCE/data/". You can step through the code and check where the model takes these files as input.

@Shreyas-Bhat

@VoidHaruhi thanks for your reply, and sorry for asking again, but the code does not run without "ent2id.txt" and the pre-trained embeddings, which are absent from https://github.com/THUDM/GATNE.

@VoidHaruhi
Author

@Shreyas-Bhat Sorry for the wrong information. It has been a long time since I ran this code, and I cannot remember how I got "ent2id.txt". Please contact the author.
The embedding file is generated by the pretraining phase of the code; if you don't already have an embedding file, it will simply be computed again.

@Shreyas-Bhat

@VoidHaruhi Sure. If I am not asking too much, could you please share the scripts you used to run this model, since you were able to reproduce the training procedure?

@VoidHaruhi
Author

@Shreyas-Bhat You need to install the Python packages in requirements.txt plus torch-geometric (maybe a bit annoying because of version conflicts), and check the input and output paths. I'm not sure, but it should be OK with careful debugging.

@yasNing

yasNing commented Aug 2, 2022

@Shreyas-Bhat Hi, I'm also trying to run this code, and I really want to know whether you got the ent2id.txt and the embeddings. I would really appreciate your help.

@Shreyas-Bhat

@yasNing not yet :(

@colbyham
Collaborator

colbyham commented Aug 3, 2022

@Shreyas-Bhat @yasNing ent2id.txt can be generated by taking all of the node IDs from the dataset and mapping them to 0-based integer indices. The same goes for rel2id.txt, but with relationship types. Below are example files based on the Twitter data in the GATNE repo. To generate pre-trained embeddings, use node2vec on your training data.
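
For concreteness, here is a minimal sketch of that mapping step. It assumes the GATNE-style train.txt format (one whitespace-separated edge_type source target triple per line) and that the mapping files are JSON dicts, as described later in this thread; the file paths are just examples:

```python
# Sketch: build ent2id.txt and rel2id.txt from a GATNE-style train.txt.
# Assumes each line is "edge_type source_node target_node" and that the
# mapping files are JSON dicts (original ID string -> 0-based integer).
import json

ents, rels = [], []
seen_ents, seen_rels = set(), set()

with open("data/twitter/train.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 3:
            continue
        rel, src, dst = parts[0], parts[1], parts[2]
        if rel not in seen_rels:
            seen_rels.add(rel)
            rels.append(rel)
        for ent in (src, dst):
            if ent not in seen_ents:
                seen_ents.add(ent)
                ents.append(ent)

# 0-based integer indexing over entities and relation types.
ent2id = {ent: i for i, ent in enumerate(ents)}
rel2id = {rel: i for i, rel in enumerate(rels)}

with open("data/twitter/ent2id.txt", "w") as f:
    json.dump(ent2id, f)
with open("data/twitter/rel2id.txt", "w") as f:
    json.dump(rel2id, f)
```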

@colbyham
Collaborator

colbyham commented Aug 3, 2022

Trying the file upload again for the example mappings; the upload wasn't working:
ent2id.txt
rel2id.txt

@yasNing

yasNing commented Aug 4, 2022

@colbyham Thanks very much! I think I now know how to build my datasets!

@Shreyas-Bhat

@colbyham Thank you very much for clarifying. @yasNing, can we connect outside this thread? I am working on a separate dataset too and could use your help with the steps.

@Shreyas-Bhat

@yasNing can you please tell me how you generated train.txt, test.txt, valid.txt, etc. for your dataset?

@yasNing

yasNing commented Aug 9, 2022

@Shreyas-Bhat The dataset format is the same as GATNE's: https://github.com/THUDM/GATNE/tree/master/data/twitter
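
For reference, each line of GATNE's train.txt is a whitespace-separated triple, edge_type source_node target_node; the IDs below are made-up examples:

```
1 12345 67890
1 12345 24680
2 67890 13579
```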

@yasNing

yasNing commented Aug 9, 2022

@Shreyas-Bhat And you should use train.txt to generate ent2id.txt. Also, use node2vec to generate the .emd file.
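
Here is a minimal sketch of that embedding step using the third-party node2vec package (pip install node2vec networkx). This thread doesn't confirm which node2vec implementation the authors used, and the walk parameters below are guesses, not the paper's settings:

```python
# Sketch: generate the pretrained .emd file from a GATNE-style train.txt
# using the third-party `node2vec` package.
import networkx as nx
from node2vec import Node2Vec

# Build an undirected graph from the edge list, ignoring the edge-type column.
G = nx.Graph()
with open("data/twitter/train.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 3:
            G.add_edge(parts[1], parts[2])

# dimensions=128 matches --base_embedding_dim=128 from the commands above;
# walk_length/num_walks/window are guesses.
n2v = Node2Vec(G, dimensions=128, walk_length=30, num_walks=10, workers=4)
model = n2v.fit(window=10, min_count=1)
model.wv.save_word2vec_format("embed/twitter/twitter.emd")
```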

@Shreyas-Bhat

@yasNing are you using the Twitter dataset? I am using a dataset that is not in the GitHub link you shared. Any idea how I can transform it into the .txt format provided? Thanks!

@yasNing

yasNing commented Aug 9, 2022

You just have to build your own dataset carefully following that format. I guess it will be OK.

@Shreyas-Bhat

@yasNing how do you generate ent2id.txt?

@Shreyas-Bhat

@yasNing, I mean to ask: how do you generate ent2id.txt for your own dataset? Thanks.

@colbyham
Collaborator

@Shreyas-Bhat Take all of the node IDs from your dataset, enumerate over them starting from 0, and for each node add an item to a dictionary: the keys are the original entity ID strings, and the values are the assigned integers, running from 0 to num_entities-1. After creating this dictionary, json-dump it to ent2id.txt.
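
In code, that recipe is just the following (a sketch; collect_node_ids is a hypothetical stand-in for however you gather the entity IDs from your own files):

```python
import json

# Hypothetical helper: returns all original entity ID strings
# from your dataset, in any deterministic order.
node_ids = collect_node_ids()

# Keys: original entity ID strings; values: 0 .. num_entities-1.
ent2id = {ent: i for i, ent in enumerate(node_ids)}

with open("ent2id.txt", "w") as f:
    json.dump(ent2id, f)
```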

@yasNing

yasNing commented Aug 12, 2022

@Shreyas-Bhat The author has given the data format. If you don't know how to make a dataset in this situation, then I don't think you need to reproduce this paper.
