
Issues on the performance #3
Open · VoidHaruhi opened this issue Aug 16, 2021 · 23 comments

@VoidHaruhi

I used the hyperparameters provided, but the performance is actually bad. Taking DBLP and Twitter as examples, each result is about 20% below the numbers reported in the paper. Here are the command lines I used and the results.
DBLP:
```
nohup python3 main.py --base_embedding_dim=128 --batch_size=128 --beam_width=4 --checkpoint=15 --d_ff=512 --d_k=64 --d_model=128 --d_v=64 --data_name='dblp' --false_edge_gen='double' --ft_batch_size=100 --ft_checkpoint=500 --ft_d_ff=512 --ft_drop_rate=0.1 --ft_input_option='last4_cat' --ft_layer='ffn' --ft_lr=5e-05 --ft_n_epochs=10 --gcn_option='no_gcn' --get_bert_encoder_embeddings=False --lr=0.0001 --max_length=12 --n_epochs=10 --n_heads=4 --n_layers=4 --node_edge_composition_func='mult' --num_gcn_layers=2 --num_walks_per_node=5 --outdir='./output/dblp/' --path_option='shortest' --pretrained_method='node2vec' --walk_type='dfs' 2>&1 &
```
[screenshot: DBLP result]
Twitter:
```
nohup python3 main.py --base_embedding_dim=128 --batch_size=128 --beam_width=4 --checkpoint=15 --d_ff=512 --d_k=64 --d_model=128 --d_v=64 --data_name='twitter' --emb_dir=None --false_edge_gen='double' --ft_batch_size=100 --ft_checkpoint=500 --ft_d_ff=512 --ft_drop_rate=0.1 --ft_input_option='last4_cat' --ft_layer='ffn' --ft_lr=5e-05 --ft_n_epochs=10 --gcn_option='no_gcn' --get_bert_encoder_embeddings=False --lr=0.0001 --max_length=12 --n_epochs=10 --n_heads=4 --n_layers=4 --node_edge_composition_func='mult' --num_gcn_layers=2 --num_walks_per_node=10 --path_option='shortest' --pretrained_embeddings='./embed/twitter/twitter.emd' --outdir='./output/twitter/' --pretrained_method='node2vec' --walk_type='dfs' 2>&1 &
```
[screenshot: Twitter result]

@Shreyas-Bhat

Shreyas-Bhat commented Jun 29, 2022

@VoidHaruhi what did you use for 'data/twitter/ent2id.txt' and 'twitter.emd' please?

@VoidHaruhi
Author

> @VoidHaruhi what did you use for 'data/twitter/ent2id.txt' and 'twitter.emd' please?

@Shreyas-Bhat "ent2id.txt" is an original data file, data preprocess use this file to convert entity number to id in the graph. The command line "--pretrained_embeddings='./embed/twitter/twitter.emd'" means that you save the pretrained embedding into which directory

@Shreyas-Bhat

@VoidHaruhi Thanks! Where can I get this "ent2id.txt" and the pre-trained embeddings?

@VoidHaruhi
Author

@Shreyas-Bhat I downloaded the dataset from https://github.com/THUDM/GATNE and put the data directory "./GATNE/data/" into "./SLiCE/data/". You can step through the code and check where the model takes these files as input.

@Shreyas-Bhat

@VoidHaruhi thanks for your reply, and sorry for asking again, but the code does not run without "ent2id.txt" and the pre-trained embeddings, which are absent from https://github.com/THUDM/GATNE.

@VoidHaruhi
Author

@Shreyas-Bhat Sorry for the wrong information. It has been a long time since I ran this code, and I cannot remember how I got "ent2id.txt". Please contact the author.
The embedding file is generated by the pretraining phase of the code; if you don't already have an embedding file, it will simply be computed again.

@Shreyas-Bhat

@VoidHaruhi Sure. If I am not asking too much, could you please share the scripts you used to run this model, since you were able to reproduce the training procedure?

@VoidHaruhi
Author

@Shreyas-Bhat You need to install the Python packages in requirements.txt plus torch-geometric (maybe a bit annoying because of version conflicts), and check the input and output paths. I'm not sure, but it should be OK with careful debugging.

@yasNing

yasNing commented Aug 2, 2022

@Shreyas-Bhat Hi, I'm also trying to run this code, and I really want to know whether you got the ent2id.txt and the embeddings. I would really appreciate your help.

@Shreyas-Bhat

@yasNing not yet :(

@colbyham
Collaborator

colbyham commented Aug 3, 2022

@Shreyas-Bhat @yasNing ent2id.txt can be generated by taking all of the node IDs from the dataset and mapping them to 0-based integer indices. The same goes for rel2id.txt, but with relationship types. Below are example files based on the Twitter data in the GATNE repo. To generate pre-trained embeddings, use node2vec on your training data.
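
For concreteness, here is a minimal sketch of that mapping step. It assumes the GATNE-style train.txt format (one whitespace-separated edge_type source target triple per line) and that the mapping files are JSON dicts, as described later in this thread; the file paths are just examples:

```python
# Sketch: build ent2id.txt and rel2id.txt from a GATNE-style train.txt.
# Assumes each line is "edge_type source_node target_node" and that the
# mapping files are JSON dicts (original ID string -> 0-based integer).
import json

ents, rels = [], []
seen_ents, seen_rels = set(), set()

with open("data/twitter/train.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 3:
            continue
        rel, src, dst = parts[0], parts[1], parts[2]
        if rel not in seen_rels:
            seen_rels.add(rel)
            rels.append(rel)
        for ent in (src, dst):
            if ent not in seen_ents:
                seen_ents.add(ent)
                ents.append(ent)

# 0-based integer indexing over entities and relation types.
ent2id = {ent: i for i, ent in enumerate(ents)}
rel2id = {rel: i for i, rel in enumerate(rels)}

with open("data/twitter/ent2id.txt", "w") as f:
    json.dump(ent2id, f)
with open("data/twitter/rel2id.txt", "w") as f:
    json.dump(rel2id, f)
```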

@colbyham
Collaborator

colbyham commented Aug 3, 2022

Trying the file upload again for the example mappings; the upload wasn't working:
ent2id.txt
rel2id.txt

@yasNing

yasNing commented Aug 4, 2022

@colbyham Thanks very much! I think I now know how to build my datasets!

@Shreyas-Bhat

@colbyham Thank you very much for clarifying. @yasNing, can we connect outside this thread? I am working on a separate dataset too and could use your help with the steps.

@Shreyas-Bhat

@yasNing can you please tell me how you generated train.txt, test.txt, valid.txt, etc. for your dataset?

@yasNing

yasNing commented Aug 9, 2022

@Shreyas-Bhat The dataset format is the same as GATNE's: https://github.com/THUDM/GATNE/tree/master/data/twitter
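
For reference, each line of GATNE's train.txt is a whitespace-separated triple, edge_type source_node target_node; the IDs below are made-up examples:

```
1 12345 67890
1 12345 24680
2 67890 13579
```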

@yasNing

yasNing commented Aug 9, 2022

@Shreyas-Bhat And you should use train.txt to generate ent2id.txt. Also, use node2vec to generate the .emd file.
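
Here is a minimal sketch of that embedding step using the third-party node2vec package (pip install node2vec networkx). This thread doesn't confirm which node2vec implementation the authors used, and the walk parameters below are guesses, not the paper's settings:

```python
# Sketch: generate the pretrained .emd file from a GATNE-style train.txt
# using the third-party `node2vec` package.
import networkx as nx
from node2vec import Node2Vec

# Build an undirected graph from the edge list, ignoring the edge-type column.
G = nx.Graph()
with open("data/twitter/train.txt") as f:
    for line in f:
        parts = line.split()
        if len(parts) >= 3:
            G.add_edge(parts[1], parts[2])

# dimensions=128 matches --base_embedding_dim=128 from the commands above;
# walk_length/num_walks/window are guesses.
n2v = Node2Vec(G, dimensions=128, walk_length=30, num_walks=10, workers=4)
model = n2v.fit(window=10, min_count=1)
model.wv.save_word2vec_format("embed/twitter/twitter.emd")
```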

@Shreyas-Bhat

@yasNing are you using the Twitter dataset? I am using a dataset that is not in the GitHub link you shared. Any idea how I can transform it into the .txt format provided? Thanks!

@yasNing

yasNing commented Aug 9, 2022

You just have to build your own dataset carefully following that format. I guess it will be OK.

@Shreyas-Bhat

@yasNing how do you generate ent2id.txt?

@Shreyas-Bhat

@yasNing, I mean to ask: how do you generate ent2id.txt for your own dataset? Thanks.

@colbyham
Collaborator

@Shreyas-Bhat Take all of the node IDs from your dataset, enumerate over them starting from 0, and for each node add an item to a dictionary: the keys are the original entity ID strings, and the values are the assigned integers, running from 0 to num_entities-1. After creating this dictionary, json-dump it to ent2id.txt.
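
In code, that recipe is just the following (a sketch; collect_node_ids is a hypothetical stand-in for however you gather the entity IDs from your own files):

```python
import json

# Hypothetical helper: returns all original entity ID strings
# from your dataset, in any deterministic order.
node_ids = collect_node_ids()

# Keys: original entity ID strings; values: 0 .. num_entities-1.
ent2id = {ent: i for i, ent in enumerate(node_ids)}

with open("ent2id.txt", "w") as f:
    json.dump(ent2id, f)
```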

@yasNing

yasNing commented Aug 12, 2022

@Shreyas-Bhat The author has given the data format. If you don't know how to make a dataset in this situation, then I don't think you need to reproduce this paper.
