
Reproducing NELL-995 MAP Results #1

Closed
todpole3 opened this issue Jan 10, 2018 · 14 comments

@todpole3

todpole3 commented Jan 10, 2018

Thanks very much for releasing the code accompanying the paper. It definitely makes reproducing the experiments a lot easier. I've been playing with the codebase and have some questions about reproducing the NELL-995 experiments.

The codebase does not contain the configuration file for the NELL-995 experiments, nor does it contain the evaluation scripts for computing MAP. (Maybe these were left out of the release?)
I used the hyperparameters reported in "Experimental Details, section 2.3" and appendix section 8.1 of the paper, resulting in the following configuration file:

data_input_dir="datasets/data_preprocessed/nell-995/"
vocab_dir="datasets/data_preprocessed/nell-995/vocab"
total_iterations=1000
path_length=3
hidden_size=400
embedding_size=200
batch_size=64
beta=0.05
Lambda=0.02
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell-995/"
load_model=1
model_load_dir="saved_models/nell-995/model.ckpt"

I ran train & test as specified in the README and evaluated the decoding results using the MAP computation script provided by the DeepPath paper. (I assumed the experiment setup is exactly the same as DeepPath's, since you compare head-to-head with them.)
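
For concreteness, here is the metric as I computed it; a minimal Python sketch of standard mean average precision, which is what I believe the DeepPath script implements (my paraphrase, not their actual code):

def average_precision(ranked_answers, correct_answers):
    # Precision@k, averaged over the ranks k where a correct answer appears.
    hits, precisions = 0, []
    for k, answer in enumerate(ranked_answers, start=1):
        if answer in correct_answers:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(correct_answers) if correct_answers else 0.0

def mean_average_precision(queries):
    # queries: list of (ranked_answers, correct_answers) pairs, one per test query.
    return sum(average_precision(r, c) for r, c in queries) / len(queries)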

However, the MAP results I obtained this way are significantly lower than the reported results.

MINERVA concept_athleteplaysinleague MAP: 0.810746658312 (380 queries evaluated)
MINERVA concept_athleteplaysforteam MAP: 0.649309434089 (386 queries evaluated)
MINERVA concept_organizationheadquarteredincity MAP: 0.944878371403 (246 queries evaluated)
MINERVA concept_athleteplayssport MAP: 0.919186046512 (602 queries evaluated)
MINERVA concept_personborninlocation MAP: 0.775690686628 (192 queries evaluated)
MINERVA concept_teamplayssport MAP: 0.762183612184 (111 queries evaluated)
MINERVA concept_athletehomestadium MAP: 0.519108225108 (200 queries evaluated)
MINERVA concept_worksfor MAP: 0.663530575465 (420 queries evaluated)

I tried a few variations of the embedding dimensions and also tried freezing the entity embeddings, yet none of the trials produced numbers close to the results tabulated in the MINERVA paper.

Would you please clarify the experiment setup for computing MAP?
I want to make sure I set the hyperparameters to the correct values. Also, the DeepPath paper used a relation-dependent underlying graph per relation during inference. Did you also vary the graph per relation, or use a single base graph for all relations as you did for the other datasets?

Many thanks.

@shehzaadzd
Owner

Hi Victoria,
Thanks for trying out our code.
Could you kindly point me to the evaluation script you used?
Unlike DeepPath, we train a single model for all the relations and hence use a single graph. However, to keep the evaluation correct, we remove the edge corresponding to the query triple. For example, for the query triple John_Doe --works_for--> Google, when MINERVA starts to walk from the node John_Doe, it is not allowed to take the works_for edge to reach Google.
(ref: https://github.com/shehzaadzd/MINERVA/blob/master/code/data/grapher.py#L56)
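
To illustrate, the masking works roughly like this (a simplified sketch of the idea, assuming a plain adjacency-list graph; not the actual grapher.py code):

def available_actions(graph, source, query_relation, answer):
    # Outgoing (relation, target) edges from `source`, minus the edge that
    # directly encodes the query triple being evaluated.
    actions = []
    for relation, target in graph.get(source, []):
        if relation == query_relation and target == answer:
            continue  # mask the query edge, e.g. John_Doe --works_for--> Google
        actions.append((relation, target))
    return actions

# The agent at John_Doe may still use every other edge:
graph = {"John_Doe": [("works_for", "Google"), ("lives_in", "Mountain_View")]}
print(available_actions(graph, "John_Doe", "works_for", "Google"))
# -> [('lives_in', 'Mountain_View')]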

@todpole3
Author

Hi Shehzaad,

I generated the results by running the scripts in the repo, so yes, I think the grapher should be the same as yours. It uses the graph named "graph.txt" in the data_preprocessed folder, which contains 304,434 facts. I noticed that this is the number of edges in the full graph (154,213 × 2 = 308,426) minus the test edges (3,992).

I did the evaluation by extracting the prediction scores MINERVA writes to the test_beam folder and then passing them to the same evaluation script used by DeepPath.
Here is my evaluation script:
https://gist.github.com/todpole3/51e1704b5efd85cb67b9ee0c95e0b028
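
Roughly, the extraction step does the following (a heavily simplified sketch; the tab-separated line format below is hypothetical, and the gist handles the actual test_beam layout):

import os
from collections import defaultdict

def load_predictions(beam_dir):
    # Hypothetical parser: assumes each line holds
    # "<source>\t<relation>\t<candidate_answer>\t<score>".
    scores = defaultdict(list)  # (source, relation) -> [(answer, score), ...]
    for fname in os.listdir(beam_dir):
        with open(os.path.join(beam_dir, fname)) as f:
            for line in f:
                source, relation, answer, score = line.strip().split("\t")
                scores[(source, relation)].append((answer, float(score)))
    # Rank candidates per query, best score first, then feed to the MAP script.
    return {q: sorted(cands, key=lambda x: -x[1]) for q, cands in scores.items()}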

@todpole3
Author

@shehzaadzd Any updates on this one?

@shehzaadzd
Owner

Hi Victoria,
Using the script you shared and the answers produced by our model, I was able to replicate our scores. I modified your script to use my logs, and I'm sharing my answers (link) with you to check.
I was not able to reproduce the results you got, as I don't have the pickle files that you used.
I hope this helps.

@todpole3
Author

todpole3 commented Feb 1, 2018

Thanks very much. I will check the difference and get back to you.

@posenhuang

@shehzaadzd @todpole3, I encountered the same problem. What is the test_prediction_path data format? Is there a script to parse the results in test_beam?
Thanks a lot!

@shehzaadzd
Owner

shehzaadzd commented Feb 8, 2018

Hi Po-Sen, I've uploaded the answers generated by our model on each NELL task (link). I'm adding the code to print these answers to the main repo and will push it soon. Hope this helps.
Sorry for the delayed response.

@posenhuang

posenhuang commented Feb 8, 2018

Thanks a lot! Looking forward to the script.
One more thing related to the setup: if I want to train on one relation only, as in DeepPath (instead of jointly), should I use only one of the relations in train.txt,
e.g. grep concept:worksfor train.txt > train_worksfor.txt?
Thanks!


@shehzaadzd, any update on the script?

@rajarshd
Collaborator

rajarshd commented Mar 4, 2018

@posenhuang - We apologize for the late reply from our side. To train on one relation, you can train on the individual graphs (exactly as in DeepPath). For example, the data for concept:worksfor would be in here.
Also, the script is the one @shehzaadzd linked a couple of answers above (link). You can also use the same script here.
Also wanted to check with @todpole3 -- are you still facing any issues?

@todpole3
Author

todpole3 commented Mar 7, 2018

Thanks, I haven't been experimenting with this dataset lately. Will check and let you know.

@Lee-zix

Lee-zix commented May 2, 2018

Thanks very much for the nice code! I reproduced the experiments on the NELL dataset. When I train separate models for each NELL task using the default config files (use_entity_embeddings=1 for all tasks), I can reproduce almost all of the reported results.
But if I train a single model for all tasks, the results are significantly lower than the reported ones:
athleteplaysinleague:
MINERVA MAP: 0.7619328833895763 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7451084742652438 (421 queries evaluated)
organizationhiredperson:
MINERVA MAP: 0.8572348007748 (349 queries evaluated)
athleteplayssport:
MINERVA MAP: 0.9160862354892205 (603 queries evaluated)
teamplayssport:
MINERVA MAP: 0.7702593537414967 (112 queries evaluated)
personborninlocation:
MINERVA MAP: 0.7865804604146573 (193 queries evaluated)
athletehomestadium:
MINERVA MAP: 0.5223731630448047 (201 queries evaluated)
organizationheadquarteredincity:
MINERVA MAP: 0.9133535585342815 (249 queries evaluated)
athleteplaysforteam:
MINERVA MAP: 0.6284948040761995 (387 queries evaluated)

The config file I used is below:

data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=100            # section 2.3 reports hidden size 400; the code sets hidden_size = 4 * self.hidden_size, so I set the parameter to 100
embedding_size=100         # section 2.3 reports embedding size 200; the code uses tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1
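
To spell out the doubling mentioned in the comments above, the arithmetic I am assuming is:

# Config values vs. effective model dimensions, per the code lines quoted above.
hidden_size = 100
embedding_size = 100
lstm_hidden = 4 * hidden_size        # 400, the hidden size reported in section 2.3
entity_emb_dim = 2 * embedding_size  # 200, the embedding size reported in section 2.3
print(lstm_hidden, entity_emb_dim)   # -> 400 200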

Even though the answers you provided seem correct, I still want to make sure I set the hyperparameters to the correct values for your single model over all relations. Thanks a lot!

@shehzaadzd
Owner

shehzaadzd commented May 2, 2018

Hi Lee,
Thanks for trying out our code!
Can you train the single model on the data in datasets/data_preprocessed/nell-995?
The datasets/data_preprocessed/nell data was changed a bit for the purpose of link prediction; it has fewer training examples, since we changed it to create a proper validation set.
Tell me if you still have issues reproducing the results.
-S

@Lee-zix

Lee-zix commented May 2, 2018

Thanks for telling me this. I will run the model on the nell-995 dataset and check the results!

@Lee-zix

Lee-zix commented May 4, 2018

Hi shehzaadzd,
I have run the experiment on the nell-995 dataset with the config file above. These are my results, with the paper's reported numbers in parentheses where I noted them:

athleteplaysinleague:
MINERVA MAP: 0.7824126150897805 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7689410483947302 (421 queries evaluated) (paper: 0.825)
organizationhiredperson:
MINERVA MAP: 0.8717628574212938 (349 queries evaluated) (paper: 0.851)
athleteplayssport:
MINERVA MAP: 0.9177169707020453 (603 queries evaluated) (paper: 0.985)
teamplayssport:
MINERVA MAP: 0.6906675170068028 (112 queries evaluated) (paper: 0.846)
personborninlocation:
MINERVA MAP: 0.7665333946422028 (193 queries evaluated) (paper: 0.793)
athletehomestadium:
MINERVA MAP: 0.5319267658819898 (201 queries evaluated) (paper: 0.895)
organizationheadquarteredincity:
MINERVA MAP: 0.9453257474341812 (249 queries evaluated) (paper: 0.946)
athleteplaysforteam:
MINERVA MAP: 0.6555836139169473 (387 queries evaluated) (paper: 0.824)

config file:

LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=100            # section 2.3 reports hidden size 400; the code sets hidden_size = 4 * self.hidden_size, so I set the parameter to 100
embedding_size=100         # section 2.3 reports embedding size 200; the code uses tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1

I also tried setting the embedding size and hidden size to 50; the results are below:

athleteplaysinleague: 
MINERVA MAP: 0.7700987187207659 (381 queries evaluated)
worksfor: 
MINERVA MAP: 0.7844730816821078 (421 queries evaluated)
organizationhiredperson: 
MINERVA MAP: 0.8710068284974013 (349 queries evaluated)
athleteplayssport: 
MINERVA MAP: 0.9182974018794915 (603 queries evaluated)
teamplayssport: 
MINERVA MAP: 0.7468537414965987 (112 queries evaluated)
personborninlocation: 
MINERVA MAP: 0.7555456489394312 (193 queries evaluated)
athletehomestadium: 
MINERVA MAP: 0.5220393343527672 (201 queries evaluated)
organizationheadquarteredincity: 
MINERVA MAP: 0.915163829922866 (249 queries evaluated)
athleteplaysforteam: 
MINERVA MAP: 0.6305270311084265 (387 queries evaluated)

config file:

LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=50             # trying 50 here instead (paper reports 400; the code multiplies by 4)
embedding_size=50          # trying 50 here instead (paper reports 200; the code multiplies by 2)
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1

Finally, I set the number of LSTM layers to 3 according to your paper. The results:

athleteplaysinleague: 
MINERVA MAP: 0.7820689080531601 (381 queries evaluated)
worksfor: 
MINERVA MAP: 0.7692186541355187 (421 queries evaluated)
organizationhiredperson: 
MINERVA MAP: 0.865689742796194 (349 queries evaluated)
athleteplayssport: 
MINERVA MAP: 0.9081260364842456 (603 queries evaluated)
teamplayssport: 
MINERVA MAP: 0.6653698979591837 (112 queries evaluated)
personborninlocation: 
MINERVA MAP: 0.7679808821000531 (193 queries evaluated)
athletehomestadium: 
MINERVA MAP: 0.5379048121585435 (201 queries evaluated)
organizationheadquarteredincity: 
MINERVA MAP: 0.9487218716134379 (249 queries evaluated)
athleteplaysforteam: 
MINERVA MAP: 0.6487594662013266 (387 queries evaluated)

However, none of the results are close to those in the paper, even though I set the hyperparameters exactly according to the paper and its appendix. Is my config file using the optimal parameters for your experiment? Could you help me reproduce the results? Thanks a lot!

@todpole3 changed the title from "Problem Reproducing NELL-995 MAP Results" to "Reproducing NELL-995 MAP Results" on May 14, 2018