
Reproducing NELL-995 MAP Results #1

Closed
todpole3 opened this issue Jan 10, 2018 · 14 comments

@todpole3

todpole3 commented Jan 10, 2018

Thanks very much for releasing the code accompanying the paper. It definitely makes reproducing the experiments a lot easier. I've been playing with the codebase and have some questions about reproducing the NELL-995 experiments.

The codebase does not contain the configuration file for the NELL-995 experiments, nor does it contain the evaluation scripts for computing MAP. (Maybe these were left out of the release?)
I used the hyperparameters reported in "Experimental Details, section 2.3" and appendix section 8.1 of the paper, resulting in the following configuration file:

data_input_dir="datasets/data_preprocessed/nell-995/"
vocab_dir="datasets/data_preprocessed/nell-995/vocab"
total_iterations=1000
path_length=3
hidden_size=400
embedding_size=200
batch_size=64
beta=0.05
Lambda=0.02
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell-995/"
load_model=1
model_load_dir="saved_models/nell-995/model.ckpt"

I ran train & test as specified in the README and evaluated the decoding results using the MAP computation script provided by the DeepPath paper. (I assumed the experiment setup is exactly the same as DeepPath's, since you compare head-to-head with them.)
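
For concreteness, here is the metric as I computed it; a minimal Python sketch of standard mean average precision, which is what I believe the DeepPath script implements (my paraphrase, not their actual code):

def average_precision(ranked_answers, correct_answers):
    # Precision@k, averaged over the ranks k where a correct answer appears.
    hits, precisions = 0, []
    for k, answer in enumerate(ranked_answers, start=1):
        if answer in correct_answers:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(correct_answers) if correct_answers else 0.0

def mean_average_precision(queries):
    # queries: list of (ranked_answers, correct_answers) pairs, one per test query.
    return sum(average_precision(r, c) for r, c in queries) / len(queries)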

However, the MAP results I obtained this way are significantly lower than the reported results.

MINERVA concept_athleteplaysinleague MAP: 0.810746658312 (380 queries evaluated)
MINERVA concept_athleteplaysforteam MAP: 0.649309434089 (386 queries evaluated)
MINERVA concept_organizationheadquarteredincity MAP: 0.944878371403 (246 queries evaluated)
MINERVA concept_athleteplayssport MAP: 0.919186046512 (602 queries evaluated)
MINERVA concept_personborninlocation MAP: 0.775690686628 (192 queries evaluated)
MINERVA concept_teamplayssport MAP: 0.762183612184 (111 queries evaluated)
MINERVA concept_athletehomestadium MAP: 0.519108225108 (200 queries evaluated)
MINERVA concept_worksfor MAP: 0.663530575465 (420 queries evaluated)

I tried a few variations of the embedding dimensions and also tried freezing the entity embeddings, yet none of the trials produced numbers close to the results tabulated in the MINERVA paper.

Would you please clarify the experiment setup for computing MAP?
I want to make sure I set the hyperparameters to the correct values. Also, the DeepPath paper used a relation-dependent underlying graph per relation during inference. Did you also vary the graph per relation, or use a single base graph for all relations as you did for the other datasets?

Many thanks.

@shehzaadzd
Owner

Hi Victoria,
Thanks for trying out our code.
Could you kindly point me to the evaluation script you used?
Unlike DeepPath, we train a single model for all the relations and hence use a single graph. However, to keep the evaluation correct, we remove the edge corresponding to the query triple. For example, for the query triple John_Doe --works_for--> Google, when MINERVA starts to walk from the node John_Doe, it is not allowed to take the works_for edge to reach Google.
(ref: https://github.com/shehzaadzd/MINERVA/blob/master/code/data/grapher.py#L56)
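
To illustrate, the masking works roughly like this (a simplified sketch of the idea, assuming a plain adjacency-list graph; not the actual grapher.py code):

def available_actions(graph, source, query_relation, answer):
    # Outgoing (relation, target) edges from `source`, minus the edge that
    # directly encodes the query triple being evaluated.
    actions = []
    for relation, target in graph.get(source, []):
        if relation == query_relation and target == answer:
            continue  # mask the query edge, e.g. John_Doe --works_for--> Google
        actions.append((relation, target))
    return actions

# The agent at John_Doe may still use every other edge:
graph = {"John_Doe": [("works_for", "Google"), ("lives_in", "Mountain_View")]}
print(available_actions(graph, "John_Doe", "works_for", "Google"))
# -> [('lives_in', 'Mountain_View')]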

@todpole3
Author

Hi Shehzaad,

I generated the results by running the scripts in the repo, so yes, I think the grapher should be the same as yours. It uses the graph named "graph.txt" in the data_preprocessed folder, which contains 304,434 facts. I noticed that this is the number of edges in the full graph (154,213 × 2 = 308,426) minus the test edges (3,992).

I did the evaluation by extracting the prediction scores MINERVA writes to the test_beam folder and then passing them to the same evaluation script used by DeepPath.
Here is my evaluation script:
https://gist.github.com/todpole3/51e1704b5efd85cb67b9ee0c95e0b028
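
Roughly, the extraction step does the following (a heavily simplified sketch; the tab-separated line format below is hypothetical, and the gist handles the actual test_beam layout):

import os
from collections import defaultdict

def load_predictions(beam_dir):
    # Hypothetical parser: assumes each line holds
    # "<source>\t<relation>\t<candidate_answer>\t<score>".
    scores = defaultdict(list)  # (source, relation) -> [(answer, score), ...]
    for fname in os.listdir(beam_dir):
        with open(os.path.join(beam_dir, fname)) as f:
            for line in f:
                source, relation, answer, score = line.strip().split("\t")
                scores[(source, relation)].append((answer, float(score)))
    # Rank candidates per query, best score first, then feed to the MAP script.
    return {q: sorted(cands, key=lambda x: -x[1]) for q, cands in scores.items()}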

@todpole3
Author

@shehzaadzd Any updates on this one?

@shehzaadzd
Owner

Hi Victoria,
Using the script you shared and the answers produced by our model, I was able to replicate our scores. I modified your script to use my logs, and I'm sharing my answers (link) with you to check.
I was not able to reproduce the results you got, as I don't have the pickle files that you used.
I hope this helps.

@todpole3
Author

todpole3 commented Feb 1, 2018

Thanks very much. I will check the difference and get back to you.

@posenhuang

@shehzaadzd @todpole3, I encountered the same problem. What is the test_prediction_path data format? Is there a script to parse the results in test_beam?
Thanks a lot!

@shehzaadzd
Owner

shehzaadzd commented Feb 8, 2018

Hi Po-Sen, I've uploaded the answers generated by our model on each NELL task (link). I'm adding the code to print these answers to the main repo and will push it soon. Hope this helps.
Sorry for the delayed response.

@posenhuang

posenhuang commented Feb 8, 2018

Thanks a lot! Looking forward to the script.
One more thing related to the setup: if I want to train on one relation only, as in DeepPath (instead of jointly), should I use only one of the relations in train.txt,
e.g. grep concept:worksfor train.txt > train_worksfor.txt?
Thanks!


@shehzaadzd, any update on the script?

@rajarshd
Collaborator

rajarshd commented Mar 4, 2018

@posenhuang - We apologize for the late reply from our side. To train on one relation, you can train on the individual graphs (exactly as in DeepPath). For example, the data for concept:worksfor would be in here.
Also, the script is the one @shehzaadzd linked a couple of answers above (link). You can also use the same script here.
Also wanted to check with @todpole3 -- are you still facing any issues?

@todpole3
Author

todpole3 commented Mar 7, 2018

Thanks, I haven't been experimenting with this dataset lately. Will check and let you know.

@Lee-zix

Lee-zix commented May 2, 2018

Thanks very much for the nice code! I reproduced the experiments on the NELL dataset. When I train separate models for each NELL task using the default config files (use_entity_embeddings=1 for all tasks), I can reproduce almost all of the reported results.
But if I train a single model for all tasks, the results are significantly lower than the reported ones:
athleteplaysinleague:
MINERVA MAP: 0.7619328833895763 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7451084742652438 (421 queries evaluated)
organizationhiredperson:
MINERVA MAP: 0.8572348007748 (349 queries evaluated)
athleteplayssport:
MINERVA MAP: 0.9160862354892205 (603 queries evaluated)
teamplayssport:
MINERVA MAP: 0.7702593537414967 (112 queries evaluated)
personborninlocation:
MINERVA MAP: 0.7865804604146573 (193 queries evaluated)
athletehomestadium:
MINERVA MAP: 0.5223731630448047 (201 queries evaluated)
organizationheadquarteredincity:
MINERVA MAP: 0.9133535585342815 (249 queries evaluated)
athleteplaysforteam:
MINERVA MAP: 0.6284948040761995 (387 queries evaluated)

The config file I used is below:

data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=100            # section 2.3 reports hidden size 400; the code sets hidden_size = 4 * self.hidden_size, so I set the parameter to 100
embedding_size=100         # section 2.3 reports embedding size 200; the code uses tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1
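
To spell out the doubling mentioned in the comments above, the arithmetic I am assuming is:

# Config values vs. effective model dimensions, per the code lines quoted above.
hidden_size = 100
embedding_size = 100
lstm_hidden = 4 * hidden_size        # 400, the hidden size reported in section 2.3
entity_emb_dim = 2 * embedding_size  # 200, the embedding size reported in section 2.3
print(lstm_hidden, entity_emb_dim)   # -> 400 200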

Even though the answers you provided seem correct, I still want to make sure I set the hyperparameters to the correct values for your single model over all relations. Thanks a lot!

@shehzaadzd
Owner

shehzaadzd commented May 2, 2018

Hi Lee,
Thanks for trying out our code!
Can you train the single model on the data in datasets/data_preprocessed/nell-995?
The datasets/data_preprocessed/nell data was changed a bit for the purpose of link prediction; it has fewer training examples, since we changed it to create a proper validation set.
Tell me if you still have issues reproducing the results.
-S

@Lee-zix

Lee-zix commented May 2, 2018

Thanks for telling me this. I will run the model on the nell-995 dataset and check the results!

@Lee-zix

Lee-zix commented May 4, 2018

Hi shehzaadzd,
I have run the experiment on the nell-995 dataset with the config file above. These are my results, with the paper's reported numbers in parentheses where I noted them:

athleteplaysinleague:
MINERVA MAP: 0.7824126150897805 (381 queries evaluated)
worksfor:
MINERVA MAP: 0.7689410483947302 (421 queries evaluated) (paper: 0.825)
organizationhiredperson:
MINERVA MAP: 0.8717628574212938 (349 queries evaluated) (paper: 0.851)
athleteplayssport:
MINERVA MAP: 0.9177169707020453 (603 queries evaluated) (paper: 0.985)
teamplayssport:
MINERVA MAP: 0.6906675170068028 (112 queries evaluated) (paper: 0.846)
personborninlocation:
MINERVA MAP: 0.7665333946422028 (193 queries evaluated) (paper: 0.793)
athletehomestadium:
MINERVA MAP: 0.5319267658819898 (201 queries evaluated) (paper: 0.895)
organizationheadquarteredincity:
MINERVA MAP: 0.9453257474341812 (249 queries evaluated) (paper: 0.946)
athleteplaysforteam:
MINERVA MAP: 0.6555836139169473 (387 queries evaluated) (paper: 0.824)

config file:

LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=100            # section 2.3 reports hidden size 400; the code sets hidden_size = 4 * self.hidden_size, so I set the parameter to 100
embedding_size=100         # section 2.3 reports embedding size 200; the code uses tf.placeholder(tf.float32, [self.entity_vocab_size, 2 * self.embedding_size]), so I set the parameter to 100
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1

I also tried setting the embedding size and hidden size to 50; the results are below:

athleteplaysinleague: 
MINERVA MAP: 0.7700987187207659 (381 queries evaluated)
worksfor: 
MINERVA MAP: 0.7844730816821078 (421 queries evaluated)
organizationhiredperson: 
MINERVA MAP: 0.8710068284974013 (349 queries evaluated)
athleteplayssport: 
MINERVA MAP: 0.9182974018794915 (603 queries evaluated)
teamplayssport: 
MINERVA MAP: 0.7468537414965987 (112 queries evaluated)
personborninlocation: 
MINERVA MAP: 0.7555456489394312 (193 queries evaluated)
athletehomestadium: 
MINERVA MAP: 0.5220393343527672 (201 queries evaluated)
organizationheadquarteredincity: 
MINERVA MAP: 0.915163829922866 (249 queries evaluated)
athleteplaysforteam: 
MINERVA MAP: 0.6305270311084265 (387 queries evaluated)

config file:

LSTM_Layer=1
data_input_dir="datasets/data_preprocessed/nell/"
vocab_dir="datasets/data_preprocessed/nell/vocab"
total_iterations=3000
path_length=3              # according to the appendix
hidden_size=50             # trying 50 here instead (paper reports 400; the code multiplies by 4)
embedding_size=50          # trying 50 here instead (paper reports 200; the code multiplies by 2)
batch_size=128             # default value
beta=0.05                  # according to the appendix
Lambda=0.02                # according to the appendix
use_entity_embeddings=1
train_entity_embeddings=1
train_relation_embeddings=1
base_output_dir="output/nell/worksfor"
load_model=0
model_load_dir="/home/sdhuliawala/logs/RL-PathRNN/nnnn/45de_3_0.06_10_0.0/model/model.ckpt"
nell_evaluation=1

Finally, I set the number of LSTM layers to 3 according to your paper. The results:

athleteplaysinleague: 
MINERVA MAP: 0.7820689080531601 (381 queries evaluated)
worksfor: 
MINERVA MAP: 0.7692186541355187 (421 queries evaluated)
organizationhiredperson: 
MINERVA MAP: 0.865689742796194 (349 queries evaluated)
athleteplayssport: 
MINERVA MAP: 0.9081260364842456 (603 queries evaluated)
teamplayssport: 
MINERVA MAP: 0.6653698979591837 (112 queries evaluated)
personborninlocation: 
MINERVA MAP: 0.7679808821000531 (193 queries evaluated)
athletehomestadium: 
MINERVA MAP: 0.5379048121585435 (201 queries evaluated)
organizationheadquarteredincity: 
MINERVA MAP: 0.9487218716134379 (249 queries evaluated)
athleteplaysforteam: 
MINERVA MAP: 0.6487594662013266 (387 queries evaluated)

However, none of the results are close to those in the paper, even though I set the hyperparameters exactly according to the paper and its appendix. Is my config file using the optimal parameters for your experiment? Could you help me reproduce the results? Thanks a lot!

@todpole3 changed the title from "Problem Reproducing NELL-995 MAP Results" to "Reproducing NELL-995 MAP Results" on May 14, 2018