I reproduced the result of ConvE on FB15k-237, but... #6
Comments
Hello!
Be aware that the config is …
The best hits@10 can be read from the last two lines of the logs, which supports the conclusion that the results with and without negative sampling are comparable.
Therefore, negative sampling is not the key step that raises ConvE from 50% to 60% hits@10. The real cause remains unknown.
Hi, thanks for your interest in our paper! Just to clarify, were you able to figure out the conversion from TensorFlow to PyTorch? If not, maybe it helps to follow our implementation of the train iterator here, which calls the negative sampling implemented here. Regarding the second post, we have noticed that negative sampling helps overall, but I don't remember the specific difference for each metric and dataset. Do you also observe the same difference between with/without negative sampling on other datasets, or on metrics beyond hits@10 on FB15k-237?
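For reference, here is a minimal sketch (my own illustration, not the repo's actual train iterator) of a batch generator that produces the input dictionary the TensorFlow graph in the following post expects ('e1', 'rel', 'e2', 'e2_multi', 'lookup_values'), with negative candidates sampled per batch. The function name make_batches and all arguments are hypothetical.

import numpy as np

def make_batches(triples, num_ent, batch_size, window_size, rng=None):
    """Toy batch generator with negative sampling (illustration only).

    `triples` is a list of (e1, rel, e2) integer ids. Each yielded batch is a
    dict of arrays matching the graph inputs below: 'e1', 'rel', 'e2',
    'e2_multi' (multi-hot labels over the candidate window), and
    'lookup_values' (candidate entity ids: correct answers plus negatives).
    """
    rng = rng or np.random.default_rng(0)
    # Group all correct tails for each (e1, rel) query, ConvE-style.
    tails = {}
    for e1, rel, e2 in triples:
        tails.setdefault((e1, rel), set()).add(e2)
    queries = list(tails.keys())
    for start in range(0, len(queries), batch_size):
        batch = queries[start:start + batch_size]
        e1s, rels, e2s, lookup, multi = [], [], [], [], []
        for e1, rel in batch:
            positives = sorted(tails[(e1, rel)])[:window_size]
            # Fill the rest of the window with entities that are not correct answers.
            negatives = [int(e) for e in rng.permutation(num_ent)
                         if e not in tails[(e1, rel)]][:window_size - len(positives)]
            candidates = positives + negatives
            e1s.append(e1)
            rels.append(rel)
            e2s.append(positives[0])
            lookup.append(candidates)
            multi.append([1.0 if c in tails[(e1, rel)] else 0.0 for c in candidates])
        yield {
            'e1': np.asarray(e1s, np.int64),
            'rel': np.asarray(rels, np.int64),
            'e2': np.asarray(e2s, np.int64),
            'e2_multi': np.asarray(multi, np.float32),
            'lookup_values': np.asarray(lookup, np.int32),
        }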
Hi, I use the code of this repo to implement TransE. As we all know, TransE is the simplest classical model. I provide my implementation below:

from __future__ import absolute_import, division, print_function

import logging

import tensorflow as tf

from .utils.amsgrad import AMSGradOptimizer

__all__ = ['ConvE']

LOGGER = logging.getLogger(__name__)


def _create_summaries(name, tensor):
    """Creates various summaries for the provided tensor,
    which are useful for TensorBoard visualizations.
    """
    with tf.name_scope(name + '/summaries'):
        mean = tf.reduce_mean(tensor)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(tensor - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(tensor))
        tf.summary.scalar('min', tf.reduce_min(tensor))
        tf.summary.histogram('histogram', tensor)


class ConvE(object):

    def __init__(self, model_descriptors):
        self.use_negative_sampling = model_descriptors['use_negative_sampling']
        self.label_smoothing_epsilon = model_descriptors['label_smoothing_epsilon']
        self.num_ent = model_descriptors['num_ent']
        self.num_rel = model_descriptors['num_rel']
        self.ent_emb_size = model_descriptors['ent_emb_size']
        self.rel_emb_size = model_descriptors['rel_emb_size']
        self.is_parameter_lookup = model_descriptors.get('do_parameter_lookup', False)
        self.conv_filter_height = model_descriptors.get('conv_filter_height', 3)
        self.conv_filter_width = model_descriptors.get('conv_filter_width', 3)
        self.conv_num_channels = model_descriptors.get('conv_num_channels', 32)
        self.concat_rel = model_descriptors.get('concat_rel', False)
        self.context_rel_conv = model_descriptors.get('context_rel_conv', None)
        self.context_rel_out = model_descriptors.get('context_rel_out', None)
        self.context_rel_dropout = model_descriptors.get('context_rel_dropout', 0.0)
        self.context_rel_use_batch_norm = model_descriptors.get('context_rel_use_batch_norm', False)
        self.input_dropout = model_descriptors['input_dropout']
        self.hidden_dropout = model_descriptors['hidden_dropout']
        self.output_dropout = model_descriptors['output_dropout']
        self.batch_norm_momentum = model_descriptors.get('batch_norm_momentum', 0.1)
        self.batch_norm_train_stats = model_descriptors.get('batch_norm_train_stats', False)
        self._loss_summaries = model_descriptors['add_loss_summaries']
        self._variable_summaries = model_descriptors['add_variable_summaries']
        self._tensor_summaries = model_descriptors['add_tensor_summaries']

        learning_rate = model_descriptors['learning_rate']
        optimizer = AMSGradOptimizer(learning_rate)

        # Build the graph.
        with tf.device('/CPU:0'):
            self.input_iterator_handle = tf.placeholder(
                tf.string, shape=[], name='input_iterator_handle')
            self.input_iterator = tf.data.Iterator.from_string_handle(
                self.input_iterator_handle,
                output_types={
                    'e1': tf.int64,
                    'e2': tf.int64,
                    'rel': tf.int64,
                    'e2_multi': tf.float32,
                    'lookup_values': tf.int32
                },
                output_shapes={
                    'e1': [None],
                    'e2': [None],
                    'rel': [None],
                    'e2_multi': [None, None],
                    'lookup_values': [None, None]
                })

        # Get the next samples from the training and the evaluation iterators.
        self.next_input_sample = self.input_iterator.get_next()

        # Training data.
        self.is_train = tf.placeholder_with_default(False, shape=[], name='is_train')
        self.e1 = self.next_input_sample['e1']
        self.rel = self.next_input_sample['rel']
        self.e2 = self.next_input_sample['e2']
        self.e2_multi = self.next_input_sample['e2_multi']
        if self.use_negative_sampling:
            self.obj_lookup_values = self.next_input_sample['lookup_values']
        else:
            self.obj_lookup_values = None

        with tf.variable_scope('variables', use_resource=True):
            self.variables = self._create_variables()

        ent_emb = self.variables['ent_emb']
        rel_emb = self.variables['rel_emb']
        conve_e1_emb = tf.nn.embedding_lookup(ent_emb, self.e1, name='e1_emb')
        conve_rel_emb = tf.nn.embedding_lookup(rel_emb, self.rel, name='rel_emb')

        # Use the model to predict the embedding of the correct answer e2.
        self.predicted_e2_emb = self._create_predictions(conve_e1_emb, conve_rel_emb)

        # Compare the predicted e2 embedding with the embeddings of all e2
        # provided in `obj_lookup_values`.
        self.predictions_lookup = self._compute_likelihoods(
            self.predicted_e2_emb, 'predictions_lookup', self.obj_lookup_values)

        # Compare the predicted e2 embedding with the embeddings of all e2 in the vocabulary.
        self.predictions_all = self._compute_likelihoods(self.predicted_e2_emb, 'predictions')

        self.loss = self._create_loss(self.predictions_lookup, self.e2_multi)

        # The following control dependency is needed in order for batch
        # normalization to work correctly.
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            gradients, variables = zip(*optimizer.compute_gradients(self.loss))
            gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
            self.train_op = optimizer.apply_gradients(zip(gradients, variables))

        self.summaries = tf.summary.merge_all()

    def _create_variables(self):
        """Creates the network variables and returns them in a dictionary."""
        ent_emb = tf.get_variable(
            name='ent_emb', dtype=tf.float32,
            shape=[self.num_ent, self.ent_emb_size],
            initializer=tf.contrib.layers.xavier_initializer())
        rel_emb = tf.get_variable(
            name='rel_emb', dtype=tf.float32,
            shape=[self.num_rel, self.rel_emb_size],
            initializer=tf.contrib.layers.xavier_initializer())
        pred_bias = tf.get_variable(
            name='pred_bias', dtype=tf.float32,
            shape=[self.num_ent], initializer=tf.zeros_initializer())
        variables = {
            'ent_emb': ent_emb,
            'rel_emb': rel_emb,
            'pred_bias': pred_bias}
        if self._variable_summaries:
            _create_summaries('emb/ent', ent_emb)
            _create_summaries('emb/rel', rel_emb)
        return variables

    def _create_predictions(self, e1_emb, rel_emb):
        # Note: the interaction here is additive (e1_emb + rel_emb) rather than
        # ConvE's 2D convolution over the reshaped embeddings.
        is_train_float = tf.cast(self.is_train, tf.float32)
        is_train_batch_norm = self.is_train if self.batch_norm_train_stats else False
        # Both calls below use name='Conv1BN' with AUTO_REUSE, so the entity and
        # relation embeddings share the same batch normalization parameters.
        e1_emb = tf.layers.batch_normalization(
            e1_emb, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='Conv1BN')
        rel_emb = tf.layers.batch_normalization(
            rel_emb, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='Conv1BN')
        x = e1_emb + rel_emb
        x = tf.layers.batch_normalization(
            x, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='FCBN')
        x = tf.nn.dropout(x, 1 - (self.output_dropout * is_train_float))
        x = tf.nn.relu(x)
        return x

    def _compute_likelihoods(self, predicted_e2_emb, name, ent_indices=None):
        if self._tensor_summaries:
            _create_summaries('fc_with_activation', predicted_e2_emb)
        with tf.name_scope('output_layer'):
            if ent_indices is None:
                # Score the prediction against every entity in the vocabulary.
                ent_emb = self.variables['ent_emb']
                ent_emb_t = tf.transpose(ent_emb)
                predictions = tf.matmul(predicted_e2_emb, ent_emb_t, name=name)
                predictions += self.variables['pred_bias']
            else:
                # Score the prediction only against the provided candidate entities.
                ent_emb = tf.gather(self.variables['ent_emb'], ent_indices)  # Shape: [BatchSize, NumSamples, EmbSize].
                ent_emb_t = tf.transpose(ent_emb, [0, 2, 1])
                predictions = tf.matmul(predicted_e2_emb[:, None, :], ent_emb_t, name=name)[:, 0, :]
                pred_bias = tf.gather(self.variables['pred_bias'], ent_indices)
                predictions += pred_bias
            if self._tensor_summaries:
                _create_summaries('predictions', predictions)
        return predictions

    def _create_loss(self, predictions, targets):
        with tf.name_scope('loss'):
            # Label smoothing of the binary targets.
            targets = ((1 - self.label_smoothing_epsilon) * targets) + (1.0 / self.num_ent)
            loss = tf.reduce_sum(
                tf.losses.sigmoid_cross_entropy(targets, predictions),
                name='loss')
            if self._loss_summaries:
                tf.summary.scalar('loss', loss)
        return loss

Here goes the log:
Hi @otiliastr, I have solved it! It is because ConvE uses the average hits@10 calculated from … By the way, it is interesting to find out that …
I'm glad it's clarified! I noticed that Minerva also reports 60% for ConvE; I think that's where we followed the evaluation from. Thanks for figuring it out!
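For completeness, a minimal sketch of one common way to compute hits@10 from model scores. This is a generic illustration, not the repo's exact evaluation protocol; the precise convention (raw vs. filtered ranking, and how head and tail queries are averaged) is exactly the kind of detail that can shift the reported number.

import numpy as np

def hits_at_k(scores, true_ids, k=10):
    """Fraction of queries whose correct entity ranks in the top k.

    `scores` has shape [num_queries, num_entities]; `true_ids` holds the index
    of the correct entity for each query. Unfiltered ranking, for illustration.
    """
    true_scores = scores[np.arange(len(true_ids)), true_ids]
    # Rank = 1 + number of entities scored strictly higher than the answer.
    ranks = 1 + (scores > true_scores[:, None]).sum(axis=1)
    return float((ranks <= k).mean())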
Hello @otiliastr, thanks for sharing your nice code! Your work has inspired me a lot. I love it.
I ran the code in CoPER_ConvE and got ~60% hits@10 for ConvE and ~62% for CoPER-ConvE on FB15k-237.
Everything is OK.
However, when I rewrite it in PyTorch, I can only get ~50% hits@10 for ConvE and ~53% for CoPER-ConvE on FB15k-237.
I have read the paper carefully and implemented negative sampling just as the paper describes, but it appears to have no effect: CoPER-ConvE stays at ~53% hits@10 on FB15k-237 with and without negative sampling.
So I believe the negative sampling strategy is not the key trick.
It is very strange that I can reproduce the result with the TensorFlow 1.x code but cannot reproduce it with PyTorch!
I would appreciate any further advice.
How I implement negative sampling (a runnable sketch of this construction follows below):
For each batch I build a matrix of target ids with shape BatchSize x SamplingWindowSize, and a matrix of target scores of the same shape as the target ids. For example, with BatchSize=2, SamplingWindowSize=3, training set {(1,1,2), (1,1,3), (2,1,3)}, entity set {1,2,3,4}, and relation set {1,2}, the sample for the batch [[1,1], [2,1]] of (e1, rel) pairs is: target ids = [[2,3,4], [3,1,2]] and target scores = [[1,1,0], [1,0,0]].
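To make the format concrete, here is a minimal runnable sketch that reproduces the toy example above; build_targets is a hypothetical helper, not the actual PyTorch code.

import numpy as np

def build_targets(batch, triples, entities, window_size, rng):
    """Builds target ids and target scores for one batch of (e1, rel) queries."""
    tails = {}
    for e1, rel, e2 in triples:
        tails.setdefault((e1, rel), set()).add(e2)
    target_ids, target_scores = [], []
    for e1, rel in batch:
        positives = sorted(tails.get((e1, rel), set()))
        # Negatives are entities that are never a correct answer for this query.
        negatives = [e for e in entities if e not in tails.get((e1, rel), set())]
        negatives = list(rng.permutation(negatives))
        ids = (positives + negatives)[:window_size]
        target_ids.append([int(e) for e in ids])
        target_scores.append([1 if e in tails.get((e1, rel), set()) else 0 for e in ids])
    return np.asarray(target_ids), np.asarray(target_scores)

triples = [(1, 1, 2), (1, 1, 3), (2, 1, 3)]
batch = [(1, 1), (2, 1)]   # (e1, rel) pairs
ids, scores = build_targets(batch, triples, entities=[1, 2, 3, 4],
                            window_size=3, rng=np.random.default_rng(0))
# One valid outcome (the sampled negatives may differ between runs):
# ids    -> [[2, 3, 4], [3, 1, 2]]
# scores -> [[1, 1, 0], [1, 0, 0]]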