I reproduce the result of ConvE on the FB15k-237, but... #6

Closed
LinXueyuanStdio opened this issue Jul 9, 2021 · 5 comments

@LinXueyuanStdio

Hello @otiliastr, thanks for sharing your nice code! Your work has inspired me a lot. I love it.

I ran the code in CoPER_ConvE and got ~60% hits@10 for ConvE and ~62% for CoPER-ConvE on FB15k-237.
Everything is OK.
However, when I rewrote it in PyTorch, I could only get ~50% hits@10 for ConvE and ~53% for CoPER-ConvE on FB15k-237.
I have read the paper carefully and implemented negative sampling exactly as described, but it appears to have no effect: CoPER-ConvE stays at ~53% hits@10 on FB15k-237 with and without it.
So I believe the negative sampling strategy is not the key trick.
It is very strange that I can reproduce the result with the TensorFlow 1.x code but not with my PyTorch reimplementation!
I would appreciate any advice you can give.

How I implemented negative sampling:

  1. Build a map {(h, r) -> [t1, t2, ..., tn]}, where (h, r, t1), (h, r, t2), ..., (h, r, tn) are triples from the training set and h, r, t are the indices of the head, relation, and tail.
  2. Build a target-ids matrix of shape BatchSize x SamplingWindowSize and a target-scores matrix of the same shape.
    For example, with BatchSize=2, SamplingWindowSize=3, training set {(1,1,2), (1,1,3), (2,1,3)}, entity set {1,2,3,4}, and relation set {1,2}, one sample for the batch [[1,1], [2,1]] is
    target ids = [[2,3,4], [3,1,2]], target scores = [[1,1,0], [1,0,0]]
  3. Train using the target ids and target scores (a minimal sketch follows below).
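A minimal sketch of the steps above in plain Python; the helper names build_hr_map and sample_targets are just for illustration, not from my code or the repo:

import random
from collections import defaultdict

def build_hr_map(train_triples):
    """Step 1: map each (h, r) to the list of all tails seen in training."""
    hr_map = defaultdict(list)
    for h, r, t in train_triples:
        hr_map[(h, r)].append(t)
    return hr_map

def sample_targets(batch, hr_map, entities, window_size):
    """Steps 2-3: for each (h, r) in the batch, build target ids and scores.

    Target ids hold the known positive tails, padded with random negatives
    up to window_size; scores are 1 for positives and 0 for negatives.
    """
    target_ids, target_scores = [], []
    for h, r in batch:
        positives = hr_map[(h, r)]
        negatives = [e for e in entities if e not in positives]
        n_pos = min(len(positives), window_size)
        ids = positives[:n_pos] + random.sample(negatives, window_size - n_pos)
        scores = [1.0] * n_pos + [0.0] * (window_size - n_pos)
        target_ids.append(ids)
        target_scores.append(scores)
    return target_ids, target_scores

# The toy example from step 2:
train = [(1, 1, 2), (1, 1, 3), (2, 1, 3)]
hr_map = build_hr_map(train)
ids, scores = sample_targets([(1, 1), (2, 1)], hr_map, entities=[1, 2, 3, 4], window_size=3)
# e.g. ids = [[2, 3, 4], [3, 1, 2]], scores = [[1, 1, 0], [1, 0, 0]] (negatives are drawn at random)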
@LinXueyuanStdio
Author

Hello!
I ran another experiment to explore why the performance improves.
The logs are shown below:

2021-07-12 12:02:38,647 - PID: 10991 - INFO    - __main__             - Step 510000 | Loss:     0.0009
2021-07-12 12:02:38,647 - PID: 10991 - INFO    - __main__             - Evaluating model with name cpg-FB15k-237-ent_emb_200-rel_emb_32-batch_512-prop_neg_100.0-num_labels_None-OnePosPerSampl_False-bn_momentum_0.99-eval_hits@1-dropouts_0.2_0.3_0.2_0.2-context_batchnorm_True ...
2021-07-12 12:02:38,647 - PID: 10991 - INFO    - __main__             - Running dev_evaluation at step 510000...
2021-07-12 12:02:38,669 - PID: 10991 - INFO    - qa_cpg.metrics       -
2021-07-12 12:02:38,669 - PID: 10991 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 12:02:38,669 - PID: 10991 - INFO    - qa_cpg.metrics       - dev_evaluation
2021-07-12 12:02:38,669 - PID: 10991 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 12:02:38,669 - PID: 10991 - INFO    - qa_cpg.metrics       -
2021-07-12 12:02:59,266 - PID: 10991 - INFO    - qa_cpg.metrics       - Evaluated 17535 samples.
2021-07-12 12:02:59,267 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @1:   0.321129
2021-07-12 12:02:59,268 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @3:   0.443513
2021-07-12 12:02:59,269 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @5:   0.505503
2021-07-12 12:02:59,270 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @10:   0.591674
2021-07-12 12:02:59,270 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @20:   0.663986
2021-07-12 12:02:59,273 - PID: 10991 - INFO    - root                 - Mean rank: 397.347876
2021-07-12 12:02:59,273 - PID: 10991 - INFO    - root                 - Mean reciprocal rank:   0.409182
2021-07-12 12:02:59,273 - PID: 10991 - INFO    - root                 - --------------------------------------------------
2021-07-12 12:02:59,273 - PID: 10991 - INFO    - __main__             - Running test_evaluation at step 510000...
2021-07-12 12:02:59,295 - PID: 10991 - INFO    - qa_cpg.metrics       -
2021-07-12 12:02:59,295 - PID: 10991 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 12:02:59,295 - PID: 10991 - INFO    - qa_cpg.metrics       - test_evaluation
2021-07-12 12:02:59,296 - PID: 10991 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 12:02:59,296 - PID: 10991 - INFO    - qa_cpg.metrics       -
2021-07-12 12:03:23,318 - PID: 10991 - INFO    - qa_cpg.metrics       - Evaluated 20466 samples.
2021-07-12 12:03:23,320 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @1:   0.312811
2021-07-12 12:03:23,320 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @3:   0.440487
2021-07-12 12:03:23,321 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @5:   0.502052
2021-07-12 12:03:23,322 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @10:   0.586876
2021-07-12 12:03:23,323 - PID: 10991 - INFO    - qa_cpg.metrics       - Hits @20:   0.663442
2021-07-12 12:03:23,326 - PID: 10991 - INFO    - root                 - Mean rank: 426.946008
2021-07-12 12:03:23,326 - PID: 10991 - INFO    - root                 - Mean reciprocal rank:   0.403114
2021-07-12 12:03:23,326 - PID: 10991 - INFO    - root                 - --------------------------------------------------
2021-07-12 12:03:23,327 - PID: 10991 - INFO    - __main__             - Best dev hits@1 so far is at step 105000. Best dev metrics: {'mr': 173.22115768463073, 'mrr': 0.42594187325428945, 'hits@1': 0.33310521813515825, 'hits@3': 0.46216139150270885, 'hits@5': 0.5297975477616196, 'hits@10': 0.613344739093242, 'hits@20': 0.6932420872540633}
2021-07-12 12:03:23,327 - PID: 10991 - INFO    - __main__             - Test metrics at best dev: {'mr': 192.89758624059417, 'mrr': 0.41676507564941717, 'hits@1': 0.322339489885664, 'hits@3': 0.45519398025994334, 'hits@5': 0.5240887325320043, 'hits@10': 0.6075442196814228, 'hits@20': 0.6852829082380534}

Be aware that the config is cpg-FB15k-237-ent_emb_200-rel_emb_32-batch_512-prop_neg_100.0-num_labels_None-OnePosPerSampl_False-bn_momentum_0.99-eval_hits@1-dropouts_0.2_0.3_0.2_0.2-context_batchnorm_True

num_labels is None, so negative sampling is not applied.

The best hits@10 can be read from the last two lines of the logs. This supports the conclusion that:

  • Without negative sampling, CoPER-ConvE (hits@10 ≈ 61%) is still better than ConvE (hits@10 ≈ 60%).
  • Negative sampling does not affect the performance much.

Therefore, negative sampling is not the key step that raises ConvE from 50% to 60% hits@10.

The truth remains unknown.

@otiliastr
Owner

Hi,

Thanks for your interest in our paper!

Just to clarify, were you able to figure out the conversion from TensorFlow to PyTorch? If not, maybe it helps to follow our implementation of the train iterator here, which calls the negative sampling implemented here.
But based on your second post, it sounds like negative sampling is not the issue, at least for FB15k-237, so there must be some other problem introduced in the conversion to PyTorch.

Regarding the second post, we have noticed that negative sampling helps overall, but I don't remember the specific difference for each metric and dataset. Do you also observe the same difference with/without negative sampling on other datasets, or on metrics beyond hits@10 on FB15k-237?

@LinXueyuanStdio
Author

Hi,

I used the code of this repo to implement TransE. To my surprise, I got ~60% hits@10 on FB15k-237!

As we all know, TransE is the simplest classical model and typically reaches ~47% hits@10 on FB15k-237. There is no way for TransE to get to 60%, so there MUST be something wrong, but I can't find any bugs.

I provide my TransE implementation below. If you replace the content of models.py with this code, you should be able to reproduce my result.
Here is my code (models.py):

from __future__ import absolute_import, division, print_function

import logging

import tensorflow as tf

from .utils.amsgrad import AMSGradOptimizer

__all__ = ['ConvE']

LOGGER = logging.getLogger(__name__)


def _create_summaries(name, tensor):
    """Creates various summaries for the provided tensor,
    which are useful for TensorBoard visualizations.
    """
    with tf.name_scope(name + '/summaries'):
        mean = tf.reduce_mean(tensor)
        tf.summary.scalar('mean', mean)
        with tf.name_scope('stddev'):
            stddev = tf.sqrt(tf.reduce_mean(tf.square(tensor - mean)))
        tf.summary.scalar('stddev', stddev)
        tf.summary.scalar('max', tf.reduce_max(tensor))
        tf.summary.scalar('min', tf.reduce_min(tensor))
        tf.summary.histogram('histogram', tensor)


class ConvE(object):
    def __init__(self, model_descriptors):
        self.use_negative_sampling = model_descriptors['use_negative_sampling']
        self.label_smoothing_epsilon = model_descriptors['label_smoothing_epsilon']

        self.num_ent = model_descriptors['num_ent']
        self.num_rel = model_descriptors['num_rel']
        self.ent_emb_size = model_descriptors['ent_emb_size']
        self.rel_emb_size = model_descriptors['rel_emb_size']

        self.is_parameter_lookup = model_descriptors.get('do_parameter_lookup', False)

        self.conv_filter_height = model_descriptors.get('conv_filter_height', 3)
        self.conv_filter_width = model_descriptors.get('conv_filter_width', 3)
        self.conv_num_channels = model_descriptors.get('conv_num_channels', 32)

        self.concat_rel = model_descriptors.get('concat_rel', False)
        self.context_rel_conv = model_descriptors.get('context_rel_conv', None)
        self.context_rel_out = model_descriptors.get('context_rel_out', None)
        self.context_rel_dropout = model_descriptors.get('context_rel_dropout', 0.0)
        self.context_rel_use_batch_norm = model_descriptors.get('context_rel_use_batch_norm', False)

        self.input_dropout = model_descriptors['input_dropout']
        self.hidden_dropout = model_descriptors['hidden_dropout']
        self.output_dropout = model_descriptors['output_dropout']

        self.batch_norm_momentum = model_descriptors.get('batch_norm_momentum', 0.1)
        self.batch_norm_train_stats = model_descriptors.get('batch_norm_train_stats', False)

        self._loss_summaries = model_descriptors['add_loss_summaries']
        self._variable_summaries = model_descriptors['add_variable_summaries']
        self._tensor_summaries = model_descriptors['add_tensor_summaries']

        learning_rate = model_descriptors['learning_rate']
        optimizer = AMSGradOptimizer(learning_rate)

        # Build the graph.
        with tf.device('/CPU:0'):
            self.input_iterator_handle = tf.placeholder(
                tf.string, shape=[], name='input_iterator_handle')
            self.input_iterator = tf.data.Iterator.from_string_handle(
                self.input_iterator_handle,
                output_types={
                    'e1': tf.int64,
                    'e2': tf.int64,
                    'rel': tf.int64,
                    'e2_multi': tf.float32,
                    'lookup_values': tf.int32
                },
                output_shapes={
                    'e1': [None],
                    'e2': [None],
                    'rel': [None],
                    'e2_multi': [None, None],
                    'lookup_values': [None, None]
                })

        # Get the next samples from the training and the evaluation iterators.
        self.next_input_sample = self.input_iterator.get_next()

        # Training Data.
        self.is_train = tf.placeholder_with_default(False, shape=[], name='is_train')
        self.e1 = self.next_input_sample['e1']
        self.rel = self.next_input_sample['rel']
        self.e2 = self.next_input_sample['e2']
        self.e2_multi = self.next_input_sample['e2_multi']

        if self.use_negative_sampling:
            self.obj_lookup_values = self.next_input_sample['lookup_values']
        else:
            self.obj_lookup_values = None

        with tf.variable_scope('variables', use_resource=True):
            self.variables = self._create_variables()

        ent_emb = self.variables['ent_emb']
        rel_emb = self.variables['rel_emb']

        conve_e1_emb = tf.nn.embedding_lookup(ent_emb, self.e1, name='e1_emb')
        conve_rel_emb = tf.nn.embedding_lookup(rel_emb, self.rel, name='rel_emb')

        # Use the model to predict the embedding of the correct answer e2.
        self.predicted_e2_emb = self._create_predictions(conve_e1_emb, conve_rel_emb)

        # Compare the predicted e2 embedding with the embeddings of all e2 provided in the `obj_lookup_values`.
        self.predictions_lookup = self._compute_likelihoods(
            self.predicted_e2_emb, 'predictions_lookup', self.obj_lookup_values)

        # Compare the predicted e2 embedding with the embeddings of all e2 in the vocabulary.
        self.predictions_all = self._compute_likelihoods(self.predicted_e2_emb, 'predictions')

        self.loss = self._create_loss(self.predictions_lookup, self.e2_multi)

        # The following control dependency is needed in order for batch
        # normalization to work correctly.
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            gradients, variables = zip(*optimizer.compute_gradients(self.loss))
            gradients, _ = tf.clip_by_global_norm(gradients, 5.0)
            self.train_op = optimizer.apply_gradients(zip(gradients, variables))
        self.summaries = tf.summary.merge_all()

    def _create_variables(self):
        """Creates the network variables and returns them in a dictionary."""
        ent_emb = tf.get_variable(
            name='ent_emb', dtype=tf.float32,
            shape=[self.num_ent, self.ent_emb_size],
            initializer=tf.contrib.layers.xavier_initializer())

        rel_emb = tf.get_variable(
            name='rel_emb', dtype=tf.float32,
            shape=[self.num_rel, self.rel_emb_size],
            initializer=tf.contrib.layers.xavier_initializer())

        pred_bias = tf.get_variable(
            name='pred_bias', dtype=tf.float32,
            shape=[self.num_ent], initializer=tf.zeros_initializer())

        variables = {
            'ent_emb': ent_emb,
            'rel_emb': rel_emb,
            'pred_bias': pred_bias}

        if self._variable_summaries:
            _create_summaries('emb/ent', ent_emb)
            _create_summaries('emb/rel', rel_emb)

        return variables

    def _create_predictions(self, e1_emb, rel_emb):
        is_train_float = tf.cast(self.is_train, tf.float32)
        is_train_batch_norm = self.is_train if self.batch_norm_train_stats else False

        e1_emb = tf.layers.batch_normalization(
            e1_emb, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='Conv1BN')

        rel_emb = tf.layers.batch_normalization(
            rel_emb, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='Conv1BN')

        # TransE-style translation: the predicted tail embedding is head embedding + relation embedding.
        x = e1_emb + rel_emb
        x = tf.layers.batch_normalization(
            x, momentum=self.batch_norm_momentum, reuse=tf.AUTO_REUSE,
            training=is_train_batch_norm, fused=True, name='FCBN')
        x = tf.nn.dropout(x, 1 - (self.output_dropout * is_train_float))
        x = tf.nn.relu(x)
        return x

    def _compute_likelihoods(self, predicted_e2_emb, name, ent_indices=None):
        if self._tensor_summaries:
            _create_summaries('fc_with_activation', predicted_e2_emb)

        with tf.name_scope('output_layer'):
            if ent_indices is None:
                ent_emb = self.variables['ent_emb']
                ent_emb_t = tf.transpose(ent_emb)
                predictions = tf.matmul(predicted_e2_emb, ent_emb_t, name=name)
                predictions += self.variables['pred_bias']
            else:
                ent_emb = tf.gather(self.variables['ent_emb'], ent_indices)  # Returns shape [BatchSize, NumSamples, EmbSize]
                ent_emb_t = tf.transpose(ent_emb, [0, 2, 1])
                predictions = tf.matmul(predicted_e2_emb[:, None, :], ent_emb_t, name=name)[:, 0, :]
                pred_bias = tf.gather(self.variables['pred_bias'], ent_indices)
                predictions += pred_bias
            if self._tensor_summaries:
                _create_summaries('predictions', predictions)
        return predictions

    def _create_loss(self, predictions, targets):
        with tf.name_scope('loss'):
            targets = ((1 - self.label_smoothing_epsilon) * targets) + (1.0 / self.num_ent)
            loss = tf.reduce_sum(
                tf.losses.sigmoid_cross_entropy(targets, predictions),
                name='loss')

            if self._loss_summaries:
                tf.summary.scalar('loss', loss)
        return loss
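
For completeness, here is an illustrative model_descriptors covering the keys the class above reads directly; the values are placeholders, not the exact config I trained with:

model_descriptors = {
    'use_negative_sampling': False,
    'label_smoothing_epsilon': 0.1,
    'num_ent': 14541,            # FB15k-237 entity count
    'num_rel': 237,              # FB15k-237 relation count (before adding any reverse relations)
    'ent_emb_size': 200,
    'rel_emb_size': 200,         # must equal ent_emb_size, since the model computes e1_emb + rel_emb
    'input_dropout': 0.2,
    'hidden_dropout': 0.3,
    'output_dropout': 0.2,
    'add_loss_summaries': True,
    'add_variable_summaries': False,
    'add_tensor_summaries': False,
    'learning_rate': 0.001,
}
model = ConvE(model_descriptors)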

Here is the log:

2021-07-12 23:43:18,299 - PID: 28415 - INFO    - qa_cpg.metrics       -
2021-07-12 23:43:42,047 - PID: 28415 - INFO    - qa_cpg.metrics       - Evaluated 17535 samples.
2021-07-12 23:43:42,048 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @0:   0.000000
2021-07-12 23:43:42,049 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @1:   0.332136
2021-07-12 23:43:42,050 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @3:   0.466610
2021-07-12 23:43:42,051 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @5:   0.530140
2021-07-12 23:43:42,052 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @9:   0.604790
2021-07-12 23:43:42,053 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @10:   0.616538
2021-07-12 23:43:42,054 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @11:   0.627146
2021-07-12 23:43:42,055 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @20:   0.693755
2021-07-12 23:43:42,058 - PID: 28415 - INFO    - root                 - Mean rank: 158.555632
2021-07-12 23:43:42,058 - PID: 28415 - INFO    - root                 - Mean reciprocal rank:   0.426363
2021-07-12 23:43:42,059 - PID: 28415 - INFO    - root                 - --------------------------------------------------
2021-07-12 23:43:42,059 - PID: 28415 - INFO    - logger-0             - Running test_evaluation at step 215000...
2021-07-12 23:43:42,091 - PID: 28415 - INFO    - qa_cpg.metrics       -
2021-07-12 23:43:42,091 - PID: 28415 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 23:43:42,091 - PID: 28415 - INFO    - qa_cpg.metrics       - test_evaluation
2021-07-12 23:43:42,091 - PID: 28415 - INFO    - qa_cpg.metrics       - --------------------------------------------------
2021-07-12 23:43:42,091 - PID: 28415 - INFO    - qa_cpg.metrics       -
2021-07-12 23:44:09,837 - PID: 28415 - INFO    - qa_cpg.metrics       - Evaluated 20466 samples.
2021-07-12 23:44:09,838 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @0:   0.000000
2021-07-12 23:44:09,839 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @1:   0.324831
2021-07-12 23:44:09,840 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @3:   0.458956
2021-07-12 23:44:09,842 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @5:   0.525945
2021-07-12 23:44:09,843 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @9:   0.598700
2021-07-12 23:44:09,844 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @10:   0.609548
2021-07-12 23:44:09,845 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @11:   0.620541
2021-07-12 23:44:09,846 - PID: 28415 - INFO    - qa_cpg.metrics       - Hits @20:   0.685332
2021-07-12 23:44:09,850 - PID: 28415 - INFO    - root                 - Mean rank: 185.547103
2021-07-12 23:44:09,850 - PID: 28415 - INFO    - root                 - Mean reciprocal rank:   0.419544
2021-07-12 23:44:09,850 - PID: 28415 - INFO    - root                 - --------------------------------------------------
2021-07-12 23:44:09,851 - PID: 28415 - INFO    - logger-0             - Best dev hits@1 so far is at step 150000. Best dev metrics: {'mr': 127.98762475049901, 'mrr': 0.43143945634775316, 'hits@0': 0.0, 'hits@1': 0.34029084687767325, 'hits@3': 0.46877673224978617, 'hits@5': 0.5327060165383519, 'hits@9': 0.6030225263758198, 'hits@10': 0.6163672654690618, 'hits@11': 0.6259481037924152, 'hits@20': 0.6865126889078985}
2021-07-12 23:44:09,851 - PID: 28415 - INFO    - logger-0             - Test metrics at best dev: {'mr': 145.88600605882928, 'mrr': 0.42354017013400524, 'hits@0': 0.0, 'hits@1': 0.33049936480015635, 'hits@3': 0.46379360891234245, 'hits@5': 0.5272158702237858, 'hits@9': 0.5970878530245285, 'hits@10': 0.6091077885273136, 'hits@11': 0.6190755399198671, 'hits@20': 0.6807387862796834}
2021-07-12 23:44:13,261 - PID: 28415 - INFO    - logger-0             - Step 215100 | Loss:     0.0009
2021-07-12 23:44:17,161 - PID: 28415 - INFO    - logger-0             - Step 215200 | Loss:     0.0018
2021-07-12 23:44:21,002 - PID: 28415 - INFO    - logger-0             - Step 215300 | Loss:     0.0010

@LinXueyuanStdio
Author

Hi, @otiliastr

I have solved it!

The reason is that ConvE reports hits@10 averaged over both directions, (h, r) -> t and (t, reverse_r) -> h, while CoPER computes hits@10 from (h, r) -> t only.
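
In other words, each test triple contributes a tail query and a head query, and the ConvE-style number pools (averages) the two. A small sketch with illustrative ranks:

def hits_at_k(ranks, k=10):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)

# Placeholder ranks for three test triples:
tail_ranks = [1, 4, 12]   # rank of the true tail for each (h, r, ?) query
head_ranks = [2, 3, 7]    # rank of the true head for each (?, reverse_r, t) query

coper_style = hits_at_k(tail_ranks)                # (h, r) -> t only: 2/3
conve_style = hits_at_k(tail_ranks + head_ranks)   # both directions pooled: 5/6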

By the way, it is interesting that TransE is also able to reach ~60% hits@10 on FB15k-237.

@otiliastr
Owner

I'm glad it's clarified! I noticed that Minerva also reports 60% for ConvE; I think that's where we followed the evaluation from. Thanks for figuring it out!

This issue was closed.