Rare case when entity has no relations but attribute triples #21

Closed
sven-h opened this issue Jan 27, 2021 · 12 comments

@sven-h
Contributor

sven-h commented Jan 27, 2021

Hi,

As you already pointed out in the pull request:
in the rare case that an entity has no relations but only attribute triples, the code also fails at this position when it is called from the __init__ method of class KGs.

Best regards
Sven

@sunzequn
Member

sunzequn commented Feb 7, 2021

Hi Sven,

This is because the code does not generate ids for the entities that only have attributes.
I will update the code.
Thanks!

Best regards,
Zequn

@sunzequn
Member

sunzequn commented Feb 7, 2021

Hi,

I have updated the code. Now the entity set also includes the entities that appear only in attribute triples.
You can test the code and let me know whether or not it works well on your datasets.
Thanks.
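A minimal sketch of the idea behind this fix, assuming entities are plain strings and triples are (head, relation, tail) or (entity, attribute, value) tuples. The helpers build_entity_set and assign_ids are hypothetical illustrations, not OpenEA functions; they only show why collecting entities from relation triples alone misses attribute-only entities, and how adding the subjects of attribute triples gives them ids.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def build_entity_set(relation_triples: Iterable[Triple],
                     attribute_triples: Iterable[Triple]) -> Set[str]:
    """Hypothetical helper: collect all entities of one KG.

    Heads and tails of relation triples are entities; in an attribute
    triple only the subject is an entity (the object is a literal value).
    """
    entities: Set[str] = set()
    for head, _relation, tail in relation_triples:
        entities.add(head)
        entities.add(tail)
    # Without this loop, entities that occur only in attribute triples
    # are never collected and therefore never receive an id.
    for subject, _attribute, _value in attribute_triples:
        entities.add(subject)
    return entities


def assign_ids(entities: Set[str]) -> Dict[str, int]:
    """Give each entity a consecutive integer id (sorted for determinism)."""
    return {entity: idx for idx, entity in enumerate(sorted(entities))}


if __name__ == "__main__":
    relation_triples = [("e1", "r1", "e2")]
    attribute_triples = [("e1", "a1", "v1"), ("e3", "a2", "v2")]  # e3: attributes only
    print(assign_ids(build_entity_set(relation_triples, attribute_triples)))
    # {'e1': 0, 'e2': 1, 'e3': 2}
```

Sorting before numbering is only there to make the toy output reproducible; any deterministic order would do.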

@dakeleblack

dakeleblack commented Feb 22, 2021

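Hello, does it work if there are only attribute triples and no relation triples? Looking at the code, triples_num = self.kgs.kg1.relation_triples_num + self.kgs.kg2.relation_triples_num; if there are no relation triples, triples_num is 0, which causes an error later when the loss is computed.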

@sunzequn
Member

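Hi, I will test this scenario carefully once the semester starts. However, all the current methods require relations, and attributes are only auxiliary information, so cases with zero relation triples are probably rare. If you need this now, you can modify the code yourself for the time being. I will fix it later to support this scenario.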

@dakeleblack

OK, I will try to modify the code.
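The failure reported in this exchange can be illustrated with a small sketch. It assumes, as the report suggests, that the combined relation-triple count is later used as a divisor, for example to compute the number of batches per epoch or to average the epoch loss. The function names triple_steps and average_epoch_loss are hypothetical and not taken from OpenEA; the guard simply avoids dividing by zero when a KG pair has attribute triples but no relation triples.

```python
import math


def triple_steps(triples_num: int, batch_size: int) -> int:
    """Number of mini-batches needed to cover all relation triples once."""
    return int(math.ceil(triples_num / batch_size))


def average_epoch_loss(total_loss: float, trained_samples_num: int) -> float:
    """Average loss per trained triple; returns 0.0 when nothing was trained.

    Dividing unconditionally would raise ZeroDivisionError for a KG pair
    that has attribute triples but no relation triples.
    """
    if trained_samples_num == 0:
        return 0.0
    return total_loss / trained_samples_num


if __name__ == "__main__":
    kg1_relation_triples_num, kg2_relation_triples_num = 0, 0
    triples_num = kg1_relation_triples_num + kg2_relation_triples_num
    print(triple_steps(triples_num, batch_size=5000))  # 0: no relation-triple batches
    print(average_epoch_loss(0.0, triples_num))        # 0.0 instead of a crash
```

Returning 0.0 is one possible design choice; skipping the relation-triple training step entirely when the count is zero would be an equally reasonable alternative.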

@sven-h
Contributor Author

sven-h commented Feb 27, 2021

Hi Zequn,

Sorry for the late answer.
I tried out your commit, but it results in an error while loading the KGs:

Traceback (most recent call last):
  File "main.py", line 20, in <module>
    kgs = read_kgs_from_folder(args.training_data, args.dataset_division, args.alignment_module, args.ordered, remove_unlinked=False)
  File "/OpenEA/code/src/openea/modules/load/kgs.py", line 95, in read_kgs_from_folder
    kgs = KGs(kg1, kg2, train_links, test_links, valid_links=valid_links, mode=mode, ordered=ordered)
  File "/OpenEA/code/src/openea/modules/load/kgs.py", line 16, in __init__
    kg2.relation_triples_set, kg2.entities_set, ordered=ordered)
  File "/OpenEA/code/src/openea/modules/load/read.py", line 86, in generate_mapping_id
    assert len(ids2) == len(set(kg2_elements))
AssertionError

I created a small example with all necessary files, so that you can also test it.
I know that the KGs are larger than usual, but it should at least start training and run a few epochs so that I get some output.
If you could test it, I would be very happy.

Best regards
Sven
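One plausible mechanism behind this AssertionError, sketched under an assumption that has not been checked against the OpenEA source: if the "ordered" id assignment ranks elements by how often they occur in the relation triples, then an entity that occurs only in attribute triples is never ranked and never receives an id, so the number of ids is smaller than the number of distinct entities. That is what a check like assert len(ids2) == len(set(kg2_elements)) would detect. The helpers below are hypothetical and simplified to a single KG; the include_unranked branch shows one way to append the unranked elements after the ranked ones.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def order_by_frequency(triples: Iterable[Triple], elements: Set[str]) -> List[str]:
    """Rank elements by occurrences as head or tail (ties by name, descending).

    Elements that never occur in the triples are absent from the result.
    """
    counts: Counter = Counter()
    for head, _rel, tail in triples:
        if head in elements:
            counts[head] += 1
        if tail in elements:
            counts[tail] += 1
    ranked = sorted(counts.items(), key=lambda item: (item[1], item[0]), reverse=True)
    return [element for element, _count in ranked]


def ordered_ids(triples: Iterable[Triple], elements: Set[str],
                include_unranked: bool = True) -> Dict[str, int]:
    """Assign consecutive ids in frequency order (hypothetical helper)."""
    ranked = order_by_frequency(triples, elements)
    if include_unranked:
        # Give ids to elements missing from the relation triples
        # (e.g. attribute-only entities), after all ranked elements.
        ranked += sorted(elements - set(ranked))
    return {element: idx for idx, element in enumerate(ranked)}


if __name__ == "__main__":
    triples = [("e1", "r1", "e2"), ("e1", "r2", "e3")]
    elements = {"e1", "e2", "e3", "e4"}  # e4 occurs only in attribute triples
    buggy = ordered_ids(triples, elements, include_unranked=False)
    fixed = ordered_ids(triples, elements)
    print(len(buggy) == len(elements))  # False: a count assertion would fail
    print(len(fixed) == len(elements))  # True
```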

@sven-h
Contributor Author

sven-h commented Apr 6, 2021

Hi Zequn,

Have you had some time to test the small example yet?
Is there anything else you need which I could provide?

Best regards
Sven

@sunzequn
Member

Hi Sven,

So sorry for my late reply. I've been very busy in the past few months (due to some personal reasons). I plan to test the proposed cases in the next few weeks. Sorry for the inconvenience.

Best,

Zequn

@sven-h
Contributor Author

sven-h commented Jun 28, 2021

Hi Zequn,

Thanks for your answer.
I hope everything is fine (regarding your personal reasons), and I wish you all the best.

It's great that you want to test the proposed cases (I know that the data may be very noisy, but I think handling it makes the library stronger and more resilient to arbitrary input).
If I can help you with anything, please let me know.

Best regards
Sven

@sunzequn
Member

sunzequn commented Jul 1, 2021

Hi Sven,

I have updated the code to support the proposed case where some entities have no relations.

I tested the code with AlignE rather than BootEA, because BootEA's current implementation does not support entity alignment with dangling entities (i.e., entities that have no counterpart in the other KG, as in your dataset).

Please use the configuration below (arguments.json); you can then test the code further on larger training/test data.

{
  "loss_norm" : "L2",
  "eval_metric" : "inner",
  "eval_freq" : 10,
  "neg_triple_num" : 10,
  "pos_margin" : 0.01,
  "dim" : 100,
  "start_valid" : 10,
  "rel_l2_norm" : true,
  "output" : "output/",
  "loss" : "limited",
  "optimizer" : "Adagrad",
  "truncated_freq" : 10,
  "alignment_module" : "swapping",
  "top_k" : [ 1, 5, 10, 50 ],
  "search_module" : "greedy",
  "csls" : 10,
  "predict_top_k" : 1,
  "neg_margin" : 2.0,
  "likelihood_slice" : 10,
  "sub_epoch" : 10,
  "dataset_division" : "alignment/",
  "init" : "normal",
  "ordered" : true,
  "batch_size" : 5000,
  "stop_metric" : "hits1",
  "batch_threads_num" : 4,
  "sim_th" : 0.7,
  "k" : 10,
  "is_save" : true,
  "max_epoch" : 2000,
  "training_data" : "data/",
  "embedding_module" : "AlignE",
  "neg_sampling" : "truncated",
  "truncated_epsilon" : 0.99,
  "neg_margin_balance" : 0.2,
  "test_threads_num" : 16,
  "ent_l2_norm" : true,
  "learning_rate" : 0.01,
  "eval_norm" : false
}
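The log that follows in this comment begins with "load arguments: {...}", and the earlier traceback reads values such as args.training_data and args.dataset_division, so main.py evidently turns this JSON file into an object with attribute access. A minimal sketch of such a loader using only the standard library; load_args is a hypothetical name, not the library's own function.

```python
import json
import os
import tempfile
from types import SimpleNamespace


def load_args(path: str) -> SimpleNamespace:
    """Hypothetical loader: read a JSON config and expose keys as attributes."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)  # JSON true/false become Python True/False
    print("load arguments:", config)
    return SimpleNamespace(**config)


if __name__ == "__main__":
    # A few keys copied from the arguments.json above, written to a temp file.
    sample = {"embedding_module": "AlignE", "alignment_module": "swapping",
              "training_data": "data/", "dataset_division": "alignment/",
              "ordered": True, "top_k": [1, 5, 10, 50]}
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        json.dump(sample, tmp)
    try:
        args = load_args(tmp.name)
    finally:
        os.remove(tmp.name)
    print(args.training_data + args.dataset_division)  # data/alignment/
```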

As the test data is very small, the performance fluctuates a lot. Here is the log of the first 40 epochs:

(openea) ➜  example git:(master) ✗ python main.py
load arguments: {'loss_norm': 'L2', 'eval_metric': 'inner', 'eval_freq': 10, 'neg_triple_num': 10, 'pos_margin': 0.01, 'dim': 100, 'start_valid': 10, 'rel_l2_norm': True, 'output': 'output/', 'loss': 'limited', 'optimizer': 'Adagrad', 'truncated_freq': 10, 'alignment_module': 'swapping', 'top_k': [1, 5, 10, 50], 'search_module': 'greedy', 'csls': 10, 'predict_top_k': 1, 'neg_margin': 2.0, 'likelihood_slice': 10, 'sub_epoch': 10, 'dataset_division': 'alignment/', 'init': 'normal', 'ordered': True, 'batch_size': 5000, 'stop_metric': 'hits1', 'batch_threads_num': 4, 'sim_th': 0.7, 'k': 10, 'is_save': True, 'max_epoch': 2000, 'training_data': 'data/', 'embedding_module': 'AlignE', 'neg_sampling': 'truncated', 'truncated_epsilon': 0.99, 'neg_margin_balance': 0.2, 'test_threads_num': 16, 'ent_l2_norm': True, 'learning_rate': 0.01, 'eval_norm': False}
read relation triples: data/rel_triples_1
read relation triples: data/rel_triples_2
read attribute triples: data/attr_triples_1
read attribute triples: data/attr_triples_2
read links: data/alignment/train_links
read links: data/alignment/valid_links
read links: data/alignment/test_links
Number of rt_dict: 144620
Number of hr_dict: 251193
entity relations dict: 144620
Number of av_dict: 66327
entity attributes dict: 66327

KG statistics:
Number of entities: 254537
Number of relations: 180
Number of attributes: 287
Number of relation triples: 2096198
Number of attribute triples: 430630
Number of local relation triples: 2096198
Number of local attribute triples: 430630

Number of rt_dict: 26774
Number of hr_dict: 53929
entity relations dict: 26774
Number of av_dict: 29991
entity attributes dict: 29991

KG statistics:
Number of entities: 55402
Number of relations: 133
Number of attributes: 194
Number of relation triples: 412179
Number of attribute triples: 155161
Number of local relation triples: 412179
Number of local attribute triples: 155161

Number of rt_dict: 144620
Number of hr_dict: 251193
entity relations dict: 144620
Number of av_dict: 66327
entity attributes dict: 66327

KG statistics:
Number of entities: 254537
Number of relations: 180
Number of attributes: 287
Number of relation triples: 2096198
Number of attribute triples: 430630
Number of local relation triples: 2096198
Number of local attribute triples: 430630

Number of rt_dict: 26774
Number of hr_dict: 53929
entity relations dict: 26774
Number of av_dict: 29991
entity attributes dict: 29991

KG statistics:
Number of entities: 55402
Number of relations: 133
Number of attributes: 194
Number of relation triples: 412179
Number of attribute triples: 155161
Number of local relation triples: 412179
Number of local attribute triples: 155161

supervised relation triples: 239487, 66567
supervised attribute triples: 10719, 7481
output/ data/ ['data'] alignment/ AlignE
results output folder: output/AlignE/data/alignment/20210701105422/
2021-07-01 10:54:22.335521: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-07-01 10:54:22.439925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6325
pciBusID: 0000:03:00.0
totalMemory: 10.91GiB freeMemory: 10.39GiB
2021-07-01 10:54:22.439960: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2021-07-01 10:54:22.768419: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-01 10:54:22.768455: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2021-07-01 10:54:22.768462: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2021-07-01 10:54:22.768938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10051 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
epoch 1, avg. triple loss: 2.3641, cost time: 39.3560s
epoch 2, avg. triple loss: 1.9214, cost time: 39.2562s
epoch 3, avg. triple loss: 1.6306, cost time: 39.4330s
epoch 4, avg. triple loss: 1.4525, cost time: 39.5583s
epoch 5, avg. triple loss: 1.3411, cost time: 39.8204s
epoch 6, avg. triple loss: 1.2667, cost time: 39.6197s
epoch 7, avg. triple loss: 1.2144, cost time: 39.6867s
epoch 8, avg. triple loss: 1.1756, cost time: 39.3372s
epoch 9, avg. triple loss: 1.1459, cost time: 39.2396s
epoch 10, avg. triple loss: 1.1221, cost time: 39.4547s
quick results: hits@[1, 5, 10, 50] = [ 2.222 11.111 20.    57.778]%, time = 0.987 s 

generating neighbors of 309939 entities costs 496.280 s.
epoch 11, avg. triple loss: 2.2871, cost time: 41.1608s
epoch 12, avg. triple loss: 1.9561, cost time: 41.4813s
epoch 13, avg. triple loss: 1.8016, cost time: 41.0047s
epoch 14, avg. triple loss: 1.7171, cost time: 41.2335s
epoch 15, avg. triple loss: 1.6703, cost time: 40.9148s
epoch 16, avg. triple loss: 1.6388, cost time: 41.0950s
epoch 17, avg. triple loss: 1.6155, cost time: 41.4174s
epoch 18, avg. triple loss: 1.5961, cost time: 40.6094s
epoch 19, avg. triple loss: 1.5796, cost time: 41.3229s
epoch 20, avg. triple loss: 1.5655, cost time: 40.8944s
quick results: hits@[1, 5, 10, 50] = [ 4.444 28.889 42.222 77.778]%, time = 4.667 s 

generating neighbors of 309939 entities costs 508.628 s.
epoch 21, avg. triple loss: 1.8248, cost time: 41.4047s
epoch 22, avg. triple loss: 1.6999, cost time: 41.6181s
epoch 23, avg. triple loss: 1.6455, cost time: 41.6523s
epoch 24, avg. triple loss: 1.6151, cost time: 41.4173s
epoch 25, avg. triple loss: 1.5931, cost time: 41.5321s
epoch 26, avg. triple loss: 1.5753, cost time: 41.5690s
epoch 27, avg. triple loss: 1.5610, cost time: 41.6192s
epoch 28, avg. triple loss: 1.5483, cost time: 44.2122s
epoch 29, avg. triple loss: 1.5373, cost time: 41.6236s
epoch 30, avg. triple loss: 1.5282, cost time: 41.3632s
quick results: hits@[1, 5, 10, 50] = [20.    48.889 55.556 86.667]%, time = 5.517 s 

generating neighbors of 309939 entities costs 523.470 s.
epoch 31, avg. triple loss: 1.7091, cost time: 42.0826s
epoch 32, avg. triple loss: 1.6371, cost time: 41.7535s
epoch 33, avg. triple loss: 1.6122, cost time: 41.5167s
epoch 34, avg. triple loss: 1.5996, cost time: 41.5506s
epoch 35, avg. triple loss: 1.5911, cost time: 41.5930s
epoch 36, avg. triple loss: 1.5838, cost time: 41.7895s
epoch 37, avg. triple loss: 1.5778, cost time: 41.2144s
epoch 38, avg. triple loss: 1.5725, cost time: 41.2602s
epoch 39, avg. triple loss: 1.5676, cost time: 41.5401s
epoch 40, avg. triple loss: 1.5632, cost time: 41.7990s
quick results: hits@[1, 5, 10, 50] = [17.778 64.444 68.889 93.333]%, time = 5.610 s 

Best,

Zequn
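A note on the fluctuation mentioned in this comment: every hits@k value in the quick results is a multiple of 100/45 ≈ 2.222 percentage points (for example 2.222% = 1/45, 17.778% = 8/45, 93.333% = 42/45). This is consistent with a test split of 45 links, although the thread does not state the size. With so few pairs, one extra correct match moves hits@1 by about 2.2 points. The sketch below computes hits@k from the rank of each true counterpart under inner-product similarity (cf. "eval_metric": "inner" in the config); it is a generic NumPy illustration, not OpenEA's evaluation code.

```python
import numpy as np


def hits_at_k(emb1: np.ndarray, emb2: np.ndarray, top_k=(1, 5, 10, 50)):
    """Hits@k (in %) where row i of emb1 should match row i of emb2.

    Similarity is the inner product; the rank of the true counterpart is the
    number of candidates that score strictly higher than it (0 = first).
    """
    sim = emb1 @ emb2.T                      # (n, n) similarity matrix
    true_scores = np.diag(sim)[:, None]      # score of each correct pair
    ranks = (sim > true_scores).sum(axis=1)  # candidates ranked above the truth
    return [100.0 * np.mean(ranks < k) for k in top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, dim = 45, 100
    emb1 = rng.normal(size=(n, dim))
    emb2 = emb1 + rng.normal(scale=3.0, size=(n, dim))  # noisy counterparts
    print(hits_at_k(emb1, emb2))
    # With 45 test pairs, each additional correct match changes hits@k by:
    print(round(100 / n, 3))  # 2.222
```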

@sven-h
Contributor Author

sven-h commented Jul 1, 2021

Hi Zequn,

thank you so much for your quick answer and for adapting the code.
I will test it soon and let you know whether it also works in other test cases.

Can you provide some more information about dangling entities?
Does it mean that some entities do not participate in the alignment (i.e., have no correspondence), or only that some entities have no relations and no corresponding entity?

Am I right that BootEA does not work when some entities in the KG are not aligned? (How can the implementation depend on such a prerequisite? I thought the algorithm just takes a training alignment and tries to find further correspondences. Can the algorithm be adapted to work in such cases, or is this a strict requirement of the approach?)

Besides AlignE, which other algorithms implemented in this library can be used in situations where not all entities are aligned?

Best regards
Sven

@sunzequn
Member

sunzequn commented Jul 8, 2021

Hi Sven,

You can refer to our recent paper for more details about entity alignment with dangling entities. This paper presents a preliminary attempt at this problem, but there is still a long way to go.

I will keep working on this problem and implement more algorithm variants (e.g., BootEA) that can be used in such situations.

Best,
Zequn
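To make the distinction raised in this exchange concrete: following Zequn's description earlier in the thread, a dangling entity has no counterpart in the other KG, whereas an attribute-only entity has no relation triples. The two properties are independent, so an entity can have either, both, or neither. The sketch below uses hypothetical toy sets (not data from the thread) and hypothetical helper names to compute both groups for one KG; links should contain all train, valid, and test links together.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def dangling_entities(entities: Set[str], links: Iterable[Tuple[str, str]],
                      side: int) -> Set[str]:
    """Entities of one KG that appear in no alignment link.

    side=0 checks the KG1 (left) column of each link, side=1 the KG2 column.
    """
    linked = {link[side] for link in links}
    return entities - linked


def attribute_only_entities(entities: Set[str],
                            relation_triples: Iterable[Triple]) -> Set[str]:
    """Entities of one KG that occur in no relation triple."""
    in_relations: Set[str] = set()
    for head, _rel, tail in relation_triples:
        in_relations.update((head, tail))
    return entities - in_relations


if __name__ == "__main__":
    kg1_entities = {"a1", "a2", "a3", "a4"}
    kg1_relations = [("a1", "r1", "a2"), ("a2", "r2", "a3")]
    links = [("a1", "b1"), ("a4", "b4")]
    print(sorted(dangling_entities(kg1_entities, links, side=0)))        # ['a2', 'a3']
    print(sorted(attribute_only_entities(kg1_entities, kg1_relations)))  # ['a4']
```

Here a4 is attribute-only but aligned, while a2 and a3 have relations but are dangling, which shows that the two cases asked about are different.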

whu2015 closed this as completed Apr 27, 2022