jzbjyb/RelEnt

This repository contains the dataset and code for the following paper:

Learning Relation Entailment with Structured and Textual Information (AKBC2020)

Resources

Build sub-graph

Implementation

Useful SPARQL queries

SELECT ?item ?itemLabel ?value ?valueLabel 
WHERE 
{
  ?item wdt:P170 ?value.  # value should be the creator of item
  #?item wdt:P136 wd:Q828322.  # item's genre must be a game
  #?item wdt:P31 wd:Q7397.  # item is an instance of software
  #?value wdt:P452 wd:Q941594.  # value's industry be a video game
  ?value wdt:P106 wd:Q5482740.  # value's occupation should be developer
  #?item ?prop ?value.
  FILTER NOT EXISTS { ?item wdt:P178 ?value }  # value is not the developer of item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
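
The query above can be run programmatically against the Wikidata Query Service. Below is a minimal sketch (not part of this repository) that sends it to the public https://query.wikidata.org/sparql endpoint with the requests library; the User-Agent string is a placeholder.

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

query = """
SELECT ?item ?itemLabel ?value ?valueLabel
WHERE {
  ?item wdt:P170 ?value.                       # value should be the creator of item
  ?value wdt:P106 wd:Q5482740.                 # value's occupation should be developer
  FILTER NOT EXISTS { ?item wdt:P178 ?value }  # value is not the developer of item
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 100
"""

# WDQS asks clients to send a descriptive User-Agent header
response = requests.get(
    WDQS_ENDPOINT,
    params={"query": query, "format": "json"},
    headers={"User-Agent": "RelEnt-example/0.1 (research use)"},
)
response.raise_for_status()

# print each (item, creator) pair returned by the label service
for binding in response.json()["results"]["bindings"]:
    print(binding["itemLabel"]["value"], "<-", binding["valueLabel"]["value"])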

Wikidata preprocessing

  1. Download the truthy dump file from https://dumps.wikimedia.org/wikidatawiki/entities/
  2. Generate triples.txt and split it by property.
    • Only keep properties whose head and tail items are entities (IDs starting with 'Q').
  3. Downsampling
    • Downsample the triples to keep only frequent entities and save them to triples_ds.txt.
    • Downsample the triples by property, keeping the most popular instances of each property. The number of instances kept is determined by taking the square root or the logarithm of the size of the property (see the downsampling sketch after this list).
  4. Inflate the downsampled properties, because a kept instance of property A might also be an instance of property B that was not selected. Make sure that every kept entity also has its P31 triples kept, because we use P31 to split properties.
  5. Build an ontology for the dataset using P31 (instance of) and P279 (subclass of), based on the entire Wikidata rather than the downsampled version. Note that Wikidata's classification system is irregular: it forms a cyclic graph.
  6. Split leaf properties into ultra-fine properties based on the ontology inferred above (see the splitting sketch after this list).
    1. Compute the depth of each item in the ontology. Several heuristics are needed because the graph is cyclic (one such heuristic is sketched after this list).
    2. For each instance of a property, we use the P31 values of its head and tail entities as the signature for splitting. If an entity has no P31, we use 'Q' as a placeholder.
    3. For a property with K instances, all sub-properties larger than K/100 are kept and the remaining ones, if any, are merged, which means the maximum number of sub-properties we can get is 100 + 1.
  7. Merge instances from all the properties generated by the splitting algorithm.
  8. Train knowledge graph embedding (KGE) methods using the merged file.
  9. Choose a new parent among the sub-properties. This is crucial because it influences performance significantly.
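
A minimal sketch of the per-property downsampling in step 3. The function below, its data layout (triples as (head, property, tail) tuples), and the use of entity frequency as a proxy for popularity are illustrative assumptions, not the repository's actual script.

import math
from collections import Counter, defaultdict

def downsample_by_property(triples, use_sqrt=True):
    # group the triples by property
    by_prop = defaultdict(list)
    for h, p, t in triples:
        by_prop[p].append((h, p, t))

    # approximate an entity's popularity by how often it appears in the triples
    ent_count = Counter()
    for h, _, t in triples:
        ent_count[h] += 1
        ent_count[t] += 1

    kept = []
    for p, insts in by_prop.items():
        size = len(insts)
        # number of instances to keep: sqrt (or log) of the property size
        n_keep = int(math.sqrt(size)) if use_sqrt else max(1, int(math.log(size + 1)))
        # keep the most popular instances, ranked by head + tail frequency
        insts.sort(key=lambda x: ent_count[x[0]] + ent_count[x[2]], reverse=True)
        kept.extend(insts[:n_keep])
    return kept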
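
For the depth computation in step 6, one heuristic that tolerates cycles is a breadth-first search from a root class. The sketch below assumes Q35120 (entity) as the root and a dict of P279 edges; the repository's actual heuristics may differ.

from collections import deque

def compute_depth(subclass_of, root="Q35120"):
    """subclass_of: dict mapping a class to the list of its P279 parents."""
    # invert the P279 edges so we can walk from the root down to its subclasses
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    # BFS assigns each class its shortest distance from the root; visited nodes
    # are never re-queued, so cycles in the graph cannot cause infinite loops
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return depth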
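
The signature-based split in steps 6.2 and 6.3 could look like the sketch below. The (head P31, tail P31) signature, the 'Q' placeholder, and the K/100 threshold follow the description above; the data layout and the single-valued p31 mapping are illustrative assumptions.

from collections import defaultdict

def split_property(instances, p31, threshold_frac=0.01):
    """instances: list of (head, tail) pairs for one property."""
    K = len(instances)
    buckets = defaultdict(list)
    for h, t in instances:
        # signature = (P31 of head, P31 of tail), 'Q' when P31 is missing
        sig = (p31.get(h, "Q"), p31.get(t, "Q"))
        buckets[sig].append((h, t))

    sub_props, merged = {}, []
    for sig, insts in buckets.items():
        if len(insts) > K * threshold_frac:
            sub_props[sig] = insts        # keep sub-properties larger than K/100
        else:
            merged.extend(insts)          # merge the remaining small ones
    if merged:
        sub_props[("MERGED",)] = merged   # at most 100 kept buckets + 1 merged bucket
    return sub_props

The resulting sub-property instances are then merged across properties in step 7 before KGE training in step 8.
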
# get the surface string of the mention
mention_str = doc.phrase(mention.begin, mention.end)

# iterate over the frames evoked by the mention and print them
for e in mention.evokes():
    print(e.data(pretty=True))

Experiments

Google sheet used to track experimental results.

Google sheet used to track ranking results.
