# Convert Freebase IDs to Wikidata IDs

## Prerequisites

Deploy a Wikidata endpoint locally for fast queries. Follow [qEndpoint - wikidata](https://github.com/the-qa-company/qEndpoint#qacompanyqendpoint-wikidata) to set up the endpoint.

In essence, you only need to run the following command:

`# docker run -p 1234:1234 --name qendpoint-wikidata qacompany/qendpoint-wikidata`

Alternatively, you can use the public endpoint `https://query.wikidata.org/sparql` of the [Wikidata Query Service](https://query.wikidata.org/), but it is much slower. We do not recommend running large batches of queries against the public endpoint.
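
To make the role of the endpoint concrete, here is a minimal sketch of a single lookup over the standard SPARQL HTTP protocol. It is not the code of the `converter` module used below. The helper names (`build_freebase_query`, `parse_first_binding`, `run_query`) are hypothetical, and the sketch assumes that the endpoint accepts `GET` requests with a `query` parameter and that the Freebase ID is stored under property P646 on Wikidata items.

```python
import json
import urllib.parse
import urllib.request

LOCAL_ENDPOINT = "http://localhost:1234/api/endpoint/sparql"
PUBLIC_ENDPOINT = "https://query.wikidata.org/sparql"


def build_freebase_query(qid: str) -> str:
    """Build a SPARQL query asking for the Freebase ID (P646) of one item."""
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/> "
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/> "
        f"SELECT ?fid WHERE {{ wd:{qid} wdt:P646 ?fid }} LIMIT 1"
    )


def parse_first_binding(result: dict, var: str = "fid"):
    """Return the first value bound to `var` in a SPARQL JSON result, or None."""
    bindings = result.get("results", {}).get("bindings", [])
    return bindings[0][var]["value"] if bindings else None


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query with HTTP GET and decode the JSON response.

    Assumption: the endpoint follows the SPARQL protocol and can return
    results as application/sparql-results+json.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "freebase-wikidata-mapping-example/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a running endpoint, `parse_first_binding(run_query(LOCAL_ENDPOINT, build_freebase_query("Q42")))` would return the Freebase ID of item Q42 if one is recorded, and `None` otherwise. Swapping in `PUBLIC_ENDPOINT` only changes the first argument.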

## Part 1: Convert Entity IDs

In [None]:
import pickle
from tqdm import tqdm

from converter import EntityConverter

We initialize the converter with a local Wikidata endpoint.

In [None]:
entity_coverter = EntityConverter("http://localhost:1234/api/endpoint/sparql")

We look up the Freebase ID of every entity in [Wikidata5m](https://deepgraphlearning.github.io/project/wikidata5m), which gives us a mapping from Freebase IDs to Wikidata IDs.

In [None]:
with open('artifacts/id2entity.pkl', 'rb') as f:
    id2entity = pickle.load(f)
len(id2entity)


4818298

In [None]:
fids = map(entity_coverter.get_wikidata_id, tqdm(id2entity))
fid2qid = {fid: qid for fid, qid in zip(fids, id2entity) if fid is not None}
print("Found:", len(fid2qid))
print(f"Not found: {len(id2entity) - len(fid2qid)} / {len(id2entity)}")
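
The direction of the resulting dictionary is easy to misread, so here is a toy version of the same comprehension. The lookup table and the IDs in it are made up for illustration; `toy_lookup` is a hypothetical stand-in for the converter call and returns `None` when no Freebase ID is known.

```python
# Made-up Wikidata IDs and Freebase IDs, for illustration only.
toy_qids = ["Q1", "Q2", "Q3"]
toy_lookup = {"Q1": "/m/aaa", "Q3": "/m/ccc"}.get  # hypothetical lookup: qid -> fid or None

# Same pattern as the cell above: look up each qid, then key the result by fid.
fids = map(toy_lookup, toy_qids)
toy_fid2qid = {fid: qid for fid, qid in zip(fids, toy_qids) if fid is not None}

print(toy_fid2qid)  # {'/m/aaa': 'Q1', '/m/ccc': 'Q3'}
print("Not found:", len(toy_qids) - len(toy_fid2qid))  # Not found: 1
```

Because the dictionary is keyed by Freebase ID, two Wikidata entities that share a Freebase ID would collapse into one entry, and the "Not found" count would then also include such collisions.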

We save the results to `fid2qid.pkl` so that others can reuse them easily.

In [None]:
with open('artifacts/fid2qid.pkl', 'wb') as f:
    pickle.dump(fid2qid, f)
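
For readers who only want to consume the saved file, a minimal sketch of loading it and translating a batch of Freebase IDs follows. The helper names `load_fid2qid` and `to_wikidata_ids` are hypothetical and not part of this repository; the default path is the one written above.

```python
import pickle


def load_fid2qid(path="artifacts/fid2qid.pkl"):
    """Load the Freebase-to-Wikidata entity mapping saved by this notebook."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_wikidata_ids(fids, fid2qid):
    """Translate Freebase IDs to Wikidata IDs; unknown IDs become None."""
    return [fid2qid.get(fid) for fid in fids]
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code from a crafted file.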

## Part 2: Convert Property IDs

This [FAQ page](https://www.wikidata.org/wiki/Help:FAQ/Freebase#How_can_I_map_my_Freebase_Mids_to_Wikidata_Qids?) states that the conversion can be obtained through [*equivalent property (P1628)*](https://www.wikidata.org/wiki/Property:P1628), but we found no property that actually carries a Freebase link there.

[This page](https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase) points to [Freebase Mapping](https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping) for the current mapping, which lists roughly the 1000 most popular Freebase properties together with their Wikidata counterparts where one exists.

Although the mapping is not complete, there are no better alternatives at the moment. We scrape the mapping from the page and convert the Freebase IDs to Wikidata IDs.

Firstly, since the mapping only exists as a wiki page, we download the page and parse its HTML with [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/).

In [None]:
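import pickle
import requests
from bs4 import BeautifulSoup

url = "https://www.wikidata.org/wiki/Wikidata:WikiProject_Freebase/Mapping"
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail early instead of parsing an error page
html_doc = response.text
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html_doc, "html.parser")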

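Secondly, we walk over the table rows and collect the mapping from Freebase property URLs to Wikidata property URLs. Rows without a Wikidata link are kept with the value `None`.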

In [None]:
mapping = {}
rows = soup.find_all("tr")
for row in rows:
    cols = row.find_all("td")
    try:
        freebase_url = cols[0].a.attrs["href"]
    except Exception:
        continue
    if 'www.freebase.com' not in freebase_url:
        continue
    try:
        wikidata_url = cols[1].a.attrs["href"]
    except AttributeError:
        wikidata_url = None
    mapping[freebase_url] = wikidata_url

print("# existing mappings: ", len(mapping))
print("# non-None mappings: ", len([x for x in mapping.values() if x is not None]))
# Note: the set below also counts None as one value when some rows are unmapped.
print("# wikidata properties mapped:", len(set(mapping.values())))

# existing mappings:  1437
# non-None mappings:  435
# wikidata properties mapped: 253
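
The dictionary built above stores the raw `href` values of the links, not bare identifiers. As a sketch of the last step from URLs to IDs, the helpers below strip both sides down. The function names are hypothetical, and the URL shapes they handle (a Freebase property path after the host, and a Wikidata link ending in `Property:P...` or `/P...`) are assumptions about how the page renders its links.

```python
from urllib.parse import unquote, urlparse


def freebase_path(url):
    """Return the path of a Freebase URL, e.g. '/people/person/place_of_birth'."""
    return unquote(urlparse(url).path)


def wikidata_pid(url):
    """Return the trailing 'P<digits>' identifier of a Wikidata property URL, or None."""
    if url is None:
        return None
    tail = unquote(url).rstrip("/").rsplit("/", 1)[-1]  # e.g. 'Property:P19'
    tail = tail.rsplit(":", 1)[-1]  # drop a 'Property:' namespace prefix if present
    return tail if tail[:1] == "P" and tail[1:].isdigit() else None


def clean_mapping(url_mapping):
    """Keep only rows with a Wikidata property and reduce both sides to IDs."""
    cleaned = {}
    for freebase_url, wikidata_url in url_mapping.items():
        pid = wikidata_pid(wikidata_url)
        if pid is not None:
            cleaned[freebase_path(freebase_url)] = pid
    return cleaned
```

Calling `clean_mapping(mapping)` on the scraped dictionary would then give a mapping such as `'/people/person/place_of_birth' -> 'P19'`, provided the links have the assumed shape.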


Thirdly, some Freebase relations are the inverses of other Freebase relations, so we look those pairs up as well.

In [None]:
# In Freebase, some relations are the reverse of another relation. For example,
# /people/person/place_of_birth is the reverse of /location/location/people_born_here.
# Eventually we want to keep only one relation of each pair (the one with the most
# occurrences), so this cell records which relation is the reverse of which.
reverse_relations = {}
for row in rows:
    cols = row.find_all("td")
    try:
        freebase_url = cols[0].a.attrs["href"]
    except Exception:
        continue
    try:
        reversed_url = cols[-1].find_all("a")[0].attrs["href"]
        reverse_relations[freebase_url] = reversed_url
    except Exception:
        continue
print("# reverse relations:", len(reverse_relations))

# reverse relations: 77
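
One way the reverse relations can be used downstream is to resolve a Freebase relation that has no direct Wikidata property through its inverse. The sketch below shows that idea on toy data; `resolve_relation` is a hypothetical helper, and it assumes that both dictionaries use the same key form (here plain Freebase paths and Wikidata property IDs instead of the scraped URLs).

```python
def resolve_relation(relation, mapping, reverse_relations):
    """Return (wikidata_property, is_inverse) for a Freebase relation, or None.

    is_inverse is True when the property was found through the reverse
    relation, in which case subject and object of a triple must be swapped.
    """
    direct = mapping.get(relation)
    if direct is not None:
        return direct, False
    reverse = reverse_relations.get(relation)
    if reverse is not None and mapping.get(reverse) is not None:
        return mapping[reverse], True
    return None


# Toy data based on the example pair mentioned in the comment above.
toy_mapping = {"/people/person/place_of_birth": "P19"}
toy_reverse = {"/location/location/people_born_here": "/people/person/place_of_birth"}

print(resolve_relation("/people/person/place_of_birth", toy_mapping, toy_reverse))
# ('P19', False)
print(resolve_relation("/location/location/people_born_here", toy_mapping, toy_reverse))
# ('P19', True)
```

Returning the `is_inverse` flag rather than silently swapping keeps the decision about triple direction with the caller.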


Finally, we save all of the mappings to files.

In [None]:
with open('artifacts/f2w-properties.pkl', 'wb') as f:
    pickle.dump(mapping, f)
with open('artifacts/reversed-properties.pkl', 'wb') as f:
    pickle.dump(reverse_relations, f)