# AskWikidata


A prototype for a Wikidata Question Answering System.
https://github.com/rti/askwikidata


## Quickstart Notebook

In [None]:
%cd /content/
%rm -rf askwikidata

In [None]:
# Clone the askwikidata repo from GitHub.
!git clone https://github.com/rti/askwikidata
%cd /content/askwikidata

In [None]:
%pip install \
  sentence-transformers==3.1.1 \
  langchain==0.3.0 \
  langchain-community==0.3.0 \
  annoy==1.17.3 \
  bitsandbytes==0.43.3

In [4]:
# Unzip all cache files provided with the askwikidata repository.
!cd /content/askwikidata && bunzip2 --force --keep *.bz2

In [5]:
# Generate text representations of Wikidata items.
!cd /content/askwikidata && python text_representation.py > text_representations.log

100% 13877/13877 [00:12<00:00, 1137.28it/s]
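
The output format of `text_representation.py` is not shown in this notebook. As a rough illustration of the general idea only, the sketch below turns a hypothetical, heavily simplified item dict into plain sentences that an embedding model could index. The dict layout, the `to_text` helper and the sentence template are all assumptions made for this example, not the repository's actual code.

```python
# Hypothetical, simplified item. Real Wikidata JSON is far more nested,
# and the field names used here are assumptions for illustration only.
item = {
    "id": "Q64",
    "label": "Berlin",
    "statements": [
        {
            "property": "head of government",
            "value": "Kai Wegner",
            "qualifiers": {"start time": "2023-04-27"},
        },
        {"property": "country", "value": "Germany", "qualifiers": {}},
    ],
}


def to_text(item: dict) -> str:
    """Render one item as plain sentences, one statement per line."""
    lines = []
    for statement in item["statements"]:
        sentence = f"{item['label']} {statement['property']} {statement['value']}"
        # Append qualifiers in parentheses, if the statement has any.
        qualifiers = ", ".join(
            f"{name} {value}" for name, value in statement["qualifiers"].items()
        )
        if qualifiers:
            sentence += f" ({qualifiers})"
        lines.append(sentence + ".")
    return "\n".join(lines)


text = to_text(item)
print(text)
```

Keeping one statement per line makes it easy to split the text into chunks later without cutting a fact in half.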


In [None]:
# Add the askwikidata source directory to the Python import path.
import sys
sys.path.append('/content/askwikidata')

# Change the working directory to the askwikidata checkout.
import os
os.chdir('/content/askwikidata')


# Set up the actual AskWikidata RAG system.
from askwikidata import AskWikidata

config = {
    "chunk_size": 1280,
    "chunk_overlap": 0,
    "index_trees": 1024,
    "retrieval_chunks": 16,
    "context_chunks": 5,
    "embedding_model_name": "BAAI/bge-small-en-v1.5",
    "reranker_model_name": "BAAI/bge-reranker-base",
    "qa_model_url": "Qwen/Qwen2.5-3B-Instruct",
}

askwikidata = AskWikidata(**config)
askwikidata.setup()
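
The names `retrieval_chunks` and `context_chunks` in the config suggest a two-stage selection: a fast similarity search returns 16 candidate chunks, and a reranker narrows them down to the 5 that go into the prompt. The sketch below illustrates that pattern in plain Python. It is not the AskWikidata implementation: `embed`, `cosine`, `rerank_score` and `select_context` are hypothetical helpers, and the toy word-count scoring stands in for the embedding model and the reranker model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a sentence-embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(question: str, chunk: str) -> float:
    """Toy reranker: fraction of question words that appear in the chunk."""
    words = set(question.lower().split())
    return len(words & set(chunk.lower().split())) / len(words) if words else 0.0


def select_context(question, chunks, retrieval_chunks, context_chunks):
    """Pick prompt context in two stages: retrieve broadly, then rerank."""
    q_vec = embed(question)
    # Stage 1: cheap similarity search keeps the top `retrieval_chunks`.
    candidates = sorted(
        chunks, key=lambda c: cosine(q_vec, embed(c)), reverse=True
    )[:retrieval_chunks]
    # Stage 2: the reranker keeps only the top `context_chunks`.
    return sorted(
        candidates, key=lambda c: rerank_score(question, c), reverse=True
    )[:context_chunks]


# Illustrative sample chunks, not taken from the repository's data files.
chunks = [
    "Berlin head of government Kai Wegner",
    "Berlin country Germany",
    "Hamburg head of government Peter Tschentscher",
    "Bremen country Germany",
]
context = select_context(
    "Berlin head of government", chunks, retrieval_chunks=3, context_chunks=1
)
print(context)
```

Retrieving more chunks than the prompt can hold gives the slower, more accurate reranker a chance to recover relevant chunks that the first stage ranked too low.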

In [4]:
# Answer an example question.
print(askwikidata.ask("Who is the current mayor of Berlin? Since when do they serve?"))

Retrieving...
Reranking...
Generating...
<|im_start|>assistant
Kai Wegner since 2023-04-27 until today.<|im_end|>

Sources:
- https://www.wikidata.org/wiki/Q2079
- https://www.wikidata.org/wiki/Q1757
- https://www.wikidata.org/wiki/Q64
- https://www.wikidata.org/wiki/Q1055
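
The answer above still contains the chat-template markers `<|im_start|>assistant` and `<|im_end|>`. If a clean answer string is needed, a small post-processing step can remove them. The sketch below is one possible approach, assuming the markers appear in the text exactly as printed above. `strip_chat_tokens` is a hypothetical helper and not part of the askwikidata package.

```python
import re


def strip_chat_tokens(raw: str) -> str:
    """Remove ChatML-style markers such as <|im_start|>assistant and <|im_end|>."""
    # Drop each opening marker together with the role name that follows it.
    text = re.sub(r"<\|im_start\|>\w*\s*", "", raw)
    # Drop closing markers.
    text = text.replace("<|im_end|>", "")
    return text.strip()


raw = "<|im_start|>assistant\nKai Wegner since 2023-04-27 until today.<|im_end|>"
print(strip_chat_tokens(raw))
```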
