

**Reminder 🐣**


If the notebook imports the .py files, those files must be importable from where the notebook runs:
either in the same project folder, or in a folder that has been added to sys.path.

Meaning:

On Colab → you upload the folder to /content/…

On GitHub → the .py files sit in /src/

In local development → the .py files sit in the repository next to the notebook
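
One way to keep the same import cell working in all three settings is to probe a few candidate folders and put the first one that exists on sys.path. The following is a minimal sketch; the helper name `add_src_to_path` and the candidate list are assumptions for illustration, not part of the project.

```python
import sys
from pathlib import Path


def add_src_to_path(candidates):
    """Put the first existing candidate directory on sys.path and return it.

    Returns None when no candidate exists, so the caller can decide what to do.
    """
    for candidate in candidates:
        path = Path(candidate).expanduser().resolve()
        if path.is_dir():
            path_str = str(path)
            if path_str not in sys.path:  # avoid duplicate entries on re-run
                sys.path.append(path_str)
            return path_str
    return None


# Hypothetical candidate locations; adjust to where the .py files actually live.
src_dir = add_src_to_path([
    "/content/TKG_demo_project/src",  # Colab folder used in this notebook
    "src",                            # repository layout with a src/ folder
    ".",                              # .py files next to the notebook
])
print("Using source folder:", src_dir)
```

The order of the list encodes a preference: the Colab folder wins when it exists, and the current directory is the last fallback.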




In [13]:
# Create main project folder
!mkdir -p /content/TKG_demo_project/src/

!ls -R /content/TKG_demo_project



/content/TKG_demo_project:
src

/content/TKG_demo_project/src:
__pycache__  temporal_qa_system.py  wikipedia_retriever.py

/content/TKG_demo_project/src/__pycache__:
temporal_qa_system.cpython-312.pyc  wikipedia_retriever.cpython-312.pyc


In [14]:
from google.colab import files
import shutil


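# Upload augmentation.py and move it into src/
uploaded = files.upload()
shutil.move(list(uploaded.keys())[0], "/content/TKG_demo_project/src/")

# Upload temporal_qa_system.py and move it into src/
# Note: shutil.move raises shutil.Error if a file with the same name already exists in src/
uploaded = files.upload()
shutil.move(list(uploaded.keys())[0], "/content/TKG_demo_project/src/")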




Saving augmentation.py to augmentation.py


Saving temporal_qa_system.py to temporal_qa_system.py


Error: Destination path '/content/TKG_demo_project/src/temporal_qa_system.py' already exists
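
The error above appears because shutil.move refuses to move a file into a directory that already contains a file of the same name. A possible workaround is to remove the stale copy first and move to an explicit target path. This is a sketch only; `move_into` is a hypothetical helper, and it assumes the destination entry is a regular file.

```python
import os
import shutil


def move_into(src, dst_dir):
    """Move file `src` into `dst_dir`, replacing an existing file of the same name."""
    os.makedirs(dst_dir, exist_ok=True)
    target = os.path.join(dst_dir, os.path.basename(src))
    if os.path.exists(target):
        os.remove(target)  # drop the stale copy so the move cannot collide
    return shutil.move(src, target)
```

In the upload cell, each uploaded file name could then be passed through `move_into(name, "/content/TKG_demo_project/src/")` instead of calling shutil.move directly, which makes the cell safe to re-run.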

In [15]:
import glob
glob.glob("/content/TKG_demo_project/src/*.py")


['/content/TKG_demo_project/src/augmentation.py',
 '/content/TKG_demo_project/src/wikipedia_retriever.py',
 '/content/TKG_demo_project/src/temporal_qa_system.py']

In [16]:
import sys
sys.path.append('/content/TKG_demo_project/src/')

!pip install wikipedia-api sentence-transformers transformers
from temporal_qa_system import create_system_from_notebook
from wikipedia_retriever import WikipediaRetriever




In [23]:
# Download and unzip the mmkb repo
!wget https://github.com/mniepert/mmkb/archive/refs/heads/master.zip -O mmkb.zip
!unzip -q mmkb.zip

# Inspect the ICEWS14 folder
!ls mmkb-master/TemporalKGs/icews14



--2025-11-20 21:09:06--  https://github.com/mniepert/mmkb/archive/refs/heads/master.zip
Resolving github.com (github.com)... 20.27.177.113
Connecting to github.com (github.com)|20.27.177.113|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/mniepert/mmkb/zip/refs/heads/master [following]
--2025-11-20 21:09:07--  https://codeload.github.com/mniepert/mmkb/zip/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 20.27.177.114
Connecting to codeload.github.com (codeload.github.com)|20.27.177.114|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘mmkb.zip’

mmkb.zip                [           <=>      ]  22.92M  5.51MB/s    in 4.2s    

2025-11-20 21:09:11 (5.51 MB/s) - ‘mmkb.zip’ saved [24037907]

icews_2014_test.txt  icews_2014_train.txt  icews_2014_valid.txt


In [24]:
!mkdir -p /content/ICEWS14
!cp mmkb-master/TemporalKGs/icews14/* /content/ICEWS14/
!ls /content/ICEWS14


entity2id.txt	     icews_2014_train.txt  relation2id.txt  train.txt
icews_2014_test.txt  icews_2014_valid.txt  test.txt	    valid.txt


In [27]:
import pandas as pd

# Paths to ICEWS14 files downloaded from mmkb
train_path = "/content/ICEWS14/icews_2014_train.txt"
valid_path = "/content/ICEWS14/icews_2014_valid.txt"
test_path  = "/content/ICEWS14/icews_2014_test.txt"

# Load ICEWS14 splits (tab-separated, no header)
train_df = pd.read_csv(train_path, sep="\t", header=None)
valid_df = pd.read_csv(valid_path, sep="\t", header=None)
test_df  = pd.read_csv(test_path,  sep="\t", header=None)

# Assign column names
train_df.columns = ["head", "relation", "tail", "timestamp"]
valid_df.columns = ["head", "relation", "tail", "timestamp"]
test_df.columns  = ["head", "relation", "tail", "timestamp"]

print("Train sample:")
print(train_df.head())


Train sample:
                  head                             relation              tail  \
0          South Korea                Criticize or denounce       North Korea   
1                 Iran  Express intent to meet or negotiate             China   
2           Xi Jinping                              Consult       Le Hong Anh   
3  Civic Group (Kenya)            Make an appeal or request  Joseph Ole Lenku   
4   Criminal (Somalia)      Abduct, hijack, or take hostage   Citizen (India)   

    timestamp  
0  2014-05-13  
1  2014-02-02  
2  2014-08-28  
3  2014-05-13  
4  2014-10-31  


In [28]:
# 1. Build ID dictionaries from all splits
df_all = pd.concat([train_df, valid_df, test_df]).reset_index(drop=True)

for col in ["head", "relation", "tail", "timestamp"]:
    df_all[col] = df_all[col].astype(str)

entities = sorted(set(df_all["head"]).union(df_all["tail"]))
relations = sorted(df_all["relation"].unique())
times = sorted(df_all["timestamp"].unique())

entity2id = {e: idx for idx, e in enumerate(entities)}
relation2id = {r: idx for idx, r in enumerate(relations)}
time2id = {t: idx for idx, t in enumerate(times)}

print("Num entities:", len(entity2id))
print("Num relations:", len(relation2id))
print("Num timestamps:", len(time2id))

# 2. Build (head, relation, tail, timestamp) lists from the official ICEWS14 splits
train_triples_raw = list(zip(
    train_df["head"].astype(str),
    train_df["relation"].astype(str),
    train_df["tail"].astype(str),
    train_df["timestamp"].astype(str),
))

valid_triples_raw = list(zip(
    valid_df["head"].astype(str),
    valid_df["relation"].astype(str),
    valid_df["tail"].astype(str),
    valid_df["timestamp"].astype(str),
))

test_triples_raw = list(zip(
    test_df["head"].astype(str),
    test_df["relation"].astype(str),
    test_df["tail"].astype(str),
    test_df["timestamp"].astype(str),
))

print("Train triples:", len(train_triples_raw))
print("Valid triples:", len(valid_triples_raw))
print("Test triples:", len(test_triples_raw))
print("Example train triple:", train_triples_raw[0])


Num entities: 7128
Num relations: 230
Num timestamps: 365
Train triples: 72826
Valid triples: 8941
Test triples: 8963
Example train triple: ('South Korea', 'Criticize or denounce', 'North Korea', '2014-05-13')
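
To make the ID dictionaries concrete, the sketch below encodes two rows from the train sample as integer tuples. The helpers `build_vocab` and `encode_quadruples` and the `toy_` variables are illustrative names, not part of the notebook.

```python
def build_vocab(items):
    """Map each distinct item to an integer ID, assigned in sorted order."""
    return {item: idx for idx, item in enumerate(sorted(set(items)))}


def encode_quadruples(quads, entity2id, relation2id, time2id):
    """Turn (head, relation, tail, timestamp) strings into integer ID tuples."""
    return [
        (entity2id[h], relation2id[r], entity2id[t], time2id[ts])
        for h, r, t, ts in quads
    ]


# Toy data: two rows taken from the train sample shown earlier.
toy_quads = [
    ("South Korea", "Criticize or denounce", "North Korea", "2014-05-13"),
    ("Xi Jinping", "Consult", "Le Hong Anh", "2014-08-28"),
]
toy_entity2id = build_vocab([q[0] for q in toy_quads] + [q[2] for q in toy_quads])
toy_relation2id = build_vocab(q[1] for q in toy_quads)
toy_time2id = build_vocab(q[3] for q in toy_quads)

encoded = encode_quadruples(toy_quads, toy_entity2id, toy_relation2id, toy_time2id)
print(encoded)  # [(2, 1, 1, 0), (3, 0, 0, 1)]
```

Because the timestamps are zero-padded ISO dates, sorting them as strings also sorts them chronologically, so a smaller time ID means an earlier day.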


**PyTorch TransE**

In [29]:
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer


In [32]:
class TransE(nn.Module):
    def __init__(self, num_entities, num_relations, dim=100):
        super().__init__()
        self.ent_embeddings = nn.Embedding(num_entities, dim)
        self.rel_embeddings = nn.Embedding(num_relations, dim)
        nn.init.xavier_uniform_(self.ent_embeddings.weight)
        nn.init.xavier_uniform_(self.rel_embeddings.weight)

    def forward(self, h, r, t):
        h_e = self.ent_embeddings(h)
        r_e = self.rel_embeddings(r)
        t_e = self.ent_embeddings(t)
        # higher score = better triple
        return -torch.norm(h_e + r_e - t_e, p=2, dim=1)

def train_transe(model, triples, epochs=3, batch_size=1024, lr=0.001, device="cuda"):
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Map entity and relation strings to integer IDs; the timestamp (a[3]) is not used by TransE
    h = torch.tensor([entity2id[a[0]] for a in triples], dtype=torch.long)
    r = torch.tensor([relation2id[a[1]] for a in triples], dtype=torch.long)
    t = torch.tensor([entity2id[a[2]] for a in triples], dtype=torch.long)

    dataset = torch.utils.data.TensorDataset(h, r, t)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=True)

    for epoch in range(epochs):
        for h_b, r_b, t_b in loader:
            h_b = h_b.to(device)
            r_b = r_b.to(device)
            t_b = t_b.to(device)

            optimizer.zero_grad()
            pos_score = model(h_b, r_b, t_b)

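            # Simple negative sampling: corrupt the tail by shuffling tails within the batch.
            # A shuffled tail can occasionally coincide with the true tail.
            t_neg = t_b[torch.randperm(len(t_b))]
            neg_score = model(h_b, r_b, t_neg)

            # Margin ranking loss (margin 1.0) on the positive vs. corrupted scores
            loss = torch.relu(1.0 + neg_score - pos_score).mean()
            loss.backward()
            optimizer.step()
        # Reports the loss of the last batch in the epoch, not the epoch average
        print(f"Epoch {epoch+1}, loss = {float(loss):.4f}")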

    return model

device = "cuda" if torch.cuda.is_available() else "cpu"

transe_model = TransE(
    num_entities=len(entity2id),
    num_relations=len(relation2id),
    dim=100
)

transe_model = train_transe(
    transe_model,
    train_triples_raw,
    epochs=10,
    batch_size=1024,
    lr=0.001,
    device=device
)



Epoch 1, loss = 0.9068
Epoch 2, loss = 0.6063
Epoch 3, loss = 0.4729
Epoch 4, loss = 0.4471
Epoch 5, loss = 0.4482
Epoch 6, loss = 0.3220
Epoch 7, loss = 0.2778
Epoch 8, loss = 0.3134
Epoch 9, loss = 0.2481
Epoch 10, loss = 0.2043


**Load BGE-large**

In [33]:
semantic_model = SentenceTransformer("BAAI/bge-large-en-v1.5").to(device)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



**Import Demo system + wiki retriever**

In [73]:
# Choose 2–3 demo ICEWS triple indices
demo_indices = [0, 2, 6]

import sys
sys.path.append('/content/TKG_demo_project/src/')

system = create_system_from_notebook(
    model=transe_model,
    semantic_model=semantic_model,
    entity2id=entity2id,
    relation2id=relation2id,
    time2id=time2id,
    train_triples_raw=train_triples_raw,
    demo_indices=demo_indices,  # indices chosen at the top of this cell
)
print("System created")

System created


In [74]:
from wikipedia_retriever import WikipediaRetriever

system.wikipedia_retriever = WikipediaRetriever(
    language="en",
    user_agent="TKG-Demo/1.0 (contact: example@example.com)"
)
print("Retriever attached")

Retriever attached


**Attach LLM and run demo**

In [75]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

llm_tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-base")
llm_model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-base").to(device)

def call_flan_t5(prompt, max_length=256):
    inputs = llm_tokenizer(prompt, return_tensors="pt", truncation=True).to(device)
    outputs = llm_model.generate(**inputs, max_length=max_length)
    return llm_tokenizer.decode(outputs[0], skip_special_tokens=True)

system._call_llm = call_flan_t5
print("LLM is ready")

LLM is ready


**Build demo QA from your ICEWS triples**
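
The cell that defines demo_qa is not shown in this export. Judging from the questions printed below, each item pairs a templated question with the relation as the gold answer. The following is a minimal sketch under that assumption; `build_demo_qa` and the question template are reconstructions, not the original cell.

```python
def build_demo_qa(triples, indices):
    """Create one question/answer dict per selected (head, relation, tail, timestamp)."""
    qa_pairs = []
    for i in indices:
        head, relation, tail, timestamp = triples[i]
        qa_pairs.append({
            "question": f"What did {head} do toward {tail} in {timestamp}?",
            "answer": relation,  # the relation is treated as the gold answer
        })
    return qa_pairs


# Stand-in rows copied from the train sample, so the sketch runs on its own.
sample_triples = [
    ("South Korea", "Criticize or denounce", "North Korea", "2014-05-13"),
    ("Iran", "Express intent to meet or negotiate", "China", "2014-02-02"),
    ("Xi Jinping", "Consult", "Le Hong Anh", "2014-08-28"),
]
sample_qa = build_demo_qa(sample_triples, [0, 2])
for qa in sample_qa:
    print(qa["question"], "->", qa["answer"])
```

In the notebook itself, the equivalent call would presumably be `demo_qa = build_demo_qa(train_triples_raw, demo_indices)`.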

In [78]:
# Run demo
for qa in demo_qa:
    q = qa["question"]
    gold = qa["answer"]

    print("\n" + "="*60)
    print("Question:", q)
    print("Gold answer:", gold)

    result = system.answer_question(q, k=2, alpha=0.5)

    print("\nLLM answer:", result["answer"])
    print("\nTop ICEWS facts:")
    for f in result["top_facts"]:
        print("  -", f)

    print("\nWikipedia passages:")
    for w in result["wiki_passages"]:
        print("  -", w)

    print("\nEntities:", result["entities_used"], "Year:", result["year_used"])


Question: What did South Korea do toward North Korea in 2014-05-13?
Gold answer: Criticize or denounce

LLM answer: Criticize or denounce North Korea

Top ICEWS facts:
  - In 2014-05-13, South Korea Criticize or denounce North Korea.
  - In 2014-08-28, Xi Jinping Consult Le Hong Anh.

Wikipedia passages:
  - South Korea went on to sign a Free Trade Agreement with Canada and Australia in 2014, and another with New Zealand in 2015.
  - Despite the Sunshine Policy and efforts at reconciliation, the progress was complicated by North Korean missile tests in 1993, 1998, 2006, 2009, and 2013.
  - As of 2015, North Korea had diplomatic relations with 166 countries and embassies in 47 countries.
  - In 2015, North Korea was reported to employ 6,000 sophisticated computer security personnel in a cyberwarfare unit operating out of China.

Entities: ['South Korea', 'North Korea'] Year: 2014

Question: What did Xi Jinping do toward Le Hong Anh in 2014-08-28?
Gold answer: Consult

LLM answer: Consu

# 🐣 Think about how to make implicit questions explicit

Example:

Implicit: “After the Danish Ministry, who was the first to visit Iraq?”

Explicit: “After 2016-01-05, who was the first to visit Iraq?”
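
One possible starting point is to look up the date of the anchor event in the fact list and substitute it for the anchor phrase. The sketch below assumes the anchor event has already been identified as a (head, relation, tail) pattern; `make_explicit`, the relation name "Make a visit", and the toy fact are hypothetical, and picking the earliest matching date is a design choice rather than a given.

```python
def make_explicit(question, anchor_phrase, anchor_event, facts):
    """Replace `anchor_phrase` in the question with the date of the anchor event.

    `anchor_event` is a (head, relation, tail) pattern and `facts` is a list of
    (head, relation, tail, timestamp) tuples. The question is returned unchanged
    when no matching fact is found.
    """
    dates = sorted(
        ts for h, r, t, ts in facts
        if (h, r, t) == tuple(anchor_event)
    )
    if not dates:
        return question
    # Assumption: the earliest matching event is the intended reference point.
    return question.replace(anchor_phrase, dates[0], 1)


# Toy fact, made up to match the example above (not taken from ICEWS14).
toy_facts = [("Danish Ministry", "Make a visit", "Iraq", "2016-01-05")]

explicit = make_explicit(
    "After the Danish Ministry, who was the first to visit Iraq?",
    anchor_phrase="the Danish Ministry",
    anchor_event=("Danish Ministry", "Make a visit", "Iraq"),
    facts=toy_facts,
)
print(explicit)  # After 2016-01-05, who was the first to visit Iraq?
```

The harder open part is detecting the anchor phrase and its implied event automatically; here both are passed in by hand.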


