![Event Prediction using Case-Based Reasoning over Knowledge Graphs](../images/titleslide.png)


In this demonstration, we present EvCBR, a case-based reasoning model for knowledge-driven
event prediction.
We first load a model,
with some preprocessing already performed on the knowledge graph (KG), and then walk through two examples of
predicting properties of the Effect of a new Cause event.

In [1]:
from evcbr import EvCBR
from utils import *
from rdflib import Graph, Literal
import random
import pandas as pd

In [2]:
full_kg_file = (DATA_DIR / "demo_kg"/"wikidata_cc_full_3_hop.nt").resolve()
model_kg_file = (DATA_DIR / "demo_kg"/"wikidata_cc_nolit_3_hop.nt").resolve()

preprocessed_sim_dir = (DATA_DIR / "pp_wiki_full").resolve()
full_kg = Graph()
full_kg.bind("wd", WD)
full_kg.bind("wdt", WDT)
full_kg.parse(str(full_kg_file), format='nt')
# load the version of the KG with literals removed, for use in the model
model_kg = Graph()
model_kg.parse(str(model_kg_file), format='nt')
add_haseffect_relations(model_kg)

model = EvCBR(KG=model_kg, preprocessed_data_dir=preprocessed_sim_dir)

Our first example will be to predict the effect of a new Megathrust Earthquake event that occurs in Japan.
First we'll query the KG to search for the appropriate URIs to use for our inputs of the new event.

In [3]:
query_results = full_kg.query("""
select ?qid ?label
where {
    ?qid rdfs:label ?label.
    filter( regex(?label, "megathrust earthquake" ))
}
""")
print("Matching results for Megathrust Earthquake")

results = [{'QID':res.qid, 'Label':res.label} for res in query_results]
res = pd.json_normalize(results)
res.head(3)

Matching results for Megathrust Earthquake


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q727990,megathrust earthquake


In [4]:
query_results = full_kg.query("""
select ?qid ?label
where {
    ?qid rdfs:label ?label.
    filter( regex(?label, "Japan" ))
}
""")
print("Matching results for Japan")

# There are many matching results for Japan, so we'll just sort by the length of the label
# and show a few of them here
results = [{'QID':res.qid, 'Label':res.label} for res in query_results]
res = pd.json_normalize(results)
res.sort_values(by="Label", key=lambda x: x.str.len(), ascending=True, inplace=True)
res.head(5)

Matching results for Japan


Unnamed: 0,QID,Label
168,http://www.wikidata.org/entity/Q17,Japan
32,http://www.wikidata.org/entity/Q58893412,Japanese
31,http://www.wikidata.org/entity/Q5287,Japanese
157,http://www.wikidata.org/entity/Q908745,Japan Prize
50,http://www.wikidata.org/entity/Q6585753,Portal:Japan
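
The `regex` filter in these queries matches anywhere inside a label, which is why `Japanese` and `Portal:Japan` show up next to `Japan`. As a rough illustration, the sketch below mimics that behaviour with Python's `re` module on a few labels copied from the table above. Python's regex dialect is not identical to SPARQL's, so the anchored and case-insensitive variants should be read as analogies, not as tested query strings.

```python
import re

# Labels copied from the result table above.
labels = ["Japan", "Japanese", "Japan Prize", "Portal:Japan"]

# Unanchored search, analogous to regex(?label, "Japan"):
# any label that contains the text is a match.
substring_hits = [l for l in labels if re.search("Japan", l)]

# Anchored pattern, analogous to regex(?label, "^Japan$"):
# only the exact label matches.
exact_hits = [l for l in labels if re.search("^Japan$", l)]

# Case-insensitive search, analogous to regex(?label, "japan", "i").
ci_hits = [l for l in labels if re.search("japan", l, re.IGNORECASE)]

print(substring_hits)
print(exact_hits)
# Sorting by label length, as the notebook does, puts the shortest match first.
print(sorted(substring_hits, key=len))
```

Sorting by length is a convenient heuristic here because the exact entity label is usually the shortest string that contains the search term.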


We'll select WD:Q17 as our entity for Japan, and WD:Q727990 as our entity for the Megathrust Earthquake class.
We'll model our new event as an entity with an "instanceOf" relation pointing to the Megathrust Earthquake class, and
a "country" relation pointing to Japan.

In [5]:
p_instanceOf = WDT["P31"] # instanceOf
p_country = WDT["P17"] # country
ent_instanceOf1 = WD["Q727990"] # Megathrust Earthquake
ent_country1 = WD["Q17"] # Japan

# set up a dummy node for a new "megathrust earthquake in Japan" event
anon_node1 = WD["ex1"]
japan_eq_pred_triples = [
    (anon_node1, p_instanceOf, ent_instanceOf1),
    (anon_node1, p_country, ent_country1)
]

# predict the instanceOf and country of the effect event
predict_properties = [p_instanceOf, p_country]
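
Since each example in this demonstration describes its cause event with the same two properties, the triple construction can be wrapped in a small helper. The sketch below is self-contained and uses plain strings instead of rdflib terms. The helper name `make_event_triples` and the prefix constants are hypothetical, and the prefixes are assumed to follow the usual Wikidata entity and direct-property namespaces; the notebook's own `WD` and `WDT` objects come from `utils`.

```python
# Assumed Wikidata namespace prefixes (the notebook imports WD and WDT from utils).
WD_PREFIX = "http://www.wikidata.org/entity/"
WDT_PREFIX = "http://www.wikidata.org/prop/direct/"

P_INSTANCE_OF = WDT_PREFIX + "P31"  # instanceOf
P_COUNTRY = WDT_PREFIX + "P17"      # country


def make_event_triples(node_id, class_qid, country_qid):
    """Return (subject, predicate, object) triples describing a new cause event.

    Hypothetical helper: node_id names the dummy node, class_qid is the event
    class, and country_qid is the country in which the event occurs.
    """
    node = WD_PREFIX + node_id
    return [
        (node, P_INSTANCE_OF, WD_PREFIX + class_qid),
        (node, P_COUNTRY, WD_PREFIX + country_qid),
    ]


# "megathrust earthquake in Japan", mirroring the cell above.
japan_eq = make_event_triples("ex1", "Q727990", "Q17")
for s, p, o in japan_eq:
    print(s, p, o)
```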

Run the EvCBR prediction method. We can specify parameters such as how many hops in the KG to explore, how many
cases to retrieve, and how many paths to sample.
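
To give a feel for what the hop and path-count parameters control, here is a toy random-walk sampler over a hand-written graph. It is only a sketch: the graph, the relation names, and the sampling strategy are illustrative assumptions, and the parameter names `max_hops` and `sample_path_count` are borrowed from the call below without claiming that EvCBR samples paths this way.

```python
import random

# Toy graph: node -> list of (relation, neighbour). All names are illustrative.
toy_kg = {
    "cause": [("country", "Japan"), ("instanceOf", "megathrust earthquake")],
    "Japan": [("country_inv", "effect")],
    "megathrust earthquake": [("subclassOf", "earthquake")],
    "earthquake": [("instanceOf_inv", "effect")],
    "effect": [],
}


def sample_paths(graph, start, max_hops, sample_path_count, rng):
    """Sample random walks of at most max_hops edges, starting from start."""
    paths = []
    for _ in range(sample_path_count):
        node, path = start, []
        for hop in range(max_hops):
            edges = graph.get(node, [])
            if not edges:  # dead end: stop this walk early
                break
            relation, node = rng.choice(edges)
            path.append((relation, node))
        paths.append(tuple(path))
    return paths


rng = random.Random(0)  # fixed seed so the sketch is repeatable
paths = sample_paths(toy_kg, "cause", max_hops=3, sample_path_count=180, rng=rng)
print(len(paths), "paths sampled;", len(set(paths)), "distinct")
```

In this toy graph a larger `max_hops` lets walks reach the effect through longer chains, while a larger `sample_path_count` mainly makes it more likely that every distinct path is observed at least once.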

We produce two predictions here: one for our basic prediction method, and a second to produce "refined" predictions.
Intuitively, our basic prediction method produces initial predictions about the properties of the new Effect event,
while the refinement method aims to improve the predictions based on how well they can be used to "predict" the
input properties used to produce them.
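
The intuition behind refinement can be sketched with a toy re-ranking step. Everything below is an assumption made for illustration: the support values are made up, the `reverse_links` table stands in for whatever evidence lets a candidate "predict" the input back, and multiplying the two scores is just one simple way to combine them. It is not the scoring used by EvCBR.

```python
# Made-up forward support for candidate values of the effect's country.
forward_support = {"Japan": 0.6, "Kenya": 0.3, "Fukushima Prefecture": 0.1}

# Toy reverse evidence: the cause-side country values each candidate points back to.
reverse_links = {
    "Japan": {"Japan"},
    "Kenya": {"Kenya", "Somalia"},
    "Fukushima Prefecture": {"Japan"},
}

cause_country = "Japan"  # the input property we try to recover


def reverse_score(candidate):
    """Fraction of a candidate's back-links that recover the input property."""
    links = reverse_links[candidate]
    return sum(1 for value in links if value == cause_country) / len(links)


# Combine forward support with how well each candidate explains the input.
refined_support = {c: s * reverse_score(c) for c, s in forward_support.items()}

basic_ranking = sorted(forward_support, key=forward_support.get, reverse=True)
refined_ranking = sorted(refined_support, key=refined_support.get, reverse=True)

print("basic:  ", basic_ranking)
print("refined:", refined_ranking)
```

In this toy setting a candidate with weak forward support can overtake one with stronger support if only the former leads back to the input event's properties.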

In [6]:
# many methods involve random choices, so the final results will most likely
# vary between runs.

# set up and run prediction
model.set_forecast_triples(japan_eq_pred_triples)
forward_res1 = model.forecast_effects(
    triples_for_inductive_forecast=japan_eq_pred_triples,
    dummy_target_uri=anon_node1,
    forecast_relations=predict_properties,
    max_hops=3, sample_case_count=4,
    sample_case_cov_count=0, sample_path_count=180,
    print_info=False,
    precomputed_similar_cases=None,
    dummy_connecting_relation_uri=WDT_HASEFFECT,
    prevent_inverse_paths=False,
    save_path_info=True
)
model.clean_forecast_triples()
reverse_res1 = model.forecast_effect_reverse_predictions(
                prop_forecasts=forward_res1.property_entity_support,
                dummy_target_uri=anon_node1,
                triples_for_inductive_forecast=japan_eq_pred_triples,
                similar_case_effects=forward_res1.similar_cause_effect_pairs,
                max_hops=3,sample_path_count=180,
                prevent_inverse_paths=False
            )

Here we show some visualizations of 4 cases from the KG that were used to make our prediction. Cases are selected
based on factors such as the similarity of each property and the relative importance of properties.
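
As a minimal sketch of this kind of selection, the snippet below ranks candidate cases by a weighted sum of per-property similarities. The case names, similarity values, weights, and the weighted-sum formula itself are all hypothetical; they only illustrate how property similarity and property importance could interact, and are not taken from the EvCBR implementation.

```python
# Hypothetical importance of each property when comparing cause events.
weights = {"instanceOf": 0.7, "country": 0.3}

# Hypothetical similarity (0 to 1) between the new event and each past cause event.
cases = {
    "case_A": {"instanceOf": 1.0, "country": 1.0},
    "case_B": {"instanceOf": 1.0, "country": 0.2},
    "case_C": {"instanceOf": 0.3, "country": 1.0},
    "case_D": {"instanceOf": 0.1, "country": 0.1},
}


def case_score(similarities):
    """Weighted sum of per-property similarities; missing properties count as 0."""
    return sum(w * similarities.get(prop, 0.0) for prop, w in weights.items())


def retrieve(candidates, k):
    """Return the names of the k highest-scoring cases."""
    return sorted(candidates, key=lambda name: case_score(candidates[name]), reverse=True)[:k]


for name in retrieve(cases, 4):
    print(name, round(case_score(cases[name]), 2))
```

With these weights, a case that matches the event type but not the country (`case_B`) outranks one that matches the country but not the type (`case_C`).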

`P1542` indicates the `hasEffect` relation between events.

In [7]:
from visualizations.vis_cases import *
supporting_cases1 = forward_res1.similar_cause_effect_pairs[:4]
cg = new_collective_make_case_graph(cs=supporting_cases1, kg=model_kg, connecting_prop=WDT_HASEFFECT,
                         cause_props=[p_instanceOf, p_country], effect_props=[p_instanceOf, p_country],
                              workaround_country=ent_country1)

# sig = setup_sigma_graph(cg)
# sig
# uncomment the two lines above this to run an interactive graph

Static images of the visualizations are shown in this notebook, but running
the notebook will allow you to start up an interactive graph UI.

![image showing a graph of some similar cases used to make predictions](../images/eq_cases.png)

The top predictions for the new Effect event's `country` and `instanceOf` properties using the basic prediction
method are as follows:

In [8]:
print("Top predictions for the effect event's country relation")
country_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for ent in list(forward_res1.property_entity_support[p_country].keys())[:7]]
df = pd.json_normalize(country_predictions)
df.head(7)

Top predictions for the effect event's country relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q17,Japan
1,http://www.wikidata.org/entity/Q30,United States of America
2,http://www.wikidata.org/entity/Q970,Comoros
3,http://www.wikidata.org/entity/Q114,Kenya
4,http://www.wikidata.org/entity/Q252,Indonesia
5,http://www.wikidata.org/entity/Q805,Yemen
6,http://www.wikidata.org/entity/Q1045,Somalia


A visualization of some of the prediction paths leading to the top prediction can be seen below.

In [9]:
top_pred = list(forward_res1.property_entity_support[p_country].keys())[0]
top_pred_paths = forward_res1.property_prediction_paths[p_country][top_pred]
# sig = visualize_prediction_path(model_kg, japan_eq_pred_triples, p_country, top_pred, top_pred_paths, pathcount=5)
# sig

![some example paths leading to the prediction for the effect's country property](../images/eq_countrypred.png)

We can also see how the paths are present in some of the past cases. The following visualization shows how one of the
prediction paths is present in the similar cases in the KG.

In [10]:
sorted_pathscores = sorted(top_pred_paths.items(), key=lambda x: x[1], reverse=True)
top_paths = [path_det for (path_det, path_score) in sorted_pathscores[:3]]
target_path = top_paths[0]

# csig = show_paths_on_cases(cs=supporting_cases1, kg=model_kg, connecting_prop=WDT_HASEFFECT,
#                          cause_props=[p_instanceOf, p_country], effect_props=[p_instanceOf, p_country],
#                               workaround_country=ent_country1, target_prop=p_country, path=target_path)

#csig

![examples showing where the prediction paths are present in the retrieved cases](../images/eq_countrypath_cases.png)
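
The cells above treat the per-prediction path information as a mapping from paths to scores and pick the highest-scoring paths. A small self-contained sketch of that pattern follows. The nested dictionary layout, the made-up scores, and in particular the idea that an entity's support is the sum of its path scores are assumptions for illustration, not a description of EvCBR's internals.

```python
# Toy stand-in for per-entity path scores: entity -> {path: score}.
# Paths are tuples of relation names; all values are made up.
prediction_paths = {
    "Japan": {("country",): 0.5, ("instanceOf", "country_inv"): 0.2},
    "Indonesia": {("instanceOf", "country_inv"): 0.1},
}

# Assumed aggregation: an entity's support is the sum of its path scores.
entity_support = {
    entity: sum(path_scores.values())
    for entity, path_scores in prediction_paths.items()
}

# Rank entities by support, then list the best paths for the top entity.
ranked_entities = sorted(entity_support, key=entity_support.get, reverse=True)
top_entity = ranked_entities[0]
sorted_path_scores = sorted(
    prediction_paths[top_entity].items(), key=lambda item: item[1], reverse=True
)
top_paths = [path for path, score in sorted_path_scores[:3]]

print(top_entity, top_paths)
```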

In [11]:
print("Top predictions for the effect event's instanceOf relation")
instanceOf_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for ent in list(forward_res1.property_entity_support[p_instanceOf].keys())[:7]]
df = pd.json_normalize(instanceOf_predictions)
df.head(7)

Top predictions for the effect event's instanceOf relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q7944,earthquake
1,http://www.wikidata.org/entity/Q727990,megathrust earthquake
2,http://www.wikidata.org/entity/Q11639848,multi-segment earthquake
3,http://www.wikidata.org/entity/Q3510594,earthquake in Japan
4,http://www.wikidata.org/entity/Q7312186,remotely triggered earthquakes
5,http://www.wikidata.org/entity/Q3541302,volcano tectonic earthquake
6,http://www.wikidata.org/entity/Q8070,tsunami


Similarly, a visualization of the prediction paths for the top instanceOf relation can be seen below.

In [12]:
top_pred = list(forward_res1.property_entity_support[p_instanceOf].keys())[0]
top_pred_paths = forward_res1.property_prediction_paths[p_instanceOf][top_pred]
# sig = visualize_prediction_path(model_kg, japan_eq_pred_triples, p_instanceOf, top_pred, top_pred_paths, pathcount=5)
# sig

![example paths leading to the prediction for the effect's instanceOf property](../images/eq_instancepred.png)

In [13]:
sorted_pathscores = sorted(top_pred_paths.items(), key=lambda x: x[1], reverse=True)
top_paths = [path_det for (path_det, path_score) in sorted_pathscores[:3]]
target_path = top_paths[0]

# csig = show_paths_on_cases(cs=supporting_cases1, kg=model_kg, connecting_prop=WDT_HASEFFECT,
#                          cause_props=[p_instanceOf, p_country], effect_props=[p_instanceOf, p_country],
#                               workaround_country=ent_country1, target_prop=p_instanceOf, path=target_path)
# csig

![example of where a prediction path is present in the retrieved cases](../images/eq_instancepath_cases.png)

The top predictions based on our refinement method are:

In [14]:
print("Refined predictions for the effect event's country relation")
country_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for (ent, support) in reverse_res1.property_prediction_support[p_country][:7]]
df = pd.json_normalize(country_predictions)
df.head(7)

Refined predictions for the effect event's country relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q17,Japan
1,http://www.wikidata.org/entity/Q71707,Fukushima Prefecture
2,http://www.wikidata.org/entity/Q30,United States of America
3,http://www.wikidata.org/entity/Q16,Canada
4,http://www.wikidata.org/entity/Q865,Taiwan
5,http://www.wikidata.org/entity/Q183,Germany
6,http://www.wikidata.org/entity/Q1023949,Futaba


In [15]:
print("Refined predictions for the effect event's instanceOf relation")
instanceOf_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for (ent, support) in reverse_res1.property_prediction_support[p_instanceOf][:7]]
df = pd.json_normalize(instanceOf_predictions)
df.head(7)

Refined predictions for the effect event's instanceOf relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q7944,earthquake
1,http://www.wikidata.org/entity/Q494721,city of Japan
2,http://www.wikidata.org/entity/Q727990,megathrust earthquake
3,http://www.wikidata.org/entity/Q728937,railway line
4,http://www.wikidata.org/entity/Q18663566,
5,http://www.wikidata.org/entity/Q11639848,multi-segment earthquake
6,http://www.wikidata.org/entity/Q7312186,remotely triggered earthquakes


We can see a mixture of good and bad predictions from both methods. In general, the refined predictions
appear to do a better job of predicting that an effect will occur in Japan, specific areas in Japan, or countries
that are geographically/politically close to Japan.

The refinement method sometimes leads to some less desirable predictions for the effect's instanceOf,
such as classes that aren't types of events. Since several of the similar cases used to make predictions were
events where a megathrust earthquake caused the occurrence of more earthquakes, we can see how many of our predictions
in turn are different varieties of earthquakes.

Next, we'll look at an example of predicting the effects of an ongoing event, specifically the
recent protests in Iran.

We will once again perform predictions using some minimal input properties.

In [16]:
query_results = full_kg.query("""
select ?qid ?label
where {
    ?qid rdfs:label ?label.
    filter( regex(?label, "protest" ))
}
""")
print("Matching results for Protest")

results = [{'QID':res.qid, 'Label':res.label} for res in query_results]
res = pd.json_normalize(results)
res.sort_values(by="Label", key=lambda x: x.str.len(), ascending=True, inplace=True)
res.head(5)

Matching results for Protest


Unnamed: 0,QID,Label
46,http://www.wikidata.org/entity/Q273120,protest
26,http://www.wikidata.org/entity/Q829147,protest song
4,http://www.wikidata.org/entity/Q15631336,protest march
14,http://www.wikidata.org/entity/Q111448505,type of protest
39,http://www.wikidata.org/entity/Q7623053,street protester


In [17]:
query_results = full_kg.query("""
select ?qid ?label
where {
    ?qid rdfs:label ?label.
    filter( regex(?label, "Iran" ))
}
order by strlen(str(?label))
""")
# Omit additional results
print("Matching results for Iran")

results = [{'QID':res.qid, 'Label':res.label} for res in query_results]
res = pd.json_normalize(results)
res.sort_values(by="Label", key=lambda x: x.str.len(), ascending=True, inplace=True)
res.head(5)

Matching results for Iran


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q794,Iran
1,http://www.wikidata.org/entity/Q63214936,Iranun
2,http://www.wikidata.org/entity/Q3409301,Irantxe
3,http://www.wikidata.org/entity/Q63158027,Qajar Iran
4,http://www.wikidata.org/entity/Q107258515,Pahlavi Iran


We'll select WD:Q794 as our entity for Iran, and WD:Q273120 as our entity for the protest class.

In [32]:
# PROTEST in IRAN example

p_instanceOf = WDT["P31"] # instanceOf
p_country = WDT["P17"] # country
ent_instanceOf2 = WD["Q273120"] # Protest
ent_country2 = WD["Q794"] # Iran

# set up a dummy node for a new "protest in Iran" event
anon_node2 = WD["ex12"]
iran_protest_pred_triples = [
    (anon_node2, p_instanceOf, ent_instanceOf2),
    (anon_node2, p_country, ent_country2)
]

# predict the instanceOf and country of the effect event
predict_properties = [p_instanceOf, p_country]

In [19]:
# set up and run prediction
model.set_forecast_triples(iran_protest_pred_triples)
forward_res2 = model.forecast_effects(
    triples_for_inductive_forecast=iran_protest_pred_triples,
    dummy_target_uri=anon_node2,
    forecast_relations=predict_properties,
    max_hops=3, sample_case_count=5,
    sample_case_cov_count=2, sample_path_count=180,
    print_info=False,
    precomputed_similar_cases=None,
    dummy_connecting_relation_uri=WDT_HASEFFECT,
    prevent_inverse_paths=False,
    save_path_info=True
)
model.clean_forecast_triples()
reverse_res2 = model.forecast_effect_reverse_predictions(
                prop_forecasts=forward_res2.property_entity_support,
                dummy_target_uri=anon_node2,
                triples_for_inductive_forecast=iran_protest_pred_triples,
                similar_case_effects=forward_res2.similar_cause_effect_pairs,
                max_hops=3,sample_path_count=180,
                prevent_inverse_paths=False
            )

We once again visualize the cases retrieved to make our prediction. Recall that `P1542` is the causal relation
`hasEffect` in our dataset.

In [20]:
from visualizations.vis_cases import *
supporting_cases2 = forward_res2.similar_cause_effect_pairs[:4]
cg = new_collective_make_case_graph(cs=supporting_cases2, kg=model_kg, connecting_prop=WDT_HASEFFECT,
                         cause_props=[p_instanceOf, p_country], effect_props=[p_instanceOf, p_country],
                              workaround_country=ent_country2)
# sig = setup_sigma_graph(cg)
# sig

![example cases retrieved to make predictions for the iran protest](../images/ip_cases.png)

Several example cases drawn from the KG appear to be very relevant to our new event, such as the
Bahraini protests, Chilean protests, and Iranian Revolution. The Bahraini protests appear to be the most similar
to our new event, since they exactly match the type of the cause event and because Bahrain is more similar to
Iran than Chile is.

We also see an example of a somewhat less desirable case, which is largely an artifact of how Wikidata is
currently populated. The COVID-19 pandemic, and a Cuban protest caused by the pandemic, are listed among the
similar cases used in this example. Wikidata's entry for COVID-19 lists over 100 affected countries, including Iran,
so this event matches our new Cause event on the country property. Arguably, the pandemic and its effects
may be relevant when considering the effects of a new event in Iran, but the breadth of this particular event is
not ideal.

This example demonstrates how (1) some peculiarities in the underlying KG can lead to odd choices of
cases that are retrieved, and (2) the availability of relevant events in the KG can limit how good the
"best" cases are.
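
The effect of a very broad event on case matching can be illustrated with two toy similarity functions over sets of countries. Both functions and all of the data are hypothetical and neither is claimed to be the measure EvCBR uses; the point is only that a pure "shares a country" check cannot tell a focused event from one that lists a hundred countries, while a measure that accounts for set size can.

```python
# Country set of the new cause event.
new_event_countries = {"Iran"}

# Hypothetical past events: one focused, one listing many countries.
focused_case = {"Iran", "Iraq"}
broad_case = {f"country_{i}" for i in range(100)} | {"Iran"}


def shares_country(query, case):
    """1.0 if the two events share at least one country, else 0.0."""
    return 1.0 if query & case else 0.0


def jaccard(query, case):
    """Size of the intersection divided by the size of the union."""
    return len(query & case) / len(query | case)


for name, case in [("focused", focused_case), ("broad", broad_case)]:
    print(name, shares_country(new_event_countries, case),
          round(jaccard(new_event_countries, case), 4))
```

Under the membership check both cases look like perfect matches, whereas the Jaccard score of the broad case is close to zero.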


The top predictions for the new Effect event's `country` and `instanceOf` properties using the basic prediction
method are as follows:

In [21]:
print("Top predictions for the effect event's country relation")
country_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for ent in list(forward_res2.property_entity_support[p_country].keys())[:7]]
df = pd.json_normalize(country_predictions)
df.head(7)

Top predictions for the effect event's country relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q794,Iran
1,http://www.wikidata.org/entity/Q6178890,Khorasan Province
2,http://www.wikidata.org/entity/Q1986139,Parthian Empire
3,http://www.wikidata.org/entity/Q83891,Sasanian Empire
4,http://www.wikidata.org/entity/Q274536,Iranian Azerbaijan
5,http://www.wikidata.org/entity/Q159,Russia
6,http://www.wikidata.org/entity/Q227,Azerbaijan


In [22]:
print("Top predictions for the effect event's instanceOf relation")
instanceOf_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for ent in list(forward_res2.property_entity_support[p_instanceOf].keys())[:7]]
df = pd.json_normalize(instanceOf_predictions)
df.head(7)

Top predictions for the effect event's instanceOf relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q273120,protest
1,http://www.wikidata.org/entity/Q1996993,police brutality
2,http://www.wikidata.org/entity/Q20730691,death in police custody
3,http://www.wikidata.org/entity/Q3199915,massacre
4,http://www.wikidata.org/entity/Q124734,rebellion
5,http://www.wikidata.org/entity/Q175331,demonstration
6,http://www.wikidata.org/entity/Q686984,civil disorder


The top predictions based on our refinement method are:

In [23]:
print("Refined predictions for the effect event's country relation")
country_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for (ent, support) in reverse_res2.property_prediction_support[p_country][:7]]
df = pd.json_normalize(country_predictions)
df.head(7)

Refined predictions for the effect event's country relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q794,Iran
1,http://www.wikidata.org/entity/Q796,Iraq
2,http://www.wikidata.org/entity/Q227,Azerbaijan
3,http://www.wikidata.org/entity/Q159,Russia
4,http://www.wikidata.org/entity/Q232,Kazakhstan
5,http://www.wikidata.org/entity/Q851,Saudi Arabia
6,http://www.wikidata.org/entity/Q274536,Iranian Azerbaijan


In [24]:
print("Refined predictions for the effect event's instanceOf relation")
instanceOf_predictions = [{'QID': ent, 'Label': full_kg.value(subject=ent, predicate=RDFS.label)}
                       for (ent, support) in reverse_res2.property_prediction_support[p_instanceOf][:7]]
df = pd.json_normalize(instanceOf_predictions)
df.head(7)

Refined predictions for the effect event's instanceOf relation


Unnamed: 0,QID,Label
0,http://www.wikidata.org/entity/Q273120,protest
1,http://www.wikidata.org/entity/Q175331,demonstration
2,http://www.wikidata.org/entity/Q686984,civil disorder
3,http://www.wikidata.org/entity/Q7272924,Quran desecration
4,http://www.wikidata.org/entity/Q1562095,human chain
5,http://www.wikidata.org/entity/Q124734,rebellion
6,http://www.wikidata.org/entity/Q1996993,police brutality


This time our refinement method seems to produce more convincing improvements to the predictions. In particular,
the basic method's country predictions include some questionable results, such as
old empires, while the refined predictions list relevant countries like Iran, Iraq, and Azerbaijan.

The refinements to the effect's instanceOf predictions are slightly harder to judge, with both methods producing
some reasonable options, such as further protests, demonstrations, rebellion,
and police brutality.
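
One simple way to make this comparison concrete is to look at which labels the two top-7 lists share and how far each shared label moved. The sketch below uses the instanceOf labels exactly as they appear in the two tables above; since results vary between runs, the numbers describe only this particular output.

```python
# Top-7 instanceOf labels from the basic and refined tables above.
basic = ["protest", "police brutality", "death in police custody", "massacre",
         "rebellion", "demonstration", "civil disorder"]
refined = ["protest", "demonstration", "civil disorder", "Quran desecration",
           "human chain", "rebellion", "police brutality"]

shared = [label for label in basic if label in refined]
only_basic = [label for label in basic if label not in refined]
only_refined = [label for label in refined if label not in basic]

# Positive values mean the label moved up after refinement.
rank_shift = {label: basic.index(label) - refined.index(label) for label in shared}

print("shared:", shared)
print("only in basic:", only_basic)
print("only in refined:", only_refined)
print("rank shift:", rank_shift)
```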

Some visualizations of the prediction paths for these predictions can be seen below.
Again, static images are shown here, but running the notebook will make an interactive
UI available.
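
Judging from how the cells in this notebook index the results, the basic predictions behave like a dictionary keyed by entity (already ordered by support), while the refined predictions behave like a list of `(entity, support)` pairs. A small helper can hide that difference. This is a sketch under that assumption: the helper name `top_k_entities` is hypothetical and the support values are made up.

```python
def top_k_entities(support, k):
    """Return the first k entities from either result shape.

    Accepts a dict keyed by entity (assumed to be ordered by support already)
    or a list of (entity, support) pairs.
    """
    if isinstance(support, dict):
        return list(support.keys())[:k]
    return [entity for entity, _ in support[:k]]


# Made-up values shaped like the basic and refined results.
forward_like = {"Q273120": 0.5, "Q1996993": 0.2, "Q3199915": 0.1}
refined_like = [("Q273120", 0.6), ("Q175331", 0.3), ("Q686984", 0.1)]

print(top_k_entities(forward_like, 2))
print(top_k_entities(refined_like, 2))
```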

In [41]:
top_pred = list(reverse_res2.property_prediction_support[p_instanceOf])[0][0]
top_pred_paths = forward_res2.property_prediction_paths[p_instanceOf][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_instanceOf, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 6 nodes and 6 edges)

![prediction paths for the top-ranked refined instanceOf prediction](../images/ip1.png)

In [43]:
top_pred = list(reverse_res2.property_prediction_support[p_instanceOf])[1][0]
top_pred_paths = forward_res2.property_prediction_paths[p_instanceOf][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_instanceOf, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 6 nodes and 6 edges)

![prediction paths for the second-ranked refined instanceOf prediction](../images/ip2.png)

In [44]:
top_pred = list(reverse_res2.property_prediction_support[p_instanceOf])[2][0]
top_pred_paths = forward_res2.property_prediction_paths[p_instanceOf][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_instanceOf, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 6 nodes and 6 edges)

![prediction paths for the third-ranked refined instanceOf prediction](../images/ip3.png)

In [45]:
top_pred = list(reverse_res2.property_prediction_support[p_country])[0][0]
top_pred_paths = forward_res2.property_prediction_paths[p_country][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_country, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 6 nodes and 8 edges)

![prediction paths for the top-ranked refined country prediction](../images/ip4.png)

In [47]:
top_pred = list(reverse_res2.property_prediction_support[p_country])[1][0]
top_pred_paths = forward_res2.property_prediction_paths[p_country][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_country, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 10 nodes and 14 edges)

![prediction paths for the second-ranked refined country prediction](../images/ip5.png)

In [49]:
top_pred = list(reverse_res2.property_prediction_support[p_country])[2][0]
top_pred_paths = forward_res2.property_prediction_paths[p_country][top_pred]
# sig = visualize_prediction_path(model_kg, iran_protest_pred_triples, p_country, top_pred, top_pred_paths, pathcount=5)
# sig

Collecting entity labels from Wikidata
Collecting property labels from Wikidata


Sigma(nx.DiGraph with 9 nodes and 13 edges)

![prediction paths for the third-ranked refined country prediction](../images/ip6.png)