Please read the [README](README.md) first to download the necessary data and complete the essential configuration. If you follow the instructions there, you will be using *pipenv* to create the virtual environment. Remember to enter the virtual environment (for example with `pipenv shell`) before running the scripts below.

# Initial Probability Assignment
Before probabilities can be assigned, a probability calibrator must be trained, which requires some training data. The command below trains a calibration model that transforms scores into probabilities.

In [1]:
!python train_cal.py
# !python train_cal.py -i inputs/train.dat -o inputs/cal.model

INFO:root:Loading training data from /home/ricky/TREAT/proba-assign/inputs/train.dat ...
INFO:root:training...
INFO:root:Saving calibraiton model to /home/ricky/TREAT/proba-assign/inputs/cal.model...
INFO:root:Done!
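
The source does not say which calibration method `train_cal.py` uses. As a rough illustration of what a calibrator does, the sketch below fits Platt scaling (a one-dimensional logistic regression) by gradient descent. The function names, the toy scores, and the hyperparameters are all assumptions made for this example, not the script's actual code.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=5000):
    """Fit p = sigmoid(w * score + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(w * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += err * s
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def calibrate(score, w, b):
    """Map a raw score to a probability with the fitted parameters."""
    return sigmoid(w * score + b)


# Hypothetical raw scores with binary labels (1 = the triple is correct).
train_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_labels = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_platt(train_scores, train_labels)
print(calibrate(-2.0, w, b), calibrate(2.0, w, b))
```

A low raw score should map to a probability below 0.5 and a high one above 0.5. Other calibrators, such as isotonic regression, would fit the same role.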


Now we can assign initial probabilities:

In [2]:
!python assign.py 
# !python assign.py -i inputs/prod.dat -o outputs/pkb.dat -m inputs/cal.model

INFO:root:Loading triple from /home/ricky/TREAT/proba-assign/inputs/prod.dat ...
INFO:root:Loading calibration model from /home/ricky/TREAT/proba-assign/inputs/cal.model ...
INFO:root:Assigning probabilities ...
INFO:root:Storing PKB to /home/ricky/TREAT/proba-assign/outputs/pkb.dat ...
INFO:root:Done!


The results are stored in `outputs/pkb.dat` (or your configured output path). Each line has the format `<head, relation, tail, probability, strength>`, where the strength indicates our confidence in the computed probability. Now we can check the first 10 lines of the result file.

In [3]:
!head -n 10 "outputs/pkb.dat"

http://treat.net/onto.owl#alm_100185	http://treat.net/onto.owl#caused_by	http://treat.net/onto.owl#alm_100058_uam40_15684	0.0	2.0
http://treat.net/onto.owl#alarm_para_instance_1798	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alarm_para_instance_1351	0.003	2.0
http://treat.net/onto.owl#alarm_para_instance_713	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alm_221257768	0.035	2.0
http://treat.net/onto.owl#alarm_para_instance_950	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alm_100058_uam40_15666	0.0	2.0
http://treat.net/onto.owl#alm_100001_uam20_10	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_296	0.001	2.0
http://treat.net/onto.owl#alm_100001_usm21_249	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_2242	0.018	2.0
http://treat.net/onto.owl#alm_100001_usm21_28	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_1346	0.007	2.0

Together, the *probability* and the *strength* define a Beta distribution $Beta(a, b)$, where $a = probability \times strength$ and $b = (1 - probability) \times strength$. In this way, we have not only a probability value for a triple, but a **probability distribution over that probability**. At this stage, however, the *strength* is assigned a fixed value of 2, which represents weak confidence (any small number would do, but 2 is the conventional choice). It will be updated in the following step.
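
To make the Beta parameterisation concrete, here is a minimal sketch that parses one row of the output above and derives $a$, $b$, and the mean and variance of the resulting distribution. It assumes the file is tab-separated (the evaluation section reads it with `utils.read_tsv`); the helper names are hypothetical and not part of the repository.

```python
def parse_pkb_line(line):
    """Split one tab-separated PKB row into (head, relation, tail, probability, strength)."""
    head, relation, tail, probability, strength = line.rstrip("\n").split("\t")
    return head, relation, tail, float(probability), float(strength)


def beta_params(probability, strength):
    """Return (a, b) with a = p * s and b = (1 - p) * s."""
    return probability * strength, (1.0 - probability) * strength


def beta_mean_var(a, b):
    """Mean and variance of Beta(a, b); requires a + b > 0."""
    total = a + b
    mean = a / total
    var = a * b / (total ** 2 * (total + 1.0))
    return mean, var


row = (
    "http://treat.net/onto.owl#alarm_para_instance_713\t"
    "http://treat.net/onto.owl#category\t"
    "http://treat.net/onto.owl#alm_221257768\t0.035\t2.0"
)
*_, p, s = parse_pkb_line(row)
a, b = beta_params(p, s)
mean, var = beta_mean_var(a, b)
print(a, b, mean, var)
```

The mean recovers the stored probability, and the variance simplifies to $p(1-p)/(s+1)$, so a larger strength means a narrower distribution. Note that a row with probability exactly 0.0 gives $a = 0$, which is not a proper Beta distribution; how the scripts treat that case is not stated in the source.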

# Probability Updating

The updating procedure integrates the information carried by the evidence into the existing knowledge, either by adding new probabilistic triples or by updating the probability of existing triples. The required input format is `<head, relation, tail, probability>`, where the *probability* can be obtained with `assign.py`. An example input is:

In [4]:
!head -n 10 "inputs/synthetic_evidence.dat"

http://treat.net/onto.owl#alm_5521_uam20_685	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_1628	0.23630100592339764
http://treat.net/onto.owl#alm_100001_uam20_280	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_2783	0.9701712202310359
http://treat.net/onto.owl#alm_221257878	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alm_100001_usm21_736	0.3636269391610121
http://treat.net/onto.owl#alm_5521_usm21_721	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alm_100001_usm21_46	0.895746043753355
http://treat.net/onto.owl#alm_100058_uam40_15809	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_1080	0.5330004485415151
http://treat.net/onto.owl#alm_100001_usm21_440	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alm_para_195	0.5636861902839531
http://treat.net/onto.owl#alarm_para_instance_733	http://treat.net/onto.owl#caused_by	h

To update probabilities with uncertain evidence, use this command:

In [5]:
!python update.py 
# !python update.py -e inputs/synthetic_evidence.dat -d outputs/pkb.dat -o outputs/pkb.dat

INFO:root:Loading PKB from /home/ricky/TREAT/proba-assign/outputs/pkb.dat ...
INFO:root:Performing probabilistic updating by evidence from /home/ricky/TREAT/proba-assign/inputs/synthetic_evidence.dat ...
INFO:root:Saving PKB to /home/ricky/TREAT/proba-assign/outputs/pkb.dat ...
INFO:root:Done!


Now we check the first 10 lines of the result file.

In [6]:
!head -n 10 "outputs/pkb.dat"

http://treat.net/onto.owl#alm_100185	http://treat.net/onto.owl#caused_by	http://treat.net/onto.owl#alm_100058_uam40_15684	0.346	5.0
http://treat.net/onto.owl#alarm_para_instance_1798	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alarm_para_instance_1351	0.003	2.0
http://treat.net/onto.owl#alarm_para_instance_713	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alm_221257768	0.035	2.0
http://treat.net/onto.owl#alarm_para_instance_950	http://treat.net/onto.owl#category	http://treat.net/onto.owl#alm_100058_uam40_15666	0.0	2.0
http://treat.net/onto.owl#alm_100001_uam20_10	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_296	0.144	3.0
http://treat.net/onto.owl#alm_100001_usm21_249	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_2242	0.368	4.0
http://treat.net/onto.owl#alm_100001_usm21_28	http://treat.net/onto.owl#has_parameter	http://treat.net/onto.owl#alarm_para_instance_1346	0.28	5.

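We can see that the probabilities and strengths of several triples have been updated. For example, the first triple goes from probability 0.0 with strength 2.0 to probability 0.346 with strength 5.0.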
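
The strengths in the output above grow in whole-number steps (from 2.0 to 3.0, 4.0 or 5.0), which would be consistent with each piece of evidence contributing one pseudo-observation to the Beta distribution. The source does not confirm that `update.py` works this way, so the sketch below is only one possible reading; the function name, the `evidence_weight` parameter, and the evidence value are assumptions.

```python
def add_evidence(probability, strength, evidence_probability, evidence_weight=1.0):
    """Fold one piece of uncertain evidence into a (probability, strength) pair.

    Assumption: the evidence counts as `evidence_weight` pseudo-observations,
    a fraction `evidence_probability` of which supports the triple.
    """
    a = probability * strength + evidence_probability * evidence_weight
    b = (1.0 - probability) * strength + (1.0 - evidence_probability) * evidence_weight
    new_strength = a + b
    return a / new_strength, new_strength


# Hypothetical prior (probability 0.001, strength 2.0) and one piece of evidence.
p, s = add_evidence(0.001, 2.0, evidence_probability=0.8)
print(round(p, 4), round(s, 4))
```

Under this reading, the updated probability is a strength-weighted average of the prior probability and the evidence probability, and the strength increases by the evidence weight.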

## Complexity Estimate

Initial probability assignment essentially feeds a triple's statistical scores into a probabilistic classifier, so assigning a probability takes constant time ($O(1)$) per triple.

Probability updating works either by adding a piece of evidence (itself a probabilistic triple) directly to the PKB or by applying Jeffrey's conditionalisation formula, which also takes constant time ($O(1)$) per piece of evidence.
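
For reference, Jeffrey's rule for a binary partition $\{E, \neg E\}$ reads $P_{new}(H) = P(H \mid E)\,q + P(H \mid \neg E)\,(1 - q)$, where $q$ is the new probability of the evidence $E$. The sketch below shows why it is constant time: it is a fixed number of multiplications and additions. The function name and the input values are illustrative assumptions, and how `update.py` obtains the conditional probabilities is not described in the source.

```python
def jeffrey_update(p_h_given_e, p_h_given_not_e, q_e):
    """Jeffrey's rule for the binary partition {E, not E}."""
    if not 0.0 <= q_e <= 1.0:
        raise ValueError("q_e must be a probability in [0, 1]")
    return p_h_given_e * q_e + p_h_given_not_e * (1.0 - q_e)


# Hypothetical conditionals: H is likely when E holds, unlikely otherwise.
posterior = jeffrey_update(p_h_given_e=0.9, p_h_given_not_e=0.2, q_e=0.7)
print(posterior)
```

When the hypothesis is the evidence triple itself, $P(H \mid E) = 1$ and $P(H \mid \neg E) = 0$, so the rule simply returns $q$.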

## Evaluation


In [7]:
from sklearn.metrics import mean_squared_error, log_loss

import utils

!python assign.py -i inputs/prod.dat -o outputs/prod-results.dat

probas = utils.read_tsv('outputs/prod-results.dat')[:, -2].astype(float)
labels = utils.read_tsv('inputs/prod-labelled.dat')[:, -1].astype(int)

INFO:root:Loading triple from inputs/prod.dat ...
INFO:root:Loading calibration model from /home/ricky/TREAT/proba-assign/inputs/cal.model ...
INFO:root:Assigning probabilities ...
INFO:root:Storing PKB to outputs/prod-results.dat ...
INFO:root:Done!


In [8]:
print('Mean Square Error:', mean_squared_error(labels, probas))
print('Negative Log Loss:', log_loss(labels, probas))


Mean Square Error: 0.032948945003731654
Negative Log Loss: 0.10706370382983427


In [9]:
import numpy as np
rand_probas = np.random.rand(len(probas))

print('Mean Square Error:', mean_squared_error(labels, rand_probas))
print('Negative Log Loss:', log_loss(labels, rand_probas))

Mean Square Error: 0.3339890495921917
Negative Log Loss: 1.0047463842569324
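
To clarify what the two metrics measure, the following sketch computes them by hand for binary labels. It is meant as a reading aid rather than a replacement for the scikit-learn functions; the function names, the clipping constant `eps`, and the toy data are assumptions.

```python
import math


def brier_mse(labels, probas):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probas)) / len(labels)


def binary_log_loss(labels, probas, eps=1e-15):
    """Average negative log-likelihood of the labels; lower is better."""
    total = 0.0
    for y, p in zip(labels, probas):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical labels and predicted probabilities.
toy_labels = [1, 0, 1, 0]
toy_probas = [0.9, 0.1, 0.8, 0.3]
print(brier_mse(toy_labels, toy_probas), binary_log_loss(toy_labels, toy_probas))
```

Both metrics are losses, so smaller values are better. For predictions drawn uniformly from $[0, 1]$, the expected squared error is $1/3$ and the expected log loss is $1$ regardless of the labels, which agrees with the random baseline above (about 0.334 and 1.005). The calibrated probabilities score roughly 0.033 and 0.107.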
