## Finding a baseline

Here we establish a baseline for each of our tasks.

Task 3 has a chance baseline of 50%, since it is a binary classification task.
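The 50% figure holds when the two classes are roughly balanced. A minimal sketch of how one might check this with a majority-class baseline follows; the function name `majority_class_baseline` and the toy labels are illustrative assumptions, not part of the dataset or the original notebook.

```python
from collections import Counter

def majority_class_baseline(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Toy labels (hypothetical, not taken from the dataset).
balanced = [0, 1] * 50          # 50 of each class
skewed = [0] * 70 + [1] * 30    # 70/30 split

print(majority_class_baseline(balanced))
print(majority_class_baseline(skewed))
```

If the real Task 3 labels turn out to be skewed, the majority-class accuracy would be the more honest baseline to compare against.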

Task 1: Given a world model and keys (from the training set), randomly pick a formula f' from all matching formulas in the training set (matching = the same predicates, variables, etc. occur), and test accuracy (correct or incorrect given the world model, keys, and formula). Repeat for a sufficient number of world-model/keys/formula combinations and aggregate the accuracy; this is our baseline.

Task 2: Analogous to Task 1. Given a formula (from the training set), randomly pick the keys from any other matching datapoint in the training set (matching = the same predicates, variables, etc. occur), generate some world model, and test accuracy (correct or incorrect given the world model, keys, and formula). Repeat for a sufficient number of world-model/keys/formula combinations and aggregate the accuracy; this is our baseline.



In [1]:
import json
import collections 
import pandas as pd
import matplotlib.pyplot as plt

from nltk.sem.logic import *
import nltk
from nltk.sem.logic import LogicParser, Expression
from nltk.sem.evaluate import Valuation, Model

In [2]:
filename = 'base_pred_logic_data.json'
base_dataset = pd.read_json('../datasets/' + filename)

In [None]:
pd.set_option('display.max_colwidth', None)
base_dataset.iloc[6]

In [4]:
G_dataset = base_dataset[base_dataset["Predicates"].isin([['G']])]
len(G_dataset)

27135

In [5]:
F_dataset = base_dataset[base_dataset["Predicates"].isin([['F']])]
len(F_dataset)

27221

In [6]:
F_G_dataset = base_dataset[base_dataset["Predicates"].isin([['G', 'F']])]
len(F_G_dataset)


45644

In [7]:
splits = [F_G_dataset, G_dataset, F_dataset]

In [8]:
# Algorithm for Task 1:
# Split the dataset by predicate set.
# Within a split, repeatedly sample two datapoints:
# evaluate the second datapoint's formula against the first datapoint's world model and keys,
# and check whether the result (satisfied / unsatisfied) matches the first datapoint's label.
# Repeat this sampling many times.

In [9]:
def convert_valuation_back(valuation):
    # JSON cannot serialize sets, so predicate extensions are stored as lists;
    # nltk expects sets for predicates (upper-case symbols), so convert them back.
    return [(v[0], set(v[1])) if v[0].isupper() else v for v in valuation]
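To make the conversion concrete, here is a small sketch on a toy valuation. The shape of `toy` (constants mapped to entity names, predicates mapped to lists of entity names) is an assumption about how the JSON file stores valuations, not something the dataset confirms; the function is repeated so the snippet runs on its own.

```python
def convert_valuation_back(valuation):
    # Predicates (upper-case symbols) get their list turned back into a set;
    # everything else is passed through unchanged.
    return [(v[0], set(v[1])) if v[0].isupper() else v for v in valuation]

# Hypothetical valuation as it might look after a JSON round trip.
toy = [["a", "e1"], ["b", "e2"], ["F", ["e1", "e2"]], ["G", ["e2"]]]

restored = convert_valuation_back(toy)
for entry in restored:
    print(entry)
```

Only the predicate entries change type; constant entries such as `["a", "e1"]` are left exactly as they were loaded.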

In [11]:
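def get_baseline_task1(datapoint1, datapoint2):
    # World model, keys, and label come from datapoint1; the formula is borrowed from datapoint2.
    valuation = datapoint1["Valuation"]
    target = datapoint1["Satisfied"]
    formula = datapoint2["Formula"]

    valuation = convert_valuation_back(valuation)
    val = Valuation(valuation)
    dom = val.domain
    m = nltk.sem.evaluate.Model(dom, val)
    g = nltk.sem.Assignment(dom)
    sat = m.evaluate(str(formula), g)
    if sat is True:
        prediction = "satisfied"
    elif sat is False:
        prediction = "unsatisfied"
    else:
        # nltk returns 'Undefined' if the formula cannot be evaluated in this model;
        # raise explicitly so that the caller can skip the pair.
        raise ValueError(f"undefined evaluation for formula {formula!r}")

    return prediction == target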

In [12]:
max_baseline_datapoints = 1000000
currentcount = 0

task1_baseline_list = []
for df in splits:
    # NOTE: currentcount is not reset per split, so the whole budget is spent on the
    # first split (F_G_dataset) and the later splits are never sampled.
    while currentcount < max_baseline_datapoints:
        datapoints = df.sample(n=2)  # two distinct datapoints
        try:
            # If the evaluation is undefined, we do not know the truth value, so we skip the pair.
            task1_baseline_list.append(get_baseline_task1(datapoints.iloc[0], datapoints.iloc[1]))
            currentcount += 1
        except Exception:
            pass

task1_baseline = sum(task1_baseline_list) / len(task1_baseline_list)
task1_baseline

0.500493
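To judge whether this value differs meaningfully from chance, one can look at the sampling error of the estimate. The sketch below treats the sampled pairs as independent Bernoulli trials, which is an assumption (pairs are drawn from a finite dataset and may repeat); the helper name `binomial_standard_error` is hypothetical.

```python
import math

def binomial_standard_error(p_hat, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

p_hat = 0.500493   # Task 1 baseline reported above
n = 1_000_000      # max_baseline_datapoints

se = binomial_standard_error(p_hat, n)
z = (p_hat - 0.5) / se  # distance from chance, in standard errors

print(f"standard error: {se:.6f}")
print(f"distance from 0.5 in standard errors: {z:.2f}")
```

With a standard error of roughly 0.0005, the Task 1 baseline lies within about one standard error of 0.5, so under this assumption it is indistinguishable from chance.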

In [None]:
# Algorithm for Task 2:
# Split the dataset by predicate set.
# Within a split, repeatedly sample two datapoints:
# evaluate the first datapoint's formula against the second datapoint's world model and keys,
# and check whether the result (satisfied / unsatisfied) matches the first datapoint's label.
# Repeat this sampling many times.

In [13]:
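def get_baseline_task2(datapoint1, datapoint2):
    # Formula and label come from datapoint1; the world model and keys are borrowed from datapoint2.
    valuation = datapoint2["Valuation"]
    target = datapoint1["Satisfied"]
    formula = datapoint1["Formula"]

    valuation = convert_valuation_back(valuation)
    val = Valuation(valuation)
    dom = val.domain
    m = nltk.sem.evaluate.Model(dom, val)
    g = nltk.sem.Assignment(dom)
    sat = m.evaluate(str(formula), g)
    if sat is True:
        prediction = "satisfied"
    elif sat is False:
        prediction = "unsatisfied"
    else:
        # nltk returns 'Undefined' if the formula cannot be evaluated in this model;
        # raise explicitly so that the caller can skip the pair.
        raise ValueError(f"undefined evaluation for formula {formula!r}")

    return prediction == target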

In [14]:
max_baseline_datapoints = 1000000
currentcount = 0

task2_baseline_list = []
for df in splits:
    # NOTE: currentcount is not reset per split, so the whole budget is spent on the
    # first split (F_G_dataset) and the later splits are never sampled.
    while currentcount < max_baseline_datapoints:
        datapoints = df.sample(n=2)  # two distinct datapoints
        try:
            # If the evaluation is undefined, we do not know the truth value, so we skip the pair.
            task2_baseline_list.append(get_baseline_task2(datapoints.iloc[0], datapoints.iloc[1]))
            currentcount += 1
        except Exception:
            pass

task2_baseline = sum(task2_baseline_list) / len(task2_baseline_list)
task2_baseline

0.827161
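In both sampling loops above, `currentcount` is initialised once before `for df in splits`, so the budget is exhausted on the first split and the reported values reflect `F_G_dataset` only. A possible variant that gives every split its own budget is sketched below. It uses plain Python lists instead of DataFrames to stay self-contained; `sample_baseline`, `score_pair`, `samples_per_split`, and `max_attempts` are hypothetical names, and the toy data is not from the dataset.

```python
import random

def sample_baseline(splits, score_pair, samples_per_split, seed=0, max_attempts=None):
    """Average score_pair over random pairs, with a separate budget per split."""
    rng = random.Random(seed)
    if max_attempts is None:
        max_attempts = 10 * samples_per_split  # guard against endless skipping
    results = []
    for rows in splits:
        count = 0      # reset for every split
        attempts = 0
        while count < samples_per_split and attempts < max_attempts:
            attempts += 1
            first, second = rng.sample(rows, 2)  # two distinct positions
            try:
                results.append(score_pair(first, second))
            except ValueError:
                continue  # undefined evaluation: skip this pair
            count += 1
    return sum(results) / len(results)

# Toy stand-in for get_baseline_task1 / get_baseline_task2 (assumption).
calls = []
def toy_score(first, second):
    calls.append((first["label"], second["label"]))
    return first["label"] == second["label"]

toy_splits = [
    [{"label": 1}, {"label": 1}, {"label": 1}],
    [{"label": 1}, {"label": 1}, {"label": 1}, {"label": 1}],
]
toy_result = sample_baseline(toy_splits, toy_score, samples_per_split=5)
print(toy_result, len(calls))
```

Running the real loops this way would change the reported numbers, since the `G` and `F` splits would then contribute samples as well.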