# Inductive Logic Programming with Large Language Models (LLMs)

An inductive logic programming (ILP) problem is defined by three inputs:

- *Background knowledge (BK):* a set of logical clauses that represent prior knowledge about the domain.
- *Positive examples ($E^+$):* a set of ground facts (instances) that the learned theory should entail, i.e., examples that should be true according to the theory.
- *Negative examples ($E^-$):* a set of ground facts that the learned theory should not entail, i.e., examples that should be false according to the theory.

The task is to find a hypothesis $H$ (a set of logical clauses) such that:

- *Completeness:* For every example $e \in E^+, H \cup BK \models e$, meaning the hypothesis H, together with the background knowledge BK, should entail all positive examples.
- *Consistency:* For every example $e \in E^-, H \cup BK \not\models e$, meaning the hypothesis H, together with the background knowledge BK, should not entail any negative examples.

\begin{equation}
\forall e \in E^+, H \cup BK \models e
\end{equation}

\begin{equation}
\forall e \in E^-, H \cup BK \not\models e
\end{equation}
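Both conditions can be checked mechanically once the set of entailed facts is known. The sketch below is a minimal illustration in plain Python, not the `ILPSolver` implementation: it assumes the recursive ancestor theory and the `parent` facts used in the first example of this notebook, computes every `ancestor` fact that $H \cup BK$ entails by iterating to a fixpoint, and then tests completeness and consistency. The helper names `ancestor_closure`, `is_complete` and `is_consistent` are hypothetical.

```python
def ancestor_closure(parent):
    """Return every (X, Y) such that ancestor(X, Y) is entailed by
    ancestor(X, Y) :- parent(X, Y).
    ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
    """
    ancestor = set(parent)  # first clause: every parent is an ancestor
    changed = True
    while changed:  # apply the second clause until nothing new is derived
        changed = False
        for (x, z) in parent:
            for (z2, y) in list(ancestor):
                if z == z2 and (x, y) not in ancestor:
                    ancestor.add((x, y))
                    changed = True
    return ancestor


def is_complete(entailed, positives):
    """Completeness: every positive example is entailed."""
    return all(e in entailed for e in positives)


def is_consistent(entailed, negatives):
    """Consistency: no negative example is entailed."""
    return not any(e in entailed for e in negatives)


parent = {("john", "mary"), ("mary", "susan"),
          ("john", "michael"), ("michael", "robert")}
positives = [("john", "susan"), ("john", "robert"), ("mary", "susan")]
negatives = [("mary", "robert"), ("michael", "susan")]

entailed = ancestor_closure(parent)
print("complete:", is_complete(entailed, positives))
print("consistent:", is_consistent(entailed, negatives))
```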

<img src="figures/hybrid_ilp.png">

## Evaluation

A generated hypothesis is evaluated by checking the completeness and consistency conditions on every example, which yields four counts:
 
 - *True Positives (TP):* The number of positive examples correctly entailed by the hypothesis.
 - *False Positives (FP):* The number of negative examples incorrectly entailed by the hypothesis.
 - *False Negatives (FN):* The number of positive examples not entailed by the hypothesis.
 - *True Negatives (TN):* The number of negative examples correctly not entailed by the hypothesis.
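These four counts are enough to derive the accuracy and F1 score printed in the examples. The sketch below uses the standard definitions; it is an assumption that `ILPSolver` computes its `accuracy` and `f1` values in the same way, and the function name `scores` is hypothetical.

```python
def scores(tp, fp, fn, tn):
    """Return (accuracy, f1) from the four confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Counts taken from the list-length example in this notebook:
# 6 positives of which 2 are missed, 3 negatives all rejected.
accuracy, f1 = scores(tp=4, fp=0, fn=2, tn=3)
print("Accuracy", accuracy)
print("F1 score", f1)
```

With these counts the definitions give an accuracy of 7/9 and an F1 score of 0.8, which agrees with the values reported for that example.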



## Generalising the ancestor relationship

In [None]:
#Import the relevant libraries
from critique.inductive_logic_programming import ILPSolver
from generation.generative_model import GPT
import yaml


data_name = 'example_1'

#Instantiate the generative model. We use GPT-4o-mini
with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)
    api_key = config.get('gpt-4o-mini', {}).get('api_key')

llm = GPT('gpt-4o-mini', api_key)


# Instantiate the ILPSolver. The solver uses Prolog to evaluate the theory generated by the LLM 
ilp_solver = ILPSolver(
    generative_model=llm,
    theory_name = data_name
)

result = ilp_solver.critique()

print("Background Knowledge\n")
# Use context managers so the files are closed after reading
with open(f"./formalisation/prolog/{data_name}/bk.pl", 'r') as background_data:
    print(background_data.read())

print("\nPositive and Negative Examples\n")
with open(f"./formalisation/prolog/{data_name}/exs.pl", 'r') as examples:
    print(examples.read())

print("\nGenerated Theory\n")
print(result["generated_theory"])

print("\nEvaluation\n")
print("Accuracy", result['accuracy'])
print("F1 score", result['f1'])
# Misclassified examples, as shown in the output below
print(result['wrong_examples'])

Background Knowledge

parent(john, mary).
parent(mary, susan).
parent(john, michael).
parent(michael, robert).

Positive and Negative Examples

pos(ancestor(john, susan)).
pos(ancestor(john, robert)).
pos(ancestor(mary, susan)).
neg(ancestor(mary, robert)).
neg(ancestor(michael, susan)).

Generated Theory

prolog
ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).


Evaluation

Accuracy 1.0
F1 score 1.0
{'false_positives': [], 'false_negatives': []}


## Generalising the grandparent relationship

In [None]:
# Example from https://github.com/logic-and-learning-lab/Popper/tree/main/examples/kinship-pi
from critique.inductive_logic_programming import ILPSolver
from generation.generative_model import GPT
import yaml

data_name = 'example_2'

with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)
    api_key = config.get('gpt-4o-mini', {}).get('api_key')
llm = GPT('gpt-4o-mini', api_key)


ilp_solver = ILPSolver(
    generative_model=llm,
    theory_name = data_name
)

result = ilp_solver.critique()

print("Background Knowledge\n")
# Use context managers so the files are closed after reading
with open(f"./formalisation/prolog/{data_name}/bk.pl", 'r') as background_data:
    print(background_data.read())

print("\nPositive and Negative Examples\n")
with open(f"./formalisation/prolog/{data_name}/exs.pl", 'r') as examples:
    print(examples.read())

print("\nGenerated Theory\n")
print(result["generated_theory"])

print("\nEvaluation\n")
print("Accuracy", result['accuracy'])
print("F1 score", result['f1'])
# Misclassified examples, as shown in the output below
print(result['wrong_examples'])

Background Knowledge

mother(ann,amy).
mother(ann,andy).
mother(amy,amelia).
mother(linda,gavin).
father(steve,amy).
father(steve,andy).
father(gavin,amelia).
father(andy,spongebob).

Positive and Negative Examples

pos(grandparent(ann,amelia)).
pos(grandparent(steve,amelia)).
pos(grandparent(ann,spongebob)).
pos(grandparent(steve,spongebob)).
pos(grandparent(linda,amelia)).
neg(grandparent(amy,amelia)).

Generated Theory

prolog
grandparent(X, Y) :- mother(X, Z), mother(Z, Y).
grandparent(X, Y) :- father(X, Z), mother(Z, Y).
grandparent(X, Y) :- mother(X, Z), father(Z, Y).
grandparent(X, Y) :- father(X, Z), father(Z, Y).


Evaluation

Accuracy 1.0
F1 score 1.0
{'false_positives': [], 'false_negatives': []}
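
The four generated clauses enumerate every mother/father combination, which amounts to composing a single parent relation with itself. The sketch below replays that reading in plain Python over the background facts above. It is an illustration only and does not involve Prolog or `ILPSolver`; the names `parent` and `grandparent` are introduced here for the example.

```python
mother = {("ann", "amy"), ("ann", "andy"),
          ("amy", "amelia"), ("linda", "gavin")}
father = {("steve", "amy"), ("steve", "andy"),
          ("gavin", "amelia"), ("andy", "spongebob")}

# A parent is either a mother or a father.
parent = mother | father

# grandparent(X, Y) holds when some Z is a child of X and a parent of Y.
grandparent = {(x, y)
               for (x, z) in parent
               for (w, y) in parent
               if z == w}

positives = {("ann", "amelia"), ("steve", "amelia"), ("ann", "spongebob"),
             ("steve", "spongebob"), ("linda", "amelia")}
negatives = {("amy", "amelia")}

print("all positives covered:", positives <= grandparent)
print("no negatives covered:", grandparent.isdisjoint(negatives))
```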


## Counting the number of elements in a list

In [None]:
# Example from https://github.com/logic-and-learning-lab/Popper/tree/main/examples/synthesis-length
from critique.inductive_logic_programming import ILPSolver
from generation.generative_model import GPT
import yaml


data_name = 'example_len'

with open('config.yaml', 'r') as file:
    config = yaml.safe_load(file)
    api_key = config.get('gpt-4o-mini', {}).get('api_key')
llm = GPT('gpt-4o-mini', api_key)


ilp_solver = ILPSolver(
    generative_model=llm,
    theory_name = data_name
)

result = ilp_solver.critique()

print("Background Knowledge\n")
# Use context managers so the files are closed after reading
with open(f"./formalisation/prolog/{data_name}/bk.pl", 'r') as background_data:
    print(background_data.read())

print("\nPositive and Negative Examples\n")
with open(f"./formalisation/prolog/{data_name}/exs.pl", 'r') as examples:
    print(examples.read())

print("\nGenerated Theory\n")
print(result["generated_theory"])

print("\nEvaluation\n")
print("Accuracy", result['accuracy'])
print("F1 score", result['f1'])
print(result['wrong_examples'])

Background Knowledge

tail([_|T],T).
head([H|_],H).
empty([]).
zero(0).
one(1).
succ(0,1).
succ(1,2).
succ(2,3).
succ(3,4).
succ(4,5).
succ(5,6).
succ(6,7).
succ(7,8).
succ(8,9).
succ(9,10).

Positive and Negative Examples

pos(f([1],1)).
pos(f([2,2],2)).
pos(f([3,3,1],3)).
pos(f([4],1)).
pos(f([4,3],2)).
pos(f([4,3,2,2,3,5,2,7],8)).
neg(f([3,3,1],2)).
neg(f([4],0)).
neg(f([4],2)).

Generated Theory

prolog
f(L, H) :- head(L, H).
f(L, H) :- tail(L, T), f(T, H1), H is H1 + 1.


Evaluation

Accuracy 0.7777777777777778
F1 score 0.8
{'false_positives': [], 'false_negatives': ['f([4],1)', 'f([4,3],2)']}
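
The false negatives follow from the base case of the generated theory: the first clause returns the head of the list instead of counting it, and the second clause adds 1 for each element skipped. Under that reading, `f(L, N)` is derivable exactly when `N` equals some element of `L` plus its zero-based position. The sketch below emulates this in Python to reproduce the reported result. It is a hand-written reading of the two clauses, not output from Prolog, and `derivable` is a hypothetical helper.

```python
def derivable(lst):
    """Values N for which the generated theory can prove f(lst, N)."""
    # Clause 1 yields the head of the current suffix; clause 2 adds 1
    # for every element dropped before reaching that suffix.
    return {value + index for index, value in enumerate(lst)}


positives = [([1], 1), ([2, 2], 2), ([3, 3, 1], 3),
             ([4], 1), ([4, 3], 2), ([4, 3, 2, 2, 3, 5, 2, 7], 8)]
negatives = [([3, 3, 1], 2), ([4], 0), ([4], 2)]

false_negatives = [(l, n) for l, n in positives if n not in derivable(l)]
false_positives = [(l, n) for l, n in negatives if n in derivable(l)]

print("false negatives:", false_negatives)
print("false positives:", false_positives)
```

Under this reading the eight-element list is covered only by coincidence (the element 2 at position 6 gives 8), so the absence of false positives does not show that the theory computes list length.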
