# Ontological connection checking

Here I want to play around with ontological type checking in `pyiron_workflow` using `semantikon`'s `u` annotations.

In [12]:
import rdflib

from semantikon.metadata import u
from semantikon import ontology

import pyiron_workflow as pwf
from pyiron_workflow import knowledge, suggest
from pyiron_workflow.channels import ChannelConnectionError
from pyiron_workflow.nodes.composite import FailedChildError


EX = rdflib.Namespace("http://www.example.org/")

class Meal: ...

class Garbage: ...

@pwf.as_function_node("pizza")
def prepare_pizza() -> u(Meal, uri=EX.Pizza):
    return Meal()

@pwf.as_function_node("unidentified_meal")
def prepare_non_ontological_meal() -> Meal:
    return Meal()

@pwf.as_function_node("rice")
def prepare_rice() -> u(Meal, uri=EX.Rice):
    return Meal()

@pwf.as_function_node("garbage")
def prepare_garbage() -> u(Garbage, uri=EX.Garbage):
    return Garbage()

@pwf.as_function_node("garbage")
def prepare_unhinted_garbage():
    return Garbage()

@pwf.as_function_node("verdict")
def eat(meal: u(Meal, uri=EX.Meal)) -> str:
    return f"Yummy {meal.__class__.__name__} meal"

@pwf.as_function_node("verdict")
def eat_pizza(meal: u(Meal, uri=EX.Pizza)) -> str:
    return f"Yummy {meal.__class__.__name__} pizza"

## Both fully hinted

Works fine

In [13]:
wf = pwf.Workflow("ontoflow")
wf.make = prepare_pizza()
wf.eat = eat_pizza(wf.make)
wf()

{'eat__verdict': 'Yummy Meal pizza'}

## Upstream type hint is missing

Standard `pyiron_workflow` typing behaviour: we are allowed to form the connection (since the source has no hint), but the workflow fails at runtime when we try to actually assign the value to the channel.

In [14]:
wf = pwf.Workflow("no_type")
wf.make = prepare_unhinted_garbage()
wf.eat = eat_pizza(wf.make)
try:
    wf()
except FailedChildError as e:
    print(e)

/no_type encountered error in child: {'/no_type/eat.accumulate_and_run': TypeError("The channel /no_type/eat.meal cannot take the value `<__main__.Garbage object at 0x11fd86f60>` (<class '__main__.Garbage'>) because it is not compliant with the type hint typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Pizza'))]")}


## Upstream type hint is wrong

Standard `pyiron_workflow` typing behaviour: we're not even allowed to form the connection -- the recipe would be invalid

In [15]:
wf = pwf.Workflow("no_type")
wf.make = prepare_garbage()
try:
    wf.eat = eat_pizza(wf.make)
except ChannelConnectionError as e:
    print(e)

The channel /no_type/make.garbage (<class 'pyiron_workflow.mixin.injection.OutputDataWithInjection'>) has the correct type (<class 'pyiron_workflow.channels.OutputData'>) to connect with /eat_pizza.meal (<class 'pyiron_workflow.channels.InputData'>), but is not a valid connection.Please check type hints, etc. /no_type/make.garbage.type_hint = typing.Annotated[__main__.Garbage, ('uri', rdflib.term.URIRef('http://www.example.org/Garbage'))]; /eat_pizza.meal.type_hint = typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Pizza'))]


So far, so good: `u` decoration has no negative impact on the existing type hint checking procedures
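To see why the plain type check is unaffected, it helps to look at what the hint actually is. Judging from the error messages above, a `u` hint shows up as a `typing.Annotated` whose single metadata entry is a flat `(key, value, key, value, ...)` tuple. The sketch below builds such a hint by hand (with a plain string standing in for the `rdflib.term.URIRef`) and splits it back apart using only the standard library. The helper `split_hint` is hypothetical and not part of `semantikon` or `pyiron_workflow`.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


class Meal: ...


# Hand-built stand-in for `u(Meal, uri=EX.Pizza)`; the flat tuple layout is an
# assumption read off the error messages, and the URI is a plain string here.
def eat_pizza(meal: Annotated[Meal, ("uri", "http://www.example.org/Pizza")]) -> str:
    return "Yummy pizza"


def split_hint(hint):
    """Return (plain_type, metadata_dict) for a possibly Annotated hint."""
    if get_origin(hint) is not Annotated:
        return hint, {}
    plain_type, *extras = get_args(hint)
    flat = extras[0]  # assumed layout: (key1, value1, key2, value2, ...)
    return plain_type, dict(zip(flat[::2], flat[1::2]))


# include_extras=True keeps the Annotated wrapper instead of stripping it
hints = get_type_hints(eat_pizza, include_extras=True)
plain_type, metadata = split_hint(hints["meal"])
print(plain_type, metadata)
```

The plain type and the ontological metadata come apart cleanly, so the ordinary type check can keep operating on the first half and ignore the second.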

## Upstream ontological hint is missing

New ontological behaviour: As with type hints, if one side is missing we just let things pass. Unlike type hints, we can also _execute_ the workflow, because the ontologies only impact the recipe-level behaviour, not the instance behaviour!

In [16]:
wf = pwf.Workflow("no_ontology")
wf.make = prepare_non_ontological_meal()
wf.eat = eat_pizza(wf.make)
wf()

{'eat__verdict': 'Yummy Meal pizza'}

## Upstream ontological hint is WRONG

New ontological behaviour: ontological type checking now prevents us from even forming the ontologically invalid connection!

In [17]:
wf = pwf.Workflow("failed_ontology")
wf.make = prepare_rice()
try:
    wf.eat = eat_pizza(wf.make)
except ChannelConnectionError as e:
    print(e)

The channel /failed_ontology/make.rice (<class 'pyiron_workflow.mixin.injection.OutputDataWithInjection'>) has the correct type (<class 'pyiron_workflow.channels.OutputData'>) to connect with /eat_pizza.meal (<class 'pyiron_workflow.channels.InputData'>), but is not a valid connection.Please check type hints, etc. /failed_ontology/make.rice.type_hint = typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Rice'))]; /eat_pizza.meal.type_hint = typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Pizza'))]


## Downstream ontological hint is less specific

This should work fine...

In [18]:
wf = pwf.Workflow("relaxed_ontology")
wf.make = prepare_rice()
try:
    wf.eat = eat(wf.make)
except ChannelConnectionError as e:
    print(e)

The channel /relaxed_ontology/make.rice (<class 'pyiron_workflow.mixin.injection.OutputDataWithInjection'>) has the correct type (<class 'pyiron_workflow.channels.OutputData'>) to connect with /eat.meal (<class 'pyiron_workflow.channels.InputData'>), but is not a valid connection.Please check type hints, etc. /relaxed_ontology/make.rice.type_hint = typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Rice'))]; /eat.meal.type_hint = typing.Annotated[__main__.Meal, ('uri', rdflib.term.URIRef('http://www.example.org/Meal'))]


But! We forgot something! This form of failure is known from the `semantikon` notebook whence these demonstration workflows spring: we never informed the ontology that "rice" is a subclass of "meal"!

We let the ontology know this by adding the corresponding triple to our `rdflib.Graph`. In `pyiron_workflow` we can manage this by pre-populating a `knowledge: rdflib.Graph` property on the graph root (i.e. top-most object) as follows:

In [19]:
wf = pwf.Workflow("relaxed_ontology")

wf.knowledge = rdflib.Graph()
wf.knowledge.add((EX.Rice, rdflib.RDFS.subClassOf, EX.Meal))

wf.make = prepare_rice()
wf.eat = eat(wf.make)
wf()

{'eat__verdict': 'Yummy Meal meal'}
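The URI behaviour seen so far can be summarised as one rule: a missing hint on either side passes, and otherwise the upstream class must be the downstream class or a (transitive) subclass of it according to the known triples. Below is a minimal sketch of that rule as I read it from the outputs above. It is not how `semantikon` implements the check; the triples are plain string tuples rather than an `rdflib.Graph`, and the names `superclasses` and `uri_compatible` are made up for illustration.

```python
RDFS_SUBCLASS_OF = "rdfs:subClassOf"  # stand-in for rdflib.RDFS.subClassOf


def superclasses(uri, triples):
    """Return `uri` plus everything reachable from it via subClassOf."""
    seen = {uri}
    stack = [uri]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            is_parent = subject == current and predicate == RDFS_SUBCLASS_OF
            if is_parent and obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen


def uri_compatible(source_uri, target_uri, triples=()):
    """Missing hints pass; otherwise source must be target or a subclass of it."""
    if source_uri is None or target_uri is None:
        return True
    return target_uri in superclasses(source_uri, triples)


known = {("ex:Rice", RDFS_SUBCLASS_OF, "ex:Meal")}

print(uri_compatible(None, "ex:Pizza"))               # upstream hint missing
print(uri_compatible("ex:Rice", "ex:Pizza"))          # upstream hint wrong
print(uri_compatible("ex:Rice", "ex:Meal"))           # subclass relation unknown
print(uri_compatible("ex:Rice", "ex:Meal", known))    # subclass relation known
```

Note the direction of the check: rice may be fed to something expecting a meal, but a generic meal would not satisfy something expecting rice.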

# Ontological triples

Alright, for our simple pizza example things are working beautifully. Let's try it with the clothes example.

In [20]:
from rdflib import OWL, Namespace

import pyiron_workflow as pwf
from pyiron_workflow import knowledge
from semantikon.metadata import u
from semantikon import ontology

EX = Namespace("http://www.example.org/")

class Clothes:
    pass

@pwf.as_function_node
def wash(clothes: Clothes) -> u(
    Clothes,
    triples=(EX.hasProperty, EX.cleaned),
    derived_from="inputs.clothes"
):
    ...
    return clothes

@pwf.as_function_node
def dye(clothes: Clothes, color="blue") -> u(
    Clothes,
    triples=(EX.hasProperty, EX.color),
    derived_from="inputs.clothes",
):
    ...
    return clothes

@pwf.as_function_node
def sell(
    clothes: u(
        Clothes, restrictions=(
            ((OWL.onProperty, EX.hasProperty), (OWL.someValuesFrom, EX.cleaned)),
            ((OWL.onProperty, EX.hasProperty), (OWL.someValuesFrom, EX.color))
        )
    )
) -> int:
    price = 10
    return price

In [21]:
my_correct_wf = pwf.Workflow("my_correct_workflow")
my_correct_wf.dyed_clothes = dye(Clothes())
my_correct_wf.washed_clothes = wash(my_correct_wf.dyed_clothes)
my_correct_wf.money = sell(my_correct_wf.washed_clothes)
my_correct_wf()
# Why does it think the top-level input needs the properties already???
# This was working fine when I generated the graph and validated it after all was said and done...

ChannelConnectionError: The channel /my_correct_workflow/washed_clothes.clothes (<class 'pyiron_workflow.mixin.injection.OutputDataWithInjection'>) has the correct type (<class 'pyiron_workflow.channels.OutputData'>) to connect with /sell.clothes (<class 'pyiron_workflow.channels.InputData'>), but is not a valid connection.Please check type hints, etc. /my_correct_workflow/washed_clothes.clothes.type_hint = typing.Annotated[__main__.Clothes, ('triples', (rdflib.term.URIRef('http://www.example.org/hasProperty'), rdflib.term.URIRef('http://www.example.org/cleaned')), 'derived_from', 'inputs.clothes')]; /sell.clothes.type_hint = typing.Annotated[__main__.Clothes, ('restrictions', (((rdflib.term.URIRef('http://www.w3.org/2002/07/owl#onProperty'), rdflib.term.URIRef('http://www.example.org/hasProperty')), (rdflib.term.URIRef('http://www.w3.org/2002/07/owl#someValuesFrom'), rdflib.term.URIRef('http://www.example.org/cleaned'))), ((rdflib.term.URIRef('http://www.w3.org/2002/07/owl#onProperty'), rdflib.term.URIRef('http://www.example.org/hasProperty')), (rdflib.term.URIRef('http://www.w3.org/2002/07/owl#someValuesFrom'), rdflib.term.URIRef('http://www.example.org/color')))))]

In [19]:
my_correct_wf.money.inputs.clothes.type_hint

AttributeError: Could not find attribute 'money' on my_correct_workflow (Workflow) or among its children (dict_keys(['dyed_clothes', 'washed_clothes'])).

In [20]:
ontology.validate_values(knowledge.parse_workflow(my_correct_wf))

{'missing_triples': [], 'incompatible_connections': []}

In [None]:
@pwf.as_macro_node
def my_correct_macro(self, clothes: Clothes):
    self.dyed_clothes = dye(clothes)
    self.washed_clothes = wash(self.dyed_clothes)
    self.money = sell(self.washed_clothes)
    return self.money

correct_m = my_correct_macro(Clothes())
correct_m()

In [20]:
ontology.validate_values(knowledge.parse_workflow(correct_m))

{'missing_triples': [], 'incompatible_connections': []}

In [21]:
my_wrong_wf = pwf.Workflow("my_wrong_workflow")
my_wrong_wf.washed_clothes = wash(Clothes())
my_wrong_wf.money = sell(my_wrong_wf.washed_clothes)
my_wrong_wf()

{'money__10': 10}

In [22]:
ontology.validate_values(knowledge.parse_workflow(my_wrong_wf))

{'missing_triples': [(rdflib.term.URIRef('my_wrong_workflow.money.inputs.clothes'),
   rdflib.term.URIRef('http://www.example.org/hasProperty'),
   rdflib.term.URIRef('http://www.example.org/color'))],
 'incompatible_connections': []}

In [23]:
@pwf.as_macro_node
def my_wrong_macro(self, clothes: Clothes):
    self.washed_clothes = wash(clothes)
    self.money = sell(self.washed_clothes)
    return self.money

wrong_m = my_wrong_macro(Clothes())
wrong_m()

{'money': 10}

In [24]:
ontology.validate_values(knowledge.parse_workflow(wrong_m))

{'missing_triples': [(rdflib.term.URIRef('my_wrong_macro.money.inputs.clothes'),
   rdflib.term.URIRef('http://www.example.org/hasProperty'),
   rdflib.term.URIRef('http://www.example.org/color'))],
 'incompatible_connections': []}

In [25]:
@pwf.as_function_node
def dye_with_cancel(clothes: Clothes, color="blue") -> u(
    Clothes,
    triples=(EX.hasProperty, EX.color),
    derived_from="inputs.clothes",
    cancel=(EX.hasProperty, EX.cleaned)
):
    return clothes

In [26]:
my_wf_with_wrong_order = pwf.Workflow("my_workflow_with_wrong_order")
my_wf_with_wrong_order.washed_clothes = wash(Clothes())
my_wf_with_wrong_order.dyed_clothes = dye_with_cancel(my_wf_with_wrong_order.washed_clothes)
my_wf_with_wrong_order.money = sell(my_wf_with_wrong_order.dyed_clothes)
my_wf_with_wrong_order()

{'money__10': 10}

In [27]:
ontology.validate_values(knowledge.parse_workflow(my_wf_with_wrong_order))

{'missing_triples': [(rdflib.term.URIRef('my_workflow_with_wrong_order.money.inputs.clothes'),
   rdflib.term.URIRef('http://www.example.org/hasProperty'),
   rdflib.term.URIRef('http://www.example.org/cleaned'))],
 'incompatible_connections': []}

This example produces expected outcomes the whole way through.
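The validator outputs above are consistent with a simple mental model: each node passes along the properties of the input it is `derived_from`, drops whatever it cancels, and adds its own triples, and the consumer then checks its restrictions against the result. Here is a toy version of that model for linear chains only. This is my reading of the outputs, not the `semantikon` algorithm, and all names in it are hypothetical.

```python
def accumulate(steps):
    """Fold (added, cancelled) property sets along a linear chain of nodes."""
    properties = set()
    for added, cancelled in steps:
        # Cancel first, then add, so a node can replace a property it cancels
        properties = (properties - set(cancelled)) | set(added)
    return properties


def missing_properties(steps, required):
    """Properties the final consumer requires but the chain did not deliver."""
    return sorted(set(required) - accumulate(steps))


# (added, cancelled) pairs mirroring the nodes defined above
WASH = ({"cleaned"}, set())
DYE = ({"color"}, set())
DYE_WITH_CANCEL = ({"color"}, {"cleaned"})
SELL_REQUIRES = {"cleaned", "color"}

print(missing_properties([DYE, WASH], SELL_REQUIRES))              # dye, then wash
print(missing_properties([WASH], SELL_REQUIRES))                   # never dyed
print(missing_properties([WASH, DYE_WITH_CANCEL], SELL_REQUIRES))  # wrong order
```

The three calls correspond to the correct workflow, the wrong workflow, and the wrong-order workflow, and report nothing missing, `color` missing, and `cleaned` missing respectively.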

As a first stage then, we can internally create type-validated connections, search for a graph root, transform the entire workflow graph to a "parsed" graph, feed that to `semantikon`, and revert the connection iff we encounter a problem.
This is not the most computationally efficient approach, but it should be pretty robust and very fast to implement.
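The connect, validate, undo-on-failure flow could be sketched roughly as follows. Everything here is a stand-in: `Channel` is a bare-bones toy rather than the real `pyiron_workflow` channel class, `OntologicalConnectionError` is a made-up exception, and `validate` is assumed to be a zero-argument callable that parses the whole graph from its root and returns a report dictionary shaped like the validator output shown above.

```python
class OntologicalConnectionError(ValueError):
    """Hypothetical error for a connection that fails ontological validation."""


class Channel:
    """Toy channel that only tracks what it is connected to."""

    def __init__(self, label):
        self.label = label
        self.connections = []


def connect_with_validation(source, target, validate):
    """Tentatively connect, validate the whole graph, and undo on failure."""
    source.connections.append(target)
    target.connections.append(source)
    report = validate()
    if report["missing_triples"] or report["incompatible_connections"]:
        # Leave the graph exactly as we found it before complaining
        source.connections.remove(target)
        target.connections.remove(source)
        raise OntologicalConnectionError(
            f"Cannot connect {source.label} to {target.label}: {report}"
        )


ok_report = {"missing_triples": [], "incompatible_connections": []}
make, eat = Channel("make.rice"), Channel("eat.meal")
connect_with_validation(make, eat, lambda: ok_report)
print([c.label for c in make.connections])
```

The appeal of this design is that the validator never needs to reason about a hypothetical connection; it only ever sees a real graph, at the cost of re-parsing that graph on every attempt.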

# Node suggestions

Given a hinted channel instance in the context of some workflow, we can ask for suggestions of sibling (node, channel) pairs to connect to:

In [22]:
wf = pwf.Workflow("ontoflow")
wf.make = prepare_pizza()
wf.eat = eat_pizza()
suggest.suggest_connections(wf.eat.inputs.meal)

Looking at make
outputs OutputsWithInjection ['pizza']
What about make__pizza


[(<__main__.prepare_pizza at 0x11fe022a0>,
  <pyiron_workflow.mixin.injection.OutputDataWithInjection at 0x11fe02030>)]
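Conceptually, the search walks over the sibling nodes, looks at each of their outputs, and keeps the (node, channel) pairs that would form a valid connection. Here is a toy version using only plain Python type hints. The classes `Node` and `Output` and the function `toy_suggestions` are hypothetical stand-ins, not the `suggest` module's actual implementation, and treating unhinted outputs as acceptable is an assumption that mirrors the permissive connection rule for missing hints.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Output:
    label: str
    type_hint: Optional[type] = None


@dataclass
class Node:
    label: str
    outputs: list = field(default_factory=list)


def toy_suggestions(target_hint, siblings):
    """Return (node, output) pairs whose hint is acceptable for target_hint."""
    suggestions = []
    for node in siblings:
        for output in node.outputs:
            # Assumption: a missing hint on either side is allowed through
            unhinted = target_hint is None or output.type_hint is None
            if unhinted or issubclass(output.type_hint, target_hint):
                suggestions.append((node, output))
    return suggestions


class Meal: ...


class Garbage: ...


make = Node("make", [Output("pizza", Meal)])
trash = Node("trash", [Output("garbage", Garbage)])
found = toy_suggestions(Meal, [make, trash])
print([(node.label, output.label) for node, output in found])
```

A real version would also need to apply the ontological check to each candidate, not just the type check.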

# Known Issues

- This implementation naively creates a circular dependence: the `channels` module needs the `knowledge` module to evaluate the ontological validity of new connections, but the `knowledge` module relies on `workflow` and `nodes.composite` to parse graphs, and these in turn depend on `channels`. For now, we avoid dealing with this by importing `knowledge` locally in `channels` when it's time to use the ontology.
- There are strings everywhere. The ontological features rely heavily on dictionaries, which are tough to type check and rely on string-based key access. E.g., when we want to see if the ontological validation raised any errors, we need to manually check on two dictionary entries by name. This is fragile.
- It is heinously inefficient. At every new ontologically-hinted connection, we reconstruct the entire recipe dictionary before positing the new connection and checking its validity. I'm not sure we'll get around validation operating on the entire graph, but we should adjust `pyiron_workflow` to store more recipe information at the class level where it is statically known (macros, function nodes, etc.).
- This is not _at all_ edge-case tested. This works for the special cases we test here, and there's no generic guarantee this functionality works for more complex ontologies or more complex workflows.
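On the strings-everywhere point, one low-effort mitigation might be to convert the validation dictionary into a small typed object right at the boundary, so the rest of the code never touches the key names. A sketch under that assumption follows; `ValidationReport` is a hypothetical class that exists in neither `semantikon` nor `pyiron_workflow`, and the two key names are taken from the validator output shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationReport:
    """Typed view of the validation dictionary."""

    missing_triples: tuple = ()
    incompatible_connections: tuple = ()

    @classmethod
    def from_dict(cls, raw):
        # A misspelled or missing key fails loudly here with a KeyError,
        # instead of silently looking like "no problems" somewhere downstream
        return cls(
            missing_triples=tuple(raw["missing_triples"]),
            incompatible_connections=tuple(raw["incompatible_connections"]),
        )

    @property
    def is_valid(self):
        return not (self.missing_triples or self.incompatible_connections)


clean = ValidationReport.from_dict(
    {"missing_triples": [], "incompatible_connections": []}
)
print(clean.is_valid)
```

Callers then ask `report.is_valid` instead of checking two dictionary entries by name.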

# Open Questions

- What will happen when we more deeply nest workflows?
- Where and how will things break when we introduce other node types, i.e. for- and while-loops, dataclass nodes, or other transformers?