
Conversation

@jan-janssen (Member) commented May 23, 2025

Status of the Implementation for pyiron_workflow

| Example Name | Export to Python Workflow Definition | Import from Python Workflow Definition |
| --- | --- | --- |
| arithmetic | | |
| nfdi4ing | | |
| quantum_espresso | | |

@jan-janssen jan-janssen marked this pull request as draft May 23, 2025 08:29
@liamhuber

Hi @jan-janssen, I'm just migrating your e-mail comment here

The export from pyiron_workflow works fine, at least for the first of the three examples, but the import from the Python Workflow Definition to pyiron_workflow currently still has issues, so I was looking for existing code I could recycle.

I forked the repo. I didn't install all the dependencies for the other managers, but for the arithmetic example I can see that

  • ✅ building the pyiron_workflow workflow up from pieces works fine and saving it makes (up to some ordering) the same json file
  • ❌ loading the existing workflow.json file directly doesn't produce the correct pyiron_workflow.Workflow object

Later this morning I'll take a crack at debugging python_workflow_definition.pyiron_workflow.load_workflow_json to resolve the failing step.

@jan-janssen (Member, Author)

@liamhuber Thanks a lot, based on your pull request I was able to get the arithmetic example to work as well as the nfdi4ing example. The missing part is the quantum espresso example, here we use the following get_dict() function to merge the outputs of multiple individual jobs into one dictionary:

```python
def get_dict(**kwargs) -> dict:
    return {k: v for k, v in kwargs.items()}
```

This is not compatible with the type checking from pyiron_workflow so the tests currently fail.
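The incompatibility can be seen with plain Python introspection (a minimal sketch, independent of pyiron_workflow): a **kwargs-only signature exposes no named parameters, so an engine that derives input channels from the signature has nothing to work with before the function is actually called.

```python
import inspect

def get_dict(**kwargs) -> dict:
    return dict(kwargs)

# The signature contains only a VAR_KEYWORD parameter -- no named fields
# exist at class-definition ("recipe") time for a workflow engine to
# turn into typed input channels.
params = inspect.signature(get_dict).parameters
print([p.kind.name for p in params.values()])  # ['VAR_KEYWORD']
```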

@jan-janssen jan-janssen changed the title Implement Python Workflow Definition in pyiron_workflow [New WfMS] Implement Python Workflow Definition in pyiron_workflow May 24, 2025
@liamhuber

Super, glad it is working so well!

Unfortunately the function you show for the quantum espresso example will never work. In pyiron_workflow, we insist that each node's IO is known already at the class level (i.e. at recipe time).

You might be able to hack something together with the inputs-to-dict meta-node (which creates a subclass of a generic inputs-to-dictionary node with fixed input channels). Better, IMO, would be to write a more explicit function for the workflow.
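The meta-node idea can be illustrated in plain Python (a sketch of the concept only, not the pyiron_workflow API; the factory name is made up): a factory receives the field names up front, so the resulting callable's accepted inputs are fixed before anything runs.

```python
def make_inputs_to_dict(input_specification):
    """Return a callable whose accepted inputs are fixed by the specification."""
    def inputs_to_dict_node(**values):
        unexpected = set(values) - set(input_specification)
        if unexpected:
            raise TypeError(f"unexpected inputs: {sorted(unexpected)}")
        # Order the output by the specification, not by call order
        return {key: values[key] for key in input_specification}
    return inputs_to_dict_node

collect = make_inputs_to_dict(["volume", "energy"])
print(collect(volume=16.2, energy=-4.7))  # {'volume': 16.2, 'energy': -4.7}
```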

@jan-janssen (Member, Author)

The challenge is that the calculate_qe() function takes a dictionary as an input with one parameter being the structure of the previous step, so we use the above function to insert the structure into the dictionary.

@liamhuber Can you explain a bit more how the hacky solution would work?

@liamhuber

> The challenge is that the calculate_qe() function takes a dictionary as an input with one parameter being the structure of the previous step, so we use the above function to insert the structure into the dictionary.

The problem isn't the dict, it's the **kwargs. If you know a priori what fields are going in and explicitly write a function that takes each and puts them in a dictionary, everything should work with pyiron_workflow out-of-the-box.
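For the quantum espresso case, such an explicit function might look like the following (the function name and fields are hypothetical, chosen only to illustrate the pattern of inserting the structure into a parameter dictionary):

```python
# Hypothetical explicit replacement for get_dict(**kwargs): the fields are
# fixed in the signature, so the node's input channels are known up front.
def make_qe_input(input_dict: dict, structure) -> dict:
    merged = dict(input_dict)        # copy the static QE parameters
    merged["structure"] = structure  # insert the structure from the previous step
    return merged

print(make_qe_input({"kpts": (3, 3, 3)}, structure="relaxed_cell"))
```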

> @liamhuber Can you explain a bit more how the hacky solution would work?

Not really, I just wonder if there's some way to abuse inputs_to_dict to work in a bespoke way for this particular workflow.

In general though,

```python
def get_dict(**kwargs) -> dict:
    return {k: v for k, v in kwargs.items()}
```

seems like a bad node to me. It doesn't have clear handles for its IO at the recipe level -- you only actually know what the node's interface looks like at runtime. I haven't dug into how you're getting it to work for the other examples, so maybe you have some graceful way around this.

@jan-janssen (Member, Author)

@liamhuber I was able to implement a workaround for importing the Quantum ESPRESSO workflow from the Python Workflow Definition to pyiron_workflow. For the other direction, exporting from pyiron_workflow to the Python Workflow Definition, I am still struggling with the inputs_to_dict() function. For some reason the nodes are already executed when I call inputs_to_dict(), as you can see in the demonstration notebook:
https://github.com/pythonworkflow/python-workflow-definition/blob/pyiron_workflow/example_workflows/quantum_espresso/pyiron_workflow.ipynb

Furthermore, the inputs_to_dict() node is not correctly visualized when I draw the corresponding workflow. I created a small example to demonstrate this in a separate issue:
pyiron/pyiron_workflow#648

@liamhuber

@jan-janssen, following the pyiron_workflow issue, let me know if you are still having hiccups once you skip the "generator" step. Fingers crossed that will clear up the remaining issues.

@jan-janssen (Member, Author)

@liamhuber Thanks a lot, that definitely clarified the issue with the execution before calling run(). I updated the corresponding notebook and I am optimistic that it is now possible to export it to the Python Workflow Definition.

Still, one confusing issue remains. When I use the following code, everything works fine:

```python
volume_dict = {"s_" + str(i): j["volume"] for i, j in enumerate(job_strain_lst)}
wf.volume_dict = inputs_to_dict(
    input_specification=list(volume_dict.keys()),
    **volume_dict,
)
energy_dict = {"s_" + str(i): j["energy"] for i, j in enumerate(job_strain_lst)}
wf.energy_dict = inputs_to_dict(
    input_specification=list(energy_dict.keys()),
    **energy_dict,
)
wf.volume_lst = get_values_from_dict(
    input_dict=wf.volume_dict
)
wf.energy_lst = get_values_from_dict(
    input_dict=wf.energy_dict
)
```

But when I use the following modified code, it does not:

```python
volume_dict = {"s_" + str(i): j["volume"] for i, j in enumerate(job_strain_lst)}
wf.volume_lst = get_values_from_dict(
    input_dict=inputs_to_dict(
        input_specification=list(volume_dict.keys()),
        **volume_dict,
    )
)
energy_dict = {"s_" + str(i): j["energy"] for i, j in enumerate(job_strain_lst)}
wf.energy_lst = get_values_from_dict(
    input_dict=inputs_to_dict(
        input_specification=list(energy_dict.keys()),
        **energy_dict,
    )
)
```

I get the following error message:

```
KeyError: 'The channel /my_workflow/volume_lst.input_dict has a connection to the upstream channel /InputsToDictm3100931232959297192.dict, but the upstream owner InputsToDictm3100931232959297192 was not found among nodes. All nodes in the data flow dependency tree must be included.'
```

@liamhuber

@jan-janssen, super!

Yes, I'm open to suggestions for how to make the error message friendlier, but it contains all the necessary information. Namely, we're trying to connect InputsToDictm3100931232959297192.dict -> my_workflow/volume_lst.input_dict, but InputsToDictm3100931232959297192 isn't a sibling node, so the connection causes trouble. (I can't remember, or tell from the error message, whether you're getting this immediately at connection time (optimal) or later at runtime (better than never, but failing earlier would be preferable).)

It works in the first example because you're explicitly declaring the inputs_to_dict node with wf.*_dict =. IMO this is the clearest and therefore best way to do it. If you prefer the second example, where the node gets an automatically generated name, you can pass in the parentage at node instantiation. I think this will fix it for you:

```python
volume_dict = {"s_" + str(i): j["volume"] for i, j in enumerate(job_strain_lst)}
wf.volume_lst = get_values_from_dict(
    input_dict=inputs_to_dict(
        input_specification=list(volume_dict.keys()),
        parent=wf,  # Parent the implicit node before using it as input
        **volume_dict,
    )
)
energy_dict = {"s_" + str(i): j["energy"] for i, j in enumerate(job_strain_lst)}
wf.energy_lst = get_values_from_dict(
    input_dict=inputs_to_dict(
        input_specification=list(energy_dict.keys()),
        parent=wf,  # Parent the implicit node before using it as input
        **energy_dict,
    )
)
```

@jan-janssen (Member, Author)

@liamhuber Thanks again, I was able to fix the last remaining issues. Now the interface between pyiron_workflow and the Python Workflow Definition supports all three examples in both directions, import and export.
