
Next steps #68

Open · 4 of 8 tasks
JNmpi opened this issue Nov 5, 2023 · 4 comments
Labels: enhancement (New feature or request)

Comments

JNmpi commented Nov 5, 2023

At our workflow hackathons at the ADIS meeting we had some interesting discussions, including a wish list from the user perspective. I am also thinking a lot about the features we need in the pyiron_workflow module to realize a syntax and functionality close to that of pyiron_base and pyiron_atomistics, while keeping all the nice architectural features we get from the node-based approach. Below is a likely rather incomplete list of features, modules, etc. that in my opinion are crucial. It would be helpful if we could extend and prioritize this list, and if volunteers took on the specification and development of these features.

  • Use ironflow's graphical user interface and connect it to the workflow model. @samwaseda mentioned that he would be interested in looking into this development.
  • Provide functionality already present in ironflow, e.g. batch (ideally in connection with executors) and show, to access and process node data
  • Provide functionality for saving and loading workflows: wf.save('my_workflow') and wf.load('my_workflow') (see the sketch after this list)
  • Provide functions to (optionally) store node input and output data in HDF5. Provide filters, together with logging levels, to define which data objects should or should not be stored.
  • Same as above for storing/loading node and data objects into/from databases
  • Add ontological typing
  • Provide easy access to executors and efficient and automatic distribution of the node-generated tasks
  • Provide drawing tools to show the provenance of an executed workflow, i.e., similar to the graphs produced by AIIDA
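
To make the saving/loading and filtered HDF5 storage items above a bit more concrete, here is a minimal toy sketch of the intended behaviour. The ToyWorkflow class, its save/load methods, and the store_filter argument are illustrative assumptions, not the existing pyiron_workflow API.

```python
# Toy illustration only: a stand-in for the wished-for save/load and filtered
# HDF5 storage behaviour.  Class and method names are hypothetical.
import h5py
import numpy as np


class ToyWorkflow:
    def __init__(self, label):
        self.label = label
        # node label -> {"input": ..., "output": ...}; arrays only, for simplicity
        self.nodes = {}

    def save(self, path, store_filter=None):
        """Write node IO to HDF5; store_filter(label) -> bool decides what is kept."""
        with h5py.File(path, "w") as f:
            for label, io in self.nodes.items():
                if store_filter is not None and not store_filter(label):
                    continue  # skip data the user chose not to persist
                grp = f.create_group(label)
                grp["input"] = np.asarray(io["input"])
                grp["output"] = np.asarray(io["output"])

    @classmethod
    def load(cls, path, label="loaded"):
        wf = cls(label)
        with h5py.File(path, "r") as f:
            for node_label in f:
                wf.nodes[node_label] = {
                    "input": f[node_label]["input"][()],
                    "output": f[node_label]["output"][()],
                }
        return wf


wf = ToyWorkflow("my_workflow")
wf.nodes["lammps"] = {"input": [1.0, 2.0], "output": [3.0]}
wf.save("my_workflow.h5", store_filter=lambda label: label != "scratch")
restored = ToyWorkflow.load("my_workflow.h5")
```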

As an overarching guiding principle, we should test and benchmark the required functionality against pyiron_base/pyiron_atomistics and ironflow. The new pyiron_workflow class should provide an easy way to realize the same functionality. As an example, see the https://github.com/pyiron/pyiron_workflow/tree/JNmpi_lammps_nodes branch, where I tried to construct LAMMPS-based workflows that closely mimic our pyiron syntax (see e.g. the Jupyter notebook pyiron_like_workflows.ipynb in that branch).
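
For readers less familiar with the target, the classic pyiron_base/pyiron_atomistics syntax that such node-based workflows aim to mimic looks roughly like the following; project and job names are placeholders and the exact keyword arguments may differ between versions.

```python
# Rough reminder of the classic pyiron_atomistics syntax the node-based
# workflows aim to reproduce; project and job names are placeholders.
from pyiron_atomistics import Project

pr = Project("demo")
job = pr.create.job.Lammps("lmp_Al")                        # create a LAMMPS job
job.structure = pr.create.structure.bulk("Al", cubic=True)  # attach a structure
job.calc_md(temperature=300, n_ionic_steps=1000)            # configure an MD run
job.run()
print(job.output.energy_pot[-1])                            # access stored output
```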

JNmpi added the enhancement (New feature or request) label on Nov 5, 2023
liamhuber (Member) commented:

Use ironflow's graphical user interface and connect it to the workflow model. @samwaseda mentioned that he would be interested in looking into this development.

Super! I guess this will be a test of how well structured the ironflow code is 😅 I am sure we can adapt it to move the backend from ryven to pyiron_workflow while keeping (most of) the front end, but time will tell how easy that is. I'm happy to also be involved in this.

liamhuber (Member) commented:

Provide functionality already present in ironflow, e.g. batch (ideally in connection with executors) and show, to access and process node data

So this largely already exists in pyiron_workflow in the for-loop meta node! It even makes the looped/batched IO ALL_CAPS like ironflow does.

On the GUI side I am OK with having a "batch" button that replaces a node with a for-loop wrapper of that node, and we can absolutely think about some nice syntactic sugar to make the transformation easier in the code-based interface as well. However, ironflow actually embeds the batching inside the generic node, and I really can't recommend this anymore; it created unnecessary complication, had to be done one dimension at a time, and made the type checking hell. So under the hood I'd like to maintain the current paradigm of the for-loop meta node creating a bunch of node replicas.

In this context we will want to think about some syntactic shortcuts for the for-loop to apply an executor to each of its children instead of to itself, but that should be easy.
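
A plain-Python sketch of that paradigm, i.e. replicating the node function once per batched input and optionally pushing each replica through an executor. This is only a conceptual illustration, not the actual for-loop meta node implementation:

```python
# Conceptual sketch only: batching by replicating a node over its inputs,
# with the option of running each replica through an executor.
from concurrent.futures import ProcessPoolExecutor


def square(x):
    """Stand-in for an arbitrary node function."""
    return x * x


def for_loop(node_fn, batched_inputs, executor=None):
    """Apply node_fn to each element, i.e. one 'replica' per input value."""
    if executor is None:
        return [node_fn(x) for x in batched_inputs]
    # Each replica is submitted as its own task instead of batching inside
    # the node itself (the paradigm preferred in the comment above).
    futures = [executor.submit(node_fn, x) for x in batched_inputs]
    return [f.result() for f in futures]


if __name__ == "__main__":
    print(for_loop(square, [1, 2, 3]))                    # serial
    with ProcessPoolExecutor() as exe:
        print(for_loop(square, [1, 2, 3], executor=exe))  # one task per replica
```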

liamhuber (Member) commented:

Provide drawing tools to show the provenance of an executed workflow, i.e., similar to the graphs produced by AIIDA

I guess this is closely related to the node_identifier used in the Creator being something more sophisticated than a string for the module import. The current infrastructure was built with such an extension in mind, even though what's actually used is still very simple. We should pull @srmnitc in on this task.
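
As a rough illustration of the kind of drawing tool meant here, one could render the executed graph with the graphviz package, AiiDA-style. The node and edge records below are made up; in practice they would come from the stored node identifiers and connections.

```python
# Rough illustration only: rendering an executed workflow's provenance as a
# directed graph, similar in spirit to AiiDA's provenance graphs.
from graphviz import Digraph

executed_nodes = {
    "structure": "bulk Al structure",
    "lammps": "LAMMPS minimization",
    "energy": "potential energy",
}
data_flow = [("structure", "lammps"), ("lammps", "energy")]

dot = Digraph(comment="workflow provenance")
for name, description in executed_nodes.items():
    dot.node(name, f"{name}\n{description}")  # one box per executed node
for source, target in data_flow:
    dot.edge(source, target)                  # arrows follow the data flow

dot.render("provenance", format="svg", cleanup=True)  # writes provenance.svg
```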

liamhuber (Member) commented:

Provide easy access to executors and efficient and automatic distribution of the node-generated tasks

Accessing a standard Python single-core parallel process is currently as easy as node.executor = True. These can't be nested.

Adding syntactic sugar to distribute executors to children, e.g. wf.nodes.executor = SOMETHING setting it for all children (or what have you), is an OK idea and should be easy; a toy sketch of such sugar is at the end of this comment.

Looping in more sophisticated executors is critical -- pympipool is still pending this PR.

We should also pull in @pmrv with the idea of setting a tinybase.Submitter instead of an executor, so that multiple different nodes can pile onto the same remote resources.
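
A toy sketch of what the "set it for all children" sugar could look like under the hood, i.e. a property setter that broadcasts the executor to every child node. The class layout is hypothetical and not the pyiron_workflow API:

```python
# Toy sketch of "set it for all children" executor sugar; the class layout is
# hypothetical and not the pyiron_workflow API.
from concurrent.futures import ProcessPoolExecutor


class ToyNode:
    def __init__(self, label):
        self.label = label
        self.executor = None  # None -> run in the parent process


class ToyNodeCollection:
    """Holds a workflow's children and broadcasts attribute assignment."""

    def __init__(self, children):
        self._children = children

    @property
    def executor(self):
        return [child.executor for child in self._children]

    @executor.setter
    def executor(self, value):
        for child in self._children:
            child.executor = value  # one shared executor for every child


children = [ToyNode("lammps_1"), ToyNode("lammps_2")]
nodes = ToyNodeCollection(children)
nodes.executor = ProcessPoolExecutor(max_workers=2)  # broadcast to all children
```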
