Content: file and shell nodes #195

Open
liamhuber opened this issue Feb 5, 2024 · 0 comments
liamhuber commented Feb 5, 2024

To accomplish the NFDI example discussed in the org here, we need to develop some nodes for handling shell command execution and reading/writing files. IMO we should provide a suite of baseline nodes for such operations right in the standard library, as this is a super common use case.

@JNmpi already has tools for this in his branch, including showing how different atomic (in the greek sense) file/shell actions can be easily combined together in macros to run an executable that uses input and output files. These need some integration with the codebase, e.g. to leverage nodes' working_directory more directly, but are already great and prove the platform works for this case.

@samwaseda I thought we could use this issue to float some specs for std lib nodes? Here's what I had in mind so far:

  • Shell
    • Input
      • command: str: Gets shell-executed by Python
      • return_output: bool = True: Whether to return the stdout/stderr of the call as a string
      • stdout_filename: Optional[str] = None
      • stderr_filename: Optional[str] = None
    • Action: cd to the node's working directory, execute the command, and pipe the output to different places as requested by the input (see the sketch after this list)
    • Output
      • output: str | None: Whatever the call spat out, if you asked for this
      • stdout_filepath: str | None: The full path to the output file, i.e. something like (self.working_directory.path / self.input.stdout_filename.value).resolve() if the stdout_filename was not None
      • stderr_filepath: str | None: Similar. Maybe these should be a pathlib.Path instead of string?
  • ShellWithEnv
    • Similar to above, but wrapping things in a conda_subprocess
    • This will take some dev work, as I'm not sure yet how to handle a regular executor trying to nest itself inside a conda_subprocess
  • WriteFile
    • Input
      • content: str
      • file_name: str
    • Action: Write the string to the named file inside the working directory
    • Output:
      • filepath: str: or maybe pathlib.Path, as above
  • ReadFile
    • Input:
      • filepath: str | pathlib.Path
    • Output:
      • content: str
  • Path
    • Input:
      • name: str | pathlib.Path
      • location: Optional[str | pathlib.Path]
    • Action: make sure there is a file where you think there is, with some friendly error messages/output/readiness info. Probably we can basically just wrap a pathlib.Path object
    • Output:
      • path: pathlib.Path | str: The resolved path (or something along those lines)

Maybe also nodes for moving files, walking directories, etc.?
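
To make the Shell/WriteFile/ReadFile specs above concrete, here is a minimal sketch of the underlying actions as plain functions. It assumes the working directory gets passed in explicitly (the real nodes would pull it from their own working_directory); all names and signatures here are illustrative, not an existing API.

import subprocess
from pathlib import Path
from typing import Optional


def run_shell(
    command: str,
    working_directory: str | Path,
    return_output: bool = True,
    stdout_filename: Optional[str] = None,
    stderr_filename: Optional[str] = None,
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Sketch of the Shell action: run `command` in the working directory."""
    wd = Path(working_directory)
    proc = subprocess.run(command, shell=True, cwd=wd, capture_output=True, text=True)

    # Return the combined stdout/stderr as a string only if asked for
    output = proc.stdout + proc.stderr if return_output else None

    stdout_filepath = None
    if stdout_filename is not None:
        stdout_filepath = str((wd / stdout_filename).resolve())
        Path(stdout_filepath).write_text(proc.stdout)

    stderr_filepath = None
    if stderr_filename is not None:
        stderr_filepath = str((wd / stderr_filename).resolve())
        Path(stderr_filepath).write_text(proc.stderr)

    return output, stdout_filepath, stderr_filepath


def write_file(content: str, file_name: str, working_directory: str | Path) -> str:
    """Sketch of the WriteFile action: write `content` to `file_name` in the working directory."""
    filepath = (Path(working_directory) / file_name).resolve()
    filepath.write_text(content)
    return str(filepath)


def read_file(filepath: str | Path) -> str:
    """Sketch of the ReadFile action: return the file contents as a string."""
    return Path(filepath).read_text()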

Then you have something like

from pyiron_workflow import Workflow

@Workflow.wrap_as.single_value_node("content")
def MyInputParser(a, b):
    """Python to a my exe's input file format"""
    return """
INP_VAR4TRANRULZ_A: {a}
INP_VAR4TRANRULZ_BBBBB: {b}
"""

@Workflow.wrap_as.single_value_node("command")
def MyExeCommand(input_file: str, output_file: str = "my_exe_out.csv"):
    return f"my_exe -fin {input_file} -fout {output_file}"

@Workflow.wrap_as.single_value_node("data")
def MyOutputParser(file):
    from numpy import genfromtxt
    output = genfromtxt(file, comments="~", skip_footer=42)
    return output

@Workflow.wrap_as.macro("data")
def RunMyExe(macro, a, b, output_file_name="my_exe_out.csv"):
    macro.parse_input = MyInputParser(a, b)
    macro.write = Workflow.create.standard.WriteFile(
        content=macro.parse_input,
        file_name="funky_format.txt",
    )
    macro.command = MyExeCommand(
        input_file=macro.write,
        output_file=output_file_name
    )
    macro.shell = Workflow.create.standard.Shell(
        command=macro.command,
        return_output=False,
    )
    macro.output_path = Workflow.create.standard.Path(
        name=output_file_name,
        location=macro.shell.working_directory  # This feels super fragile, some thinking is needed
    )
    macro.output = MyOutputParser(macro.output_path)
    return macro.output

In the process of writing this up, I feel like some of the interactions with the working directory might need to be rethought; more experience will probably shed light. In the short term, one thing that would solve my immediate problem (# This feels super fragile...) is if Shell also took an input like produced_files: Optional[str | list[str]], so we could keep using the node's working directory but still provide fully-resolved paths to the produced files as part of the node output (sketched below). Joerg's nodes all take a working directory as input, so another idea would be to allow such an input and only fall back on the node's own working directory when it is not explicitly provided; that would let us, e.g., pass the macro's working directory as the working_directory input for all the children so everything happens in the same directory.
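
For concreteness, a minimal sketch of how that proposed produced_files input could be resolved at the end of the Shell action; the name and behaviour are just the proposal from the paragraph above, nothing that exists yet:

from pathlib import Path
from typing import Optional


def resolve_produced_files(
    working_directory: str | Path,
    produced_files: Optional[str | list[str]] = None,
) -> Optional[list[str]]:
    """
    The Shell node keeps running in its own working directory, but reports
    fully-resolved paths for any files the user declared the command would produce.
    """
    if produced_files is None:
        return None
    if isinstance(produced_files, str):
        produced_files = [produced_files]
    return [str((Path(working_directory) / f).resolve()) for f in produced_files]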

I'm also not sure whether these should be truly in the standard library, or if we should have some other "standard" library(ies), or even just decompose the standard library into something like standard.io, standard.shell, or what have you.

If we do this for a bit and start to notice regular patterns, we could also include some "meta node" functions for building macros like the one above, where you just provide the parser and command nodes and the macro-level IO, then it fills in all the standard stuff. Like

from pyiron_workflow import Workflow

...  # Defining the same custom nodes

MyMacro = Workflow.create.standard.meta.file_based_exe(
    input_parser=MyInputParser,
    command_generator=MyExeCommand,
    output_parser=MyOutputParser
)

m = MyMacro()
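
For concreteness, a rough sketch of what such a file_based_exe factory could look like; this is purely hypothetical and just mirrors the RunMyExe macro above (in practice the macro-level IO would need to be derived from the provided parser's signature rather than hard-coded as a and b):

from pyiron_workflow import Workflow


def file_based_exe(
    input_parser, command_generator, output_parser, input_file_name="exe_input.txt"
):
    """Hypothetical meta-node factory wiring parser -> WriteFile -> command -> Shell -> Path -> output parser."""

    @Workflow.wrap_as.macro("data")
    def FileBasedExe(macro, a, b, output_file_name="my_exe_out.csv"):
        # Same wiring as RunMyExe, but with the user-specific nodes injected
        macro.parse_input = input_parser(a, b)
        macro.write = Workflow.create.standard.WriteFile(
            content=macro.parse_input,
            file_name=input_file_name,
        )
        macro.command = command_generator(
            input_file=macro.write,
            output_file=output_file_name,
        )
        macro.shell = Workflow.create.standard.Shell(
            command=macro.command,
            return_output=False,
        )
        macro.output_path = Workflow.create.standard.Path(
            name=output_file_name,
            location=macro.shell.working_directory,  # Same fragility as noted above
        )
        macro.output = output_parser(macro.output_path)
        return macro.output

    return FileBasedExe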