Content: file and shell nodes #195

Open
liamhuber opened this issue Feb 5, 2024 · 0 comments
liamhuber commented Feb 5, 2024

To accomplish the NFDI example discussed in the org here, we need to develop some nodes for handling shell command execution and reading/writing files. IMO we should provide a suite of baseline nodes for such operations right in the standard library, as this is a super common use case.

@JNmpi already has tools for this in his branch, including showing how different atomic (in the greek sense) file/shell actions can be easily combined together in macros to run an executable that uses input and output files. These need some integration with the codebase, e.g. to leverage nodes' working_directory more directly, but are already great and prove the platform works for this case.

@samwaseda I thought we could use this issue to float some specs for std lib nodes? Here's what I had in mind so far:

  • Shell
    • Input
      • command: str: Gets shell-executed by Python
      • return_output: bool = True: Whether to return the stdout/stderr of the call as a string
      • stdout_filename: Optional[str] = None
      • stderr_filename: Optional[str] = None
    • Action: cd to the node's working directory, execute the command, and pipe the output to different places as requested by the input (see the sketch after this list)
    • Output
      • output: str | None: Whatever the call spat out, if you asked for this
      • stdout_filepath: str | None: The full path to the output file, i.e. something like (self.working_directory.path / self.input.stdout_filename.value).resolve() if the stdout_filename was not None
      • stderr_filepath: str | None: Similar. Maybe these should be a pathlib.Path instead of string?
  • ShellWithEnv
    • Similar to above, but wrapping things in a conda_subprocess
    • This will take some dev work, as I'm not sure yet how to handle a regular executor trying to nest itself inside a conda_subprocess
  • WriteFile
    • Input
      • content: str
      • file_name: str
    • Action: Write the string to the named file inside the working directory
    • Output:
      • filepath: str: or maybe pathlib.Path, as above
  • ReadFile
    • Input:
      • filepath: str | pathlib.Path
    • Output:
      • content: str
  • Path
    • Input:
      • name: str | pathlib.Path
      • location: Optional[str | pathlib.Path]
    • Action: make sure there is a file where you think there is, with some friendly error messages/output/readiness info. Probably we can basically just wrap a pathlib.Path object
    • Output:
      • path: pathlib.Path | str: The resolved path (or something along those lines)

Maybe also nodes for moving files, walking directories, etc.?
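
To make the Shell/WriteFile/ReadFile specs above concrete, here is a minimal sketch of the underlying actions as plain functions. It assumes the working directory gets passed in explicitly (the real nodes would pull it from their own working_directory); all names and signatures here are illustrative, not an existing API.

import subprocess
from pathlib import Path
from typing import Optional


def run_shell(
    command: str,
    working_directory: str | Path,
    return_output: bool = True,
    stdout_filename: Optional[str] = None,
    stderr_filename: Optional[str] = None,
) -> tuple[Optional[str], Optional[str], Optional[str]]:
    """Sketch of the Shell action: run `command` in the working directory."""
    wd = Path(working_directory)
    proc = subprocess.run(command, shell=True, cwd=wd, capture_output=True, text=True)

    # Return the combined stdout/stderr as a string only if asked for
    output = proc.stdout + proc.stderr if return_output else None

    stdout_filepath = None
    if stdout_filename is not None:
        stdout_filepath = str((wd / stdout_filename).resolve())
        Path(stdout_filepath).write_text(proc.stdout)

    stderr_filepath = None
    if stderr_filename is not None:
        stderr_filepath = str((wd / stderr_filename).resolve())
        Path(stderr_filepath).write_text(proc.stderr)

    return output, stdout_filepath, stderr_filepath


def write_file(content: str, file_name: str, working_directory: str | Path) -> str:
    """Sketch of the WriteFile action: write `content` to `file_name` in the working directory."""
    filepath = (Path(working_directory) / file_name).resolve()
    filepath.write_text(content)
    return str(filepath)


def read_file(filepath: str | Path) -> str:
    """Sketch of the ReadFile action: return the file contents as a string."""
    return Path(filepath).read_text()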

Then you have something like

from pyiron_workflow import Workflow

@Workflow.wrap_as.single_value_node("content")
def MyInputParser(a, b):
    """Python to a my exe's input file format"""
    return """
INP_VAR4TRANRULZ_A: {a}
INP_VAR4TRANRULZ_BBBBB: {b}
"""

@Workflow.wrap_as.single_value_node("command")
def MyExeCommand(input_file: str, output_file: str = "my_exe_out.csv"):
    return f"my_exe -fin {input_file} -fout {output_file}"

@Workflow.wrap_as.single_value_node("data")
def MyOutputParser(file):
    from numpy import genfromtxt
    output = genfromtxt(file, comments="~", skip_footer=42)
    return output

@Workflow.wrap_as.macro("data")
def RunMyExe(macro, a, b, output_file_name="my_exe_out.csv"):
    macro.parse_input = MyInputParser(a, b)
    macro.write = Workflow.create.standard.WriteFile(
        content=macro.parse_input,
        file_name="funky_format.txt",
    )
    macro.command = MyExeCommand(
        input_file=macro.write,
        output_file=output_file_name
    )
    macro.shell = Workflow.create.standard.Shell(
        command=macro.command,
        return_output=False,
    )
    macro.output_path = Workflow.create.standard.Path(
        name=output_file_name,
        location=macro.shell.working_directory  # This feels super fragile, some thinking is needed
    )
    macro.output = MyOutputParser(macro.output_path)
    return macro.output

In the process of writing this up, I feel like some of the interactions with the working directory might need to be rethought; more experience will probably shed light. In the short term, one thing that would solve my immediate problem (# This feels super fragile...) is if Shell also took an input like produced_files: Optional[str | list[str]], so we could keep using the node's working directory but still provide fully-resolved paths to the produced files as part of the node output (sketched below). Joerg's nodes all take a working directory as input, so another idea would be to allow such an input and only fall back on the node's own working directory when it is not explicitly provided; that would let us, e.g., pass the macro's working directory as the working_directory input for all the children so everything happens in the same directory.
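
For concreteness, a minimal sketch of how that proposed produced_files input could be resolved at the end of the Shell action; the name and behaviour are just the proposal from the paragraph above, nothing that exists yet:

from pathlib import Path
from typing import Optional


def resolve_produced_files(
    working_directory: str | Path,
    produced_files: Optional[str | list[str]] = None,
) -> Optional[list[str]]:
    """
    The Shell node keeps running in its own working directory, but reports
    fully-resolved paths for any files the user declared the command would produce.
    """
    if produced_files is None:
        return None
    if isinstance(produced_files, str):
        produced_files = [produced_files]
    return [str((Path(working_directory) / f).resolve()) for f in produced_files]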

I'm also not sure whether these should be truly in the standard library, or if we should have some other "standard" library(ies), or even just decompose the standard library into something like standard.io, standard.shell, or what have you.

If we do this for a bit and start to notice regular patterns, we could also include some "meta node" functions for building macros like the one above, where you just provide the parser and command nodes and the macro-level IO, then it fills in all the standard stuff. Like

from pyiron_workflow import Workflow

...  # Defining the same custom nodes

MyMacro = Workflow.create.standard.meta.file_based_exe(
    input_parser=MyInputParser,
    command_generator=MyExeCommand,
    output_parser=MyOutputParser
)

m = MyMacro()
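
For concreteness, a rough sketch of what such a file_based_exe factory could look like; this is purely hypothetical and just mirrors the RunMyExe macro above (in practice the macro-level IO would need to be derived from the provided parser's signature rather than hard-coded as a and b):

from pyiron_workflow import Workflow


def file_based_exe(
    input_parser, command_generator, output_parser, input_file_name="exe_input.txt"
):
    """Hypothetical meta-node factory wiring parser -> WriteFile -> command -> Shell -> Path -> output parser."""

    @Workflow.wrap_as.macro("data")
    def FileBasedExe(macro, a, b, output_file_name="my_exe_out.csv"):
        # Same wiring as RunMyExe, but with the user-specific nodes injected
        macro.parse_input = input_parser(a, b)
        macro.write = Workflow.create.standard.WriteFile(
            content=macro.parse_input,
            file_name=input_file_name,
        )
        macro.command = command_generator(
            input_file=macro.write,
            output_file=output_file_name,
        )
        macro.shell = Workflow.create.standard.Shell(
            command=macro.command,
            return_output=False,
        )
        macro.output_path = Workflow.create.standard.Path(
            name=output_file_name,
            location=macro.shell.working_directory,  # Same fragility as noted above
        )
        macro.output = output_parser(macro.output_path)
        return macro.output

    return FileBasedExe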