To accomplish the NFDI example discussed in the org here, we need to develop some nodes for handling shell command execution and reading/writing files. IMO we should provide a suite of baseline nodes for such operations right in the standard library, as this is a super common use case.
@JNmpi already has tools for this in his branch, including showing how different atomic (in the Greek sense) file/shell actions can be easily combined together in macros to run an executable that uses input and output files. These need some integration with the codebase, e.g. to leverage nodes' working_directory more directly, but are already great and prove the platform works for this case.
@samwaseda I thought we could use this issue to float some spec for std lib nodes? Here's what I had in mind so far:
Shell
Input
command: str gets shell executed by python
return_output: bool = True: Whether to return the stdout/stderr of the call as a string
stdout_filename: Optional[str] = None
stderr_filename: Optional[str] = None
Action: cd to the node's working directory and execute the code, and pipe the output to different places as requested by the input
Output
output: str | None: Whatever the call spat out, if you asked for this
stdout_filepath: str | None: The full path to the output file, i.e. something like (self.working_directory.path / self.input.stdout_filename.value).resolve() if the stdout_filename was not None
stderr_filepath: str | None: Similar. Maybe these should be a pathlib.Path instead of string?
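To make the Shell spec above concrete, here is a minimal plain-Python sketch of the behavior, not the node implementation itself. The function name `run_shell` and the explicit `working_directory` argument are assumptions for illustration (the node would use its own working directory), and the filepaths are returned as `pathlib.Path`, which is one of the two options floated above.

```python
import subprocess
from pathlib import Path
from typing import Optional


def run_shell(
    command: str,
    working_directory: Path,
    return_output: bool = True,
    stdout_filename: Optional[str] = None,
    stderr_filename: Optional[str] = None,
) -> dict:
    """Run `command` inside `working_directory` and route its output as requested."""
    working_directory = Path(working_directory)
    working_directory.mkdir(parents=True, exist_ok=True)

    # cwd= runs the child process in the directory without changing our own cwd
    result = subprocess.run(
        command, shell=True, cwd=working_directory, capture_output=True, text=True
    )

    outputs = {"output": None, "stdout_filepath": None, "stderr_filepath": None}
    if return_output:
        # Assumption: stdout and stderr are simply concatenated into one string
        outputs["output"] = result.stdout + result.stderr
    if stdout_filename is not None:
        stdout_path = (working_directory / stdout_filename).resolve()
        stdout_path.write_text(result.stdout)
        outputs["stdout_filepath"] = stdout_path
    if stderr_filename is not None:
        stderr_path = (working_directory / stderr_filename).resolve()
        stderr_path.write_text(result.stderr)
        outputs["stderr_filepath"] = stderr_path
    return outputs
```

Passing `cwd=` instead of calling `os.chdir` keeps the parent process's directory untouched, which probably matters once several nodes run in the same interpreter.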
ShellWithEnv
This will take some dev work, as I'm not sure yet how to handle a regular executor trying to nest itself inside a conda_subprocess.
WriteFile
Input
content: str
file_name: str
Action: Write the string to the name inside the working directory
Output:
filepath: str: or maybe pathlib.Path, as above
ReadFile
Input:
filepath: str | pathlib.Path
Output:
content: str
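The WriteFile and ReadFile specs above amount to a small round trip, sketched here with plain functions. The names `write_file` and `read_file` and the explicit `working_directory` argument are hypothetical stand-ins for the nodes and the node's own working directory.

```python
from pathlib import Path
from typing import Union


def write_file(content: str, file_name: str, working_directory: Path) -> Path:
    """Write `content` to `file_name` inside `working_directory`; return the resolved path."""
    working_directory = Path(working_directory)
    working_directory.mkdir(parents=True, exist_ok=True)
    filepath = (working_directory / file_name).resolve()
    filepath.write_text(content)
    return filepath


def read_file(filepath: Union[str, Path]) -> str:
    """Return the full text content of the file at `filepath`."""
    return Path(filepath).read_text()
```

Returning the resolved path from the write step lets it feed straight into the read step, or into a command string, without the consumer needing to know the working directory.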
Path
Input:
name: str | pathlib.Path
location: Optional[str | pathlib.Path]
Action: Make sure there is a file where you think there is one, with some friendly error messages/output/readiness. Probably we can basically just wrap a pathlib.Path object
Output:
path: pathlib.Path or str resolved path or something
Maybe also for moving files, walking directories, etc?
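A rough sketch of what the Path node's check could look like when it just wraps `pathlib.Path`. The function name `resolve_existing_file`, the fallback to the current directory when `location` is not given, and the error wording are all assumptions.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_existing_file(
    name: Union[str, Path],
    location: Optional[Union[str, Path]] = None,
) -> Path:
    """Resolve `name` relative to `location` and fail loudly if no file is there."""
    # Assumption: fall back on the current directory when no location is given
    base = Path(location) if location is not None else Path.cwd()
    # If `name` is already absolute, pathlib ignores `base` when joining
    path = (base / name).resolve()
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected a file at {path}, but nothing was found there. "
            "Did the upstream step run, and is the location correct?"
        )
    return path
```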
Then you have something like
from pyiron_workflow import Workflow

@Workflow.wrap_as.single_value_node("content")
def MyInputParser(a, b):
    """Python to my exe's input file format"""
    # f-string so that a and b are actually substituted into the text
    return f"""INP_VAR4TRANRULZ_A: {a}
INP_VAR4TRANRULZ_BBBBB: {b}
"""

@Workflow.wrap_as.single_value_node("command")
def MyExeCommand(input_file: str, output_file: str = "my_exe_out.csv"):
    return f"my_exe -fin {input_file} -fout {output_file}"

@Workflow.wrap_as.single_value_node("data")
def MyOutputParser(file):
    from numpy import genfromtxt
    output = genfromtxt(file, comments="~", skip_footer=42)
    return output

@Workflow.wrap_as.macro("data")
def RunMyExe(macro, a, b, output_file_name="my_exe_out.csv"):
    macro.parse_input = MyInputParser(a, b)
    macro.write = Workflow.create.standard.WriteFile(
        content=macro.parse_input,
        file_name="funky_format.txt",
    )
    macro.command = MyExeCommand(
        input_file=macro.write,
        output_file=output_file_name,
    )
    macro.shell = Workflow.create.standard.Shell(
        command=macro.command,
        return_output=False,
    )
    macro.output_path = Workflow.create.standard.Path(
        name=output_file_name,
        location=macro.shell.working_directory,  # This feels super fragile, some thinking is needed
    )
    macro.output = MyOutputParser(macro.output_path)
    return macro.output
In the process of writing this up, I feel like some of the interactions with the working directory might need to be rethought. More experience will probably shed light. In the short term, one thing that would solve my immediate problem (# This feels super fragile...) is if Shell also took as input something like produced_files: Optional[str | list[str]] so we could keep using the node's working directory, but provide fully-resolved paths to the output files as part of the node output. Joerg's nodes all take a working directory as input, so one idea might be to allow such input and only fall back on the node working directory when this is not explicitly provided; that would let us do things like pass the macro's working directory as the working_directory input for all the children so everything happens in the same directory.
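The two ideas in this paragraph can be sketched in a few lines of plain Python. Both helper names (`choose_working_directory`, `resolve_produced_files`) are hypothetical; only `produced_files` and `working_directory` come from the proposal itself.

```python
from pathlib import Path
from typing import List, Optional, Union


def choose_working_directory(
    explicit: Optional[Union[str, Path]],
    node_default: Union[str, Path],
) -> Path:
    """Prefer an explicitly passed directory; otherwise use the node's own."""
    return Path(explicit if explicit is not None else node_default).resolve()


def resolve_produced_files(
    produced_files: Optional[Union[str, List[str]]],
    working_directory: Union[str, Path],
) -> List[Path]:
    """Turn the declared output file names into fully resolved paths."""
    if produced_files is None:
        return []
    if isinstance(produced_files, str):
        # Accept a single file name as shorthand for a one-element list
        produced_files = [produced_files]
    return [(Path(working_directory) / name).resolve() for name in produced_files]
```

With something like this, the macro would no longer need to reach into `macro.shell.working_directory`; it could wire the resolved path from the Shell output directly into the output parser.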
I'm also not sure whether these should be truly in the standard library, or if we should have some other "standard" library(ies), or even just decompose the standard into something like standard.io, standard.shell or what have you.
If we do this for a bit and start to notice regular patterns, we could also include some "meta node" functions for building macros like the one above, where you just provide the parser and command nodes and macro-level IO, then it fills in all the standard stuff. Like
from pyiron_workflow import Workflow

...  # Defining the same custom nodes

MyMacro = Workflow.create.standard.meta.file_based_exe(
    input_parser=MyInputParser,
    command_generator=MyExeCommand,
    output_parser=MyOutputParser
)
m = MyMacro()
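For a feel of what such a meta function would fill in, here is a sketch that uses plain callables in place of nodes. Only the name `file_based_exe` and its three arguments come from the proposal; the default file names, the `run` signature, and the use of `subprocess` directly are assumptions.

```python
import subprocess
from pathlib import Path
from typing import Callable


def file_based_exe(
    input_parser: Callable[..., str],
    command_generator: Callable[[str, str], str],
    output_parser: Callable[[Path], object],
    input_file_name: str = "input.txt",
    output_file_name: str = "output.txt",
) -> Callable[..., object]:
    """Build a runner that writes input, executes a command, and parses the output."""

    def run(working_directory, **inputs):
        working_directory = Path(working_directory)
        working_directory.mkdir(parents=True, exist_ok=True)
        # 1. Write the parsed input into the working directory
        (working_directory / input_file_name).write_text(input_parser(**inputs))
        # 2. Build and run the command there; check=True raises on a non-zero exit
        command = command_generator(input_file_name, output_file_name)
        subprocess.run(command, shell=True, cwd=working_directory, check=True)
        # 3. Hand the resolved output path to the output parser
        return output_parser((working_directory / output_file_name).resolve())

    return run
```

The user supplies only the three pieces specific to their executable, and the write/run/locate plumbing is generated, which is the pattern the macro above spells out by hand.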