Proposal: make TaskParameter type field optional #76
Currently the TaskParameter type is a required field: the caller needs to specify whether the path is a file or a directory.
However, I have the impression this information adds almost no value, because the server implementation needs to query the remote resource in any case to apply the required operation (upload/download).
On the other hand, making this information mandatory severely limits the ability of tools that dynamically discover the type of output paths, such as Nextflow, to implement this specification.
My proposal is to make this field optional. When it is specified, the semantics remain identical: the server validates the inputs/outputs by verifying that the type matches. When the type is not specified, the server simply applies the operation to the path regardless of whether it is a file or a directory.
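To make the proposed semantics concrete, here is a minimal Python sketch. It assumes a local filesystem stands in for the remote storage and that the type is passed as the string "FILE", "DIRECTORY", or None; the names `detect_type` and `resolve_type` are hypothetical and not part of the TES specification or any server implementation.

```python
import os
from typing import Optional

# Hypothetical type labels; a real server would use the spec's own enum.
FILE = "FILE"
DIRECTORY = "DIRECTORY"


def detect_type(path: str) -> str:
    """Inspect the resource to find out what it actually is."""
    if os.path.isdir(path):
        return DIRECTORY
    if os.path.isfile(path):
        return FILE
    raise FileNotFoundError(path)


def resolve_type(path: str, declared: Optional[str] = None) -> str:
    """Return the type to use when transferring `path`.

    Declared type: it must match what is actually there (current semantics).
    No declared type: the server just uses whatever it finds.
    """
    actual = detect_type(path)
    if declared is not None and declared != actual:
        raise ValueError(f"{path}: declared {declared}, found {actual}")
    return actual
```

Note that `detect_type` runs in both branches: as argued above, the server has to inspect the resource anyway, so a declared type only adds a consistency check on top of that.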
As I understand it, one argument for keeping this field is that it adds a level of type checking to the system: if you define the input/output as a Directory and the system encounters a File, the system should let the user know something unexpected occurred by returning an error.
A counterargument is that this type-checking responsibility should be handled by workflow engines or other TES clients. In the case of Nextflow, this allows it to forgo type checking, which fits a core feature of its system (dynamically discovering output types). Engines that want simple file/directory type checking would implement it by communicating with the storage layer directly.
Hope I captured the arguments accurately.
From a technical standpoint, we (the Funnel dev team) think it's possible to implement a system with a pre-defined input/output file type in Funnel, but we'd like the chance to get it fully implemented in order to discover any edge cases.
There are three options here: 1) keep it mandatory, 2) make it optional, 3) remove it.
I've explained in the original comment why having it as a mandatory field is a problem. I'm fine with both 2 and 3, though I tend to think the best solution would be 2 (optional), because it would still allow type checking to be enforced when needed or requested.
It could also be useful in relation to #77 (e.g., outputting all files/directories matching a glob pattern).
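A rough illustration of how an optional type could combine with glob-based outputs. The behaviour proposed in #77 is not described here, so the filtering rule below (an omitted type keeps every match, a declared type keeps only matches of that kind) is an assumption, and `collect_outputs` is a hypothetical helper operating on a local filesystem.

```python
import glob
import os
from typing import List, Optional


def collect_outputs(pattern: str, declared: Optional[str] = None) -> List[str]:
    """Expand a glob pattern into output paths.

    `declared` is "FILE", "DIRECTORY", or None (hypothetical encoding).
    With None, every match is kept regardless of kind; otherwise only
    matches of the declared kind are kept.
    """
    matches = sorted(glob.glob(pattern))
    if declared is None:
        return matches
    if declared == "FILE":
        return [p for p in matches if os.path.isfile(p)]
    if declared == "DIRECTORY":
        return [p for p in matches if os.path.isdir(p)]
    raise ValueError(f"unknown type: {declared}")
```

With the type omitted, a single pattern can pick up a mix of files and directories, which is exactly the case a mandatory type cannot express.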
I would want to keep it around, but making it optional is fine. I'm curious, @pditommaso: how does Nextflow handle HTTP inputs when the type is not defined? A lot of HTTP endpoints don't allow directory listing, so for us the type is a quick way to reject such inputs with an error.
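The fail-fast check described here might look like the following sketch. It assumes inputs are given as URLs and types as the strings "FILE", "DIRECTORY", or None; `check_input` is a hypothetical name, and the sketch deliberately leaves untyped HTTP inputs alone, since how to handle them is the open question.

```python
from typing import Optional
from urllib.parse import urlparse

# Schemes where directory listing is often unavailable.
NO_LISTING_SCHEMES = {"http", "https"}


def check_input(url: str, declared: Optional[str]) -> None:
    """Reject inputs that cannot be staged, before the task is scheduled.

    A DIRECTORY input over plain HTTP(S) usually cannot be enumerated,
    so it is refused up front instead of failing during staging.
    """
    scheme = urlparse(url).scheme.lower()
    if declared == "DIRECTORY" and scheme in NO_LISTING_SCHEMES:
        raise ValueError(f"{url}: DIRECTORY inputs are not supported over {scheme}")
```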
I understand that for FTP, S3, and other endpoints this field is moot, so I'd agree with making it optional. Keeping it around would also prevent directories from being clobbered by files when the output type is unclear or when something goes wrong.
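A minimal sketch of the clobbering guard, using a local path as a stand-in for the output destination and a hypothetical `safe_destination` helper. The guard can only fire when a type was declared, which is the trade-off under discussion: an untyped output gives the server nothing to check against.

```python
import os
from typing import Optional


def safe_destination(dest: str, declared: Optional[str]) -> str:
    """Refuse to overwrite a directory with a file, or a file with a directory.

    Only enforced when the output type was declared ("FILE"/"DIRECTORY");
    with None the destination is returned unchecked.
    """
    if declared == "FILE" and os.path.isdir(dest):
        raise IsADirectoryError(f"refusing to replace directory {dest} with a file")
    if declared == "DIRECTORY" and os.path.isfile(dest):
        raise NotADirectoryError(f"refusing to replace file {dest} with a directory")
    return dest
```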