Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with imported files "below" or "cousins" of the primary descriptor #243

Open
denis-yuen opened this issue Jul 12, 2023 · 3 comments
Open
Milestone

Comments

@denis-yuen
Copy link
Member

denis-yuen commented Jul 12, 2023

See discussion at dockstore/dockstore#5594
We believe this is a general issue for TRS.

i.e. workflows with a structure like

workflows
  |- wgs
    | - main.wdl
tasks
  |- do-stuff.wdl
structs
  | - human
    | - sample.wdl

Affects
"/tools/{id}/versions/{version_id}/{type}/descriptor/{relative_path}":

┆Issue is synchronized with this Jira Story
┆Project Name: Zzz-ARCHIVE GA4GH tool-registry-service
┆Issue Number: TRS-67

@denis-yuen denis-yuen added this to the v2.1 milestone Jul 12, 2023
@uniqueg
Copy link
Collaborator

uniqueg commented Jul 17, 2023

Yes, nasty problem. Apart from the fact that it's impossible to recreate the original directory tree of a workflow with multiple descriptors (or, more generally, files, as non-descriptor files are affected, too), allowing ".." may have security implications as well.

Might be hard to come up with a non-breaking fix here, though given that the relative path behavior seems to be only explained in the descriptions, we might be able to just get away with rephrasing these such that the anchoring for relative paths should be the nearest parent directory for all files (e.g., the repository root). It'd still break existing clients though...

But if we're discussing breaking changes, we could also re-design the whole file handling to address the following issues:

One possibility could be to do away with the /descriptor, /tests and /containerfile endpoints in favor of a redesigned /files endpoint that makes use of HATEOAS (see #160) and uses file IDs instead of relative paths to fetch info about (ToolFile) and/or contents of (FileWrapper) individual files. It might also make the API less complex.

For example, I could hit /files to get an object of file ID keys and ToolFile values, as well as a _links section that points me to /files/{file_id} for each file to access the corresponding FileWrapper (or plain text). The bahvior for zip=True on /files could remain the same (i.e., fetch an archive of all files). For convenience, we could consider adding filters for file types (PRIMARY_DESCRIPTOR, TEST_FILE, CONTAINERFILE etc.).

@denis-yuen
Copy link
Member Author

denis-yuen commented Jul 17, 2023

One possibility could be to do away with the /descriptor, /tests and /containerfile endpoints in favor of a redesigned /files endpoint that makes use of HATEOAS (see #160) and uses file IDs instead of relative paths to fetch info about (ToolFile) and/or contents of (FileWrapper) individual files. It might also make the API less complex.

for the non-breaking change discussion

Still thinking it through, but one option that we have is to alter the behaviour of the zipped endpoint
https://github.com/ga4gh/tool-registry-service-schemas/blob/v2.0.1/openapi/openapi.yaml#L322 so that as you say, the root of the zip is the "nearest parent directory for all files" rather than the the directory for main.wdl The reason why this is less(?) of a breaking change is because our current implementation is simply broken for this case (i.e. one could argue that we're not making the situation worse at least) The client would need to do some comparison between the zip format and the json format of the same endpoint to locate main.wdl

for the breaking change discussion

A few new endpoints does seem less disruptive to backwards compatibility, thoughts @patmagee ?

@uniqueg
Copy link
Collaborator

uniqueg commented Jul 18, 2023

I also think that the minor/non-breaking changes are acceptable, given that there probably aren't that many TRS services and clients around.

As for a re-design: Having a new /files endpoint (possibly without /{type} or similar endpoint with the described functionalities while possibly deprecating the ones that such an endpoint would make redundant would indeed be sensible, especially for such a drastic change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants