Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: initial support for ftp locations #25

Merged
merged 51 commits into from
Oct 26, 2018
Merged

WIP: initial support for ftp locations #25

merged 51 commits into from
Oct 26, 2018

Conversation

mr-c
Copy link
Contributor

@mr-c mr-c commented Aug 17, 2018

Current status: 121 tests passed, 4 failures, 8 unsupported features

  • separate storage of inputs, intermediate products, and final outputs
  • test for correct handling of file naming conflicts
  • general cleanup

Copy link
Contributor

@adamstruck adamstruck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks for adding this! This gives me a better idea of how we could support S3

@fgypas
Copy link

fgypas commented Aug 21, 2018

Will this solve the issue #26 ? Can you add an ftp example?

@mr-c
Copy link
Contributor Author

mr-c commented Aug 22, 2018

@fgypas That's the goal, but I'm having issues testing

@mr-c
Copy link
Contributor Author

mr-c commented Aug 23, 2018

[job bwa-mem-tool.cwl] Removing temporary directory /tmp/tmp0gyepo6l
Removing intermediate output directory /tmp/tmpyc7cwp6s
{
    "args": [
        "bwa",
        "mem",
        "-t",
        "2",
        "-I",
        "1,2,3,4",
        "-m",
        "3",
        "chr20.fa",
        "example_human_Illumina.pe_1.fastq",
        "example_human_Illumina.pe_2.fastq"
    ]
}
Final process status is success

@fgypas
Copy link

fgypas commented Sep 5, 2018

Hi. I would like to ask what is the status of this pull request. We would really need to have this working at the end of the month. Please let me know if there is anything I could do to help.

@aniewielska
Copy link

Hi. I second the request by @fgypas. What is the status of remote storage support? How to run it? Could you provide an example with FTP? I tried hashsplitter, passing --remote-storage-url, but without success.

@mr-c
Copy link
Contributor Author

mr-c commented Sep 5, 2018

Hello @fgypas and @aniewielska
Here are the latest results with the conformance tests being run against the EBI TASK endpoint:

32 tests passed, 87 failures, 13 unsupported features

The complete log: log-20180825.txt

To run the CWL conformance tests, check out the CWL specification https://github.com/common-workflow-language/common-workflow-language/

pip install cwltest html5lib
./run_test.sh RUNNER=cwl-tes EXTRA="--remote-storage-url ftp://ftp-private.ebi.ac.uk/upload/cwl_conf002 --tes https://tes-dev.tsi.ebi.ac.uk/"

You'll need the username and password set in ${HOME}/.netrc see https://www.gnu.org/software/inetutils/manual/html_node/The-_002enetrc-file.html
Mine looks like

machine ftp-private.ebi.ac.uk
login redacted_username
password redacted_password

@aniewielska
Copy link

Thanks. I figured out .netrc part. Should Hashsplitter work, though, with remote FTP storage? If not, is there any other simple at least 2-step workflow (that will use intermediate remote storage, if configured, or local otherwise), that I can use to check, that I am using CWL-TES correctly?

@mr-c
Copy link
Contributor Author

mr-c commented Sep 6, 2018

@aniewielska Hashsplitter and friends should work, but it doesn't due to a limitation in cwltool in how file sizes are calculated; which I'm working on now.

Here's how to run it

cwl-tes --remote-storage-url ftp://ftp-private.ebi.ac.uk/upload/some_unique_name --tes https://tes-dev.tsi.ebi.ac.uk/ tests/hashsplitter-md5.cwl.yml --input tests/resources/test.txt 

@mr-c
Copy link
Contributor Author

mr-c commented Sep 6, 2018

@aniewielska The hashsplitter tools and workflows work for me now:

cwl-tes  --remote-storage-url ftp://ftp-private.ebi.ac.uk/upload/unique_directory_002 \
    --tes https://tes-dev.tsi.ebi.ac.uk/ tests/hashsplitter-workflow.cwl.yml \
    --input tests/resources/test.txt 

@mr-c
Copy link
Contributor Author

mr-c commented Sep 18, 2018

@adamstruck The remaining errors are related to TES's poor support for empty directories. Shall I merge as in?

@aniewielska
Copy link

I ran the test suite yesterday (without the most recent commits) and I got:

110 tests passed, 11 failures, 12 unsupported features 1 tool tests failed

From the TESK perspective there were 7 tasks failing on inputs. In 5 cases TES received inputs with URLs referencing local file system. In 2 cases passed URL was malformed (file/ftp mix):

{"url": "file://ftp%3A//ftp-private.ebi.ac.uk/upload/anie/test-1/7f123688-c414-4a46-b9ea-087570c60112/ref.fasta", "path": "/SesUIM/ref.fasta", "type": "FILE", "name": "ref.fasta", "description": "InitialWorkDirRequirement:cwl_input:ref.fasta"}

@adamstruck
Copy link
Contributor

@mr-c the problem with empty directories IMO is that there isn't a way to represent it in an object store (without using placeholder files).

From the TES perspective, the spec says nothing with regards to how to handle an input/output directory that is empty. So its possible Funnel and TESK have different behaviors around these cases.

@adamstruck
Copy link
Contributor

@mr-c prior to merging this I think it would be good to update the Travis build so that it runs the conformance suite against local files and FTP.

We have some test code in Funnel (https://github.com/ohsu-comp-bio/funnel/tree/master/tests/ftp-test-server) for starting up a FTP server for testing that could be reused.

@geoffjentry
Copy link

What we were told when working on CWL in GCP (and thus object storage) was to add a dotfile to the directory so that it shows up in object storage as a "directory".

IMO that's a terrible hack and frankly a bug in CWL (and soon to be replicated in WDL as well) that it doesn't neatly handle the impedance mismatch between POSIX & object storage, but it's not like I have any better idea on how to solve it so c'est la vie. Dotfiles it is :)

With my GA4GH hat on I personally would oppose things which are not object storage friendly (it is the Cloud Workstream after all).

@mr-c
Copy link
Contributor Author

mr-c commented Sep 20, 2018

For my reference, the failing test numbers are:

  • 55 (deeper problem with schema_salad with non-file or http URLs)
  • 60
  • 84
  • 85
  • 86 Test directory output
  • 87
  • 88
  • 91
  • 93
  • 101 ExpressionTool destaging
  • 121 globbing with "*"
  • 122 Test InitialWorkDirRequirement with a nested directory structure from another step
  • 130 Test dynamic resource reqs referencing the size of Files inside a Directory

@mr-c mr-c merged commit 4d7e2e5 into master Oct 26, 2018
@mr-c mr-c deleted the ftp branch October 26, 2018 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants