Update: TERRA Workflow Documentation

Document 3 WDL workflows to be run on Terra:

1. ncov:master - run the basic ncov workflow.
2. ncov:wdl/genbank_ingest - pull a public dataset and send it through our preprocessing scripts.
3. ncov:wdl/gisaid_ingest - pull a private dataset if a user has their own API key, account, and password.

These workflows mostly exist to make our preprocessing scripts available. The workflows are separated so that only the parameters specific to a particular use case are shown in Terra.
j23414 committed on Sep 12, 2022 (commit c81a837, parent de1b4f7): 5 changed files with 117 additions and 3 deletions.
**************************
Run Data Ingest on Terra
**************************

We provide two pipelines for importing data into Terra:

* GenBank Ingest - pulls a public dataset and sends it through our preprocessing scripts.
* GISAID Ingest - pulls a private dataset, if you have your own GISAID API key, account, and password.

These pipelines mainly exist to provide access to our data preprocessing scripts. They currently focus on pulling datasets for the ncov workflow, roughly as diagrammed below:

.. image:: ../images/terra-ingest.png

Import an ``ingest`` WDL workflow from Dockstore
================================================

1. `Set up a Terra account <https://terra.bio/>`_.
#. Navigate to Dockstore: `ncov:wdl/genbank_ingest`_ or `ncov:wdl/gisaid_ingest`_, depending on whether you want to pull open (GenBank) data or private GISAID data (which requires your own GISAID API key).
#. In the top right corner, under **Launch with**, click **Terra**.
#. Under "Workflow Name", set a name such as ``genbank_ingest`` or ``gisaid_ingest``, and select your "Destination Workspace" from the drop-down menu.
#. Click the **IMPORT** button.
#. In your workspace, click the **WORKFLOWS** tab and verify that the imported workflow appears as a card.

.. _`ncov:wdl/genbank_ingest`: https://dockstore.org/workflows/github.com/nextstrain/ncov:wdl/genbank_ingest?tab=info
.. _`ncov:wdl/gisaid_ingest`: https://dockstore.org/workflows/github.com/nextstrain/ncov:wdl/gisaid_ingest?tab=info

Create Terra Variables for GISAID API
=====================================

If you are pulling GISAID data, you must have your own API key. If you are pulling open GenBank data, no credentials are needed: click on your imported ``genbank_ingest`` workflow and skip to step 6 of the next section.

1. Navigate to your workspace on Terra.
#. On the **Data** tab, click **Workspace Data** in the left menu.
#. Create the following workspace variables and fill in their values:

+------------------------------+-------------------+--------------------------------------------------+
| Key                          | Value             | Description                                      |
+==============================+===================+==================================================+
| GISAID_API_ENDPOINT          | API endpoint URL  | Provided by GISAID for your account              |
+------------------------------+-------------------+--------------------------------------------------+
| GISAID_USERNAME_AND_PASSWORD | username:password | Your GISAID username and password for API access |
+------------------------------+-------------------+--------------------------------------------------+

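
Because ``GISAID_USERNAME_AND_PASSWORD`` packs two values into one string, it can help to check its shape before saving it. The snippet below is a minimal sketch with a hypothetical helper, ``split_gisaid_credentials``, that is not part of the ingest workflow; it assumes the value is split on the first colon, which may differ from how the preprocessing scripts actually parse it.

```python
def split_gisaid_credentials(value: str) -> tuple[str, str]:
    """Split a hypothetical ``username:password`` string into its parts.

    Splits on the first colon only, so a password that itself contains
    ':' is kept intact. This parsing rule is an assumption; the ingest
    scripts may handle the value differently.
    """
    username, sep, password = value.partition(":")
    if not sep or not username or not password:
        raise ValueError("expected a value of the form username:password")
    return username, password


if __name__ == "__main__":
    # Placeholder credentials; never paste real ones into shared code.
    user, _ = split_gisaid_credentials("alice:s3cr:et")
    print(f"username looks OK: {user}")
```

A value with no colon, or with an empty username or password, raises ``ValueError`` instead of being silently accepted.
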
Connect your workspace variables to the WDL ingest workflow
===========================================================

1. Navigate back to the **WORKFLOWS** tab and click on your imported ``gisaid_ingest`` workflow.
#. Select the radio button "Run workflow(s) with inputs defined by data table".
#. Under **Step 1**, select your root entity type **ncov_examples** from the drop-down menu.
#. Select ONLY the first entry in the data table; the ingest only needs to run once.
#. Most of the inputs can be left blank, but fill in the values below:

+-------------------+------------------------------+--------+----------------------------------------+
| Task name         | Variable                     | Type   | Attribute                              |
+===================+==============================+========+========================================+
| Nextstrain_WRKFLW | GISAID_API_ENDPOINT          | String | workspace.GISAID_API_ENDPOINT          |
+-------------------+------------------------------+--------+----------------------------------------+
| Nextstrain_WRKFLW | GISAID_USERNAME_AND_PASSWORD | String | workspace.GISAID_USERNAME_AND_PASSWORD |
+-------------------+------------------------------+--------+----------------------------------------+

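
Each attribute of the form ``workspace.NAME`` refers to a variable created on the **Workspace Data** tab, so a typo in either place leaves the input unresolved. The snippet below is a toy model of that lookup, not Terra's implementation; ``resolve_attribute``, ``resolve_inputs``, and the placeholder values are hypothetical and only handle the ``workspace.`` prefix.

```python
WORKSPACE_PREFIX = "workspace."

# Hypothetical contents of the Workspace Data table (placeholder values).
workspace_data = {
    "GISAID_API_ENDPOINT": "https://example.invalid/gisaid-api",
    "GISAID_USERNAME_AND_PASSWORD": "alice:secret",
}

# The inputs table above: (task name, variable) -> attribute expression.
workflow_inputs = {
    ("Nextstrain_WRKFLW", "GISAID_API_ENDPOINT"): "workspace.GISAID_API_ENDPOINT",
    ("Nextstrain_WRKFLW", "GISAID_USERNAME_AND_PASSWORD"): "workspace.GISAID_USERNAME_AND_PASSWORD",
}


def resolve_attribute(expression: str, workspace: dict[str, str]) -> str:
    """Look up a ``workspace.NAME`` expression in the workspace table."""
    if not expression.startswith(WORKSPACE_PREFIX):
        raise ValueError(f"not a workspace attribute: {expression!r}")
    name = expression[len(WORKSPACE_PREFIX):]
    if name not in workspace:
        raise KeyError(f"workspace variable {name!r} has not been created")
    return workspace[name]


def resolve_inputs(inputs: dict, workspace: dict[str, str]) -> dict:
    """Resolve every input, failing fast on a typo or a missing variable."""
    return {key: resolve_attribute(expr, workspace) for key, expr in inputs.items()}


if __name__ == "__main__":
    # Report which inputs resolve without printing the secret values.
    for (task, variable) in resolve_inputs(workflow_inputs, workspace_data):
        print(f"{task}.{variable} is set")
```
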
6. Click on the **OUTPUTS** tab.
#. Connect the generated outputs back to the workspace data by filling in the values below:

+-------------------+-----------------+------+----------------------------------+
| Task name         | Variable        | Type | Attribute                        |
+===================+=================+======+==================================+
| Nextstrain_WRKFLW | sequences_fasta | File | workspace.gisaid_sequences_fasta |
+-------------------+-----------------+------+----------------------------------+
| Nextstrain_WRKFLW | metadata_tsv    | File | workspace.gisaid_metadata_tsv    |
+-------------------+-----------------+------+----------------------------------+
| Nextstrain_WRKFLW | nextclade_tsv   | File | workspace.gisaid_nextclade_tsv   |
+-------------------+-----------------+------+----------------------------------+

If you are pulling GenBank data, use attribute names such as ``workspace.genbank_sequences_fasta`` instead.

8. Click **Save**, then click **Run Analysis**.
#. Under the **JOB HISTORY** tab, verify that your job is running.
#. When the run is complete, check the **DATA** / **Workspace Data** tab, and use ``workspace.gisaid_sequences_fasta`` and ``workspace.gisaid_metadata_tsv`` as inputs to your normal ncov Terra runs.

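
The output attributes follow a ``workspace.<source>_<variable>`` naming pattern, which keeps GISAID and GenBank results side by side in the same workspace. The snippet below is a minimal sketch that generates the OUTPUTS mapping for either source; ``output_attributes`` is a hypothetical helper, and it assumes the GenBank workflow exposes the same three outputs as the GISAID one.

```python
# Output variables from the OUTPUTS table (assumed identical for both sources).
OUTPUT_VARIABLES = ("sequences_fasta", "metadata_tsv", "nextclade_tsv")


def output_attributes(source: str, task: str = "Nextstrain_WRKFLW") -> dict[tuple[str, str], str]:
    """Map each (task, variable) output to a source-prefixed workspace attribute."""
    if source not in ("gisaid", "genbank"):
        raise ValueError(f"unknown source: {source!r}")
    return {(task, variable): f"workspace.{source}_{variable}" for variable in OUTPUT_VARIABLES}


if __name__ == "__main__":
    # Print the rows to enter on the OUTPUTS tab for a GenBank ingest.
    for (task, variable), attribute in output_attributes("genbank").items():
        print(f"{task:<18} {variable:<16} {attribute}")
```

Prefixing by source means a later GenBank ingest does not overwrite the GISAID files produced by an earlier run.
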