feat: new gif and other edits

stjudecloud · Mar 30, 2021 · 6190da2
1 parent 21775e6
commit 6190da2

Showing 4 changed files with 37 additions and 46 deletions.
diff --git a/docs/genomics-platform/workflow-guides/warden/index.md b/docs/genomics-platform/workflow-guides/warden/index.md
@@ -42,14 +42,14 @@ LIMMA analysis.
 Depending on which entry point is chosen, inputs may be an array of FastQ files, RNA-Seq BAM files, or feature count files.
 
 Each entrypoint has its own input file type, but they all require a similarly formatted "sample sheet" which describes the relationships between samples.
-Each WARDEN workflow requires two types of input files and that two or 3 parameters be set manually. All other parameters are preset with reasonable defaults.
+Each WARDEN workflow requires an array of input files and a sample sheet, and has two to three parameters that must be set explicitly. All other parameters are preset with reasonable defaults.
 
-| Name                                | Type        | Description                                         | Example                                            |
-| ----------------------------------- | ----------- | --------------------------------------------------- | -------------------------------------------------- |
-| FastQ files (for WARDEN \[FastQ\])  | Input files | FastQ files generated by RNA-Seq experiment         | Sample1.fastq.gz, Sample2.fastq.gz                 |
-| BAM files (for WARDEN \[BAM\])      | Input files | BAM files generated by RNA-Seq experiment           | Sample1.bam, Sample2.bam                           |
-| Count files (for WARDEN \[Counts\]) | Input files | Feature count files generated by RNA-Seq experiment | Sample1.htseq_counts.txt, Sample2.htseq_counts.txt |
-| Sample sheet (**required**)         | Input file  | Sample sheet generated and uploaded by the user     | \*.txt or \*.xlsx                                  |
+| Name                                | Description                                         | Example                                            |
+| ----------------------------------- | --------------------------------------------------- | -------------------------------------------------- |
+| FastQ files (for WARDEN \[FastQ\])  | FastQ files generated by RNA-Seq experiment         | Sample1.fastq.gz, Sample2.fastq.gz                 |
+| BAM files (for WARDEN \[BAM\])      | BAM files generated by RNA-Seq experiment           | Sample1.bam, Sample2.bam                           |
+| Count files (for WARDEN \[Counts\]) | Feature count files generated by RNA-Seq experiment | Sample1.htseq_counts.txt, Sample2.htseq_counts.txt |
+| Sample sheet (**required**)         | Sample sheet generated and uploaded by the user     | \*.txt or \*.xlsx                                  |
 
 ### Sample sheet configuration
 
@@ -80,11 +80,11 @@ Each row in the spreadsheet (except for the last row, which we will talk about i
 
 * The sample name should be unique and should only contain letters, numbers, and underscores. They should start with a letter. WARDEN will attempt to correct malformed names.
 * The condition/phenotype column associates similar samples together. The values should contain only letters, numbers, and underscores. They should start with a letter. WARDEN will attempt to correct malformed names.
-* If using WARDEN [FastQ]:
+* If using WARDEN \[FastQ\]:
   * The third column should contain forward reads (e.g. `*.R1.fastq.gz` or `*_1.fastq.gz`).
   * The fourth column will contain reads in reverse orientation to the FastQ in column three (e.g. `*.R2.fastq.gz` or `*_2.fastq.gz`).
   * For single end reads a single dash (`-`) should be entered in the fourth column.
-* If using WARDEN [BAM] or WARDEN [Counts]:
+* If using WARDEN \[BAM\] or WARDEN \[Counts\]:
   * The third column should contain the name of the sample's BAM or counts file.
   * The fourth column is ignored and can be safely deleted or left blank.
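
The naming rules above can be checked before a sample sheet is uploaded. The following is a minimal Python sketch, not part of WARDEN itself: the function names `is_valid_name` and `check_rows` are hypothetical, "letters" is assumed to mean ASCII letters, and the rows are assumed to have already been read into lists of strings with the sample name in column one and the condition in column two.

```python
import re
from typing import List, Sequence

# Rule from the guide: only letters, numbers, and underscores, starting with a letter.
# Assumption: "letters" means ASCII letters.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_name(name: str) -> bool:
    """Return True if the whole string follows the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None


def check_rows(rows: Sequence[Sequence[str]]) -> List[str]:
    """Return a list of human-readable problems found in the sample rows."""
    problems: List[str] = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        if len(row) < 2:
            problems.append(f"row {number}: expected a sample name and a condition")
            continue
        sample, condition = row[0], row[1]
        if not is_valid_name(sample):
            problems.append(f"row {number}: malformed sample name {sample!r}")
        if sample in seen:
            problems.append(f"row {number}: duplicate sample name {sample!r}")
        seen.add(sample)
        if not is_valid_name(condition):
            problems.append(f"row {number}: malformed condition {condition!r}")
    return problems
```

An empty list from `check_rows` means no naming problems were found. WARDEN will still attempt to correct malformed names on its own, so a check like this is only a convenience.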
 
@@ -101,9 +101,9 @@ This line may appear anywhere in the file, but the examples place it at the bott
 !!!example
 The following lines are all valid examples.
 
-1. `#comparisons=KO-WT`
-2. `#comparisons=Condition1-Control,Condition2-Control`
-3. `#comparisons=Phenotype2-Phenotype1,Phenotype3-Phenotype2,Phenotype3-Phenotype1`
+* `#comparisons=KO-WT`
+* `#comparisons=Condition1-Control,Condition2-Control`
+* `#comparisons=Phenotype2-Phenotype1,Phenotype3-Phenotype2,Phenotype3-Phenotype1`
 !!!
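
To illustrate the format of these lines, here is a minimal Python sketch that splits a `#comparisons=` line into pairs of condition names. The function name `parse_comparisons` is hypothetical and this is not how WARDEN itself reads the line. It relies on the rule that condition names contain only letters, numbers, and underscores, so the dash can only be a separator.

```python
from typing import List, Tuple

PREFIX = "#comparisons="


def parse_comparisons(line: str) -> List[Tuple[str, str]]:
    """Split '#comparisons=A-B,C-D' into [('A', 'B'), ('C', 'D')]."""
    line = line.strip()
    if not line.startswith(PREFIX):
        raise ValueError("line does not start with '#comparisons='")
    pairs: List[Tuple[str, str]] = []
    for item in line[len(PREFIX):].split(","):
        # Each item is two condition names joined by a single dash.
        first, dash, second = item.partition("-")
        if not dash or not first or not second:
            raise ValueError(f"malformed comparison: {item!r}")
        pairs.append((first, second))
    return pairs
```

For the first example above, `parse_comparisons("#comparisons=KO-WT")` returns `[("KO", "WT")]`.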
 
 !!!note
@@ -145,34 +145,32 @@ Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/
 
 ### Hooking up Inputs
 
-You'll need to hook up the FastQ files, BAM files, or count files (depending on which entrypoint you wish to use) and sample sheet you uploaded in [the upload data section](#uploading-input-files).
-Click the `FASTQ_FILES`, `BAM_FILES`, or `COUNT_FILES` input field and select **all** input files listed in your sample sheet. Next, click the `sample_list` input field and select the corresponding sample sheet.
+First, in the `Execution Output Folder` field, select a folder to output to. You can structure your experiments however you like (e.g. `/my_outputs`). If left blank, a directory named with the execution ID will be created in order to avoid cluttering your workspace and keep separate runs separate.
 
-![](./inputs-warden-2.gif)
+Next, you'll need to hook up the FastQ files, BAM files, or count files (depending on which entrypoint you wish to use) and sample sheet you uploaded in [the upload data section](#uploading-input-files). Click the `FASTQ_FILES`, `BAM_FILES`, or `COUNT_FILES` input field and select **all** input files listed in your sample sheet. Next, click the `sample_list` input field and select the corresponding sample sheet.
+
+Then select the `sequence_strandedness` drop down menu and choose the appropriate option. This information can be determined from the sequencing or source
+of the data. If you don't know what to put here, select "Unstranded".
+
+Finally, select the `Genome` pulldown menu, choose the appropriate option, and WARDEN is ready to be run! Continue reading to learn about the available advanced options.
+
+![](./warden-inputs.gif)
 
 ### Selecting Parameters
 
 We now need to configure the parameters for the pipeline, such as reference genome and sequencing method. For the general workflow instructions, see [the general workflow guide](../../analyzing-data/running-sj-workflows#selecting-parameters).
 
 !!!example Parameter setup steps
 
-1. In the `Execution Output Folder` field, select a folder to output to. You can
-structure your experiments however you like (e.g. `/My_Outputs`). If left blank, a directory named with the execution ID will be created in order to avoid cluttering your workspace and keep separate runs separate.
-2. Select the `sequence_strandedness` from the drop down menu.
-This information can be determined from the sequencing or source
-of the data. If you don't know what to put here, select "Unstranded".
-3. Select the `Genome` pulldown menu. Choose the appropriate box.
-4. Options under "Advanced: Run Control" can be enabled or disabled, though `generate_name_sorted_BAM`, `generate_transcriptome_BAM`, [`run_FastQC`](#quality-control-results-fastqc), and [`run_coverage`](#bigWig-viewer) are disabled by default to reduce run time and costs.
-5. `STAR_subsample_n_reads` can be used to reduce runtime and run costs. The default of 100,000,000 reads will map the entirety of many samples and is a sufficient of number of reads for differential expression analysis. Setting this value to "0" or "-1" will disable subsampling, and map the entirety of all input FastQs. With sufficiently large FastQs, this can take a long time and cost a significant amount of money. Large FastQs also occasionally cause errors in the STAR step. If those are encountered, we recommend re-enabling subsampling or increasing the size of the `star_instance`. **Warning:** a larger STAR instance will incur larger costs.
-6. The LIMMA parameters can be left alone for most analyses. If you are
+* Options under "Advanced: Run Control" can be enabled or disabled, though `generate_name_sorted_BAM`, `generate_transcriptome_BAM`, [`run_FastQC`](#quality-control-results-fastqc), and [`run_coverage`](#bigwig-viewer) are disabled by default to reduce run time and costs.
+* `STAR_subsample_n_reads` can be used to reduce runtime and run costs. The default of 100 million reads will map the entirety of many samples and is a sufficient number of reads for differential expression analysis. Setting this value to "0" or "-1" will disable subsampling, and map the entirety of all input FastQs. With sufficiently large FastQs, this can take a long time and cost a significant amount of money. Large FastQs also occasionally cause errors in the STAR step. If those are encountered, we recommend re-enabling subsampling or increasing the size of the `star_instance`. **Warning:** a larger STAR instance will incur larger costs.
+* The LIMMA parameters can be left alone for most analyses. If you are
 an advanced LIMMA user, you can change the various settings exposed.
-7. If you are interested in a feature besides genes, you should change the `feature_type` and `id_attribute` HTSeq-count parameters. Note that changing from the defaults will disable FPKM calculations. The other options should only be changed by advanced users of HTSeq-count.
-8. Similarly STAR parameters should only be changed by advanced users familiar with the STAR aligner. You can read the STAR v2.5.3a manual [here](https://github.com/alexdobin/STAR/blob/2.5.3a/doc/STARmanual.pdf).
-9. When all parameters have been set, you're ready to run WARDEN!
+* If you are interested in a feature besides genes, you should change the `feature_type` and `id_attribute` HTSeq-count parameters. Note that changing from the defaults will disable FPKM calculations. The other options should only be changed by advanced users of HTSeq-count. The HTSeq documentation can be found [here](https://htseq.readthedocs.io/en/master/count.html).
+* Similarly, STAR parameters should only be changed by advanced users familiar with the STAR aligner. You can read the STAR v2.5.3a manual [here](https://github.com/alexdobin/STAR/blob/2.5.3a/doc/STARmanual.pdf).
+* When all parameters have been adjusted to your needs, you're ready to run WARDEN!
 !!!
 
-![](./parameters-warden-3.gif)
-
 ## Summary of Results
 
 Each tool in St. Jude Cloud produces a visualization that makes understanding results more accessible than working with Excel spreadsheets or tab-delimited files. This is the primary way we recommend you work with your results.
@@ -197,8 +195,7 @@ generated. An example can be seen below. These files will be labeled
 `mds_plot.limma.png`. For all comparisons, regardless of sample size, an MDS
 plot will also be generated with Counts per million (CPM) normalized
 gene counts by default. These files will be labeled `mds_plot.norm_cpm.png`.
-(Within the DNAnexus output directory structure, these files will be in
-the root directory.)
+These files will be in the root of the output directory.
 
 ![](./mdsPlot.png)
 
@@ -258,13 +255,12 @@ HTSeq-count files are combined into a file called `combined_counts.htseq.txt`. I
 
 #### Alignment statistics
 
-Several files should be examined initially to determine the quality of
-the results. **alignment_statistics.txt** shows alignment statistics for
+**alignment_statistics.txt** shows alignment statistics for
 all samples. This file is a plain text tab-delimited file that can be
 opened in Excel or a text editor such as Notepad++. This file contains
 information on the total reads per sample, the percentage of duplicate
 reads and the percentage of mapped reads. An example of this file is
-below. (Within the DNAnexus output directory structure, this file will be in the `STAR/` folder.)
+below. This file will be in the `STAR/` folder.
 
 > ![](./alignmentStatistics.png)
 
@@ -278,8 +274,7 @@ Other useful differential expression results will be created. This includes tabu
 
 #### GSEA.input.<*contrast*>.txt and GSEA.tStat.<*contrast*>.txt
 
-Input files that can be used for GSEA analysis. The tStat file is preferred for a more accurate analysis, but will not give a heatmap diagram.
-Within the DNAnexus output directory structure, these files will be in the `AUXILIARY/` directory.
+Input files that can be used for GSEA analysis. The tStat file is preferred for a more accurate analysis, but will not give a heatmap diagram. These files will be in the `AUXILIARY/` directory.
 
 #### Coverage results
 
@@ -290,10 +285,7 @@ strandedness, there will be bigWig files labeled,
 `*.sortedcoverage_file.bed.bw` where '\*' is the sample name. For
 stranded data there will also be `*.sortedPoscoverage_file.bed.bw` and
 `*.sortedNegcoverage_file.bed.bw` which contains coverage information
-for the positive and negative strand of the genome.
-
-(Within the DNAnexus output directory structure, these files will be in
-the `BIGWIG/` directory.)
+for the positive and negative strand of the genome. These files will be in the `BIGWIG/` directory.
 
 #### Quality Control Results (FastQC)
 
@@ -322,8 +314,7 @@ files can be found [here](http://labshare.cshl.edu/shares/gingeraslab/www-data/d
 files are labeled `*.Chimeric.out.bam` and
 `*.Chimeric.out.junction`.
 
-(Within the DNAnexus output directory structure, `*.SJ.out.tab` files will be in `STAR/TABS`
-and the chimeric BAMs and chimeric junction files will be in the `STAR/CHIMERIC/` directory.)
+`*.SJ.out.tab` files will be in `STAR/TABS` and chimeric BAMs and chimeric junction files will be in the `STAR/CHIMERIC/` directory.
 
 #### FPKM and count files (per sample)
 
@@ -335,19 +326,19 @@ the sample name. Counts files will be in the `HTSEQ/` directory, and FPKM files
 
 #### Workflow parameters
 
-`WARDEN_parameters.json` is the full list of parameters, including defaults, that were passed into this run of WARDEN.
+`WARDEN_parameters.json` is the full list of parameters, including defaults, that were passed into this run of WARDEN. It can be found in the `AUXILIARY/` folder.
 
 ## Rerunning analysis
 
-If you complete a WARDEN run from FastQs or BAM files and wish to change some of the final differential expression parameters, we recommend you use the count files already generated as input to the "WARDEN [Counts]" app. This should save you significant amounts of time and money.
+If you complete a WARDEN run from FastQs or BAM files and wish to change some of the final differential expression parameters, we recommend you use the count files already generated as input to the "WARDEN \[Counts\]" app. This should save you significant amounts of time and money.
 
-Similarly, if you started with FastQs and wish to rerun with different parameters to HTSeq-Count, we recommend using the previously generated BAM files as input to the "WARDEN [BAM]" app.
+Similarly, if you started with FastQs and wish to rerun with different parameters to HTSeq-Count, we recommend using the previously generated BAM files as input to the "WARDEN \[BAM\]" app.
 
 BAM files and count files are output as soon as they are created (as opposed to only appearing after a successful analysis), so if you find WARDEN has failed for any reason at a later stage, you should be able to use the already output BAMs or count files to skip rerunning the stages which completed successfully.
 
 ## Frequently Asked Questions
 
-Source code for the WARDEN apps can be found [here][https://github.com/stjude/WARDEN].
+Source code for the WARDEN apps can be found on our [GitHub](https://github.com/stjude/WARDEN).
 
 If you have any questions not covered here, feel free to reach
 out on [our contact

diff --git a/docs/genomics-platform/workflow-guides/warden/inputs-warden-2.gif b/docs/genomics-platform/workflow-guides/warden/inputs-warden-2.gif
diff --git a/docs/genomics-platform/workflow-guides/warden/parameters-warden-3.gif b/docs/genomics-platform/workflow-guides/warden/parameters-warden-3.gif
diff --git a/docs/genomics-platform/workflow-guides/warden/warden-inputs.gif b/docs/genomics-platform/workflow-guides/warden/warden-inputs.gif