Commit

improvement: multiple improvements to docs
claymcleod committed Dec 28, 2020
1 parent a4e9fe3 commit 75c724c
Showing 27 changed files with 108 additions and 274 deletions.
4 changes: 2 additions & 2 deletions .vscode/settings.json
@@ -1,3 +1,3 @@
{
"editor.formatOnSave": true
}
"editor.formatOnSave": false
}
46 changes: 24 additions & 22 deletions docs/citing-stjude-cloud/index.md
@@ -13,23 +13,25 @@ The St. Jude Cloud manuscript is currently under review. Until further notice, w
2. Cite the relevant paper for each dataset and/or resource that you used in your study (see ‘Dataset’ and ‘Resource’ reference tables below)

3. State in the **Results** and/or **Methods** section that the relevant data and/or resource was obtained from St. Jude Cloud. Example statement:
>"Whole genome sequencing data for relapse tumor samples from 345 pediatric patients were obtained from St. Jude Cloud."

> Whole genome sequencing data for relapse tumor samples from 345 pediatric patients were obtained from St. Jude Cloud.
4. State in the **Data availability** section of the manuscript that data and/or resource can be accessed via St. Jude Cloud. Example statement:
>"Whole genome sequencing data for pediatric relapse tumor samples used for analysis in this study were obtained from St. Jude Cloud (https://www.stjude.cloud) – a publicly accessible pediatric genomic data resource requiring approval for controlled data access."

> Whole genome sequencing data for pediatric relapse tumor samples used for analysis in this study were obtained from St. Jude Cloud (https://www.stjude.cloud) – a publicly accessible pediatric genomic data resource requiring approval for controlled data access.
## Dataset Reference Table

Please download the Schedule 1(s) (linked in table below) to find dataset specific wording of acknowledgement(s).

| St. Jude Cloud Dataset | Reference |
| -------------------------------- | ----------------- |
| Pediatric Cancer Genome Project (PCGP) dataset | [PCGP perspectives paper](https://www.ncbi.nlm.nih.gov/pubmed/22641210) and the [relevant tumor type paper(s)](http://pecan.stjude.cloud/pcgp-explore); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> PCGP Schedule 1](../files/PCGP-Schedule1.pdf) |
| St. Jude Lifetime (SJLIFE) dataset | [SJLIFE paper](https://www.ncbi.nlm.nih.gov/pubmed/?term=29847298); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> SJLIFE Schedule 1](../files/SJLIFE-Schedule1.pdf) |
| Clinical Genomics (Clinical Pilot, Genomes for Kids, Real-Time Clinical Genomics) dataset | [Clinical Pilot paper](https://www.ncbi.nlm.nih.gov/pubmed/30262806); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> Clinical Genomics Schedule 1](../files/ClinGen-Schedule1.pdf) |
| Sickle Cell Genome Project (SGP) dataset | paper in progress; [<i class="material-icons material-icons-sjcloud-custom">file_download</i> SGP Schedule 1](../files/SGP-Schedule1.pdf) |
| Childhood Cancer Survivor Study (CCSS) dataset | [CCSS study design paper](https://www.ncbi.nlm.nih.gov/pubmed/11920786); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> CCSS Schedule 1](../files/CCSS-Schedule1.pdf) |
| Pan-Acute Lymphoblastic Leukemia (PanALL) dataset | [<i class="material-icons material-icons-sjcloud-custom">file_download</i> PanALL Schedule 1](../files/PanALL-Schedule1.pdf) |
| St. Jude Cloud Dataset | Reference |
| ----------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Pediatric Cancer Genome Project (PCGP) dataset | [PCGP perspectives paper](https://www.ncbi.nlm.nih.gov/pubmed/22641210) and the [relevant tumor type paper(s)](http://pecan.stjude.cloud/pcgp-explore); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> PCGP Schedule 1](../files/PCGP-Schedule1.pdf) |
| St. Jude Lifetime (SJLIFE) dataset | [SJLIFE paper](https://www.ncbi.nlm.nih.gov/pubmed/?term=29847298); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> SJLIFE Schedule 1](../files/SJLIFE-Schedule1.pdf) |
| Clinical Genomics (Clinical Pilot, Genomes for Kids, Real-Time Clinical Genomics) dataset | [Clinical Pilot paper](https://www.ncbi.nlm.nih.gov/pubmed/30262806); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> Clinical Genomics Schedule 1](../files/ClinGen-Schedule1.pdf) |
| Sickle Cell Genome Project (SGP) dataset | paper in progress; [<i class="material-icons material-icons-sjcloud-custom">file_download</i> SGP Schedule 1](../files/SGP-Schedule1.pdf) |
| Childhood Cancer Survivor Study (CCSS) dataset | [CCSS study design paper](https://www.ncbi.nlm.nih.gov/pubmed/11920786); [<i class="material-icons material-icons-sjcloud-custom">file_download</i> CCSS Schedule 1](../files/CCSS-Schedule1.pdf) |
| Pan-Acute Lymphoblastic Leukemia (PanALL) dataset | [<i class="material-icons material-icons-sjcloud-custom">file_download</i> PanALL Schedule 1](../files/PanALL-Schedule1.pdf) |

!!!note
If you are unsure what dataset(s) the data that you have been vended belongs to, you can find this information in the sj_datasets column of the [SAMPLE_INFO.txt](../genomics-platform/requesting-data/about-our-data/#metadata) file.
@@ -41,17 +43,17 @@ Publishing using any of the data files _before_ the [embargo date](../genomics-p

## Resource Reference Table

| St. Jude Cloud Resource | Reference |
| -------------------------------- | ----------------- |
| ProteinPaint | [ProteinPaint paper](https://www.nature.com/articles/ng.3466) |
| GenomePaint | paper in progress |
| PeCan Pie | [PeCan Pie paper](https://genome.cshlp.org/content/29/9/1555.full) |
| ChIP-Seq Peak Calling | unpublished |
| Rapid RNA-Seq Fusion Detection | paper in progress |
| WARDEN | unpublished |
| Mutational Signatures | [Mutational Patterns paper](https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0539-0) |
| cis-x | paper in progress |
| XenoCP | paper in progress |
| St. Jude Cloud Resource | Reference |
| ------------------------------ | -------------------------------------------------------------------------------------------------------- |
| ProteinPaint | [ProteinPaint paper](https://www.nature.com/articles/ng.3466) |
| GenomePaint | paper in progress |
| PeCan Pie | [PeCan Pie paper](https://genome.cshlp.org/content/29/9/1555.full) |
| ChIP-Seq Peak Calling | unpublished |
| Rapid RNA-Seq Fusion Detection | paper in progress |
| WARDEN | unpublished |
| Mutational Signatures | [Mutational Patterns paper](https://genomemedicine.biomedcentral.com/articles/10.1186/s13073-018-0539-0) |
| cis-x | paper in progress |
| XenoCP | paper in progress |

<!-- NeoepitopePred | [NeoepitopePred paper](https://www.ncbi.nlm.nih.gov/pubmed/28854978) -->

4 changes: 2 additions & 2 deletions docs/genomics-platform/managing-data/upload-cluster/index.md
@@ -144,7 +144,7 @@ For example, with a DNAnexus project named `flagstat` and a file named
$ ua --do-not-compress --project flagstat sample.1.bam
```

!!!question "Why is `--do-not-compress` always set?"
!!!question Why is `--do-not-compress` always set?
Upload Agent uses an unfortunate default where uncompressed files are
automatically gzipped. For example, uploading the text file
`samplesheet.txt` results in the file `samplesheet.txt.gz` on DNAnexus.
@@ -175,7 +175,7 @@ $ bsub \
<src>
```

!!!question "Where does the 2882 MiB resource requirement come from?"
!!!question Where does the 2882 MiB resource requirement come from?
There is a [note in the source code of Upload Agent][ua-main-mem] that
gives an estimate of how much memory is used for a transfer:

@@ -142,7 +142,7 @@ The Sickle Cell Genome Project (SGP) is a collaboration between St. Jude Childre
**[CCSS](https://stjude.cloud/studies/clinical-genomics) is a germline-only dataset consisting of whole genome sequencing of childhood cancer survivors.**
CCSS is a multi-institutional, multi-disciplinary, NCI-funded collaborative resource established to evaluate long-term outcomes among survivors of childhood cancer. It is a retrospective cohort consisting of >24,000 five-year survivors of childhood cancer who were diagnosed between 1970-1999 at one of 31 participating centers in the U.S. and Canada. The primary purpose of this sequencing of CCSS participants is to identify all inherited genome sequence and structural variants influencing the development of childhood cancer and occurrence of long-term adverse outcomes associated with cancer and cancer-related therapy.

!!!warning "CCSS: Potential Bacterial Contamination"
!!!warning CCSS: Potential Bacterial Contamination

Samples for the Childhood Cancer Survivorship Study were collected by sending out Buccal swab kits to enrolled participants and having them complete the kits at home. This mechanism of collecting saliva and buccal cells for sequencing is highly desirable because of its non-invasive nature and ease of execution. However, collection of samples in this manner also has higher probability of contamination from external sources (as compared to, say, samples collected using blood). We have observed some samples in this cohort which suffer from bacterial contamination. To address this issue, we have taken the following steps:

2 changes: 1 addition & 1 deletion docs/genomics-platform/workflow-guides/chipseq/index.md
@@ -123,7 +123,7 @@ have questions, please [contact us](https://stjude.cloud/contact).
| Remove blacklist peaks | Whether or not to remove known problem areas | True |
| Fragment length | Hardcoded fragment length of your reads. 'NA' for auto-detect. | NA |

!!!caution
!!!warning
Please be aware of the following stumbling points when setting parameters:

* Do not use spaces anywhere in your input file names, your output
2 changes: 1 addition & 1 deletion docs/genomics-platform/workflow-guides/fkpm/index.md
@@ -68,7 +68,7 @@ Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/

Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/#running-the-workflow) to learn how to launch the workflow, hook up input files, adjust parameters, start a run, and monitor run progress.

*!!!caution
*!!!warning
any cautionary notes specific to running this workflow*
!!!

@@ -23,7 +23,7 @@ MethylationToActivity (M2A) is a machine learning framework using convolutional
| WGBS data file | Input file | M-values by chromosome and position (non-standard format, see below). | *.txt (tab-delimited)|
| Promoter region definition file (*provided, or user defined*) | Input file | File describing promoter regions to be predicted. Provided regions include both hg19 and GRCh38 definitions (non-standard format, see below). | *.txt (tab-delimited) |

!!!Note "App-provided model inputs:"
!!!Note App-provided model inputs:
Model weights (.h5) file: 1) H3K27ac or 2) H3K4me3
!!!

4 changes: 2 additions & 2 deletions docs/genomics-platform/workflow-guides/neoepitope/index.md
@@ -52,7 +52,7 @@ by users.
| Gene1 | SampleA | chr10 | 106150600 | missense | R663H | NM_00101 | A | T |
| Gene2 | SampleA | chr2 | 32330151 | missense | N329N | NM_00102 | T | G |

!!!example "Notes on preparing the above file"
!!!example Notes on preparing the above file
- The chromosome requires a 'chr' prefix.
- The position requires a suffix of HG19/HG38 to indicate the human genome assembly version.
- Only the missense mutations/gene fusion is supported currently and the other types of mutations will not be processed.
@@ -139,7 +139,7 @@ Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/

Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/#running-the-workflow) to learn how to launch the workflow, hook up input files, adjust parameters, start a run, and monitor run progress.

!!!caution
!!!warning
This pipeline assumes HG19 coordinates in the mutation file. If the
coordinates are based on HG38, the coordinates will be lifted over to HG19
to perform epitope affinity prediction.
10 changes: 5 additions & 5 deletions docs/genomics-platform/workflow-guides/rapid-rnaseq/index.md
@@ -27,7 +27,7 @@ with FastQ files or a BAM file.
| Paired FastQ files | Gzipped FastQ files generated by human RNA-Seq | Sample_R1.fastq.gz and Sample_R2.fastq.gz |
| BAM file | Aligned reads file from human RNA-Seq | Sample.bam |

!!!caution
!!!warning
If you provide a BAM file to the pipeline, it **must** be aligned to GRCh37-lite.
Running a BAM aligned to any other reference genome is not supported. Maybe more
importantly, we do not check the genome build of the BAM, so errors in computation
@@ -125,7 +125,7 @@ You can navigate to the Rapid RNA-Seq workflow page [here](https://platform.stju

## Uploading Input Files

!!!caution
!!!warning
This pipeline assumes GRCh37-lite coordinates. If your BAM is
*not* aligned to this genome build, we recommend converting the BAM
back to FastQ files using [Picard's SamToFastq](https://broadinstitute.github.io/picard/command-line-overview.html#SamToFastq)
@@ -193,15 +193,15 @@ of this guide. Here, we will discuss each of the different output files in more

## Known issues

!!!caution "Adapter contamination"
!!!warning Adapter contamination
This pipeline does not, at present, remove adapter sequences. If the
sequencing library is contaminated with adapters, CICERO runtimes can
increase exponentially. We recommend running FastQ files through a QC
pipeline such as FastQC and trimming adapters with tools such as
Trimmomatic if adapters are found.
!!!

!!!caution "High coverage regions"
!!!warning High coverage regions
Certain cell types show very high transcription of certain loci, for
example, the immunoglobulin heavy chain locus in plasma cells. The
presence of very highly covered regions (typically 100,000-1,000,000+ X)
@@ -210,7 +210,7 @@ solution to this problem as strategies such as down-sampling may reduce
sensitivity over important regions of the genome.
!!!

!!!bug "Interactive Visualizations Exon vs Intron Nomenclature"
!!!bug Interactive Visualizations Exon vs Intron Nomenclature
When a codon is split over a fusion gene junction, the annotation
software marks the event as intronic when really, the event should be
exonic. We are working to fix this bug. In the meantime, if a fusion is
2 changes: 1 addition & 1 deletion docs/genomics-platform/workflow-guides/rna-indel/index.md
@@ -68,7 +68,7 @@ Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/

Refer to [the general workflow guide](../../analyzing-data/running-sj-workflows/#running-the-workflow) to learn how to launch the workflow, hook up input files, adjust parameters, start a run, and monitor run progress.

*!!!caution
*!!!warning
any cautionary notes specific to running this workflow*
!!!

@@ -24,7 +24,7 @@ with a counts file or a BAM file.
| BAM file | Aligned reads file from human RNA-Seq | Sample.bam |
| Counts file | htseq-count output feature counts file from human RNA-Seq | Sample.counts.txt |

!!!caution
!!!warning
If you provide counts data to the counts-based pipeline,
it **must** be aligned to `GRCh38_no_alt`.
Running a sample aligned to any other reference genome is not supported. Maybe more
@@ -76,7 +76,7 @@ A t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization is produced

### Getting Started

!!!caution
!!!warning
If you provide counts data to the counts-based pipeline,
it **must** be aligned to `GRCh38_no_alt`.
Running a sample aligned to any other reference genome is not supported. Maybe more
@@ -99,7 +99,7 @@ Reference data can also be retrieved through the [Genomics Platform Data Browser

These must then be provided to the workflow through the `reference_counts` parameter. By default, all reference files will be used by the app, but this can be restricted to one of the three tumor types [Blood, Brain, Solid] through the app settings.

!!!caution
!!!warning
The RNA-Seq Expression Classification tool does not allow the same sample name to be included more than once. If data from multiple projects is requested through St. Jude Cloud Genomics Platform, a sample may be included more than once. We offer an opinionated deduplication method at https://github.com/stjudecloud/utilities.
!!!
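
The duplicate-sample constraint described in the note above can be checked before launching the workflow. The following is a minimal sketch, not the deduplication method offered at stjudecloud/utilities; the function name `duplicated_sample_names` and the sample names are hypothetical and used only for illustration.

```python
from collections import Counter


def duplicated_sample_names(sample_names):
    """Return the sorted list of sample names that occur more than once."""
    counts = Counter(sample_names)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical sample names gathered from two data requests.
requested = ["SAMPLE_A", "SAMPLE_B", "SAMPLE_A", "SAMPLE_C"]
duplicates = duplicated_sample_names(requested)
if duplicates:
    print("Duplicated sample names:", ", ".join(duplicates))
```

A non-empty result means the input set needs to be deduplicated before it is passed to the tool.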

@@ -208,7 +208,7 @@ for batch effect based on strandedness of the RNA-Seq sample, library type, read

There are a few known cautions with the RNA-Seq Expression Classification workflow.

!!!caution "Data must fit well defined values"
!!!warning Data must fit well defined values
The RNA-Seq Expression Classification pipeline reference data is based on GRCh38-aligned, Gencode v31-annotated samples from fresh, frozen tissue samples. It has not been evaluated for samples that do not meet these criteria.

The RNA-Seq Expression Classification pipeline reference data uses sequencing data from fresh, frozen
4 changes: 2 additions & 2 deletions docs/genomics-platform/workflow-guides/sequencerr/index.md
@@ -47,7 +47,7 @@ By sequencing a common DNA library on different sequencers, we demonstrate that
* This app currently only supports DNA sequencing.
* The bam file should be generated by “bwa aln”. It works on “bwa MEM” but may take a lot more resources.

!!!example "Notes on preparing the BAM file"
!!!example Notes on preparing the BAM file
- Read names must have all the 7 fields as described below,
- **\[instrument]:\[run number]:\[flowcell ID]:\[lane]:\[tile]:\[x-pos]:\[y-pos]**
- Example: **A041:30:HHTYVDSXX:1:2242:28366:18897**
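
The seven-field read name layout shown above can be checked with a short script before submitting a BAM file. This is a minimal sketch, not part of the SequencErr app: the function name `parse_read_name` is hypothetical, and the pattern assumes that run number, lane, tile, x-pos, and y-pos are numeric, which the source example suggests but does not state.

```python
import re

# Seven colon-separated fields:
# instrument:run number:flowcell ID:lane:tile:x-pos:y-pos
# Assumption: all fields except instrument and flowcell ID are integers.
READ_NAME = re.compile(
    r"^(?P<instrument>[^:\s]+):(?P<run>\d+):(?P<flowcell>[^:\s]+):"
    r"(?P<lane>\d+):(?P<tile>\d+):(?P<x>\d+):(?P<y>\d+)$"
)


def parse_read_name(name):
    """Return the seven fields as a dict, or None if the name does not match."""
    match = READ_NAME.match(name)
    return match.groupdict() if match else None


fields = parse_read_name("A041:30:HHTYVDSXX:1:2242:28366:18897")
print(fields)
```

A name with fewer than seven fields returns `None`, which makes it easy to count nonconforming reads in a sample of the file.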
@@ -94,7 +94,7 @@ Please refer to the following steps to learn how to launch the workflow, hook up

![](./Sequencerr_log_in_1.png)

!!!caution "For Manuscript Reviewers only"
!!!warning For Manuscript Reviewers only
Please select **_DNAnexus account_** option to log in.

Follow the stepwise instructions available in the **PDF [here](./SequencErr_Instructions.pdf)** .
