revisions to definitions and update to yaml
Ssandor13 committed May 16, 2024
1 parent e05a3ee commit 8b2fc11
Showing 3 changed files with 22 additions and 21 deletions.
@@ -14,18 +14,18 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.

| Property | Description |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `file_path` | The path to the file in your St. Jude Cloud. |
+| `file_path` | The path to the file in your St. Jude Cloud project. |
| `file_id` | A unique identifier for the file on DNAnexus; you can see this value listed as “ID” in the DNAnexus user interface. |
| `subject_name` | A unique subject identifier assigned internally at St. Jude.|
| `sample_name` | A unique sample identifier assigned internally at St. Jude. |
| `sample_type` | One of Autopsy, Cell line, Diagnosis, Germline, Metastasis, Relapse, or Xenograft.|
| `sequencing_type` | Whether the file was generated from Whole Genome (WGS), Whole Exome (WES), or RNA-Seq. |
-| `file_type` | File type to be rendered in the St. Jude Cloud Genomics Platform data browser. Note that indices will be marked as the file type they accompany, and this is because these files will be vended together in our data browser. If you wish to be able to distinguish these file types, please parse the `file_path`. |
+| `file_type` | Specifies the type of file. Note that index files are labeled with the file type of the file they accompany and are automatically selected together with it in our data browser. If you wish to distinguish between the two in your project, please parse the `file_path`: index file names carry an additional extension, such as `.bai`. |
| `description` | Optional field that may contain additional file information. |
| `file_size` | The size of the file in bytes, not exceeding 12 digits. |
-| `sj_dataset_accession` | The permanent accession number assigned in St. Jude Cloud. |
+| `sj_dataset_accession` | The permanent accession number assigned to a dataset in St. Jude Cloud. |
| `sj_embargo_date` | The [embargo date](../../requesting-data/glossary/#embargo-date), which specifies the first date on which the files can be used in a publication. |
-| `sj_long_disease_name` | The complete written name of the sj_diseases diagnosis or disease identifier - see sj_diseases for important information. For more information about our ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology).|
+| `sj_long_disease_name` | The complete written name of the disease associated with the disease code stored in the `sj_diseases` attribute. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).|
| `attr_age_at_diagnosis` | Age at first diagnosis. This field is normalized as a decimal value. If empty, the physician or research team did not indicate a value for this field. |
| `attr_diagnosis` | Unharmonized primary diagnosis as reported by the lab or PI upon submission of data to St. Jude Cloud. |
| `attr_sex` | Self-reported sex. |
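Because index files share the `file_type` tag of the file they accompany, they can only be told apart by parsing `file_path`, as the `file_type` description above suggests. The following is a minimal sketch of that check. It assumes index files are recognized by their final extension; only `.bai` is named in the tag description, while `.crai`, `.tbi`, and `.csi` are added here as other common genomics index extensions, and the helper names and example paths are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Assumption: only `.bai` is named in the tag description; `.crai`, `.tbi`,
# and `.csi` are other common genomics index extensions added for illustration.
INDEX_SUFFIXES = {".bai", ".crai", ".tbi", ".csi"}


def is_index_file(file_path: str) -> bool:
    """Return True if the last extension of `file_path` is an assumed index suffix."""
    return PurePosixPath(file_path).suffix.lower() in INDEX_SUFFIXES


def split_data_and_index(file_paths: list[str]) -> tuple[list[str], list[str]]:
    """Partition paths that share one `file_type` tag into data files and index files."""
    data_files: list[str] = []
    index_files: list[str] = []
    for path in file_paths:
        if is_index_file(path):
            index_files.append(path)
        else:
            data_files.append(path)
    return data_files, index_files


if __name__ == "__main__":
    # Hypothetical paths; real `file_path` values will differ.
    paths = ["/project/bam/SAMPLE_A.bam", "/project/bam/SAMPLE_A.bam.bai"]
    data_files, index_files = split_data_and_index(paths)
    print(data_files)   # ['/project/bam/SAMPLE_A.bam']
    print(index_files)  # ['/project/bam/SAMPLE_A.bam.bai']
```

Checking only the last suffix keeps `SAMPLE_A.bam.bai` classified as an index while leaving `SAMPLE_A.bam` as data; if your project uses other index naming schemes, extend the assumed suffix set.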
@@ -41,13 +41,13 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.
| `sj_pub_accessions` | The related St. Jude Cloud accession number(s), if the file was associated with one or more papers. These accessions group the files into publications as displayed in the Genomics Platform data browser. |
| `sj_pmid_accessions` | The related [PubMed][pubmed] accession number, if the file was associated with a paper. |
| `attr_subtype_biomarkers` | A molecular mutation, SV or fusion event associated with a particular disease subtype that is used to define membership in that subtype. |
-| `sj_associated_diagnoses` | This list captures each diagnosis across all samples associated with the provided subject_name.|
-| `attr_germline_sample` | The germline sample that was used when creating the Somatic VCF file, if applicable. |
+| `sj_associated_diagnoses` | List of all available associated diagnoses for the subject (from the tumor samples or from a patient's clinical history). |
+| `attr_germline_sample` | The paired germline sample that was used when creating the somatic VCF file, if applicable. |
| `attr_diagnosis_group` | Each file is categorized into one of five diagnosis groups based on the type of tumor - hematologic malignancy, solid tumor, brain tumor, germ cell tumor, or not applicable (for germline samples). |
| `sj_ega_accessions` | The related [EGA][ega] accession number, if the file was associated with a paper. |
-| `sj_access_unit` | Lists which Data Access Unit(s) (DAU) the file belongs to. For more on Data Access Units, see here. (https://university.stjude.cloud/docs/genomics-platform/about-our-data/dau-and-datasets/#data-access-unit) |
-| `sj_diseases` | If your data request was process after August 18, 2020, the field should be interpreted as the harmonized St. Jude Cloud diagnosis based on the best available information (data provided by the lab or PI and followup by scientists on the St. Jude Cloud team). If your data request was processed before August 18, 2020, this field should be interpreted as the disease identifier assigned at the time of genomic sequencing (keyly, the diagnosis known at the time of genomic testing may not be the best available information). **If your data request was processed after August 18, 2020 and you'd like to use the most up to date, harmonized diagnosis**, we recommend using `sj_diseases` when including diagnosis in your analysis. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the lab-provided value in `attr_diagnosis`. For more information about our ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology). |
-| `sj_datasets` | The datasets in the data browser which this file is associated with. |
+| `sj_access_unit` | Lists which Data Access Unit (DAU) the file belongs to. For more on Data Access Units, see [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/dau-and-datasets/#data-access-unit). |
+| `sj_diseases` | If your data request was processed after August 18, 2020, this field should be interpreted as the harmonized St. Jude Cloud diagnosis based on the best available information (data provided by the lab or PI and follow-up by scientists on the St. Jude Cloud team). If your data request was processed before August 18, 2020, this field should be interpreted as the disease identifier assigned at the time of genomic sequencing (notably, the diagnosis known at the time of genomic testing may not be the best available information). **If your data request was processed after August 18, 2020 and you'd like to use the most up-to-date, harmonized diagnosis**, we recommend using `sj_diseases` when including diagnosis in your analysis. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the lab-provided value in `attr_diagnosis`. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology). |
+| `sj_datasets` | The dataset(s) in the data browser which this file is associated with. |
| `sj_pipeline_name` | Specifies the version of the pipeline used to generate the file. |
| `attr_tissue_preservative` | The preservation method used for the tissue sample, with two options: FFPE (formalin-fixed, paraffin-embedded) or Fresh/Frozen. |
| `attr_lab_strandedness` | Lab reported strandedness of RNA-seq data. |
@@ -58,13 +58,14 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.
!!!note
During the release of the St. Jude Cloud paper, we undertook a massive effort to curate and harmonize diagnosis values within St. Jude Cloud. We provide two values for diagnosis, and you should select carefully which value you use based on your use case:

-1. `sj_diseases`, which, since August 18, 2020, represents the harmonized diagnosis value curated by scientists on the St. Jude Cloud team (before that time it represented the diagnosis known at time of sequencing). For more information about our ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology).
+1. `sj_diseases`, which, since August 18, 2020, represents the harmonized diagnosis value curated by scientists on the St. Jude Cloud team (before that time it represented the diagnosis known at the time of sequencing). For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).
2. `attr_diagnosis`, which contains the unharmonized diagnosis value directly as it was submitted to us from the lab or PI.

-**If your data request was processed after August 18, 2020 and you'd like to use the most up to date, harmonized diagnosis**, we recommend using `sj_diseases` field. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the value in `attr_diagnosis`. For more information about our ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology).
+**If your data request was processed after August 18, 2020 and you'd like to use the most up-to-date, harmonized diagnosis**, we recommend using the `sj_diseases` field. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the value in `attr_diagnosis`. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).
!!!

-The `SAMPLE_INFO.txt` file that comes with your data request will contain the list of associated harmonized diagnosis codes (`sj_diseases`) for each sample. These codes represent the harmonized diagnosis values curated by the St. Jude Cloud team and reflect the most up to date information about the sample. For more information about our full ontology, please navigate to our [St. Jude Cloud Ontology section](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology) to read our white paper and access our downloadable disease ontology.
+The `SAMPLE_INFO.txt` file that comes with your data request will contain the list of associated harmonized diagnosis codes (`sj_diseases`) for each sample. These codes represent the harmonized diagnosis values curated by the St. Jude Cloud team and reflect the most up-to-date information about the sample. For more information about our full disease ontology, please navigate to our [St. Jude Cloud Disease Ontology section](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology) to read our white paper and access our downloadable disease ontology.


[pubmed]: https://www.ncbi.nlm.nih.gov/pubmed/
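The diagnosis guidance above reduces to a simple rule: use `sj_diseases` for requests processed since August 18, 2020, and fall back to `attr_diagnosis` for older requests or when the lab- or PI-provided value is wanted. The following is a minimal sketch of that rule. The function names are hypothetical, and the table reader assumes a tab-delimited file with a header row containing `sample_name` and the diagnosis columns; the actual layout of `SAMPLE_INFO.txt` may differ, so check your file before relying on it.

```python
from __future__ import annotations

import csv
import io
from datetime import date
from typing import Iterable

# Cutoff stated in the docs: since this date, `sj_diseases` holds the harmonized
# diagnosis. Treating the cutoff day itself as harmonized follows the word "since".
HARMONIZATION_DATE = date(2020, 8, 18)


def diagnosis_field(request_processed: date, prefer_lab_value: bool = False) -> str:
    """Return the tag name recommended for diagnosis in an analysis."""
    if prefer_lab_value or request_processed < HARMONIZATION_DATE:
        # Older requests, or users who want the value exactly as submitted by the lab/PI.
        return "attr_diagnosis"
    return "sj_diseases"


def read_diagnoses(lines: Iterable[str], field: str) -> dict[str, str]:
    """Map sample_name to a diagnosis value from a tab-delimited table (assumed layout)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row["sample_name"]: row[field] for row in reader}


if __name__ == "__main__":
    # Hypothetical table contents; column names follow the tag names above.
    sample_info = io.StringIO(
        "sample_name\tsj_diseases\tattr_diagnosis\n"
        "SAMPLE_A\tCODE_X\tLab-reported diagnosis\n"
    )
    field = diagnosis_field(date(2021, 3, 1))
    print(field, read_diagnoses(sample_info, field))  # sj_diseases {'SAMPLE_A': 'CODE_X'}
```

Keeping the field choice separate from the file reading makes the date rule easy to check on its own, and the `prefer_lab_value` flag covers the case where you want the unharmonized value regardless of when the request was processed.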
14 changes: 7 additions & 7 deletions docs/genomics-platform/about-our-data/ontology/index.md
@@ -1,14 +1,14 @@
---
-title: St. Jude Cloud Ontology
+title: St. Jude Cloud's Disease Ontology
---

# The St. Jude Cloud Pediatric Cancer Classification Ontology: An Evolving Framework

-Click [here](https://permalinks.stjude.cloud/permalinks/st-jude-cloud-ontology) to download the full St. Jude Cloud custom ontology (v0).
+Click [here](https://permalinks.stjude.cloud/permalinks/st-jude-cloud-ontology) to download the full St. Jude Cloud custom disease ontology (v0).

## Introduction:

-Ontologies designed for disease classification have redefined our understanding of diseases by providing a hierarchical structure of complex biomedical data. In cancer research, they are critical for data sharing, integration, and collaboration among researchers. However, existing ontologies on pediatric cancer classification are limited. The World Health Organization (WHO) and OncoTree primarily focus on adult cancers while leaving gaps in many pediatric cancer subtypes driven by molecular etiology presented in recent scientific literature. To enable data sharing and integration of the whole-genome, whole-exome and RNA-seq data generated from 13,956 cases of pediatric cancer and long-term survivors on St. Jude Cloud, we recognized the significance of such gaps and initiated the development of a tailored ontology to address this issue.
+Ontologies designed for disease classification have redefined our understanding of diseases by providing a hierarchical structure for complex biomedical data. In cancer research, they are critical for data sharing, integration, and collaboration among researchers. However, existing ontologies for pediatric cancer classification are limited. The World Health Organization (WHO) classification and OncoTree primarily focus on adult cancers, leaving gaps for many pediatric cancer subtypes driven by molecular etiologies described in recent scientific literature. To enable data sharing and integration of the whole-genome, whole-exome, and RNA-seq data generated from 13,956 cases of pediatric cancer and long-term survivors on St. Jude Cloud, we recognized the significance of these gaps and initiated the development of a tailored disease ontology to address them.

## Principles:

@@ -54,7 +54,7 @@ To date, the development was primarily motivated by omics data that was being up

- There's a notable shift in classifying diffuse intrinsic pontine glioma to midline glioma, reflecting evolving understanding and diagnostic criteria noted by the WHO CNS5 guidelines.

-- Additionally, our ontology's inclusion of modifiers such as anaplastic or diffuse diverges from the recent WHO CNS5 classification updates for grading, particularly concerning tumors like astrocytoma and glioblastoma.
+- Additionally, our disease ontology's inclusion of modifiers such as anaplastic or diffuse diverges from the recent WHO CNS5 classification updates for grading, particularly concerning tumors like astrocytoma and glioblastoma.

**Review of Embryonal Tumors<sup>6</sup>:**

@@ -67,13 +67,13 @@ To date, the development was primarily motivated by omics data that was being up
- Explore merging subtypes such as osteoblastic osteosarcoma and chondroblastic osteosarcoma under the umbrella of osteosarcoma, aligning with evolving research insights.

## Conclusion
-Our ontology is integral to various applications within St. Jude Cloud, driving initiatives like the Genomics Platform and Pediatric Knowledge Base (PeCan). However, its growth and effectiveness rely on community involvement. The current ontology framework has been developed with the input from pathologists and researchers involved in molecular subtyping. We welcome additional input and collaboration from researchers and clinicians to ensure its ongoing improvement and relevance to pediatric oncology, ultimately contributing to better outcomes for children facing cancer and catastrophic diseases.
+Our disease ontology is integral to various applications within St. Jude Cloud, driving initiatives like the Genomics Platform and Pediatric Knowledge Base (PeCan). However, its growth and effectiveness rely on community involvement. The current ontology framework has been developed with input from pathologists and researchers involved in molecular subtyping. We welcome additional input and collaboration from researchers and clinicians to ensure its ongoing improvement and relevance to pediatric oncology, ultimately contributing to better outcomes for children facing cancer and catastrophic diseases.


-**Contact:** For inquiries, collaborative opportunities, or to provide feedback on improving the St. Jude Cloud ontology, please contact support@stjude.cloud.
+**Contact:** For inquiries, collaborative opportunities, or to provide feedback on improving the St. Jude Cloud disease ontology, please contact support@stjude.cloud.

![](./hierarchy.png)
-**Figure 1: St. Jude Cloud Ontology.** High-level overview of the ontology that supports applications in St. Jude Cloud.
+**Figure 1: St. Jude Cloud Disease Ontology.** High-level overview of the ontology that supports applications in St. Jude Cloud.

### References

4 changes: 2 additions & 2 deletions src/config/docs.yaml
@@ -43,8 +43,8 @@ domains:
path: /docs/genomics-platform/about-our-data/file-formats-and-sequencing/
- title: "Metadata and Clinical Information"
path: /docs/genomics-platform/about-our-data/metadata-and-clinical/
-- title: "St. Jude Cloud's Ontology"
-path: /docs/genomics-platform/about-our-data/ontology/
+- title: "St. Jude Cloud's Disease Ontology"
+path: /docs/genomics-platform/about-our-data/disease-ontology/
- title: "Managing Data"
collapsable: true
pages: