Commit

update links
Ssandor13 committed May 17, 2024
1 parent e7afcf5 commit 38ffb5a
Showing 1 changed file with 6 additions and 7 deletions.
@@ -10,7 +10,7 @@ redirect_from:
Each data request includes a text file called `SAMPLE_INFO.txt` that provides a number of file level properties (sample identifiers, clinical attributes, etc).

### Definitions
-Below are the set of tags which may exist for any given file in St. Jude Cloud. Tags with ‘sj_ prepended are required fields. Tags with ‘attr_ prepended are information queried from the physician or research team’s records at the time of sample submission to St. Jude Cloud and are considered optional, as the level of information gathered for each sample varies.
+Below are the set of tags which may exist for any given file in St. Jude Cloud. Tags with `sj` prepended are required fields. Tags with `attr` prepended are information queried from the physician or research team’s records at the time of sample submission to St. Jude Cloud and are considered optional, as the level of information gathered for each sample varies.

| Property | Description |
| -------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
@@ -25,7 +25,7 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.
| `file_size` | The size of the file in bytes, not exceeding 12 integers. |
| `sj_dataset_accession` | The permanent accession number assigned to a dataset in St. Jude Cloud. |
| `sj_embargo_date` | The [embargo date](../../requesting-data/glossary/#embargo-date), which specifies the first date which the files can be used in a publication. |
-| `sj_long_disease_name` | The complete written name of the disease associated with the disease code store in the `sj_disease` attribute. For more information about our ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).|
+| `sj_long_disease_name` | The complete written name of the disease associated with the disease code store in the `sj_disease` attribute. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).|
| `attr_age_at_diagnosis` | Age at first diagnosis. This field is normalized as a decimal value. If empty, the physician or research team did not indicate a value for this field. |
| `attr_diagnosis` | Unharmonized primary diagnosis as reported by the lab or PI upon submission of data to St. Jude Cloud. |
| `attr_sex` | Self-reported sex. |
@@ -41,11 +41,11 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.
| `sj_pub_accessions` | The related St. Jude Cloud accession number(s), if the file was associated with a paper(s). These group the files into publications as displayed on the Genomics Platform data browser. |
| `sj_pmid_accessions` | The related [Pubmed][pubmed] accession number, if the file was associated with a paper. |
| `attr_subtype_biomarkers` | A molecular mutation, SV or fusion event associated with a particular disease subtype that is used to define membership in that subtype. |
-| `sj_associated_diagnoses` | List of all available associated diagnoses for the subject (from the tumor samples or from a patient's clinical history.|
+| `sj_associated_diagnoses` | List of all available associated diagnoses for the subject (from the tumor samples or from a patient's clinical history).|
| `attr_germline_sample` | The paired germline sample that was used when creating the Somatic VCF file, if applicable. |
| `attr_diagnosis_group` | Each file is categorized into one of five diagnosis groups based on the type of tumor - hematologic malignancy, solid tumor, brain tumor, germ cell tumor, or not applicable (for germline samples). |
| `sj_ega_accessions` | The related [EGA][ega] accession number, if the file was associated with a paper. |
-| `sj_access_unit` | Lists which Data Access Unit (DAU) the file belongs to. For more on Data Access Units, see here. (https://university.stjude.cloud/docs/genomics-platform/about-our-data/dau-and-datasets/#data-access-unit) |
+| `sj_access_unit` | Lists which Data Access Unit (DAU) the file belongs to. For more on Data Access Units, see [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/dau-and-datasets/#data-access-unit). |
| `sj_diseases` | If your data request was process after August 18, 2020, the field should be interpreted as the harmonized St. Jude Cloud diagnosis based on the best available information (data provided by the lab or PI and followup by scientists on the St. Jude Cloud team). If your data request was processed before August 18, 2020, this field should be interpreted as the disease identifier assigned at the time of genomic sequencing (keyly, the diagnosis known at the time of genomic testing may not be the best available information). **If your data request was processed after August 18, 2020 and you'd like to use the most up to date, harmonized diagnosis**, we recommend using `sj_diseases` when including diagnosis in your analysis. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the lab-provided value in `attr_diagnosis`. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology). |
| `sj_datasets` | The dataset(s) in the data browser which this file is associated with. |
| `sj_pipeline_name` | Specifies which specific version of the pipeline was used when generating the file. |
@@ -58,14 +58,13 @@ Below are the set of tags which may exist for any given file in St. Jude Cloud.
!!!note
During the release of the St. Jude Cloud paper, we undertook a massive effort to curate and harmonize diagnosis values within St. Jude Cloud. We provide two values for diagnosis, and you should select carefully which value you use based on your use case:

-1. `sj_diseases`, which, since August 18, 2020, represents the harmonized diagnosis value curated by scientists on the St. Jude Cloud team (before that time it represented the diagnosis known at time of sequencing). For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-
-ontology).
+1. `sj_diseases`, which, since August 18, 2020, represents the harmonized diagnosis value curated by scientists on the St. Jude Cloud team (before that time it represented the diagnosis known at time of sequencing). For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).
2. `attr_diagnosis`, which contains the unharmonized diagnosis value directly as it was submitted to us from the lab or PI.

**If your data request was processed after August 18, 2020 and you'd like to use the most up to date, harmonized diagnosis**, we recommend using `sj_diseases` field. If your data request was made before this time *or* if you wish to use the values exactly as provided by the lab or PI, we recommend using the value in `attr_diagnosis`. For more information about our disease ontology go [here](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology).
!!!

-The `SAMPLE_INFO.txt` file that comes with your data request will contain the list of associated harmonized diagnosis codes (`sj_diseases`) for each sample. These codes represent the harmonized diagnosis values curated by the St. Jude Cloud team and reflect the most up to date information about the sample. For more information about our full disease ontology, please navigate to our [St. Jude Cloud Disease Ontology section](https://university.stjude.cloud/docs/genomics-platform/about-our-data/ontology) to read our white paper and access our downloadable disease ontology.
+The `SAMPLE_INFO.txt` file that comes with your data request will contain the list of associated harmonized diagnosis codes (`sj_diseases`) for each sample. These codes represent the harmonized diagnosis values curated by the St. Jude Cloud team and reflect the most up to date information about the sample. For more information about our full disease ontology, please navigate to our [St. Jude Cloud Disease Ontology section](https://university.stjude.cloud/docs/genomics-platform/about-our-data/disease-ontology) to read our white paper and access our downloadable disease ontology.


[pubmed]: https://www.ncbi.nlm.nih.gov/pubmed/
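
The selection rule described in the `sj_diseases` row and in the note of the changed file can be sketched in Python. This is a minimal illustration only, not part of the commit or of any St. Jude Cloud tooling. It assumes `SAMPLE_INFO.txt` is tab-delimited with a header row (the documentation does not state the delimiter). `read_sample_info`, `pick_diagnosis`, and `request_processed_on` are hypothetical names, and the sample row is made up. The file is not described as recording when a request was processed, so the caller supplies that date.

```python
import csv
import io
from datetime import date

# Cutoff date quoted in the documentation text.
HARMONIZATION_CUTOFF = date(2020, 8, 18)


def read_sample_info(handle):
    """Parse SAMPLE_INFO.txt rows into dictionaries.

    ASSUMPTION: tab-delimited text with a header row; adjust the
    delimiter if your copy of the file is formatted differently.
    """
    return list(csv.DictReader(handle, delimiter="\t"))


def pick_diagnosis(row, request_processed_on, prefer_harmonized=True):
    """Return (field_name, value) for the diagnosis field the note recommends.

    `sj_diseases` is used only when the request was processed after the
    cutoff and the caller wants the harmonized value. Otherwise the
    lab-provided `attr_diagnosis` is used. The docs say "after", so the
    cutoff day itself falls back to `attr_diagnosis` here.
    """
    use_harmonized = prefer_harmonized and request_processed_on > HARMONIZATION_CUTOFF
    field = "sj_diseases" if use_harmonized else "attr_diagnosis"
    # Optional attr_* fields may be absent or empty; return "" in that case.
    return field, row.get(field) or ""


# Made-up example data: one header line and one sample row.
example = io.StringIO("sj_diseases\tattr_diagnosis\nBALL\tB-cell ALL\n")
rows = read_sample_info(example)
chosen_field, chosen_value = pick_diagnosis(rows[0], date(2021, 1, 1))
```

Passing `prefer_harmonized=False` selects `attr_diagnosis` regardless of the date, which matches the case where you want the values exactly as provided by the lab or PI.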
