This repository has been archived by the owner on Oct 28, 2022. It is now read-only.

Biosample sample characteristics #711

Closed
wants to merge 2 commits into from

Conversation

david4096
Member

Reopened from #710 to remove extraneous commits. @mbaudis if you're happy with it you can close the other.

This PR addresses the need for a different structure and naming of the Biosample.disease attribute:

- There has been consensus in the metadata task team that the attribute name disease is misleading in the context of Biosample and should be reserved for the individual object level.
- A Biosample can use a number of ontologies, e.g. for patho-histology, anatomic location, tissue type ...; see also the discussion at #707.

The renaming to sample_characteristics is in line with usage elsewhere, e.g. at GEO; however, the sample_ prefix may not be strictly necessary. Alternatives welcome.
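
As a rough illustration of the proposed change, here is a sketch in Python rather than in the schema's own definition language. The class names and the `id`/`term` field names are assumptions made for this example, not the actual message definitions; the point is only that a single-valued disease attribute becomes a list of ontology terms.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OntologyTerm:
    # Hypothetical minimal term: an identifier plus a human-readable label.
    id: str
    term: str


@dataclass
class Biosample:
    # Hypothetical stand-in for the Biosample message.
    id: str
    # Replaces the single-valued `disease` attribute with a list of terms,
    # so one sample can carry disease, tissue and location terms together.
    sample_characteristics: List[OntologyTerm] = field(default_factory=list)


sample = Biosample(
    id="biosample-1",
    sample_characteristics=[
        OntologyTerm("http://purl.obolibrary.org/obo/DOID_0050865",
                     "tongue squamous cell carcinoma"),
        OntologyTerm("http://purl.obolibrary.org/obo/UBERON_0010033",
                     "posterior part of tongue"),
    ],
)
```

Note that nothing in this flat list records which term is the disease and which is the anatomic location; that context has to come from the ontology itself.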

@david4096
Member Author

Copying response from #710:

Having a tag-bag field that strictly holds Ontology Terms makes sense, although the semantic context of those terms can be lost if no name is provided.

Please also consider allowing tagging via Ontology Term (and other types) in a generic attributes field, by upgrading the info field. This would let a data curator define named tag bags on a biosample, with some more context, without adding new named fields to the message. The attributes field would replace the info field; the two approaches are not mutually exclusive. PR for this feature here.

I'm +1 for this approach as it solves the immediate issue of specifying tissue type for TCGA data.
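
The named tag bags suggested above could look roughly like the following Python sketch. The `add_term` helper, the bag names and the dictionary layout are all hypothetical, chosen only to show how a name gives each group of terms its context; they are not taken from the attributes PR.

```python
from typing import Dict, List

# Hypothetical shape: bag name -> list of ontology term records.
Attributes = Dict[str, List[Dict[str, str]]]


def add_term(attributes: Attributes, bag: str, term_id: str, label: str) -> Attributes:
    """Append an ontology term to the named bag, creating the bag if needed."""
    attributes.setdefault(bag, []).append({"id": term_id, "term": label})
    return attributes


attrs: Attributes = {}
add_term(attrs, "disease",
         "http://purl.obolibrary.org/obo/DOID_0050865",
         "tongue squamous cell carcinoma")
add_term(attrs, "tissue_type",
         "http://purl.obolibrary.org/obo/UBERON_0006919",
         "tongue squamous epithelium")
```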

@mdmiller53

+1 for the merge

@sarahhunt
Contributor

+1 Looks good to me.

@david4096
Member Author

Seconding my +1. When loading the TCGA biosamples, it appears that some are labeled with more than one disease, which makes the single-valued disease field not especially helpful!

@mbaudis
Member

mbaudis commented Sep 14, 2016

So the merge now depends on the integration team, @david4096 @kozbo (if there are no objections to the attribute's name).

@mcourtot

mcourtot commented Oct 4, 2016

Hi @david4096,

We talked further with @mbaudis and would like to propose replacing the sample_characteristics attribute with the characteristics object, with structure:

characteristics: [
  {
    description: "squamous cell carcinoma, base of tongue, stage 2",
    type: phenotype (could be organism, disease...)
    repeated OntologyTerm ontologyTerms: [
      {
        ontologyId: "http://purl.obolibrary.org/obo/DOID_0050865",
        term: "tongue squamous cell carcinoma",
      },
      {
        ontologyId: "http://purl.obolibrary.org/obo/UBERON_0006919",
        term: "tongue squamous epithelium",
      },
      {
        ontologyId: "http://purl.obolibrary.org/obo/UBERON_0010033",
        term: "posterior part of tongue",
      },
    ],
  }
]

where each ontologyTerm is simplified in the above, but would in fact be the OntologyTerm structure as we agreed on at https://github.com/ga4gh/schemas/pull/694/files (and include version and source).
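
To make the proposed shape concrete, the structure above can be sketched in Python. This is an illustration under assumptions, not the agreed schema: the class names, the snake_case field names, the optional `source` and `version` fields (mentioned above but not spelled out) and the `terms_of_type` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OntologyTerm:
    ontology_id: str
    term: str
    # Mentioned in the comment but omitted from its example; optional here.
    source: Optional[str] = None
    version: Optional[str] = None


@dataclass
class Characteristic:
    description: str
    # Free string here; the proposal suggests values such as
    # phenotype, organism or disease.
    type: str
    ontology_terms: List[OntologyTerm] = field(default_factory=list)


def terms_of_type(characteristics: List[Characteristic], wanted: str) -> List[str]:
    """Collect the term labels of every characteristic with the given type."""
    return [t.term
            for c in characteristics if c.type == wanted
            for t in c.ontology_terms]


characteristics = [
    Characteristic(
        description="squamous cell carcinoma, base of tongue, stage 2",
        type="phenotype",
        ontology_terms=[
            OntologyTerm("http://purl.obolibrary.org/obo/DOID_0050865",
                         "tongue squamous cell carcinoma"),
            OntologyTerm("http://purl.obolibrary.org/obo/UBERON_0006919",
                         "tongue squamous epithelium"),
            OntologyTerm("http://purl.obolibrary.org/obo/UBERON_0010033",
                         "posterior part of tongue"),
        ],
    )
]
```

Grouping terms under a typed, described characteristic keeps the context that a flat list of ontology terms would lose.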

@david4096
Member Author

Neat! That's certainly a more flexible way of describing characteristics! Could you take a look at #700? I believe it is attempting to provide a similar facility and would apply across the API. From what I can tell, the additions you've made restrict the characteristics to ontology terms and provide a description and a controlled vocabulary for the type.

@mbaudis
Member

mbaudis commented Oct 18, 2016

Following the discussions at Vancouver: closing this in favour of #725.

6 participants