RO_0002600 ('causes disease') vs. RO_0002200 ('has phenotype') for G2D associations #195

Closed
mbrush opened this issue Sep 4, 2015 · 11 comments

@mbrush
Member

mbrush commented Sep 4, 2015

In several data sources (ClinVar, OMIM, Orphanet), we use the RO_0002200 'has phenotype' relation to link variants to diseases. This doesn't seem right, and it is not in line with the definition of 'has phenotype'. Since our practice is to use 'has phenotype' in this data only when a variant causes a disease, can we switch to RO_0002600 ('capable of upregulating or causing pathological process')? It seems a good fit, given its alternative term 'causes disease'.

One potential issue is that this relation sits under the 'causal relation between material entity and a process' branch of RO, which holds properties relating material entities to processes. In our modeling, a variant is not a material entity but a generically dependent continuant (GDC) that is materialized in material DNA molecules (where 'materialized in' is shorthand for 'is concretized as' o 'inheres in'). This raises the more general question of whether we can use properties like this, which are defined for material entity subjects, to describe both (1) the relationship between material genetic variants and a disease, and (2) the relationship between a GDC sequence that is materialized in such genetic material and a disease. It would be nice to have this flexibility so that we can leverage existing formal relationships rather than create duplicate ones that apply at the GDC level. I posted a more detailed ticket on this topic in the RO tracker.

@cmungall and @mellybelly, care to comment?

@mellybelly

Some thoughts from our discussion:

Domain: genotype, genotypic part, and/or Environment
Range: disease or phenotype

correlates with
causes or contributes to
----causes
----contributes_to
--------contributes to severity/expressivity
--------contributes to penetrance/frequency
----preventative_for


Domain: disease or breed/strain
Range: phenotype
use has_phenotype for Disease -> phenotype
and for breed/strain genotype -> phenotype
but not for G or E -> phenotype
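To make the domain/range proposal above concrete, here is a minimal sketch that encodes it as a lookup table and checks whether a given assertion fits. The relation and class labels come from the discussion; the table layout and the `is_allowed` helper are hypothetical illustrations, not Dipper or RO code.

```python
# Hypothetical sketch: the domain/range proposal above as a lookup table.
# Labels come from the discussion; the table layout and is_allowed() are
# illustrative assumptions, not Dipper or RO code.

# Subject and object types shared by the G/E -> disease/phenotype relations.
G_OR_E = frozenset({"genotype", "genotypic part", "environment"})
DISEASE_OR_PHENOTYPE = frozenset({"disease", "phenotype"})

# Every relation in the correlates/causes/contributes family shares one domain and range.
CAUSAL_FAMILY = [
    "correlates with",
    "causes or contributes to",
    "causes",
    "contributes to",
    "contributes to severity/expressivity",
    "contributes to penetrance/frequency",
    "preventative for",
]

# relation label -> (allowed subject types, allowed object types)
CONSTRAINTS = {label: (G_OR_E, DISEASE_OR_PHENOTYPE) for label in CAUSAL_FAMILY}

# has phenotype is reserved for disease -> phenotype and breed/strain genotype -> phenotype;
# a generic genotype, genotypic part or environment is deliberately left out.
CONSTRAINTS["has phenotype"] = (
    frozenset({"disease", "breed/strain genotype"}),
    frozenset({"phenotype"}),
)


def is_allowed(subject_type: str, relation: str, object_type: str) -> bool:
    """Return True if the triple's types fit the proposed domain and range."""
    domain, range_ = CONSTRAINTS[relation]  # KeyError for relations outside the proposal
    return subject_type in domain and object_type in range_


if __name__ == "__main__":
    print(is_allowed("disease", "has phenotype", "phenotype"))         # True
    print(is_allowed("genotypic part", "has phenotype", "phenotype"))  # False
    print(is_allowed("genotypic part", "causes", "disease"))           # True
```

Keeping the constraints as data rather than code makes it easy to adjust a domain or range if the group revises the proposal.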

@mbrush
Member Author

mbrush commented Nov 19, 2015

Need one clarification here. RO already has a 'correlated with' property (with the sub-property 'is marker for') that is not specific to linking to 'conditions'. Did we decide to use these as they are for linking variants to conditions, or do we want to create new properties specifically for linking to conditions, like the causes relations (e.g. 'correlates with condition' and 'marker for condition', as sub-properties of the more general existing ones)? @cmungall and @nlwashington?

@nlwashington
Collaborator

I have been using those properties so far in cases where we don't have strong causal evidence linking a variant to a disease/condition.

However, it feels wrong to use them to link whole genotypes (as for strains/breeds), but maybe it's okay.

@nlwashington
Collaborator

But I think they are in a separate hierarchy from has_phenotype, so when doing the SciGraph queries I have to specify different branches of RO.
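The query problem described above can be sketched as follows: if correlational relations and has_phenotype live in separate sub-property trees, a query must expand and union every root it cares about. The edge list is a tiny hypothetical fragment, not the actual RO hierarchy, and `descendants`/`relations_for_query` are illustrative helpers rather than SciGraph API calls.

```python
# Hypothetical sketch: expanding a query over several sub-property trees.
# The edges below are a tiny illustrative fragment, not the real RO hierarchy.
from collections import defaultdict

# child property -> parent property (rdfs:subPropertyOf)
SUBPROPERTY_OF = {
    "is marker for": "correlated with",
}


def descendants(root: str, parent_of: dict[str, str]) -> set[str]:
    """Return root plus every property that is (transitively) a sub-property of it."""
    children = defaultdict(set)
    for child, parent in parent_of.items():
        children[parent].add(child)
    found, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def relations_for_query(roots: list[str], parent_of: dict[str, str]) -> set[str]:
    """Union the sub-property trees under each root the query has to cover."""
    result: set[str] = set()
    for root in roots:
        result |= descendants(root, parent_of)
    return result


if __name__ == "__main__":
    # Querying only under has phenotype misses the correlational edges.
    print(sorted(relations_for_query(["has phenotype"], SUBPROPERTY_OF)))
    # ['has phenotype']
    print(sorted(relations_for_query(["has phenotype", "correlated with"], SUBPROPERTY_OF)))
    # ['correlated with', 'has phenotype', 'is marker for']
```

A shared grouping property above both trees would let a single root cover them, which is the motivation for the grouping relation discussed later in this thread.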

@mbrush
Member Author

mbrush commented Nov 19, 2015

Also, we really didn't consider the relationship we would assert between a gene and a condition. This is not really one of causation, because the gene itself isn't causative; only specific variants of it are. Ideally we'd like to use the same relationship between genes and conditions as we do between variants and conditions, which was one of the benefits of having a very general relation like has_phenotype.

Perhaps we could create a similarly generic property, has_condition, and place the new causation properties beneath it (in addition to placing them under the causal relationship hierarchy). Then at least the gene-condition relation will be a direct ancestor of the variant-condition causation properties.

Something like:

has condition (new property - very generic relation that can hold between a gene and condition)
---causes or contributes to condition (new property - used for variants but not genes)
------causes condition (new property)
------contributes to condition (new property)
----------contributes to severity of condition (new property)
----------contributes to frequency of condition (new property)
------preventative for condition (new property)

and this property hierarchy would also live under causally related to as originally proposed:

causally related to (exists already)
---causes or contributes to condition (new property)
------causes condition (new property)
------contributes to condition (new property)
----------contributes to severity of condition (new property)
----------contributes to frequency of condition (new property)
------preventative for condition (new property)
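The dual placement proposed above gives 'causes or contributes to condition' two parents, so the property graph becomes a DAG rather than a tree. Under standard rdfs:subPropertyOf semantics (every triple using a sub-property implies the same triple with its super-property), a variant-level assertion then also answers a query on the generic 'has condition' relation. A minimal sketch, assuming plain labels as property identifiers; the triples ("variant X", "gene G", "disease Y") and helper names are hypothetical.

```python
# Hypothetical sketch of the proposed condition properties, with the dual
# parentage from the comment above. Triples and helper names are illustrative.

# property -> set of direct super-properties (rdfs:subPropertyOf)
PARENTS = {
    "causes or contributes to condition": {"has condition", "causally related to"},
    "causes condition": {"causes or contributes to condition"},
    "contributes to condition": {"causes or contributes to condition"},
    "contributes to severity of condition": {"contributes to condition"},
    "contributes to frequency of condition": {"contributes to condition"},
    "preventative for condition": {"causes or contributes to condition"},
}


def ancestors(prop: str) -> set[str]:
    """All super-properties of prop, following every parent (a DAG, not a tree)."""
    seen: set[str] = set()
    stack = [prop]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def entailed(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Add the super-property triples that subPropertyOf implies for each asserted triple."""
    out = set(triples)
    for s, p, o in triples:
        for sup in ancestors(p):
            out.add((s, sup, o))
    return out


def subjects_with(triples: set[tuple[str, str, str]], prop: str, obj: str) -> set[str]:
    """Subjects linked to obj by prop, after sub-property entailment."""
    return {s for s, p, o in entailed(triples) if p == prop and o == obj}


if __name__ == "__main__":
    asserted = {
        ("variant X", "contributes to severity of condition", "disease Y"),  # hypothetical
        ("gene G", "has condition", "disease Y"),                           # hypothetical
    }
    # One query on the generic relation returns both the gene- and variant-level links.
    print(sorted(subjects_with(asserted, "has condition", "disease Y")))
    # ['gene G', 'variant X']
```

The same variant triple is also entailed under 'causally related to', so existing queries over the causal branch keep working.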

_Note that it is entirely possible that I am overthinking this, and we can agree to go ahead and use the original causation/contribution relations between genes and conditions without worrying about the nuances outlined above._

@cmungall
Member

I think having a grouping relation is fine

@mbrush
Member Author

mbrush commented Nov 19, 2015

OK. For now I will hold off on implementing the generic/grouping relation because it is not needed immediately. I want to consider further whether we really need it; perhaps we can just use 'causes or contributes to condition' to link genes to conditions, in which case we may not need this more generic relation. For now I will also create a 'correlated with condition' sub-property of the existing 'correlated with' property, for consistency and grouping with the causes-condition properties.

@mbrush
Member Author

mbrush commented Dec 3, 2015

This week the question arose of how to handle variants that are risk factors, i.e. that increase susceptibility to a disease.
Is a new property needed here ('contributes to susceptibility to condition' or 'increases susceptibility for condition'), or can we use the existing 'contributes to frequency of condition' property?

Condition frequency is a population-level concept, while risk factor/susceptibility is an individual-level concept, although increased susceptibility in individuals would increase the frequency in a population. Still, I think a new relation is needed to describe the individual-level concept, whereby a variant makes an individual more likely to get a disease but doesn't directly and deterministically cause it. Thoughts @mellybelly, @cmungall, @nlwashington?

@pnrobinson
Member

'Contributes to frequency of condition' definitely seems wrong. On the other hand, we will probably never assert that a particular variant in a given individual is causal for a common complex disease, since we will probably never know enough. That is, I think it would be wrong to say

var 1 contributes to susceptibility in individual A
var 2 contributes to susceptibility in individual A
var 3 contributes to susceptibility in individual A
...
var 257 contributes to susceptibility in individual A

since in the end we do not know whether it was a combination of some subset of those variants, some environmental exposure, and some stochasticity. So maybe we are just annotating these variants at the population level anyway. I do not like the relation 'contributes to frequency of condition' at all, and wonder whether we can come up with something better that says 'is a risk factor for'.

-Peter

@mbrush
Member Author

mbrush commented Dec 4, 2015

I agree. I am not a fan of the 'contributes to frequency of condition' relation either, and would prefer to replace/redefine it as a relation about susceptibility. In my mind, contributing to susceptibility to a condition is the same as being a risk factor for it, so my proposal is to rename/replace 'contributes to frequency of condition' with 'contributes to susceptibility to' (with alternative labels 'is risk factor for condition' and 'increases susceptibility to condition'). Any variant that increases susceptibility to a condition in an individual would increase the frequency of the condition in a population, so this relation could potentially describe variant-condition associations at either level.

In any case, I would like to explore some real data use cases around this new hierarchy of properties to see how it works before committing to final decisions here.

@kshefchek
Contributor

Replaced by #254.
