This repository has been archived by the owner on Dec 13, 2019. It is now read-only.

Variant representation #10

Open
pnrobinson opened this issue Nov 12, 2015 · 11 comments

Comments

@pnrobinson

We should discuss how best to represent variants. We probably need something flexible, like HGVS:

NM_123:c.-123C>T

with various types that also work for chromosomal abnormalities, microdeletions, and other kinds of findings, such as protein biomarkers, so that this standard can be used for a wide range of diseases and publications.
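To make the idea of typed positions concrete, here is a minimal Python sketch that parses the example above. It assumes only the simplest case: a single-nucleotide substitution in coding-DNA (`c.`) notation on an accession such as `NM_123`. The `Position` and `CodingSubstitution` names are hypothetical, and the regular expression covers a tiny fraction of the HGVS grammar (no deletions, duplications, insertions, or intronic offsets).

```python
import re
from dataclasses import dataclass

# Hypothetical pattern for simple coding-DNA substitutions only, e.g.
# "NM_123:c.-123C>T". Real HGVS has many more variant types.
_HGVS_C_SUBST = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+(?:\.\d+)?)"  # reference sequence, e.g. NM_123
    r":c\.(?P<position>-?\d+)"                  # negative = upstream of the start codon
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"         # reference base > alternate base
)


@dataclass(frozen=True)
class Position:
    """Hypothetical typed position, mirroring a {type, value} pair."""
    type: str
    value: str


@dataclass(frozen=True)
class CodingSubstitution:
    accession: str
    position: int
    ref: str
    alt: str


def parse_hgvs_substitution(pos: Position) -> CodingSubstitution:
    """Parse a typed HGVS position; raise ValueError for anything unsupported."""
    if pos.type != "HGVS":
        raise ValueError(f"unsupported position type: {pos.type}")
    m = _HGVS_C_SUBST.match(pos.value)
    if m is None:
        raise ValueError(f"not a simple c. substitution: {pos.value}")
    return CodingSubstitution(
        accession=m["accession"],
        position=int(m["position"]),
        ref=m["ref"],
        alt=m["alt"],
    )


print(parse_hgvs_substitution(Position("HGVS", "NM_123:c.-123C>T")))
```

Keeping the `type` field separate from the `value` is what would let other notations (ISCN, biomarker identifiers) sit alongside HGVS without changing the surrounding structure.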

@cmungall
Member

cmungall commented Feb 4, 2016

Apologies, the commit above appears to be unrelated.

This is what we have as an example:

schema: phenopacket-level-1
comment: This is an example phenopacket containing one variant to phenotype association
ontologies:
  - id: hp
    version: "2016-02-01"
variants:
  - id: _:v1
    positions:
      - type: HGVS
        value: "NM_123:c.-123C>T"
phenotype_profile:
  - entity: _:v1
    evidence:
      type: TAS
      source:
        id: PMID:FAKE1234
        title: Mutations in NM_123 cause multisystem proteinopathy and ALS
    phenotype:
      type:
        id: HP:0003560
        label: Muscular dystrophy
      onset:
        type:
          id: HP:0003584
          label: Late onset
      description: blah blah
    created: 2016-01-14
    contributors:
      - id: ORCID:nnnn-nnnn-nnnn

On the one hand, this is scope creep. On the other hand, it is very useful in practice. The approach is to be modular: the variant part is separable, so it can be represented externally and referenced, or embedded directly. The same approach applies to ped (pedigree) data.
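A minimal Python sketch of the modular idea, assuming the YAML above has already been loaded into a dict (transcribed and trimmed here so the example needs only the standard library). The convention that a string `entity` is a reference to a variant `id` while a dict `entity` is an embedded variant is an assumption for illustration; resolving variants stored in a separate file is out of scope.

```python
from typing import Any

# The example phenopacket, transcribed to a Python dict and trimmed.
packet: dict[str, Any] = {
    "schema": "phenopacket-level-1",
    "variants": [
        {"id": "_:v1", "positions": [{"type": "HGVS", "value": "NM_123:c.-123C>T"}]},
    ],
    "phenotype_profile": [
        {
            "entity": "_:v1",
            "phenotype": {"type": {"id": "HP:0003560", "label": "Muscular dystrophy"}},
        },
    ],
}


def resolve_entity(packet: dict[str, Any], entity: Any) -> dict[str, Any]:
    """Return the variant an association points at.

    Hypothetical convention: a dict is a variant embedded inline; a string is
    a reference to an id in the packet's top-level `variants` list.
    """
    if isinstance(entity, dict):
        return entity
    index = {v["id"]: v for v in packet.get("variants", [])}
    try:
        return index[entity]
    except KeyError:
        raise KeyError(f"unresolved entity reference: {entity}") from None


for assoc in packet["phenotype_profile"]:
    variant = resolve_entity(packet, assoc["entity"])
    print(variant["positions"][0]["value"], "->", assoc["phenotype"]["type"]["label"])
```

Because callers always go through one resolver, switching a packet between referenced and embedded variants does not change the code that reads the associations.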

@cmungall
Member

cmungall commented Mar 8, 2016

Can someone take a shot at making some fake examples? We will derive the model from them.

@tudorgroza

@cmungall @pnrobinson @jmcmurry: Why are we not adapting the MME schema for variants? It is fairly comprehensive and would enable PXF to be aligned with it. If you agree, I can have a first stab at implementing it.

@cmungall
Member

cmungall commented Apr 6, 2016

Can you have a go at a PR on the reference implementation?

There is also the main GA4GH variant representation. But why don't you take a first pass at a PR on the reference implementation?

(In reply to @tudorgroza, 5 Apr 2016, 21:18.)

@pnrobinson
Author

Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities: say, a paper about a protein biomarker and some disease, or ISCN, glycomics, and metabolomics. That might be a lot for v1.
cheers Peter


@cmungall
Member

cmungall commented Apr 6, 2016

On 5 Apr 2016, at 22:27, Peter Robinson wrote:

> Tudor and I just discussed this. I would suggest that we design the format to be easily extensible to other lab abnormalities - say a paper about a protein biomarker and some disease. Or ISCN, glycomics, and metabolomics. Might be a lot for v1

I'm not totally following the relevance to this ticket (other than ISCN).

Just a clarifying note about versions and levels. These are, in theory, orthogonal. Think of OWL profiles and OWL versions, or of GO slims and GO versions. Version updates will be about clarifying semantics, improvements not related to expressivity, etc., and should stabilize a bit after v1. Levels are more like profiles or subsets.

Having said that, since we switched to JSON Schema, everything is rolled into the same level. It's actually easier to make the more complete model first and then think about the kinds of profiles we would derive from it. It's also likely that we won't be able to capture everything in v1, and some of the higher-level material will appear in future versions. But that is just a caution against equating versions with expressivity and flexibility.

Let's capture some of these requirements, e.g. glycomics, in separate tickets.

@tudorgroza

@cmungall : Ok. Can you please have a look at the current PR I've put in?

@cmungall
Member

cmungall commented Apr 6, 2016

Thanks!

So Association was originally conceived of as an association between a thing (like a person, disease, or variant) and an ontological description of that thing. Of course it makes perfect sense to genericise this somewhat for person-variant associations, but I'll need to think it through to make sure that no assumptions are broken. But this can happen later.

(In reply to @tudorgroza, 5 Apr 2016, 23:23.)
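A hypothetical Python sketch of the genericised association discussed above: the subject is any entity identifier, and the target is either an ontology class (the original case) or another entity (for example, a person-variant link). All class and field names, and the `patient:1` identifier, are illustrative assumptions, not the reference implementation's model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyClass:
    id: str
    label: str = ""


@dataclass(frozen=True)
class Association:
    """Hypothetical generic association between a subject and a target.

    subject: identifier of a person, disease, variant, ...
    target:  an OntologyClass (the original "ontological description" case)
             or another entity identifier (e.g. a person-variant link).
    """
    subject: str
    target: OntologyClass | str

    def is_ontological(self) -> bool:
        """True for the original case, where the target is an ontology term."""
        return isinstance(self.target, OntologyClass)


phenotype = Association("_:v1", OntologyClass("HP:0003560", "Muscular dystrophy"))
carrier = Association("patient:1", "_:v1")  # hypothetical person identifier
print(phenotype.is_ontological(), carrier.is_ontological())
```

Code that relied on the original assumption (the target is always an ontology term) can guard on `is_ontological()`, which is one way to keep that assumption from being silently broken.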

@tudorgroza

Thanks. I'll add it to PA and see what other things are missing.

@julesjacobsen
Contributor

julesjacobsen commented May 6, 2016

I think we should leave out the HGVS description: it only applies to humans, and we want this to be more generic than that.

Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.

We also need the ability to link out to other sources, e.g. VCF files. Probably a simple URI will suffice?
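A sketch, in Python, of a minimal variant record in the spirit of the GA4GH suggestion above, with an optional URI for linking out to a source file such as a VCF. The field names, the 0-based half-open coordinates, the SNP/indel classification rule, and the example file URI are all assumptions for illustration, not a copy of the GA4GH or MME schemas.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Illustrative variant record; field names are assumptions."""
    reference_name: str            # e.g. chromosome "1"
    start: int                     # 0-based, inclusive
    end: int                       # 0-based, exclusive
    reference_bases: str
    alternate_bases: str
    source_uri: str | None = None  # optional link out, e.g. to a VCF file

    def __post_init__(self) -> None:
        # With half-open coordinates, the span must match the reference allele.
        if self.end - self.start != len(self.reference_bases):
            raise ValueError("end - start must equal len(reference_bases)")
        if self.reference_bases == self.alternate_bases:
            raise ValueError("alternate allele must differ from reference")

    def kind(self) -> str:
        """Classify as 'SNP', 'indel', or 'other'."""
        if len(self.reference_bases) == 1 and len(self.alternate_bases) == 1:
            return "SNP"
        if len(self.reference_bases) != len(self.alternate_bases):
            return "indel"
        return "other"  # e.g. multi-nucleotide substitutions; out of scope here


snv = Variant("1", 100, 101, "C", "T")
# Hypothetical file URI, standing in for a link to the originating VCF.
deletion = Variant("1", 100, 103, "CAG", "C", source_uri="file:///data/sample.vcf")
print(snv.kind(), deletion.kind())
```

Keeping the link out as a plain optional string leaves the record self-contained while still pointing back to richer source data when it exists.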

@cmungall
Member

cmungall commented May 6, 2016

On 6 May 2016, at 3:51, Jules Jacobsen wrote:

> Also I think we should follow the GA4GH variant schema more closely. The MME one is pretty closely aligned to this anyway. We'll only be able to capture SNPs and indels, but that's the current state of things.

That's fine - we will use a genotype object for other scenarios
