Request for comments: should PEDIGREE VCF header lines be required to have an ID field? #96

Closed
droazen opened this issue Jul 22, 2015 · 4 comments


droazen commented Jul 22, 2015

The proposed VCF 4.3 spec (#88) mandates that "All structured lines that have their value enclosed within ”<>” require an ID which must be unique within their type." This new requirement necessitates a change to ##PEDIGREE header lines, which now require an ID. E.g.:

    ##PEDIGREE=<ID=PedigreeID,Name_0=G0-ID,Name_1=G1-ID,...,Name_N=GN-ID>
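
To make the quoted rule concrete, here is a rough Python sketch of a check for it: every structured line needs an ID, and IDs must be unique within a line type. The function name `check_structured_ids` and both regular expressions are illustrative assumptions, not part of the spec or of any existing tool, and the ID lookup is deliberately naive (it does not handle an `ID=` that happens to appear inside a quoted Description).

```python
import re
from collections import defaultdict

# Structured meta lines look like ##TYPE=<key=value,...>; capture TYPE and the body.
STRUCTURED = re.compile(r"^##(\w+)=<(.*)>$")
# Naive ID lookup (assumption: no quoted value contains ",ID=").
ID_FIELD = re.compile(r"(?:^|,)ID=([^,]+)")


def check_structured_ids(header_lines):
    """Return a list of problems: missing IDs and IDs repeated within one line type."""
    problems = []
    seen = defaultdict(set)  # line type -> IDs already encountered
    for line in header_lines:
        match = STRUCTURED.match(line.strip())
        if match is None:
            continue  # unstructured line such as ##fileformat=VCFv4.3
        line_type, body = match.groups()
        id_match = ID_FIELD.search(body)
        if id_match is None:
            problems.append(f"{line_type}: missing ID")
        elif id_match.group(1) in seen[line_type]:
            problems.append(f"{line_type}: duplicate ID {id_match.group(1)}")
        else:
            seen[line_type].add(id_match.group(1))
    return problems
```

Note that the same ID may appear under two different line types (say, SAMPLE and PEDIGREE) without being flagged, since uniqueness is only required within a type.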

Since this breaks backwards compatibility for the sake of consistency within the spec, are there any objections to making this change?

@abecasis

I think we may be able to accomplish the change in a way that is backward compatible (or at least more so).

Specifically, we could change the pedigree tag to look like this:

a) To indicate a clonal relationship (e.g. between tumor and germline sample):

    ##PEDIGREE=<ID=TumorSampleID,Original=GermlineID>

[or, to specify two samples derived from the same germline]

    ##PEDIGREE=<ID=TumourSample,Original=GermlineID>
    ##PEDIGREE=<ID=SomaticNonTumour,Original=GermlineID>

b) To specify a family relationship for diploids, like humans:

    ##PEDIGREE=<ID=ChildID,Father=FatherID,Mother=MotherID>

c) And the arbitrary case would become:

    ##PEDIGREE=<ID=SampleID,Name_1=Ancestor_1,...,Name_N=Ancestor_N>

We should be able to describe any relationship and should probably make clear that, to fully specify the pedigree, the VCF can define relationships between IDs that are not present in the VCF (but are ancestors of others that are).

So, in short, the basic idea is to replace Name_0 (typically, Child or Derived) in the original definition with ID. Since each Child or Derived sample ID should be unique, this should be seamless.
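
As a minimal sketch of that replacement, the Python function below renames the first key of an old-style ##PEDIGREE line to ID and leaves everything else untouched. The name `pedigree_to_id_form` is hypothetical, and the sketch assumes that the first key on the line plays the role of Name_0 (Child or Derived), which an older file is not guaranteed to follow.

```python
def pedigree_to_id_form(line):
    """Rename the first key of a ##PEDIGREE line to ID; other fields are unchanged."""
    prefix, suffix = "##PEDIGREE=<", ">"
    line = line.strip()
    if not (line.startswith(prefix) and line.endswith(suffix)):
        raise ValueError(f"not a PEDIGREE header line: {line!r}")
    body = line[len(prefix):-len(suffix)]
    # Split off the first key=value pair; `rest` keeps the remaining fields verbatim.
    first, sep, rest = body.partition(",")
    key, eq, value = first.partition("=")
    if not eq:
        raise ValueError(f"first field is not key=value: {first!r}")
    if key == "ID":
        return line  # already in the proposed form
    # Assumption: the first key is Name_0 (e.g. Child or Derived).
    return f"{prefix}ID={value}{sep}{rest}{suffix}"
```

For example, `pedigree_to_id_form("##PEDIGREE=<Child=ChildID,Father=FatherID,Mother=MotherID>")` gives the line shown in case (b) above.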

Goncalo

Member

pd3 commented Jul 23, 2015

@abecasis I like this, +1 from me

pd3 added a commit that referenced this issue Jul 29, 2015
Plus minor change of wording in ALT * description, "overlapping" rather
than "upstream" deletion.
Member

pd3 commented Jul 29, 2015

As there were no other comments, I made the change and will close the issue now.
Thanks

pd3 closed this as completed Jul 29, 2015

Contributor

heuermh commented Aug 3, 2016

Sorry to comment on an old closed issue.

From my reading of the specification, the pedigree meta lines record "relationships between genomes", and as there is "a distinction between sample and genome" supported by the sample meta line, I understood the IDs used in pedigree meta lines to refer to Genomes:

    ##SAMPLE=<ID=Blood,Genomes=Germline,Mixture=1.,Description="Patient germline genome">
    ##SAMPLE=<ID=TissueSample,Genomes=Germline;Tumor,Mixture=.3;.7,Description="Patient germline genome;Patient tumor genome">

I.e. Germline and Tumor above.

This change still refers to genomes in the text, but the examples appear to use sample IDs instead (Blood and TissueSample above).

Plus, if "the VCF can define relationships between IDs that are not present in the VCF" that should be mentioned in the specification.
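
To illustrate the ambiguity, here is a rough Python sketch that labels each name used in the ##PEDIGREE lines as a genome (listed in some Genomes field), a sample (a ##SAMPLE ID), or undeclared (neither). The helper names `parse_fields` and `classify_pedigree_names` are made up for this example, and the field parser is a simplification that assumes quoted values contain no embedded double quotes.

```python
import re

# key=value pairs; a value is either a quoted string or runs up to the next comma.
FIELD = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_fields(line):
    """Return the (key, value) pairs of a structured header line, quotes stripped."""
    body = line.strip().split("=<", 1)[1].rstrip(">")
    return [(key, value.strip('"')) for key, value in FIELD.findall(body)]


def classify_pedigree_names(header_lines):
    """Map each name in the PEDIGREE lines to 'genome', 'sample' or 'undeclared'."""
    sample_ids, genomes, names = set(), set(), []
    for line in header_lines:
        if line.startswith("##SAMPLE=<"):
            fields = dict(parse_fields(line))
            if "ID" in fields:
                sample_ids.add(fields["ID"])
            # Genomes is a semicolon-separated list, e.g. Germline;Tumor
            genomes.update(g for g in fields.get("Genomes", "").split(";") if g)
        elif line.startswith("##PEDIGREE=<"):
            names.extend(value for _, value in parse_fields(line))
    result = {}
    for name in names:
        if name in genomes:
            result[name] = "genome"
        elif name in sample_ids:
            result[name] = "sample"
        else:
            result[name] = "undeclared"
    return result
```

With the two ##SAMPLE lines above, a pedigree line written in terms of Tumor and Germline would come back as genomes, while one written in terms of TissueSample and Blood would come back as samples.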

zaeleus added a commit to zaeleus/noodles that referenced this issue Sep 16, 2023
When the input is VCF 4.2, this allows the `Child` or `Derived` field to
act as the record ID in the value collection.

See samtools/hts-specs#96 for the reasoning behind this definition.

Closes #201 and closes #202.