Clarify when VCF meta entries are mandatory #67

Closed
cyenyxe opened this issue Feb 25, 2015 · 6 comments

@cyenyxe
Member

cyenyxe commented Feb 25, 2015

The specification states the following:

It is strongly encouraged that information lines describing the INFO, FILTER and FORMAT entries used in the body of the VCF file be included in the meta-information section. Although they are optional, if these lines are present then they must be completely well-formed.

However, vcftools at least raises an error when they are not present. I found this in the code (http://sourceforge.net/p/vcftools/code/HEAD/tree/trunk/perl/Vcf.pm#l2366), not from the console output. What would be the preferred behaviour?

For ALT metadata describing alleles such as <DEL> or <INS>, nothing is stated, and vcftools doesn't raise an error if they are not described. Should I assume these are optional too? In any case, if some are mandatory but others aren't, it would be worth highlighting this difference.

@jmarshall jmarshall added the vcf label May 8, 2015
@eitanbanks

My vote is that they be treated just like any other header lines: they are optional but if present must be well-formed.
Note that tools may still choose to require them (and will exit with a friendly error saying "sorry, but we won't process your VCF until it has them included"). But the spec doesn't need to enforce that.
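
For illustration, a tool that chooses to require these lines could detect the gap roughly as follows. This is a minimal sketch in Python, not how vcftools or any other existing tool does it. The function name `undefined_tags` is hypothetical, the regular expression assumes `ID` is the first attribute of each structured line, and treating `PASS` as implicitly declared is an assumption of this sketch.

```python
import re

# Matches structured header lines such as ##INFO=<ID=DP,Number=1,...>
# Assumption: ID is the first attribute inside the angle brackets.
HEADER_ID = re.compile(r"^##(INFO|FILTER|FORMAT)=<ID=([^,>]+)")


def undefined_tags(vcf_text):
    """Return {kind: set of tags} used in the body but not declared in the header."""
    # Assumption: PASS is treated as implicitly declared.
    declared = {"INFO": set(), "FILTER": {"PASS"}, "FORMAT": set()}
    used = {"INFO": set(), "FILTER": set(), "FORMAT": set()}
    for line in vcf_text.splitlines():
        if line.startswith("##"):
            match = HEADER_ID.match(line)
            if match:
                declared[match.group(1)].add(match.group(2))
            continue
        if line.startswith("#") or not line.strip():
            continue  # column header line or blank line
        cols = line.split("\t")
        if len(cols) < 8:
            continue  # not a complete data line; ignored in this sketch
        # FILTER (column 7): semicolon-separated names, "." means missing
        used["FILTER"].update(f for f in cols[6].split(";") if f and f != ".")
        # INFO (column 8): semicolon-separated "key" or "key=value" entries
        used["INFO"].update(
            e.split("=", 1)[0] for e in cols[7].split(";") if e and e != "."
        )
        # FORMAT (column 9, optional): colon-separated keys
        if len(cols) > 8:
            used["FORMAT"].update(k for k in cols[8].split(":") if k and k != ".")
    return {kind: used[kind] - declared[kind] for kind in used}
```

A tool could then exit with a friendly error when any of the returned sets is non-empty, or simply warn if it prefers to follow the "optional" reading of the spec.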

@pd3
Member

pd3 commented May 21, 2015

@eitanbanks +1 from me.

However, to make it clear that we do not encourage undefined tags, I included the following sentence in the draft:

Note that BCF, the binary counterpart of VCF, requires that all entries are
present, therefore the use of undefined tags is strongly discouraged.

@cyenyxe
Member Author

cyenyxe commented May 21, 2015

Thank you both for your answers. I agree with the "optional but well-formed if present" idea. The BCF note is also adequate, but I still find the previous paragraph confusing.

If I can make a suggestion, I would write something like the following. The order is somewhat changed to refer first to meta-information lines in general, then recommendations for specific tags.

File meta-information is included after the ## string and must be key=value
pairs. Meta-information lines are optional, but if they are present then
they must be completely well-formed {\color{red} and their ID must be
unique within their type.} Note that BCF, the binary counterpart of VCF, requires
that all entries are present.

It is strongly encouraged to include meta-information lines describing the \texttt{INFO},
\texttt{FILTER}, \texttt{FORMAT} and {\color{red}\texttt{contig}} entries used
in the body of the VCF file.

Meta-information lines can be in any order with the exception of fileformat
which must come first.

Please let me know your opinions, and if I should make a PR for this.

@pd3
Member

pd3 commented May 21, 2015

I think that reads better, thank you. There was a proposal discussed at length offline to make the ID attribute mandatory for all structured values. With this addition, the section would read like this:

File meta-information is included after the ## string and must be key=value
pairs. Meta-information lines are optional, but if they are present then
they must be completely well-formed. {\color{red} Note that BCF, the binary
counterpart of VCF, requires that all entries are present. It is strongly
encouraged to include meta-information lines describing the entries used in the
body of the VCF file.

All structured lines that have their value enclosed within "$<>$" require an ID
which must be unique within their type.

Meta-information lines can be in any order with the exception of fileformat
which must come first.}
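
As a rough illustration, the three rules in the draft above (key=value pairs, an ID unique within its type for structured lines, and fileformat first) could be checked as in the following Python sketch. The function name `check_meta_lines` and the wording of the messages are hypothetical. Quoted strings are blanked out before looking for `ID=`, which assumes that quotes inside descriptions are escaped with a backslash.

```python
import re

# Blanks out quoted strings so an "ID=" inside a Description is not picked up.
QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"')
# Finds the ID attribute at the start of the value or right after a comma.
ID_ATTR = re.compile(r"(?:^|,)ID=([^,]+)")


def check_meta_lines(lines):
    """Return a list of problems found in the ## meta-information lines."""
    problems = []
    seen = {}  # key (e.g. "INFO") -> set of IDs already seen for that key
    meta = [line for line in lines if line.startswith("##")]
    if meta and not meta[0].startswith("##fileformat="):
        problems.append("fileformat must be the first meta-information line")
    for line in meta:
        key, sep, value = line[2:].partition("=")
        if not sep or not key:
            problems.append(f"not a key=value pair: {line}")
            continue
        if value.startswith("<") and value.endswith(">"):
            # Structured line: the value is enclosed in <>
            body = QUOTED.sub('""', value[1:-1])
            match = ID_ATTR.search(body)
            if match is None:
                problems.append(f"structured line without ID: {line}")
            elif match.group(1) in seen.setdefault(key, set()):
                problems.append(f"duplicate {key} ID: {match.group(1)}")
            else:
                seen[key].add(match.group(1))
    return problems
```

Note that uniqueness is tracked per key, so `##INFO=<ID=DP,...>` and `##FILTER=<ID=DP,...>` can coexist, which matches the "unique within their type" wording.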

@cyenyxe
Member Author

cyenyxe commented May 21, 2015

Looks good to me :)

pd3 added a commit that referenced this issue Jun 2, 2015
- meta-information lines must be key=value pairs (#67)
- an ID attribute is required in structured header lines, unique within its type
- the above point newly requires ID in the reserved PEDIGREE tag
- new reserved AD, ADF, and ADR FORMAT and INFO fields added, resolves #78
- reorder list of INFO and FORMAT tags alphabetically
- removed UNICODE-characters-not-supported sentence from BCF specification, in partial response to #65
@pd3
Member

pd3 commented Jun 11, 2015

Since there were no further comments, I am closing the issue as solved.

@pd3 pd3 closed this as completed Jun 11, 2015