Acceptable values for confidence interval tags #224

Open
Anishka0107 opened this issue Jul 4, 2017 · 6 comments

@Anishka0107

Anishka0107 commented Jul 4, 2017

I was curious as to what values are acceptable for confidence interval fields like CIPOS in Structural Variants.

The fields involving confidence intervals are CIPOS, CIEND, CILEN, CICN and CICNADJ.
For example, CIEND should contain two integers, and in the example in the spec the first value is negative and the second one positive.

So, what range of values is acceptable for these two integers? Can they be 0 too? Also, can they have equal values?

@d-cameron
Contributor

d-cameron commented Sep 12, 2017

> So, what range of values is acceptable for these two integers?

Technically, the specification doesn't place any restrictions on CIPOS other than that it consist of two numbers, so a confidence interval of CIPOS=10,-10, whilst not having a meaningful interpretation, is technically compliant.

> Can they be 0 too? Also, can they have equal values?

A CIPOS with a (non-zero) positive start offset or a negative end offset doesn't really make sense, since the called variant position would then lie outside its own confidence interval.
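To make that concrete, here is a minimal sketch (Python) that turns CIPOS offsets into an absolute interval around POS and checks whether POS falls inside it. It assumes the two values are integer offsets relative to POS, as in the spec's example; the function names `ci_to_interval` and `pos_in_ci` are hypothetical, not part of any library.

```python
def ci_to_interval(pos: int, ci: tuple[int, int]) -> tuple[int, int]:
    """Convert relative CI offsets (e.g. CIPOS=-10,5) into absolute coordinates."""
    lo, hi = ci
    return pos + lo, pos + hi


def pos_in_ci(pos: int, ci: tuple[int, int]) -> bool:
    """True if the called position lies inside its own confidence interval.

    This is equivalent to lo <= 0 <= hi, i.e. first offset non-positive and
    second offset non-negative.
    """
    start, end = ci_to_interval(pos, ci)
    return start <= pos <= end


# CIPOS=-10,10 around POS=1000 covers 990..1010 and contains the call.
print(ci_to_interval(1000, (-10, 10)))  # (990, 1010)
print(pos_in_ci(1000, (-10, 10)))       # True
# CIPOS=10,-10 is two integers, but the "interval" 1010..990 is empty.
print(pos_in_ci(1000, (10, -10)))       # False
# A positive start offset puts the call outside the interval.
print(pos_in_ci(1000, (5, 20)))         # False
```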

@thefferon

This ticket highlights the need for the VCF spec to be more explicit in its definition and treatment of CIPOS and CIEND. For the values of these fields to make sense (as @d-cameron points out), there is some basic logic that must be followed. That logic should be spelled out in the spec. The following is a proposal ('CI*' means 'CIPOS and/or CIEND'):

  • CI* should always consist of two integers separated by a comma
  • the first integer must be <=0 and the second integer must be >=0
  • the only case in which both values can be equal is "(0,0)". But since "(0,0)" effectively indicates there is no confidence interval, the only time it could be used in a meaningful way is to emphasize that a particular breakpoint is known exactly, e.g., when most of the other breakpoints in the dataset to which it belongs are NOT known exactly (i.e. have confidence intervals consisting of non-zero values)
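The three proposed rules are mechanical enough to check automatically. Below is a minimal sketch of a validator for a single CI* INFO value, assuming the value arrives as the raw string after `CIPOS=` or `CIEND=`; `parse_ci`, `check_ci` and `is_exact` are hypothetical names, not an existing VCF library API.

```python
def parse_ci(value: str) -> tuple[int, int]:
    """Parse a CI* INFO value such as "-10,10" into two integers.

    Raises ValueError unless the value is exactly two comma-separated integers.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected two comma-separated integers, got {value!r}")
    return int(parts[0]), int(parts[1])


def check_ci(value: str) -> list[str]:
    """Return the problems with a CI* value under the proposed rules (empty if OK)."""
    try:
        lo, hi = parse_ci(value)
    except ValueError as err:
        return [str(err)]
    problems = []
    if lo > 0:
        problems.append(f"first value {lo} must be <= 0")
    if hi < 0:
        problems.append(f"second value {hi} must be >= 0")
    return problems


def is_exact(value: str) -> bool:
    """(0,0) is the only equal pair allowed: the breakpoint is known exactly."""
    return parse_ci(value) == (0, 0)


print(check_ci("-10,10"))  # []
print(check_ci("10,-10"))  # ['first value 10 must be <= 0', 'second value -10 must be >= 0']
print(check_ci("0,0"))     # []
print(check_ci("-5"))      # ["expected two comma-separated integers, got '-5'"]
```

Note that the third rule needs no separate check: if the first value is <= 0, the second is >= 0 and the two are equal, both must be 0.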

Another important aspect of CI* that needs to be explicitly defined in the spec is: what exactly do the values in CI* represent? Are they the boundaries of a statistical 95% confidence interval? Are they "hard boundaries"? See the related discussion in #132.

@cwhelan

cwhelan commented Oct 24, 2017

I just want to mention that clarifications/modifications to the CIPOS/CIEND and IMPRECISE tags in the spec are being discussed in the conversation around #231, which is being moderated by @thefferon. Anyone interested in this issue should probably comment on that PR and/or contact @thefferon to join the detailed discussion of the issues raised there.

@droazen

droazen commented Oct 24, 2017

Closing in favor of #231

@droazen droazen closed this as completed Oct 24, 2017
@cyenyxe cyenyxe added this to the VCF v4.3 final revision milestone Feb 8, 2018
@cyenyxe
Member

cyenyxe commented Mar 1, 2018

#231 was closed without resolution, and this still needs to be clarified in #266. Reopening this issue.

@cyenyxe cyenyxe reopened this Mar 1, 2018
@d-cameron
Contributor

Given we've reopened this, we should also explicitly state what CIPOS/IMPRECISE and CIPOS/HOMSEQ mean. I/GRIDSS write CIPOS/IMPRECISE when there is only read-pair (RP) support and the exact position is not known, and write CIPOS/HOMSEQ when the exact breakpoint sequence is known but there is microhomology at the breakpoint.

We should either:

  1. explicitly state these two alternate meanings of CIPOS

  2. define an additional field (HOMPOS?) for the latter and reserve CIPOS purely for imprecise calls.
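The second meaning (CIPOS/HOMSEQ) describes an interval that is exact rather than statistical: with microhomology, every position inside the homology yields the same haplotype. A minimal sketch of how such an interval can be derived for a simple deletion is below. This is a generic left/right-shift comparison, not GRIDSS's actual implementation; `homology_interval` is a hypothetical name, and HOMPOS is only the field proposed in option 2, so whether it would use the same sign convention as CIPOS is an assumption.

```python
def homology_interval(ref: str, start: int, end: int) -> tuple[int, int, str]:
    """Return (left, right, homseq) for a deletion of ref[start:end].

    Coordinates are 0-based and half-open. Shifting the deletion i+1 bases
    right gives the same haplotype while ref[start+i] == ref[end+i]; shifting
    left works the same way on the bases before the deletion. Offsets use the
    CIPOS-style sign convention (left <= 0 <= right). Under option 1 these
    offsets would go in CIPOS next to HOMSEQ; under option 2 in the
    hypothetical HOMPOS field.
    """
    if not 0 <= start < end <= len(ref):
        raise ValueError("need 0 <= start < end <= len(ref)")
    right = 0
    while end + right < len(ref) and ref[start + right] == ref[end + right]:
        right += 1
    left = 0
    while start - left - 1 >= 0 and ref[start - left - 1] == ref[end - left - 1]:
        left += 1
    homseq = ref[start - left:start + right]
    return -left, right, homseq


ref = "GGACTTTTACTCC"
# Deleting ref[2:8] ("ACTTTT") and ref[5:11] ("TTTACT") both leave "GGACTCC".
print(homology_interval(ref, 2, 8))   # (0, 3, 'ACT')
print(homology_interval(ref, 5, 11))  # (-3, 0, 'ACT')
```

The two calls describe the same event from opposite ends of the homology, which is why a precise call with microhomology can still carry a non-zero interval.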
