Call function: How to interpret log2(absolute copy number integers of a segment) #45

jperales · 2015-09-09T10:01:10Z

Hi,
I don't fully understand why I obtained a pattern of integers (x) and half-integers (x + 0.5) when using call function. I expected to get only integers.

My test:
Assuming 100% tumor purity across my samples, I calculated the -threshold as follow (R code):

> c("No.Copies"=log2(0.1/2),"one.Copy"=log2(1/2),
"neutral.Diploid"=log2(2/2),"three.copies"=log2(3/2),
"four.copies"=log2(4/2))
#      No.Copies        one.Copy neutral.Diploid    three.copies     four.copies 
#     -4.3219281      -1.0000000       0.0000000       0.5849625       1.0000000

So I run the cnvkit.py call function using the following parameters:

$cnvkit.py call ${sampleName}".cns" -m threshold -t=-4.3,-1,0,0.6,1 -o ${sampleName}".absoluteCN.cns"
Treating sample gender as female
Calling copy number with thresholds: -4.3 => 0, -1 => 1, 0 => 2, 0.6 => 3, 1 => 4

Please, notice the output said: Calling copy number with thresholds: -4.3 => 0, -1 => 1, 0 => 2, 0.6 => 3, 1 => 4, where I guess the first number of the pair is the log2 and the second one is the absolute copy number.

Then, if we retrieve all unique estimated copy-numbers from the output ${sampleName}".absoluteCN.cns" file, and revert them from the log2 transformation (using bash):

$tail -n +2  ${sampleName}".absoluteCN.cns" | cut -f 5 | sort -n -u | awk '{print "log2(CN)="$1"; then CN="2**$1;}'
log2(CN)=-10.9658; then CN=0.000499995
log2(CN)=-1; then CN=0.5
log2(CN)=0; then CN=1
log2(CN)=0.584963; then CN=1.5
log2(CN)=1; then CN=2
log2(CN)=1.32193; then CN=2.5
log2(CN)=1.58496; then CN=2.99999
log2(CN)=2; then CN=4
log2(CN)=2.16993; then CN=4.50002
log2(CN)=3.39232; then CN=10.5

I wonder why there are half-integers (i.e. 0.5, 1.5, 2.5, 4.5 and 10.5) in the output. Are these copy-numbers defined in the context of a haploid genome or in the whole genome (diploid)?

Thanks in advance,
Javier

My version:

$cnvkit.py version
0.6.1-dev

The text was updated successfully, but these errors were encountered:

etal · 2015-09-09T17:16:51Z

The copy numbers are relative to the specified ploidy (default 2) -- they are still log2 ratios with the same interpretation as the segmentation output (.cns). This is so that you can plot the .absoluteCN.cns file with the scatter/heatmap/diagram commands the same way you would with the original .cns file. The call command just rounds the segment means to what they would be if the true copy number were an integer.

To get the absolute integers in a human-readable form, try:

cnvkit.py export bed sampleName.absoluteCN.cns --show-neutral

The thresholds work like this (see cnvlib.call.absolute_threshold ):

if log2 <= -4.3: CN=0
if log2 <= -1: CN=1
if log2 <= 0: CN=2
if log2 <= 0.6: CN=3
...

So the germline thresholds, rounding to the nearest log2(integer), could be calculated like:

> log2( (0:4 + .5 ) / 2)
[1] -2.0000000 -0.4150375  0.3219281  0.8073549  1.1699250

Sorry for the confusion. I'll update the docs.

jperales · 2015-09-10T10:14:09Z

Thank you very much! It is clear now.
Javier

etal added a commit that referenced this issue Sep 9, 2015

doc: Explain the call command better (#45)

4701b90

etal closed this as completed Sep 10, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Call function: How to interpret log2(absolute copy number integers of a segment) #45

Call function: How to interpret log2(absolute copy number integers of a segment) #45

jperales commented Sep 9, 2015

etal commented Sep 9, 2015

jperales commented Sep 10, 2015

Call function: How to interpret log2(absolute copy number integers of a segment) #45

Call function: How to interpret log2(absolute copy number integers of a segment) #45

Comments

jperales commented Sep 9, 2015

etal commented Sep 9, 2015

jperales commented Sep 10, 2015