Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Call function: How to interpret log2(absolute copy number integers of a segment) #45

Closed
jperales opened this issue Sep 9, 2015 · 2 comments

Comments

@jperales
Copy link

jperales commented Sep 9, 2015

Hi,
I don't fully understand why I obtained a pattern of integers (x) and half-integers (x + 0.5) when using call function. I expected to get only integers.

My test:
Assuming 100% tumor purity across my samples, I calculated the -threshold as follow (R code):

> c("No.Copies"=log2(0.1/2),"one.Copy"=log2(1/2),
"neutral.Diploid"=log2(2/2),"three.copies"=log2(3/2),
"four.copies"=log2(4/2))
#      No.Copies        one.Copy neutral.Diploid    three.copies     four.copies 
#     -4.3219281      -1.0000000       0.0000000       0.5849625       1.0000000

So I run the cnvkit.py call function using the following parameters:

$cnvkit.py call ${sampleName}".cns" -m threshold -t=-4.3,-1,0,0.6,1 -o ${sampleName}".absoluteCN.cns"
Treating sample gender as female
Calling copy number with thresholds: -4.3 => 0, -1 => 1, 0 => 2, 0.6 => 3, 1 => 4

Please, notice the output said: Calling copy number with thresholds: -4.3 => 0, -1 => 1, 0 => 2, 0.6 => 3, 1 => 4, where I guess the first number of the pair is the log2 and the second one is the absolute copy number.

Then, if we retrieve all unique estimated copy-numbers from the output ${sampleName}".absoluteCN.cns" file, and revert them from the log2 transformation (using bash):

$tail -n +2  ${sampleName}".absoluteCN.cns" | cut -f 5 | sort -n -u | awk '{print "log2(CN)="$1"; then CN="2**$1;}'
log2(CN)=-10.9658; then CN=0.000499995
log2(CN)=-1; then CN=0.5
log2(CN)=0; then CN=1
log2(CN)=0.584963; then CN=1.5
log2(CN)=1; then CN=2
log2(CN)=1.32193; then CN=2.5
log2(CN)=1.58496; then CN=2.99999
log2(CN)=2; then CN=4
log2(CN)=2.16993; then CN=4.50002
log2(CN)=3.39232; then CN=10.5

I wonder why there are half-integers (i.e. 0.5, 1.5, 2.5, 4.5 and 10.5) in the output. Are these copy-numbers defined in the context of a haploid genome or in the whole genome (diploid)?

Thanks in advance,
Javier

My version:

$cnvkit.py version
0.6.1-dev
@etal
Copy link
Owner

etal commented Sep 9, 2015

The copy numbers are relative to the specified ploidy (default 2) -- they are still log2 ratios with the same interpretation as the segmentation output (.cns). This is so that you can plot the .absoluteCN.cns file with the scatter/heatmap/diagram commands the same way you would with the original .cns file. The call command just rounds the segment means to what they would be if the true copy number were an integer.

To get the absolute integers in a human-readable form, try:

cnvkit.py export bed sampleName.absoluteCN.cns --show-neutral

The thresholds work like this (see cnvlib.call.absolute_threshold ):

if log2 <= -4.3: CN=0
if log2 <= -1: CN=1
if log2 <= 0: CN=2
if log2 <= 0.6: CN=3
...

So the germline thresholds, rounding to the nearest log2(integer), could be calculated like:

> log2( (0:4 + .5 ) / 2)
[1] -2.0000000 -0.4150375  0.3219281  0.8073549  1.1699250

Sorry for the confusion. I'll update the docs.

@jperales
Copy link
Author

Thank you very much! It is clear now.
Javier

@etal etal closed this as completed Sep 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants