How to set founding.cluster? #4

crazyhottommy · 2017-02-19T05:47:29Z

Hi,
I have several questions on how to set the parameters for the infer.clonal.models function.
First, the help page does not explain all the arguments.
e.g. how do I set the founding.cluster?
in the github page example, there is a cluster.center argument, but I do not see it in the help.

How many times of bootstrap usually is enough? default is 1000, and for around 300 mutations, it is taking several hours to run, and not finish yet..

Thanks for giving more information.

Best,
Tommy

The text was updated successfully, but these errors were encountered:

hdng · 2017-02-19T05:57:47Z

Assuming this is monoclonal cancer initiation, the founding.cluster is often the cluster with the highest CCF (or VAF) in most (or in ideal situation, all) samples. 1000 bootstraps is reasonable for first run. The number of samples and the number of clusters affect the performance time more. How many clusters/samples do you have for your case? You may need some clean-up (eg. removing small or noise clusters). Also, are there many clusters with very small CCF? That may increase the number of models to consider.

crazyhottommy · 2017-02-19T17:43:38Z

Thanks! I only have two samples, but do have smaller clusters. I removed the cluster with very few mutations (what's the cut-off? , I removed clusters with only 1 mutation).

it finished much sooner, but none model were found even at p < 0.1.

what is alpha for?

x <- infer.clonal.models(variants= clonevol_input,
            cluster.col.name="cluster",
            model = "polyclonal",
            vaf.col.names=vaf.col.names,
            subclonal.test="bootstrap",
            subclonal.test.model="non-parametric",
            cluster.center="mean",
            num.boots=1000,
            founding.cluster= NULL,
            min.cluster.vaf=0.01,
            p.value.cutoff= 0.1,
            alpha= 0.5,
            random.seed=63108)


Sample 1: Pa26T1.vaf <-- Pa26T1.vaf
Sample 2: Pa26T2.vaf <-- Pa26T2.vaf
Using polyclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
Pa26T1.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:  
Pa26T1.vaf : 910 clonal architecture model(s) found

Pa26T2.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:  
Pa26T2.vaf : 376 clonal architecture model(s) found

Finding matched clonal architecture models across samples...
Found  0 compatible model(s)
Found 0 compatible evolution models
Pruning merged clonal evolution trees....
Number of unique pruned trees: 0 
Scoring models...
0 model(s) with p-value <= 0.1 

***WARN: Inra-tumor heterogeneity could result in a clone (eg. founding)
         that is not present (no cells) in any samples, although  detectable via
         clonal marker variants due to that its subclones are distinct across
         samples. Therefore, a model with a higher p-value for the CCF of such
         a clone can still be biologically consistent, interpretable, and
         interesting! Manual investigation of those higher p-value models
         is recommended

hdng · 2017-02-20T06:20:49Z

The cutoff depends on how confident you are on the clusters, eg. if your data are deep enough such that you can trust a cluster of >=5 variants, then use 5. However, this is a bit arbitrary though, and there is no formula for this. I would visualize the clusters first (eg. a pairwise plot of CCF of the variants grouped by clusters for the two samples), and check if the clusters look reasonable. Also, it seems that your scaled VAF is from 0-1. The default for clonevol is 0-100 (percentage). This probably does not affect the underlying inference, but it is good to use 1-100 or set vaf.in.percent=FALSE when calling infer.clonal.models. You should also set alpha and p.value.cutoff the same, and try lower value, eg. 0.01.

crazyhottommy · 2017-02-20T18:07:20Z

Thanks! I did scale variant_prevalence by *100/2 to get VAF ranging from 0-50.
so alpha and p.value should be the same? what alpha is?

Even after playing with different arguments, I still do not have any matched clonal architecture models across samples. what should I do with them?

I know the paper is under review, I would like to read to understand more of the tool.

Thanks,
Tommy

hdng · 2017-02-20T20:06:13Z

Visualization of the clusters would help identifying problems with the clustering, eg. is there an outlier or abnormal cluster that conflicts between samples or shows sign of polyclonal cancer initiation? Clonevol does require a reasonably good clustering to start with. Alpha is the significant threshold to identify if a clone exists in a sample. I am working on the manual and technical document (takes too long already sorry), and the manuscript should be posted on bioRxiv soon too. Stay tuned. Thanks.

crazyhottommy · 2017-02-20T21:26:17Z

looking forward to read it.
at the same time, I attached a cluster VAF scatter plot. clonevol does not output any model, you see any problems with it?
Thanks!

Tommy

Pa30T1_vs_Pa30T2.pdf

hdng · 2017-02-23T17:43:07Z

Looks a little odd. Is this the scaled VAF based on CCF estimated via Pyclone? Should the highest VAF be near 50%?

crazyhottommy · 2017-02-23T19:14:09Z

sorry, when plot the VAF scatter plot, I used the raw output from pyclone,
I did not *100/2 to get the scaled VAF.
but if I scale it, it will be the same pattern but just different scale on X and Y axis.

However, when I put the data into clonevol, I did scale to 0-50.
So, the pattern of the VAF scatter plot look odd ?

Thanks very much for your help!

hdng · 2017-03-02T17:20:31Z

Sorry for slow response. I think some models can be inferred, but you need a tweak.

You can try plotting the models using plot.clonal.models function. If no matched model is there, clonevol will plot all models for individual samples. This helps debugging what clusters are not compatible across samples.
Looks like pyclone put a hard cut on the CCF at 100% due to their statistical model. It seems that tumor purity does not affect their model, so I think if you divide your original VAF by 2 to fake that the tumor purity is always < 50%, thus posterior CCF estimated by pyclone is maxed at 50% (equivalent to 100%), while still allowing some to go over 50%, this would make the scaled VAF looks reasonable, and you can multiply scaled VAF by 2 to get back to the original purity, and run clonevol. Don't forget to plot CCF or VAF like you did.

Are their many copy altered variants? If not, you can try to cluster by SciClone as well.

crazyhottommy · 2017-03-15T21:31:38Z

Thank you for your answer.

those two samples are from the same tumor of the same patient , so what's wrong if no compatible models are inferred? I plot the models individually. please see attachment. how to explain it?
you are saying I need to divide the VAF (calculated from samtools mpileup) by 2 before feeding into pyclone? then multiply pyclone output CCF by 2 to run clonevol? what's the advantage to do this?

I also attached the scaled VAF (pyclone CCF *100/2) for you to test if you can.

Thanks again!

model-Pa30T2.vaf.pdf
model-Pa30T1.vaf.pdf
clonevol_input.txt

hdng · 2017-03-25T07:55:38Z

Assuming that the two samples have the same origin, they must have the same clonal structure (tree).
Imagine a variant with VAF=60% (no CN) will be estimated to have CCF=100% by pyclone, thus when converting back to VAF giving you scaled VAF=50%. If you divide VAF by 2 prior to pyclone analysis (instead of dividing after pyclone analysis), then use CCF from pyclone as VAF, you'll get scaled VAF = 60%. Let me know if this works. I am happy to dig deeper to find the best strategy for analyzing pyclone results.

crazyhottommy · 2017-04-05T16:06:21Z

My data have a lot of copy number changes, so I did not use sciclone. For pyclone, one has to provide both the tumor purity (inferred from other tools such as sequenza) and the VAF to the tool. so the end results of CFF always ranges from 0 to 1.

hdng · 2017-04-06T16:45:03Z

What CCF range you'll get if you specify tumor purity of 100% and also divide VAF by 2?

crazyhottommy · 2017-04-07T03:05:55Z

CCF will be always from 0 to 1. pyclone uses the tumor purity and copynumber information in addition to VAF to identify clusters.

hdng · 2017-04-07T13:52:42Z

Yes it will be from 0-1, but do you see a smooth range when plotting or a hard cut at 1? If it is smooth, then you can use clonevol.

hdng mentioned this issue May 18, 2017

too many models across samples #6

Closed

hdng added the help wanted label May 30, 2017

hdng mentioned this issue Jun 30, 2017

NA error #2

Open

This was referenced Jun 2, 2018

Pyclone, cellular_prevalence, variant_allele_frequency #21

Closed

questions on the tree #22

Closed

hdng closed this as completed Jun 21, 2018

hdng mentioned this issue Mar 23, 2019

question: specifying time-points & working with low VAFs #28

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to set founding.cluster? #4

How to set founding.cluster? #4

crazyhottommy commented Feb 19, 2017

hdng commented Feb 19, 2017 •

edited

Loading

crazyhottommy commented Feb 19, 2017

hdng commented Feb 20, 2017

crazyhottommy commented Feb 20, 2017

hdng commented Feb 20, 2017

crazyhottommy commented Feb 20, 2017

hdng commented Feb 23, 2017

crazyhottommy commented Feb 23, 2017

hdng commented Mar 2, 2017

crazyhottommy commented Mar 15, 2017

hdng commented Mar 25, 2017

crazyhottommy commented Apr 5, 2017 •

edited

Loading

hdng commented Apr 6, 2017

crazyhottommy commented Apr 7, 2017

hdng commented Apr 7, 2017 •

edited

Loading

How to set founding.cluster? #4

How to set founding.cluster? #4

Comments

crazyhottommy commented Feb 19, 2017

hdng commented Feb 19, 2017 • edited Loading

crazyhottommy commented Feb 19, 2017

hdng commented Feb 20, 2017

crazyhottommy commented Feb 20, 2017

hdng commented Feb 20, 2017

crazyhottommy commented Feb 20, 2017

hdng commented Feb 23, 2017

crazyhottommy commented Feb 23, 2017

hdng commented Mar 2, 2017

crazyhottommy commented Mar 15, 2017

hdng commented Mar 25, 2017

crazyhottommy commented Apr 5, 2017 • edited Loading

hdng commented Apr 6, 2017

crazyhottommy commented Apr 7, 2017

hdng commented Apr 7, 2017 • edited Loading

hdng commented Feb 19, 2017 •

edited

Loading

crazyhottommy commented Apr 5, 2017 •

edited

Loading

hdng commented Apr 7, 2017 •

edited

Loading