Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to set founding.cluster? #4

Closed
crazyhottommy opened this issue Feb 19, 2017 · 15 comments
Closed

How to set founding.cluster? #4

crazyhottommy opened this issue Feb 19, 2017 · 15 comments

Comments

@crazyhottommy
Copy link

Hi,
I have several questions on how to set the parameters for the infer.clonal.models function.
First, the help page does not explain all the arguments.
e.g. how do I set the founding.cluster?
in the github page example, there is a cluster.center argument, but I do not see it in the help.

How many times of bootstrap usually is enough? default is 1000, and for around 300 mutations, it is taking several hours to run, and not finish yet..

Thanks for giving more information.

Best,
Tommy

@hdng
Copy link
Owner

hdng commented Feb 19, 2017

Assuming this is monoclonal cancer initiation, the founding.cluster is often the cluster with the highest CCF (or VAF) in most (or in ideal situation, all) samples. 1000 bootstraps is reasonable for first run. The number of samples and the number of clusters affect the performance time more. How many clusters/samples do you have for your case? You may need some clean-up (eg. removing small or noise clusters). Also, are there many clusters with very small CCF? That may increase the number of models to consider.

@crazyhottommy
Copy link
Author

Thanks! I only have two samples, but do have smaller clusters. I removed the cluster with very few mutations (what's the cut-off? , I removed clusters with only 1 mutation).

it finished much sooner, but none model were found even at p < 0.1.

what is alpha for?

x <- infer.clonal.models(variants= clonevol_input,
            cluster.col.name="cluster",
            model = "polyclonal",
            vaf.col.names=vaf.col.names,
            subclonal.test="bootstrap",
            subclonal.test.model="non-parametric",
            cluster.center="mean",
            num.boots=1000,
            founding.cluster= NULL,
            min.cluster.vaf=0.01,
            p.value.cutoff= 0.1,
            alpha= 0.5,
            random.seed=63108)


Sample 1: Pa26T1.vaf <-- Pa26T1.vaf
Sample 2: Pa26T2.vaf <-- Pa26T2.vaf
Using polyclonal model
Note: all VAFs were divided by 100 to convert from percentage to proportion.
Generating non-parametric boostrap samples...
Pa26T1.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:  
Pa26T1.vaf : 910 clonal architecture model(s) found

Pa26T2.vaf : Enumerating clonal architectures...
Determining if cluster VAF is significantly positive...
Exluding clusters whose VAF < min.cluster.vaf=0.001
Non-positive VAF clusters:  
Pa26T2.vaf : 376 clonal architecture model(s) found

Finding matched clonal architecture models across samples...
Found  0 compatible model(s)
Found 0 compatible evolution models
Pruning merged clonal evolution trees....
Number of unique pruned trees: 0 
Scoring models...
0 model(s) with p-value <= 0.1 

***WARN: Inra-tumor heterogeneity could result in a clone (eg. founding)
         that is not present (no cells) in any samples, although  detectable via
         clonal marker variants due to that its subclones are distinct across
         samples. Therefore, a model with a higher p-value for the CCF of such
         a clone can still be biologically consistent, interpretable, and
         interesting! Manual investigation of those higher p-value models
         is recommended

@hdng
Copy link
Owner

hdng commented Feb 20, 2017

The cutoff depends on how confident you are on the clusters, eg. if your data are deep enough such that you can trust a cluster of >=5 variants, then use 5. However, this is a bit arbitrary though, and there is no formula for this. I would visualize the clusters first (eg. a pairwise plot of CCF of the variants grouped by clusters for the two samples), and check if the clusters look reasonable. Also, it seems that your scaled VAF is from 0-1. The default for clonevol is 0-100 (percentage). This probably does not affect the underlying inference, but it is good to use 1-100 or set vaf.in.percent=FALSE when calling infer.clonal.models. You should also set alpha and p.value.cutoff the same, and try lower value, eg. 0.01.

@crazyhottommy
Copy link
Author

Thanks! I did scale variant_prevalence by *100/2 to get VAF ranging from 0-50.
so alpha and p.value should be the same? what alpha is?

Even after playing with different arguments, I still do not have any matched clonal architecture models across samples. what should I do with them?

I know the paper is under review, I would like to read to understand more of the tool.

Thanks,
Tommy

@hdng
Copy link
Owner

hdng commented Feb 20, 2017

Visualization of the clusters would help identifying problems with the clustering, eg. is there an outlier or abnormal cluster that conflicts between samples or shows sign of polyclonal cancer initiation? Clonevol does require a reasonably good clustering to start with. Alpha is the significant threshold to identify if a clone exists in a sample. I am working on the manual and technical document (takes too long already sorry), and the manuscript should be posted on bioRxiv soon too. Stay tuned. Thanks.

@crazyhottommy
Copy link
Author

looking forward to read it.
at the same time, I attached a cluster VAF scatter plot. clonevol does not output any model, you see any problems with it?
Thanks!

Tommy

Pa30T1_vs_Pa30T2.pdf

@hdng
Copy link
Owner

hdng commented Feb 23, 2017

Looks a little odd. Is this the scaled VAF based on CCF estimated via Pyclone? Should the highest VAF be near 50%?

@crazyhottommy
Copy link
Author

sorry, when plot the VAF scatter plot, I used the raw output from pyclone,
I did not *100/2 to get the scaled VAF.
but if I scale it, it will be the same pattern but just different scale on X and Y axis.

However, when I put the data into clonevol, I did scale to 0-50.
So, the pattern of the VAF scatter plot look odd ?

Thanks very much for your help!

@hdng
Copy link
Owner

hdng commented Mar 2, 2017

Sorry for slow response. I think some models can be inferred, but you need a tweak.

  1. You can try plotting the models using plot.clonal.models function. If no matched model is there, clonevol will plot all models for individual samples. This helps debugging what clusters are not compatible across samples.

  2. Looks like pyclone put a hard cut on the CCF at 100% due to their statistical model. It seems that tumor purity does not affect their model, so I think if you divide your original VAF by 2 to fake that the tumor purity is always < 50%, thus posterior CCF estimated by pyclone is maxed at 50% (equivalent to 100%), while still allowing some to go over 50%, this would make the scaled VAF looks reasonable, and you can multiply scaled VAF by 2 to get back to the original purity, and run clonevol. Don't forget to plot CCF or VAF like you did.

Are their many copy altered variants? If not, you can try to cluster by SciClone as well.

@crazyhottommy
Copy link
Author

Thank you for your answer.

  1. those two samples are from the same tumor of the same patient , so what's wrong if no compatible models are inferred? I plot the models individually. please see attachment. how to explain it?

  2. you are saying I need to divide the VAF (calculated from samtools mpileup) by 2 before feeding into pyclone? then multiply pyclone output CCF by 2 to run clonevol? what's the advantage to do this?

I also attached the scaled VAF (pyclone CCF *100/2) for you to test if you can.

Thanks again!

model-Pa30T2.vaf.pdf
model-Pa30T1.vaf.pdf
clonevol_input.txt

@hdng
Copy link
Owner

hdng commented Mar 25, 2017

  1. Assuming that the two samples have the same origin, they must have the same clonal structure (tree).
  2. Imagine a variant with VAF=60% (no CN) will be estimated to have CCF=100% by pyclone, thus when converting back to VAF giving you scaled VAF=50%. If you divide VAF by 2 prior to pyclone analysis (instead of dividing after pyclone analysis), then use CCF from pyclone as VAF, you'll get scaled VAF = 60%. Let me know if this works. I am happy to dig deeper to find the best strategy for analyzing pyclone results.

@crazyhottommy
Copy link
Author

crazyhottommy commented Apr 5, 2017

My data have a lot of copy number changes, so I did not use sciclone. For pyclone, one has to provide both the tumor purity (inferred from other tools such as sequenza) and the VAF to the tool. so the end results of CFF always ranges from 0 to 1.

@hdng
Copy link
Owner

hdng commented Apr 6, 2017

What CCF range you'll get if you specify tumor purity of 100% and also divide VAF by 2?

@crazyhottommy
Copy link
Author

CCF will be always from 0 to 1. pyclone uses the tumor purity and copynumber information in addition to VAF to identify clusters.

@hdng
Copy link
Owner

hdng commented Apr 7, 2017

Yes it will be from 0-1, but do you see a smooth range when plotting or a hard cut at 1? If it is smooth, then you can use clonevol.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants