tumor purity estimate is NA #66
Comments
Purity is NA because there is no copy number change in the sample. Have you looked at the plot?
No. It can be anything from 0 to 100%. If the "tumor" sample contained no tumor cells at all, there would be no copy number change and the purity would be zero. On the other hand, if the tumor is driven purely by mutation or epigenetics, there won't be any copy number change either, and the purity can be any number.
Yes, there is no copy number change. But I have also seen cases with a purity estimate when ploidy = 2.018...
Ploidy is just the average copy number. If there is a 1-copy loss spanning 50 megabases on one chromosome and an offsetting 1-copy gain of 50 megabases on a different chromosome, then ploidy will be 2 and purity will be estimable. That is not the case when there is no change.
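To make the point about ploidy concrete, here is a minimal sketch that treats ploidy as a length-weighted average of total copy number across segments. The function name, the segment representation, and the 3000 Mb genome size are assumptions for illustration; this is not how FACETS computes ploidy internally.

```python
def average_copy_number(segments):
    """Length-weighted mean copy number.

    segments: list of (length_mb, total_copy_number) tuples (hypothetical format).
    """
    total_length = sum(length for length, _ in segments)
    if total_length <= 0:
        raise ValueError("segments must cover a positive total length")
    return sum(length * cn for length, cn in segments) / total_length


def has_copy_number_change(segments, normal_copy_number=2):
    """True if any segment deviates from the normal diploid copy number."""
    return any(cn != normal_copy_number for _, cn in segments)


# Assumed 3000 Mb genome: a 50 Mb one-copy loss offset by a 50 Mb one-copy gain.
balanced = [(2900, 2), (50, 1), (50, 3)]
# Same genome with no copy number change at all.
unchanged = [(3000, 2)]

print(average_copy_number(balanced), has_copy_number_change(balanced))
print(average_copy_number(unchanged), has_copy_number_change(unchanged))
```

Both genomes average out to a ploidy of 2, but only the first contains altered segments that carry any information about purity.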
Thanks for the explanation. So when there are no copy number changes, can FACETS use other metrics (BAF, etc.) to estimate purity?
FACETS only deals with het SNPs present in the normal. So BAF won't help. If there are mutations you can use that to estimate purity.
Sorry, can you elaborate on "If there are mutations you can use that to estimate purity."? And do you mean using FACETS, or developing my own method? Thanks.
Generate a MAF file using a mutation-calling pipeline (say, MuTect). The mutant allele frequency of high-confidence calls will then be an estimate of 0.5*purity, since most mutations occur on only one chromosome copy and no copy number change means the copy number is 2 everywhere. The estimate is more reliable if the depth at the mutant location is high.
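The reasoning above can be sketched in a few lines. With purity p, a mutation on one of two copies, and copy number 2 in both tumor and normal cells, the expected mutant allele frequency is p / (2p + 2(1 - p)) = p / 2, so purity is roughly twice the observed frequency. The sketch below is not part of FACETS: the function name, the (alt_reads, depth) input format, the depth cutoff of 50, and the use of the median are all assumptions, and it only makes sense if the calls are clonal.

```python
from statistics import median


def estimate_purity(calls, min_depth=50):
    """Rough purity estimate from somatic calls in a copy-neutral diploid genome.

    calls: list of (alt_reads, depth) tuples for high-confidence mutations
           (hypothetical format; in practice these would come from a MAF file).
    min_depth: assumed cutoff, since the frequency is noisier at low depth.
    Returns None if no call passes the depth filter.
    """
    vafs = [alt / depth for alt, depth in calls if depth >= min_depth]
    if not vafs:
        return None
    # VAF is approximately 0.5 * purity, so purity is approximately 2 * VAF.
    # Cap at 1.0 because sampling noise can push the value above 100%.
    return min(1.0, 2.0 * median(vafs))


# Made-up read counts; the last call is dropped by the depth filter.
example_calls = [(30, 100), (28, 100), (32, 100), (2, 10)]
print(estimate_purity(example_calls))
```

If many of the calls are subclonal, the median frequency falls below 0.5*purity and this estimate will be too low.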
But if the mutation is subclonal, how can it be used to accurately estimate the purity?
Then purity cannot be estimated with the available data.
Makes sense. Thanks. BTW, I just want to confirm that I provided the right VCF file in the first snp-pileup step. I used the VCF file generated from the tumor sample only (initially I tried somatic.vcf, but with very few het loci FACETS couldn't proceed correctly). Does this approach look correct to you? Thanks.
That is the wrong VCF to use. You need a VCF from dbSNP or 1000 Genomes. FACETS needs the locations of known polymorphic loci in the population.
So you mean the VCF only serves to define known polymorphic loci in a population rather than patient-specific polymorphic loci (i.e., for different tumor/normal paired sequencing runs, I can reuse this ONE VCF for snp-pileup)? If so, can you provide a link to such a VCF? Thanks.
The dbSNP files are here: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/VCF/
That's very helpful. I guess I should use the common_all (MAF >= 0.01) version for FACETS?
Yes.
Thanks. Thinking back, I am wondering why FACETS needs a population-based polymorphic site file instead of the patient's own germline polymorphic file (i.e., a VCF called from normal.bam). Would the latter be more accurate?
This question is beyond the scope of this forum, in which issues regarding the software are discussed, not the methodology.
Thanks. I downloaded the common_all VCF from the FTP site. However, by a rough count of the unique positions in the file, there are 37M lines instead of the ~1.9M mentioned in the tutorial or paper. Did I miss anything here?
The 1.9 million are the ones that fall within WES targets. The VCF file has a lot more. The Phase 3 release of 1000 Genomes, for instance, has ~80 million.
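To see how a genome-wide site list shrinks to the subset inside WES targets, here is a small sketch that counts sites falling within target intervals. The data structures, the function name, and the toy coordinates are assumptions for illustration; a real run would read the positions from the VCF and the intervals from the capture kit's target file.

```python
from bisect import bisect_right


def count_sites_in_targets(sites, targets):
    """Count sites that fall inside target intervals.

    sites: iterable of (chrom, pos) with 1-based positions.
    targets: dict mapping chrom to a sorted list of non-overlapping
             (start, end) intervals, 1-based and inclusive (hypothetical format).
    """
    # Precompute interval starts per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in targets.items()}
    count = 0
    for chrom, pos in sites:
        intervals = targets.get(chrom)
        if not intervals:
            continue
        # Index of the last interval whose start is <= pos.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos <= intervals[i][1]:
            count += 1
    return count


# Toy example: two target intervals on chromosome 1, none on chromosome 2.
toy_targets = {"1": [(100, 200), (500, 600)]}
toy_sites = [("1", 150), ("1", 300), ("1", 600), ("2", 150)]
print(count_sites_in_targets(toy_sites, toy_targets))
```

Only the sites at positions 150 and 600 on chromosome 1 land inside a target, which is the same kind of reduction that takes the full VCF down to the much smaller number quoted for WES.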
That makes more sense now. Thank you for your quick reply, Dr!
@trptyrphe11
Hi,
I have successfully run FACETS on several samples. However, for my latest sample, the tumor purity estimate is NA.
[1] "tumor purity estimated:NA"
[1] "tumor ploidy estimated:2"
I checked that I have enough het loci, so I don't know what the reason is. Thanks.
sum(oo$jointseg$het): 28816
nrow(oo$jointseg): 283433