Low nhet and lcn.em NA issues #50

Open
fangxiaolan opened this issue Sep 21, 2017 · 10 comments
Comments

@fangxiaolan

fangxiaolan commented Sep 21, 2017

Hi, I'm analyzing some older TCGA WXS data (captured with NimbleGen/hg18 and VCrome/hg19), and all 8 samples have extremely low nhet (0 or 1) and lcn.em is NA. We suspected a coverage problem, but coverage is 150-200X for all samples, so that is ruled out. Any suggestions as to what might cause this? Four samples are breast cancer and the other four are colorectal cancer, neither of which is likely to have extremely few SNVs. The cluster number is ~30, so the samples should not be over-fragmented either (I just read issue #39). FYI, for other samples using the SureSelect/GRCh37 capture kit, everything is fine. Thanks!

@veseshan
Collaborator

If the coverage for those samples is indeed 150x, then you should get around 250k SNPs in the analysis, of which 8-10% will be hets. You can check this by looking at sum(oo$jointseg$het) and nrow(oo$jointseg), where oo is the name of the object procSample returns. It seems you don't have a hyperfragmented sample either. If you can provide the TCGA IDs of the samples, I can check whether we have results for them at our end.

Venkat
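
Editor's note: the check described above can be sketched as follows. This is only an illustration, assuming oo$jointseg is a data frame with one row per locus and a 0/1 het column, as the comment implies. The oo object here is a mock built by hand so the snippet runs without facets, and het_summary is a hypothetical helper name, not part of the package.

```r
# Mock stand-in for the object returned by procSample (assumption: jointseg
# has one row per locus and a 0/1 `het` column). Replace with your real `oo`.
oo <- list(
  jointseg = data.frame(
    chrom = rep(1, 1000),
    het   = rep(c(1, 0, 0, 0, 0, 0, 0, 0, 0, 0), times = 100)
  )
)

# Hypothetical helper: number of loci, number of het loci, and het fraction.
het_summary <- function(oo) {
  n_loci <- nrow(oo$jointseg)
  n_het  <- sum(oo$jointseg$het)
  frac   <- if (n_loci > 0) n_het / n_loci else NA_real_
  c(n_loci = n_loci, n_het = n_het, het_fraction = frac)
}

summary_stats <- het_summary(oo)
print(summary_stats)
```

On a real sample, a het fraction far below the 8-10% mentioned above, or a locus count far below the expected ~250k, would point to a problem upstream of segmentation.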

@fangxiaolan
Author

fangxiaolan commented Sep 22, 2017

The following is one of the tumor samples:
8e5f741c-996c-4b44-84c4-c9e9e5529944/TCGA-E2-A15A-01A-11D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam
The normal control is:
a2d7ab5a-935c-4b96-bf38-1891fa437922/TCGA-E2-A15A-10A-01D-A12B-09_IlluminaGA-DNASeq_exome_gdc_realn.bam

The coverage is 259X for the tumor and 287X for the normal. Both are WXS samples.

@fangxiaolan
Author

fangxiaolan commented Sep 22, 2017

I'm not sure how to check sum(oo$jointseg$het) and nrow(oo$jointseg). Are those metrics included in one of the data files that the FACETS analysis produces? I checked the procSample-jseg file for this sample: there are 8856 segments, yet het is 0 for all of them. I can send you the file if that helps. Let me know.

@veseshan
Collaborator

Any time you have more than 300-400 segments the sample is hyperfragmented. Yours, with 8856, certainly is. So try increasing cval to see if it helps.

From the vignette the steps for running facets are:

library(facets)
# read the SNP read-count matrix, preprocess it, then segment and fit
rcmat = readSnpMatrix(datafile)
xx = preProcSample(rcmat)
oo = procSample(xx, cval = 150)

So you can run sum(oo$jointseg$het) and nrow(oo$jointseg) at the R command line right after.

Venkat

@fangxiaolan
Author

I'm not sure whether the segment counts are consistent across the files. In the FACETS_heterogeneity_cncf_EM file the number of segments is 33. The procSample-jseg file has 8856 entries, which I assume are segments as well? Just want to clarify and make sure. Thanks!

@veseshan
Collaborator

Hyperfragmentation should be judged from the segmentation alone, and hence before EM. Multiple segments that look similar are grouped together into clusters.

@fangxiaolan
Author

No clusters were formed in this case, as reported in the title: lcn.em is NA throughout. That's the issue we want to solve...

@veseshan
Collaborator

Hyperfragmented samples are a bad starting point. Nothing can be done to get reasonable results from them.

@andyjslee

Is the number of segments given by nrow(oo$jointseg)?

@veseshan
Collaborator

No. That is the number of loci used in the analysis. The number of segments is nrow(oo$out) for the procSample output or nrow(oo$cncf) for the emcncf output.
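
Editor's note: a minimal sketch of the distinction drawn above, combined with the 300-400 segment rule of thumb from earlier in the thread. The oo object is again a hand-built mock so the snippet runs without facets. The column names inside the mock data frames are illustrative, and segment_report and its max_segments default are hypothetical, not part of the package.

```r
# Mock stand-in for procSample output (assumption: `jointseg` has one row per
# locus and `out` has one row per segment, as stated in the thread).
oo <- list(
  jointseg = data.frame(chrom = rep(1, 1000), het = rep(0, 1000)),
  out      = data.frame(chrom = rep(1, 5), seg = 1:5)
)

# Hypothetical helper: report loci and segments separately, and flag the
# sample when the segment count exceeds a rule-of-thumb threshold.
segment_report <- function(oo, max_segments = 400) {
  n_loci     <- nrow(oo$jointseg)  # loci used in the analysis
  n_segments <- nrow(oo$out)       # segments from procSample
  list(
    n_loci          = n_loci,
    n_segments      = n_segments,
    hyperfragmented = n_segments > max_segments
  )
}

report <- segment_report(oo)
str(report)
```

For emcncf output, the same count would come from nrow(oo$cncf) instead of nrow(oo$out), per the comment above.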


3 participants