Multi-sample CNV segmentation and events #60

Closed
tedtoal opened this issue Nov 19, 2018 · 5 comments

tedtoal commented Nov 19, 2018

I'm working with multiple samples per person and want to identify CNV events shared between samples. I started by simply looking for CNV segment calls the samples have in common. One problem with that is that segment boundaries may be close but not identical, so I'd really like to use multi-sample segmentation to align boundaries across samples. However, that isn't easy, even though PureCN supports external segmentation, because I essentially need to run the first part of PureCN on all the samples, then run the multi-sample segmentation, then run the last part of PureCN. It would be nice if PureCN.R could be broken up into more individual steps to make that possible. I ended up adjusting segment edges instead: whenever edges in two samples were closer than some distance (I used 500 kbp), I moved both to the midpoint between them. Not ideal, but I think not too bad a compromise.
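The edge-adjustment workaround described above can be sketched in Python as follows. This is a minimal illustration, not PureCN functionality: the function name `snap_breakpoints` is hypothetical, it handles two samples on one chromosome only, and it greedily pairs each edge in the first sample with the nearest unused edge in the second.

```python
from bisect import bisect_left


def snap_breakpoints(a, b, max_dist=500_000):
    """Align nearby breakpoints of two samples on one chromosome.

    a, b: sorted lists of breakpoint positions in bp.
    Each breakpoint in `a` is paired with the nearest not-yet-used
    breakpoint in `b`; if they are at most `max_dist` apart, both are
    moved to their (integer) midpoint. Returns new lists (a_out, b_out).
    """
    a_out, b_out = list(a), list(b)
    used = set()  # indices in `b` that were already paired
    for i, pos in enumerate(a):
        j = bisect_left(b, pos)
        best = None  # (index in b, distance)
        # Only the neighbours immediately left and right of `pos` are checked.
        for k in (j - 1, j):
            if 0 <= k < len(b) and k not in used:
                dist = abs(b[k] - pos)
                if dist <= max_dist and (best is None or dist < best[1]):
                    best = (k, dist)
        if best is not None:
            k = best[0]
            mid = (pos + b[k]) // 2
            a_out[i] = mid
            b_out[k] = mid
            used.add(k)
    return a_out, b_out
```

For example, `snap_breakpoints([1_000_000, 5_000_000], [1_200_000, 9_000_000])` moves the first pair to 1,100,000 and leaves the other two edges untouched, because they are more than 500 kbp apart. Moving both edges to the midpoint treats the two samples symmetrically; with more than two samples one would need a clustering step instead of pairwise matching.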

Also, I decided that CNV segments are not good representations of CNV events. For example, two events, one in the middle of the other, would create 3 segments. I tried to find software that could analyze PureCN segment calls and produce CNV event calls, but found nothing. Then I had an idea that I believe is reasonably sound, but I want to run it past you in case you see any big problems with it. The idea is that one would generally not expect the edges of two separate CNV events to align, or at least not often. So each edge of a CNV segment could be taken as one CNV event. That might overcount CNV events by a factor of 2, but I think it is less of a problem than trying to untangle common CNV segments between samples when other CNV events might intervene in some of those samples.
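Treating each segment edge as one event can be illustrated with a short sketch. It assumes segments carry an integer copy number and are already boundary-aligned across samples (for example by the midpoint adjustment or by multi-sample segmentation); `Segment`, `segment_edges` and `shared_edges` are hypothetical names, not part of PureCN.

```python
from collections import namedtuple

# One segment call: chromosome, start/end in bp, integer copy number.
Segment = namedtuple("Segment", "chrom start end copy_number")


def segment_edges(segments):
    """Return the set of (chrom, position) edges for one sample.

    An edge is the start of a segment whose copy number differs from the
    preceding segment on the same chromosome. Chromosome ends are not edges.
    """
    edges = set()
    segs = sorted(segments, key=lambda s: (s.chrom, s.start))
    for prev, cur in zip(segs, segs[1:]):
        if prev.chrom == cur.chrom and prev.copy_number != cur.copy_number:
            edges.add((cur.chrom, cur.start))
    return edges


def shared_edges(samples):
    """Edges found in every sample; assumes boundaries were already aligned."""
    edge_sets = [segment_edges(s) for s in samples]
    return set.intersection(*edge_sets) if edge_sets else set()


# Two nested events on chr1: a gain (CN 3) with a further gain (CN 4) inside it.
nested = [
    Segment("chr1", 1, 1_000_000, 2),
    Segment("chr1", 1_000_001, 5_000_000, 3),
    Segment("chr1", 5_000_001, 6_000_000, 4),
    Segment("chr1", 6_000_001, 9_000_000, 3),
    Segment("chr1", 9_000_001, 20_000_000, 2),
]
print(len(segment_edges(nested)))  # 4 edges for 2 events
```

The nested example yields five segments but four edges for two events, which is the factor-of-2 overcount mentioned above; unlike the segment view, the inner event does not split the outer one.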

In addition, I wanted to run a statistical test on the markers on each side of each segment edge, in EACH SAMPLE, whether or not a segment edge was called at that position in that sample. So I took all PureCN markers on each side of each segment edge, up to either the segment's other boundary or 5 Mbp, whichever was closer, adjusted their copy ratios for purity and ploidy, and then tested the two sets of copy ratios for a difference in means greater than 0.1. I was hoping for a clear separation of each sample into "YES, the edge is present" or "NO, the edge is not present", but there seem to be many cases where the p-value is not small enough to call the edge present, yet small enough that I can't say the edge is absent. This might be because the CNV is present subclonally, at a low frequency, in the samples where it seems to be absent. I'm now examining these results. Do you see any big problems with this method? If the method is reasonable, it would be nice if PureCN had an "edge" output with p-values.
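A sketch of such a per-edge test, under stated assumptions: copy ratios are log2 tumor/normal ratios; the purity/ploidy correction uses the common model R = (pC + 2(1 - p)) / (pP + 2(1 - p)) for purity p, ploidy P and tumor copy number C, which is not necessarily what PureCN does internally; and the test is a Welch-type statistic with a normal approximation, asking whether the difference in means exceeds 0.1 in either direction. `adjust_ratio` and `edge_pvalue` are hypothetical names.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def adjust_ratio(log2_ratio, purity, ploidy):
    """Purity/ploidy-corrected copy ratio (1.0 = ploidy-neutral).

    Assumes observed ratio R = (p*C + 2*(1-p)) / (p*P + 2*(1-p)), solves
    for the tumor copy number C, then divides by the ploidy P.
    """
    r = 2 ** log2_ratio
    normal_part = 2 * (1 - purity)
    copy_number = (r * (purity * ploidy + normal_part) - normal_part) / purity
    return copy_number / ploidy


def edge_pvalue(left, right, delta=0.1):
    """Approximate p-value for |mean(left) - mean(right)| > delta.

    Welch-type statistic with a normal approximation; the smaller of the two
    one-sided p-values is doubled (Bonferroni) to allow either direction.
    Each side needs at least two markers.
    """
    diff = fmean(left) - fmean(right)
    se = sqrt(variance(left) / len(left) + variance(right) / len(right))
    if se == 0:
        return 0.0 if abs(diff) > delta else 1.0
    z = (abs(diff) - delta) / se
    return min(1.0, 2 * (1 - NormalDist().cdf(z)))
```

Note that a large p-value from this test only means the data do not show a shift larger than 0.1; it is not evidence that the edge is absent, which fits the ambiguous middle ground described above. With only a handful of markers per side, Welch's t-test from `scipy.stats` (`ttest_ind` with `equal_var=False`) would be more accurate than the normal approximation.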

lima1 (Owner) commented Nov 19, 2018

Are you referring to the new multi-sample segmentation described in section 10.1.3 of https://bioconductor.org/packages/devel/bioc/vignettes/PureCN/inst/doc/PureCN.pdf?

There is no command-line tool yet, but it does not need the output of PureCN.R. Run Coverage.R on all tumor samples, and then call processMultipleSamples in R for each patient. You can then concatenate all of the output segmentations into one file that includes all samples. Then run PureCN.R as you normally would, adding --segfile and --funsegmentation Hclust.
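The concatenation step could look like this, assuming the per-patient segmentations have been written to tab-delimited files that share the same header row (the exact columns depend on how the segmentations were exported and are not specified here). `merge_seg_tables` and `concat_seg_files` are hypothetical helpers.

```python
from pathlib import Path


def merge_seg_tables(tables):
    """Merge tab-delimited segmentation tables (given as strings).

    Keeps the header of the first table, checks that every other table has
    the same header, and appends all data rows. Blank lines are dropped.
    """
    header = None
    rows = []
    for text in tables:
        lines = [ln for ln in text.splitlines() if ln.strip()]
        if not lines:
            continue
        if header is None:
            header = lines[0]
        elif lines[0] != header:
            raise ValueError("segmentation files have different headers")
        rows.extend(lines[1:])
    if header is None:
        return ""
    return "\n".join([header] + rows) + "\n"


def concat_seg_files(paths, out_path):
    """Write one combined segmentation file from several input files."""
    texts = [Path(p).read_text() for p in paths]
    Path(out_path).write_text(merge_seg_tables(texts))
```

Checking the header guards against silently mixing files with different column orders; the combined file is what would then be passed via --segfile.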

In general, all breakpoints PureCN reports are highly significant, but technical issues can affect many probes and thus result in low p-values. DNAcopy does a pretty good job (meaning it's probably hard to clean up its output downstream), but in your case, the multi-sample segmentation might indeed provide significant improvements.

tedtoal (Author) commented Nov 19, 2018 via email

lima1 (Owner) commented Nov 20, 2018

The segmentationHclust function clusters segments and joins consecutive ones when they are in the same cluster. It's not perfect, but it should clean things up a bit.
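The joining step described here can be illustrated with a small sketch. It is not PureCN's segmentationHclust implementation: the clustering that assigns labels is left out, and the `Seg` fields, the `join_clustered` name and the marker-weighted averaging of the joined mean are assumptions for illustration.

```python
from dataclasses import dataclass, replace


@dataclass
class Seg:
    chrom: str
    start: int
    end: int
    num_mark: int     # number of markers in the segment
    seg_mean: float   # mean log2 copy ratio
    cluster: int      # label from some earlier clustering step (not shown)


def join_clustered(segments):
    """Join consecutive segments on the same chromosome with the same cluster.

    Input must be sorted by chromosome and start. The joined seg_mean is the
    marker-weighted mean of its parts. Input objects are not modified.
    """
    out = []
    for s in segments:
        last = out[-1] if out else None
        if last is not None and last.chrom == s.chrom and last.cluster == s.cluster:
            total = last.num_mark + s.num_mark
            last.seg_mean = (last.seg_mean * last.num_mark + s.seg_mean * s.num_mark) / total
            last.num_mark = total
            last.end = s.end
        else:
            out.append(replace(s))  # copy, so the caller's list stays unchanged
    return out
```

Joining only consecutive segments on the same chromosome means that two separated regions with similar copy ratios keep their own boundaries even if they fall into the same cluster.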

tedtoal (Author) commented Nov 20, 2018 via email

lima1 (Owner) commented Nov 20, 2018

You'll need a recent GitHub or Bioconductor devel version for that. There haven't been any changes to the likelihood model since 1.10, and no major changes in general, so it should be a smooth update.

See https://github.com/lima1/PureCN/blob/master/NEWS

lima1 added the question label Dec 8, 2018
lima1 closed this as completed Dec 8, 2018