Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency of Peak Significance on Peak width #18

Closed
smanne07 opened this issue Aug 29, 2019 · 4 comments
Closed

Dependency of Peak Significance on Peak width #18

smanne07 opened this issue Aug 29, 2019 · 4 comments

Comments

@smanne07
Copy link

Hi Tim,
Is there any way in Signac where the peak width can be taken as a factor while calculating the differential peaks or in the upstream normalization so that we can minimize the effect of peak width.

Best regards
Sasi

@timoast
Copy link
Collaborator

timoast commented Aug 30, 2019

Hi Sasi,

For differential peak accessibility, the peak size will not matter as you're comparing the accessibility of the same peak across different cells. For upstream normalization, you could potentially divide the counts by peak width before running LSI to normalize for different peak sizes. I have not tested this extensively, but in testing with the mouse brain dataset this had no effect that I could see on the resulting embeddings.

If you think this could be an issue for your dataset then I'd encourage you to try correcting for peak width and comparing with results without correction, and would be interested to hear your feedback.

Here is an example showing how to correct for peak width and comparing with/without this correction:

library(Signac)
library(Seurat)
library(ggplot2)

# using data from the mouse brain vignette as an example
brain <- readRDS("brain.rds")

# first compute the length of each peak
brain <- RegionStats(
  object = brain,
  genome = BSgenome.Mmusculus.UCSC.mm10,
  sep = c(":", "-")
)

# extract the width of each peak
peak.width <- GetAssayData(brain, assay = 'peaks', slot = 'meta.features')$sequence.length

# extract the raw counts
counts <- GetAssayData(brain, assay = 'peaks', slot = 'counts')

# divide the counts for each peak by the width of the peak
norm.counts <- counts / peak.width

# create a new assay with the normalized counts and perform the same processing 
brain[['normPeaks']] <- CreateAssayObject(counts = norm.counts)
DefaultAssay(brain) <- 'normPeaks'
brain <- RunTFIDF(brain)
brain <- FindTopFeatures(brain, min.cutoff = 'q5')
brain <- RunSVD(brain, n = 50, reduction.key = 'LSI_', reduction.name = 'lsiNorm')
brain <- RunUMAP(brain, dims = 1:50, reduction = 'lsiNorm', reduction.name = 'umapNorm')

# compare UMAP embeddings for normalized vs raw counts
p1 <- DimPlot(brain, reduction = 'umapNorm', label = TRUE) + NoLegend() + ggtitle('Length-normalized peak counts')
p2 <- DimPlot(brain, reduction = 'umap', label = TRUE) + NoLegend() + ggtitle('Raw peak counts')

cowplot::plot_grid(p1, p2)

peak_norm

Above shows a UMAP computed using length-normalized counts (left) and raw counts (right), with cells colored the same in both plots. I can't see much of a difference, but this is just one case. If you can show an example where this makes a difference I'd can add a function to perform this normalization in Signac.

Tim

@smanne07
Copy link
Author

Hi Tim,
Thank you for the detailed explanation,will try the peak width correction and see how it impacts our data.

We found that peaks with more width are significant than the smaller peaks.

The distribution of widths of differential peaks is skewed towards right.

Best regards
Sasi

@timoast
Copy link
Collaborator

timoast commented Aug 30, 2019

That's interesting, I think this can be a complicated issue because there are likely technical and biological factors involved. Longer peaks will tend to have higher counts (since there are more chances to overlap a read), and so potentially you have higher power to detect differential accessibility. I will have to think more about this, but thanks for bringing it up.

@smanne07
Copy link
Author

Yes ,will check and update if the peak width correction is helping .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants