FeatureMatrix() should count cut sites? #1369

Open
nesetozel opened this issue Apr 4, 2023 · 2 comments
Labels
enhancement New feature or request

Comments

@nesetozel

Dear Tim,

I noticed that when I generated a new peak/barcode matrix with an expanded peak set using Signac, I got significantly lower nCount_ATAC values than in the original matrix I had from CellRanger. I was very confused until I found this:
#1119
First, I was wondering whether there was a reason for this choice (counting fragments rather than cut sites). I imagine it doesn't make much difference, especially for anything downstream of LSI. But cut sites still sound like a better proxy for accessibility to me, so why wouldn't we want to count a fragment twice for a peak if it falls completely within it (as opposed to once for a fragment whose other end is outside the peak)?
Would it be possible to add this as an option for this function in a future release?
I also wanted to mention that I've been using Signac for quite a while and didn't know about this until now. Since many of your users probably get their data from CellRanger, it would be good to document this behavior (and the fact that it differs from CellRanger) more clearly; right now it isn't obvious to me from the documentation.
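To make the difference concrete, here is a minimal Python sketch of the two counting schemes for a single cell and a single peak. It is not Signac or CellRanger code. The function names are hypothetical, and it assumes that "counting fragments" means one count per fragment overlapping the peak, that "counting cut sites" means one count per fragment end inside the peak, and that coordinates are 0-based and half-open.

```python
from typing import Iterable, Tuple

Fragment = Tuple[int, int]  # (start, end), 0-based half-open (assumption)
Peak = Tuple[int, int]      # (start, end), 0-based half-open (assumption)


def overlaps(frag: Fragment, peak: Peak) -> bool:
    """True if the fragment shares at least one base with the peak."""
    return frag[0] < peak[1] and frag[1] > peak[0]


def cut_sites_in_peak(frag: Fragment, peak: Peak) -> int:
    """Number of fragment ends (0, 1 or 2) that fall inside the peak."""
    start, end = frag
    # With half-open coordinates the last base of the fragment is end - 1.
    return sum(peak[0] <= pos < peak[1] for pos in (start, end - 1))


def count_fragments(frags: Iterable[Fragment], peak: Peak) -> int:
    """Fragment counting: each overlapping fragment contributes 1."""
    return sum(overlaps(f, peak) for f in frags)


def count_cut_sites(frags: Iterable[Fragment], peak: Peak) -> int:
    """Cut-site counting: each fragment end inside the peak contributes 1."""
    return sum(cut_sites_in_peak(f, peak) for f in frags)


# Toy data for one cell (made-up coordinates, for illustration only).
peak = (100, 200)
frags = [
    (110, 180),  # both ends inside the peak
    (120, 190),  # both ends inside the peak
    (50, 150),   # one end inside the peak
    (20, 400),   # spans the peak, both ends outside
    (300, 400),  # no overlap
]

n_fragments = count_fragments(frags, peak)
n_cut_sites = count_cut_sites(frags, peak)
print(n_fragments, n_cut_sites)
```

In this toy example the two fragments lying fully inside the peak are counted twice under cut-site counting and once under fragment counting, while the spanning fragment is counted only under fragment counting.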

Thank you,
Neset

@nesetozel nesetozel added the enhancement New feature or request label Apr 4, 2023
@timoast
Collaborator

timoast commented Apr 10, 2023

Hi, I agree we can be clearer in the documentation here. Going forward, we would like to expose a parameter that lets users choose the counting method. There are other approaches that may be better, for example paired insertion counting (PIC): https://www.biorxiv.org/content/10.1101/2022.04.20.488960v1

@nesetozel
Author

Thanks Tim, that would be very useful! In the meantime I'll look into ArchR to do this.

PIC would seem to address the most obvious issue with counting fragments, namely the rare cases of long fragments with both ends outside a peak, which obviously should not be counted at all. But to be honest, it's still not clear to me why, when counting insertions, this "artifact of depleted odd numbers" is a problem, or why it is even considered an artifact. The paper does argue that the data is easier to model with a Poisson distribution when most counts are 1s rather than 2s (which is what you get when counting insertions). I can see how this may improve some downstream applications, but I'm still worried that the aggregate counts will be skewed, especially for anything that involves pseudobulking the data.
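As a rough illustration of the points above, the following Python sketch compares insertion counting with a PIC-style count on toy data. It is a simplification under stated assumptions, not the paper's implementation: the function names are hypothetical, PIC is taken to mean "a fragment contributes 1 if at least one of its ends lies inside the peak, otherwise 0", and coordinates are 0-based and half-open.

```python
from typing import Dict, Iterable, List, Tuple

Fragment = Tuple[int, int]  # (start, end), 0-based half-open (assumption)
Peak = Tuple[int, int]


def insertions_in_peak(frag: Fragment, peak: Peak) -> int:
    """Number of fragment ends (0, 1 or 2) that fall inside the peak."""
    start, end = frag
    return sum(peak[0] <= pos < peak[1] for pos in (start, end - 1))


def insertion_count(frags: Iterable[Fragment], peak: Peak) -> int:
    """Insertion counting: every end inside the peak adds 1."""
    return sum(insertions_in_peak(f, peak) for f in frags)


def pic_count(frags: Iterable[Fragment], peak: Peak) -> int:
    """PIC-style counting (assumed definition): a fragment adds 1 if at
    least one end is inside the peak, and 0 if it merely spans the peak."""
    return sum(insertions_in_peak(f, peak) >= 1 for f in frags)


peak = (100, 200)
# Made-up fragments per cell, for illustration only.
cells: Dict[str, List[Fragment]] = {
    "cell_a": [(110, 180)],              # one fragment inside the peak
    "cell_b": [(110, 180), (120, 190)],  # two fragments inside the peak
    "cell_c": [(50, 150)],               # one fragment crossing the boundary
    "cell_d": [(20, 400)],               # long fragment spanning the peak
}

insertion_counts = {c: insertion_count(f, peak) for c, f in cells.items()}
pic_counts = {c: pic_count(f, peak) for c, f in cells.items()}

# Pseudobulk: sum the per-cell counts for this peak.
pseudobulk_insertions = sum(insertion_counts.values())
pseudobulk_pic = sum(pic_counts.values())

print(insertion_counts, pseudobulk_insertions)
print(pic_counts, pseudobulk_pic)
```

In this toy data, insertion counts are even whenever all of a cell's fragments lie inside the peak, and an odd value appears only for the cell whose fragment crosses the peak boundary, which is one way to picture the depletion of odd numbers. The pseudobulk totals also differ between the two schemes here; whether that difference matters in real data is exactly the open question, and the sketch does not settle it.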


2 participants