CLR normalization and scaling #1268

Closed
benslack19 opened this issue Mar 22, 2019 · 6 comments
Labels
more-information-needed We need more information before this can be addressed

Comments

benslack19 commented Mar 22, 2019

Hi Seurat team,

I'm using multi-processing output from CITE-seq count as input to Seurat v3. I used CLR normalization on my ADT counts, but I'm wondering about an observation regarding the distribution of noisy signals. From left to right in the image below, I plotted a simple distribution of raw counts, the CLR-normalized values from Seurat, and then values where CLR is calculated manually.

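[image: distributions of raw counts, Seurat CLR-normalized values, and manually calculated CLR values, left to right]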

You can see that the raw counts are heavily right-skewed, which is not a huge surprise. The Seurat version makes all values positive and has some right skewness, ranging from about 0 to 10 (estimating from the x-axis of the plot). However, the distribution for antibodies that have low signal (the majority) gets fattened in this transformation. This is okay for the very highly expressed antibodies, but for some that have a real but more subtle positive signal, that signal would get lost in the noise. The manually calculated CLR has a similar range to the Seurat normalization, but the distribution of the noise is thinner, allowing positive values in the right tail to come out. (Some values are less than 0, but I think that's okay.)

It appears that Seurat applies a scaling factor that brings up the noise of antibody signals which would be otherwise low in the manually calculated CLR normalization. Can you provide more insight on what the scaling factor is doing and possibly comment on our data? (I can't see what the normalization function is doing). Your insights would be much appreciated.

Thanks,
Ben

@satijalab
Collaborator

What function are you using to calculate CLR?

mojaveazure added the more-information-needed label Apr 26, 2019
@satijalab
Collaborator

I'm closing this issue now as we have not heard back, but please note that you can see the exact code that we use when running CLR normalization in the NormalizeData.default function.
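For readers comparing against a hand-rolled CLR, here is a minimal base-R sketch under stated assumptions. The original poster's manual calculation was never shown, so `manual_clr` (log of pseudocounted counts minus their mean log) is only a guess at a conventional CLR. `seurat_style_clr` copies the function quoted later in this thread. The toy counts and both function names are made up for illustration.

```r
# Toy ADT counts for one cell (made-up values, for illustration only)
x <- c(0, 1, 2, 3, 5, 8, 400)

# ASSUMPTION: a conventional "manual" CLR, i.e. log of pseudocounted
# counts minus the mean of those logs. Not taken from the original post.
manual_clr <- function(x, pseudocount = 1) {
  lx <- log(x + pseudocount)
  lx - mean(lx)
}

# Seurat-style variant, copied from the function quoted later in this thread
seurat_style_clr <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

manual <- manual_clr(x)
seurat <- seurat_style_clr(x)

print(round(manual, 3))  # centered on zero, so low counts go negative
print(round(seurat, 3))  # log1p of a nonnegative ratio, so never negative
```

With a pseudocount of 1, both versions share the same denominator g = exp(mean(log1p(x))): the manual form is log((x + 1) / g) and the Seurat-style form is log((x + g) / g). Their difference, log((x + g) / (x + 1)), is largest at low counts and shrinks toward zero at high counts, which may be what lifts the low-signal values in the Seurat output relative to a manual CLR of this form.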

massonix commented Oct 9, 2020

Hi,

I am also concerned about the way CITE-seq data is normalized. Here you can see how the scanpy team is tackling the issue, although they do not provide much detail. Do you have any updates on the current best practices?

Thanks a lot!

ttriche commented Jul 6, 2021

It looks like the inverse used for log1p is exp instead of expm1, which is going to cause problems eventually:

clr_function <- function(x) {
  return(log1p(x = x / (exp(x = sum(log1p(x = x[x > 0]), na.rm = TRUE) / length(x = x)))))
}
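To make the point concrete, here is a small base-R sketch (the example counts are made up) showing that `expm1` undoes `log1p` exactly while `exp` overshoots by one, so the denominator in the function above is one larger than it would be if `log1p` were inverted with `expm1`.

```r
# Toy ADT counts for one cell (made-up values, for illustration only)
x <- c(0, 3, 12, 150, 7)

# expm1 is the exact inverse of log1p; exp returns x + 1 instead
roundtrip_expm1 <- expm1(log1p(x))
roundtrip_exp   <- exp(log1p(x))

# Sum of log1p over positive entries divided by the full length,
# as in the function quoted above
mean_log1p <- sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)

denom_exp   <- exp(mean_log1p)    # denominator used by the quoted function
denom_expm1 <- expm1(mean_log1p)  # denominator if log1p is inverted with expm1

print(all.equal(roundtrip_expm1, x))
print(all.equal(roundtrip_exp, x + 1))
print(denom_exp - denom_expm1)    # the two differ by 1, up to floating point
```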

ttriche commented Jul 6, 2021

An invertible version of the Seurat CLR is as follows:

cl1pr <- function(x) {
  log1p(x=x/(expm1(x=sum(log1p(x=x[x > 0]), na.rm=TRUE)/length(x=x))))
}

This gives roughly the same results as the stock Seurat CLR but has the benefit of being invertible if needed, given ADT counts.
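As a rough check of that claim, the sketch below runs both functions on a made-up count vector and then undoes the `expm1`-based version. `inv_cl1pr` is a hypothetical helper written for this illustration (it is not part of Seurat), and it assumes the per-vector denominator `g` can be recomputed from the original counts.

```r
# Stock function as quoted earlier in this thread
clr_function <- function(x) {
  log1p(x / exp(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

# expm1-based variant from the comment above
cl1pr <- function(x) {
  log1p(x / expm1(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x)))
}

# HYPOTHETICAL helper (not a Seurat function): invert cl1pr given the
# denominator g that was used in the forward transform
inv_cl1pr <- function(y, g) {
  expm1(y) * g
}

# Toy ADT counts (made-up values, for illustration only)
x <- c(0, 3, 12, 150, 7, 1, 0, 45)

stock <- clr_function(x)
fixed <- cl1pr(x)

# Denominator used inside cl1pr, recomputed from the counts
g <- expm1(sum(log1p(x[x > 0]), na.rm = TRUE) / length(x))
recovered <- inv_cl1pr(fixed, g)

print(cor(stock, fixed))        # how closely the two transforms track each other
print(all.equal(recovered, x))  # round trip back to the original counts
```

One caveat with this sketch: for an all-zero input the `expm1` denominator is 0, so `cl1pr` returns `NaN` where the stock function returns 0. A guard for that case would be needed before using it on sparse data.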

ttriche commented Jul 6, 2021

Note that the corrected nonnegative CLR is roughly interchangeable with stock (plotted on some dendritic cells).
[image: centeredLogRatioTransforms]