Is NormalizeData(LogNormalize) not generating RPM values? #185

Closed
dmoaks opened this issue Oct 20, 2017 · 5 comments

dmoaks commented Oct 20, 2017

I was hoping to ask about the exact calculation happening in NormalizeData under the LogNormalize option, as it is not explicitly stated in the documentation. I’m trying to obtain RPM/TPM values for the data table, but this function seems to be dividing every cell by the same number rather than a cell-specific scaling factor. What does the “scale factor” variable do in NormalizeData? How do I normalize to the expression of the individual cell?

thbin commented Oct 20, 2017

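The following C++ code is the implementation of LogNormalize.
The normalization computes log1p(value / colSums[cell_idx] * scale_factor), where colSums[cell_idx] is the total expression value of that cell. Each cell is therefore divided by its own total, not by a shared constant; scale_factor is the single multiplier applied to every cell afterwards.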

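// Log-normalize a genes x cells sparse matrix: each stored value becomes
// log1p(value / column_total * scale_factor).
Eigen::SparseMatrix<double> LogNorm(Eigen::SparseMatrix<double> data, int scale_factor, bool display_progress = true){
  Progress p(data.outerSize(), display_progress);
  // Per-cell totals (column sums). The ones vector needs data.rows() entries
  // to match the inner dimension of data.transpose().
  Eigen::VectorXd colSums = data.transpose() * Eigen::VectorXd::Ones(data.rows());
  // Default storage is column-major, so the outer index k is the cell.
  for (int k=0; k < data.outerSize(); ++k){
    p.increment();
    // Visit only the nonzero entries of cell k.
    for (Eigen::SparseMatrix<double>::InnerIterator it(data, k); it; ++it){
      data.coeffRef(it.row(), it.col()) = log1p(double(it.value()) / colSums[k] * scale_factor);
    }
  }
  return data;
}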
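
To make the per-cell scaling concrete, here is a minimal dense sketch of the same formula using only the standard library. It is not Seurat code: `Matrix` and `log_normalize` are hypothetical names chosen for illustration, and the default of 1e4 is an assumption meant to mirror a typical scale factor.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Dense genes x cells matrix: counts[g][c] (hypothetical layout for this sketch).
using Matrix = std::vector<std::vector<double>>;

// Hypothetical helper: applies log1p(count / cell_total * scale_factor),
// where cell_total is the sum of the cell's own column.
Matrix log_normalize(const Matrix& counts, double scale_factor = 1e4) {
    assert(!counts.empty());
    const std::size_t n_genes = counts.size();
    const std::size_t n_cells = counts[0].size();

    // Per-cell totals (column sums).
    std::vector<double> totals(n_cells, 0.0);
    for (const auto& row : counts) {
        assert(row.size() == n_cells);
        for (std::size_t c = 0; c < n_cells; ++c) totals[c] += row[c];
    }

    Matrix out(n_genes, std::vector<double>(n_cells, 0.0));
    for (std::size_t g = 0; g < n_genes; ++g) {
        for (std::size_t c = 0; c < n_cells; ++c) {
            if (totals[c] > 0.0) {  // cells with no counts stay at 0
                out[g][c] = std::log1p(counts[g][c] / totals[c] * scale_factor);
            }
        }
    }
    return out;
}
```

With counts {{1, 10}, {3, 30}} the two cells have totals 4 and 40, yet `log_normalize(counts, 1e6)` gives both cells the same value for the first gene, because each is divided by its own total. With a scale factor of 1e6 the quantity inside log1p is counts per million.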

@jcamunas

I had a similar question about LogNormalize. From this answer I understand that log1p uses the natural logarithm (not log10 or log2). Could you confirm this? Thanks!

@leonfodoulian
Contributor

@jcamunas: log1p(x) is equivalent to log(x + 1), where log is the natural logarithm.

Best,
Leon
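
If log2 or log10 values are needed, the natural-log output can be rescaled by a constant. A small sketch, assuming the input is already a log1p value; the helper names are hypothetical and not part of any library:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: turn ln(1 + x) into log2(1 + x).
double log1p_to_log2(double ln_value) {
    return ln_value / std::log(2.0);
}

// Hypothetical helper: turn ln(1 + x) into log10(1 + x).
double log1p_to_log10(double ln_value) {
    return ln_value / std::log(10.0);
}
```

For example, `log1p_to_log2(std::log1p(3.0))` is log2(4), and `log1p_to_log10(std::log1p(99.0))` is log10(100).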

iiiir commented Jan 9, 2018

This might be a naive question: why does the log-transformed matrix still have a median/mean of ~1700 (UMIs), compared with a median of ~2200 before normalization?
I am referring to the PBMC 2700 dataset, the @data slot of the Seurat object.
[Image: rplot15]

jon-xu commented Sep 11, 2020

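@iiiir the unit of measure is different after normalization: the values are log1p of scaled per-cell fractions rather than raw UMI counts, so the two numbers are not directly comparable.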
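
One way to see the change of units is to undo the log. This is a sketch under the formula quoted earlier in the thread, with hypothetical helper names: applying expm1 to a normalized value recovers count / cell_total * scale_factor, and those back-transformed values sum to scale_factor within each cell, whereas the log values themselves do not.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: undo log1p, giving count / cell_total * scale_factor.
double to_scaled_fraction(double normalized) {
    return std::expm1(normalized);
}

// Hypothetical helper: sum of back-transformed values for one cell.
// For a cell normalized with the formula above this equals scale_factor
// (up to floating-point error).
double scaled_total(const std::vector<double>& normalized_cell) {
    double sum = 0.0;
    for (double v : normalized_cell) sum += to_scaled_fraction(v);
    return sum;
}

// Plain sum of the log-scale values, which is what a column sum
// over the normalized matrix reports.
double log_total(const std::vector<double>& normalized_cell) {
    double sum = 0.0;
    for (double v : normalized_cell) sum += v;
    return sum;
}
```

For a cell with counts {1, 3} and a scale factor of 1e4, the normalized values are log1p(2500) and log1p(7500); `scaled_total` returns about 10000, while `log_total` is far smaller and is not in UMI units.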
