Is NormalizeData(LogNormalize) not generating RPM values? #185

dmoaks · 2017-10-20T17:58:33Z

I was hoping to ask about the exact calculation happening in NormalizeData under the LogNormalize option, as it is not explicitly stated in the documentation. I’m trying to obtain RPM/TPM values for the data table, but this function seems to be dividing every cell by the same number rather than a cell-specific scaling factor. What does the “scale factor” variable do in NormalizeData? How do I normalize to the expression of the individual cell?

thbin · 2017-10-20T19:21:19Z

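The C++ code below is the implementation of LogNormalize.
The normalization is log1p(value / colSums[cell-idx] * scale_factor), where colSums[cell-idx] is the total expression value of that cell. Each cell is therefore divided by its own total, and scale_factor is a constant multiplier applied afterwards.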

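// data is a genes x cells sparse matrix; each nonzero entry becomes log1p(count / cell total * scale_factor).
Eigen::SparseMatrix<double> LogNorm(Eigen::SparseMatrix<double> data, int scale_factor, bool display_progress = true){
  Progress p(data.outerSize(), display_progress);
  // Per-cell totals: (cells x genes) * ones(genes), so the ones vector needs data.rows() entries.
  Eigen::VectorXd colSums = data.transpose() * Eigen::VectorXd::Ones(data.rows());
  // Column-major storage: each outer index k is one cell (column).
  for (int k=0; k < data.outerSize(); ++k){
    p.increment();
    for (Eigen::SparseMatrix<double>::InnerIterator it(data, k); it; ++it){
      // Scale the count by its own cell's total, then take the natural log of (1 + x).
      data.coeffRef(it.row(), it.col()) = log1p(double(it.value()) / colSums[k] * scale_factor);
    }
  }
  return data;
}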
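To make the per-cell scaling concrete, here is a minimal sketch of the same formula on a small dense matrix, using only the C++ standard library. The names `Matrix` and `logNormalizeDense` are hypothetical and not part of Seurat, and skipping all-zero cells is an assumption made here to avoid a 0/0 division.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense genes-by-cells matrix, stored as counts[gene][cell].
using Matrix = std::vector<std::vector<double>>;

// Sketch of log1p(count / cell total * scaleFactor); not Seurat's own code.
Matrix logNormalizeDense(const Matrix& counts, double scaleFactor) {
    if (counts.empty()) return {};
    const std::size_t nGenes = counts.size();
    const std::size_t nCells = counts[0].size();

    // Total counts per cell: one sum per column.
    std::vector<double> colSums(nCells, 0.0);
    for (std::size_t g = 0; g < nGenes; ++g)
        for (std::size_t c = 0; c < nCells; ++c)
            colSums[c] += counts[g][c];

    Matrix out(nGenes, std::vector<double>(nCells, 0.0));
    for (std::size_t g = 0; g < nGenes; ++g)
        for (std::size_t c = 0; c < nCells; ++c)
            if (colSums[c] > 0.0)  // assumption: cells with no counts stay at 0
                out[g][c] = std::log1p(counts[g][c] / colSums[c] * scaleFactor);
    return out;
}
```

With `scaleFactor` set to 1e6, the value inside the logarithm is counts per million for that cell, so the output is log1p(CPM) rather than raw CPM. With 1e4 it is counts per ten thousand.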

jcamunas · 2017-11-12T02:45:26Z

I was also facing a similar doubt with LogNormalize. I understand from this answer that you are actually using the natural logarithm (not log10 or log2) when doing log1p. Could you confirm this point? Thanks!

leonfodoulian · 2017-11-12T06:58:55Z

@jcamunas : log1p(x) is equivalent to log(x + 1), where log is the natural logarithm (base e), not log10 or log2.

Best,
Leon
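If base-2 values are needed, a natural-log value can be rescaled by dividing by ln(2). The sketch below shows this with the standard library only. The helper name `naturalLog1pToLog2` is hypothetical and is not a Seurat function.

```cpp
#include <cmath>

// Hypothetical helper: convert ln(1 + x) into log2(1 + x).
// Changing the base of a logarithm only divides by a constant, ln(2).
double naturalLog1pToLog2(double lnValue) {
    return lnValue / std::log(2.0);
}
```

For example, `naturalLog1pToLog2(std::log1p(7.0))` gives log2(8), which is 3 up to floating-point rounding.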

iiiir · 2018-01-09T21:23:13Z

This might be a naive question: why does the log-transformed matrix still have a median/mean of ~1700 (UMIs), compared with a median of ~2200 before normalization?
I am referring to the PBMC 2700 dataset and the @data slot of the Seurat object.

jon-xu · 2020-09-11T06:29:37Z

@iiiir the unit of measure is different after normalization.
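One way to see the change of units, assuming the LogNormalize formula quoted earlier in the thread: undoing the logarithm with expm1 gives values that sum to scale_factor in every cell, whatever that cell's raw UMI total was. The helper name `scaledTotal` below is hypothetical.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper: given one cell's log-normalized values,
// return their total on the scaled (pre-log) scale.
double scaledTotal(const std::vector<double>& logNormCell) {
    double total = 0.0;
    for (double v : logNormCell)
        total += std::expm1(v);  // expm1 inverts log1p
    return total;
}
```

For a cell with raw counts 1 and 3 normalized with a scale factor of 10000, the stored values are log1p(2500) and log1p(7500), and `scaledTotal` returns approximately 10000 rather than the raw total of 4.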

satijalab closed this as completed Oct 26, 2017
