Compute log fold change with linear model #1035

danielcgingerich · 2022-03-10T16:50:13Z

One thing that has always bugged me about the way fold change is calculated is that it is not adjusted for latent variables. Only the p-value calculation controls for confounding variables. I am referring to the 'LR' test commonly used for scATAC data.

Why not use a linear model to calculate fold change?

The formula might look something like:

log(peak) ~ group + latent.vars

perhaps with the log transformation applied to the latent variables as well.

Here, group is the categorical variable you are testing, and latent.vars are the confounders. If I understand correctly, the beta coefficient for group would therefore be the logFC between the two groups, adjusted for confounders. I believe the MAST DE test does this for scRNA.
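To illustrate the proposal, here is a minimal sketch on simulated data. It is not Signac or Seurat code: the variable names, the simulated confounder, and the helper `fit_ols` are all hypothetical, and plain least squares via NumPy stands in for whatever model a real implementation would use. It fits `log(peak) ~ group` and `log(peak) ~ group + latent` and compares the coefficient on `group`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Binary group label (0 or 1), e.g. control vs. disease.
group = rng.integers(0, 2, size=n)

# Hypothetical confounder that is correlated with group membership.
latent = rng.normal(loc=0.5 * group, scale=1.0, size=n)

# Simulated log-scale signal: the true group effect (log fold change) is 1.0,
# and the confounder contributes 0.8 per unit.
true_log_fc = 1.0
log_peak = 0.2 + true_log_fc * group + 0.8 * latent + rng.normal(scale=0.1, size=n)


def fit_ols(y, *covariates):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Coefficient on group without and with the latent variable in the model.
unadjusted = fit_ols(log_peak, group)[1]
adjusted = fit_ols(log_peak, group, latent)[1]

print(f"unadjusted group coefficient: {unadjusted:.3f}")
print(f"adjusted group coefficient:   {adjusted:.3f}")
```

In this simulation the unadjusted coefficient absorbs part of the confounder's effect, while the adjusted coefficient lands close to the simulated group effect of 1.0. Real peak data contain many zeros, so a log transform would need a pseudocount or similar handling that this sketch leaves out.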


danielcgingerich · 2022-04-12T14:29:30Z

@timoast any word on this? Another person at my work recently asked me the same question. No rush!

timoast · 2022-04-12T19:27:21Z

I think what you suggest is very reasonable, although it would not be the fold change, so I wouldn't necessarily replace the fold change column in the FindMarkers output with this. I don't have the bandwidth to work on this right now, but you could make a PR for Seurat if you're interested in implementing it, or you could write a separate function in a new package.

danielcgingerich · 2022-04-14T21:21:57Z

@timoast Hey Tim, thanks for getting back to me. I am a little confused by how it would not be a log fold change. If the dependent variable is log transformed, and the predictor variable is binary (0 or 1), then the slope should be the log fold change. This would be the case for modeling log(TF-IDF count) ~ disease_status. Maybe I am misunderstanding something - would like to know your thoughts!

timoast · 2022-04-14T21:30:22Z

Fold change is defined as the ratio between the two counts. It would be slightly different if it took latent variables into account; that's all I mean. Without latent variables in the model, I agree it's the same.
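The no-covariate case can be checked numerically. The sketch below uses simulated log-normal values (an assumption, not real peak counts) and shows that the slope of `log(value) ~ group` equals the difference in mean log values between the two groups. That quantity is the log ratio of geometric means, so it can differ slightly from a fold change computed as the log of a ratio of arithmetic means.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two groups of 500 simulated positive values each (hypothetical data).
group = np.repeat([0, 1], 500)
values = rng.lognormal(mean=np.where(group == 1, 1.0, 0.0), sigma=0.5)
log_values = np.log(values)

# Least-squares fit of log(value) ~ group with an intercept.
X = np.column_stack([np.ones(group.size), group])
intercept, slope = np.linalg.lstsq(X, log_values, rcond=None)[0]

# Difference in mean log values: the log ratio of geometric means.
diff_of_log_means = log_values[group == 1].mean() - log_values[group == 0].mean()

# Log of the ratio of arithmetic means, for comparison.
log_ratio_of_means = np.log(values[group == 1].mean() / values[group == 0].mean())

print(f"slope:               {slope:.4f}")
print(f"diff of log means:   {diff_of_log_means:.4f}")
print(f"log ratio of means:  {log_ratio_of_means:.4f}")
```

The first two numbers agree up to floating-point error. The third is close here because both simulated groups have the same spread on the log scale, but it is not guaranteed to match in general.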

danielcgingerich · 2022-04-14T21:33:18Z

@timoast Gotcha, thank you!

Would you say it is more common for researchers to report fold change as the unadjusted ratio between two values, or to adjust for latent variables as well? I am not referring strictly to single-cell analysis, but to other bioinformatics fields too.

timoast · 2022-04-14T21:46:31Z

I haven't seen many cases where people adjust fold change for latent variables, but I don't see any problem with reporting that as long as it's clear what's being calculated. IMO, if you call it "fold change", most people would probably assume that means a simple ratio between two values.

danielcgingerich added the enhancement New feature or request label Mar 10, 2022
