Compute log fold change with linear model #1035
@timoast any word on this? Another person at my work recently asked me the same question. No rush!
I think what you suggest is very reasonable, although it would not be the fold change, so I wouldn't necessarily replace the fold change column in the FindMarkers output with this. I don't have bandwidth to work on this right now, but you could make a PR for Seurat if you're interested in implementing it, or you could make a separate function in a new package.
@timoast Hey Tim, thanks for getting back to me. I'm a little confused about why it would not be a log fold change. If the dependent variable is log-transformed and the predictor variable is binary (0 or 1), then the slope should be the log fold change. This would be the case when modeling log(TF-IDF count) ~ disease_status. Maybe I am misunderstanding something; I'd like to know your thoughts!
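A minimal NumPy sketch of the claim in this comment, using made-up toy counts. With an intercept and a single 0/1 predictor, the least-squares slope on log(y) is exactly the difference between the two groups' mean log values, i.e. the log of the ratio of their geometric means. That is a log fold change, but it need not equal the log of the ratio of the arithmetic group means, as the toy data below deliberately shows.

```python
import numpy as np

def group_slope(y, g):
    """OLS slope of y on a 0/1 indicator g, fitted with an intercept."""
    X = np.column_stack([np.ones(len(g)), g.astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# Hypothetical toy counts for two groups of cells (not real data).
counts = np.array([1.0, 1.0, 4.0, 2.0, 2.0, 2.0])
g = np.array([0, 0, 0, 1, 1, 1])

log_y = np.log(counts)
slope = group_slope(log_y, g)

# The slope equals the difference of the group means of log(y),
# which is the log of the ratio of the geometric means ...
diff_of_mean_logs = log_y[g == 1].mean() - log_y[g == 0].mean()
# ... whereas a ratio of arithmetic means can give a different answer.
log_ratio_arith = np.log(counts[g == 1].mean() / counts[g == 0].mean())

print(slope, diff_of_mean_logs, log_ratio_arith)
```

Here the slope is ln(2)/3 (about 0.231), while both groups have an arithmetic mean of 2, so the log of the ratio of arithmetic means is 0.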
Fold change is defined as the ratio between the two counts. The estimate would be slightly different if the model took latent variables into account; that's all I mean. Without latent variables in the model, I agree it's the same.
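The difference described here is easy to see in a small noiseless simulation. This is a NumPy sketch with made-up data, where `z` stands in for a hypothetical latent variable (for example sequencing depth) that is higher in group 1. The group coefficient from `log(y) ~ group` absorbs the effect of `z` and equals the plain difference in mean log values, while the coefficient from `log(y) ~ group + z` recovers only the group effect.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares coefficients for y ~ 1 + columns."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical simulated cells: the latent variable z differs between groups,
# and both the group and z contribute to log(y).
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
z = np.array([1.0, 2.0, 1.5, 2.5, 3.0, 4.0, 3.5, 4.5])
true_group_effect = 0.5
log_y = 0.2 + true_group_effect * g + 0.8 * z  # noiseless, for clarity

unadjusted = ols(log_y, g)[1]   # log(y) ~ group
adjusted = ols(log_y, g, z)[1]  # log(y) ~ group + z

print(unadjusted, adjusted)
```

The unadjusted coefficient is 0.5 + 0.8 × (3.75 − 1.75) = 2.1, while the adjusted one is 0.5, which is why an adjusted coefficient is not simply "the" fold change between the two groups.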
@timoast Gotcha, thank you! Would you say it is more common for researchers to report fold change as the unadjusted ratio between two values, or to adjust it for latent variables as well? I'm not referring strictly to single-cell data, but to other bioinformatics fields too.
I haven't seen many cases where people adjust fold change for latent variables, but I don't see any problem with reporting it that way as long as it's clear what's being calculated. IMO, if it's called "fold change", most people would probably assume it means a simple ratio between two values.
One thing that has always bugged me about the way fold change is calculated is that it is not adjusted for latent variables; only the p-value calculation controls for confounding variables. I'm referring to the 'LR' test commonly used for scATAC data.
Why not use a linear model to calculate fold change?
The formula might look something like:
log(peak) ~ group + latent.vars
Perhaps the log transformation could be applied to the latent variables as well.
`group` is the categorical variable you are testing, and `latent.vars` are the confounders. If I understand correctly, the beta coefficient for `group` would therefore be the logFC between the two groups, adjusted for confounders. I believe the MAST DE test does this for scRNA-seq data.
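A minimal NumPy sketch of the proposed model for a single peak, fitted by ordinary least squares. It is not Seurat's LR test or MAST's hurdle model, and the function name `adjusted_log2_fc` is hypothetical. Several choices are assumptions not stated in the issue: a pseudocount of 1 before taking the log (so zero counts are allowed), log-transforming strictly positive latent variables when `log_latent=True`, and reporting the `group` coefficient in log2 units.

```python
import numpy as np

def adjusted_log2_fc(counts, group, latent, log_latent=True, pseudocount=1.0):
    """Hypothetical sketch: fit log(peak + pseudocount) ~ group + latent.vars.

    counts: per-cell counts for one peak.
    group:  0/1 per cell (1 = group of interest).
    latent: array of shape (n_cells,) or (n_cells, k) with latent variables.
    Returns the group coefficient converted from natural log to log2 units.
    """
    y = np.log(np.asarray(counts, dtype=float) + pseudocount)
    L = np.asarray(latent, dtype=float).reshape(len(y), -1)
    if log_latent:
        L = np.log(L)  # assumes every latent variable is strictly positive
    X = np.column_stack([np.ones(len(y)), np.asarray(group, dtype=float), L])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1] / np.log(2)

# Hypothetical usage: counts constructed so that
# log(count + 1) = 0.1 + log(2) * group + 0.3 * log(depth), i.e. a true log2FC of 1.
depth = np.array([1000, 2000, 1500, 2500, 3000, 4000, 3500, 4500], dtype=float)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
counts = np.exp(0.1 + np.log(2) * group + 0.3 * np.log(depth)) - 1

lfc = adjusted_log2_fc(counts, group, depth)
print(lfc)
```

Because the toy data are noiseless, the fit returns approximately 1.0 (up to floating-point error). On real data, the choice of pseudocount and of which latent variables to log-transform would change the estimate, so they would need to be reported alongside it.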