Compute log fold change with linear model #1035

Open
danielcgingerich opened this issue Mar 10, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

danielcgingerich commented Mar 10, 2022

One thing that has always bugged me about the way fold change is calculated is that it is not adjusted for latent variables; only the p-value calculation controls for confounding variables. I am referring to the 'LR' test commonly used for scATAC data.

Why not use a linear model to calculate fold change?

The formula might look something like:

log(peak) ~ group + latent.vars

perhaps with the log transformation applied to the latent variables as well.

Here, group is the categorical variable you are testing, and latent.vars are confounders. If I understand correctly, the beta coefficient for group would therefore be the logFC between the two groups, adjusted for confounders. I believe the MAST DE test does this for scRNA.
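
As a rough illustration of the proposal above, here is a minimal sketch in plain Python that fits log(value) ~ group + latent by ordinary least squares and reads off the coefficient on group. Everything in it is an assumption made for illustration: the function names (`solve_linear_system`, `ols_coefficients`, `adjusted_log_fc`) are hypothetical, the data are synthetic and noiseless, natural logs are used, and none of this is Seurat or Signac code.

```python
import math

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations apply to both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ols_coefficients(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)

def adjusted_log_fc(values, group, latent):
    """Coefficient on `group` in log(values) ~ group + latent (hypothetical helper)."""
    log_y = [math.log(v) for v in values]
    design = [[1.0, float(g), float(z)] for g, z in zip(group, latent)]
    return ols_coefficients(design, log_y)[1]

def mean(xs):
    return sum(xs) / len(xs)

# Synthetic, noiseless data: log(value) = 1.0 + 0.5 * group + 0.3 * latent.
# The latent variable is deliberately higher in group 1, so it confounds.
group = [0, 0, 0, 1, 1, 1]
latent = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
values = [math.exp(1.0 + 0.5 * g + 0.3 * z) for g, z in zip(group, latent)]

adjusted = adjusted_log_fc(values, group, latent)
unadjusted = (mean([math.log(v) for v, g in zip(values, group) if g == 1])
              - mean([math.log(v) for v, g in zip(values, group) if g == 0]))
print(adjusted, unadjusted)
```

Because the toy data were built with a group effect of 0.5 on the log scale, the adjusted coefficient recovers 0.5, while the unadjusted difference in mean log values comes out at 1.1, since it also absorbs the imbalance in the latent variable.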

danielcgingerich added the enhancement (New feature or request) label Mar 10, 2022
danielcgingerich (Author) commented

@timoast any word on this? Another person at my work recently asked me the same question. No rush!

timoast (Collaborator) commented Apr 12, 2022

I think what you suggest is very reasonable, although it would not be the fold change, so I wouldn't necessarily replace the fold change column in the FindMarkers output with this. I don't have bandwidth to work on this right now, but you could make a PR for Seurat if you're interested in implementing it, or you could make a separate function in a new package.

danielcgingerich (Author) commented Apr 14, 2022

@timoast Hey Tim, thanks for getting back to me. I am a little confused about why it would not be a log fold change. If the dependent variable is log transformed and the predictor variable is binary (0 or 1), then the slope should be the log fold change. This would be the case for modeling log(TF-IDF count) ~ disease_status. Maybe I am misunderstanding something; I would like to know your thoughts!

timoast (Collaborator) commented Apr 14, 2022

Fold change is defined as the ratio between the two counts. This would be slightly different if it took latent variables into account, that's all I mean. Without latent variables in the model, I agree it's the same.
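
To make the no-latent-variable case concrete, here is a small sketch (synthetic numbers, hypothetical helper names, log base 2 chosen only for readability) showing that the slope of log(value) ~ group with a binary group is exactly the difference between the two groups' mean log values.

```python
import math

def simple_slope(x, y):
    """Slope of y ~ x from the closed form cov(x, y) / var(x)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var

def mean(xs):
    return sum(xs) / len(xs)

# Synthetic values for two groups (0 = reference, 1 = tested group).
group = [0, 0, 0, 1, 1]
signal = [2.0, 4.0, 8.0, 16.0, 64.0]
log_signal = [math.log2(v) for v in signal]

# Regression slope with a binary predictor and no covariates.
slope = simple_slope(group, log_signal)

# Difference in mean log values between the groups.
mean_log_diff = (mean([l for l, g in zip(log_signal, group) if g == 1])
                 - mean([l for l, g in zip(log_signal, group) if g == 0]))

# Log of the ratio of the raw (arithmetic) group means, for comparison.
log_ratio_of_means = math.log2(
    mean([v for v, g in zip(signal, group) if g == 1])
    / mean([v for v, g in zip(signal, group) if g == 0])
)

print(slope, mean_log_diff, log_ratio_of_means)
```

The slope and the difference in mean logs agree. One caveat worth keeping in mind: a difference in mean logs is the log of a ratio of geometric means, so it need not equal the log of the ratio of raw means, as the third number shows. Whether the slope matches a reported fold change therefore depends on how that ratio is computed.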

danielcgingerich (Author) commented Apr 14, 2022

@timoast Gotcha, thank you!

Would you say it is more common for researchers to report fold change as the unadjusted ratio between two values, or to adjust for latent variables as well? I am not strictly referring to single cell, but to other bioinformatics fields too.

timoast (Collaborator) commented Apr 14, 2022

I haven't seen many cases where people adjust fold change for latent variables, but I don't see any problem with reporting that as long as it's clear what's being calculated. IMO, if you call it "fold change", most people would probably assume that means a simple ratio between two values.

Projects
Status: Future

2 participants