How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? #21

adamklie · 2022-10-14T21:11:23Z

Describe your question
Thanks once again for putting together this package. I've found it super usable and useful so far. I just have hopefully a quick question about the underlying method for analysis of bulk RNA-seq data.

I'm getting some awesome results from fitting a univariate linear model to log fold changes (LFC) from DESeq2 using the DoRothEA network, but I don't really understand how I'm getting these TF activities. I follow the general idea that "the observed molecular readouts in mat are the response variable and the regulator weights in net are the explanatory ones," but I guess I don't really understand what these are in my specific context and how I end up with an activity score for each TF. I'm not sure I even follow what each data point is in this context. Am I fitting a separate model for each TF?

If someone could help me understand this better or point me to the details (that I apologize in advance if I missed them) that would be amazing!

The text was updated successfully, but these errors were encountered:

PauBadiaM · 2022-10-17T08:45:59Z

Hi @adamklie

Glad you find the package useful!

As you know decoupler takes an input matrix of gene expression (GEX) and a prior knowledge network (Net) which we transform into matrix format internally. In your case, your GEX is made out of the contrasts' statistics between conditions (if you only have one then it is only one row). When you run ulm, for each "sample" (in your case contrasts) and TF, you fit a univariate linear model. The response variable is the observed GEX (in your case the observed change between conditions) and the explanatory are the weights for that TF. Once the model is fitted, we extract the t-value of the fitted model as the "activity". If the genes that belong to a certain TF show increased activity when their weights are positive, the slope will be positive and we will have positive activities. If there is disagreement, for example genes with negative logFCs and positive weights, the slope will be negative.

To briefly answer your question, yes, we fit a separate model for each TF in ulm. However, in the case of mlm we fit one for each sample/contrast, where we include all TFs at the same time, thus the name "multivariate".

Hope this is helpful! Feel free to ask more questions if needed.

adamklie · 2022-10-20T22:03:23Z

Thank you so much for the detailed explanation and the figure! Makes it very clear how this is working. Now I'm trying to think about when you might expect one to work better than the other. I would expect that many of the explanatory TFs would have correlated weights that might make it harder to fit a mlm, but that many target genes should be explained by the action of multiple TFs. I guess its not hard to try both and inspect the fit, but are there other considerations or sub-selections of the network or data that make sense before fitting the mlm?

PauBadiaM · 2022-10-21T07:49:11Z

Very good points. In the end there is no free lunch, there is a tradeoff.

The advantage of ulm is that we don't care if two TFs have highly correlated weights because we are testing then independently, but you might get false positives and we are not accounting for TF interaction effects when modeling activities.

The advantage of mlm is that since it is multivariate, it accounts for these interaction effects when modeling activities but if the network is too co-linear it might break for those TFs. BTW, you can always check how co-linear a given net is for your data using:

dc.check_corr(net, mat=mat)

If you see that some TFs pairs have high correlations (> 0.9), you should definitely double check how the obtained activities look for these when using mlm.

Therefore, if you are not sure of which one to pick, you can always use the consensus score, which by default models activities based on the results of ulm, norm_wsum and mlm. In our benchmarks we have seen that depending on the data, one of these three methods outperforms the other two, but that their consensus is always the slightly better alternative.

Hope this is helpful!

adamklie · 2022-10-21T13:59:11Z

So helpful. Thanks again for taking the time to explain, I really appreciate it!

gjones3339 · 2024-10-24T13:46:59Z

If we only care regulatory activity in a list of about 30-50 genes (and their logFCs), can we still run consensus? Or we should stick to ORA in that case?

PauBadiaM · 2024-10-24T14:21:19Z

Hi @gjones3339,

Without a proper background of genes I would recommend just running ORA (there we assume a background of ~20k genes, all the protein coding genes). For list of genes you can easily run ora with get_ora_df, an example can be seen in this vignette. Hope this is helpful!

adamklie added the question User question: anything that's not obviously a bug label Oct 14, 2022

PauBadiaM self-assigned this Oct 17, 2022

adamklie closed this as completed Oct 21, 2022

PauBadiaM mentioned this issue May 11, 2023

run_mlm function process #42

Closed

PauBadiaM mentioned this issue May 6, 2024

why ulm (and not mlm) for transription factor activities ? there must be some overlaps #120

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? #21

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? #21

adamklie commented Oct 14, 2022

PauBadiaM commented Oct 17, 2022

adamklie commented Oct 20, 2022

PauBadiaM commented Oct 21, 2022

adamklie commented Oct 21, 2022

gjones3339 commented Oct 24, 2024 •

edited

Loading

PauBadiaM commented Oct 24, 2024

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? #21

How does a ULM on LFCs from bulk RNA-seq work for TF activity inference? #21

Comments

adamklie commented Oct 14, 2022

PauBadiaM commented Oct 17, 2022

adamklie commented Oct 20, 2022

PauBadiaM commented Oct 21, 2022

adamklie commented Oct 21, 2022

gjones3339 commented Oct 24, 2024 • edited Loading

PauBadiaM commented Oct 24, 2024

gjones3339 commented Oct 24, 2024 •

edited

Loading