Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Added lfc under null and alt hypothesis #172

Merged
merged 12 commits into from
Sep 15, 2023
Merged

ENH Added lfc under null and alt hypothesis #172

merged 12 commits into from
Sep 15, 2023

Conversation

vcabeli
Copy link
Contributor

@vcabeli vcabeli commented Sep 5, 2023

Reference Issue or PRs

Resolves #158

PR description

This PR aims to implement the option to specify an alternative test for producing Wald statistics and p-values.
It corresponds to the 'lfcThreshold' and 'altHypothesis' parameters of the original DESeq2 results(dds), but not exactly :

  • DESeq2 takes the negative lfcThreshold when altHypothesis = less as explained in the vignette. If you want to have the same behavior in pyDESeq2, you will have to specify the correct sign for lfc_null
image
  • DESeq2 computes the pvalues and the statistics separately , this gives unintuitive results where the statistic is 0 and the p-value is < 1 :
	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
gene1	8.541317	0.632812	0.289104	0.459393	0.322976	1.0
gene2	21.281239	0.538552	0.149963	0.257077	0.398560	1.0
gene3	5.010123	-0.632832	0.295221	0.000000	0.999938	1.0
gene4	100.517961	-0.412102	0.118628	0.000000	1.000000	1.0
gene5	27.142450	0.582066	0.154730	0.530383	0.297923	1.0
gene6	5.413043	0.001457	0.310310	0.000000	0.945929	1.0
gene7	28.294023	0.134336	0.149987	0.000000	0.992615	1.0
gene8	40.358344	-0.270656	0.136402	0.000000	1.000000	1.0
gene9	37.166183	-0.212715	0.133244	0.000000	1.000000	1.0
gene10	11.589325	0.386011	0.244586	0.000000	0.679409	1.0

compared to pyDESeq2 :

	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
gene1	8.541317	0.632812	0.289101	0.459398	0.322974	0.5
gene2	21.281239	0.538552	0.149963	0.257077	0.398560	0.5
gene3	5.010123	-0.632830	0.295236	0.000000	0.500000	0.5
gene4	100.517961	-0.412102	0.118629	0.000000	0.500000	0.5
gene5	27.142450	0.582065	0.154706	0.530462	0.297896	0.5
gene6	5.413043	0.001457	0.310311	0.000000	0.500000	0.5
gene7	28.294023	0.134338	0.149945	0.000000	0.500000	0.5
gene8	40.358344	-0.270656	0.136401	0.000000	0.500000	0.5
gene9	37.166183	-0.212715	0.133243	0.000000	0.500000	0.5
gene10	11.589325	0.386011	0.244588	0.000000	0.500000	0.5
  • I have not yet understood why (probably different Cooks adjustment?), DESeq2 returns p-values between [0,1] even though it uses one-sided tests for the 'less', 'greater' and 'lessAbs' alt hypothesis, which are bounded in [0, 0.5]. They must be corrected afterwards. The p-values where the statistic is > 0 are in agreement up to tol=0.02, as shown in tests/test_pydeseq2.py:test_alt_hypothesis()
  • This implementation allows to set an arbitrary lfc_null that corresponds to the (log2) LFC under the null hypothesis and no alternative hypothesis, i.e. use the classic Wald test to test for deviation from the null value.

The test results used for comparison were generated using the R DESeq2 version 1.34.0 and results(dds, lfcThreshold=.5, altHypothesis=altHypothesis)

@vcabeli vcabeli changed the title Added lfc under null and alt hypothesis ENH Added lfc under null and alt hypothesis Sep 5, 2023
@vcabeli vcabeli added the enhancement New feature or request label Sep 5, 2023
@BorisMuzellec
Copy link
Collaborator

Hi @vcabeli,

Thanks a lot for this PR!

A few comments:

  • I agree that it makes sense to expect users to provide an LFC threshold with the correct sign when altHypothesis = less. In particular, I don't see any reason why we wouldn't allow testing LFC < 0.5.

  • The differences between DESeq2 and this PR regarding statistics equal to 0 seem to come from the fact that when an altHypothesis is specified, DESeq2 thresholds statistics to be above 0 but uses un-thresholded statistics to compute p-values, e.g. for altHypothesis = greater:

    newStat <- pmax((LFC - T)/SE, 0)
    newPvalue <- pfunc((LFC - T)/SE)
    

    vs. in this PR

    stat = contrast @ np.fmax((lfc - lfc_null) / wald_se, 0)
    pval = norm.sf(stat)
    

    I'm not sure which is correct.

  • Given that it is possible to provide a negative threshold, could you make DeseqStats throw a ValueError
    when lfc_null < 0 but alt_hypothesis is based on absolute values?

@vcabeli
Copy link
Contributor Author

vcabeli commented Sep 7, 2023

Thanks for the review @BorisMuzellec,
Personally I would be for computing the p-value directly from the stat, especially since in this implementation you can specify a lfc_null without any alt_hypothesis, which does stat=(LFC - T)/SE) (and pval=pnorm(stat), although it uses the two-tailed distribution).
That way everything strictly corresponds to the input parameters

pydeseq2/ds.py Outdated Show resolved Hide resolved
pydeseq2/ds.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@BorisMuzellec BorisMuzellec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the PR

@BorisMuzellec BorisMuzellec merged commit 868e7fa into main Sep 15, 2023
8 checks passed
@BorisMuzellec BorisMuzellec deleted the null_lfc branch September 15, 2023 10:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature request] Threshold-based Wald test
2 participants