Fix: debug DeseqStats constrasts #40

Merged
BorisMuzellec merged 3 commits into main from fix_contrasts
Jan 12, 2023
BorisMuzellec (Collaborator) commented Jan 11, 2023

This PR aims to debug DeseqStats, which gave inconsistent results when changing the reference level using the contrast argument.

As an example, setting contrast = ["condition", "A", "B"] instead of contrast = ["condition", "B", "A"] led to different p-values, whereas identical p-values with opposite LFCs and test statistics are expected.

More precisely, in this PR:

  • A design_matrix attribute is added to the DeseqStats class,
  • We make sure that the design_matrix and LFC fields of a DeseqStats object are consistent with its contrast attribute,
  • We refactor the synthetic data that is used for pytests. The same data is now used for single and multi-factor tests, the only difference lying in the choice of design_factors,
  • A test_contrast pytest was implemented to ensure that changing the reference level in the contrast yields expected results.
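
To make the expected behaviour concrete, here is a minimal sketch of the invariant that such a test checks. It assumes a Wald-type statistic (the LFC divided by its standard error) and a two-sided normal p-value; the helper `wald_test` and the numbers are made up for illustration and are not taken from this PR or from the library.

```python
import math


def wald_test(lfc, se):
    """Return (statistic, two-sided p-value) for a Wald test of lfc == 0.

    Hypothetical helper for illustration only.
    """
    stat = lfc / se
    # Two-sided normal p-value: P(|Z| >= |stat|) = erfc(|stat| / sqrt(2))
    pvalue = math.erfc(abs(stat) / math.sqrt(2))
    return stat, pvalue


# Made-up LFC and standard error for the contrast "B vs A".
lfc_b_vs_a, se = 1.5, 0.4

stat_ba, p_ba = wald_test(lfc_b_vs_a, se)
# Swapping the tested and reference levels negates the LFC.
stat_ab, p_ab = wald_test(-lfc_b_vs_a, se)

# The statistic flips sign, the p-value is unchanged.
assert math.isclose(stat_ab, -stat_ba)
assert math.isclose(p_ab, p_ba)
```

The p-value depends only on the absolute value of the statistic, so any difference in p-values between the two contrasts points to an inconsistency upstream, for example in the design matrix or the LFCs.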

@BorisMuzellec BorisMuzellec changed the title fix: DeseqStats now has its own design matrix, which is updated depen… Fix: debug DeseqStats constrasts Jan 11, 2023
@BorisMuzellec BorisMuzellec marked this pull request as ready for review January 11, 2023 16:21
@BorisMuzellec BorisMuzellec added the bug Something isn't working label Jan 12, 2023
maikia (Collaborator) left a comment

Thanks @BorisMuzellec, nice PR. In general LGTM and ready to merge, just two minor points (not necessary for this PR, but good to keep in mind for the future):

  1. This PR contains a lot of name changes and similar edits that are unrelated to the fix, which makes it harder to review. Next time it would be great if you could keep those in a separate PR.
  2. Using your unit test, I tried a few cases on DeseqStats:
  • Passing res_B_vs_A = DeseqStats(dds, contrast=["condition", "A", "A"]) gives *** KeyError: 'condition_A'. Perhaps a more descriptive message could be given?
  • Passing res_B_vs_A = DeseqStats(dds, contrast=["group", "B", "A"]) gives *** AssertionError: The contrast levels should correspond to design factors levels.
    Although this is correct, a more descriptive message could be given (e.g., "design factor 'group' does not have an 'A' level. The contrast levels should correspond to design factors levels.").
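
As a rough sketch of what more descriptive messages could look like, the following standalone check validates a contrast against a mapping from design factors to their levels. The function `validate_contrast` and the `levels` mapping are hypothetical and are not part of the library; the actual fix may be structured differently.

```python
def validate_contrast(contrast, factor_levels):
    """Check a [factor, tested, reference] contrast (hypothetical helper).

    `factor_levels` maps each design factor name to its list of levels.
    Raises ValueError with a message naming the offending element.
    """
    if len(contrast) != 3:
        raise ValueError(
            "The contrast should contain three elements: "
            "[factor, tested level, reference level]."
        )
    factor, tested, reference = contrast
    if factor not in factor_levels:
        raise ValueError(
            f"Unknown design factor {factor!r}. "
            f"Available factors: {sorted(factor_levels)}."
        )
    for role, level in (("tested", tested), ("reference", reference)):
        if level not in factor_levels[factor]:
            raise ValueError(
                f"Design factor {factor!r} does not have a {level!r} level "
                f"({role} level). Available levels: {factor_levels[factor]}."
            )
    if tested == reference:
        raise ValueError(
            f"The tested and reference levels are identical ({tested!r})."
        )


# Made-up design for illustration.
levels = {"condition": ["A", "B"], "group": ["X", "Y"]}

validate_contrast(["condition", "B", "A"], levels)  # passes silently

try:
    validate_contrast(["group", "B", "A"], levels)
except ValueError as err:
    print(err)
```

Checking the contrast up front, before any column lookup, avoids surfacing an internal key such as 'condition_A' to the user.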

BorisMuzellec (Collaborator, Author) commented

Thanks @maikia, I'll merge this PR then and open another one to improve error messages.

hfl112 commented May 8, 2024

I still get an error when using contrast:

stat_res = DeseqStats(dds, contrast=['condition', 'range_high', 'range_low'])

KeyError: "The tested level ('range_high') should correspond to one of the levels of 'condition'"

BorisMuzellec (Collaborator, Author) commented

Hi @hfl112, can you try replacing underscores with hyphens?

stat_res = DeseqStats(dds, contrast=['condition', 'range-high', 'range-low'])
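
The suggestion above implies that level names are matched with hyphens rather than underscores; that reading is an inference from the reply, not something stated in this thread. Under that assumption, a small hypothetical helper (`hyphenate_levels`, not a library function) could apply the same substitution to the two level names while leaving the factor name untouched:

```python
def hyphenate_levels(contrast):
    """Replace underscores with hyphens in the two level names only.

    Hypothetical convenience helper; the factor name is left as-is,
    matching the example in the reply above.
    """
    factor, tested, reference = contrast
    return [factor, tested.replace("_", "-"), reference.replace("_", "-")]


contrast = hyphenate_levels(["condition", "range_high", "range_low"])
print(contrast)
```

The returned list can then be passed as the contrast argument in place of the original one.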
