
global epistasis models give different results with pandas 2.2 #128

Closed
jbloom opened this issue Jan 27, 2024 · 2 comments

@jbloom
Collaborator

jbloom commented Jan 27, 2024

@jgallowa07, this may be a significant bug. When I run the global epistasis models in the dms-vep-pipeline-3/test_example, I get substantially different results depending on whether I use pandas 2.1 or pandas 2.2 (I also have pyarrow installed). The pandas 2.1 results are consistent with earlier versions, but the pandas 2.2 results are very different.

Note that I cannot rule out that this is caused by a bug in pandas 2.2.0, which is a recent release, but I wanted to flag it here.

jbloom added a commit to dms-vep/dms-vep-pipeline-3 that referenced this issue Jan 28, 2024
- Update environment:
  - update `polyclonal` to 6.10 (addresses [this issue](#96))
  - update `neutcurve` to 1.1.2
  - update `altair` to 5.2.0
  - update `biopython` to 1.83
  - update `pandas` to 2.1 and add `pyarrow`. Did not update to `pandas` 2.2 due to [this issue](matsengrp/multidms#128).
  - update to `seaborn` 0.13
  - update to `snakemake` 8.3. **Note that this means the recommended usage now changes from `--use-conda` to `--software-deployment-method conda`.**
- sort rows in prob escape values for consistent output; this may very slightly change some of the fit antibody-escape values
@jgallowa07
Member

Good catch @jbloom. Indeed, when I updated to pandas==2.2.0, the unit tests fail. I've added a patch in #129, which explains things in more detail.

However, this patch is for multidms>=3.1, and I understand that the dms-vep-pipeline still relies on an older version (multidms==0.2.1) until dms-vep/dms-vep-pipeline-3#91 is finished and merged.

Hopefully I will get that PR ready for review this week, but if you need a patch for this issue now, I could branch off 0.2.1 to patch this specific bug, and you could install that branch directly from GitHub.

@jbloom
Collaborator Author

jbloom commented Jan 30, 2024

Great, and thanks for the explanation of the bug. For now I have pinned pandas=2.1 for the pipeline so this should work fine until the new multidms is merged into the pipeline.
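A pin such as `pandas=2.1` only protects environments built from the pipeline's spec. As a hedged sketch (not part of multidms or dms-vep-pipeline-3), a script could also check the installed pandas version at startup and warn before fitting any models. The function name `pandas_version_ok` and the `(2, 2)` cutoff are assumptions chosen to mirror the pin described above.

```python
import warnings
from typing import Tuple


def pandas_version_ok(version_string: str, max_exclusive: Tuple[int, int] = (2, 2)) -> bool:
    """Return True if ``version_string`` is below ``max_exclusive`` (major, minor).

    Hypothetical helper: the (2, 2) default mirrors the pandas=2.1 pin;
    it is not an API of multidms or the pipeline.
    """
    # Compare only major and minor, e.g. "2.1.4" -> (2, 1); suffixes on the
    # patch component such as "2.2.0rc0" are ignored.
    major, minor = (int(part) for part in version_string.split(".")[:2])
    if (major, minor) >= max_exclusive:
        warnings.warn(
            f"pandas {version_string} is at or above "
            f"{max_exclusive[0]}.{max_exclusive[1]}; global epistasis fits "
            "may differ from results obtained with earlier pandas versions.",
            stacklevel=2,
        )
        return False
    return True
```

For example, calling `pandas_version_ok(pandas.__version__)` at the top of a workflow would flag an environment that picked up pandas 2.2 despite the pin.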

2 participants