Versions specified in conda env files and snakemake wrappers lead to conflicts/are not available #25

Closed
akriese opened this issue Sep 13, 2022 · 11 comments

Comments


akriese commented Sep 13, 2022

Hi, yesterday I wanted to try out the workflow but struggled to get everything installed. Before posting the issue, I wanted to make sure that I hadn't overlooked anything obvious, but that doesn't seem to be the case.

My grenepipe env has the following version numbers:

conda 4.14.0
python 3.7.10
snakemake 6.0.5
Grenepipe 0.10.0-18ff70c

I set it up with the grenepipe.yaml file. I am running snakemake with the following command:
snakemake --cores all --directory example/ --use-conda --conda-frontend mamba

There seems to be a problem across the local environment configurations, which specify fixed version numbers of conda packages, some of which have unavailable dependencies. Specifically, these are:

  • bcftools.yaml: removing the bcftools version fixed this (or updating it to the current 1.15.1)
  • bwa.yaml: samtools ==1.12 requires some unavailable htslib version. Update to samtools==1.15.1
  • gatk.yaml: fixed by moving conda-forge as first channel
  • multiqc.yaml: multiqc==1.10.1 is missing a dependency (requests), fix by adding anaconda as first channel
  • picard.yaml: fix by putting conda-forge in first channel
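
The channel reordering described for gatk.yaml, multiqc.yaml, and picard.yaml can be sketched in a few lines. This is only an illustration: the env dict below is a hypothetical stand-in for a parsed env file, not the actual contents of grenepipe's gatk.yaml, and the helper name `prioritize_channel` is made up for this example.

```python
from typing import Dict, List


def prioritize_channel(env: Dict[str, list], channel: str) -> Dict[str, list]:
    """Return a copy of a conda env spec with `channel` listed first."""
    # Drop the channel if it is already listed, then put it at the front.
    others: List[str] = [c for c in env.get("channels", []) if c != channel]
    updated = dict(env)
    updated["channels"] = [channel] + others
    return updated


# Hypothetical env spec (assumption: not the real gatk.yaml), already parsed into a dict.
example_env = {
    "channels": ["bioconda", "conda-forge", "defaults"],
    "dependencies": ["gatk4"],
}

fixed_env = prioritize_channel(example_env, "conda-forge")
print(fixed_env["channels"])
```

Conda and mamba take the order of the `channels` list as the channel priority, which is presumably why moving conda-forge (or anaconda) to the top changes which dependency builds get picked.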

Furthermore, the used snakemake wrappers seem to be using outdated versions:

  • samtools/stat: could be updated from 0.64.0 (samtools 1.10, missing htslib dependency) --> v1.12.0 (samtools 1.14)
  • samtools/flagstat: could be updated from 0.64.0 (samtools 1.10, missing htslib dependency) to v1.12.2 (samtools 1.14)
  • tabix (0.55.1) uses old htslib --> update to v1.12.2/bio/tabix/index (in 3 rule files)
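
For anyone who wants to try the same wrapper bumps, here is a rough sketch of how they could be applied to the text of a rule file. The version numbers come from the list above; the exact wrapper paths as they appear in the rule files, the example rule, and the helper name `update_wrappers` are assumptions for illustration.

```python
import re

# Old wrapper string -> new wrapper string, based on the list above.
# Assumption: the paths are written like this inside the rule files.
WRAPPER_UPDATES = {
    "0.64.0/bio/samtools/stats": "v1.12.0/bio/samtools/stats",
    "0.64.0/bio/samtools/flagstat": "v1.12.2/bio/samtools/flagstat",
    "0.55.1/bio/tabix": "v1.12.2/bio/tabix/index",
}

# Matches a `wrapper:` directive followed by a double-quoted wrapper string.
WRAPPER_PATTERN = re.compile(r'(wrapper:\s*")([^"]+)(")')


def update_wrappers(rule_text: str, updates: dict) -> str:
    """Replace known wrapper strings in a rule file; leave all others untouched."""

    def replace(match: "re.Match") -> str:
        old = match.group(2)
        return match.group(1) + updates.get(old, old) + match.group(3)

    return WRAPPER_PATTERN.sub(replace, rule_text)


# Hypothetical rule, not copied from grenepipe.
example_rule = '''
rule samtools_stats:
    input:
        "mapped/{sample}.bam"
    output:
        "qc/samtools-stats/{sample}.txt"
    wrapper:
        "0.64.0/bio/samtools/stats"
'''

print(update_wrappers(example_rule, WRAPPER_UPDATES))
```

Note that a newer wrapper may expect different inputs, outputs, or params than the old one, so a text replacement like this only covers the version string itself.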

With the above fixes, the envs can be installed successfully. But the pipeline then breaks with errors, which I still have to investigate. These are probably caused by breaking changes between dependency versions.

Is grenepipe supposed to run out of the box as of today? If yes, could someone try out a fresh setup of grenepipe and see if it works on a different machine?

Looking forward to hearing from someone :)

Member

lczech commented Sep 27, 2022

Hi @akriese,

thanks for reporting the issue, and please excuse my late reply.

I have just run the grenepipe tests in a completely clean Ubuntu 22.04.1 LTS virtual box, using the exact same versions that you specified, and it worked perfectly fine. I've further tested on CentOS (slightly older conda, otherwise same). Which operating system are you on? That seems to be the most likely difference here.

The changes to the channels you suggest sound generally reasonable (in the sense that I think there is a chance that they don't break anything else). I would be interested in reproducing your errors before applying them though.

As for outdated versions, and the subsequent breaking of the pipeline: This is exactly the reason why the versions are specified so meticulously... otherwise, everything breaks due to weird incompatibilities between different tools. Took me ages to figure out a combination of versions that works - bioinformatics tools are a mess. If you really need to update them, I'm afraid you will have to experiment quite a bit.

Is grenepipe supposed to run out of the box as of today?

Of course it is :-) And I've 40ish tests in place that so far have caught issues - but apparently not the ones you are running into. Hence my suspicion that this is due to a different operating system. Let me know which you are using, and we can see from there.

Cheers and so long
Lucas

Author

akriese commented Sep 27, 2022

Hi Lucas, thanks for the reply! I guess the problem on my system might be that I've used conda for other projects on that machine before. So that might be the source of the version conflicts. What kind of information would help you reproduce this?
EDIT: The operating system in my institute is some custom Linux distro called mariux64

Member

lczech commented Sep 30, 2022

Hm, under these circumstances, I'm not sure that I can reproduce this at all. I'm not sure what's going on there with conda, but I think it can indeed get cluttered, so maybe you can do a conda clean, and see if that helps?
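
As a minimal sketch of the suggested cleanup, the snippet below builds a `conda clean` call. It assumes that clearing all caches (`--all`) is what is wanted here, and it defaults to `--dry-run` so that nothing is deleted until that is switched off. The function names are made up for this example.

```python
import shutil
import subprocess
from typing import List


def conda_clean_command(dry_run: bool = True) -> List[str]:
    """Build a `conda clean` call that clears index cache, tarballs, and unused packages."""
    cmd = ["conda", "clean", "--all", "--yes"]
    if dry_run:
        # Only report what would be removed.
        cmd.append("--dry-run")
    return cmd


def run_conda_clean(dry_run: bool = True) -> int:
    """Run the cleanup and return conda's exit code. Not called in this sketch."""
    if shutil.which("conda") is None:
        raise RuntimeError("conda not found on PATH")
    return subprocess.run(conda_clean_command(dry_run), check=False).returncode


print(" ".join(conda_clean_command()))
```

Calling `run_conda_clean(dry_run=False)` would actually remove the cached files, so it is worth looking at the dry-run output first.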

Let's keep this issue open for now. I'll try your suggestions regarding package/channel order above, and see if that works, and might just go with them then. I want to do that after the next release though, so that there is a stable fallback point for other users - not sure when that will be, but I'll keep you posted here.

In the meantime, if you have access to some other systems, you could try them as well and see if you run into similar issues, and report back here? My goal is to have grenepipe run as smoothly as possible - happy to take all feedback!


roosheelpatel commented Dec 12, 2022

Hi Lucas,

Thanks for developing this package; it will be very useful for our lab in the future.

I have been experiencing similar problems to @akriese. I am, however, attempting to locally install grenepipe on a macOS Darwin-22.1.0-x86_64 platform. As I experienced 'almost' identical errors to those above, I wanted to confirm with you whether grenepipe is intended to be run only on Linux distros, or whether you have used the pipeline locally on a macOS system as well?

Thanks!

Member

lczech commented Dec 13, 2022

Hi @roosheelpatel,

thanks for the report, and sorry that this is still happening. This whole setup with dozens of tools is a nightmare to maintain...

Generally, grenepipe is meant to work from small to large datasets, and is written with both Linux and MacOS in mind. That being said, it's most exhaustively tested on Linux. I do have some MacOS tests in place, and just the other day did a large scale test on MacOS as well, and it generally works. Admittedly, there are failed tests in there - many of them due to things like Conda having a hiccup, or the weather being too cold, or whatever mysterious other forces cause it to (non-repeatably) fail... most of them seem to be solved by just starting the test again. I have not figured out a good way to debug those issues, as they are non-deterministic. So, if anything fails, first test should be to just run it again...

However, as you experience conda environment issues that seem more reproducible, we might have a chance of fixing them. Are you using conda, or mamba? Could you maybe post the versions of these, or even better a full grenepipe log file here, and expand a bit on what exactly is going wrong in your particular case? That would greatly help me get to the bottom of this!
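
One possible way to collect the requested version information for a report is sketched below. It assumes the tools are on the `PATH` of the active environment and that each of them accepts `--version`; the helper name `tool_versions` is made up for this example.

```python
import shutil
import subprocess
from typing import Dict, Iterable, Optional


def tool_versions(
    tools: Iterable[str] = ("conda", "mamba", "snakemake"),
) -> Dict[str, Optional[str]]:
    """Return the `--version` output per tool, or None if a tool is missing or fails."""
    versions: Dict[str, Optional[str]] = {}
    for tool in tools:
        if shutil.which(tool) is None:
            versions[tool] = None
            continue
        try:
            result = subprocess.run(
                [tool, "--version"], capture_output=True, text=True, timeout=60
            )
        except (OSError, subprocess.TimeoutExpired):
            versions[tool] = None
            continue
        # Some tools print their version to stderr, so fall back to that.
        output = (result.stdout or result.stderr).strip()
        versions[tool] = output or None
    return versions


for name, version in tool_versions().items():
    print(f"{name}: {version if version else 'not found'}")
```

The printed lines can be pasted into a comment next to the log file and the operating system name.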

So long, and thanks for your patience!
Lucas

Member

lczech commented Dec 13, 2022

By the way, @akriese, did you try again on your system, maybe with a cleaned conda setup? I still did not get to test your suggestions, but hopefully will get to do that in the next couple of weeks.

Author

akriese commented Dec 15, 2022

did you try again on your system, maybe with a cleaned conda setup?

Unfortunately, I do not work for my previous employer anymore, where I was trying out grenepipe. Maybe I'll get time to test it again at some point, but currently there is not much intrinsic motivation to do so :(

EDIT: Also, I don't have access to that specific system anymore. So I can't really try it out in the same environment.

lczech added a commit that referenced this issue Dec 15, 2022
In particular, the picard and the qualimap envs were long standing
issues, as they did not get solved properly by conda - mamba was needed.
This hopefully helps with #11 and #25 as well.
We updated envs according to the suggestions of @akriese, with the
exception of the vep env, for which we could not easily get the bcftools
version updated without breaking VEP completely.
Member

lczech commented Dec 15, 2022

Hi @akriese, thanks for the update, and all the best!

Also, thank you again for your suggestions! I've just implemented (almost) all of them, in the hope that this fixes some long standing issues with conda, and also helps with #11. I'm currently running large scale tests on Ubuntu, CentOS, and MacOS, with conda, and with mamba (not all permutations, that would be too much, but the more important ones). Seems to be working well enough (with conda still being super slow... but at least it does not completely hang any more now)!

And @roosheelpatel, could you maybe try again using grenepipe at this commit? That should hopefully fix your issues. Please let me know if that works for you :-)

Member

lczech commented Dec 17, 2022

Okay, all environments (with the changes suggested by @akriese, thank you very much again!) now install fine on Ubuntu and on MacOS (with the exception of seqprep, which is not available for MacOS), with both conda and mamba. Phew! Still, mamba is way faster (15min instead of 4h)...

I've just published grenepipe v0.12.0 that includes these changes. @roosheelpatel, that should hopefully solve your issues - please try with that version. I'm hence going to close this issue now, but feel free to re-open or open another one should the same problems still pop up!

Cheers and thanks for your patience!
Lucas

lczech closed this as completed Dec 17, 2022
Author

akriese commented Dec 20, 2022

That sounds great! Out of curiosity: When I set up the env back then and changed the version numbers, the env could be set up, but I remember getting some errors during runtime (can't remember them tho). Does it work with this setup now (not just the env setup, but also the running)?

Member

lczech commented Dec 20, 2022

I very much hope so :-) - at least, all tests are passing, see here, running each of the tools at least once. Of course, I cannot guarantee it, so if anything breaks now, please let me know!
