environment.yml #206

Closed
kullrich opened this issue Mar 23, 2021 · 14 comments

Comments

@kullrich

kullrich commented Mar 23, 2021

Dear nf-core team,

could you please update the environment.yml file from the chipseq pipeline so that it can run on the latest Ubuntu version?

The test data is just not running out of the box which might be related to the following error (Ubuntu20.04; latest anaconda3 from today).

  Error: package or namespace load failed for ‘DESeq2’ in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
   namespace ‘xfun’ 0.15 is being loaded, but >= 0.19 is required
  In addition: Warning message:
  package ‘matrixStats’ was built under R version 3.6.3

This might be related to the fact that the default R installed by Ubuntu 20.04 is already 3.6.3, not 3.6.2 as in the environment.yml file.
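The version mismatch in the error above can be checked mechanically. Below is a minimal Python sketch, assuming the message format quoted in this report and plain dotted version numbers; `parse_namespace_error` is a hypothetical helper for illustration, not part of R, conda, or the pipeline.

```python
import re

# Matches the R message "namespace 'pkg' X is being loaded, but >= Y is required".
# The quote characters are optional so unquoted copies of the message also match.
PATTERN = re.compile(
    r"namespace\s*[‘']?(?P<pkg>[A-Za-z][A-Za-z0-9.]*?)[’']?\s*"
    r"(?P<loaded>\d+(?:\.\d+)*) is being loaded, but >= "
    r"(?P<required>\d+(?:\.\d+)*) is required"
)


def parse_namespace_error(line):
    """Return (package, loaded_version, required_version) or None."""
    m = PATTERN.search(line)
    if m is None:
        return None

    def as_tuple(version):
        # Assumption: dotted numeric versions only (no "1.0-5" style).
        return tuple(int(part) for part in version.split("."))

    return m["pkg"], as_tuple(m["loaded"]), as_tuple(m["required"])


line = "namespace ‘xfun’ 0.15 is being loaded, but >= 0.19 is required"
pkg, loaded, required = parse_namespace_error(line)
print(pkg, "too old:", loaded < required)
```

Comparing tuples of integers avoids the string-comparison trap where "0.9" would sort after "0.19".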

Thank you in anticipation

@kullrich
Author

kullrich commented Mar 23, 2021

nextflow run nf-core/chipseq -profile test,conda

@kullrich
Author

kullrich commented Mar 23, 2021

I changed the following entries in environment.yml to:

  • conda-forge::r-base=3.6.3
  • conda-forge::r-xfun=0.19
  • conda-forge::r-matrixstats=0.58.0
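For anyone scripting a check of these pins, the `channel::name=version` strings can be split apart as in the sketch below. `parse_pin` is a hypothetical helper, and it only handles the simple form shown above (no build strings or version ranges).

```python
def parse_pin(spec):
    """Split 'channel::name=version' into (channel, name, version).

    Missing parts come back as None. Illustration only: conda's full
    match-spec syntax is richer than this.
    """
    channel, _, rest = spec.rpartition("::")
    name, _, version = rest.partition("=")
    return channel or None, name, version or None


pins = [
    "conda-forge::r-base=3.6.3",
    "conda-forge::r-xfun=0.19",
    "conda-forge::r-matrixstats=0.58.0",
]
for spec in pins:
    print(parse_pin(spec))
```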

Still 2 warnings:

executor >  local (97)
[56/3f3016] process > CHECK_DESIGN (design.csv)                      [100%] 1 of 1 ✔
[87/e6076b] process > BWA_INDEX (genome.fa)                          [100%] 1 of 1 ✔
[22/ff97fc] process > MAKE_GENE_BED (genes.gtf)                      [100%] 1 of 1 ✔
[4e/dbab21] process > MAKE_GENOME_FILTER (genome.fa)                 [100%] 1 of 1 ✔
[d0/31d908] process > FASTQC (SPT5_T15_R2_T1)                        [100%] 6 of 6 ✔
[09/1c0240] process > TRIMGALORE (SPT5_T15_R2_T1)                    [100%] 6 of 6 ✔
[56/19fe8b] process > BWA_MEM (SPT5_T15_R2_T1)                       [100%] 6 of 6 ✔
[6d/d15fee] process > SORT_BAM (SPT5_T15_R2_T1)                      [100%] 6 of 6 ✔
[a4/697f93] process > MERGED_BAM (SPT5_T15_R1)                       [100%] 6 of 6 ✔
[2e/f5ecd0] process > MERGED_BAM_FILTER (SPT5_INPUT_R2)              [100%] 6 of 6 ✔
[ee/be6f9e] process > MERGED_BAM_REMOVE_ORPHAN (SPT5_INPUT_R1)       [100%] 6 of 6 ✔
[a0/e59639] process > PRESEQ (SPT5_T0_R1)                            [100%] 6 of 6, failed: 2 ✔
[00/79fd0d] process > PICARD_METRICS (SPT5_INPUT_R1)                 [100%] 6 of 6 ✔
[67/577baa] process > BIGWIG (SPT5_INPUT_R1)                         [100%] 6 of 6 ✔
[7f/6cc2c4] process > PLOTPROFILE (SPT5_INPUT_R1)                    [100%] 6 of 6 ✔
[47/49b58b] process > PHANTOMPEAKQUALTOOLS (SPT5_INPUT_R1)           [100%] 6 of 6 ✔
[ec/3ac3a7] process > PLOTFINGERPRINT (SPT5_T15_R1 vs SPT5_INPUT_R1) [100%] 4 of 4 ✔
[77/e9af18] process > MACS2 (SPT5_T0_R1 vs SPT5_INPUT_R1)            [100%] 4 of 4 ✔
[0f/e27836] process > MACS2_ANNOTATE (SPT5_T0_R1 vs SPT5_INPUT_R1)   [100%] 4 of 4 ✔
[82/2e797c] process > MACS2_QC                                       [100%] 1 of 1 ✔
[a1/66a2c6] process > CONSENSUS_PEAKS (SPT5)                         [100%] 1 of 1 ✔
[ae/241a88] process > CONSENSUS_PEAKS_ANNOTATE (SPT5)                [100%] 1 of 1 ✔
[9f/131a36] process > CONSENSUS_PEAKS_COUNTS (SPT5)                  [100%] 1 of 1 ✔
[f6/818504] process > CONSENSUS_PEAKS_DESEQ2 (SPT5)                  [100%] 1 of 1 ✔
[b1/3cea6d] process > IGV                                            [100%] 1 of 1 ✔
[13/629590] process > get_software_versions                          [100%] 1 of 1 ✔
[04/8032fc] process > MULTIQC (1)                                    [100%] 1 of 1 ✔
[16/af5fe5] process > output_documentation                           [100%] 1 of 1 ✔
-Warning, pipeline completed, but with errored process(es) -
-Number of ignored errored process(es) : 2 -
-Number of successfully ran process(es) : 95 -
-[nf-core/chipseq] Pipeline completed successfully-
Completed at: 23-Mar-2021 21:09:18
Duration    : 7m 18s
CPU hours   : 0.7 (0.9% failed)
Succeeded   : 95
Ignored     : 2
Failed      : 2

@kullrich
Author

kullrich commented Mar 23, 2021

The remaining warnings are related to this error from preseq:

PAIRED_END_BAM_INPUT
paired = 93787
unpaired = 257
MERGED PAIRED END READS = 93787
MATES PROCESSED = 187831
TOTAL READS     = 94044
DISTINCT READS  = 94014
DISTINCT COUNTS = 2
MAX COUNT       = 2
COUNTS OF 1     = 93984
MAX TERMS       = 2
OBSERVED COUNTS (3)
1	93984
2	30

ERROR:	max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed
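The numbers in this log already show why preseq gives up. The sketch below is one reading of the error message, not preseq's actual implementation: it takes "max count before zero" to mean the largest duplication level j such that levels 1..j all have at least one read, and compares it with the minimum of 4 quoted in the message.

```python
def max_count_before_zero(histogram):
    """Largest level j such that levels 1..j are all observed (non-zero)."""
    j = 0
    while histogram.get(j + 1, 0) > 0:
        j += 1
    return j


# Counts histogram copied from the log above: {times seen: number of reads}.
observed = {1: 93984, 2: 30}
MIN_REQUIRED = 4  # value quoted in the preseq error message

distinct_reads = sum(observed.values())
total_reads = sum(level * n for level, n in observed.items())
depth = max_count_before_zero(observed)

print("distinct reads:", distinct_reads)  # matches DISTINCT READS in the log
print("total reads:", total_reads)        # matches TOTAL READS in the log
print("deep enough:", depth >= MIN_REQUIRED)
```

With almost every read seen exactly once and none seen three times, there is too little duplication to extrapolate from, which is consistent with the "sample not sufficiently deep or duplicates removed" wording.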

@kullrich
Author

kullrich commented Mar 23, 2021

43      74/19ae9e       874440  PRESEQ (SPT5_INPUT_R2)  FAILED  1       2021-03-23 21:06:48.980 5.5s    5.4s    -       -       -       -       -
45      53/0bd427       874480  PRESEQ (SPT5_INPUT_R1)  FAILED  1       2021-03-23 21:06:49.041 5.4s    5.4s    -       -       -       -       -
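Rows like these can be filtered out of a longer trace with a few lines of Python. This is a sketch under an assumed column layout (task id, hash, native id, name, status, exit code, ...) inferred only from the two rows above; `failed_tasks` is a hypothetical helper, and a real trace file may have different columns.

```python
import re


def failed_tasks(trace_text):
    """Return (process name, exit code) for every row whose status is FAILED."""
    failed = []
    for line in trace_text.splitlines():
        # Columns are separated by a tab or by runs of two or more spaces;
        # process names such as "PRESEQ (SPT5_INPUT_R2)" contain a single space.
        fields = re.split(r"\s{2,}|\t", line.strip())
        if len(fields) >= 6 and fields[4] == "FAILED":
            failed.append((fields[3], int(fields[5])))
    return failed


trace = """\
43      74/19ae9e       874440  PRESEQ (SPT5_INPUT_R2)  FAILED  1       2021-03-23 21:06:48.980 5.5s    5.4s
45      53/0bd427       874480  PRESEQ (SPT5_INPUT_R1)  FAILED  1       2021-03-23 21:06:49.041 5.4s    5.4s
"""
for name, exit_code in failed_tasks(trace):
    print(name, "exit", exit_code)
```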

@acoleman2000

I wanted to echo that I also have the same issue running on Ubuntu20.04 with the latest anaconda3.

@drpatelh
Member

drpatelh commented Apr 8, 2021

Sorry, guys.... 😏 I have been meaning to update this pipeline for the longest time but have been distracted by implementing other stuff on nf-core. The problem we have currently is that the dev branch is in need of some TLC to bring it up-to-date with the Nextflow DSL2 syntax that I (and others) are implementing on nf-core/modules. This won't be something I will get around to doing anytime soon.

Are you able to use Docker / Singularity? Those should work. The error with Conda is known and is one of the big downsides of using it, i.e. the environment resolution can change over time and so, strictly speaking, isn't reproducible. I suspect a bunch of lower-level dependencies of R have changed and are now breaking the environment.

@stain

stain commented Apr 22, 2021

Hi, we're using this workflow in this tutorial https://biocompute-objects.github.io/bco-ro-crate/tutorial/running.html and used Conda as a way to avoid having to teach installing Docker/Singularity; but this now again breaks for us on Ubuntu 20.04.

Also R should try to actually follow Semantic Versioning ;)

Is it possible to add the environment.yml.lock as saved from conda env export as part of the nf-core release procedure?

@stain

stain commented Apr 22, 2021

Perhaps, as the other changes on dev would warrant a 1.3.0 release (and unfortunately need more effort), we could do a quick patch release branched straight off master for a 1.2.2 that just fixes this issue in environment.yml?

@drpatelh
Member

Hi @stain ! Hope you are well. Awesome that you are using the pipeline in the tutorial you linked to :)

I did actually add a line to the standard nf-core Dockerfile that saves the output of conda env export to a file within the container at the time of release for exactly these reasons:

RUN conda env export --name nf-core-chipseq-1.2.1 > nf-core-chipseq-1.2.1.yml

I have copied this out of a Singularity container and attached it below but had to change the extension to txt to upload it here:
nf-core-chipseq-1.2.1.yml.txt

But when I try to create the environment it still errors:

$ conda env create -f nf-core-chipseq-1.2.1.yml

Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - gfortran_impl_linux-64==7.5.0=hdf63c60_6
  - gcc_impl_linux-64==7.5.0=hd420e75_6
  - gxx_impl_linux-64==7.5.0=hdf63c60_6
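The three packages above are pinned to exact build strings (the part after the second `=`). A commonly used workaround, not confirmed in this thread to fix this particular environment, is to drop the build strings so the solver can pick any available build of the same version; `conda env export --no-builds` produces such a file directly. The sketch below does the same to individual spec strings; `strip_build` is a hypothetical helper.

```python
def strip_build(spec):
    """Turn 'name=version=build' (or 'name==version=build') into 'name=version'.

    Specs without a build string are returned unchanged.
    """
    parts = spec.replace("==", "=").split("=")
    if len(parts) == 3 and all(parts):
        name, version, _build = parts
        return f"{name}={version}"
    return spec


not_found = [
    "gfortran_impl_linux-64==7.5.0=hdf63c60_6",
    "gcc_impl_linux-64==7.5.0=hd420e75_6",
    "gxx_impl_linux-64==7.5.0=hdf63c60_6",
]
for spec in not_found:
    print(strip_build(spec))
```

Relaxing builds trades exact reproducibility for solvability, so the resulting environment is no longer guaranteed to match the one inside the container.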

When are you hosting the tutorial? I suspect things won't be as straightforward as just doing a patch release because the linting etc we have is always done relative to the latest tools release and so will fail. But can try and give it a go.

@stain

stain commented Apr 22, 2021

thanks, @drpatelh, I did not seem to have it recorded in previous run - only the desired environment.yml.

I would not want to muddle the tutorial with "First patch the Conda environment", but perhaps it could work to pre-load the environment for this workflow? Nextflow seems, however, to create temporary conda environments, so I'm not sure how to inject it in advance.

The tutorial is announced for 12. May, so still time to fix it. But would need to rewrite a fair bit if I need to do another nf-core workflow or require them to install Docker first.

@drpatelh
Member

Ok. Try this @stain:

nextflow pull nf-core/chipseq
nextflow run nf-core/chipseq -profile test,conda -r 1.2.1_envfix

This is just a separate branch with the env fix that can be run directly without having to do anything beforehand. It won't be completely reproducible unless we have an official release but if you have a look at the ❌ on #208 you will see what I meant about broken tests.

I will keep this branch there until we release the next version. If not, it can always be re-instated quite easily.

Is that a good enough compromise?

@drpatelh
Member

No half measures...

Fixed in v1.2.2 guys. Thank you for your patience and @ewels for pushing the containers to Dockerhub!
#209

Hopefully it works :)

@stain

stain commented Apr 23, 2021

Thanks @drpatelh, this works great! I see the broken tests in #208 are with the linting requirements having changed since previous release.

I also get this error in PRESEQ which I guess is same as what @kullrich points out.

[74/0a3e81] process > PRESEQ (SPT5_T15_R1)                           [100%] 6 of 6, failed: 2 ✔

Apart from that, the workflow itself runs well, and in a way, showing how to report an error could be an important part of the tutorial.

Will use 1.2.1_envfix in tutorial for now, if 1.2.2 comes out before that is "live" mid-May, even better!

More on PRESEQ below:

from log:

Apr-22 16:34:51.425 [Task monitor] DEBUG n.processor.TaskPollingMonitor - Task completed > TaskHandler[id: 38; name: PRESEQ (SPT5_INPUT_R1); status: COMPLETED; exit: 1; error: -; workDir: /work/55/699216bb28d9b747e34656b4fe7ccf]
Apr-22 16:34:51.467 [Task monitor] INFO  nextflow.processor.TaskProcessor - [55/699216] NOTE: Process `PRESEQ (SPT5_INPUT_R1)` terminated with an error exit status (1) -- Error is ignored

ERROR:	max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed
(bco-ro) root@05ce36630c51:/work/67/3245f4f667af13900fb3911e242306# cat .command.err 
PAIRED_END_BAM_INPUT
paired = 95833
unpaired = 452
MERGED PAIRED END READS = 95833
MATES PROCESSED = 192118
TOTAL READS     = 96285
DISTINCT READS  = 96262
DISTINCT COUNTS = 2
MAX COUNT       = 2
COUNTS OF 1     = 96239
MAX TERMS       = 2
OBSERVED COUNTS (3)
1	96239
2	23

ERROR:	max count before zero is les than min required count (4), sample not sufficiently deep or duplicates removed

Perhaps this is just sample-related.

@drpatelh
Member

No worries. I released 1.2.2 yesterday so I have deleted the 1.2.1_envfix branch as we no longer need it. The command below should work now :)

nextflow pull nf-core/chipseq
nextflow run nf-core/chipseq -profile test,conda -r 1.2.2

Yup, you can literally just "ignore" the PRESEQ error. It doesn't like the test data we use for the CI because it isn't sufficiently deep. This will require changing the standard test data for the pipeline which is another beast.
