genome_filter error #5

Closed
jhcuarta opened this issue Jun 24, 2022 · 18 comments

Comments

@jhcuarta

Hi
I was hoping you could help me with an issue I'm getting while running the pipeline. I've been running the following command line, both from inside the folder containing the genome sequences and from outside it, without success either way:

nextflow run metashot/prok-quality --reduced_tree --min_completeness 90 --gunc_filter true --max_contamination 5 --max_cpus 8 --ani_thr 0.99 --min_overlap 0.75 --max_memory 25GB --outdir /home/jason/Documents/prok-quality/results_full --genomes '*.fa'

I encountered an error at the genome filter step. When I check the log, this is what I get. I followed the tip, but the file is empty.

Jun-23 11:08:00.665 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'genome_filter (1)'

Caused by:
Missing output file(s) filtered/* expected by process genome_filter (1)

Command executed:

mkdir genomes_dir
mv CMR014_EPI.fa 4679_SC.fa A22_SC.fa UG086_UM.fa YB7A09_UA.fa 20478_UM.fa YA00120881_IP.fa H1_2020_SM.fa .......... TEM_04_01_001_UM.fa C19p_c_IMVS.fa A46_SC.fa N2767_NU.fa OYP6F10_UA.fa 22043202_C1_UM.fa 11S_UM.fa 4772STDY6941072_SC.fa E7946_TU.fa genomes_dir
genome_filter.py genome_info.tsv genomes_dir fa genome_info_filtered.tsv genome_info_filtered_drep.csv filtered 90 5 1

Command exit status:
0

Command output:
(empty)

Command error:
/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py:4303: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,

Work dir:
/home/jason/Documents/prok-quality/Vibrio_full/work/07/161110904a0753fb93907be581d81a

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Thanks

@davidealbanese
Contributor

Dear @jhcuarta,
probably there are no genomes that meet the filtering parameters. Can you post the file genome_info.tsv?
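
One way to check this before rerunning is to count how many rows of genome_info.tsv would survive the thresholds. The sketch below is only an illustration, not the pipeline's genome_filter.py: the column names `completeness`, `contamination` and `gunc_pass` are assumptions, as is the use of inclusive comparisons, so adjust them to the actual header of your file.

```python
import csv
import io


def count_passing(tsv_text, min_completeness=90.0, max_contamination=5.0,
                  gunc_filter=True,
                  completeness_col="completeness",    # assumed column name
                  contamination_col="contamination",  # assumed column name
                  gunc_col="gunc_pass"):              # assumed column name
    """Return (passing, total) for a tab-separated genome table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    passing = total = 0
    for row in reader:
        total += 1
        ok = (float(row[completeness_col]) >= min_completeness
              and float(row[contamination_col]) <= max_contamination)
        if gunc_filter:
            # Treat "True"/"true" as a pass; anything else fails the filter.
            ok = ok and row[gunc_col].strip().lower() == "true"
        if ok:
            passing += 1
    return passing, total


# Made-up rows for illustration only; not taken from the reporter's data.
example = (
    "genome\tcompleteness\tcontamination\tgunc_pass\n"
    "g1\t99.5\t0.3\tFalse\n"
    "g2\t98.0\t1.2\tFalse\n"
    "g3\t85.0\t0.5\tTrue\n"
)
print(count_passing(example))                     # with the GUNC filter
print(count_passing(example, gunc_filter=False))  # without it
```

If the first count is zero, nothing would be written to filtered/, which would be consistent with the "Missing output file(s) filtered/*" error.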

@jhcuarta
Author

Hi
Indeed, it seems they do not pass the GUNC filter.
genome_info.zip

@jhcuarta
Author

Should I set the GUNC filter flag to false by default? If so, when should I take it into account?

@davidealbanese
Contributor

I suggest deactivating the GUNC filter and resuming the job:
--gunc_filter false -resume

@davidealbanese
Contributor

For some reason GUNC detects your genomes as "chimeric". I think they could be false positives. I suggest reporting this at https://github.com/grp-bork/gunc/issues

@jhcuarta
Author

jhcuarta commented Jul 1, 2022

Hi
Now the dereplication step failed
[ad/da8e50] NOTE: Process drep (1) failed -- Execution is retried (1)
I was wondering if there's any way to know what went wrong, since it took three to four days for this error to appear. It is now retrying (I think the pipeline retries three to four times), and I was hoping to save some time.
Here's the log for the run. It seems the running time limit was exceeded, although an error message was displayed before the time limit report.
nextflow_log.zip
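
For anyone wanting to find the cause without reading the whole log, a small filter over the log text can help. This is a rough sketch: the pattern is a guess based on the two error lines quoted in this thread, and `find_errors` is a name made up for the example.

```python
import re

# Matches lines such as
# "Jun-23 11:08:00.665 [Task monitor] ERROR nextflow.processor.TaskProcessor - ..."
# and exception lines such as "nextflow.exception.ProcessException: ...".
ERROR_LINE = re.compile(r"\sERROR\s|Exception:")


def find_errors(log_text):
    """Return the log lines that look like errors or exceptions."""
    return [line.strip() for line in log_text.splitlines()
            if ERROR_LINE.search(line)]


# Two lines quoted in this thread, combined here for illustration,
# plus a placeholder line that should be ignored.
sample = """\
Jun-23 11:08:00.665 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'genome_filter (1)'
a line that is not an error
nextflow.exception.ProcessException: Process exceeded running time limit (4d)
"""

for hit in find_errors(sample):
    print(hit)
```

To apply it to a real run, pass in the contents of the `.nextflow.log` file that Nextflow writes in the launch directory.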

@davidealbanese
Contributor

Dear @jhcuarta,

the error is the following: nextflow.exception.ProcessException: Process exceeded running time limit (4d)
The time limit for each process is 4 days by default (96.h, see nextflow.config). You need to change this value by adding the parameter --max_time, e.g. --max_time 14.d for 14 days.
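
To compare limits written in different units, a tiny converter is enough. This is a simplified sketch that only understands whole numbers of hours or days in the forms seen in this thread (`96.h`, `4d`, `14.d`); it is not Nextflow's own duration parser, and `to_hours` is a hypothetical helper.

```python
import re

HOURS_PER_UNIT = {"h": 1, "d": 24}


def to_hours(duration):
    """Convert strings such as '96.h', '4d' or '14.d' to hours."""
    match = re.fullmatch(r"(\d+)\.?([hd])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * HOURS_PER_UNIT[unit]


print(to_hours("96.h") == to_hours("4d"))  # the default limit, written two ways
print(to_hours("14.d"))                    # the suggested new limit, in hours
```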

@davidealbanese
Contributor

Any news about GUNC?

@jhcuarta
Author

jhcuarta commented Jul 5, 2022

Hi
Regarding GUNC, I've sent the genome sequences to the email address provided; no response yet.

@jhcuarta
Author

Hi
I decided to run the latest GUNC version (1.0.5) and got better chimerism filtering: 12 genomes didn't pass, compared with none passing for the version bundled with prok-quality. I was wondering if you could update the version so that I can do the filtering in a single run.
I'm still waiting for a response from GUNC support.
I've attached the GUNC 1.0.5 output:
GUNC_output.zip

Best regards

@davidealbanese
Contributor

Dear @jhcuarta,
GUNC version 1.0.5 is now available in prok-quality 1.3.0, let me know if it works as expected.

All the best

@jhcuarta
Author

jhcuarta commented Sep 2, 2022

Hi
Same genome filter issue despite GUNC being upgraded. I've attached the genome_info file:
genome_info.zip
I'm not sure, as I do not possess the expertise, but I think it is a problem with the GUNC database.
Best regards

@davidealbanese
Contributor

Dear @jhcuarta,
did you run the pipeline with the option -r 1.3.1 (e.g. nextflow run metashot/prok-quality -r 1.3.1 [...]) and without the option --gunc_db?

@jhcuarta
Author

jhcuarta commented Sep 2, 2022

Hi
No, I did not use the -r flag. This is the command I've been using:
nextflow run metashot/prok-quality --reduced_tree --min_completeness 90 --gunc_filter true --max_contamination 5 --max_time 21.d --max_cpus 8 --skip_dereplication true --max_memory 25GB --outdir /home/jason/Documents/prok-quality/results_full --genomes '*.fa'

@jhcuarta
Author

jhcuarta commented Sep 4, 2022

Hi
I reran the pipeline using the -r 1.3.1 flag and without --gunc_db, and the issue persists, although it downloads the database regardless. I was thinking that the database may be outdated.

@jhcuarta
Author

Hi
I was wondering if you are keeping track of this issue; it's been almost a month with no feedback.

@davidealbanese
Contributor

Dear @jhcuarta,
can you share some of your problematic genomes?

@jhcuarta
Author

Hi
I was wondering if you are keeping track of this issue; it's been a while since I sent you the data.
