Building custom database issue #18

Open
jongin333 opened this issue Aug 15, 2019 · 5 comments

@jongin333

Hi,

I'd like to build a custom database from some NCBI RefSeq sequences.
As a first test, I tried to build a very small database, but it failed.

Below are the directory structure used for building the database, the config file, and the log file.

======= directory structure =======
./db
./db/clark
./db/clark/genomes.fna
./db/dudes
./db/dudes/genomes.fna
./db/kaiju
./db/kaiju/genome.gbff
./db/kraken
./db/kraken/genome.fna

======= config file =======
workdir: "/mss2/projects/META2/taxonomy_classification/metameta"

databases:
  - custom_db

custom_db:
  clark: "/mss2/projects/META2/taxonomy_classification/metameta/db/clark"
  dudes: "/mss2/projects/META2/taxonomy_classification/metameta/db/dudes"
  kaiju: "/mss2/projects/META2/taxonomy_classification/metameta/db/kaiju"
  kraken: "/mss2/projects/META2/taxonomy_classification/metameta/db/kraken"

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"

samples:
  "TEST":
    fq1: "test1_1.fq.gz"
    fq2: "test1_2.fq.gz"

gzipped: 1
threads: 50

======= Log file =======
Building DAG of jobs...
Provided cores: 5
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 all
1 clark_db_custom_1
1 clark_db_custom_2
1 clark_db_custom_3
1 clark_db_custom_4
1 clark_db_custom_check
1 clark_db_custom_profile
1 clark_rpt
1 clark_run_1
4 clean_files
1 clean_reads
4 database_profile
1 dudes_db_custom_1
1 dudes_db_custom_2
1 dudes_db_custom_3
1 dudes_db_custom_check
1 dudes_db_custom_profile
1 dudes_rpt
1 dudes_run_1
1 dudes_run_2
1 errorcorr_reads
1 get_accession2taxid
1 get_gi_taxid_nucl
1 get_taxdump
1 kaiju_db_custom_1
1 kaiju_db_custom_2
1 kaiju_db_custom_3
1 kaiju_db_custom_4
1 kaiju_db_custom_check
1 kaiju_db_custom_profile
1 kaiju_rpt
1 kaiju_run_1
1 kraken_db_custom_1
1 kraken_db_custom_2
1 kraken_db_custom_3
1 kraken_db_custom_check
1 kraken_db_custom_profile
1 kraken_rpt
1 kraken_run_1
1 krona
1 metametamerge
1 subsample_reads
1 trim_reads
49


MetaMeta Pipeline v1.2.0 by Vitor C. Piro (vitorpiro@gmail.com, http://github.com/pirovc)

Parameters:

  • bins: 4
  • custom_db: OrderedDict([('clark', 'db/clark'),
    ('dudes', 'db/dudes'),
    ('kaiju', 'db/kaiju'),
    ('kraken', 'db/kraken')])
  • cutoff: 0.0001
  • databases: ['custom_db']
  • dbdir: 'db'
  • desiredminlen: 70
  • detailed: 0
  • errorcorr: 0
  • gzipped: 1
  • keepfiles: 0
  • mode: 'linear'
  • ranks: 'species'
  • replacement: 0
  • samples: OrderedDict([('TEST',
    OrderedDict([('fq1', 'test1_1.fq.gz'),
    ('fq2', 'test1_2.fq.gz')]))])
  • samplesize: 1
  • strictness: 0.8
  • subsample: 0
  • threads: 50
  • tool_alt_path: {'bowtie2': '',
    'clark': '',
    'dudes': '',
    'gottcha': '',
    'kaiju': '',
    'kraken': '',
    'krona': '',
    'metametamerge': '',
    'motus': '',
    'spades': '',
    'trimmomatic': ''}
  • tools: {'clark': 'b',
    'dudes': 'p',
    'gottcha': 'p',
    'kaiju': 'b',
    'kraken': 'b',
    'motus': 'p'}
  • trimming: 0
  • verbose: 0
  • workdir: '.'

rule kaiju_db_custom_1:
output: dbcustom_db/kaiju_db/kaiju_db.faa
log: dbcustom_db/log/kaiju_db_custom_1.log
jobid: 48
benchmark: dbcustom_db/log/kaiju_db_custom_1.time
wildcards: database=custom_db

rule get_gi_taxid_nucl:
output: dbtaxonomy/gi_taxid_nucl.dmp.gz
log: dbtaxonomy/log/get_gi_taxid_nucl.log
jobid: 44
benchmark: dbtaxonomy/log/get_gi_taxid_nucl.time

rule get_taxdump:
output: dbtaxonomy/taxdump.tar.gz, dbtaxonomy/names.dmp, dbtaxonomy/nodes.dmp, dbtaxonomy/merged.dmp
log: dbtaxonomy/log/get_taxdump.log
jobid: 4
benchmark: dbtaxonomy/log/get_taxdump.time

rule get_accession2taxid:
output: dbtaxonomy/nucl_gb.accession2taxid.gz, dbtaxonomy/nucl_wgs.accession2taxid.gz
log: dbtaxonomy/log/get_accession2taxid.log
jobid: 47
benchmark: dbtaxonomy/log/get_accession2taxid.time

Activating conda environment /mss2/projects/META2/taxonomy_classification/metameta/.snakemake/conda/0e3e8e78.

rule clark_db_custom_1:
output: dbcustom_db/clark_db/Custom/
log: dbcustom_db/log/clark_db_custom_1.log
jobid: 40
benchmark: dbcustom_db/log/clark_db_custom_1.time
wildcards: database=custom_db

Finished job 40.
1 of 49 steps (2%) done

rule kaiju_db_custom_profile:
output: dbcustom_db/kaiju.dbaccession.out
log: dbcustom_db/log/kaiju_db_custom_profile.log
jobid: 36
benchmark: dbcustom_db/log/kaiju_db_custom_profile.time
wildcards: database=custom_db

Finished job 36.
2 of 49 steps (4%) done
Finished job 48.
3 of 49 steps (6%) done
Exiting because a job execution failed. Look above for error message
Will exit after finishing currently running jobs.
Finished job 44.
4 of 49 steps (8%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message


An error has occured.
Please check the main log file for more information:
/mss2/projects/META2/taxonomy_classification/metameta/metameta_2019-08-15_23-59-47.log
Detailed output and execution time for each rule can be found at:
/mss2/projects/META2/taxonomy_classification/metameta/db/log/
/mss2/projects/META2/taxonomy_classification/metameta/SAMPLE_NAME/log/

=======

How can I build a custom database?
What did I miss?

Thank you,
Jongin

@pirovc
Owner

pirovc commented Aug 16, 2019

Hi Jongin,

It looks like jobs 4 (get_taxdump) and 47 (get_accession2taxid) could not finish. Both of them simply download data from the NCBI servers. Do you have any internet restrictions? You can check the error for those rules in:

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_taxdump.log

/mss2/projects/META2/taxonomy_classification/metameta/db/log/taxonomy/log/get_accession2taxid.log

Best
Vitor

@jongin333
Author

Hi Vitor,

I found the problem: it was the 'dbdir' path in the config file.
I changed it as shown below, wiped everything, and reran in a newly created directory.

dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db"
->
dbdir: "/mss2/projects/META2/taxonomy_classification/metameta/db/"
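
Note on why the trailing slash matters: the log in the first post lists outputs such as dbtaxonomy/names.dmp and dbcustom_db/kaiju_db/kaiju_db.faa, which looks like the dbdir value being prepended to the rest of the path with no separator in between. The sketch below only illustrates that effect. It assumes the paths are built by plain string concatenation, which has not been checked against the MetaMeta source, and naive_path and joined_path are hypothetical helper names.

```python
import posixpath


def naive_path(dbdir, name):
    # Plain string concatenation: only correct if dbdir already ends with "/".
    return dbdir + name


def joined_path(dbdir, name):
    # posixpath.join inserts the "/" separator only when it is missing.
    return posixpath.join(dbdir, name)


print(naive_path("db", "taxonomy/names.dmp"))    # dbtaxonomy/names.dmp
print(naive_path("db/", "taxonomy/names.dmp"))   # db/taxonomy/names.dmp
print(joined_path("db", "taxonomy/names.dmp"))   # db/taxonomy/names.dmp
print(joined_path("db/", "taxonomy/names.dmp"))  # db/taxonomy/names.dmp
```

Under that assumption, ending dbdir with "/" in the config file is the simplest workaround.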

With this change the previous errors went away, but a new problem occurred in the DUDes step.
Here is the end of the log file.

Error in rule dudes_db_custom_profile:
    jobid: 47
    output: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
    log: /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log

RuleException:
CalledProcessError in line 56 of /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm:
Command ' set -euo pipefail;  python3 -c 'import numpy as np; npzfile = np.load("/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes_db/dudes_db.npz"); print("\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out 2> /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log ' returned non-zero exit status 1.
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm", line 56, in __rule_dudes_db_custom_profile
  File "/mss1/programs/europa/miniconda2/envs/metametaenv/lib/python3.6/concurrent/futures/thread.py", line 56, in run
Removing output files of failed job dudes_db_custom_profile since they might be corrupted:
/mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/dudes.dbaccession.out
Will exit after finishing currently running jobs.
Touching output file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/clark_db/.taxondata.
Finished job 34.
13 of 49 steps (27%) done
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message

How can I fix it?

Thank you,
Jongin

@pirovc
Owner

pirovc commented Aug 27, 2019

Dear Jongin,

Can you please send me the contents of the file /mss2/projects/META2/taxonomy_classification/metameta/db/custom_db/log/dudes_db_custom_profile.log?

Vitor

@jongin333
Author

jongin333 commented Aug 27, 2019

Hi Vitor,

Unfortunately, the log file (dudes_db_custom_profile.log) is empty.
I am attaching the final log file and a listing of the log directory. I hope they are helpful.

-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 clark_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 clark_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  101 Aug 17 12:26 clark_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  120 Aug 17 12:26 clark_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:07 dudes_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 11:07 dudes_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo 1392 Aug 17 12:19 dudes_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  134 Aug 17 12:19 dudes_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 12:19 dudes_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_1.log
-rw-r--r-- 1 jongin bioinfo  118 Aug 17 11:05 kaiju_db_custom_1.time
-rw-r--r-- 1 jongin bioinfo  476 Aug 17 12:19 kaiju_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  117 Aug 17 12:19 kaiju_db_custom_2.time
-rw-r--r-- 1 jongin bioinfo  976 Aug 17 12:19 kaiju_db_custom_3.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 12:19 kaiju_db_custom_3.time
-rw-r--r-- 1 jongin bioinfo   10 Aug 17 11:07 kaiju_db_custom_4.log
-rw-r--r-- 1 jongin bioinfo  116 Aug 17 11:07 kaiju_db_custom_4.time
-rw-r--r-- 1 jongin bioinfo    0 Aug 17 11:05 kaiju_db_custom_profile.log
-rw-r--r-- 1 jongin bioinfo  114 Aug 17 11:05 kaiju_db_custom_profile.time
-rw-r--r-- 1 jongin bioinfo   20 Aug 17 11:32 kraken_db_custom_2.log
-rw-r--r-- 1 jongin bioinfo  126 Aug 17 11:32 kraken_db_custom_2.time

metameta_2019-08-17_12-26-40.log

Thank you,
Jongin

@pirovc
Owner

pirovc commented Aug 28, 2019

Hi Jongin,

Can you check which Python version is in your environment? If it is newer than 3.5, I guess the error is related to pirovc/dudes#2.

In that case, you can try changing the following line in your MetaMeta installation (based on your log file, the file should be at /mss1/programs/europa/miniconda2/envs/metametaenv/opt/metameta/tools/dudes_db_custom.sm):

shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}"); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""

for this one:

shell: """python3 -c 'import numpy as np; npzfile = np.load("{input.npz}", allow_pickle=True); print("\\n".join(npzfile["refids_lookup"].item().keys()))' | sed 's/>//g' > {output} 2> {log}"""

However, I guess the same error will also occur when running DUDes itself, so you would need to change the DUDes code as well.

Alternatively, you can try installing Python 3.5 in the metametaenv environment, and everything should be fine.

Cheers
Vitor
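
For reference, the effect of the allow_pickle=True change suggested above can be reproduced outside the pipeline. The sketch below is a minimal, self-contained illustration, not the MetaMeta or DUDes code: write_demo_npz and list_refids are hypothetical helper names, and the demo keys are made-up placeholders. It assumes, as the shell command above suggests, that refids_lookup is a dict stored as a pickled object inside the .npz file. Since NumPy 1.16.3, np.load defaults to allow_pickle=False, so reading such an entry fails unless the flag is passed explicitly.

```python
import os
import tempfile

import numpy as np


def write_demo_npz(path):
    # Stand-in for dudes_db.npz: a dict saved under the key "refids_lookup".
    # The keys are invented placeholders, not real accessions.
    lookup = {">demo_seq_1": 0, ">demo_seq_2": 1}
    np.savez(path, refids_lookup=lookup)


def list_refids(npz_path):
    # allow_pickle=True is required because the dict is stored as a pickled
    # object array. Only use it on files from a trusted source.
    with np.load(npz_path, allow_pickle=True) as npzfile:
        lookup = npzfile["refids_lookup"].item()
    # Same effect as the `sed 's/>//g'` step in the shell command.
    return [key.replace(">", "") for key in lookup.keys()]


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_db.npz")
    write_demo_npz(demo_path)
    print("\n".join(list_refids(demo_path)))
```

Because allow_pickle=True runs the pickle loader, it is best reserved for files you built yourself, such as a locally generated database.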
