BLAST Database error: Not a valid version 4 database #35

Open
nana-marinbio opened this issue Apr 7, 2022 · 0 comments

nana-marinbio commented Apr 7, 2022

Hi BLCA team, @YJulyXing, @yingeddi2008, @koopkaup, @qunfengdong

I tried to run BLCA with the standard NCBI 16S microbial database, but the taxonomy and taxID files it created are empty. Below is the output showing where the error message appears, followed by a listing of the files that were created.

$ python 1.subset_db_acc.py

--2022-04-06 13:24:43-- ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdmp.zip
=> ‘db/taxdmp.zip’
Resolving ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)... 130.14.250.13, 165.112.9.228, 2607:f220:41f:250::229, ...
Connecting to ftp.ncbi.nih.gov (ftp.ncbi.nih.gov)|130.14.250.13|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/taxonomy ... done.
==> SIZE taxdmp.zip ... 57732633
==> PASV ... done. ==> RETR taxdmp.zip ... done.
Length: 57732633 (55M) (unauthoritative)

taxdmp.zip 100%[=============================================================================================================================================================>] 55.06M 5.20MB/s in 17s

2022-04-06 13:25:04 (3.33 MB/s) - ‘db/taxdmp.zip’ saved [57732633]

Archive: db/taxdmp.zip
inflating: db/citations.dmp
inflating: db/delnodes.dmp
inflating: db/division.dmp
inflating: db/gencode.dmp
inflating: db/merged.dmp
inflating: db/names.dmp
inflating: db/nodes.dmp
inflating: db/gc.prt
inflating: db/readme.txt

NCBI Taxonomy Database downloaded!
blastdbcmd is located in your PATH!
--2022-04-06 13:25:07-- https://ftp.ncbi.nlm.nih.gov/blast/db/16S_ribosomal_RNA.tar.gz
Resolving ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)... 165.112.9.228, 130.14.250.13, 2607:f220:41e:250::7, ...
Connecting to ftp.ncbi.nlm.nih.gov (ftp.ncbi.nlm.nih.gov)|165.112.9.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 38650164 (37M) [application/x-gzip]
Saving to: ‘db/16S_ribosomal_RNA.tar.gz’

16S_ribosomal_RNA.tar.gz 100%[=============================================================================================================================================================>] 36.86M 5.67MB/s in 8.3s

2022-04-06 13:25:16 (4.43 MB/s) - ‘db/16S_ribosomal_RNA.tar.gz’ saved [38650164/38650164]

16S_ribosomal_RNA.ndb
16S_ribosomal_RNA.nhr
16S_ribosomal_RNA.nin
16S_ribosomal_RNA.nnd
16S_ribosomal_RNA.nni
16S_ribosomal_RNA.nog
16S_ribosomal_RNA.nos
16S_ribosomal_RNA.not
16S_ribosomal_RNA.nsq
16S_ribosomal_RNA.ntf
16S_ribosomal_RNA.nto
taxdb.btd
taxdb.bti
BLAST Database error: Error: Not a valid version 4 database.

Accession and TaxIDs from 16S_ribosomal_RNA are extracted!!

Loading 16S_ribosomal_RNA TaxID list...
('>> Loading 16S_ribosomal_RNA TaxID list...DONE!\nTotal', 0, 'TaxID to fetch!')
Loading nodes.dmp...This will take 10-15 minutes, please wait!
1 >> Open nodes.dmp file!
('Scanning nodes.dmp line:', 100000)
('Scanning nodes.dmp line:', 200000)
('Scanning nodes.dmp line:', 300000)
('Scanning nodes.dmp line:', 400000)
('Scanning nodes.dmp line:', 500000)
('Scanning nodes.dmp line:', 600000)
('Scanning nodes.dmp line:', 700000)
('Scanning nodes.dmp line:', 800000)
('Scanning nodes.dmp line:', 900000)
('Scanning nodes.dmp line:', 1000000)
('Scanning nodes.dmp line:', 1100000)
('Scanning nodes.dmp line:', 1200000)
('Scanning nodes.dmp line:', 1300000)
('Scanning nodes.dmp line:', 1400000)
('Scanning nodes.dmp line:', 1500000)
('Scanning nodes.dmp line:', 1600000)
('Scanning nodes.dmp line:', 1700000)
('Scanning nodes.dmp line:', 1800000)
('Scanning nodes.dmp line:', 1900000)
('Scanning nodes.dmp line:', 2000000)
('Scanning nodes.dmp line:', 2100000)
('Scanning nodes.dmp line:', 2200000)
('Scanning nodes.dmp line:', 2300000)
('Scanning nodes.dmp line:', 2400000)
2 >> Close nodes.dmp file!
('>> Remaining # TaxID to look for:', 0)
Loading nodes.dmp...DONE!
Loading names.dmp...
This may take 20-30 minutes. Please wait!
('>> ', 50000, 'names recorded!')
('>> ', 100000, 'names recorded!')
('>> ', 150000, 'names recorded!')
('>> ', 200000, 'names recorded!')
('>> ', 250000, 'names recorded!')
('>> ', 300000, 'names recorded!')
('>> ', 350000, 'names recorded!')
('>> ', 400000, 'names recorded!')
('>> ', 450000, 'names recorded!')
('>> ', 500000, 'names recorded!')
('>> ', 550000, 'names recorded!')
('>> ', 600000, 'names recorded!')
('>> ', 650000, 'names recorded!')
('>> ', 700000, 'names recorded!')
('>> ', 750000, 'names recorded!')
('>> ', 800000, 'names recorded!')
('>> ', 850000, 'names recorded!')
('>> ', 900000, 'names recorded!')
('>> ', 950000, 'names recorded!')
('>> ', 1000000, 'names recorded!')
('>> ', 1050000, 'names recorded!')
('>> ', 1100000, 'names recorded!')
('>> ', 1150000, 'names recorded!')
('>> ', 1200000, 'names recorded!')
('>> ', 1250000, 'names recorded!')
('>> ', 1300000, 'names recorded!')
('>> ', 1350000, 'names recorded!')
('>> ', 1400000, 'names recorded!')
('>> ', 1450000, 'names recorded!')
('>> ', 1500000, 'names recorded!')
('>> ', 1550000, 'names recorded!')
('>> ', 1600000, 'names recorded!')
('>> ', 1650000, 'names recorded!')
('>> ', 1700000, 'names recorded!')
('>> ', 1750000, 'names recorded!')
('>> ', 1800000, 'names recorded!')
('>> ', 1850000, 'names recorded!')
('>> ', 1900000, 'names recorded!')
('>> ', 1950000, 'names recorded!')
('>> ', 2000000, 'names recorded!')
('>> ', 2050000, 'names recorded!')
('>> ', 2100000, 'names recorded!')
('>> ', 2150000, 'names recorded!')
('>> ', 2200000, 'names recorded!')
('>> ', 2250000, 'names recorded!')
('>> ', 2300000, 'names recorded!')
('>> ', 2350000, 'names recorded!')
('>> ', 2400000, 'names recorded!')
('>> ', 2450000, 'names recorded!')
('>> ', 2500000, 'names recorded!')
('>> ', 2550000, 'names recorded!')
('>> ', 2600000, 'names recorded!')
('>> ', 2650000, 'names recorded!')
('>> ', 2700000, 'names recorded!')
('>> ', 2750000, 'names recorded!')
('>> ', 2800000, 'names recorded!')
('>> ', 2850000, 'names recorded!')
('>> ', 2900000, 'names recorded!')
('>> ', 2950000, 'names recorded!')
('>> ', 3000000, 'names recorded!')
('>> ', 3050000, 'names recorded!')
('>> ', 3100000, 'names recorded!')
('>> ', 3150000, 'names recorded!')
('>> ', 3200000, 'names recorded!')
('>> ', 3250000, 'names recorded!')
('>> ', 3300000, 'names recorded!')
('>> ', 3350000, 'names recorded!')
('>> ', 3400000, 'names recorded!')
('>> ', 3450000, 'names recorded!')
('>> ', 3500000, 'names recorded!')
('>> ', 3550000, 'names recorded!')
Loading names.dmp...DONE!
Generating a subset of taxonomy file.
Taxonomy file generated!
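
The log above reports that blastdbcmd was found on the PATH, so one thing worth confirming is which binary that actually is and what version it reports. This is only a minimal diagnostic sketch: it assumes `blastdbcmd` accepts the standard BLAST+ `-version` flag, and the helper name `tool_version` is made up for illustration.

```python
import shutil
import subprocess


def tool_version(name="blastdbcmd"):
    """Return (path, version_text) for the first `name` on PATH, or None."""
    path = shutil.which(name)
    if path is None:
        return None
    # -version is the standard BLAST+ flag for printing the program version.
    result = subprocess.run(
        [path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return path, result.stdout.strip()


if __name__ == "__main__":
    info = tool_version()
    if info is None:
        print("blastdbcmd not found on PATH")
    else:
        print("Using:", info[0])
        print(info[1])
```

If the reported path or version is not the ncbi-blast-2.13.0+ install, a different copy of blastdbcmd is being picked up first.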

ubuntu@ubuntu18:~/BLCA/db$ ls -lh
total 660M
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 13:25 16S_ribosomal_RNA.ACC.taxonomy
-rw-rw-r-- 1 ubuntu ubuntu 0 Apr 6 13:25 16S_ribosomal_RNA.ACCtaxID
-rw-rw-r-- 1 ubuntu ubuntu 1.2M Mar 29 06:36 16S_ribosomal_RNA.ndb
-rw-rw-r-- 1 ubuntu ubuntu 3.4M Mar 29 06:36 16S_ribosomal_RNA.nhr
-rw-rw-r-- 1 ubuntu ubuntu 262K Mar 29 06:36 16S_ribosomal_RNA.nin
-rw-rw-r-- 1 ubuntu ubuntu 175K Mar 29 06:36 16S_ribosomal_RNA.nnd
-rw-rw-r-- 1 ubuntu ubuntu 748 Mar 29 06:36 16S_ribosomal_RNA.nni
-rw-rw-r-- 1 ubuntu ubuntu 88K Mar 29 06:36 16S_ribosomal_RNA.nog
-rw-rw-r-- 1 ubuntu ubuntu 437K Mar 29 06:36 16S_ribosomal_RNA.nos
-rw-rw-r-- 1 ubuntu ubuntu 262K Mar 29 06:36 16S_ribosomal_RNA.not
-rw-rw-r-- 1 ubuntu ubuntu 7.9M Mar 29 06:36 16S_ribosomal_RNA.nsq
-rw-rw-r-- 1 ubuntu ubuntu 592K Mar 29 06:36 16S_ribosomal_RNA.ntf
-rw-rw-r-- 1 ubuntu ubuntu 154K Mar 29 06:36 16S_ribosomal_RNA.nto
-rw-rw-r-- 1 ubuntu ubuntu 37M Mar 29 06:36 16S_ribosomal_RNA.tar.gz
-rw-r--r-- 1 ubuntu ubuntu 19M Apr 6 12:29 citations.dmp
-rw-r--r-- 1 ubuntu ubuntu 4.2M Apr 6 12:26 delnodes.dmp
-rw-r--r-- 1 ubuntu ubuntu 452 Apr 6 12:20 division.dmp
-rw-r--r-- 1 ubuntu ubuntu 17K Apr 6 12:29 gc.prt
-rw-r--r-- 1 ubuntu ubuntu 4.9K Apr 6 12:20 gencode.dmp
-rw-r--r-- 1 ubuntu ubuntu 1.2M Apr 6 12:26 merged.dmp
-rw-r--r-- 1 ubuntu ubuntu 208M Apr 6 12:29 names.dmp
-rw-r--r-- 1 ubuntu ubuntu 159M Apr 6 12:28 nodes.dmp
-rw-rw---- 1 ubuntu ubuntu 2.7K Sep 11 2019 readme.txt
-rw-rw-r-- 1 ubuntu ubuntu 149M Mar 29 06:36 taxdb.btd
-rw-rw-r-- 1 ubuntu ubuntu 16M Mar 29 06:36 taxdb.bti
-rw-rw-r-- 1 ubuntu ubuntu 56M Apr 6 13:25 taxdmp.zip
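
The two zero-byte outputs in the listing above can also be picked out programmatically. A small sketch, assuming it is run from the BLCA directory so that `db` is the right folder; `empty_files` is a hypothetical helper, not part of BLCA.

```python
import os


def empty_files(directory):
    """Return the sorted names of zero-byte regular files in `directory`."""
    names = []
    for entry in os.scandir(directory):
        if entry.is_file() and entry.stat().st_size == 0:
            names.append(entry.name)
    return sorted(names)


if __name__ == "__main__":
    db_dir = "db"  # assumption: the script is run from the BLCA directory
    if os.path.isdir(db_dir):
        for name in empty_files(db_dir):
            print("empty:", name)
```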

I checked the 16S_ribosomal_RNA.tar.gz file with md5sum and it is OK!
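
For reference, the same check can be reproduced in Python. This is a sketch only: it assumes a checksum file named `16S_ribosomal_RNA.tar.gz.md5` has been downloaded next to the archive (that file name and location are assumptions), and `md5_of` is a hypothetical helper.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = "db/16S_ribosomal_RNA.tar.gz"
    checksum_file = archive + ".md5"  # assumption: downloaded separately
    if os.path.exists(archive) and os.path.exists(checksum_file):
        with open(checksum_file) as handle:
            # md5sum-style files start with the hex digest.
            expected = handle.read().split()[0]
        print("match" if md5_of(archive) == expected else "MISMATCH")
```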
I am using the latest BLAST (ncbi-blast-2.13.0+), together with:
python3.6.9
biopython-1.79
clustalo 1.2.4-1
muscle5.0.98_linux

How can I fix this?
