# Build custom Kraken2 databases with segmented flu genomes using the kraken_flu utility
A new utility [kraken_flu](https://gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu) was created to generate Kraken2 databases with segmented flu genomes from any set of input files (either downloaded with kraken2 or directly from NCBI).

The tool performs two tasks:
1. Filter influenza genomes to keep only those that have all 8 full-length segments
2. Create a reorganised taxonomy with new taxa for the segments of influenza A viruses
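To make task 1 more concrete, here is a minimal sketch of the idea (not kraken_flu's actual implementation): group FASTA records by strain name and keep the strains that have all 8 segment numbers. It assumes NCBI-style headers of the form `... Influenza A virus (A/Example/1/2000(H1N1)) segment 4, ...`; the function name `complete_strains` is hypothetical and, unlike the real tool, it does not check that segments are full length.

```bash
# Hypothetical sketch of the "complete genome" idea, NOT kraken_flu's code.
# Assumes headers like:
#   >ACC1 Influenza A virus (A/Example/1/2000(H1N1)) segment 4, complete sequence
# Prints (sorted) every strain with all 8 distinct segment numbers present.
# Segment lengths are not checked here.
complete_strains() {
    awk '
    /^>/ {
        i = index($0, "virus (")          # start of the strain-name bracket
        j = index($0, ") segment ")       # bracket closing the strain name
        if (i == 0 || j <= i) next
        strain = substr($0, i + 7, j - (i + 7))
        seg = substr($0, j + 10, 1)       # single-digit segment number
        if (seg !~ /^[1-8]$/) next
        if (!((strain, seg) in seen)) {
            seen[strain, seg] = 1
            nseg[strain]++
        }
    }
    END {
        for (s in nseg) if (nseg[s] == 8) print s
    }' "$1" | sort
}
```

For example, `complete_strains ${LIB_PATH}/influenza.fna | head` would list the first few strains that pass this simplified check.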

This notebook builds a complete database from NCBI viral RefSeq plus the NCBI Influenza FTP resource.

## Directory paths and names

In [9]:
export DB_NAME=refseq_ncbiFlu_022124
export BASE_DIR=/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu/

# taxonomy data is universal and ideally reused for every DB we build, so the directory is not specific to this DB
export TAX_PATH=${BASE_DIR}/downloads/taxonomy_download/

export LIB_PATH=${BASE_DIR}/downloads/sequence_download/${DB_NAME}/

export DB_PREP_DIR=${BASE_DIR}/db_prep/${DB_NAME}/
export DB_PATH=${BASE_DIR}/databases/${DB_NAME}
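The trailing slash on `BASE_DIR` is why later outputs show paths such as `kraken_flu//downloads`. POSIX treats a double slash inside a path as a single separator, so this is harmless. If cleaner paths are wanted, here is a minimal sketch using plain parameter expansion; the helper name `strip_trailing_slash`, the `example_*` variables and the placeholder path are all hypothetical, chosen so they do not overwrite the notebook's own variables.

```bash
# Hypothetical helper: remove trailing slashes, keeping a bare "/" intact.
strip_trailing_slash() {
    local p="$1"
    while [ "${#p}" -gt 1 ] && [ "${p%/}" != "$p" ]; do
        p="${p%/}"
    done
    printf '%s\n' "$p"
}

# Example with a placeholder path; in the notebook this would wrap BASE_DIR.
example_base=$(strip_trailing_slash "/path/to/krakenDBs/kraken_flu/")
example_tax="${example_base}/downloads/taxonomy_download"
echo "$example_tax"   # /path/to/krakenDBs/kraken_flu/downloads/taxonomy_download
```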

In [10]:
mkdir -p ${BASE_DIR}
mkdir -p ${TAX_PATH}
mkdir -p ${LIB_PATH}
mkdir -p ${DB_PREP_DIR}
mkdir -p ${DB_PATH}

## Install the kraken_flu tool
The tool can be installed from the local GitLab. First, create a venv for it.

In [1]:
python3 -m venv ~/kraken_flu

___This version is not yet in the main branch, so we install the latest commit from the dev branch___

In [1]:
~/kraken_flu/bin/pip install kraken_flu@git+ssh://git@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git@0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8

Collecting kraken_flu@ git+ssh://****@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git@0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8
  Cloning ssh://****@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git (to revision 0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8) to /tmp/pip-install-_pkhzbq8/kraken-flu_b3f7796379d44bc0bcde16eb1ef3d101
  Running command git clone --filter=blob:none --quiet 'ssh://****@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git' /tmp/pip-install-_pkhzbq8/kraken-flu_b3f7796379d44bc0bcde16eb1ef3d101
  Running command git rev-parse -q --verify 'sha^0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8'
  Running command git fetch -q 'ssh://****@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git' 0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8
  Running command git checkout -q 0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8
  Resolved ssh://****@gitlab.internal.sanger.ac.uk/malariagen1/misc_utils/kraken_flu.git to commit 0a3cfe98d8f3ab09

In [2]:
~/kraken_flu/bin/kraken_flu -v

kraken_flu 1.2.1.dev53+g0a3cfe9
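The version string appears to follow the setuptools_scm convention (an assumption), where `+g0a3cfe9` is the abbreviated git hash of the installed build. A minimal sketch that checks it against the commit pinned in the `pip install` cell; `matches_pinned_commit` is a hypothetical helper that returns 0 on a match, 1 on a mismatch and 2 when no hash can be found.

```bash
# Compare the git hash embedded in a version string with a pinned commit.
# Assumes a setuptools_scm-style version such as "1.2.1.dev53+g0a3cfe9".
matches_pinned_commit() {
    local version_string="$1" pinned="$2"
    case "$version_string" in
        *+g*) ;;
        *) return 2 ;;               # no "+g<hash>" part: cannot verify
    esac
    local short_hash="${version_string##*+g}"
    short_hash="${short_hash%%.*}"   # drop a ".dYYYYMMDD" dirty-tree suffix, if any
    [ -n "$short_hash" ] || return 2
    case "$pinned" in
        "$short_hash"*) return 0 ;;  # pinned full hash starts with the short hash
        *) return 1 ;;
    esac
}

PINNED_COMMIT=0a3cfe98d8f3ab09fb6a799b54e80cbed254f1c8
# In the notebook this would be: version_string=$(~/kraken_flu/bin/kraken_flu -v)
version_string="kraken_flu 1.2.1.dev53+g0a3cfe9"
if matches_pinned_commit "$version_string" "$PINNED_COMMIT"; then
    echo "kraken_flu build matches pinned commit"
else
    echo "WARNING: installed kraken_flu does not match ${PINNED_COMMIT}" >&2
fi
```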


## Download data
Get viral taxonomy and sequence data from NCBI

In [8]:
module load kraken2/2.1.2

### Taxonomy data
Use the kraken2 build tool to download taxonomy files from [NCBI](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).

In [12]:
kraken2-build --download-taxonomy --db ${TAX_PATH}

Downloading nucleotide gb accession to taxon map... done.
Downloading nucleotide wgs accession to taxon map... done.
Downloaded accession to taxon map(s)
Downloading taxonomy tree data... done.
Uncompressing taxonomy data... done.
Untarring taxonomy tree data... done.


In [21]:
tree ${TAX_PATH}

/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//downloads/taxonomy_download/
└── taxonomy
    ├── accmap.dlflag
    ├── citations.dmp
    ├── delnodes.dmp
    ├── division.dmp
    ├── gc.prt
    ├── gencode.dmp
    ├── images.dmp
    ├── merged.dmp
    ├── names.dmp
    ├── nodes.dmp
    ├── nucl_gb.accession2taxid
    ├── nucl_wgs.accession2taxid
    ├── readme.txt
    ├── taxdump.dlflag
    ├── taxdump.tar.gz
    └── taxdump.untarflag

1 directory, 16 files
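Before moving on, a quick check that the download produced the files used later in this notebook: `names.dmp` and `nodes.dmp` (the taxonomy that kraken_flu reorganises) and `nucl_gb.accession2taxid` (symlinked for kraken2-build further down). The helper name `check_taxonomy_dir` is hypothetical, and the file list is taken from this notebook rather than from kraken2's documentation.

```bash
# Report any expected taxonomy files that are missing or empty.
# Returns the number of problems found (0 means everything is present).
check_taxonomy_dir() {
    local dir="$1" missing=0 f
    for f in names.dmp nodes.dmp nucl_gb.accession2taxid; do
        if [ ! -s "${dir}/${f}" ]; then
            echo "missing or empty: ${dir}/${f}" >&2
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

if check_taxonomy_dir "${TAX_PATH}/taxonomy"; then
    echo "taxonomy download looks complete"
fi
```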


### Sequence data
Download directly from the NCBI RefSeq release FTP site (file dated 15/01/24).

NCBI viral RefSeq (downloaded directly from NCBI rather than using the kraken2 pre-built version)

In [11]:
cd ${LIB_PATH}
wget https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz
gunzip viral.1.1.genomic.fna.gz

--2024-02-21 12:09:01--  https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/viral.1.1.genomic.fna.gz
Resolving wwwcache.sanger.ac.uk (wwwcache.sanger.ac.uk)... 172.30.152.200
Connecting to wwwcache.sanger.ac.uk (wwwcache.sanger.ac.uk)|172.30.152.200|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 168645964 (161M) [application/x-gzip]
Saving to: ‘viral.1.1.genomic.fna.gz’


2024-02-21 12:09:13 (14.0 MB/s) - ‘viral.1.1.genomic.fna.gz’ saved [168645964/168645964]
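The cell above decompresses the download without checking it first, so a truncated transfer would only surface later as a malformed FASTA. A minimal sketch that runs `gzip -t` (gzip's built-in integrity test) before decompressing; `safe_gunzip` is a hypothetical helper name.

```bash
# Decompress only if the archive passes gzip's integrity test, so a truncated
# download fails loudly instead of producing a partial FASTA.
safe_gunzip() {
    local gz="$1"
    if ! gzip -t "$gz" 2>/dev/null; then
        echo "corrupt or incomplete gzip file: $gz" >&2
        return 1
    fi
    # -f overwrites an existing decompressed file from a previous run
    gunzip -f "$gz"
}
```

In the cell above this would be used as `cd ${LIB_PATH} && safe_gunzip viral.1.1.genomic.fna.gz`.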



NCBI Influenza FTP  
___NOTE___ that this resource has not been updated since Oct 2020, so it is used as a starting point; later influenza genomes still need to be added via the new NCBI Virus site:

https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/

Here is a post showing how to programmatically interact with the new NCBI Virus site to obtain sequences:  
https://www.biostars.org/p/9562294/   

In [13]:
cd ${LIB_PATH}
wget https://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.fna

--2024-02-21 13:07:32--  https://ftp.ncbi.nih.gov/genomes/INFLUENZA/influenza.fna
Resolving wwwcache.sanger.ac.uk (wwwcache.sanger.ac.uk)... 172.30.152.200
Connecting to wwwcache.sanger.ac.uk (wwwcache.sanger.ac.uk)|172.30.152.200|:3128... connected.
Proxy request sent, awaiting response... 200 OK
Length: 1429095618 (1.3G)
Saving to: ‘influenza.fna’


2024-02-21 13:08:59 (15.6 MB/s) - ‘influenza.fna’ saved [1429095618/1429095618]



Combine the RefSeq and NCBI Influenza files so we can build the DB in one pass.  
___NOTE___: it is possible to keep the files separate, but then the second kraken_flu run would have to take the taxonomy output of the first run as its input. Otherwise the two taxonomy outputs would not be compatible, because each run would create new taxa with clashing IDs.

In [14]:
cat ${LIB_PATH}/viral.1.1.genomic.fna  ${LIB_PATH}/influenza.fna  >  ${LIB_PATH}/combined_refseq_flu.fna

In [15]:
tree ${LIB_PATH}

/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//downloads/sequence_download/refseq_ncbiFlu_022124/
├── combined_refseq_flu.fna
├── influenza.fna
└── viral.1.1.genomic.fna

0 directories, 3 files
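Plain `cat` can silently merge two records if the first file lacks a trailing newline. A minimal sanity check (hypothetical helper `check_combined`) compares the record count of the combined file with the sum of its inputs, then lists any sequence ID (first word of each header) that occurs more than once.

```bash
# Count FASTA header lines; "|| true" keeps a zero count from failing the shell.
n_records() { grep -c '^>' "$1" || true; }

# Usage: check_combined COMBINED INPUT1 [INPUT2 ...]
check_combined() {
    local combined="$1"; shift
    local expected=0 f actual dups
    for f in "$@"; do
        expected=$((expected + $(n_records "$f")))
    done
    actual=$(n_records "$combined")
    if [ "$actual" -ne "$expected" ]; then
        echo "record count mismatch: ${actual} vs expected ${expected}" >&2
        return 1
    fi
    # Sequence IDs that appear more than once in the combined file
    dups=$(grep '^>' "$combined" | cut -d' ' -f1 | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate sequence IDs:" >&2
        echo "$dups" >&2
        return 1
    fi
    echo "OK: ${actual} records, no duplicate IDs"
}
```

Here it would be called as `check_combined ${LIB_PATH}/combined_refseq_flu.fna ${LIB_PATH}/viral.1.1.genomic.fna ${LIB_PATH}/influenza.fna`.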


## Run the kraken_flu tool
The tool creates a new directory of taxonomy and sequence files.  

The "complete flu genomes" filter is given an exception for the avian flu reference strain A/Goose/Guangdong/1/96(H5N1): we want it in the DB, but it does not have all 8 segments in RefSeq.

___NOTE___ The current implementation of kraken_flu reads the sequence and taxonomy files into RAM and therefore needs a significant amount of memory. The process was killed on a node with 5GB but worked fine with 10GB.
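The notebook does not say how the 10GB node was obtained. On an LSF cluster (an assumption: `module load` suggests an HPC environment, but the scheduler is not stated), memory is normally reserved at submission time. A minimal sketch with a hypothetical wrapper `submit_with_memory`; setting `BSUB` to another command allows a dry run. The units of `-M` and `rusage[mem=...]` depend on the site's `LSF_UNIT_FOR_LIMITS` (often MB).

```bash
# Hypothetical wrapper around LSF's bsub that reserves memory for a job.
# Assumes MB units for -M and rusage[mem=...]; check your site's LSF config.
# Remaining arguments are passed to bsub unchanged (bsub options, then the command).
# Set BSUB to another command (e.g. a function that prints its arguments) to dry-run.
submit_with_memory() {
    local mem_mb="$1"; shift
    "${BSUB:-bsub}" -M "$mem_mb" \
        -R "select[mem>${mem_mb}] rusage[mem=${mem_mb}]" \
        "$@"
}
```

For the run below this would look like `submit_with_memory 10000 -o kraken_flu.%J.out ~/kraken_flu/bin/kraken_flu <arguments as in the next cell>`.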


In [18]:
~/kraken_flu/bin/kraken_flu \
    --taxonomy_path ${TAX_PATH}/taxonomy \
    --fasta_path ${LIB_PATH}/combined_refseq_flu.fna \
    --out_dir ${DB_PREP_DIR} \
    --filter \
    --filter_except "A/Goose/Guangdong/1/96(H5N1)" \
    > ${DB_PREP_DIR}/log 2>&1

In [19]:
tree ${DB_PREP_DIR}

/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//db_prep/refseq_ncbiFlu_022124/
├── library
│   └── library.fna
├── log
└── taxonomy
    ├── names.dmp
    └── nodes.dmp

2 directories, 4 files
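A quick check that kraken_flu wrote the files the next steps rely on, plus a before/after record count to see roughly how many sequences the filter removed. `check_prep_dir` is a hypothetical helper; it assumes kraken_flu only removes records and never adds any, so treat the "removed" figure as indicative. The `log` file written by the run is also worth a look, e.g. `tail ${DB_PREP_DIR}/log`.

```bash
# Verify kraken_flu outputs exist and compare FASTA record counts.
# Usage: check_prep_dir PREP_DIR INPUT_FASTA
check_prep_dir() {
    local prep="$1" input_fasta="$2" f n_in n_out
    for f in library/library.fna taxonomy/names.dmp taxonomy/nodes.dmp; do
        if [ ! -s "${prep}/${f}" ]; then
            echo "missing or empty: ${prep}/${f}" >&2
            return 1
        fi
    done
    # "|| true" keeps a zero count from failing the shell
    n_in=$(grep -c '^>' "$input_fasta" || true)
    n_out=$(grep -c '^>' "${prep}/library/library.fna" || true)
    echo "records in: ${n_in}  records out: ${n_out}  removed: $((n_in - n_out))"
}
```

Here: `check_prep_dir ${DB_PREP_DIR} ${LIB_PATH}/combined_refseq_flu.fna`.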


## Prepare new kraken2 DB directory
Create the directory and copy the kraken_flu results into it, then use the kraken2-build tool to add the library to the new DB.

In [20]:
mkdir -p ${DB_PATH}

Copy the taxonomy created by kraken_flu into the new DB directory.

In [21]:
cp -r ${DB_PREP_DIR}/taxonomy ${DB_PATH}

kraken2-build also needs the large NCBI accession-to-taxid map, so symlink it into the new taxonomy directory.

In [22]:
ln -s ${TAX_PATH}/taxonomy/nucl_gb.accession2taxid ${DB_PATH}/taxonomy

In [23]:
tree ${DB_PATH}

/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//databases/refseq_ncbiFlu_022124
└── taxonomy
    ├── names.dmp
    ├── nodes.dmp
    └── nucl_gb.accession2taxid -> /lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//downloads/taxonomy_download//taxonomy/nucl_gb.accession2taxid

1 directory, 3 files
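`ln -s` fails if the link already exists, so re-running the symlink cell raises an error. A re-runnable sketch (hypothetical helper `link_accession_map`) using `ln -sfn`, where `-f` replaces an existing link, followed by a `-e` test, which follows the symlink and so catches a dangling link.

```bash
# Create (or replace) a symlink to SRC inside DEST_DIR and verify it resolves.
# Usage: link_accession_map SRC DEST_DIR
link_accession_map() {
    local src="$1" dest_dir="$2"
    local link="${dest_dir}/$(basename "$src")"
    ln -sfn "$src" "$link"
    if [ ! -e "$link" ]; then   # -e follows the symlink to its target
        echo "dangling symlink: ${link} -> ${src}" >&2
        return 1
    fi
}
```

Here: `link_accession_map ${TAX_PATH}/taxonomy/nucl_gb.accession2taxid ${DB_PATH}/taxonomy`.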


Use kraken2-build to add the library file to the new DB.

In [24]:
mkdir -p ${DB_PATH}/library

In [25]:
kraken2-build \
    --add-to-library ${DB_PREP_DIR}/library/library.fna \
    --db ${DB_PATH}

Masking low-complexity regions of new file... done.
Added "/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//db_prep/refseq_ncbiFlu_022124//library/library.fna" to library (/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//databases/refseq_ncbiFlu_022124)


## Create the kraken2 DB

In [26]:
kraken2-build \
    --build \
    --db ${DB_PATH}

Creating sequence ID to taxonomy ID map (step 1)...
Found 3295/230189 targets, searched through 51319536 accession IDs...
Found 12049/230189 targets, searched through 187874420 accession IDs...
Found 14524/230189 targets, searched through 188353601 accession IDs...
Found 17811/230189 targets, searched through 191446096 accession IDs...
Found 23232/230189 targets, searched through 193185158 accession IDs...
Found 25538/230189 targets, searched through 193187478 accession IDs...
Found 28359/230189 targets, searched through 193490016 accession IDs...
Found 31480/230189 targets, searched through 193657294 accession IDs...
Found 33514/230189 targets, searched through 196454795 accession IDs...
Found 41840/230189 targets, searched through 218183103 accession IDs...
Found 50556/230189 targets, searched through 218346306 accession IDs...
Found 61946/230189 targets, searched through 218644806 accession IDs...
Found 65856/230189 targets, searched through 218683253 accession IDs...
Found 69885/230189 targets, searched through 218699719 accession IDs...
Found 74746/230189 targets, searched through 218803661 accession IDs...
Found 83436/230189 targets, searched through 220699019 accession IDs...
Found 90532/230189 targets, searched through 220872565 accession IDs...
Found 105538/230189 targets, searched through 221068943 accession IDs...
Found 116882/230189 targets, searched through 221228564 accession IDs...
Found 126645/230189 targets, searched through 221366431 accession IDs...
Found 135738/230189 targets, searched through 221508186 accession IDs...
Found 139089/230189 targets, searched through 222512605 accession IDs...
Found 145493/230189 targets, searched through 222594403 accession IDs...
Found 155891/230189 targets, searched through 222761647 accession IDs...
Found 166636/230189 targets, searched through 222984772 accession IDs...
Found 177516/230189 targets, searched through 223204688 accession IDs...
Found 184127/230189 targets, searched through 223450853 accession IDs...
Found 189064/230189 targets, searched through 225280584 accession IDs...
Found 193663/230189 targets, searched through 225306478 accession IDs...
Found 197694/230189 targets, searched through 225410304 accession IDs...
Found 208878/230189 targets, searched through 225582731 accession IDs...
Found 213758/230189 targets, searched through 231146059 accession IDs...
Found 215779/230189 targets, searched through 231155274 accession IDs...
Found 217487/230189 targets, searched through 231161464 accession IDs...
Found 220195/230189 targets, searched through 231167375 accession IDs...
Found 223017/230189 targets, searched through 231176760 accession IDs...
Found 230186/230189 targets, searched through 327566049 accession IDs, search complete.
lookup_accession_numbers: 3/230189 accession numbers remain unmapped, see unmapped.txt in DB directory
Sequence ID to taxonomy ID map complete. [3m17.829s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 668180480 bytes
Capacity estimation complete. [32.085s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 18 bits reserved for taxid.
Completed processing of 417664 sequences, 1220666255 bp
Writing data to disk...  complete.
Database files completed. [3m52.411s]
Database construction complete. [Total: 7m42.486s]
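The build reports that 3 of 230189 accessions remain unmapped and points to `unmapped.txt` in the DB directory. A small sketch to summarise that file (the helper name `report_unmapped` is hypothetical):

```bash
# Summarise unmapped.txt written by kraken2-build into the DB directory.
# Usage: report_unmapped DB_DIR
report_unmapped() {
    local f="$1/unmapped.txt"
    if [ -s "$f" ]; then
        # tr strips the padding some wc implementations add
        echo "$(wc -l < "$f" | tr -d ' ') unmapped accession(s):"
        head -n 20 "$f"
    else
        echo "no unmapped accessions"
    fi
}
```

Here: `report_unmapped ${DB_PATH}`.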


In [27]:
echo ${DB_PATH}

/lustre/scratch126/gsu/team112/personal/fs5/rvi_dev/krakenDBs/kraken_flu//databases/refseq_ncbiFlu_022124
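The path above is what gets passed to `kraken2 --db` for classification. A hedged sketch of a wrapper (hypothetical name `classify_sample`) around `kraken2` using its standard `--db`, `--threads`, `--report` and `--output` options; setting `KRAKEN2` to another command allows a dry run.

```bash
# Hypothetical wrapper: classify one reads file against a kraken2 database.
# Usage: classify_sample DB_DIR READS OUTPUT_PREFIX [THREADS]
# Set KRAKEN2 to another command (e.g. a function that prints its arguments) to dry-run.
classify_sample() {
    local db="$1" reads="$2" prefix="$3" threads="${4:-4}"
    "${KRAKEN2:-kraken2}" \
        --db "$db" \
        --threads "$threads" \
        --report "${prefix}.report" \
        --output "${prefix}.kraken" \
        "$reads"
}
```

For example, after `module load kraken2/2.1.2`: `classify_sample ${DB_PATH} sample.fastq sample 8`.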
