This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

ITEP data limitations

mattb112885 edited this page Apr 17, 2013 · 2 revisions

Database limits

The presence/absence analysis will not work for more than 1997 genomes by default: the default maximum number of columns in a SQLite table is 2000, and 3 columns are reserved for annotation and cluster information. However, due to time and memory limitations, we recommend using ITEP for at most 300 genomes. Running BLASTP and BLASTN for 300 genomes on 16 cores takes about a week (clustering and RPSBLAST take additional time), and the final database, including BLASTP, BLASTN, and RPSBLAST results, genome sequences, and clustering results, requires about 2 TB of hard drive space for 300 average-sized bacterial genomes.
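As a rough way to see the column limit in practice, the sketch below asks the local SQLite build whether it accepts a table with one column per genome plus the 3 reserved columns. The function name, table name, and column names are hypothetical and are not part of ITEP; the result depends on how the SQLite library behind Python's `sqlite3` module was compiled.

```python
import sqlite3

# Assumption taken from the text above: 3 columns are reserved for
# annotation and cluster information.
RESERVED_COLUMNS = 3


def fits_in_one_table(n_genomes, reserved=RESERVED_COLUMNS):
    """Return True if this SQLite build accepts a table with one column
    per genome plus the reserved columns (hypothetical helper)."""
    n_columns = n_genomes + reserved
    column_defs = ", ".join(f"c{i}" for i in range(n_columns))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(f"CREATE TABLE presence_absence ({column_defs})")
        return True
    except sqlite3.OperationalError:
        # SQLite rejects the statement when the column limit is exceeded.
        return False
    finally:
        conn.close()
```

With a default SQLite build, `fits_in_one_table(1997)` would be expected to succeed and `fits_in_one_table(1998)` to fail, but a build compiled with a different column limit will behave differently.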

The maximum length of a contig that can be imported into ITEP is 1 billion base pairs, because that is the default maximum string length in SQLite. The limit can be increased by recompiling SQLite with a larger value for the SQLITE_MAX_LENGTH compile-time option.
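Before importing, it may help to check contig lengths against this limit. The following is a minimal sketch, assuming the contigs are available as FASTA-formatted lines; the function name and the `limit` parameter are hypothetical and not part of ITEP.

```python
# Default SQLite string length limit described above (1 billion).
SQLITE_DEFAULT_MAX_LENGTH = 1_000_000_000


def oversized_contigs(fasta_lines, limit=SQLITE_DEFAULT_MAX_LENGTH):
    """Return the names of contigs whose sequence is longer than `limit`.

    `fasta_lines` is any iterable of lines in FASTA format, for example
    an open file handle.
    """
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return [contig for contig, length in lengths.items() if length > limit]
```

For example, `oversized_contigs(open("genome.fasta"))` would list any contigs too long for a default SQLite build, where `genome.fasta` is a placeholder file name.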

Time and memory requirements

Both the run time and the size of the database grow roughly as O(N^2), where N is the number of genomes.
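To get a feel for what quadratic growth means, the sketch below scales the 300-genome figures quoted above (about a week of BLASTP and BLASTN on 16 cores, about 2 TB of disk) to other genome counts. It assumes pure O(N^2) scaling and the same hardware, so treat the numbers as order-of-magnitude estimates only; the function name is hypothetical.

```python
# Reference point taken from the figures above.
REFERENCE_GENOMES = 300
REFERENCE_BLAST_DAYS = 7.0   # about a week on 16 cores
REFERENCE_DISK_TB = 2.0      # about 2 TB for the final database


def estimate_requirements(n_genomes):
    """Return (blast_days, disk_tb) assuming pure quadratic scaling."""
    scale = (n_genomes / REFERENCE_GENOMES) ** 2
    return REFERENCE_BLAST_DAYS * scale, REFERENCE_DISK_TB * scale


days, disk_tb = estimate_requirements(150)
print(f"150 genomes: about {days:.2f} days of BLAST, {disk_tb:.2f} TB")
# prints "150 genomes: about 1.75 days of BLAST, 0.50 TB"
```

Halving the number of genomes cuts both estimates to a quarter, while doubling it to 600 would quadruple them.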
