This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Cleaning up and reclaiming disk space

mattb112885 edited this page Apr 3, 2014 · 4 revisions

ITEP creates a large number of intermediate files, and the database it builds is also quite large. This section explains the purpose of these intermediate files and what can be done to reclaim disk space.

What NOT to delete

The following files are used in various ways by different ITEP scripts and should not be deleted even after the database is loaded:

$ITEP_ROOT/aliases/aliases
$ITEP_ROOT/db/DATABASE.sqlite
$ITEP_ROOT/groups
$ITEP_ROOT/organisms
$ITEP_ROOT/orthomcl.config_sample

The input data in $ITEP_ROOT/raw and $ITEP_ROOT/genbank should also be kept in the same locations. The analysis scripts don't use them, but you will need them if you have to update your database, and you may end up using them for downstream analysis.
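As a quick safeguard after any cleanup, the list above can be checked mechanically. The following is a minimal sketch, not part of ITEP: the function name `check_itep_keep` is hypothetical, and the paths are taken directly from the list and paragraph above.

```shell
# Verify that the files and directories that should be kept are still present.
# check_itep_keep is a hypothetical helper, not an ITEP script.
check_itep_keep() {
    root=$1
    missing=0
    for path in aliases/aliases db/DATABASE.sqlite groups organisms \
                orthomcl.config_sample raw genbank; do
        if [ ! -e "$root/$path" ]; then
            # Report each missing path on stderr and remember the failure.
            echo "MISSING: $root/$path" >&2
            missing=1
        fi
    done
    # Exit status 0 means everything is in place, 1 means something is gone.
    return "$missing"
}
```

Calling `check_itep_keep "$ITEP_ROOT"` after a cleanup prints any missing path and returns a non-zero status if something from the list is gone.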

VACUUMing your database

Over the course of building the database (particularly in step 1), a couple of large tables are created, manipulated, and then dropped. By default, SQLite keeps the freed space for re-use when new tables are created or new records are added, so it does not release that space for other programs to use.

After you are finished with all of the database building scripts, you can reclaim any remaining empty space in the database by running the provided wrapper script for SQLite's VACUUM command:

$ ./cleanupSqliteTables.sh TRUE

You can also issue the VACUUM command manually by using sqlite to open the db/DATABASE.sqlite file, if you prefer.
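For the manual route, a minimal sketch follows. It assumes the `sqlite3` command-line shell is installed and on your PATH; the function names are hypothetical. The first function estimates how much space SQLite is holding on its free list (using the `freelist_count` and `page_size` pragmas), and the second issues the VACUUM itself.

```shell
# Sketch: estimate reclaimable space, then VACUUM by hand.
# Assumes the sqlite3 command-line shell is available; names are hypothetical.

# Print the number of bytes held in unused pages that SQLite keeps for re-use.
sqlite_free_bytes() {
    db=$1
    pages=$(sqlite3 "$db" "PRAGMA freelist_count;")
    size=$(sqlite3 "$db" "PRAGMA page_size;")
    echo $((pages * size))
}

# Rebuild the database file so the unused pages are returned to the filesystem.
sqlite_vacuum() {
    sqlite3 "$1" "VACUUM;"
}
```

For example, `sqlite_free_bytes "$ITEP_ROOT/db/DATABASE.sqlite"` gives a rough idea of whether a VACUUM is worth the time before running `sqlite_vacuum` on the same file.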

  • WARNING: The VACUUM command is not run automatically by the build scripts because it requires building a temporary database file that can be as large as the original database, so vacuuming a large database needs a large amount of free disk space. That free space must be available on the partition containing SQLite's temporary directory. SQLite uses the TMPDIR environment variable to locate temp space, so you can try pointing it at a partition with plenty of free space if you have one.
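The space check described in the warning can be sketched as below. This is an illustration under stated assumptions, not an ITEP script: the function names are hypothetical, `/tmp` is assumed as the fallback when TMPDIR is unset, and the check only compares the database's size on disk against the space available in the temp directory.

```shell
# Sketch: check that the temp partition has room before running VACUUM.
# Function names are hypothetical; all sizes are in kilobytes.

# Succeed if directory $2 has at least $1 KB available.
has_free_kb() {
    need_kb=$1
    dir=$2
    # df -P prints one line per filesystem; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$need_kb" ]
}

# Succeed if the temp directory could hold a copy as large as the database file.
can_vacuum() {
    db=$1
    tmp=${TMPDIR:-/tmp}
    db_kb=$(du -k "$db" | awk '{ print $1 }')
    has_free_kb "$db_kb" "$tmp"
}
```

If `can_vacuum "$ITEP_ROOT/db/DATABASE.sqlite"` fails, setting TMPDIR to a directory on a larger partition (for example `TMPDIR=/some/big/partition`, a placeholder path) and re-running the check is one way to follow the advice above.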

Removing intermediate files

The FASTA files in faa/ and fna/ and the modified table files in modtable/ can be safely removed. They are automatically regenerated from the files in raw/.

Note that these directories tend to be relatively small.
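A cautious way to do this is sketched below. It assumes that faa/, fna/ and modtable/ sit directly under the ITEP root directory; the function name and its `delete` argument are hypothetical. By default it only reports sizes, and it removes files (leaving the directories themselves in place) only when asked explicitly.

```shell
# Sketch: empty the regenerable intermediate directories.
# Assumes faa/, fna/ and modtable/ sit directly under the ITEP root;
# clean_intermediate and its "delete" argument are hypothetical.
clean_intermediate() {
    root=$1
    mode=${2:-dry-run}
    for dir in faa fna modtable; do
        [ -d "$root/$dir" ] || continue
        if [ "$mode" = "delete" ]; then
            # Remove the files but keep the directory itself in place.
            find "$root/$dir" -type f -exec rm -f {} +
        else
            # Dry run: only report how much space (in KB) the directory holds.
            du -sk "$root/$dir"
        fi
    done
}
```

Running `clean_intermediate "$ITEP_ROOT"` first shows what would be reclaimed; `clean_intermediate "$ITEP_ROOT" delete` then removes the files.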

Removing BLASTP, BLASTN and RPSBLAST data

BLASTP and BLASTN data are generated for every pair of organisms and stored in blastres/ and blastn_res/, respectively. These files are kept so that adding more organisms to the database later is faster: pairs of organisms that have already had BLAST run between them are not re-run. If you do not plan on adding any new organisms to your database, you can safely delete these files.

The same applies to RPSBLAST data (stored in rpsblast_res/): if you do not plan on adding any new organisms to your database, you can safely delete these files as well.
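Because deleting these results means re-running the searches if you later change your mind, it may help to see how much space they actually occupy before deciding. The sketch below assumes the three directories sit directly under the ITEP root; the function name is hypothetical.

```shell
# Sketch: report how much space the BLAST result directories hold.
# Assumes blastres/, blastn_res/ and rpsblast_res/ sit directly under the
# ITEP root; blast_usage_kb is a hypothetical helper.
blast_usage_kb() {
    root=$1
    total=0
    for dir in blastres blastn_res rpsblast_res; do
        # Skip directories that do not exist in this installation.
        [ -d "$root/$dir" ] || continue
        kb=$(du -sk "$root/$dir" | awk '{ print $1 }')
        echo "$dir ${kb} KB"
        total=$((total + kb))
    done
    echo "total ${total} KB"
}
```

`blast_usage_kb "$ITEP_ROOT"` prints one line per directory found, followed by a total.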
