This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Obtaining the complete sequences of contigs, genes or proteins

mattb112885 edited this page Nov 8, 2013 · 2 revisions

Proteins and genes

The protein sequences and their coding DNA sequences are included in the output of db_getGeneInformation.py (see the page on obtaining information about genes for details).

Contigs

If you know a contig's ID within ITEP, you can obtain its sequence with the db_getContigSeqs.py script. Note that ITEP modifies the contig IDs taken from the GenBank files by appending the organism ID, so a contig name that appears in several organisms is always given a different ID in each one.

To get a FASTA file for an entire organism, pipe the output of db_getContigs.py into db_getContigSeqs.py in one of these two ways (the quotes around the organism name are necessary because organism names contain spaces):

$ db_getContigs.py -o "organism_name" | db_getContigSeqs.py -f
$ db_getContigs.py -i organism_id | db_getContigSeqs.py -f
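If you switch between the two forms often, they can be wrapped in a small shell function. The sketch below is hypothetical and not part of ITEP: the name organism_fasta is made up, it assumes db_getContigs.py and db_getContigSeqs.py are on your PATH, and it assumes organism IDs look like 290402.1 (digits, a dot, digits) so that anything else is treated as an organism name.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ITEP): write a FASTA file of all contigs
# for one organism, given either its ITEP organism ID or its full name.
# Assumes db_getContigs.py and db_getContigSeqs.py are on PATH and that
# organism IDs look like "290402.1" (digits, a dot, digits).
organism_fasta() {
    local organism="$1" outfile="$2"
    if [[ "$organism" =~ ^[0-9]+\.[0-9]+$ ]]; then
        # Looks like an organism ID: select the organism with -i
        db_getContigs.py -i "$organism" | db_getContigSeqs.py -f > "$outfile"
    else
        # Otherwise treat it as an organism name; quoting keeps the spaces intact
        db_getContigs.py -o "$organism" | db_getContigSeqs.py -f > "$outfile"
    fi
}
```

Usage, with the organism from the example that follows:

$ organism_fasta 290402.1 Cbe_fasta
$ organism_fasta "Clostridium beijerinckii NCIMB 8052" Cbe_fasta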

For example, the following two commands write the same sequences to Cbe_fasta:

$ db_getContigs.py -i 290402.1 | db_getContigSeqs.py -f > Cbe_fasta
$ db_getContigs.py -o "Clostridium beijerinckii NCIMB 8052" | db_getContigSeqs.py -f > Cbe_fasta
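To confirm for yourself that the two forms agree, you can write each one to its own file and compare them. The helper below is a hypothetical sketch (check_same_fasta and the file names are made up); it uses only cmp and grep, and it counts sequences as lines starting with ">". It assumes both commands emit the contigs in the same order, since cmp compares files byte for byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ITEP): report whether two FASTA files are
# byte-for-byte identical and, if so, how many sequences the first contains.
# A sequence is counted as any line starting with ">" (a FASTA header).
check_same_fasta() {
    local first="$1" second="$2"
    if cmp -s "$first" "$second"; then
        echo "identical: $(grep -c '^>' "$first") sequences"
    else
        echo "files differ"
        return 1
    fi
}
```

Usage:

$ db_getContigs.py -i 290402.1 | db_getContigSeqs.py -f > Cbe_by_id.fasta
$ db_getContigs.py -o "Clostridium beijerinckii NCIMB 8052" | db_getContigSeqs.py -f > Cbe_by_name.fasta
$ check_same_fasta Cbe_by_id.fasta Cbe_by_name.fasta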