This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Adding user-defined gene data to ITEP

Matthew Benedict edited this page Nov 8, 2013 · 3 revisions

This tutorial describes the capabilities of setup_step5.sh. These capabilities are still under development, but they are already useful for integrating the results of tBLASTn or other tools: manually edited or newly added genes can be added to the database and included in presence-absence analysis. There are plans to support other analyses as well (e.g. integrating these genes into the tools for pulling out genes to build trees).

How to format user-defined gene data

User-defined gene data should be located in a file called "user_genes" in the "userdata" folder of this repository. The expected format is a tab-delimited file with the following fields in this order:

  • user_geneid (REQUIRED): An ID that you choose for your gene. It must NOT be the same as any ITEP ID, or programs are likely to get very confused.
  • organismid (REQUIRED): The ID for the organism in which the gene is found. It must match an existing organism ID in ITEP (you cannot add genes that are not associated with any organism in ITEP).
  • genetype (REQUIRED): A string that indicates to you where the gene came from (e.g. TBLASTN).

The rest of these fields can be left blank:

  • contigid: The ITEP ID for the contig in which the gene is found.
  • startloc: The location on the contig of the first base of the start codon (counting from 1 at the first base of the contig).
  • stoploc: The location on the contig of the last base of the stop codon (counting from 1 at the first base of the contig).
  • runid: The run ID half of a run ID/cluster ID pair. If you believe that this gene belongs in a particular cluster already computed in ITEP, give the run ID of that cluster here and its cluster ID in clusterid.
  • clusterid: The cluster ID half of the same pair, i.e. the ID of the cluster within the run named in runid.
  • seq: Nucleotide sequence for the gene.
  • annotation: What function you believe the gene has in the cell.
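The field order above can be sketched as a small Python helper that builds one tab-delimited row. This is only an illustration, not part of ITEP: the function name format_user_gene and the example IDs are hypothetical, and it assumes that blank optional fields are kept as empty columns so that every row has all ten positions.

```python
# Field order as listed above; the first three are required.
FIELDS = ["user_geneid", "organismid", "genetype", "contigid", "startloc",
          "stoploc", "runid", "clusterid", "seq", "annotation"]
REQUIRED = FIELDS[:3]


def format_user_gene(**values):
    """Return one tab-delimited row for the user_genes file (hypothetical helper)."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise ValueError("unknown field(s): %s" % ", ".join(sorted(unknown)))
    for name in REQUIRED:
        if not str(values.get(name, "")).strip():
            raise ValueError("missing required field: %s" % name)
    # Optional fields that were not given become empty columns (assumption).
    cells = [str(values.get(name, "")) for name in FIELDS]
    if any("\t" in cell or "\n" in cell for cell in cells):
        raise ValueError("field values must not contain tabs or newlines")
    return "\t".join(cells)


# Example with made-up IDs; organismid must be a real ITEP organism ID in practice.
row = format_user_gene(
    user_geneid="user_gene_0001",
    organismid="example_organism_id",
    genetype="TBLASTN",
    annotation="hypothetical protein",
)
print(row)
```

Rows produced this way could be appended, one per line, to the user_genes file.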

Importing user-defined gene data into ITEP

The user-defined gene data can be imported by running:

./setup_step5.sh

This creates a new table to hold the user-defined data and also re-generates the presence-absence table by matching up the run and cluster IDs with those already in ITEP.
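Before running the import, it may help to sanity-check the file against the format described earlier. The following is a minimal sketch, not an ITEP tool: check_user_genes is a hypothetical name, and the check assumes the file has no header line and that every row carries all ten tab-separated columns even when optional fields are blank. It does not verify that organism, run, or cluster IDs actually exist in ITEP.

```python
EXPECTED_COLUMNS = 10
# Column positions of user_geneid, organismid, genetype.
REQUIRED_COLUMNS = {0: "user_geneid", 1: "organismid", 2: "genetype"}
# Column positions of startloc and stoploc (1-based contig coordinates).
COORDINATE_COLUMNS = {4: "startloc", 5: "stoploc"}


def check_user_genes(lines):
    """Return a list of (line_number, message) problems found in the rows."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")  # keep trailing tabs: they mark blank fields
        if not line:
            continue
        cells = line.split("\t")
        if len(cells) != EXPECTED_COLUMNS:
            problems.append((number, "expected %d tab-separated fields, found %d"
                             % (EXPECTED_COLUMNS, len(cells))))
            continue
        for index, name in REQUIRED_COLUMNS.items():
            if not cells[index].strip():
                problems.append((number, "required field %s is blank" % name))
        for index, name in COORDINATE_COLUMNS.items():
            value = cells[index].strip()
            if value and (not value.isdigit() or int(value) < 1):
                problems.append((number, "%s must be a positive integer" % name))
    return problems


good = "\t".join(["user_gene_1", "org_1", "TBLASTN", "contig_1",
                  "101", "400", "", "", "", ""])
short = "\t".join(["user_gene_2", "org_1", "TBLASTN"])
for number, message in check_user_genes([good + "\n", short + "\n"]):
    print("line %d: %s" % (number, message))
```

To check the real file, the same function can be given an open file handle, for example `with open("userdata/user_genes") as handle: check_user_genes(handle)`.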

The regenerated presence-absence table includes the user-specified gene IDs (user_geneid above) in addition to the ITEP IDs. To extract ONLY the user-specified genes from this table, run:

$ db_getPresenceAbsenceTable.py -u

Similarly, use the -i flag to get ONLY ITEP IDs (and not user-specified ones).
