Function: incorpIprScan

G. Kenney edited this page Jul 28, 2023 · 8 revisions

incorpIprScan

This tool is designed to supplement gbToIMG, though it can also be used to add InterPro annotations (or updated annotations in general) to IMG-style metadata from any source. Metadata derived from NCBI GenBank files often does not list the families on which gene annotations are based, which is a problem for this workflow. This function takes a tab-delimited InterProScan output file and an IMG-formatted metadata file for the same genes, and adds Pfam, TIGRfam, and/or InterPro families to the metadata file. Note that you will need to run InterProScan yourself, either on your local cluster or on a Linux machine or VM; there is no Windows- or macOS-compatible version, so actually running it is not included in this workflow. That said, a basic command that produces the required tab-delimited output file looks like this:

./interproscan.sh -i 20210101_gb2img_genE_neighborSeqs.fa -t p -o 20210101_genE_iprScan.txt -f TSV -iprlookup
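
Before handing the InterProScan output to incorpIprScan, it can be worth confirming that the file parses as expected. The following is a minimal sketch in base R and is not part of the package. The column names are labels chosen here for InterProScan's headerless TSV layout, and the two rows (including the accessions) are made-up placeholders written to a temporary file so that the example runs on its own.

```r
# Labels chosen here for the 15-column InterProScan TSV layout
# (the file itself has no header row).
iprCols <- c("proteinAccession", "md5", "seqLength", "analysis",
             "signatureAccession", "signatureDescription",
             "start", "stop", "score", "status", "date",
             "interproAccession", "interproDescription",
             "goTerms", "pathways")

# Made-up placeholder rows, written to a temporary file for illustration only.
exampleLines <- c(
  paste("geneA", "md5a", "250", "Pfam", "PF00000", "Placeholder domain",
        "10", "120", "1.0E-20", "T", "01-01-2021", "IPR000000",
        "Placeholder entry", sep = "\t"),
  paste("geneA", "md5a", "250", "TIGRFAM", "TIGR00000", "Placeholder family",
        "5", "240", "2.0E-50", "T", "01-01-2021", "IPR999999",
        "Another placeholder entry", sep = "\t")
)
iprFile <- tempfile(fileext = ".txt")
writeLines(exampleLines, iprFile)

# fill = TRUE pads rows that lack the trailing optional columns;
# quote = "" and comment.char = "" keep free-text descriptions intact;
# colClasses = "character" stops the status flag "T" being read as TRUE.
iprScan <- read.delim(iprFile, header = FALSE, col.names = iprCols,
                      fill = TRUE, quote = "", comment.char = "",
                      colClasses = "character")

# Tabulate hits per member database to confirm that the expected
# analyses are present in the file.
analysisCounts <- table(iprScan$analysis)
print(analysisCounts)
```

With a real file, replace iprFile with the path passed to -o in the command above.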

Use of incorpIprScan

incorpIprScanOutput <- incorpIprScan(iprScanSource = "20210101_genE_iprScan.txt", imgNeighborsSource = "20210101_gb2img_genE_neighborData.txt", geneName = "genE", addPfam = TRUE, addTigrfam = TRUE, addIPRfam = TRUE)

Note that here I'm using a gbToIMG output file as my metadata source (and I used the matching sequence output file as my input to InterProScan), but you could also use this function to add updated InterPro annotations to older IMG data. I'm populating the Pfam and TIGRfam fields as well, since GenBank files rarely include this information.
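
Conceptually, adding families to the metadata amounts to collapsing the InterProScan hits for each gene and matching them to the corresponding metadata rows. The sketch below illustrates that idea in base R. It is not the package's actual implementation: the column names (gene_oid, Pfam, proteinAccession, and so on), the space used to separate multiple families, and all data values are assumptions made up for this example.

```r
# Made-up metadata table standing in for IMG-style neighbor data.
# Column names are assumptions for illustration.
imgNeighbors <- data.frame(
  gene_oid = c("geneA", "geneB", "geneC"),
  Pfam = c("", "", ""),
  stringsAsFactors = FALSE
)

# Made-up InterProScan hits: one row per (gene, signature) match.
iprHits <- data.frame(
  proteinAccession = c("geneA", "geneA", "geneB"),
  analysis = c("Pfam", "Pfam", "TIGRFAM"),
  signatureAccession = c("PF00000", "PF99999", "TIGR00000"),
  stringsAsFactors = FALSE
)

# Keep only the Pfam hits, then collapse each gene's accessions into a
# single space-separated string (the separator is an assumption).
pfamHits <- iprHits[iprHits$analysis == "Pfam", ]
pfamByGene <- vapply(
  split(pfamHits$signatureAccession, pfamHits$proteinAccession),
  function(x) paste(unique(x), collapse = " "),
  character(1)
)

# Look up each metadata row by gene ID; genes without Pfam hits
# come back as NA and are left with an empty field.
matched <- unname(pfamByGene[imgNeighbors$gene_oid])
matched[is.na(matched)] <- ""
imgNeighbors$Pfam <- matched
print(imgNeighbors)
```

Matching on a shared gene identifier is why the InterProScan input sequences and the metadata file need to describe the same genes.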

Options

  • iprScanSource Filename. For a tab-delimited text file containing InterProScan output. Required.
  • imgNeighborsSource Filename. For a text file containing IMG-formatted metadata for neighbors of genes of interest. Required.
  • geneName Character string. Name of gene family of interest (purely for file naming). Required.
  • addPfam Boolean. Specifies whether or not Pfam annotations should be added to the metadata sheet.
  • addTigrfam Boolean. Specifies whether or not TIGRfam annotations should be added to the metadata sheet.
  • addIPRfam Boolean. Specifies whether or not InterPro family annotations should be added to the metadata sheet.

Output

  • 20210101_incorpIprScan_genE_neighborData.txt File. Text file containing updated gene metadata.
  • incorpIprScanOutput List. Contains imgNeighborsData, a data frame of IMG-styled metadata for neighbors of genes of interest, now with more annotation info.
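
The updated metadata can be pulled out of the returned list by name. The following is a small sketch of doing so; the list built here is a hypothetical stand-in for a real incorpIprScan result, and its columns and values are made up for illustration.

```r
# Hypothetical stand-in for the list returned by incorpIprScan();
# the columns and values are made up for illustration.
incorpIprScanOutput <- list(
  imgNeighborsData = data.frame(
    gene_oid = c("geneA", "geneB", "geneC"),
    Pfam = c("PF00000", "", "PF99999"),
    stringsAsFactors = FALSE
  )
)

# Pull the updated metadata table out of the list.
neighborData <- incorpIprScanOutput$imgNeighborsData

# Count how many genes carry at least one Pfam annotation
# (treating both NA and empty strings as "no annotation").
annotated <- !is.na(neighborData$Pfam) & nzchar(neighborData$Pfam)
nAnnotated <- sum(annotated)
cat(nAnnotated, "of", nrow(neighborData), "genes have a Pfam annotation\n")
```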