Skip to content
This repository has been archived by the owner on Feb 16, 2019. It is now read-only.

Visualizing homology patterns

mattb112885 edited this page Sep 10, 2016 · 11 revisions

This tutorial assumes that you have already run setup_step1.sh and setup_step2.sh with the default ("all") cluster group for C. beijerinckii, C. novii and A. woodii.

Some cluster exploration

Lets start with our 6-phosphofructokinase again and see if we build a cluster, what the evidence looks like. You can search for what other genes fall into clusters with the 6-phosphofructokinase(s) in C. beijerinckii by running the following command (only the clusters in the run with all organisms are shown)

$ db_getGenesWithAnnotation.py "6-phosphofructokinase" | grep "Cbei" | db_getClustersContainingGenes.py
all_I_2.0_c_0.4_m_maxbit	649	fig|290402.1.peg.4768
all_I_2.0_c_0.4_m_maxbit	6149	fig|290402.1.peg.581
all_I_2.0_c_0.4_m_maxbit	6150	fig|290402.1.peg.992

As you can see the three C. beijerinckii genes all fall into different clusters with these cluster settings. If you remember that the lower cluster IDs correspond to larger clusters, it is likely that the fig|290402.1.peg.4768 copy is the most strongly conserved. Indeed, if we look at the contents of these clusters it indicates that the other two are predicted to be singletons in C. beijerinckii at this level of homology:

$ db_getPresenceAbsenceTable.py -r "all_I_2.0_c_0.4_m_maxbit" | grep -w -P "(649|6149|6150)"
runid   clusterid       annote  Clostridium_beijerinckii_NCIMB_8052     Clostridium_novyi_NT    Acetobacterium_woodii_DSM_1030
all_I_2.0_c_0.4_m_maxbit	649	6-phosphofructokinase_YP_877380.1_NT01CX_1297	fig|290402.1.peg.4768	fig|386415.1.peg.406	fig|931626.1.peg.1249
all_I_2.0_c_0.4_m_maxbit	6149	6-phosphofructokinase_YP_001307727.1_Cbei_0584	fig|290402.1.peg.581	NONE	NONE
all_I_2.0_c_0.4_m_maxbit	6150	6-phosphofructokinase_YP_001308138.1_Cbei_0998	fig|290402.1.peg.992	NONE	NONE

Producing a GML file for a cluster

Note: This section requires you to install networkx, a python package for importing and exporting standard Graph formats, among other things.

We provide a function db_makeClusterGraph.py that takes one or more cluster\runID pairs from standard input and outputs GML files. GML is a standard Graph format that is widely supported in graph viewers such as Cytoscape (see https://en.wikipedia.org/wiki/Graph_Modelling_Language for some details). Our implementation creates a graph with the genes in a cluster as nodes, significant BLAST hits as edges, the BLAST scores (with a variety of supported metrics including minbit, maxbit, and log E-value) as edge weights, and annotation and other gene information as attributes for the nodes. In addition, the edges will be colored according to their weight automatically. This allows ready identification of issues such as fusion genes bringing two clusters together, or clusters breaking into pieces with one specific cutoff.

For example to make a GML file with maxbit edge weights for the cluster of interest above (with a cutoff of 0.4), we would type:

$ makeTabDelimitedRow.py "all_I_2.0_c_0.4_m_maxbit" "649" | db_makeClusterGraph.py -m maxbit -u 0.4

The function creates a GML file "all_I_2.0_c_0.4_m_maxbit_649_maxbit_0.40_blastp.gml" which can be read in graph viewers.

The function can also optionally use BLASTN as the scoring basis between the genes in a cluster.

Producing a GML file directly from BLAST results

Note that to run this section correctly, you will need networkx and matplotlib to be installed on your computer.

If you want to make a GML graph from BLASTP or BLASTN results (e.g. all genes homologous to a specific protein), instead of based on a pre-computed cluster, a script is also available to do that:

$  db_makeGraphFromBlastResults.py -m method -u cutoff [options] < BLAST_results

Valid scoring methods are provided in the help text. If you want to use one of the normalized bit score metrics you need to provide a table with the self-bit scores for query and target as the last two columns. Such a table is always provided when getting the BLAST results using one of the ITEP interfaces:

$ db_getBlastResultsBetweenSpecificGenes.py
$ db_getBlastResultsBetweenSpecificOrganisms.py
$ db_getBlastResultsContainingGenes.py

For example, this pipeline will create a GML file with a link between Cbei_1843 and each gene to which it has a maxbit score higher than 0.2.

$ db_getGenesWithAnnotation.py "Cbei_1843" | db_getBlastResultsContainingGenes.py -g 1 | db_makeGraphBlastResults.py -m maxbit -c 0.2

See this tutorial for more details on the use of the BLAST results search interfaces.

Importing GML into Cytoscape

Cytoscape (along with many other similar visualization tools) supports the import and visualization of graphs in GML format. A vizmapper is provided (in lib/ITEP_vizmapper.props) that can be loaded into Cytoscape to make the visualization prettier. Different parts of it work in 2.8 and 3.0.1.

Here's some directions on how to import a GML file into Cytoscape 2.8:

  1. Go to file -> import -> Vizmap property file...
  2. Locate the ITEP vizmap file "ITEP_vizmapper.props" and load it.
  3. Click "close" on the dialog box that appears.
  4. Go to file -> import -> network (multiple file types)
  5. Select "local", and then find your GML file (note - it must have a .gml extension for Cytoscape to recognize it as a GML file). Click "open" them "import".
  6. In the "Vizmapper" tab click on the vizmapper called "complete_I_2.0_...". This should turn the nodes yellow and make organism names appear in them.
  7. Lay out the nodes by going to layout -> cytoscape layouts -> force-directed layout -> weight

Here's how to do it in 3.0.1:

  1. When you open Cytoscape it asks you for how to begin - select "from network file..." (on the left side)
  2. Find your GML file and select it.
  3. Go to file -> import -> Vizmap file...
  4. Locate the "ITEP_vizmapper.props" file and load it.
  5. In the "Vizmapper" tab click on the vizmapper called "complete_I_2.0_...". This should turn the nodes yellow (it should also make organism names appear but it doesn't work at the moment).
  6. Lay out the nodes by going to layout -> perfuse force-directed layout -> weight
Clone this wiki locally