Exercise 2: Protein-protein interaction networks

Martina Summer-Kutmon edited this page Jul 9, 2020 · 14 revisions

In this exercise, we will create a protein-protein interaction (PPI) network from the STRING database. As input, we will use a lung cancer dataset from TCGA and we will focus on very strongly up- and down-regulated genes (abs. log2FC > 3). We will investigate the network topology and visualize the degree and the log2FC on the nodes in the network for interpretation.

Setup

Step 1: Load TCGA data

  • Create Network
    • Download lung-cancer-data.tsv
    • Import the file to create a network using File → Import → Network from File... This will bring up the Import Network From Table dialog.
      • Click on Select None to disable all columns.
      • Click only on the GeneName column header and set this column as the Source Node column (green circle).
      • Click OK. You’ll see a warning about no edges, but that’s OK. This will create a grid of 1023 unconnected nodes, where each node represents a gene.
  • Add data to our nodes
    • Open the file again, but now use File → Import → Table from File... This will bring up the Import Columns From Table dialog.
      • Select to import table data to a network collection.
      • Make sure you select GeneName as the key attribute in the table (click on the column header and select the key symbol).
      • This will import all of the data in the spreadsheet and associate each row with the corresponding node.
      • You should be able to see this in the Table Panel.
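
The two imports above can be pictured as "create one node per gene name" followed by "join each table row to its node by key". Below is a minimal Python sketch of that idea, not of Cytoscape's actual implementation. The sample rows are invented stand-ins for lung-cancer-data.tsv; only the column names GeneName and log2FC come from this exercise, and the helper names are hypothetical.

```python
import csv
import io

# Invented stand-in for lung-cancer-data.tsv (the values are made up).
SAMPLE_TSV = "GeneName\tlog2FC\nGENE_A\t4.2\nGENE_B\t-3.6\nGENE_C\t0.4\n"


def read_rows(text):
    """Parse tab-separated text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def create_nodes(rows, source_column="GeneName"):
    """Network import: one unconnected node per value in the source column."""
    return {row[source_column]: {} for row in rows}


def attach_table(nodes, rows, key_column="GeneName"):
    """Table import: copy each row onto the node whose name equals the key."""
    matched = 0
    for row in rows:
        node = nodes.get(row[key_column])
        if node is None:
            continue  # rows without a matching node are skipped
        node.update({k: v for k, v in row.items() if k != key_column})
        matched += 1
    return matched


rows = read_rows(SAMPLE_TSV)
nodes = create_nodes(rows)
matched = attach_table(nodes, rows)
print(len(nodes), "nodes,", matched, "rows matched")
```

If the key column does not match the node names, no rows are attached, which is why choosing GeneName as the key matters in the dialog.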

Step 2: Find significant expression changes

We’ll use the Filter tab in the Control Panel to find the strongly changed genes.

  • Open the Filter tab and click on the + button to add a new condition. In this case we’re going to add a Column Filter. Select the "Node: log2FC" column and set the values to be between 3 and 12.257 (up-regulated genes).
  • Repeat the same process as above, but set the values to be between -7.461 and -3 (down-regulated genes).
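  • Make sure you use the Match any (OR) option, so that a node is selected if it passes either of the two filters. No gene can be in both ranges at once, so Match all (AND) would select nothing.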
  • At the bottom, you should see that 519 nodes are selected.
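
The logic of the two range filters can be written out in a few lines of Python. This is only a sketch of the selection rule: the gene names and log2FC values are invented, the range bounds are the ones used in the steps above, and both ends of each range are assumed to be inclusive.

```python
def in_range(value, low, high):
    """True if value lies in the closed interval [low, high]."""
    return low <= value <= high


def strongly_changed(log2fc_by_gene):
    """Return genes that pass the up filter OR the down filter."""
    selected = []
    for gene, log2fc in log2fc_by_gene.items():
        up = in_range(log2fc, 3.0, 12.257)      # up-regulated range
        down = in_range(log2fc, -7.461, -3.0)   # down-regulated range
        if up or down:                          # "Match any (OR)"
            selected.append(gene)
    return selected


def match_all(log2fc_by_gene):
    """Same filters combined with AND, for comparison."""
    return [
        gene
        for gene, log2fc in log2fc_by_gene.items()
        if in_range(log2fc, 3.0, 12.257) and in_range(log2fc, -7.461, -3.0)
    ]


# Invented example values, not taken from the dataset.
example = {"GENE_A": 4.2, "GENE_B": -3.6, "GENE_C": 0.4, "GENE_D": 3.0}
selected = strongly_changed(example)
print(selected)
print(match_all(example))
```

Because the two ranges do not overlap, the AND version always returns an empty list, while the OR version returns the union of up- and down-regulated genes.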

Step 3: Create PPI network from STRING

  • Make sure only the 519 changed nodes are selected. Then the Node Table only shows the data for those selected nodes.
  • Select gene names from Table Panel. Select everything in the name column by clicking into the first cell and then dragging down until you get to the bottom. Then, do a copy (Control-C or Apple-C).
  • Paste gene names into STRING network search. At the top of the Network tab of the Control Panel there should be a text field with an icon on its left. Click on that icon and select STRING protein query. (If you don’t see any STRING options, the stringApp hasn't been loaded.) Then click into the text field and paste the list of genes.
  • Set STRING search parameters. Next to the text field is a menu with a list of options. Change the Confidence (score) cutoff to 0.8 and the Maximum additional interactors to 0. This will get only high quality results (80% confidence) and add no extra proteins to the network.
  • Create the network. Click on the search icon (magnifying glass) to load the network. The network should appear similar to the figure below.
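
For scripted workflows, the same query can in principle be issued through the Cytoscape command line instead of the search bar. The sketch below only builds the command text; it does not contact Cytoscape. The command and argument names (`string protein query`, `query`, `species`, `cutoff`, `limit`) follow the stringApp automation commands as best recalled here and should be treated as an assumption: check them with `help string protein query` in the Cytoscape Command Line before relying on them. The species value is also an assumption (the exercise does not state it, but TCGA data is human), and the two gene symbols are arbitrary examples rather than genes known to be in the selection.

```python
def string_protein_query_command(genes, cutoff=0.8, limit=0,
                                 species="Homo sapiens"):
    """Build the text of a stringApp protein query command.

    cutoff -> Confidence (score) cutoff
    limit  -> Maximum additional interactors
    """
    if not genes:
        raise ValueError("at least one gene name is required")
    query = ",".join(genes)
    return (
        f'string protein query query="{query}" '
        f'species="{species}" cutoff={cutoff} limit={limit}'
    )


# Arbitrary example gene symbols, not taken from the dataset.
command = string_protein_query_command(["TP53", "EGFR"])
print(command)
```

The defaults mirror the parameters chosen in this step: a confidence cutoff of 0.8 and no additional interactors.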

Step 4: Topological analysis

  • Go to Tools → Analyze network and analyze the network as an undirected graph (do not select the checkbox in the dialog that pops up).
  • The Analyzer panel will show up and new columns have been added to the Node Table containing the different node properties like Degree and Betweenness.
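  • Click on "Show Node Degree Distribution" to see if the network can be considered a scale-free network (many nodes with low degrees and few hub nodes). The number of nodes drops off steeply as the degree increases, leaving a long tail of a few high-degree hubs. In a scale-free network this decay follows a power law (roughly a straight line on log-log axes) rather than an exponential, and the distribution here is consistent with a scale-free topology, as seen in many biological networks.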
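
One rough way to inspect a degree distribution outside Cytoscape is to tabulate it and fit a line on log-log axes, where a power law shows up as an approximately straight line with negative slope. The sketch below assumes a simple undirected edge list without self-loops or duplicate edges; the toy edges are invented, and a least-squares fit on so few points is only an illustration, not a rigorous test for scale-freeness.

```python
from collections import Counter
import math


def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_distribution(degree):
    """Map each degree value k to the number of nodes with that degree."""
    return dict(sorted(Counter(degree.values()).items()))


def loglog_slope(distribution):
    """Least-squares slope of log(node count) against log(degree)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(n) for n in distribution.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Invented toy network: one hub (H) plus a few low-degree nodes.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]
degrees = node_degrees(toy_edges)
distribution = degree_distribution(degrees)
slope = loglog_slope(distribution)
print(distribution, slope)
```

With real data, the Degree column from the Node Table could be exported and passed to `degree_distribution` in place of the toy values.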

Step 5: Style the network to show differential expression

In this step, we’ll change the style of the network to also show the degree and log2FC of the genes.

  • Re-import expression data. First, we need to re-import the expression data for the new network created by the stringApp. Similar to the initial import, start by doing File → Import → Table from File. Again, select the lung-cancer-data.tsv file. However, now we need to use a different network column to match our names. Change the Key Column for Network: from shared name to query term. STRING uses Ensembl protein identifiers, but retains the original query term so we can match our data against it. Now select OK.
  • Disable structure image. STRING provides some nice images of the 3D structure of the proteins, but we need to disable those to be able to see our expression values clearly. Disable the images by going to Apps → STRING → Don’t show structure images. Also disable the STRING glass ball effect.
  • Create color gradient for expression data. To show the expression data, go to the Style tab of the Control Panel and click on the middle square (Map.) of the three Fill Color controls. Set the Column to "log2FC" and set the Mapping Type to "Continuous Mapping". Double-click on the gradient to show the Continuous Mapping Editor. The default gradient is a Blue/Red color gradient, with blue representing underexpressed genes and red representing overexpressed genes. We only want to change the min and max values (Set Min and Max...): set them to -5 and 5.
  • Lock node width and height. By default, the stringApp provides separate values for Node Width and Node Height. In our case, we just want to lock them to be the same so we only have to modify the Node Size. The Lock node width and height is a checkbox at the bottom of the Node tab in the Style tab of the Control Panel. Make sure that it’s checked.
  • Set the default node size. Click on the leftmost (Def.) box next to Size. Set the default size to "30.0".
  • Map degree to node size. Click on the middle (Map.) box next to Size. Choose "Degree" for the Column and "Continuous Mapping" for the Mapping Type. Then double-click on the ramp that appears to bring up the Continuous Mapping Editor. Click on the leftmost triangle and set the Node Size to "30". Then click on the rightmost triangle and set the Node Size to "80". Then click OK.
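
Both continuous mappings above amount to linear interpolation between handle positions. The following Python sketch illustrates the idea under some assumptions: values outside the handle range are clamped to the end values, the color gradient passes through white at 0, and the exact RGB values are placeholders rather than Cytoscape's actual default palette. The function names are hypothetical.

```python
def interpolate(value, low_in, high_in, low_out, high_out):
    """Linearly map value from [low_in, high_in] to [low_out, high_out],
    clamping values that fall outside the input range."""
    value = max(low_in, min(high_in, value))
    fraction = (value - low_in) / (high_in - low_in)
    return low_out + fraction * (high_out - low_out)


def node_size(degree, min_degree, max_degree):
    """Map degree to a node size between 30 (lowest) and 80 (highest)."""
    if max_degree == min_degree:
        return 30.0  # all nodes share one degree: fall back to the default
    return interpolate(degree, min_degree, max_degree, 30.0, 80.0)


def node_color(log2fc):
    """Map log2FC in [-5, 5] to an (R, G, B) tuple: blue, white, red."""
    blue, white, red = (0, 0, 255), (255, 255, 255), (255, 0, 0)
    if log2fc <= 0:
        start, end, low, high = blue, white, -5.0, 0.0
    else:
        start, end, low, high = white, red, 0.0, 5.0
    return tuple(
        round(interpolate(log2fc, low, high, s, e))
        for s, e in zip(start, end)
    )


print(node_size(6, 1, 11))   # degree halfway between min and max
print(node_color(-5.0))      # strongly down-regulated
print(node_color(7.0))       # beyond the max of 5, clamped to red
```

Setting the min and max to -5 and 5 means that genes with more extreme log2FC values all receive the end color, which keeps the moderate changes distinguishable.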

Step 6: Examine the network

  • Where are the up-regulated and down-regulated genes in the network? Do they cluster together?
  • Do you see any hub nodes?