# 03 Workflow for Constructing a Multimodal Cancer Network

In this and the following notebooks (04, 05, 06, 07, and 08), we provide a detailed look into how to construct multimodal networks. All data required for this tutorial can be found in the `datasets/cancer_example` directory of [this repository](https://github.com/snap-stanford/mambo). The data should have been downloaded automatically when the GitHub repository was cloned.

In this workflow, we construct a **multimodal cancer network** centered around genes that are frequently mutated in cancer patients. 

The multimodal cancer network has 5 modes:

<center>**Chemical, Disease, Function, Gene,** and **Protein.**</center>

The network has 11 link types: 

<center>**Chemical-Chemical, Chemical-Protein, Disease-Chemical, Disease-Disease, Disease-Function, Disease-Gene, Function-Function, Gene-Gene, Gene-Protein, Protein-Function,** and **Protein-Protein.**</center> 
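As a quick sanity check on the lists above, the modes and link types can be written down as plain Python data. This is only an illustrative sketch: the names `MODES`, `LINK_TYPES`, and `validate_schema` are hypothetical and are not part of Mambo.

```python
# Modes and link types exactly as listed above, written as plain Python data.
MODES = ["Chemical", "Disease", "Function", "Gene", "Protein"]

LINK_TYPES = [
    ("Chemical", "Chemical"),
    ("Chemical", "Protein"),
    ("Disease", "Chemical"),
    ("Disease", "Disease"),
    ("Disease", "Function"),
    ("Disease", "Gene"),
    ("Function", "Function"),
    ("Gene", "Gene"),
    ("Gene", "Protein"),
    ("Protein", "Function"),
    ("Protein", "Protein"),
]


def validate_schema(modes, link_types):
    """Check that every link type joins two known modes.

    Returns a dict mapping each mode to the number of link types it takes part in.
    Hypothetical helper for illustration only.
    """
    mode_set = set(modes)
    unknown = [lt for lt in link_types if lt[0] not in mode_set or lt[1] not in mode_set]
    if unknown:
        raise ValueError(f"Link types reference unknown modes: {unknown}")

    counts = {mode: 0 for mode in modes}
    for src, dst in link_types:
        # Using a set means a same-mode link (e.g. Gene-Gene) is counted once.
        for mode in {src, dst}:
            counts[mode] += 1
    return counts


link_counts = validate_schema(MODES, LINK_TYPES)
print(len(MODES), "modes;", len(LINK_TYPES), "link types")
print(link_counts)
```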

This network was originally constructed by taking a much larger network and selecting a sub-network based around protein-coding genes with mutations in the largest number of patients, according to data provided by the International Cancer Genome Consortium (ICGC). We begin with the top 500 genes out of the total 20,326 protein-coding genes provided by the ICGC data portal, and then include all nodes in other modes (excluding genes) that are within one hop in the large network. As a note, Mambo can handle networks that are orders of magnitude larger than the multimodal cancer network analyzed here, a feat explained in [09 Giga-Scale Multimodal Biological Network Case Study](09 Giga-Scale Multimodal Biological Network Case Study.ipynb).
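The selection procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration on toy data, not the code used to build the actual network: the function `select_subnetwork`, the node IDs, and the patient counts are all hypothetical, and keeping every edge whose two endpoints both survive is an assumption about how the sub-network was induced.

```python
from collections import defaultdict


def select_subnetwork(patient_counts, edges, node_mode, top_k):
    """Pick the top_k most frequently mutated genes plus their one-hop non-gene neighbors.

    patient_counts: dict gene_id -> number of patients with a mutation in that gene
    edges:          list of (node_id, node_id) pairs from the large network
    node_mode:      dict node_id -> mode name (e.g. "Gene", "Protein")
    """
    # Rank genes by patient count (descending); break ties by ID for determinism.
    ranked = sorted(patient_counts, key=lambda g: (-patient_counts[g], g))
    seeds = set(ranked[:top_k])

    # Treat the large network as undirected when looking for neighbors.
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Keep the seed genes and every non-gene node one hop away from a seed.
    kept = set(seeds)
    for gene in seeds:
        for nbr in neighbors[gene]:
            if node_mode[nbr] != "Gene":
                kept.add(nbr)

    # Assumption: keep all edges whose endpoints are both in the kept set.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return seeds, kept, kept_edges


# Toy data (hypothetical IDs and counts).
patient_counts = {"G1": 90, "G2": 75, "G3": 10}
node_mode = {
    "G1": "Gene", "G2": "Gene", "G3": "Gene",
    "P1": "Protein", "P2": "Protein", "D1": "Disease", "C1": "Chemical",
}
edges = [("G1", "P1"), ("G2", "D1"), ("G3", "P2"),
         ("G1", "G3"), ("P1", "C1"), ("G1", "G2")]

seeds, kept_nodes, kept_edges = select_subnetwork(patient_counts, edges, node_mode, top_k=2)
print(sorted(seeds))
print(sorted(kept_nodes))
print(kept_edges)
```

In the real workflow `top_k` would be 500, and the gene `G3` in the toy data is dropped even though it neighbors a seed, because gene neighbors are excluded from the one-hop expansion.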

## Step 1: Parse Data

Some minimal amount of data preprocessing may be necessary depending on the dataset. We need edges in TSV (tab-separated values) format. If edges are provided in a different format, such as JSON or XML, the relationships need to be parsed and converted accordingly. In this example, all of the data has already been processed.
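For data that does not arrive as TSV, a conversion step might look like the following sketch. It uses only the Python standard library; the JSON layout, the keys `source` and `target`, and the function `json_edges_to_tsv` are assumptions made for illustration, since real sources will each have their own structure.

```python
import csv
import io
import json

# Hypothetical JSON input: a list of records, one per edge.
raw_json = """
[
  {"source": "C001", "target": "P001"},
  {"source": "C001", "target": "P002"},
  {"source": "C002", "target": "P001"}
]
"""


def json_edges_to_tsv(json_text, src_key, dst_key, out_file):
    """Write one tab-separated line per edge and return the number of edges written."""
    records = json.loads(json_text)
    writer = csv.writer(out_file, delimiter="\t", lineterminator="\n")
    n_written = 0
    for record in records:
        writer.writerow([record[src_key], record[dst_key]])
        n_written += 1
    return n_written


# An in-memory buffer stands in for a file opened with open(path, "w", newline="").
buffer = io.StringIO()
n_written = json_edges_to_tsv(raw_json, "source", "target", buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```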

In a later example on the giga-scale multimodal network, we use raw data from several knowledge databases. In that example, we describe in greater detail how to preprocess data, including how to filter for rows that have non-zero values for our relationship types of interest. See [10 Supplementary - Data Filtering in the Giga-Scale Multimodal Biological Network](10 Supplementary - Data Filtering in the Giga-Scale Multimodal Biological Network.ipynb) for an example of how this is done.
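To give a rough idea of what such filtering involves, here is a small sketch that keeps only rows with a non-zero value in the columns of interest. The column names (`binding`, `catalysis`), the toy rows, and the helper `filter_nonzero` are hypothetical, and keeping a row when any (rather than all) of the selected columns is non-zero is an assumption; the supplementary notebook shows the actual procedure.

```python
import csv
import io

# Hypothetical raw table: two node columns followed by per-relationship scores.
raw_tsv = (
    "node1\tnode2\tbinding\tcatalysis\n"
    "P1\tP2\t0\t150\n"
    "P1\tP3\t0\t0\n"
    "P2\tP3\t900\t0\n"
)


def filter_nonzero(tsv_file, columns_of_interest):
    """Return rows (as dicts) with a non-zero value in at least one selected column."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    kept = []
    for row in reader:
        if any(float(row[col]) != 0 for col in columns_of_interest):
            kept.append(row)
    return kept


binding_rows = filter_nonzero(io.StringIO(raw_tsv), ["binding"])
either_rows = filter_nonzero(io.StringIO(raw_tsv), ["binding", "catalysis"])

for row in binding_rows:
    print(row["node1"], row["node2"], row["binding"])
```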

## Step 2: Create Mode Tables

See [04 Creating Mode Tables](04 Creating Mode Tables.ipynb) for an example on how to create mode tables.

## Step 3: Create Link Tables

See [05 Creating Link Tables](05 Creating Link Tables.ipynb) for an example on how to create link tables. 

## Step 4: Construct the Multimodal Network from Mode and Link Tables

See [06 Constructing a Multimodal Network from Mode and Link Tables](06 Constructing a Multimodal Network from Mode and Link Tables.ipynb) for an example on how to create a multimodal network based on previously constructed mode and link tables.

## Step 5: Load a Multimodal Network and Perform Analytics

See [07 Loading a Multimodal Network](07 Loading a Multimodal Network.ipynb) for an example on how to load tables from the disk. See [08 Performing Analytics on the Multimodal Network](08 Performing Analytics on the Multimodal Network.ipynb) for an example on performing analytics.