# Vignette 2: MOON

In this vignette, we are going to use MOON (Dugourd et al. in preparation) to iteratively compute enrichment scores for a prior knowledge network, taking metabolic measurements and signalling cascades as inputs. 

In [1]:
import networkcommons as nc
import decoupler as dc
import pandas as pd
import networkx as nx

## 1. Input preparation

We first import the network and check that it does not contain unsigned interactions or self loops.

In [2]:
!pwd

/home/victo/networkcommons/docs/src/vignettes


In [3]:
meta_network_df = pd.read_csv('../../../data/moon/meta_network.sif', sep='\t')

In [4]:
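# Drop exact duplicates of the same signed interaction, keeping one copy
meta_network_df = meta_network_df.drop_duplicates(subset=['source', 'target', 'sign'], keep='first')
# Drop every source-target pair that still appears more than once (conflicting signs)
meta_network_df = meta_network_df.drop_duplicates(subset=['source', 'target'], keep=False)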
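
To make the effect of the two `drop_duplicates` calls concrete, here is a minimal sketch of the same rule in plain Python on a toy edge list. The helper `drop_ambiguous_edges` and the toy rows are hypothetical and only serve as an illustration; they are not part of networkcommons.

```python
from collections import Counter


def drop_ambiguous_edges(rows):
    """Keep only source-target pairs that carry a single, unambiguous sign.

    rows: list of (source, target, sign) tuples (hypothetical toy format).
    """
    # Step 1: drop exact duplicates, keeping the first occurrence.
    unique = list(dict.fromkeys(rows))
    # Step 2: drop every pair that still appears more than once, which
    # can only happen when it is listed with conflicting signs.
    pair_counts = Counter((source, target) for source, target, _ in unique)
    return [row for row in unique if pair_counts[(row[0], row[1])] == 1]


toy_rows = [
    ("A", "B", 1),
    ("A", "B", 1),   # exact duplicate: one copy survives
    ("A", "C", 1),
    ("A", "C", -1),  # conflicting signs: both rows are dropped
    ("B", "D", -1),
]
cleaned_rows = drop_ambiguous_edges(toy_rows)
print(cleaned_rows)
```

Only the interactions with a single, unambiguous sign remain after both steps.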

We then create the graph representation from our DataFrame.

In [5]:
meta_network = nc.utils.network_from_df(meta_network_df, directed=True)

In [6]:
meta_network = nc.methods.meta_network_cleanup(meta_network) # equals R

We then download a dataset and read inputs:

In [7]:
moon_data = nc.data.omics.moon()

In [8]:
moon_data

{'sig':         TF     value
 0       AR  1.156582
 1    BACH1  2.399881
 2    CEBPA  3.687354
 3    CREB1  0.829149
 4     CTCF  2.914983
 5     E2F1  4.989779
 6     E2F4  3.972646
 7     EGR1  6.337803
 8     ELK1  0.444149
 9    EPAS1  4.268129
 10    ESR1  7.069928
 11    ETS1  5.957844
 12     FOS  5.009215
 13   FOXA1  2.338539
 14   FOXM1  1.206632
 15   FOXO3 -0.772054
 16   FOXP1  0.876896
 17   GATA2  1.052240
 18   GATA3  4.433932
 19   HIF1A  2.503899
 20   HNF4A  5.230794
 21     JUN  4.310749
 22    MITF  4.685015
 23     MYC  0.761681
 24   NFKB1  2.386302
 25  PRDM14  2.602170
 26    RARA  2.259669
 27    RELA  3.635926
 28   RUNX1  1.654963
 29    SOX2  0.903587
 30     SP1  2.073969
 31     SP3  0.190111
 32    SPI1  5.666462
 33  SREBF1  1.577459
 34   STAT1  2.219767
 35   STAT2  0.092127
 36   STAT3  1.241225
 37    TAL1  2.968578
 38  TFAP2A  0.182564
 39  TFAP2C  7.987909
 40    TP53  1.014723
 41    USF1  2.194528
 42     VDR  1.545408
 43     YY1  1.521236
 44

In [9]:
sig_input = moon_data['sig'].set_index('TF')['value'].to_dict()
rna_input = moon_data['rna'].set_index('gene')['value'].to_dict()
metab_input = moon_data['metab'].set_index('metab')['value'].to_dict()

For the metabolites, we add the compartments they are located in (here c, cytosol, and m, mitochondria).

In [10]:
metab_input = nc.methods.prepare_metab_inputs(metab_input, ["c", "m"])
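
As a rough illustration of what adding compartments means, the sketch below duplicates each metabolite measurement once per compartment code. The helper `add_compartments` is hypothetical, and the `<id>_<compartment>` naming scheme is an assumption made for this example; the identifiers produced by `prepare_metab_inputs` may follow a different convention (for instance, an additional prefix).

```python
def add_compartments(metab_values, compartments):
    """Duplicate each metabolite once per compartment code.

    The '<id>_<compartment>' naming is an assumption for illustration only.
    """
    return {
        f"{metab}_{comp}": value
        for metab, value in metab_values.items()
        for comp in compartments
    }


toy_metab = {"HMDB0000122": 1.5}
toy_metab_comp = add_compartments(toy_metab, ["c", "m"])
print(toy_metab_comp)
```

Each measurement is copied unchanged to every compartment, so that it can be matched to the compartment-specific metabolite nodes of the network.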

In [11]:
meta_network = nc.methods.filter_pkn_expressed_genes(rna_input.keys(), meta_network)

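We filter out the inputs that cannot be mapped to the prior knowledge network, and we restrict the network to the nodes that can be reached from the signalling inputs (controllable) and that can reach the metabolic inputs (observable).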

In [12]:
sig_input = nc.methods.filter_input_nodes_not_in_pkn(sig_input, meta_network)
meta_network = nc.methods.keep_controllable_neighbours(sig_input, meta_network)
metab_input = nc.methods.filter_input_nodes_not_in_pkn(metab_input, meta_network)
meta_network = nc.methods.keep_observable_neighbours(metab_input, meta_network)
sig_input = nc.methods.filter_input_nodes_not_in_pkn(sig_input, meta_network)
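
The idea behind this cell can be sketched in plain Python on a toy network. This is a simplified illustration under our own assumptions (unsigned edges, no limit on the number of steps, hypothetical helper names), not the networkcommons implementation.

```python
from collections import deque


def reachable(adjacency, starts):
    """Return every node reachable from `starts` by breadth-first search."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def adjacency_of(edges, reverse=False):
    """Build an adjacency dict; reverse=True follows edges backwards."""
    adjacency = {}
    for source, target in edges:
        if reverse:
            source, target = target, source
        adjacency.setdefault(source, []).append(target)
    return adjacency


def restrict(edges, keep):
    """Keep only the edges whose two endpoints are in `keep`."""
    return [(s, t) for s, t in edges if s in keep and t in keep]


toy_edges = [("A", "B"), ("B", "M"), ("X", "M"), ("A", "Y")]
toy_sig = {"A": 1.0, "Q": 2.0}   # Q is not in the network
toy_metab = {"M": -1.5}

# 1. Drop inputs that are absent from the network.
nodes = {n for edge in toy_edges for n in edge}
toy_sig = {k: v for k, v in toy_sig.items() if k in nodes}

# 2. Keep nodes downstream of the signalling inputs (controllable).
controllable = reachable(adjacency_of(toy_edges), toy_sig)
toy_edges = restrict(toy_edges, controllable)

# 3. Keep nodes upstream of the metabolic inputs (observable).
observable = reachable(adjacency_of(toy_edges, reverse=True), toy_metab)
toy_edges = restrict(toy_edges, observable)

print(toy_sig, toy_edges)
```

In this toy case, X is removed because it cannot be reached from A, and Y is removed because it cannot reach M.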

## 2. Network compression

This is one of the most important parts of this vignette. Here, we aim to remove redundant information from the network, in order to reduce its size without compromising the information contained in it. A common example would be the following:

<img src="./img/network_compr.png" height="250" />

Here, the nodes B and C have been compressed into a single node, Parent of D. There is no loss of information because B and C regulate D in the same way (same edge sign), and A also regulates B and C the same way (same edge sign). However, in other cases, we would lose information:

<img src="./img/network_compression_nocases.png" height="250" />

In case 1, nodes B and C cannot be compressed because they exert opposite regulation on D. Compressing them would produce a duplicated edge with opposite signs, which would create issues when computing the MOON scores. Similarly, in case 2, even though B and C have the same edge sign towards D, A exerts opposite regulation on B and C. Compressing B and C would produce a duplicated edge between A and Parent of D, which poses the same problem as in case 1.

In [13]:
meta_network_compressed, signatures, dup_parents = nc.methods.compress_same_children(meta_network, sig_input, metab_input) # equals R
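
The compression rule described in this section can be sketched as follows. This is a simplified illustration, not the code behind `compress_same_children`: the function name, the `merged_` naming and the toy networks are hypothetical. Following the text, two nodes are merged only if they share exactly the same signed parents and the same signed children.

```python
from collections import defaultdict


def compress_by_signature(edges):
    """Merge nodes that share identical signed parents and signed children.

    edges: dict mapping (source, target) -> sign (+1 or -1).
    Returns the compressed edges and a dict merged node -> original nodes.
    """
    parents = defaultdict(set)
    children = defaultdict(set)
    for (source, target), sign in edges.items():
        children[source].add((target, sign))
        parents[target].add((source, sign))

    # Group nodes by their signature; leaf nodes are left untouched.
    groups = defaultdict(list)
    for node in sorted(set(parents) | set(children)):
        if children[node]:
            signature = (frozenset(parents[node]), frozenset(children[node]))
            groups[signature].append(node)

    rename, members = {}, {}
    for group in groups.values():
        if len(group) > 1:
            merged = "merged_" + "_".join(group)
            members[merged] = group
            for node in group:
                rename[node] = merged

    compressed = {
        (rename.get(s, s), rename.get(t, t)): sign
        for (s, t), sign in edges.items()
    }
    return compressed, members


compressible = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
case_1 = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): -1}
case_2 = {("A", "B"): 1, ("A", "C"): -1, ("B", "D"): 1, ("C", "D"): 1}

compressed_edges, merged_members = compress_by_signature(compressible)
print(compressed_edges, merged_members)
print(compress_by_signature(case_1)[0] == case_1)  # nothing is merged
print(compress_by_signature(case_2)[0] == case_2)  # nothing is merged
```

In the first toy network, B and C collapse into a single node, whereas the networks of cases 1 and 2 are returned unchanged. The returned mapping plays a role similar to the signatures kept for the decompression step.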

We clean the network again in case the compression introduced self loops.

In [14]:
meta_network_compressed = nc.methods.meta_network_cleanup(meta_network_compressed)

## 3. MOON scoring

Now it is time to compute the MOON scores from the compressed network. The network has been compressed by around a third of its original size, which increases computational efficiency. We will use the metabolic inputs and the signalling inputs to compute the MOON scores. After each optimisation, we check the sign consistency of the MOON scores, and remove those edges that turn out to be incoherent (the real TF enrichment scores are compared against the computed MOON scores and the sign of the edge). If there are incoherent edges, the function computes the MOON scores on the reduced network. The loop continues until it reaches a maximum number of tries (in our example, 10) or there are no incoherent edges left.
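
The control flow of this loop can be sketched as below. The scoring function is a deliberately naive stand-in (a one-layer mean of signed downstream values) and all names are hypothetical; the actual scoring in `run_moon` is more involved, so this only illustrates the "score, prune incoherent edges, repeat" logic.

```python
def iterate_until_coherent(edges, score_fn, measured, max_iter=10):
    """Score, remove sign-incoherent edges, and repeat until stable.

    edges: dict (source, target) -> sign; measured: dict node -> real score.
    An edge into a measured node is incoherent when the source score,
    multiplied by the edge sign, disagrees in sign with the measurement.
    """
    edges = dict(edges)
    for iteration in range(1, max_iter + 1):
        scores = score_fn(edges)
        incoherent = [
            (src, tgt)
            for (src, tgt), sign in edges.items()
            if tgt in measured and src in scores
            and scores[src] * sign * measured[tgt] < 0
        ]
        if not incoherent:
            return scores, edges, iteration  # converged
        for edge in incoherent:
            del edges[edge]
    return score_fn(edges), edges, max_iter  # gave up after max_iter


# Toy stand-in for the scoring step (an assumption, not the MOON method):
# each node gets the mean of sign * value over its measured downstream nodes.
downstream = {"M1": 2.0, "M2": -1.0}


def toy_score(edges):
    collected = {}
    for (src, tgt), sign in edges.items():
        if tgt in downstream:
            collected.setdefault(src, []).append(sign * downstream[tgt])
    return {node: sum(vals) / len(vals) for node, vals in collected.items()}


toy_edges = {("K1", "M1"): 1, ("K2", "M2"): 1, ("K1", "TF"): 1, ("K2", "TF"): 1}
toy_tf = {"TF": 3.0}

final_scores, final_edges, n_iter = iterate_until_coherent(
    toy_edges, toy_score, toy_tf, max_iter=10
)
print(n_iter, final_scores, sorted(final_edges))
```

Here the edge from K2 to TF is removed in the first pass, because K2 gets a negative score while it is supposed to activate a TF with a positive measured score. The second pass finds no incoherent edge and the loop stops.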

We can now get the gene regulatory network (GRN) from DoRothEA, keeping only the interactions with confidence levels A and B.

In [15]:
tf_regn = dc.get_dorothea(levels = ['A', 'B'])

In [16]:
moon_network = meta_network_compressed.copy()

In [18]:
moon_res, moon_network = nc.methods.run_moon(
    meta_network_compressed,
    sig_input,
    metab_input,
    tf_regn,
    rna_input,
    n_layers=6,
    method='ulm',
    max_iter=10)

Iteration count: 1
Iteration count: 2
Iteration count: 3
Iteration count: 4
Iteration count: 5
Optimisation iteration 1 - Before: 12714, After: 12669
Iteration count: 1
Iteration count: 2
Iteration count: 3
Iteration count: 4
Iteration count: 5
Optimisation iteration 2 - Before: 12669, After: 12665
Iteration count: 1
Iteration count: 2
Iteration count: 3
Iteration count: 4
Iteration count: 5
Optimisation iteration 3 - Before: 12665, After: 12665
MOON: Solution converged after 3 iterations


## 4. Decompression and solution network

Once the MOON scores are computed, we need to restore the original nodes that were compressed in section 2. For this, we use the signatures obtained during the compression to map the compressed nodes back to the original ones. After that, we can retrieve a solution network that contains the nodes (with their corresponding MOON scores) that are in the vicinity of the signalling input(s) and are sign consistent in terms of signed interactions.

In [19]:
moon_res_dec = nc.methods.decompress_moon_result(moon_res, signatures, dup_parents, meta_network_compressed)

This decompression maps the compressed nodes back to their original components.
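
As a minimal sketch of the idea, assuming we kept a mapping from each merged node to its original members, every original node simply inherits the score of the merged node it belongs to. The helper and the data structures are hypothetical and much simpler than the objects handled by `decompress_moon_result`.

```python
def decompress_scores(compressed_scores, members):
    """Give each original node the score of the merged node it belongs to.

    compressed_scores: dict node -> score on the compressed network.
    members: dict merged node -> list of original nodes (hypothetical format).
    """
    decompressed = {}
    for node, score in compressed_scores.items():
        # Nodes that were never merged map back to themselves.
        for original in members.get(node, [node]):
            decompressed[original] = score
    return decompressed


toy_scores = {"A": 1.0, "merged_B_C": 2.0}
toy_members = {"merged_B_C": ["B", "C"]}
toy_decompressed = decompress_scores(toy_scores, toy_members)
print(toy_decompressed)
```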

Finally, we reduce the solution network by removing incoherent edges and filtering for nodes with MOON scores higher than 1 (the cutoff passed as third argument). We retrieve a networkx.DiGraph that we will visualise, and an attributes DataFrame with the MOON scores.

In [20]:
res_network, att = nc.methods.reduce_solution_network(moon_res_dec, meta_network, 1, sig_input, rna_input) # equals R
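
The reduction step can be pictured with the toy sketch below. It rests on two assumptions of ours that should be checked against the library: the cutoff is applied to the absolute score, and an edge is coherent when the source score multiplied by the edge sign has the same sign as the target score. The function name and data are hypothetical.

```python
def reduce_network(edges, scores, cutoff):
    """Keep coherent edges between nodes whose score passes the cutoff.

    edges: dict (source, target) -> sign (+1 or -1); scores: dict node -> score.
    Assumption: the cutoff is compared against the absolute score.
    """
    kept_nodes = {node for node, score in scores.items() if abs(score) > cutoff}
    kept_edges = {}
    for (source, target), sign in edges.items():
        if source not in kept_nodes or target not in kept_nodes:
            continue
        # Coherent: an activation links scores of equal sign,
        # an inhibition links scores of opposite sign.
        if scores[source] * sign * scores[target] > 0:
            kept_edges[(source, target)] = sign
    return kept_edges


toy_scores = {"A": 2.0, "B": 3.0, "C": -1.5, "D": 0.5}
toy_edges = {
    ("A", "B"): 1,   # coherent activation: kept
    ("A", "C"): 1,   # activation between opposite scores: removed
    ("A", "D"): 1,   # D is below the cutoff: removed
    ("C", "B"): -1,  # coherent inhibition: kept
}
toy_solution = reduce_network(toy_edges, toy_scores, cutoff=1)
print(toy_solution)
```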

As an optional step, we can translate the HMDB identifiers to more readable names (e.g. HMDB0000122 is glucose).

In [22]:
mapping_dict = pd.read_csv("../../../data/moon/hmdb_mapper_vec.csv", header=0).set_index('HMDB_id')['name'].to_dict()

In [23]:
translated_network, att_translated = nc.methods.translate_res(res_network, att, mapping_dict)

The resulting network can now be used for visualisation, or its topology can be studied further, as shown in Vignette 1. Since the network is quite big, it will not be shown in this notebook.