## Setting up

### Loading required modules

In [None]:
using NetworkInference, Assortativity, LightGraphs

### Defining paths and filenames

In [2]:
datadir = "data/"
data_filename = "data.csv"
groups_filename = "groups.tsv"

"groups.tsv"

## Using `NetworkInference.jl` to infer networks and write them to file

### 1. Inferring networks

#### One step

In [3]:
pidc_net = infer_network(datadir * data_filename, PIDCNetworkInference(), delim = ',')
signed_pearson_net = infer_network(datadir * data_filename, CorrelationNetworkInference("Pearson", true, nothing), delim = ',')

typeof(pidc_net)

Getting nodes...
Inferring network...
Getting nodes...
Inferring network...


InferredNetwork

#### Multiple steps

In [4]:
# first get the nodes and expression values from the data
nodes, expression_values = get_nodes(datadir * data_filename, delim = ',', get_values = true)

# then infer networks
mi_net = InferredNetwork(MINetworkInference(), nodes)
signed_spearman_net = InferredNetwork(CorrelationNetworkInference("Spearman", true, expression_values), nodes)

typeof(signed_spearman_net)

InferredNetwork

### 2. Accessing network properties

In [5]:
# number of nodes and edges of an InferredNetwork
number_of_nodes = length(mi_net.nodes)
number_of_edges = length(mi_net.edges)

number_of_nodes, number_of_edges

(69, 2346)

In [6]:
# access nodes and edges of an InferredNetwork
signed_spearman_net.nodes[1], signed_spearman_net.edges[1]

(Node("ACTB", [4, 4, 4, 4, 5, 4, 4, 4, 5, 4  …  4, 4, 5, 2, 4, 4, 3, 5, 4, 3], 6, [0.021653543307086614, 0.07874015748031496, 0.17716535433070865, 0.5885826771653543, 0.1141732283464567, 0.01968503937007874]), NetworkInference.Edge(Node[Node("FGF4", [4, 4, 4, 4, 4, 4, 4, 4, 4, 4  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 5, [0.46062992125984253, 0.011811023622047244, 0.15748031496062992, 0.3484251968503937, 0.021653543307086614]), Node("POU5F1", [5, 4, 5, 5, 7, 5, 4, 5, 5, 5  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 7, [0.33661417322834647, 0.007874015748031496, 0.07086614173228346, 0.25984251968503935, 0.28346456692913385, 0.02952755905511811, 0.011811023622047244])], 0.8358907981753785))

### 3. Writing networks to file

In [7]:
# export the network to file where each line stores an edge in the form: node1 - node2 - weight of the edge
filename = "pidc_network.txt"
write_network_file(datadir * filename, pidc_net)

In [8]:
# one can also read a previously inferred network that has been exported
read_network_file(datadir * filename).nodes[1:10] # print the first 10 nodes of the imported network

10-element Array{Node,1}:
 Node("SALL4", Int64[], 0, Float64[])
 Node("LIN28", Int64[], 0, Float64[])
 Node("GLI2", Int64[], 0, Float64[]) 
 Node("VIM", Int64[], 0, Float64[])  
 Node("CDK2", Int64[], 0, Float64[]) 
 Node("CLDN6", Int64[], 0, Float64[])
 Node("PAX6", Int64[], 0, Float64[]) 
 Node("DNMT1", Int64[], 0, Float64[])
 Node("CD34", Int64[], 0, Float64[]) 
 Node("ACTB", Int64[], 0, Float64[]) 

## Using `Assortativity.jl`

### 1. Loading a network from file

In [9]:
# load the PIDC network inferred previously
pidc_network = load_network(datadir * filename)

# check the number of nodes
length(pidc_network.nodes)

69

### 2. Converting an `InferredNetwork` to a `SimpleGraph`

This produces a `SimpleGraph` that stores the edges of the `InferredNetwork` in no particular order. Each node is represented by an integer ID, so the `ids_to_genes` dictionary is used to map IDs back to gene names.

In [10]:
# get groups annotations
genes_to_groups = get_labels_to_groups(pidc_network.nodes, datadir * groups_filename)

# assign an index to each group
groups_to_indices = get_groups_to_indices(genes_to_groups)



Dict{Symbol,Int64} with 12 entries:
  :Cell_Cycle          => 1
  :Core_Pluripotency   => 3
  :Primed_Pluripotency => 9
  :Chromatin_Modulator => 2
  :Endoderm            => 4
  :Trophoectoderm      => 12
  :Loading_Control     => 5
  :Neuroectoderm       => 8
  :Mesoderm            => 6
  :Naive_Pluripotency  => 7
  :Signalling          => 11
  :Primitive_Endoderm  => 10

In [11]:
# convert an InferredNetwork to a SimpleGraph
pidc_graph, ids_to_genes = InferredNetwork_to_LightGraph(pidc_network)

({69, 2346} undirected simple Int64 graph, Dict(68 => "LIFR",2 => "LIN28",11 => "BMPR1A",39 => "BMP4",46 => "DPPA4",25 => "REST",55 => "MYST3",42 => "HES1",29 => "PTPN11",58 => "TCF3"…))

One can then access properties of the `SimpleGraph`, e.g. its number of edges and vertices:

In [12]:
number_of_edges = ne(pidc_graph)
number_of_nodes = nv(pidc_graph)

number_of_edges, number_of_nodes

(2346, 69)

Node IDs in the `SimpleGraph` map back to genes as follows:

In [13]:
first_edge = collect(edges(pidc_graph))[1] # look at the first edge in the network
source = first_edge.src # source of the edge
destination = first_edge.dst # destination of the edge

println("The first edge in the graph is $(ids_to_genes[source]) => $(ids_to_genes[destination]).")

The first edge in the graph is SALL4 => LIN28.


### 3. Calculating the assortativity coefficient of a `SimpleGraph`

First threshold the network to a given number of edges; this is done on the `InferredNetwork` because the `SimpleGraph` does not keep track of edge order.

In [14]:
# load the PIDC network at a threshold of 150 edges
pidc_network_150 = set_threshold(pidc_network, 150)

# then convert the network at that threshold to a SimpleGraph
pidc_graph_150, ids_to_genes = InferredNetwork_to_LightGraph(pidc_network_150)

({67, 150} undirected simple Int64 graph, Dict(2 => "FGF5",11 => "DNMT3B",39 => "CDK2",46 => "SOX2",25 => "NR0B1",55 => "TBX3",42 => "SETDB1",66 => "PRMT6",58 => "UTF1",29 => "TRP53"…))
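The idea behind thresholding can be sketched in plain Julia, assuming it amounts to ranking edges by weight and keeping the top `n`. The tuple-based edge list and the `top_edges` helper are hypothetical and are not part of `Assortativity.jl`; the actual `set_threshold` implementation may differ.

```julia
# Hypothetical weighted edge list: (node1, node2, weight); not the package's Edge type
edge_list = [("A", "B", 0.9), ("B", "C", 0.2), ("A", "C", 0.7), ("C", "D", 0.5)]

# Keep the n highest-weight edges (hypothetical helper, not part of the package)
function top_edges(edge_list, n)
    ranked = sort(edge_list, by = e -> e[3], rev = true)
    return ranked[1:min(n, length(ranked))]
end

kept = top_edges(edge_list, 2)

# Nodes that still have at least one edge after thresholding
kept_nodes = unique(vcat([[e[1], e[2]] for e in kept]...))
```

In this toy example node `"D"` loses its only edge and drops out, which is consistent with the node count above going from 69 to 67 once only 150 edges are kept.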

Then calculate the assortativity coefficient of the graph. Given only a `SimpleGraph`, the `assortativity` and `second_neighbour_assortativity` functions return the degree assortativity. To obtain the label assortativity instead, pass the dictionaries as additional arguments, as illustrated below.

In [15]:
# calculate the degree assortativity of the graph
degree_assortativity = assortativity(pidc_graph_150)
excess_degree_assortativity = assortativity(pidc_graph_150, excess_degree = true)
second_neighbour_degree_assortativity = second_neighbour_assortativity(pidc_graph_150)
second_neighbour_excess_degree_assortativity = second_neighbour_assortativity(pidc_graph_150, excess_degree = true)

# calculate the label assortativity of the graph
label_assortativity = assortativity(pidc_graph_150, genes_to_groups, groups_to_indices, ids_to_genes)
second_neighbour_label_assortativity = second_neighbour_assortativity(pidc_graph_150, genes_to_groups, groups_to_indices, ids_to_genes)

└ @ Assortativity C:\Users\leo-d\.julia\packages\Assortativity\VFq9d\src\measures.jl:61
└ @ Assortativity C:\Users\leo-d\.julia\packages\Assortativity\VFq9d\src\measures.jl:152


AssortativityObject(0.14659736650356758, [8 52 … 6 4; 52 248 … 28 14; … ; 6 28 … 52 0; 4 14 … 0 0], Dict(:Cell_Cycle => 1,:Primed_Pluripotency => 9,:Core_Pluripotency => 3,:Naive_Pluripotency => 7,:Chromatin_Modulator => 2,:Trophoectoderm => 12,:Neuroectoderm => 8,:Loading_Control => 5,:Signalling => 11,:Endoderm => 4…))

The excess degree assortativity may emit a warning, as illustrated above, when the excess degree of a node is 0 (i.e. the degree of that node is 1), in which case the connectivity matrix cannot be updated.
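For intuition, degree assortativity in its textbook form is the Pearson correlation between the degrees found at the two ends of each edge. The sketch below computes it for a hypothetical star graph using only the standard library. It illustrates the definition and is not the `Assortativity.jl` implementation, which may differ in details.

```julia
using Statistics: cor

# Hypothetical star graph: node 1 is connected to nodes 2, 3 and 4
edge_list = [(1, 2), (1, 3), (1, 4)]
n_nodes = 4

# Degree of each node
degrees = zeros(Int, n_nodes)
for (u, v) in edge_list
    degrees[u] += 1
    degrees[v] += 1
end

# Each undirected edge contributes both (u, v) and (v, u)
ends_a = vcat([degrees[u] for (u, v) in edge_list], [degrees[v] for (u, v) in edge_list])
ends_b = vcat([degrees[v] for (u, v) in edge_list], [degrees[u] for (u, v) in edge_list])

# Pearson correlation of the degrees at either end of the edges
star_assortativity = cor(ends_a, ends_b)
```

A star graph is perfectly disassortative by degree (the hub only connects to leaves), so the correlation here is -1.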

These functions return an `AssortativityObject`, which stores specific information about the assortativity coefficient. Its properties can be accessed as follows:

In [16]:
@show label_assortativity.value # value of the assortativity coefficient
@show label_assortativity.connectivity # connectivity matrix
@show label_assortativity.groups # groups present in the connectivity matrix

typeof(label_assortativity)

label_assortativity.value = 0.1834038054968288
label_assortativity.connectivity = [0 8 4 0 1 0 0 1 0 0 0 0; 8 26 12 0 4 0 0 8 2 0 7 1; 4 12 10 0 2 0 14 8 1 0 5 2; 0 0 0 0 0 0 0 0 1 0 2 0; 1 4 2 0 0 0 0 0 0 0 3 0; 0 0 0 0 0 0 0 0 0 0 0 0; 0 0 14 0 0 0 30 1 0 0 7 0; 1 8 8 0 0 0 1 18 1 0 6 0; 0 2 1 1 0 0 0 1 6 0 2 0; 0 0 0 0 0 0 0 0 0 0 0 0; 0 7 5 2 3 0 7 6 2 0 4 0; 0 1 2 0 0 0 0 0 0 0 0 0]
label_assortativity.groups = Dict(:Cell_Cycle => 1,:Primed_Pluripotency => 9,:Core_Pluripotency => 3,:Naive_Pluripotency => 7,:Chromatin_Modulator => 2,:Trophoectoderm => 12,:Neuroectoderm => 8,:Loading_Control => 5,:Signalling => 11,:Endoderm => 4)


AssortativityObject
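The relationship between the connectivity matrix and the coefficient can be checked by hand. The sketch below assumes the standard (Newman) definition of the assortativity coefficient, r = (tr(e) - Σ aᵢbᵢ) / (1 - Σ aᵢbᵢ), where `e` is the connectivity matrix normalised to sum to 1 and `a`, `b` are its row and column sums. Whether `Assortativity.jl` computes it exactly this way is an assumption, but applied to the matrix printed above the formula reproduces the reported value.

```julia
using LinearAlgebra: tr

# Connectivity matrix copied from the output above
connectivity = [
    0  8  4 0 1 0  0  1 0 0 0 0
    8 26 12 0 4 0  0  8 2 0 7 1
    4 12 10 0 2 0 14  8 1 0 5 2
    0  0  0 0 0 0  0  0 1 0 2 0
    1  4  2 0 0 0  0  0 0 0 3 0
    0  0  0 0 0 0  0  0 0 0 0 0
    0  0 14 0 0 0 30  1 0 0 7 0
    1  8  8 0 0 0  1 18 1 0 6 0
    0  2  1 1 0 0  0  1 6 0 2 0
    0  0  0 0 0 0  0  0 0 0 0 0
    0  7  5 2 3 0  7  6 2 0 4 0
    0  1  2 0 0 0  0  0 0 0 0 0
]

# Normalise so that the entries sum to 1
e = connectivity ./ sum(connectivity)

# Fraction of edge ends attached to each group (row and column sums)
a = vec(sum(e, dims = 2))
b = vec(sum(e, dims = 1))

# Newman's assortativity coefficient
expected = sum(a .* b)
r = (tr(e) - expected) / (1 - expected)
```

The entries of the matrix sum to 300, i.e. twice the 150 edges, since each edge is counted once from each end. The resulting `r` agrees with `label_assortativity.value` up to floating-point rounding.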

Some groups from the original dataset may be absent from the graph, as is the case here: only the top 150 edges of the original network are included, and these do not connect nodes from all groups. For clarity, the groups that are not present can be filtered out of the `AssortativityObject`:

In [17]:
filter_connectivity(label_assortativity)

AssortativityObject(0.1834038054968288, [0 8 … 0 0; 8 26 … 7 1; … ; 0 7 … 4 0; 0 1 … 0 0], Dict(:Cell_Cycle => 1,:Core_Pluripotency => 3,:Primed_Pluripotency => 8,:Naive_Pluripotency => 6,:Chromatin_Modulator => 2,:Loading_Control => 5,:Neuroectoderm => 7,:Trophoectoderm => 10,:Signalling => 9,:Endoderm => 4…))
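Conceptually, filtering removes the all-zero rows and columns of the connectivity matrix and renumbers the remaining groups, which leaves the coefficient unchanged. A minimal sketch on a hypothetical 3-group matrix follows; the variable names are illustrative and this is not the package's `filter_connectivity` code.

```julia
# Hypothetical 3-group connectivity matrix in which group :C has no edges
connectivity = [0 2 0; 2 4 0; 0 0 0]
groups = Dict(:A => 1, :B => 2, :C => 3)

# Indices of the groups with at least one edge end
keep = findall(x -> x > 0, vec(sum(connectivity, dims = 2)))

# Drop the empty rows and columns
filtered_connectivity = connectivity[keep, keep]

# Renumber the remaining groups consecutively
new_index = Dict(old => new for (new, old) in enumerate(keep))
filtered_groups = Dict(g => new_index[i] for (g, i) in groups if haskey(new_index, i))
```

This mirrors the output above, where the empty groups disappear and the indices of the remaining groups are shifted down accordingly.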

### 4. Calculating other measures with `LightGraphs.jl`

One can look at other properties of the graph to put the assortativity coefficient in context, e.g.

In [18]:
clustering_coefficient = global_clustering_coefficient(pidc_graph_150)
communities_number = get_communities_number(pidc_graph_150)
graph_modularity = get_modularity(pidc_graph_150)
degree_sequence = degree(pidc_graph_150)
centrality = betweenness_centrality(pidc_graph_150)

clustering_coefficient, communities_number, graph_modularity, degree_sequence, centrality

(0.3953804347826087, 5, 0.5608222222222222, [6, 5, 4, 8, 5, 3, 11, 2, 8, 8  …  9, 1, 6, 5, 3, 3, 2, 1, 2, 1], [0.12268266047677814, 0.055696329813976875, 0.056643356643356645, 0.22378601485964708, 0.05086083344721765, 0.04285719559475216, 0.06955152583906182, 0.05994793333028629, 0.06226149773866762, 0.0667933494054432  …  0.04903812064651226, 0.0, 0.015363585635078848, 0.11955317733289758, 0.034215470758045846, 0.04123398509864984, 0.01229215396636211, 0.0, 0.030303030303030304, 0.0])

### 5. Noise and randomness

The `Assortativity.jl` package also implements a few functions to add noise to different objects, as illustrated below.

In [19]:
# rewire 50 edges (edges are swapped two by two, 25 times) at random
rewired_graph, rewired_ids_to_genes, rewired_edges = random_edge_rewiring(pidc_network_150, 25)

({67, 150} undirected simple Int64 graph, Dict(2 => "FGF5",11 => "DNMT3B",39 => "CDK2",46 => "SOX2",25 => "NR0B1",55 => "TBX3",42 => "SETDB1",66 => "PRMT6",58 => "UTF1",29 => "TRP53"…), 50)
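Swapping edges "two by two" is commonly done with a double-edge swap: pick two edges a-b and c-d and replace them with a-d and c-b, which keeps every node's degree unchanged. The sketch below is a plain-Julia illustration under that assumption. `double_edge_swap` is a hypothetical helper, not the `random_edge_rewiring` implementation, and it rejects swaps that would create self-loops or duplicate edges.

```julia
using Random

# Hypothetical helper, not the Assortativity.jl implementation:
# perform up to `swaps` double-edge swaps on an undirected simple edge list.
function double_edge_swap(edge_list::Vector{Tuple{Int,Int}}, swaps::Int;
                          rng::AbstractRNG = MersenneTwister(1),
                          max_tries::Int = 100 * swaps)
    rewired = copy(edge_list)
    present = Set(minmax(u, v) for (u, v) in rewired)  # edges stored as (min, max)
    n_done = 0
    tries = 0
    while n_done < swaps && tries < max_tries
        tries += 1
        i = rand(rng, 1:length(rewired))
        j = rand(rng, 1:length(rewired))
        i == j && continue
        (a, b) = rewired[i]
        (c, d) = rewired[j]
        # Proposed replacement edges: a-d and c-b
        new1 = minmax(a, d)
        new2 = minmax(c, b)
        # Reject self-loops and edges that already exist
        if a == d || c == b || new1 in present || new2 in present
            continue
        end
        delete!(present, minmax(a, b))
        delete!(present, minmax(c, d))
        push!(present, new1)
        push!(present, new2)
        rewired[i] = (a, d)
        rewired[j] = (c, b)
        n_done += 1
    end
    return rewired, n_done
end

# Usage on a hypothetical 8-node cycle: request 2 swaps (4 edges rewired)
ring = [(i, i % 8 + 1) for i in 1:8]
rewired, n_done = double_edge_swap(ring, 2)
```

Because each swap only exchanges endpoints, the numbers of nodes and edges stay the same, as in the `{67, 150}` graph returned above.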

In [20]:
# randomly delete 10 nodes from the network
random_node_deletion_graph, random_node_deletion_ids_to_genes = random_node_deletion(pidc_network_150, 10)

({57, 107} undirected simple Int64 graph, Dict(2 => "CDH2",11 => "JAG1",39 => "NANOG",46 => "FGFR2",25 => "BMP4",55 => "ZFP281",42 => "PAX6",29 => "REST",8 => "POU5F1",57 => "PRMT6"…))

In [21]:
# make a random graph with as many nodes and edges as the one given in input
random_graph, random_ids_to_genes = random_network(pidc_network_150)

({67, 150} undirected simple Int64 graph, Dict(2 => "FGF5",11 => "DNMT3B",39 => "CDK2",46 => "SOX2",25 => "NR0B1",55 => "TBX3",42 => "SETDB1",66 => "PRMT6",58 => "UTF1",29 => "TRP53"…))

In [22]:
# change the value of 20 genes (i.e. change their groups) in the genes_to_groups dictionary
randomised_genes_to_groups, randomised_groups_count = randomise_annotations(genes_to_groups, 20)

(Dict("SALL4" => :Core_Pluripotency,"LIN28" => :Primitive_Endoderm,"GLI2" => :Neuroectoderm,"NCAM1" => :Neuroectoderm,"VIM" => :Neuroectoderm,"CDK2" => :Cell_Cycle,"CLDN6" => :Primed_Pluripotency,"PAX6" => :Neuroectoderm,"DNMT1" => :Chromatin_Modulator,"CD34" => :Primed_Pluripotency…), 20)

One can then calculate the assortativity coefficient of these randomised objects:

In [23]:
@show label_assortativity.value # original label assortativity coefficient

rewired_label_assortativity = assortativity(rewired_graph, genes_to_groups, groups_to_indices, rewired_ids_to_genes)
random_node_deletion_label_assortativity = assortativity(random_node_deletion_graph, genes_to_groups, groups_to_indices, random_node_deletion_ids_to_genes)
random_label_assortativity = assortativity(random_graph, genes_to_groups, groups_to_indices, random_ids_to_genes)
randomised_groups_label_assortativity = assortativity(pidc_graph_150, randomised_genes_to_groups, groups_to_indices, ids_to_genes)

(rewired_label_assortativity.value, random_node_deletion_label_assortativity.value,
random_label_assortativity.value, randomised_groups_label_assortativity.value)

label_assortativity.value = 0.1834038054968288


(0.10412262156448206, 0.1744343655510374, 0.01318345463492964, 0.06353997368624718)

### 6. Printing and writing a network to JSON

In [24]:
JSON_network = InferredNetwork_to_JSON(pidc_network_150, genes_to_groups, groups_to_indices)

Dict{String,Array{T,1} where T} with 2 entries:
  "nodes" => JSON_node[JSON_node("CLDN6", :Primed_Pluripotency, 9), JSON_node("…
  "edges" => JSON_edge[JSON_edge("CLDN6", "IGF2", 1.99259), JSON_edge("FGF5", "…

JSON-formatted networks can be printed to the REPL by running `using JSON; JSON.print(stdout, JSON_network, 4)`.

One can then export them:

In [25]:
write_JSON_network(JSON_network, "$(datadir)pidc_network_150.json")