# Infer network

Infer a network and plot it with gene labels.

## More details
This notebook plots networks inferred from datasets simulated from realistic network structures. Several simulated datasets are available, covering different organisms, different numbers of genes in the network, and different numbers of cells in the dataset. To infer a network from your own dataset, make sure the data file is in the same format as the simulated datasets, then set `dataset_name` to its path.

For very large datasets (tens of thousands of cells, thousands of genes), network inference may take a long time. Benchmarking scripts are included to indicate how the number of cells, the number of genes, and the choice of algorithm affect the time taken to infer a network.
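
One way to see why the number of genes matters: the algorithms score every pair of genes, so the number of pairs grows roughly with the square of the gene count. The sketch below only counts pairs. `number_of_pairs` is an illustrative helper, not part of NetworkInference, and the actual running time also depends on the algorithm and the number of cells.

```julia
# Number of unordered gene pairs for n genes: n * (n - 1) / 2
number_of_pairs(n) = div(n * (n - 1), 2)

for n in (50, 100, 1000, 5000)
    println(n, " genes -> ", number_of_pairs(n), " pairs")
end
```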

The network inference algorithms rank all edges, that is, they score every possible pair of genes. To turn this ranked list into a network, a threshold must be set, indicating what fraction of the highest-ranked edges to include (for example, `threshold = 0.15` keeps the top 15%).
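
As a minimal sketch of the thresholding step, assume the edges are held as `(gene, gene, score)` tuples and that the number of kept edges is rounded up. Both are assumptions made for illustration; the package may store edges and round differently. `keep_top_fraction` and the scores are hypothetical.

```julia
# Toy ranked edge list: (gene A, gene B, confidence score). Made-up values.
edges = [("G1", "G2", 0.91), ("G2", "G3", 0.40), ("G1", "G3", 0.75),
         ("G3", "G4", 0.12), ("G1", "G4", 0.66), ("G2", "G4", 0.05)]

# Keep the top `fraction` of edges by score (illustrative helper, not a library function).
function keep_top_fraction(edges, fraction)
    ranked = sort(edges, by = e -> e[3], rev = true)   # highest score first
    n_keep = ceil(Int, fraction * length(ranked))      # assumption: round up
    return ranked[1:n_keep]
end

kept = keep_top_fraction(edges, 0.5)   # 3 of the 6 edges
```

Changing the fraction changes only how far down the ranked list the cut falls; the ranking itself does not need to be recomputed.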

In [1]:
# Include packages

using NetworkInference
using LightGraphs
using GraphPlot

INFO: Recompiling stale cache file /home/fede/.julia/lib/v0.6/Discretizers.ji for module Discretizers.
INFO: Recompiling stale cache file /home/fede/.julia/lib/v0.6/Distributions.ji for module Distributions.
INFO: Recompiling stale cache file /home/fede/.julia/lib/v0.6/LightGraphs.ji for module LightGraphs.
INFO: Recompiling stale cache file /home/fede/.julia/lib/v0.6/GraphPlot.ji for module GraphPlot.

In [5]:
# Customize the dataset, algorithm and percentage threshold (for plotting)

# Use these options for datasets generated from GeneNetWeaver...
# 50 or 100
number_of_genes = 50
# "ecoli1", "ecoli2", "yeast1", "yeast2" or "yeast3"
organism = "yeast1"
# "large", "medium", or "small"
dataset_size = "small"

# ...Or override dataset_name to point to your data file:
dataset_name = string("../simulated_datasets/", number_of_genes, "_", organism, "_", dataset_size, ".txt")

"../simulated_datasets/50_yeast1_small.txt"

In [None]:
dataset_name

In [2]:
# Choose an algorithm
# PIDCNetworkInference(), PUCNetworkInference(), CLRNetworkInference() or MINetworkInference()
algorithm = PIDCNetworkInference()

# Keep this fraction of the highest-scoring edges (0.15 keeps the top 15%)
# 0.0 < threshold < 1.0
threshold = 0.15

0.15

In [None]:
# Or point dataset_name directly at your own data file
dataset_name = "/home/fede/src/network_inference_tutorials/cluster_4.tsv"

In [6]:
# Get the genes and discretize the expression levels

@time genes = get_nodes(dataset_name);

# Troubleshooting: the default discretizer is "bayesian_blocks"
# If this doesn't work, try the "uniform_width" discretizer:
# @time genes = get_nodes(dataset_name, discretizer = "uniform_width");

  3.679625 seconds (2.52 M allocations: 165.882 MiB, 1.15% gc time)
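
For intuition about what discretization does, here is a minimal sketch of equal-width binning in plain Julia. It is not the package's implementation of the `uniform_width` discretizer; `uniform_width_bins` and the expression values are made up for illustration.

```julia
# Assign each value to one of `number_of_bins` equal-width bins spanning [min, max].
function uniform_width_bins(values, number_of_bins)
    lo, hi = extrema(values)
    hi == lo && return ones(Int, length(values))   # constant input: everything in bin 1
    width = (hi - lo) / number_of_bins
    # floor((v - lo) / width) is 0-based; the maximum would land one bin too far, so clamp it
    return [min(floor(Int, (v - lo) / width) + 1, number_of_bins) for v in values]
end

expression = [0.0, 0.4, 1.1, 2.5, 3.9, 4.0]   # made-up expression values
bins = uniform_width_bins(expression, 4)      # [1, 1, 2, 3, 4, 4]
```

Each gene's continuous expression values become a vector of small integers, much like the integer vectors shown inside each `Node` in the output of `get_nodes`.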


In [11]:
genes

50-element Array{NetworkInference.Node,1}:
 NetworkInference.Node("G1", [2, 2, 3, 3, 3, 4, 2, 2, 3, 2  …  3, 3, 4, 2, 2, 2, 1, 3, 1, 2], 4, [0.135714, 0.607143, 0.235714, 0.0214286])                                   
 NetworkInference.Node("G2", [3, 3, 3, 3, 3, 3, 3, 2, 2, 3  …  3, 2, 3, 2, 2, 4, 3, 2, 3, 4], 4, [0.0285714, 0.235714, 0.642857, 0.0928571])                                  
 NetworkInference.Node("G3", [2, 3, 2, 2, 2, 1, 3, 2, 2, 2  …  2, 2, 2, 2, 2, 2, 2, 1, 1, 2], 3, [0.25, 0.657143, 0.0928571])                                                 
 NetworkInference.Node("G4", [2, 2, 2, 2, 2, 2, 2, 2, 2, 2  …  2, 2, 3, 2, 3, 2, 2, 2, 2, 2], 3, [0.192857, 0.65, 0.157143])                                                  
 NetworkInference.Node("G5", [2, 2, 2, 2, 3, 2, 2, 2, 2, 2  …  2, 2, 2, 2, 2, 2, 2, 1, 1, 2], 4, [0.164286, 0.757143, 0.05, 0.0285714])                                       
 NetworkInference.Node("G6", [3, 2, 4, 3, 2, 1, 3, 2, 2, 3  …  1, 2, 2, 2, 3, 2, 1

In [4]:
# Infer the network

@time network = InferredNetwork(algorithm, genes);

Gamma distribution failed for G1 and G28; used normal instead.
Gamma distribution failed for G1 and G42; used normal instead.
Gamma distribution failed for G2 and G28; used normal instead.
Gamma distribution failed for G2 and G42; used normal instead.
Gamma distribution failed for G3 and G28; used normal instead.
Gamma distribution failed for G3 and G42; used normal instead.
Gamma distribution failed for G4 and G28; used normal instead.
Gamma distribution failed for G4 and G42; used normal instead.
Gamma distribution failed for G5 and G28; used normal instead.
Gamma distribution failed for G5 and G42; used normal instead.
Gamma distribution failed for G6 and G28; used normal instead.
Gamma distribution failed for G6 and G42; used normal instead.
Gamma distribution failed for G7 and G28; used normal instead.
Gamma distribution failed for G7 and G42; used normal instead.
Gamma distribution failed for G8 and G28; used normal instead.
Gamma distribution failed for G8 and G42; used normal i

In [5]:
# Save the inferred network to a file
write_network_file("network_julia_yeast1", network)

In [2]:
# Read a previously saved network back in
read_network = read_network_file("../networks/50_yeast1_large_mi.txt")

NetworkInference.InferredNetwork(NetworkInference.Node[NetworkInference.Node("G4", Int64[], 0, Float64[]), NetworkInference.Node("G45", Int64[], 0, Float64[]), NetworkInference.Node("G38", Int64[], 0, Float64[]), NetworkInference.Node("G48", Int64[], 0, Float64[]), NetworkInference.Node("G28", Int64[], 0, Float64[]), NetworkInference.Node("G2", Int64[], 0, Float64[]), NetworkInference.Node("G15", Int64[], 0, Float64[]), NetworkInference.Node("G40", Int64[], 0, Float64[]), NetworkInference.Node("G33", Int64[], 0, Float64[]), NetworkInference.Node("G21", Int64[], 0, Float64[])  …  NetworkInference.Node("G26", Int64[], 0, Float64[]), NetworkInference.Node("G29", Int64[], 0, Float64[]), NetworkInference.Node("G18", Int64[], 0, Float64[]), NetworkInference.Node("G20", Int64[], 0, Float64[]), NetworkInference.Node("G9", Int64[], 0, Float64[]), NetworkInference.Node("G10", Int64[], 0, Float64[]), NetworkInference.Node("G8", Int64[], 0, Float64[]), NetworkInference.Node("G13", Int64[], 0, Floa



In [4]:
read_network.nodes

50-element Array{NetworkInference.Node,1}:
 NetworkInference.Node("G4", Int64[], 0, Float64[]) 
 NetworkInference.Node("G45", Int64[], 0, Float64[])
 NetworkInference.Node("G38", Int64[], 0, Float64[])
 NetworkInference.Node("G48", Int64[], 0, Float64[])
 NetworkInference.Node("G28", Int64[], 0, Float64[])
 NetworkInference.Node("G2", Int64[], 0, Float64[]) 
 NetworkInference.Node("G15", Int64[], 0, Float64[])
 NetworkInference.Node("G40", Int64[], 0, Float64[])
 NetworkInference.Node("G33", Int64[], 0, Float64[])
 NetworkInference.Node("G21", Int64[], 0, Float64[])
 NetworkInference.Node("G39", Int64[], 0, Float64[])
 NetworkInference.Node("G14", Int64[], 0, Float64[])
 NetworkInference.Node("G44", Int64[], 0, Float64[])
 ⋮                                                  
 NetworkInference.Node("G23", Int64[], 0, Float64[])
 NetworkInference.Node("G43", Int64[], 0, Float64[])
 NetworkInference.Node("G26", Int64[], 0, Float64[])
 NetworkInference.Node("G29", Int64[], 0, Float64[])
 Ne

In [7]:
# Get the adjacency matrix, then make a LightGraphs.SimpleGraphs.SimpleGraph

adjacency_matrix, labels_to_ids, ids_to_labels = get_adjacency_matrix(network, threshold)

(Bool[false false … false true; false false … false false; … ; false false … false false; true false … false false], Dict("G4"=>4,"G45"=>45,"G38"=>38,"G48"=>48,"G28"=>28,"G2"=>2,"G15"=>15,"G40"=>40,"G33"=>33,"G21"=>21…), Dict(2=>"G2",11=>"G11",39=>"G39",46=>"G46",25=>"G25",42=>"G42",29=>"G29",8=>"G8",20=>"G20",14=>"G14"…))



In [17]:
? get_adjacency_matrix

search: get_adjacency_matrix



```
get_adjacency_matrix(inferred_network::InferredNetwork, threshold = 1.0; <keyword arguments>)
```

Gets an adjacency matrix given an InferredNetwork and a threshold.

Arguments:

  * `inferred_network`: network that was inferred
  * `threshold=0.1`: threshold above which to keep edges in the network
  * `absolute=false`: interpret threshold as an absolute confidence score

If `absolute` is false, threshold will be interpreted as the percentage of edges to keep.
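
To make the two threshold modes concrete, the following sketch builds a symmetric adjacency matrix from a toy edge list. It is an illustration under stated assumptions, not the package's code: `toy_adjacency_matrix`, the edge tuples, the rounding up of the kept-edge count, and the `>=` comparison in absolute mode are all hypothetical choices.

```julia
# Toy scored edges (made-up values), each pair listed once.
edges = [("G1", "G2", 0.9), ("G1", "G3", 0.7), ("G2", "G3", 0.2)]

function toy_adjacency_matrix(edges, threshold; absolute = false)
    # Map gene labels to matrix indices and back
    labels = sort(unique(vcat([e[1] for e in edges], [e[2] for e in edges])))
    labels_to_ids = Dict(label => i for (i, label) in enumerate(labels))
    ids_to_labels = Dict(i => label for (i, label) in enumerate(labels))

    ranked = sort(edges, by = e -> e[3], rev = true)
    if absolute
        kept = [e for e in ranked if e[3] >= threshold]          # score cut-off
    else
        kept = ranked[1:ceil(Int, threshold * length(ranked))]   # top fraction of edges
    end

    n = length(labels)
    matrix = falses(n, n)
    for e in kept
        i, j = labels_to_ids[e[1]], labels_to_ids[e[2]]
        matrix[i, j] = true
        matrix[j, i] = true   # undirected: mirror each edge
    end
    return matrix, labels_to_ids, ids_to_labels
end

# Fraction mode: keep the top half of the edges (2 of 3 after rounding up)
m_frac, l2i, i2l = toy_adjacency_matrix(edges, 0.5)

# Absolute mode: keep only edges scoring at least 0.8
m_abs = toy_adjacency_matrix(edges, 0.8, absolute = true)[1]
```

In fraction mode the number of edges depends only on the size of the ranked list, while in absolute mode it depends on how the scores are distributed.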


In [11]:
# Save the adjacency matrix as a delimited text file
writedlm("test.txt", adjacency_matrix)
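
A small round-trip sketch for checking that a saved matrix reads back unchanged. It assumes Julia 1.x, where `writedlm` and `readdlm` live in the `DelimitedFiles` standard library (on Julia 0.6 they are available without the `using` line). The 2x2 matrix and the temporary path are stand-ins for illustration.

```julia
using DelimitedFiles   # standard library on Julia 1.x

matrix = Bool[false true; true false]       # stand-in for adjacency_matrix
path = joinpath(mktempdir(), "test.txt")    # temporary location for the example

writedlm(path, matrix)                      # tab-delimited by default
restored = readdlm(path, '\t', Bool)        # parse the entries back as Bool
```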

In [None]:
# Build an undirected graph from the adjacency matrix
graph = LightGraphs.SimpleGraphs.SimpleGraph(adjacency_matrix)

In [None]:
# Get the node labels, in order of index

number_of_nodes = size(adjacency_matrix, 1)
nodelabels = [ids_to_labels[i] for i in 1:number_of_nodes]

In [None]:
# Plot the network at the given threshold

display(gplot(graph, nodelabel = nodelabels))