In [8]:
import Base: reset

In [9]:
using Distributions, LightGraphs, ProgressMeter, RCall, DataFrames

In [10]:
srand(20130810)

MersenneTwister(UInt32[0x01332bfa], Base.dSFMT.DSFMT_state(Int32[-1772545288, 1073534108, 1077066014, 1072915095, -2146195133, 1072843413, 301764553, 1073404181, 750472136, 1073628106  …  -1491411563, 1073194977, 716119449, 1072893711, 1632331784, 758890923, 1433693833, -13012230, 382, 0]), [1.80007, 1.93436, 1.06188, 1.64044, 1.9828, 1.88763, 1.93974, 1.05582, 1.88232, 1.58916  …  1.82218, 1.14308, 1.85535, 1.99342, 1.02735, 1.52537, 1.8365, 1.86189, 1.87699, 1.35362], 382)

The grammar is a set of simple building blocks that can be composed to express any diffusion simulation:

 > `reset(N) -> seed(node_status, N, seeder, seed_size) -> 
            evolve(node_status, N, evolver, stopping_criterion) -> 
            sum(node_status)`  

### 1. Initialization

The starting point of any diffusion simulation is the `reset` function, which takes a network object and returns a `BitVector` with one entry per node, all set to `false`.

In [12]:
function reset(N::LightGraphs.SimpleGraphs.SimpleGraph{Int})
    return falses(nv(N))
end

reset (generic function with 4 methods)
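As a small illustration of why a `BitVector` is a convenient status container, the sketch below builds an all-`false` status vector directly with `falses` (the same call `reset` uses) and compares its memory footprint with that of a `Vector{Bool}`. The node count `n` is an arbitrary value chosen for illustration.

```julia
n = 10^6                          # hypothetical node count, chosen for illustration
status = falses(n)                # what reset(N) returns for a graph with n nodes

println(typeof(status))           # a BitVector (shown as BitArray{1} on older Julia versions)
println(sum(status))              # number of activated nodes; 0 right after a reset

# A BitVector packs one bit per node; a Vector{Bool} uses one byte per node.
bits_bytes  = Base.summarysize(status)
bools_bytes = Base.summarysize(fill(false, n))
println(bits_bytes < bools_bytes)
```

Because `sum` on a `BitVector` counts the `true` entries, the final `sum(node_status)` step of the grammar gives the number of activated nodes directly.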

### 2. Seeding

The next step is to seed a subset of the network using a seeding function that dictates which specific nodes should be targeted. The mutating version, `seed!`, updates the `node_status` vector in place. A purer version, `seed`, copies `node_status`, updates the copy, and returns it, leaving the input untouched. Let us define both versions and check whether there is any speed difference between them.

In [22]:
function seed!(node_status::BitVector,
               N::LightGraphs.SimpleGraphs.SimpleGraph{Int},
               seeder::Function,
               seed_size::Int)

    # seeder should return a vector of node indices to activate
    seed_nodes = seeder(N, seed_size)
    node_status[seed_nodes] .= true   # mutates the caller's vector

    return node_status
end

seed! (generic function with 1 method)

In [23]:
function seed(node_status::BitVector,
              N::LightGraphs.SimpleGraphs.SimpleGraph{Int},
              seeder::Function,
              seed_size::Int)

    new_status = copy(node_status)    # leave the caller's vector untouched
    # seeder should return a vector of node indices to activate
    seed_nodes = seeder(N, seed_size)
    new_status[seed_nodes] .= true

    return new_status
end

seed (generic function with 1 method)
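To make the difference between the two styles concrete, here is a minimal sketch using hypothetical stand-ins, `mark!` and `mark`, that take the seed indices directly so the example runs without a graph or a seeder. They are not part of the grammar; they only mirror the in-place and copy-based update patterns.

```julia
# Hypothetical stand-in for the in-place style: overwrites the caller's vector.
function mark!(status::BitVector, idx::Vector{Int})
    status[idx] .= true
    return status
end

# Hypothetical stand-in for the copy-based style: the input is left untouched.
function mark(status::BitVector, idx::Vector{Int})
    new_status = copy(status)     # allocates a second vector of the same length
    new_status[idx] .= true
    return new_status
end

a = falses(8)
b = mark(a, [2, 5])
println(sum(a), " ", sum(b))      # prints "0 2": a is unchanged, b has two active nodes

mark!(a, [2, 5])
println(sum(a))                   # prints "2": a itself has been updated
```

The copy-based version is easier to reason about when the original status is needed later, at the price of one extra allocation per call.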

We illustrate the working of these functions and measure their timing using two examples.

#### 2.1 Random seeding

In [24]:
function seed_random(N::LightGraphs.SimpleGraphs.SimpleGraph{Int}, seed_size::Int)
    return sample(vertices(N), seed_size, replace = false)
end

seed_random (generic function with 1 method)

In [25]:
N = watts_strogatz(10^7, 3, 0.5)

{10000000, 10000000} undirected simple Int64 graph

In [32]:
node_status = reset(N);

In [33]:
@time seed!(node_status, N, seed_random, 1000)

  0.000206 seconds (21 allocations: 54.398 KiB)


In [34]:
sum(node_status)

In [38]:
node_status = reset(N);

In [39]:
@time node_status = seed(node_status, N, seed_random, 1000);

  0.000795 seconds (26 allocations: 1.245 MiB)


In [40]:
sum(node_status)

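Modification in place is about 4 times faster here (0.21 ms versus 0.80 ms). The extra cost of the non-mutating version is the copy of the 10^7-element `BitVector`, which accounts for most of its 1.245 MiB of allocations.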
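A single `@time` call can be noisy, so one way to get a steadier comparison is to warm up each function once and keep the smallest time over several runs. The sketch below does this with hypothetical stand-ins (`set_in_place!`, `set_on_copy`, and the helper `best_time` are names introduced here for illustration) and a fixed set of indices, so only the cost of the update itself is measured.

```julia
# Hypothetical stand-ins for the two update styles, operating on precomputed indices.
function set_in_place!(status::BitVector, idx::Vector{Int})
    status[idx] .= true
    return status
end

function set_on_copy(status::BitVector, idx::Vector{Int})
    new_status = copy(status)
    new_status[idx] .= true
    return new_status
end

# Smallest elapsed time (in seconds) over `reps` calls of the zero-argument
# function `f`, after one warm-up call that absorbs compilation time.
function best_time(f::Function, reps::Int)
    f()
    return minimum([@elapsed(f()) for i in 1:reps])
end

n      = 10^7
idx    = collect(1:10_000:n)      # 1000 evenly spaced node indices
status = falses(n)

t_mutate = best_time(() -> set_in_place!(status, idx), 20)
t_copy   = best_time(() -> set_on_copy(status, idx), 20)
println("in place: ", t_mutate, " s   copy: ", t_copy, " s")
```

The absolute numbers depend on the machine; the point of the sketch is only that the copy-based version pays for duplicating the full status vector on every call.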

#### 2.2 Seeding the nodes with the highest PageRank centrality

In [46]:
function seed_pagerank(N::LightGraphs.SimpleGraphs.SimpleGraph{Int}, seed_size::Int)
    # sortperm is ascending by default; rev = true puts the highest scores first
    return sortperm(pagerank(N), rev = true)[1:seed_size]
end

seed_pagerank (generic function with 1 method)
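When selecting top-ranked nodes with `sortperm`, the sort direction matters: by default it returns indices in ascending order of score, and `rev = true` returns them in descending order. A small check on a hypothetical score vector:

```julia
scores = [0.10, 0.40, 0.25, 0.05, 0.20]       # hypothetical centrality scores for nodes 1..5

lowest_first  = sortperm(scores)              # ascending order:  [4, 1, 5, 3, 2]
highest_first = sortperm(scores, rev = true)  # descending order: [2, 3, 5, 1, 4]

top2 = highest_first[1:2]                     # the two highest-scoring nodes: [2, 3]
println(top2)
```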

In [50]:
node_status = reset(N)
@time seed!(node_status, N, seed_pagerank, 1000)
sum(node_status)

  5.449269 seconds (22 allocations: 305.184 MiB, 20.05% gc time)


In [49]:
node_status = reset(N)
@time node_status = seed(node_status, N, seed_pagerank, 1000)
sum(node_status)

  5.479471 seconds (27 allocations: 306.377 MiB, 20.01% gc time)


This shows an interesting pattern when the bottleneck shifts to the seeder algorithm: both versions take about 5.4 seconds, almost all of it spent computing PageRank. The more expensive the seeding algorithm, the smaller the relative difference in speed between mutating and non-mutating updates.

By keeping the `seeder` function independent of the `seed` function, we can quickly compare the relative benefits of different seeding methods for the diffusion process.
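The sketch below illustrates this separation on a toy example. Everything in it is a hypothetical stand-in: the network is a plain vector of neighbour lists rather than a LightGraphs graph, `toy_seed!` mirrors the shape of `seed!`, and `seed_first` and `seed_high_degree` are two illustrative seeders that share the `(network, seed_size)` signature.

```julia
# Hypothetical stand-in for a network: toy_network[i] lists the neighbours of node i.
toy_network = [[2, 3, 4], [1], [1], [1, 5], [4]]

# Two hypothetical seeders with the same (network, seed_size) signature.
seed_first(net, k::Int)       = collect(1:k)
seed_high_degree(net, k::Int) = sortperm(map(length, net), rev = true)[1:k]

# Same shape as seed!, but written for the toy network instead of a LightGraphs graph.
function toy_seed!(status::BitVector, net, seeder::Function, k::Int)
    status[seeder(net, k)] .= true
    return status
end

# Run one seeder on a fresh status vector and report which nodes it activated.
function seeded_nodes(seeder::Function, k::Int)
    status = toy_seed!(falses(length(toy_network)), toy_network, seeder, k)
    return [i for i in 1:length(status) if status[i]]
end

println(seeded_nodes(seed_first, 2))        # [1, 2]
println(seeded_nodes(seed_high_degree, 2))  # [1, 4]
```

Only the seeder argument changes between the two calls; the seeding step itself stays the same, which is what makes side-by-side comparisons cheap to set up.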

### 3. Evolution