*Note: This script is an effort to replicate the results from the paper "Talk of the Network: A Complex Systems Look at the Underlying Process of Word-of-Mouth", Goldenberg, Libai and Muller (2001). This is a self-didactic attempt.*

In [None]:
using Distributions

In [None]:
using Random
Random.seed!(20130810)  # fix the random seed so that results are reproducible

# Introduction 

This paper explores the pattern of personal communication between an individual's core group of friends (strong ties) and a wider set of acquaintances (weak ties). This remarkable study is one of the first in marketing to explore the influence of social networks on the diffusion of marketing messages. The key questions investigated in the context of information dissemination are:

- What matters more - strong ties or weak ties?
- What effect does the size of an average individual's network have?
- How does advertising interact with diffusion through weak ties and through strong ties?

# Building the network substrates

## The node object

This study uses synthetic networks in which every node has a fixed number of strong and weak ties, so we do not use a general-purpose graph type to build them. Because the network configuration is simple, we represent a network as a vector of `Node` objects. Each `Node` stores its id, a vector of weak ties, a vector of strong ties and its activation status.

In [None]:
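mutable struct Node
    id::Int                  # node identifier (1 to n_nodes)
    weak_ties::Vector{Int}   # ids of weak-tie neighbors
    strong_ties::Vector{Int} # ids of strong-tie neighbors
    status::Bool             # true once the node is informed
end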

## Initializing the network

We initialize the network as a list of empty `Node` objects and then build the neighborhoods of individual nodes by adhering to the number of strong and weak ties that each `Node` has.

In [None]:
function initalize_network(n_nodes::Int, n_strong_ties::Int, n_weak_ties::Int)
    
    # Each node needs n_strong_ties + n_weak_ties distinct neighbors other than
    # itself; otherwise the rejection-sampling loops below would never finish
    n_strong_ties + n_weak_ties <= n_nodes - 1 ||
        throw(ArgumentError("too many ties requested for a network of $n_nodes nodes"))
    
    # Initialize a network of uninformed nodes with no ties
    G = [Node(i, Int[], Int[], false) for i in 1:n_nodes]
    node_ids = [node.id for node in G]
    
    # Wire the network according to the number of strong and weak ties.
    # When wiring with random nodes, make sure that the subject node itself
    # and its existing neighbors are not sampled again
    
    for node in G
        candidates = node_ids[node_ids .!= node.id]   # every node except the subject
        while length(node.weak_ties) < n_weak_ties
            rand_nbr = sample(candidates)
            if !(rand_nbr in node.weak_ties || rand_nbr in node.strong_ties)
                push!(node.weak_ties, rand_nbr)
            end
        end
        while length(node.strong_ties) < n_strong_ties
            rand_nbr = sample(candidates)
            if !(rand_nbr in node.weak_ties || rand_nbr in node.strong_ties)
                push!(node.strong_ties, rand_nbr)
            end
        end
    end
    
    return G
end
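The wiring loop relies on rejection sampling, so a quick check of the resulting network can catch mistakes. The following is a minimal sketch. `check_network` is a hypothetical helper and is not part of the original script. It assumes node ids run from 1 to the number of nodes, as they do in `initalize_network`. Ties built this way are directed: if node A lists B as a strong tie, B does not necessarily list A.

```julia
mutable struct Node
    id::Int
    weak_ties::Vector{Int}
    strong_ties::Vector{Int}
    status::Bool
end

# Hypothetical helper: returns true if every node has exactly the requested
# number of strong and weak ties, no self-loops, no duplicate neighbors,
# no neighbor that is both a strong and a weak tie, and only valid ids.
function check_network(G::Vector{Node}, n_strong_ties::Int, n_weak_ties::Int)
    n = length(G)
    for node in G
        length(node.strong_ties) == n_strong_ties || return false
        length(node.weak_ties) == n_weak_ties || return false
        nbrs = vcat(node.strong_ties, node.weak_ties)
        allunique(nbrs) || return false              # no duplicates or overlap
        node.id in nbrs && return false              # no self-loops
        all(i -> 1 <= i <= n, nbrs) || return false  # ids must be valid
    end
    return true
end

# Usage on a tiny hand-built network: 3 nodes, 1 strong and 1 weak tie each
G = [Node(1, [3], [2], false),
     Node(2, [1], [3], false),
     Node(3, [2], [1], false)]
println(check_network(G, 1, 1))   # prints true
```

On a generated network, a call such as `check_network(initalize_network(3000, 5, 5), 5, 5)` should return `true`.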

# Model

## Assumptions

Every individual in the substrate network (referred to as a node) has the same number of strong ties (varied from 5 to 29) and weak ties (varied from 5 to 29). A node can be activated, i.e., an uninformed individual can become informed, in three ways: through a strong tie with probability $\beta_s$, through a weak tie with probability $\beta_w$, or through external marketing efforts with probability $\alpha$. In line with conventional wisdom, we assume $\alpha < \beta_w < \beta_s$.

Suppose that at time step $t$ an uninformed individual is connected to $m$ informed strong ties and $j$ informed weak ties. If the three channels act independently, the probability that the individual becomes informed in this time step is:

$$
p(t) = 1 - (1- \alpha)(1 - \beta_w)^j(1 - \beta_s)^m
$$
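The product $(1-\alpha)(1-\beta_w)^j(1-\beta_s)^m$ is the probability that advertising and every informed tie all fail to inform the individual, assuming they act independently. Subtracting it from 1 gives the activation probability. The following is a minimal sketch of the formula as a Julia function. The name `activation_prob` and the parameter values are illustrative choices taken from within the ranges listed below.

```julia
# Probability that an uninformed node becomes informed in one time step,
# given j informed weak ties and m informed strong ties
activation_prob(α, β_w, β_s, j, m) = 1 - (1 - α) * (1 - β_w)^j * (1 - β_s)^m

# With no informed neighbors only advertising acts, so p(t) ≈ α
println(activation_prob(0.005, 0.01, 0.05, 0, 0))
# One informed strong tie raises p(t) more than one informed weak tie
println(activation_prob(0.005, 0.01, 0.05, 0, 1) > activation_prob(0.005, 0.01, 0.05, 1, 0))  # prints true
```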

We are interested in two outcome variables:
1. The number of time steps elapsed until 15% of the network is informed
2. The number of time steps elapsed until 95% of the network is informed
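Both outcome variables are first-passage times. If the simulation records the fraction of informed nodes after each time step (an assumption about how results are stored), both times can be read off with `findfirst`. The helper name `time_to_fraction` and the trajectory below are made up for illustration. They are not results from the paper.

```julia
# Hypothetical helper: given the fraction of informed nodes after each time
# step, return the first step at which `threshold` is reached, or `nothing`
time_to_fraction(fractions::Vector{Float64}, threshold::Float64) =
    findfirst(f -> f >= threshold, fractions)

# Illustrative (made-up) trajectory
fractions = [0.01, 0.08, 0.20, 0.55, 0.90, 0.97]
println(time_to_fraction(fractions, 0.15))   # prints 3
println(time_to_fraction(fractions, 0.95))   # prints 6
```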

## Parameter ranges

In [None]:
# `linspace` was removed in Julia 1.0; `range(...; length=n)` replaces it
println("Number of strong ties per node: ", floor.(Int, range(5, stop=29, length=7)))
println("Number of weak ties per node: ", floor.(Int, range(5, stop=29, length=7)))
println("Effect of advertising (α): ", collect(range(0.0005, stop=0.01, length=7)))
println("Effect of weak ties (β_w): ", collect(range(0.005, stop=0.015, length=7)))
println("Effect of strong ties (β_s): ", collect(range(0.01, stop=0.07, length=7)))
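The text does not say how parameter combinations are chosen. If the design is read as a full factorial sweep over these five ranges (an assumption), `Iterators.product` can enumerate the grid lazily. The sketch below also filters out combinations that violate the assumed ordering $\alpha < \beta_w < \beta_s$. All variable names are illustrative.

```julia
# Hypothetical full-factorial grid over the five parameter ranges above
strong_ties = floor.(Int, range(5, stop=29, length=7))
weak_ties   = floor.(Int, range(5, stop=29, length=7))
α_vals      = collect(range(0.0005, stop=0.01, length=7))
β_w_vals    = collect(range(0.005, stop=0.015, length=7))
β_s_vals    = collect(range(0.01, stop=0.07, length=7))

# Each element is a tuple (strong, weak, α, β_w, β_s)
grid = Iterators.product(strong_ties, weak_ties, α_vals, β_w_vals, β_s_vals)
println(length(grid))   # prints 16807, i.e. 7^5

# Keep only combinations that respect α < β_w < β_s
valid = [p for p in grid if p[3] < p[4] < p[5]]
println(length(valid))
```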

## Execution

- At $t = 0$, the status of all nodes is set to `false`.

- For each uninformed node, the probability of being informed is calculated with the equation above. A random draw $U$ is made from a standard uniform distribution and compared with this probability. If $U < p(t)$, the status of the node is changed to `true`.

- The previous step is repeated in each successive time step until 95% of the network (of size 3000) is informed.
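The steps above can be sketched as a single update function. This is an illustrative version under stated assumptions, not the script's own implementation. `simulate_step!` is a hypothetical name. Updates are assumed to be synchronous, so each node sees its neighbors' statuses from the start of the step. Node ids are also assumed to equal their positions in the network vector, as they do in `initalize_network`.

```julia
using Random

mutable struct Node
    id::Int
    weak_ties::Vector{Int}
    strong_ties::Vector{Int}
    status::Bool
end

# Hypothetical helper: advance the network by one synchronous time step and
# return the number of informed nodes afterwards
function simulate_step!(G::Vector{Node}, α, β_w, β_s, rng::AbstractRNG=Random.default_rng())
    informed = [node.status for node in G]          # snapshot of the previous step
    for node in G
        informed[node.id] && continue               # informed nodes stay informed
        j = count(i -> informed[i], node.weak_ties)     # informed weak ties
        m = count(i -> informed[i], node.strong_ties)   # informed strong ties
        p = 1 - (1 - α) * (1 - β_w)^j * (1 - β_s)^m
        if rand(rng) < p                            # U ~ Uniform(0, 1)
            node.status = true
        end
    end
    return count(node -> node.status, G)
end

# Usage: 3 uninformed nodes, so only advertising can act in the first step
G = [Node(1, [2], [3], false), Node(2, [3], [1], false), Node(3, [1], [2], false)]
n_informed = simulate_step!(G, 0.5, 0.01, 0.05, MersenneTwister(1))
println(n_informed)
```

Taking a snapshot before updating keeps a node informed in this step from influencing its neighbors until the next step.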

We now look at several helper functions that implement this logic.

### Reset node status

At the beginning of each simulation, we call the following function to set the status of all the nodes to `false`.

In [None]:
function reset_node_status!(G::Vector{Node})
    for node in G
        node.status = false
    end
end

### Activation probability

At each time step, the following function calculates the vector of activation probabilities for all nodes. Since the network is small (3000 nodes), we use an array comprehension. For larger networks, a preallocated array filled with an explicit loop might be faster.
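For comparison, an in-place version with a preallocated output vector and explicit loops might look like the sketch below. The function name `calc_activation_prob!` and its explicit `α`, `β_w`, `β_s` arguments are assumptions, since the original function takes only `G`. The sketch also assumes that node ids equal their positions in `G`.

```julia
mutable struct Node
    id::Int
    weak_ties::Vector{Int}
    strong_ties::Vector{Int}
    status::Bool
end

# Hypothetical in-place variant: fills the preallocated vector `p` with the
# activation probability of every node in G
function calc_activation_prob!(p::Vector{Float64}, G::Vector{Node}, α, β_w, β_s)
    @assert length(p) == length(G)
    for (k, node) in enumerate(G)
        j = 0                              # informed weak ties
        for i in node.weak_ties
            j += G[i].status               # Bool adds as 0 or 1
        end
        m = 0                              # informed strong ties
        for i in node.strong_ties
            m += G[i].status
        end
        p[k] = 1 - (1 - α) * (1 - β_w)^j * (1 - β_s)^m
    end
    return p
end

G = [Node(1, [2], [3], true), Node(2, [3], [1], false), Node(3, [1], [2], false)]
p = zeros(length(G))                       # allocate once, reuse every time step
calc_activation_prob!(p, G, 0.005, 0.01, 0.05)
println(p)
```

The saving over the comprehension is that `p` is allocated once and reused across time steps, instead of a new vector being created at every step.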

In [None]:
function calc_activation_prob(G::Vector{Node})
    