## An SI model to a Tomato Spotted Wilt Virus experiment
In this example we will perform SI model inference using data on the spread of Tomato Spotted Wilt Virus (TSWV) in a greenhouse from Hughes et al. (1997). In this experiment, 520 plants regularly spaced in a 10m x 26m greenhouse were examined for pressence of TSWV once every two weeks. Plants were not removed after showing signs of infection by TSWV. The experiment concluded after 14 weeks, which saw a total of 327 individual plants infected.

### References
* Hughes G, McRoberts N, Madden LV, Nelson SC (1997). “Validating Mathematical Models of Plant-Disease Progress in Space and Time.” Mathematical Medicine and Biology: A Journal of the Institute of Mathematics and Its Applications, 14(2), 85– 112.

## Initialization
Load `Pathogen`, as well as:
* `CSV` for extended CSV file I/O functionality,
* `Distributed`, `Random`, and `DelimitedFiles` from Julia Base,
* `DataFrames` for storing individual level information,
* `Distributions` for specification of priors for Bayesian inference,
* `Plots` for visualization (using whichever visualization backend your prefer), and


In [1]:
using CSV, Distributed, DelimitedFiles, Distances, LinearAlgebra, Plots, Random, DataFrames, Distributions, Pathogen
addprocs(3)
@everywhere using DataFrames, Distributions, Pathogen

┌ Info: Recompiling stale cache file /Users/justin/.julia/compiled/v1.0/CSV/HHBkp.ji for CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]
└ @ Base loading.jl:1187
┌ Info: Recompiling stale cache file /Users/justin/.julia/compiled/v1.0/Plots/ld3vC.ji for Plots [91a5bcdd-55d7-5caf-9e0b-520d859cae80]
└ @ Base loading.jl:1187
┌ Info: Recompiling stale cache file /Users/justin/.julia/compiled/v1.0/Distributions/xILW0.ji for Distributions [31c24e10-a181-5473-b8eb-7969acd0382f]
└ @ Base loading.jl:1187
┌ Info: Recompiling stale cache file /Users/justin/.julia/compiled/v1.0/Pathogen/08VJn.ji for Pathogen [58f1fdb4-9bff-11e8-2091-99b816fb7d3c]
└ @ Base loading.jl:1187


We'll set the seed for random number generation such that our results here are reproducible:

In [2]:
Random.seed!(5432)

MersenneTwister(UInt32[0x00001538], Random.DSFMT.DSFMT_state(Int32[-1744756614, 1072761743, -2028083806, 1073624598, -581637934, 1073626379, 327696063, 1073382742, 738168642, 1073600735  …  -595993142, 1072726157, 913087430, 1073114622, 2001550298, -624279679, 872770575, 1763491762, 382, 0]), [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0  …  0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], UInt128[0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000  …  0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x00000000000000000000000000000000, 0x000000000000

## TSWV Data
We'll import the TSWV, provided as csv files. 

The first CSV file contains X, Y locations in metres for each individual plant in the study. 

The second CSV file contains records of the first day in which each individual was observed as being infected by TSWV. `NaN` indicates that no signs of infection were observed within the 14 week study period.

In [3]:
# Use CSV.jl for DataFrames I/O
#
# We know the types of the columns, so we'll manually specify those. 
# * Individual IDs are `Int64`
# * X,Y coordinates are `Float64`s
risks = CSV.read(joinpath(@__DIR__, "02_TSWV_locations.csv"), types=[Int64; Float64; Float64])
pop = Population(risks)

# Will precalculate distances
pop.distances = [euclidean([risks[:x][i]; risks[:y][i]], [risks[:x][j]; risks[:y][j]]) for i = 1:pop.individuals, j = 1:pop.individuals]

# Use julia's included CSV interface for simple vector of observation times
raw_observations = readdlm(joinpath(@__DIR__, "02_TSWV_infection_observations.csv"))[:]

# Create an `EventObservations` object with `Pathogen.jl`
obs = EventObservations{SI}(raw_observations)

SI model observations (n=520)

In [4]:
# For performing inference we are going to set everything at or before t = 42 as being the starting state.
starting_states = [obs.infection[i] <= 42.0 ? State_I : State_S for i=1:obs.individuals]
# We will also set these observation times to -Inf
obs.infection[obs.infection .<= 42.0] .= -Inf
# Note: may provide a built in method for this all in the future!

13-element view(::Array{Float64,1}, [2, 83, 86, 113, 227, 270, 296, 305, 322, 346, 410, 438, 461]) with eltype Float64:
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf
 -Inf

We will now formulate our `SEI` individual level model. For our example, this model will be quite simple as we have a contained artifical environment which limits exogeneous transmissions, and we do not have individual level risk factors to consider beyond basic location data. We will use some common functions which have been prewritten in our examples folder.



In [5]:
include(joinpath(@__DIR__, "risk_functions.jl"))

rf = RiskFunctions{SI}(_zero, # sparks function - we will assume no exogenous transmissions and set this to zero
                       _one, # susceptibility function - we do not have individual level risk factor information to explore here, so will set to a constant 1
                       _powerlaw, # transmissability function - we will use a powerlaw kernel. This provides a spatial and non-spatial component to infection transmissions. This has 2 parameters.
                       _one) # latency function - we will set this to a constant rate for all individuals to keep simple

SI model risk functions

In [6]:
rpriors = RiskPriors{SI}(UnivariateDistribution[], # empty `UnivariateDistribution` vector for all parameter-less functions
                         UnivariateDistribution[], 
                         [Gamma(1.0, 0.5); Gamma(1.0, 1.0)], # Relatively uninformative priors with appropriate support
                         UnivariateDistribution[])

SI model risk function priors

We provide some bounds to event times in comparision to observation times. Actual onset of infectiousness in this study could have occurred any time between plant examinations. This means the observation delay could be up to 14.0 days.

In [7]:
ee = EventExtents{SI}(14.0)

SI model event extents

In [8]:
mcmc = MCMC(obs, ee, pop, starting_states, rf, rpriors)
start!(mcmc, 3, attempts = 20000)

[32mInitialization progress100%|████████████████████████████| Time: 0:00:04[39m:12[39m


RemoteException: On worker 2:
UndefVarError: #_zero not defined
deserialize_datatype at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:1051
handle_deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:743
deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:703
handle_deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:750
deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:1157
handle_deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:756
deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:1157
handle_deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:756
deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:1157
handle_deserialize at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:751
deserialize_msg at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Serialization/src/Serialization.jl:703
#invokelatest#1 at ./essentials.jl:697 [inlined]
invokelatest at ./essentials.jl:696 [inlined]
message_handler_loop at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:160
process_tcp_streams at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.0/Distributed/src/process_messages.jl:117
#105 at ./task.jl:259

In [9]:
iterate!(mcmc, 5000, 0.5, event_batches = 50)

SI model MCMC with 0 chains

In [None]:
plot(mcmc.markov_chains[1].risk_parameters)
plot!(mcmc.markov_chains[2].risk_parameters)
plot!(mcmc.markov_chains[3].risk_parameters)
png(joinpath(@__DIR__, "02_TSWV_trace.png"))

In [None]:
# Visualization of transmission network convergence during MCMC
epidemic_animation = @animate for i = 1:50:7950
    plot(mcmc.markov_chains[1].transmission_network[i], 
         mcmc.population, 
         mcmc.markov_chains[1].events[i], 
         100.0, 
         aspect_ratio = :equal)
end
mp4(epidemic_animation, joinpath(@__DIR__, "02_TSWV_transmission_network_convergence.mp4"), fps=5)

In [None]:
# Visualization of a single inferred epidemic sample from MCMC
epidemic_animation = @animate for t in range(42.0, stop=100.0, length=100)
    plot(mcmc.markov_chains[3].transmission_network[end], 
         mcmc.population, 
         mcmc.markov_chains[3].events[end], 
         t, 
         aspect_ratio = :equal)
end
mp4(epidemic_animation, joinpath(@__DIR__, "02_TSWV_epidemic_plot_time.mp4"), fps=2)

In [None]:
# Visualization of epidemic curve convergence
epidemic_animation = @animate for i = 1:50:7950
    plot(mcmc.markov_chains[1].events[i], 0.0, 100.0)
end
mp4(epidemic_animation, joinpath(@__DIR__, "02_TSWV_epidemic_curve_convergence.mp4"), fps=5)

In [None]:
# Visualization of epidemic curve posterior
plot(mcmc.markov_chains[2].events[3950], 0.0, 100.0, ylims=(0, 525), alpha=0.05, legend=false)
for i = 4000:25:8000, j in [1; 3]
    plot!(mcmc.markov_chains[j].events[i], 0.0, 100.0, ylims=(0, 525), alpha=0.05, legend=false)
end
png(joinpath(@__DIR__, "02_TSWV_epidemic_curve_posterior.png"))