## Goals of this section
1. Even *more* physics! Time evolution and "quantum quenches" as a way to...
2. Learn more about Julia's parallel computing features like `pmap` and module importation
3. Exploit various linear algebra tricks to make things faster

So far, everything we've looked at has been a static problem. Now we will do something *time-dependent*, as a motivator to explore more of Julia's linear algebra and parallel features.

We will start in a situation with non-zero transverse magnetic field and turn it off, seeing what happens. This is going to require doing *a lot* of independent matrix-vector operations - a good use case for parallelism!

First, we'll make our Hamiltonian from the previous part, with a little time-dependent spice:

$$ \hat{H}(t) = -\sum_{\langle i, j \rangle} \hat{\sigma}_i^z \hat{\sigma}_j^z - h(t)\sum_i \hat{\sigma}_i^x $$

Now $h(t)$ is some time-dependent function. This means that the lowest energy state (the groundstate) will change with time as well. We need a way to simulate this. In a closed quantum system, a wavefunction $|\Psi\rangle$ will undergo *unitary time evolution* so that:

$$ \left| \Psi(t) \right\rangle = \hat{U}(t)\left| \Psi(t = 0)\right\rangle $$

so that

$$ \langle \Psi(t) | \Psi(t) \rangle = 1 \quad \forall t $$

$\hat{U}(t)$ is a *unitary* operator - it's norm preserving. The particular form it takes is:

$$ \hat{U}(t) = \exp\left\{- i t\hat{H}(t) \right\} $$

Since $\hat{H}(t)$ is always Hermitian, $\hat{U}(t)$ is always unitary. (Strictly speaking, this simple exponential form holds when $\hat{H}$ is constant over the evolution, which is the case after the instantaneous quench we simulate here.) For an explanation of where all this comes from you can consult a textbook on quantum mechanics. For now, if it's confusing, we're just going to:
  1. Calculate $\hat{U}(t)$ for various times (and perhaps use a few shortcuts)
  2. Multiply it by $|\Psi\rangle$ to find the time-evolved state at various times
  3. Make pretty pictures, learn some things
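
Before moving on, the claim that a Hermitian $\hat{H}$ gives a norm-preserving $\hat{U}(t)$ can be checked numerically. The following is only a minimal sketch, written for Julia 1.x (where the matrix exponential is spelled `exp` rather than `expm`). The 2×2 matrix `H_toy` and the names `ψ0`, `ψt`, `unitarity_error`, and `norm_after` are made up for illustration and are not part of the workshop code.

```julia
using LinearAlgebra

# Toy 2x2 Hermitian matrix standing in for the Hamiltonian
# (an assumption for illustration, not the TransverseFieldIsing type).
H_toy = [1.0 0.5; 0.5 -1.0]
t = 0.7

# In Julia 1.x, `exp` applied to a square matrix is the matrix exponential.
U = exp(-im * t * H_toy)

ψ0 = normalize(ComplexF64[1, 2])   # any normalized starting vector
ψt = U * ψ0                        # the time-evolved vector

unitarity_error = norm(U' * U - I) # close to 0 if U is unitary
norm_after = norm(ψt)              # stays at 1 under unitary evolution
```

Both quantities should agree with the statements above up to floating-point rounding.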
  
Remember that $|\Psi\rangle$ is "just" some vector, and $\hat{U}$ and $\hat{H}$ are matrices - underneath all the jargon, we're still just doing linear algebra! We'll be able to reuse our types from the previous part. We can do this by hoovering up all our code into a file called `timeevolve.jl` (or whatever you feel like calling it, it's a free country). If you're not sure how to do this quickly, take a look at [`nbconvert`](https://nbconvert.readthedocs.io/en/latest/).

In [1]:
include("timeevolve.jl")

get_groundstate (generic function with 2 methods)

We can write some single-node code to generate the wavefunction at various times, which we'll parallelize in a moment. It's good to have a working single-node version as a proof of concept and something to test against. I wrote a cheap helper function `get_groundstate` to automate finding the lowest energy eigenvector for the `TransverseFieldIsing` Hamiltonian. The first argument to my function is the length $L$, and the second argument is the strength of the transverse field $h$.

In [2]:
function timeEvolve(ψ, H, t)
    U = expm(-im*t*Hermitian(H)) # want to use the optimized method
    return U*ψ
end

function timeSeries(ψ, H, start, stop, n)
    times = linspace(start, stop, n)
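    # argument order must match timeEvolve(ψ, H, t); H.Mat is the matrix held by the Hamiltonian type
    map(t -> timeEvolve(ψ, H.Mat, t), times)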
end

timeSeries (generic function with 1 method)

## Here's your parallelism

Now we can use Julia's [parallel map](https://docs.julialang.org/en/latest/stdlib/parallel.html#Base.Distributed.pmap) function to make this faster! For now, all we have to do is add the "`p`": 

In [3]:
function timeSeriesParallel(ψ, H, start, stop, n)
    times = linspace(start, stop, n)
    pmap((t,) -> timeEvolve(ψ, H.Mat, t), times)
end

addprocs(6)
ψ, H = get_groundstate(10, 1.0)
ψ_quench, H_quench = get_groundstate(10, 0.0)
timeSeriesParallel(ψ, H_quench, 0.0, 10.0, 100)

LoadError: On worker 2:
UndefVarError: #timeEvolve not defined
deserialize_datatype at ./serialize.jl:968
handle_deserialize at ./serialize.jl:674
deserialize at ./serialize.jl:634
handle_deserialize at ./serialize.jl:681
deserialize_global_from_main at ./distributed/clusterserialize.jl:154
foreach at ./abstractarray.jl:1724
deserialize at ./distributed/clusterserialize.jl:56
handle_deserialize at ./serialize.jl:722
deserialize at ./serialize.jl:634
deserialize_datatype at ./serialize.jl:966
handle_deserialize at ./serialize.jl:676
deserialize at ./serialize.jl:634
handle_deserialize at ./serialize.jl:681
deserialize_msg at ./distributed/messages.jl:98
message_handler_loop at ./distributed/process_messages.jl:161
process_tcp_streams at ./distributed/process_messages.jl:118
#99 at ./event.jl:73

Oops! The workers don't know about our type and other functions. We'll need to load them onto each worker to be able to use `pmap`. For a more detailed discussion of why this is, consult the [docs](https://docs.julialang.org/en/latest/manual/parallel-computing.html#Code-Availability-and-Loading-Packages-1).

In [4]:
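# workers were already added in the previous cell; only add them when starting from a fresh session
nprocs() == 1 && addprocs(6)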
@everywhere include("timeevolve.jl")

@everywhere function timeEvolve(ψ, H, t)
    U = expm(UniformScaling(-im*t)*Hermitian(H)) # want to use the optimized method
    return U*ψ
end
ψ, H = get_groundstate(8, 1.0)
ψ_quench, H_quench = get_groundstate(8, 0.0)

#now we can try running with the workers we added
ψs_t = timeSeriesParallel(ψ, H_quench, 0.0, 10.0, 5)

5-element Array{Array{Complex{Float64},1},1}:
 Complex{Float64}[0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im  …  0.149587+0.0im, 0.1981+0.0im, 0.253379+0.0im, 0.194948+0.0im, 0.163508+0.0im, 0.206921+0.0im, 0.263587+0.0im, 0.216095+0.0im, 0.278388+0.0im, 0.372904+0.0im]                                                               
 Complex{Float64}[0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im  …  0.0424323-0.143443im, 0.0686685+0.185818im, -0.212603-0.137843im, 0.067576+0.182862im, 0.0463809-0.156791im, 0.0717262+0.194092im, -0.221168-0.143397im, 0.0749063+0.202698im, -0.233588-0.151449im, 0.372083-0.0247317im]  
 Complex{Float64}[0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im, 0.0+0.0im  …  -0.125514-0.0813787im, -0.150494+0.128822im, 0.103399+0.231321im, -0.1481+0.126773im, -0.137195-0.0889516im, -0.157196+

This is an *embarrassingly parallel* problem. How convenient that the workers don't need to coordinate with each other, only with the driver node. 
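
Because every time point is independent, a parallel map should return exactly what the serial map does. Here is a minimal, self-contained sketch of that check, written for Julia 1.x (where `linspace` became `range` and `expm` became `exp`). The function `evolve_toy` and the 2×2 matrix `H_toy` are hypothetical stand-ins for `timeEvolve` and the real Hamiltonian; `batch_size` is a standard keyword of `pmap`.

```julia
using Distributed
# addprocs(4)  # uncomment to add workers; with none, pmap falls back to the current process

@everywhere using LinearAlgebra

# Hypothetical stand-in for timeEvolve, defined on every process.
@everywhere evolve_toy(ψ, H, t) = exp(-im * t * H) * ψ

H_toy = [0.0 1.0; 1.0 0.0]          # toy Hermitian matrix, for illustration only
ψ0 = ComplexF64[1, 0]
times = range(0.0, stop=1.0, length=5)

# Same computation, once serially and once through pmap.
serial   = map(t -> evolve_toy(ψ0, H_toy, t), times)
parallel = pmap(t -> evolve_toy(ψ0, H_toy, t), times; batch_size=2)

# Largest difference between the two results over all time points.
max_diff = maximum(norm(s - p) for (s, p) in zip(serial, parallel))
```

Keeping a serial version around like this gives you something to test the parallel version against.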

This might, depending on your choice of $L$ and how many time samples you want, take *ages*. $L = 6$ and 5 time samples is an OK choice to make sure that your code works at all. Now we can make an initial plot:

In [5]:
using GR

mags_t = map(magnetization, ψs_t)
plot(linspace(0.0, 10.0, 5), mags_t)

LoadError: MethodError: no method matching magnetization(::Array{Complex{Float64},1})
Closest candidates are:
  magnetization(::Array{T,1} where T, ::Any) at /Users/kshyatt/Projects/juliacon2017/timeevolve.jl:26

There are clearly a lot of inefficiencies in this code. Let's enumerate some of them:
  1. We are constructing each Hamiltonian matrix on the head node and sending the whole thing to the workers. All the workers need is the value of $h(t)$.
  2. We are sending the entire state vector back when, for now, all we need is the magnetization (a single `Float64`).
  3. The Hamiltonian is actually diagonal in the basis we have picked. We can use a linear algebra trick to speed up the hard work of computing $\hat{U}(t)$.

$$ \exp\left\{ \hat{A} \right\} = \hat{P}^\dagger \exp\left\{ \hat{D} \right\} \hat{P} $$

where

$$ \hat{A} = \hat{P}^\dagger \hat{D} \hat{P} $$

and $\hat{D}$ is a diagonal matrix whose entries are the eigenvalues of $\hat{A}$, and $\hat{P}$ is the unitary matrix which diagonalizes $\hat{A}$ (the columns of $\hat{P}^\dagger$ are the eigenvectors of $\hat{A}$). So:

$$ \hat{U}(t) = \exp\left\{- i t\hat{H}(t) \right\} = \hat{P}^\dagger \exp\left\{ -it\hat{D} \right\} \hat{P} $$

and we have access to $\hat{P}$ and $\hat{D}$ from our previous work...
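
To see why the trick works, here is a short derivation from the power series of the exponential. It only assumes that $\hat{P}$ is unitary, so $\hat{P}\hat{P}^\dagger = 1$ and every inner $\hat{P}\hat{P}^\dagger$ in a power of $\hat{A}$ cancels:

$$ \hat{A}^k = \left(\hat{P}^\dagger \hat{D} \hat{P}\right)^k = \hat{P}^\dagger \hat{D}^k \hat{P} $$

$$ \exp\left\{ \hat{A} \right\} = \sum_{k=0}^{\infty} \frac{\hat{A}^k}{k!} = \hat{P}^\dagger \left( \sum_{k=0}^{\infty} \frac{\hat{D}^k}{k!} \right) \hat{P} = \hat{P}^\dagger \exp\left\{ \hat{D} \right\} \hat{P} $$

Since $\hat{D}$ is diagonal with entries $d_j$, $\exp\{\hat{D}\}$ is simply the diagonal matrix with entries $e^{d_j}$. The expensive matrix exponential is replaced by one eigendecomposition (done once) plus a scalar exponential per eigenvalue for each time $t$.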

### Exercises:

1. Try different batch sizes for `pmap` - does that speed anything up?
2. Use the linear algebra trick! For extra points, let's play with `Diagonal` and see if that speeds anything up.
3. Implement the speedup in 2 above.
4. (Harder) have the plot be generated/updated in real time
5. How does this scale with the number of workers?

### If you have extra time:

It's worth trying the quench with the $XXZ$ model, quenching between the $XY$ model and the Heisenberg model in both directions. What happens?

We've implemented an *instantaneous* quench. What happens if you change your time-evolution function to support *slow* ([adiabatic](https://en.wikipedia.org/wiki/Adiabatic_theorem)) quenches?

Quantum quenches are a good example of a process that is a) relatively easy to simulate on a classical computer and b) illuminating of lots of [interesting](https://arxiv.org/abs/1404.6848) [quantum](https://arxiv.org/abs/1704.01974) [properties](https://arxiv.org/abs/1706.01917).