# Parallelizing the `for` loop

Manual matrix inversion and sparsity have led to great performance gains, but more can be accomplished. Perhaps the greatest source of potential is the single `for` loop that runs through each line in the network. There is no need for one process to handle every line in series, but the way the code is currently written precludes parallelization.

I begin the re-write by deleting old versions of the temporal instanton code. One of the greatest things about version control is that you don't need to be sentimental.

The files `TemporalInstanton3` and `PowerFlow` are the only ones I currently use. Everything else contains duplicate or inferior code.

Can you parallelize a `for` loop when every iteration accesses a common variable?

In [1]:
addprocs()

4-element Array{Int64,1}:
 2
 3
 4
 5

In [2]:
workers()

4-element Array{Int64,1}:
 2
 3
 4
 5

In [3]:
@everywhere f(x) = "number $x"
x = [i for i in 1:5]
@time pmap(f,x); # parallel
@time [f(i) for i in x]; # single-core

 202.870 milliseconds (34973 allocations: 1461 KB)
   1.184 milliseconds (109 allocations: 5520 bytes)

5-element Array{Union{UTF8String,ASCIIString},1}:
 "number 1"
 "number 2"
 "number 3"
 "number 4"
 "number 5"


In [8]:
@everywhere type InstantonInput
    field1
    field2
end

In [11]:
@everywhere function analyze(x)
    println(x.field2)
    x.field1[1] + x.field1[2]
end
x = [InstantonInput([i,i+1],"$i") for i in 1:4]
results = pmap(analyze,x); # parallel
#@time [f(i) for i in x]; # single-core

1
2
3
4

4-element Array{Any,1}:
 3
 5
 7
 9


In [12]:
results

4-element Array{Any,1}:
 3
 5
 7
 9

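The parallel version works, and a custom type passes to the workers without trouble. For a task this small, though, `pmap` is much slower than the single-core comprehension and allocates far more memory, presumably because overhead dominates the trivial work. Still, this is the pattern I want: I can start with an array of empty vectors and fill each with an instanton solution.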

In [64]:
# works!
solutions = SharedArray(Int64,4)
@time @sync @parallel for i = 1:4
    solutions[i] = i;
end
solutions

  27.588 milliseconds (4927 allocations: 362 KB)

4-element SharedArray{Int64,1}:
 1
 2
 3
 4


Though I love the idea of using a SharedArray to parallelize, I shouldn't rely on an [experimental, Unix-only feature](http://julia.readthedocs.org/en/latest/manual/parallel-computing/#shared-arrays-experimental).
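One way to collect per-iteration results without shared memory is to let the parallel loop reduce its values instead of writing into an array. The block below is a minimal sketch, not the method used in this notebook. It assumes Julia 1.x, where the macro is called `@distributed` and lives in the `Distributed` standard library (in the older Julia used here the corresponding macro is `@parallel`). It also assumes workers have already been added with `addprocs()`; with no workers, the loop simply runs on the master process.

```julia
using Distributed

# Each iteration returns a value; the (vcat) reducer concatenates them,
# so no SharedArray is needed to gather the results.
solutions = @distributed (vcat) for i = 1:4
    i  # the last expression is the value passed to the reducer
end
```

The trade-off is that every result is sent back to the master process, which should be acceptable when each solution is a modest vector.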

In [65]:
# in this case, single-core is far superior
@time solutions = [i for i in 1:4]

   1.322 microseconds (1 allocation: 96 bytes)

4-element Array{Int64,1}:
 1
 2
 3
 4


In [66]:
rmprocs(workers())

:ok

## New approach

The `for` loop is one way to look at temporal instanton analysis. Another way to look at it is a function that accepts a set of matrices, solves a QCQP, and returns a solution vector. If I want to apply this function to each line, I need only build the matrices for all lines and use `pmap` to apply the instanton function to each line's set in parallel.
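The shape of that approach can be sketched as follows. This is an illustration under assumptions, not the actual instanton code. The `LineInput` type, its fields `Q` and `b`, and the function `solve_line` are hypothetical names, and `solve_line` solves a small linear system as a stand-in for the QCQP. The syntax assumes Julia 1.x (`struct` and `using Distributed`), which differs from the older Julia used in the cells above.

```julia
using Distributed

# Hypothetical container for one line's inputs; the fields are placeholders.
@everywhere struct LineInput
    Q::Matrix{Float64}
    b::Vector{Float64}
end

# Hypothetical stand-in for the instanton solver: solves Q*x = b, not a QCQP.
@everywhere solve_line(input) = input.Q \ input.b

# Build every line's inputs up front, then map the solver over them.
inputs = [LineInput([2.0 0.0; 0.0 Float64(k)], [2.0, Float64(k)]) for k in 1:4]
solutions = pmap(solve_line, inputs)  # one solution vector per line
```

Because each call receives only its own `LineInput`, no iteration touches a variable shared with another, which is what makes the loop safe to distribute.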