# Parallelizing the `for` loop

Manual matrix inversion and sparsity have led to great performance gains, but more can be accomplished. Perhaps the greatest source of potential is the single `for` loop that runs through each line in the network. There is no need for one process to handle every line in series, but the way the code is currently written precludes parallelization.

I begin the re-write by deleting old versions of the temporal instanton code. One of the greatest things about version control is that you don't need to be sentimental.

The files `TemporalInstanton3` and `PowerFlow` are the only ones I currently use. Everything else contains duplicate or inferior code.

Can you parallelize a `for` loop when each iteration accesses a common variable?

In [15]:
addprocs(4)

4-element Array{Any,1}:
 2
 3
 4
 5

In [16]:
workers()

4-element Array{Int64,1}:
 2
 3
 4
 5

In [3]:
@everywhere f(x) = "number $x"
x = [i for i in 1:5]
@time pmap(f,x); # parallel
@time [f(i) for i in x]; # single-core

5-element Array{Union{UTF8String,ASCIIString},1}:
 "number 1"
 "number 2"
 "number 3"
 "number 4"
 "number 5"

 202.870 milliseconds (34973 allocations: 1461 KB)
   1.184 milliseconds (109 allocations: 5520 bytes)


In [8]:
@everywhere type InstantonInput
    field1
    field2
end

In [42]:
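@everywhere outside_data = 1 # workers need their own copy of this global
@everywhere function analyze(x)
    println(x.field2)
    return x.field1[1] + x.field1[2] + outside_data,x.field1
end
x = [InstantonInput([i,i+1],"$i") for i in 1:4]
# note: this destructuring binds only the first two result tuples
results1,results2 = pmap(analyze,x); # parallel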

1
2
3
4

4-element Array{Any,1}:
 (4,[1,2]) 
 (6,[2,3]) 
 (8,[3,4]) 
 (10,[4,5])


In [39]:
results

4-element Array{Any,1}:
  4
  6
  8
 10

In [20]:
x

4-element Array{InstantonInput,1}:
 InstantonInput([1,2],"1")
 InstantonInput([2,3],"2")
 InstantonInput([3,4],"3")
 InstantonInput([4,5],"4")

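The parallel version works, but for a task this small it is slower and allocates far more memory than the single-core comprehension: the overhead of handing work to other processes dominates. Still, this is the pattern I'm after. I can start with an array of empty vectors and fill each with an instanton solution.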
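A minimal sketch of that fill-in pattern, written for current Julia (1.x), where `pmap` lives in the `Distributed` standard library (on the older Julia used in this notebook the `using` line is dropped, because these functions are in Base). `toy_analyze` is a hypothetical stand-in for the instanton routine, not the real one. Since `pmap` returns one tuple per input, the two outputs are separated afterwards with comprehensions.

```julia
using Distributed

# Hypothetical stand-in for the per-line analysis: returns (score, vector).
@everywhere toy_analyze(v) = (sum(v), v)

inputs = [[i, i + 1] for i in 1:4]

# pmap returns results in input order, one tuple per input.
outputs = pmap(toy_analyze, inputs)

# Split the vector of tuples into two separate arrays.
scores  = [o[1] for o in outputs]
vectors = [o[2] for o in outputs]
```

Without a prior `addprocs` call this simply runs on the main process, which is handy for checking the logic before distributing it.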

In [17]:
# works!
solutions = SharedArray(Int64,4)
@time @sync @parallel for i = 1:4
    solutions[i] = i;
end
solutions

elapsed time: 0.593147493 seconds (9944408 bytes allocated)


4-element SharedArray{Int64,1}:
 1
 2
 3
 4

In [27]:
# works!
# solutions = SharedArray(Int64,4)
results = @time @sync @parallel (vcat) for i = 1:4
    [i;i+1],i+2
end

elapsed time: 0.711400147 seconds (12618484 bytes allocated, 3.62% gc time)


4-element Array{(Array{Int64,1},Int64),1}:
 ([1,2],3)
 ([2,3],4)
 ([3,4],5)
 ([4,5],6)

In [25]:
results

4-element Array{Int64,1}:
 1
 2
 3
 4

Though I love the idea of using a SharedArray to parallelize, I shouldn't rely on an [experimental, Unix-only feature](http://julia.readthedocs.org/en/latest/manual/parallel-computing/#shared-arrays-experimental).
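One way to avoid shared memory entirely, sketched here for current Julia (where `@parallel` has been renamed `@distributed` and moved to the `Distributed` standard library): let each iteration return its result and combine the results with a reducer. The function name `weighted_total` and its data are assumptions made up for illustration. A common read-only variable is copied to each worker, so reading it inside the loop is safe; writes to it would not propagate back to the main process.

```julia
using Distributed

# `common` is only read inside the loop; each worker receives its own copy.
function weighted_total(common)
    # With a reducer, @distributed blocks and returns the combined value.
    @distributed (+) for i in 1:length(common)
        common[i] * i
    end
end

total = weighted_total([10, 20, 30, 40])
```

Calling `addprocs(4)` beforehand spreads the iterations over worker processes; without it the loop runs on the main process only.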

In [65]:
# in this case, single-core is far superior
@time solutions = [i for i in 1:4]

  

4-element Array{Int64,1}:
 1
 2
 3
 4

 1.322 microseconds (1 allocation: 96 bytes)


In [66]:
rmprocs(workers())

:ok

## New approach

The `for` loop is one way to look at temporal instanton analysis. Another is to view it as a function that accepts a set of matrices, solves a QCQP, and returns a solution vector. To apply this function to every line, I need only build the matrices for all lines and use `pmap` to apply the instanton function to each line's set in parallel.
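The shape of that idea can be sketched as follows, again in current Julia syntax. Everything here is hypothetical: `LineProblem` is a made-up container for one line's matrices, and `solve_line` performs a plain linear solve as a placeholder, not the QCQP solve used in the actual analysis.

```julia
using Distributed

@everywhere begin
    using LinearAlgebra

    # Hypothetical container for one line's matrices.
    struct LineProblem
        Q::Matrix{Float64}
        b::Vector{Float64}
    end

    # Stand-in solver: a plain linear solve, NOT the QCQP from the analysis.
    solve_line(p::LineProblem) = p.Q \ p.b
end

# Build every line's inputs first, then map the solver over them.
problems = [LineProblem([2.0 0.0; 0.0 4.0], [2.0 * i, 4.0 * i]) for i in 1:3]
solutions = pmap(solve_line, problems)
```

Each element of `solutions` is the solution vector for the corresponding entry of `problems`, so per-line results stay aligned with the line ordering.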

In [28]:
using MatpowerCases
mpc = loadcase("case96")
# vector of bus names (e.g. 101, 102, ...)
bus_names = round(Int64,mpc["bus"][:,1])

# vector of bus voltages (e.g. 138, 138, 230, ...)
bus_voltages = mpc["bus"][:,10]

# vector of "from" nodes
from = round(Int64,mpc["branch"][:,1])

# vector of "to" nodes
to = round(Int64,mpc["branch"][:,2])

""" Use voltage information to assign a conductor name to 
each line of the RTS-96.
"""
function return_line_conductors(bus_names,bus_voltages,from,to)
    numLines = length(from)
    node2voltage(node) = bus_voltages[find(bus_names.==node)][1]
    volt2cond(volt) = volt < 300 ? "waxwing" : "dove"
    line_voltages = Array(Float64,0)

    for i in 1:numLines
        Vfrom = node2voltage(from[i])
        Vto = node2voltage(to[i])
        Vline = max(Vfrom,Vto)
        push!(line_voltages, Vline)
    end
    line_conductors = [volt2cond(volt) for volt in line_voltages]
end

line_conductors = return_line_conductors(bus_names,bus_voltages,from,to);

  Please email questions, comments, and significant results to: Robert.C.Green@gmail.com.  Thanks!


In [29]:
# Data loading and manipulation:
using HDF5, JLD
include("../src/tmp_inst_rts96.jl")

# Analysis:
include("../src/TemporalInstanton.jl")
using TemporalInstanton



In [3]:
####### LOAD DATA ########
psData = psDataLoad()

# unpack psDL (boilerplate):
(Sb,f,t,r,x,b,Y,bustype,
Gp,Gq,Dp,Dq,Rp,Rq,
Pmax,Pmin,Qmax,Qmin,Plim,
Vg,Vceiling,Vfloor,
busIdx,N,Nr,Ng,k) = unpack_psDL(psData)

Sb = 100e6 #overwrite "100.0"

res = r
reac = x

####### LINK DATA ########
# Static
Ridx = find(Rp) # Vector of renewable nodes
Y = full(Y) # Full admittance matrix (ref not removed)
ref = 1 # Index of ref node
k = k # Conventional generator participation factors
lines = [(f[i],t[i]) for i in 1:length(f)];
line_lengths = load("../data/RTS-96\ Data/line_lengths.jld", "line_lengths")

# Thermal model parameters:
Tamb = 35. # C
T0 = 60. #46. # initial line steady-state temp
time_intervals = 3 # thirty minutes
time_values = 0:60:600 # ten minutes in 1-min steps
int_length = 600. # seconds

# Generation, demand, and wind gen forecast:
G0 = [0.7*Gp;0.7*Gp;0.7*Gp;0.7*Gp;0.7*Gp;0.7*Gp]
D0 = [0.9*Dp;0.9*Dp;0.9*Dp;0.9*Dp;0.9*Dp;0.9*Dp]
P0 = [Rp;1.1*Rp;1.2*Rp;1.3*Rp;1.4*Rp;1.5*Rp]

n = length(k)
nr = length(Ridx)
T = round(Int64,length(find(P0))/nr)
numLines = length(lines)

# Form objective quadratic:
Qobj = tmp_inst_Qobj(n,nr,T; pad=true)
G_of_x = (Qobj,0,0)

# Create A1 (only A2, the bottom part,
# changes during line loop):
A1 = tmp_inst_A1(Ridx,T,Y,ref,k; pad=true)

b = tmp_inst_b(n,T,G0,P0,D0; pad=true)
Qtheta = tmp_inst_Qtheta(n,nr,T)


println("loaded")

loaded


In [4]:
nz_line_idx = find(line_lengths.!=0);

globals =     
    (Ridx,
    Y,
    G0,
    P0,
    D0,
    Sb,
    ref,
    lines,
    res,
    reac,
    k,
    line_lengths,
    line_conductors,
    Tamb,
    T0,
    int_length,
    Qobj,
    G_of_x,
    A1,
    b,
    Qtheta);

([2,14,16,17,18,19,20,21,23,26,36,37,43,44,47,49,54,57],
73x73 Array{Float64,2}:
  87.9326   -71.4286   -4.73934   0.0      …     0.0       0.0        0.0  
 -71.4286    84.5109    0.0      -7.87402        0.0       0.0        0.0  
  -4.73934    0.0      25.0475    0.0            0.0       0.0        0.0  
   0.0       -7.87402   0.0      17.4894         0.0       0.0        0.0  
 -11.7647     0.0       0.0       0.0            0.0       0.0        0.0  
   0.0       -5.20833   0.0       0.0      …     0.0       0.0        0.0  
   0.0        0.0       0.0       0.0            0.0       0.0        0.0  
   0.0        0.0       0.0       0.0            0.0       0.0        0.0  
   0.0        0.0      -8.40336  -9.61538        0.0       0.0        0.0  
   0.0        0.0       0.0       0.0            0.0       0.0        0.0  
   0.0        0.0       0.0       0.0      …     0.0       0.0        0.0  
   0.0        0.0       0.0       0.0            0.0       0.0        0.0  
   ⋮

In [3]:
addprocs(4)

4-element Array{Any,1}:
 2
 3
 4
 5

In [6]:
rmprocs(workers())

:ok

In [7]:
workers()

1-element Array{Int64,1}:
 1

In [5]:
analyze_line(1,globals)

([-7.75122e-6,-4.63109e-9,1.87264e-7,2.13588e-7,2.25849e-7,1.0646e-7,3.61966e-8,2.37234e-7,-2.44852e-9,-5.93067e-8  …  0.408612,0.295896,0.417572,-17.4254,7.2732e-6,1.91838e-5,5.06901e-5,0.000135788,0.000404556,0.0147062],4.252047354025929)

## Winning approach

I tried the `pmap` approach and found it rather awkward: distributing "global" variables to the workers was difficult, and I kept getting errors along the lines of "x is undefined". So I went back to the parallel `for` loop.
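Errors of this kind usually arise because a variable defined on the main process does not exist on the workers. For reference, here is a sketch (in current Julia syntax) of one common way around them, should `pmap` be revisited: pass the shared data as an explicit argument through a closure instead of relying on globals. The names `toy_line_score`, `score_all_lines` and `shared` are assumptions for illustration; `shared` plays the role of the `globals` tuple above.

```julia
using Distributed

# Hypothetical per-line routine: all data arrives through its arguments,
# so nothing has to exist as a global on the workers.
@everywhere toy_line_score(i, shared) = shared.offset + shared.weights[i]

function score_all_lines(shared)
    # The closure captures `shared`, so it is serialized along with each task.
    pmap(i -> toy_line_score(i, shared), 1:length(shared.weights))
end

# A NamedTuple standing in for the notebook's `globals` tuple.
shared = (offset = 10.0, weights = [1.0, 2.0, 3.0])
line_scores = score_all_lines(shared)
```

Only the function itself needs `@everywhere`; the data travels with the closure, which mirrors the `analyze_line(1,globals)` calling convention used earlier.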

