
Train model with Asynchronous mode #492

Closed
anavarro01 opened this issue Nov 25, 2021 · 21 comments

@anavarro01

Hi!
I'm trying to train my model with the Asynchronous mode, based on the docs.
At the first lines of the script, I'm running this:

dir_to_enviromentl = Base.active_project()
using Distributed
Distributed.addprocs(5)
@everywhere begin
    import Pkg
    Pkg.activate(dir_to_enviromentl)
    using Gurobi, SDDP
end

Notice that I've added the statement "using Gurobi, SDDP" because I found it in another post.
I am creating my model this way:

graph = SDDP.LinearGraph(10)
gurobi_env = Gurobi.Env()
model = SDDP.PolicyGraph(
            graph,
            sense = :Min,
            optimizer = optimizer_with_attributes(() -> Gurobi.Optimizer(gurobi_env))) do sp, t
end

When I try to train my model:

SDDP.train(model; iteration_limit = 10, print_level = 1, add_to_existing_cuts = true, parallel_scheme = SDDP.Asynchronous())

I can only see output from worker 1, and nothing from the other workers. Also, at the end of the training iterations, I get the following error:

ERROR: LoadError: On worker 2:
Gurobi Error 10002:
trap_error at /home/agnavarro/.julia/packages/SDDP/Cp4Bp/src/plugins/parallel_schemes.jl:159
slave_loop at /home/agnavarro/.julia/packages/SDDP/Cp4Bp/src/plugins/parallel_schemes.jl:155
#103 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:290
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:79
run_work_thunk at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Distributed/src/process_messages.jl:88
#96 at ./task.jl:356

I can't figure out where the problem is.

Thanks in advance!

@odow
Owner

odow commented Nov 25, 2021

You needed to read a little further down the page to use Gurobi:
https://odow.github.io/SDDP.jl/latest/guides/improve_computational_performance/#Initialization-hooks
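
For reference, a minimal sketch of that initialization-hook pattern: pass SDDP.Asynchronous a callback that builds a fresh Gurobi.Env on each worker, instead of shipping the master's environment across processes.

# Sketch only: each worker attaches its own Gurobi environment to every subproblem.
parallel_scheme = SDDP.Asynchronous() do m
    env = Gurobi.Env()
    for node in values(m.nodes)
        set_optimizer(node.subproblem, () -> Gurobi.Optimizer(env))
        set_silent(node.subproblem)
    end
end
SDDP.train(model; iteration_limit = 10, parallel_scheme = parallel_scheme)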

@anavarro01
Author

Thanks, it worked perfectly.

@odow odow closed this as completed Nov 27, 2021
@anavarro01
Author

Hi,

Sorry for reopening this issue, but I have a question and opening a new issue might be unnecessary.
I'm training my model in asynchronous mode, and I get this output:

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    3.664092e+07   7.371414e+05   4.740712e+02          1       2484
        2    1.954982e+07   9.953921e+06   7.951662e+02          1       4680
        3    4.591550e+07   9.953921e+06   8.032407e+02          2       4740
        4    4.866310e+07   9.953921e+06   8.118478e+02          3       4800
        5    4.878851e+07   9.953921e+06   8.195227e+02          4       4860
        6    4.355507e+07   9.953921e+06   8.272206e+02          5       4920
        7    1.969249e+07   1.026331e+07   1.138915e+03          1       7116
        8    1.469963e+07   1.026350e+07   1.458276e+03          1       9312
        9    1.564037e+07   1.200186e+07   1.808019e+03          1      11508
       10    1.341304e+07   1.203708e+07   2.155772e+03          1      13704
       11    1.816808e+07   1.205720e+07   2.490398e+03          1      15900

It seems that it uses the 5 workers just once, and trains only with one of them for the rest of the iterations. I've trained the model with 30 iterations, and only the first five used anything other than the first worker. The docs say "SDDP.jl will start in serial mode while the initialization takes place. Therefore, in the log you will see that the initial iterations take place on the master thread (Proc. ID = 1), and it is only after while that the solve switches to full parallelism", but I'm getting something different.
Is this working OK, or is something strange going on?

Thanks in advance!

@odow
Owner

odow commented Dec 6, 2021

Why does it take 474 seconds to do one iteration? What machine is this on? It looks like you're probably running out of RAM, and so the other 4 workers are actually slower than just running in serial mode.

@odow odow reopened this Dec 6, 2021
@anavarro01
Author

I don't know why it takes 474 seconds to do one iteration. The problem is a big hydro-thermal scheduling model, with 56 stages (representing about 9 years), more than 200 buses, and 150 nodes in the water network.
I don't think I'm running out of RAM. I'm running on an Intel Xeon E5-2630 with 40 cores and 64GB of RAM. Looking at my dashboard, each worker uses no more than 5GB of RAM, and the main process 8GB.

@odow
Owner

odow commented Dec 6, 2021

The parallel scheme is not well optimized, and it requires a lot of data movement between the processors. I assume for this model the set-up and data movement overhead outweighs the benefit of running in parallel. How do the times look if you run in serial?
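
For example, the same train call in serial mode (SDDP.Serial() is the default, so omitting parallel_scheme gives the same behaviour):

SDDP.train(model; iteration_limit = 10, parallel_scheme = SDDP.Serial())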

@anavarro01
Author

This is the output in serial mode:

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    4.379315e+07   6.238694e+05   5.137349e+02          1       2196
        2    2.398887e+07   1.024725e+07   9.844802e+02          1       4392
        3    1.365226e+07   1.025685e+07   1.464048e+03          1       6588
        4    2.005606e+07   1.188252e+07   1.904386e+03          1       8784
        5    1.751424e+07   1.188255e+07   2.471620e+03          1      10980
        6    1.362879e+07   1.259155e+07   2.880270e+03          1      13176
        7    1.396052e+07   1.261973e+07   3.248855e+03          1      15372
        8    1.366801e+07   1.261982e+07   3.658474e+03          1      17568
        9    1.412258e+07   1.262390e+07   4.027489e+03          1      19764
       10    1.476644e+07   1.262392e+07   4.426796e+03          1      21960
       11    1.387798e+07   1.298193e+07   4.769414e+03          1      24156
       12    1.344110e+07   1.299375e+07   5.248231e+03          1      26352
       13    1.444331e+07   1.299466e+07   5.842923e+03          1      28548
       14    1.375559e+07   1.336322e+07   6.328565e+03          1      30744
       15    1.335609e+07   1.336351e+07   6.852967e+03          1      32940
       16    1.357505e+07   1.340325e+07   7.329363e+03          1      35136
       17    1.329446e+07   1.340336e+07   7.866351e+03          1      37332
       18    1.822702e+07   1.340367e+07   8.441247e+03          1      39528
       19    1.401540e+07   1.340401e+07   8.839535e+03          1      41724
       20    1.349358e+07   1.340825e+07   9.145367e+03          1      43920

It converges in fewer iterations, as we would expect. Per iteration it is a little slower (compared with the thread-1 iterations in asynchronous mode).
Also, I've tested the asynchronous mode with 40 iterations, and it used each worker just once and ran all the other iterations on thread 1.

@odow
Owner

odow commented Dec 7, 2021

There's something not quite right about your model. Does it really take 200ms per LP solve? What solver are you using? Is it an LP? Can you provide the full log from SDDP.jl (including all the stuff at the top)?

@anavarro01
Author

I'm using Gurobi 9.1 and it's an LP. The full SDDP log is this:

------------------------------------------------------------------------------
                      SDDP.jl (c) Oscar Dowson, 2017-21

Problem
  Nodes           : 36
  State variables : 25
  Scenarios       : 1.03144e+64
  Existing cuts   : false
  Subproblem structure                                              : (min, max)
    Variables                                                       : (21107, 21107)
    GenericAffExpr{Float64,VariableRef} in MOI.GreaterThan{Float64} : (3352, 3354)
    VariableRef in MOI.LessThan{Float64}                            : (1, 1)
    GenericAffExpr{Float64,VariableRef} in MOI.Interval{Float64}    : (5585, 5585)
    VariableRef in MOI.GreaterThan{Float64}                         : (19568, 19568)
    GenericAffExpr{Float64,VariableRef} in MOI.LessThan{Float64}    : (5050, 5051)
    GenericAffExpr{Float64,VariableRef} in MOI.EqualTo{Float64}     : (6506, 6516)
Options
  Solver          : serial mode
  Risk measure    : SDDP.Expectation()
  Sampling scheme : SDDP.InSampleMonteCarlo

Numerical stability report
  Non-zero Matrix range     [8e-05, 1e+03]
  Non-zero Objective range  [9e-02, 1e+07]
  Non-zero Bounds range     [5e+05, 5e+05]
  Non-zero RHS range        [1e-01, 4e+04]
WARNING: numerical stability issues detected
  - Matrix range contains small coefficients
Very large or small absolute values of coefficients
can cause numerical stability issues. Consider
reformulating the model.

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    4.379315e+07   6.238694e+05   5.137349e+02          1       2196
        2    2.398887e+07   1.024725e+07   9.844802e+02          1       4392
        3    1.365226e+07   1.025685e+07   1.464048e+03          1       6588
        4    2.005606e+07   1.188252e+07   1.904386e+03          1       8784
        5    1.751424e+07   1.188255e+07   2.471620e+03          1      10980
        6    1.362879e+07   1.259155e+07   2.880270e+03          1      13176
        7    1.396052e+07   1.261973e+07   3.248855e+03          1      15372
        8    1.366801e+07   1.261982e+07   3.658474e+03          1      17568
        9    1.412258e+07   1.262390e+07   4.027489e+03          1      19764
       10    1.476644e+07   1.262392e+07   4.426796e+03          1      21960
       11    1.387798e+07   1.298193e+07   4.769414e+03          1      24156
       12    1.344110e+07   1.299375e+07   5.248231e+03          1      26352
       13    1.444331e+07   1.299466e+07   5.842923e+03          1      28548
       14    1.375559e+07   1.336322e+07   6.328565e+03          1      30744
       15    1.335609e+07   1.336351e+07   6.852967e+03          1      32940
       16    1.357505e+07   1.340325e+07   7.329363e+03          1      35136
       17    1.329446e+07   1.340336e+07   7.866351e+03          1      37332
       18    1.822702e+07   1.340367e+07   8.441247e+03          1      39528
       19    1.401540e+07   1.340401e+07   8.839535e+03          1      41724
       20    1.349358e+07   1.340825e+07   9.145367e+03          1      43920

Terminating training
  Status         : iteration_limit
  Total time (s) : 9.145367e+03
  Total solves   : 43920
  Best bound     :  1.340825e+07
  Simulation CI  :  1.653153e+07 ± 3.064649e+06
------------------------------------------------------------------------------

I'm using the Gurobi parameter NumericFocus = 3.
I suspect that the warning could be the problem, but I'm not sure.

@odow
Owner

odow commented Dec 7, 2021

So a few things:

  • Setting the numeric focus will slow things down a lot
  • What are the 1e-5 terms in your constraints? You're running into numerical issues because you have big terms and small terms in the same problem. That can lead to all sorts of trouble. Consider reformulating your model to accept less accuracy in some variables (you don't need to measure water in a reservoir with m^3, for example; use million m^3 instead).
  • You have a large number of variables in each subproblem. Are they contributing to the model in a useful way? Consider simplifications to the network.
  • Consider adding realistic upper bounds to your variables. This can help quite a bit (a small sketch of the rescaling and bounds ideas follows below).
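
For the last two points, a hypothetical sketch (the set GENERATORS, the data gen_max, and the numbers are made up; the lines would live inside the subproblem builder, do sp, t ... end): measure reservoir storage in millions of m^3 so the coefficients stay near 1.0, and give generation a finite, realistic bound.

# Illustrative only: rescaled storage and explicitly bounded generation.
@variable(sp, 0 <= volume <= 1_000, SDDP.State, initial_value = 300)  # storage in 1e6 m^3, not m^3
@variable(sp, 0 <= generation[g in GENERATORS] <= gen_max[g])         # finite upper bound in MW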

@anavarro01
Author

Thanks for the answer.

  • I've tested not using the numeric focus and the time is only marginally better.
  • I've fixed the 1e-5 terms in my constraints. They came from the water network of my model. The new model has matrix values between 1e-3 and 1e+3. I'm testing the times for this case.
  • Yes, I have a large number of variables in each subproblem. I have another case with a simplified network, but for this case I need a detailed model of the power and hydro system.
  • About the upper bounds of the variables, do you mean adding them like this? thermal_generation[g in SetGens], lower_bound = 0, upper_bound = GenMax, start = 0)? Currently I'm using just the lower bound. If I add the upper bound, can the model be faster?

Also, I've tested the asynchronous mode with a model slightly modified from the original one (note this in the matrix range). I know that I have a better model in terms of range right now, but I'm still running this one.
The thing is that SDDP uses the 4 extra workers just once (with redundant cuts, I think) and never uses them again. I was watching the CPU usage of each worker, and the extra workers barely used more than 5% of a core after the 6th iteration (when the solver used them). In contrast, the main worker always used around 90 or 100% of a core. In terms of RAM usage, the main worker used about 8GB and the rest around 3 or 4GB (there was about 30GB of free RAM at that moment).
Here is the output.

                      SDDP.jl (c) Oscar Dowson, 2017-21

Problem
  Nodes           : 36
  State variables : 25
  Scenarios       : 1.03144e+64
  Existing cuts   : false
  Subproblem structure                                              : (min, max)
    Variables                                                       : (21107, 21107)
    GenericAffExpr{Float64,VariableRef} in MOI.GreaterThan{Float64} : (3352, 3354)
    VariableRef in MOI.LessThan{Float64}                            : (1, 1)
    GenericAffExpr{Float64,VariableRef} in MOI.Interval{Float64}    : (5585, 5585)
    VariableRef in MOI.GreaterThan{Float64}                         : (19568, 19568)
    GenericAffExpr{Float64,VariableRef} in MOI.LessThan{Float64}    : (5050, 5051)
    GenericAffExpr{Float64,VariableRef} in MOI.EqualTo{Float64}     : (6506, 6516)
Options
  Solver          : Asynchronous mode with 4 workers.
  Risk measure    : SDDP.Expectation()
  Sampling scheme : SDDP.InSampleMonteCarlo

Numerical stability report
  Non-zero Matrix range     [7e-04, 1e+04]
  Non-zero Objective range  [9e-02, 1e+06]
  Non-zero Bounds range     [5e+05, 5e+05]
  Non-zero RHS range        [1e-01, 4e+04]
No problems detected

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    5.423759e+07   8.267030e+05   4.416711e+02          1       2196
        2    2.940131e+07   8.782821e+06   7.309163e+02          1       4392
        3    5.217217e+07   8.782821e+06   7.387564e+02          2       4452
        4    4.352296e+07   8.782821e+06   7.465079e+02          3       4512
        5    5.251501e+07   8.782821e+06   7.541364e+02          5       4572
        6    4.686811e+07   8.782821e+06   7.621236e+02          4       4632
        7    1.719751e+07   1.037191e+07   1.096181e+03          1       6828
        8    1.844196e+07   1.037192e+07   1.422216e+03          1       9024
        9    1.547676e+07   1.118443e+07   1.725801e+03          1      11220
       10    1.574095e+07   1.118467e+07   2.099916e+03          1      13416
       11    1.661916e+07   1.118481e+07   2.460442e+03          1      15612
       12    1.545823e+07   1.118489e+07   2.782426e+03          1      17808
       13    1.357699e+07   1.271447e+07   3.092544e+03          1      20004
       14    1.477563e+07   1.271460e+07   3.466616e+03          1      22200
       15    1.953569e+07   1.271463e+07   3.803677e+03          1      24396
       16    1.790396e+07   1.274343e+07   4.153817e+03          1      26592
       17    1.501449e+07   1.274408e+07   4.489968e+03          1      28788
       18    1.861698e+07   1.274434e+07   4.840596e+03          1      30984
       19    1.214280e+07   1.284335e+07   5.198810e+03          1      33180
       20    1.743427e+07   1.284566e+07   5.553337e+03          1      35376
       21    1.358681e+07   1.287126e+07   5.906797e+03          1      37572
       22    1.456575e+07   1.287127e+07   6.252890e+03          1      39768
       23    1.869709e+07   1.298196e+07   6.575987e+03          1      41964
       24    1.438251e+07   1.298244e+07   6.954527e+03          1      44160
       25    1.394985e+07   1.298285e+07   7.338875e+03          1      46356
       26    1.427660e+07   1.298287e+07   7.682910e+03          1      48552
       27    1.350326e+07   1.298307e+07   8.120624e+03          1      50748
       28    1.367676e+07   1.308422e+07   8.451969e+03          1      52944
       29    1.396336e+07   1.308630e+07   8.860270e+03          1      55140
       30    1.664874e+07   1.308645e+07   9.281819e+03          1      57336

Terminating training
  Status         : iteration_limit
  Total time (s) : 9.281819e+03
  Total solves   : 57336
  Best bound     :  1.308645e+07
  Simulation CI  :  2.179678e+07 ± 4.737844e+06
------------------------------------------------------------------------------

I know that the time is slow, but the strange thing is that it's not using the other workers again.

Thanks for everything!

@odow
Owner

odow commented Dec 7, 2021

There's probably something to do with the channels to the remote processes disconnecting due to the time it takes to receive.

Unfortunately, I don't have the time to look into this in any detail. (What institution are you with?)

Here's the majority of the parallel code. It'd take some digging to find the problem:

function master_loop(
    async::Asynchronous,
    model::PolicyGraph{T},
    options::Options,
) where {T}
    # Initialize the remote channels. There are two types:
    # 1) updates: master -> slaves[i]: a unique channel for each slave, which
    #    is used to distribute results found by other slaves.
    # 2) results: slaves -> master: a channel which slaves collectively push to
    #    to feed the master new results.
    updates = Dict(
        pid => Distributed.RemoteChannel(
            () -> Channel{IterationResult{T}}(Inf),
        ) for pid in async.slave_ids
    )
    results = Distributed.RemoteChannel(() -> Channel{IterationResult{T}}(Inf))
    futures = Distributed.Future[]
    _uninitialize_solver(model; throw_error = true)
    for pid in async.slave_ids
        let model_pid = model, options_pid = options
            f = Distributed.remotecall(
                slave_loop,
                pid,
                async,
                model_pid,
                options_pid,
                updates[pid],
                results,
            )
            push!(futures, f)
        end
    end
    _initialize_solver(model; throw_error = true)
    while true
        # Starting workers has a high overhead. We have to copy the models across, and then
        # precompile all the methods on every process :(. While that's happening, let's
        # start running iterations on master. It has the added benefit that if the master
        # is ever idle waiting for a result from a slave, it will do some useful work :).
        #
        # It also means that Asynchronous() can be our default setting, since if there are
        # no workers, there should be no overhead, _and_ this inner loop is just the serial
        # implementation anyway.
        while async.use_master && !isready(results)
            result = iteration(model, options)
            for (_, ch) in updates
                put!(ch, result)
            end
            log_iteration(options)
            if result.has_converged
                close(results)
                wait.(futures)
                return result.status
            end
        end
        while !isready(results)
            sleep(1.0)
        end
        # We'll only reach here if isready(results) == true, so we won't hang waiting for a
        # new result on take!. After we receive a new result from a slave, there are a few
        # things to do:
        # 1) send the result to the other slaves
        # 2) update the master problem with the new cuts
        # 3) compute the revised bound, update the log, and print to screen
        # 4) test for convergence (e.g., bound stalling, time limit, iteration limit)
        # 5) Exit, killing the running task on the workers.
        result = take!(results)
        for pid in async.slave_ids
            if pid != result.pid
                put!(updates[pid], result)
            end
        end
        slave_update(model, result)
        bound = calculate_bound(model)
        push!(
            options.log,
            Log(
                length(options.log) + 1,
                bound,
                result.cumulative_value,
                time() - options.start_time,
                result.pid,
                model.ext[:total_solves],
                duality_log_key(options.duality_handler),
            ),
        )
        log_iteration(options)
        has_converged, status =
            convergence_test(model, options.log, options.stopping_rules)
        if has_converged
            close(results)
            wait.(futures)
            return status
        end
    end
    return
end

@anavarro01
Author

Thanks for the answer.
I'm working with Pontificia Universidad Católica de Chile.
I will look at the parallel code to see what I can find. I will check whether the processes get disconnected because of the time it takes to solve the subproblems.

Thanks!

@odow
Owner

odow commented Dec 9, 2021

Ah so this is for the new Chilean model?

Try the undocumented option SDDP.Asynchronous(use_master = false)

parallel_scheme = SDDP.Asynchronous(use_master = false) do m
    env = Gurobi.Env()
    for node in values(m.nodes)
        set_optimizer(node.subproblem, () -> Gurobi.Optimizer(env))
        set_silent(node.subproblem)
    end
end
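
The scheme is then passed to training via the parallel_scheme keyword, as in your earlier call, e.g.:

SDDP.train(model; iteration_limit = 30, parallel_scheme = parallel_scheme)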

@anavarro01
Author

Yes, I'm working with the Chilean model.
I've just tested the option with use_master = false and it worked! Now all the workers generate new cuts during the iterations. That said, I think each cut is not as effective as before, but I will test the time and convergence to be sure about this.
The output of the training is this:

------------------------------------------------------------------------------
                      SDDP.jl (c) Oscar Dowson, 2017-21

Problem
  Nodes           : 36
  State variables : 26
  Scenarios       : 1.03144e+64
  Existing cuts   : false
  Subproblem structure                                              : (min, max)
    Variables                                                       : (21109, 21109)
    GenericAffExpr{Float64,VariableRef} in MOI.GreaterThan{Float64} : (3352, 3354)
    VariableRef in MOI.LessThan{Float64}                            : (1, 1)
    GenericAffExpr{Float64,VariableRef} in MOI.Interval{Float64}    : (5585, 5585)
    VariableRef in MOI.GreaterThan{Float64}                         : (19568, 19568)
    GenericAffExpr{Float64,VariableRef} in MOI.LessThan{Float64}    : (5050, 5051)
    GenericAffExpr{Float64,VariableRef} in MOI.EqualTo{Float64}     : (6507, 6517)
Options
  Solver          : Asynchronous mode with 7 workers.
  Risk measure    : SDDP.Expectation()
  Sampling scheme : SDDP.InSampleMonteCarlo

Numerical stability report
  Non-zero Matrix range     [1e-03, 1e+03]
  Non-zero Objective range  [9e-02, 1e+06]
  Non-zero Bounds range     [5e+05, 5e+05]
  Non-zero RHS range        [1e-02, 4e+04]
No problems detected

 Iteration    Simulation       Bound         Time (s)    Proc. ID   # Solves
        1    1.104024e+08   6.981477e+05   5.127468e+02          3         60
        2    1.067315e+08   5.849218e+05   5.284406e+02          2        120
        3    9.880152e+07   6.981528e+05   5.483199e+02          4        180
        4    1.176214e+08   5.849218e+05   5.630537e+02          5        240
        5    1.101718e+08   5.849218e+05   5.796307e+02          6        300
        6    1.051674e+08   5.849218e+05   5.951814e+02          7        360
        7    1.037979e+08   6.981528e+05   6.077584e+02          8        420
        8    5.060291e+07   5.755976e+06   9.617168e+02          3        480
        9    5.258060e+07   5.788610e+06   9.809755e+02          2        540
       10    5.360229e+07   5.851016e+06   1.007858e+03          5        600
       11    5.022065e+07   5.851016e+06   1.019422e+03          6        660
       12    4.750682e+07   5.851016e+06   1.030757e+03          4        720
       13    5.085912e+07   5.851016e+06   1.042366e+03          7        780
       14    4.859930e+07   5.851016e+06   1.079381e+03          8        840
       15    1.957667e+07   5.851016e+06   1.392243e+03          3        900
       16    2.180245e+07   5.851016e+06   1.403444e+03          2        960
       17    2.287774e+07   5.912278e+06   1.415303e+03          5       1020
       18    1.647738e+07   5.912278e+06   1.426319e+03          6       1080
       19    2.357789e+07   6.048069e+06   1.436571e+03          7       1140
       20    2.668270e+07   6.048069e+06   1.447359e+03          4       1200
       21    2.818930e+07   6.048069e+06   1.490012e+03          8       1260
       22    1.360967e+07   6.048069e+06   1.774825e+03          3       1320
       23    1.069678e+07   6.048069e+06   1.801416e+03          2       1380
       24    1.635207e+07   7.168963e+06   1.819407e+03          6       1440
       25    1.477921e+07   7.168963e+06   1.837739e+03          5       1500
       26    2.053416e+07   7.168963e+06   1.850033e+03          7       1560
       27    2.138778e+07   7.168963e+06   1.867305e+03          4       1620
       28    1.942672e+07   7.168963e+06   1.909715e+03          8       1680
       29    1.574227e+07   7.168963e+06   2.190866e+03          3       1740
       30    1.549916e+07   7.168963e+06   2.224452e+03          2       1800

Terminating training
  Status         : iteration_limit
  Total time (s) : 2.224452e+03
  Total solves   : 1800
  Best bound     :  7.168963e+06
  Simulation CI  :  4.712925e+07 ± 1.306993e+07
------------------------------------------------------------------------------

Thanks!

@odow
Owner

odow commented Dec 9, 2021

Okay. I obviously have some scheduling issues switching between the serial and parallel modes.

I think each cut is not as effective as before

Yes. This is true. You'll need more cuts to achieve the same bound compared with serial mode.

@anavarro01
Author

I've been testing, and with use_master = false it's working really well.
I used a model with simplified transmission losses, and the asynchronous mode (with 3 extra workers) needs about 55 iterations to pass the convergence test. The same model in synchronous mode needs about 44 iterations, but takes roughly double the time of the asynchronous case.

Thanks!

@odow
Owner

odow commented Dec 10, 2021

Okay, great that it's working. At some point I'll look into the master scheduling issue.

Be very careful with how you measure convergence. If you have 26 state variables and 36 stages, you're likely going to need hundreds or thousands of iterations. Read #178.

@anavarro01
Author

Actually, based on #178, I'm using SDDP.Statistical with a 2.5% confidence interval to define the convergence of the model. The model converges with around 50 or 55 training iterations with 3 extra workers, so I use 60 right now. By the way, it needs far fewer iterations when I use a reasonable lower bound for the objective function.
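
For context, a sketch of attaching such a stopping rule (the keyword names and the replication count here are assumptions; check the SDDP.Statistical docstring for the exact signature in your version):

SDDP.train(
    model;
    iteration_limit = 60,
    parallel_scheme = parallel_scheme,  # the Asynchronous(use_master = false) scheme from above
    stopping_rules = [SDDP.Statistical(; num_replications = 100)],
)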

@odow
Owner

odow commented Dec 11, 2021

That suggests the myopic policy (do the best now, ignore the future) is near optimal.

I'd still try a run with a lot more iterations (like 500) and compare plots of the two policies.

@odow
Owner

odow commented Feb 10, 2022

Closing because there doesn't seem to be anything actionable here. I'm aware that the parallel scheme needs work, but I don't have any concrete plans to work on it.

If, future reader, this is important for you, I'm available for paid consulting.

@odow odow closed this as completed Feb 10, 2022