
Feature Request: MPI support for sequential tasks #11

Closed
hz-xiaxz opened this issue Jul 27, 2024 · 4 comments
@hz-xiaxz
Contributor

Hello! Thanks for your robust and well-documented package!
In Variational Monte Carlo tasks, one needs to perform the Monte Carlo calculations in sequence, because the variational parameters of each task depend on the MC results of the previous one. To achieve this, I hacked the job file as follows:

for _ in 1:SRsteps
    tm = TaskMaker()
    # set tm params here
    tm.g = g
    task(tm)

    dir = @__DIR__
    savepath = dir * "/../data/" * process_time *
               "/$(tm.nx)x$(tm.ny)g=$(tm.g)"
    job = JobInfo(
        savepath,
        FastFermionSampling.MC;
        tasks = make_tasks(tm),
        checkpoint_time = "30:00",
        run_time = "24:00:00"
    )

    with_logger(Carlo.default_logger()) do
        start(Carlo.SingleScheduler, job)
        # start(Carlo.MPIScheduler, job)
    end 

    update_g()
end

It all runs well with SingleScheduler, but with MPIScheduler it throws an error like running in parallel run mode but measure(::MC, ::MCContext, ::MPI.Comm) not implemented. I wonder whether MPI is simply not supported in this kind of job script, or whether it can be made to work by configuring the MPI.Comm? I still launch the job with mpirun -n 96 julia ./job.jl on the command line.
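For context, the error seems to ask for comm-aware variants of the model's methods. A hypothetical sketch of what such stubs might look like is below; the signatures follow the error message and the three-argument sweep! convention, and local_energy is an invented helper, so please check the Carlo.jl documentation for the exact interface:

```julia
# Hypothetical sketch: comm-aware method stubs for parallel run mode.
# Signatures are inferred from the error message; local_energy is a
# made-up helper standing in for a model-specific estimator.
using MPI
using Carlo

function Carlo.sweep!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    # each rank updates its own configuration independently
    sweep!(mc, ctx)
    return nothing
end

function Carlo.measure!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    # average an observable across all ranks before recording it
    local_E = local_energy(mc)   # hypothetical per-rank estimator
    E = MPI.Allreduce(local_E, +, comm) / MPI.Comm_size(comm)
    if MPI.Comm_rank(comm) == 0
        measure!(ctx, :Energy, E)
    end
    return nothing
end
```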

To take this further, would it be more elegant to add this feature to the JobTools module, e.g. by letting tasks = make_tasks(tm) support both parallel and sequential modes? I'm not sure how easy that would be to implement.

@lukas-weber
Owner

Hi,

Basically this should work. What that error means is that you have somehow ended up in parallel run mode. Usually that should not happen if you use MPIScheduler without setting the ranks_per_run option; I'll investigate it later.

That said, parallel run mode would let you do something similar: it allows you to MPI-parallelize your own section of the code. That is why it asks you to implement the versions of measure and the other methods that take a communicator. Perhaps a more elegant approach in the first place is to use parallel run mode and put the whole update loop into your code rather than into the job file:

function Carlo.sweep!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    sample_gradients_in_parallel!(mc, ctx, comm)
    if time_to_update_parameters()
        update_parameters!(mc, comm)
    end
end

The downside is that you have to write some MPI code yourself, but the result will be faster because you don't have to write everything to disk at every step.
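To illustrate the idea, sample_gradients_in_parallel! from the snippet above might look roughly like the following. This is a hypothetical sketch, not part of Carlo's API: sample_local_gradient and the mc.gradient field are invented names, and only the MPI calls (MPI.Allreduce, MPI.Comm_size) are real MPI.jl functions:

```julia
# Hypothetical sketch: each rank samples gradients locally, then the
# per-rank results are averaged across the communicator with Allreduce.
using MPI

function sample_gradients_in_parallel!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    local_grad = sample_local_gradient(mc, ctx)  # invented per-rank sampler
    # sum the per-rank gradients, then normalize by the number of ranks
    grad = MPI.Allreduce(local_grad, +, comm)
    grad ./= MPI.Comm_size(comm)
    mc.gradient = grad  # store for the subsequent parameter update
    return nothing
end
```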

@hz-xiaxz
Contributor Author

Thanks for your prompt reply! I'll look into parallel run mode further.
One more question: do you mean that I should wrap updateConfiguration (the old sweep!(mc, ctx) function) and the measure! functions into the new sweep!(mc, ctx, comm) function, so that the new sweep!(mc, ctx, comm) only updates the parameters within a sweep?

@lukas-weber
Owner

lukas-weber commented Jul 28, 2024 via email

@hz-xiaxz
Contributor Author

Got it, thanks for your help! I think this issue can be closed.
