
Feature Request: MPI support for sequential tasks #11

Closed
hz-xiaxz opened this issue Jul 27, 2024 · 4 comments
@hz-xiaxz
Contributor

Hello! Thanks for your robust and well-documented package!
In Variational Monte Carlo tasks, one needs to perform the Monte Carlo calculations in sequence, because the variational parameters of each task depend on the MC results of the previous one. To achieve this, I hacked the job file as follows:

for _ in 1:SRsteps
    tm = TaskMaker()
    # set tm params here
    tm.g = g
    task(tm)

    dir = @__DIR__
    savepath = dir * "/../data/" * process_time *
               "/$(tm.nx)x$(tm.ny)g=$(tm.g)"
    job = JobInfo(
        savepath,
        FastFermionSampling.MC;
        tasks = make_tasks(tm),
        checkpoint_time = "30:00",
        run_time = "24:00:00"
    )

    with_logger(Carlo.default_logger()) do
        start(Carlo.SingleScheduler, job)
        # start(Carlo.MPIScheduler, job)
    end 

    update_g()
end

It all runs well with SingleScheduler, but with MPIScheduler it throws an error like running in parallel run mode but measure(::MC, ::MCContext, ::MPI.Comm) not implemented. I wonder whether MPI is simply not supported in this kind of job script, or whether it can be made to work by configuring the MPI.Comm? I still launch the job with mpirun -n 96 julia ./job.jl on the command line.
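For context, the error seems to ask for comm-aware variants of the model's methods. A hypothetical sketch of what such stubs might look like is below; the signatures follow the error message and the three-argument sweep! convention, and local_energy is an invented helper, so please check the Carlo.jl documentation for the exact interface:

```julia
# Hypothetical sketch: comm-aware method stubs for parallel run mode.
# Signatures are inferred from the error message; local_energy is a
# made-up helper standing in for a model-specific estimator.
using MPI
using Carlo

function Carlo.sweep!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    # each rank updates its own configuration independently
    sweep!(mc, ctx)
    return nothing
end

function Carlo.measure!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    # average an observable across all ranks before recording it
    local_E = local_energy(mc)   # hypothetical per-rank estimator
    E = MPI.Allreduce(local_E, +, comm) / MPI.Comm_size(comm)
    if MPI.Comm_rank(comm) == 0
        measure!(ctx, :Energy, E)
    end
    return nothing
end
```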

To take this further, would it be more elegant to add this feature to the JobTools module, e.g. by letting tasks = make_tasks(tm) support both parallel and sequential modes? I'm not sure how easy that would be to implement.

@lukas-weber
Owner

Hi,

Basically this should work. What that error means is that you have somehow ended up in parallel run mode. Usually that should not happen if you use MPIScheduler without setting the ranks_per_run option; I'll investigate it later.

That said, parallel run mode would let you do something similar: it allows you to MPI-parallelize your own section of the code. That is why it asks you to implement the versions of measure and the other methods that take a communicator. Perhaps a more elegant approach in the first place is to use parallel run mode and put the whole update loop into your code rather than into the job file:

function Carlo.sweep!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    sample_gradients_in_parallel!(mc, ctx, comm)
    if time_to_update_parameters()
        update_parameters!(mc, comm)
    end
end

The downside is that you have to write some MPI code yourself, but the result will be faster because you don't have to write everything to disk at every step.
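To illustrate the idea, sample_gradients_in_parallel! from the snippet above might look roughly like the following. This is a hypothetical sketch, not part of Carlo's API: sample_local_gradient and the mc.gradient field are invented names, and only the MPI calls (MPI.Allreduce, MPI.Comm_size) are real MPI.jl functions:

```julia
# Hypothetical sketch: each rank samples gradients locally, then the
# per-rank results are averaged across the communicator with Allreduce.
using MPI

function sample_gradients_in_parallel!(mc::MC, ctx::MCContext, comm::MPI.Comm)
    local_grad = sample_local_gradient(mc, ctx)  # invented per-rank sampler
    # sum the per-rank gradients, then normalize by the number of ranks
    grad = MPI.Allreduce(local_grad, +, comm)
    grad ./= MPI.Comm_size(comm)
    mc.gradient = grad  # store for the subsequent parameter update
    return nothing
end
```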

@hz-xiaxz
Contributor Author

Thanks for your prompt reply! I'll look into parallel run mode further.
One more question: do you mean that I should wrap updateConfiguration (the old sweep!(mc, ctx) function) and the measure! functions into the new sweep!(mc, ctx, comm) function, so that the new sweep!(mc, ctx, comm) only updates the parameters within a sweep?

@lukas-weber
Owner

lukas-weber commented Jul 28, 2024 via email

@hz-xiaxz
Contributor Author

Got it, thanks for your help! I think this issue can be closed.
