
Parallelism in the vertical #20

Closed
milankl opened this issue Dec 14, 2021 · 5 comments · Fixed by #264
Assignees
Labels
parallel 🐎 Things being computed in parallel performance 🚀 Faster faster! vertical ⬆️ Affecting the vertical dimension

Comments

milankl commented Dec 14, 2021

One easy way to parallelise speedy might be to distribute the calculation of the spectral transform across n workers in the vertical, using SharedArrays (documentation here) from Julia's standard library. While this limits us to n× speedups from parallelisation, that might be fine as SpeedyWeather.jl will probably only run on small clusters anyway. For T30 with n=8 levels we could then (hopefully) run efficiently on 8 cores, and for n=48 or 64 levels we could get significant speedups for higher-resolution versions of SpeedyWeather.jl (T100-T500). Given the shared memory of this approach, we'll be limited by the 48 cores on A64FX, but that might be absolutely sufficient for now.
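A minimal sketch of the idea, using only the Distributed and SharedArrays standard libraries. The names `nlev`, `n`, and the per-level kernel are illustrative placeholders, not SpeedyWeather.jl API:

```julia
using Distributed
addprocs(4)                             # spawn 4 worker processes
@everywhere using SharedArrays         # load on master and all workers

nlev, n = 8, 16                         # vertical levels, points per level
field = SharedArray{Float32}(n, nlev)   # memory visible to all processes

@sync @distributed for k in 1:nlev      # one vertical level per worker
    field[:, k] .= Float32(k)           # stand-in for a per-level spectral transform
end
```

Because all processes see the same `SharedArray`, no data has to be copied between workers, which is what restricts this approach to a single shared-memory node.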

@milankl milankl added the performance 🚀 Faster faster! label Apr 5, 2022
milankl commented Apr 27, 2022

As outlined here, with the @eval metaprogramming style one could similarly collect all functions that need a vertical loop and think about adding an @distributed there once SharedArrays are set up
https://github.com/milankl/SpeedyWeather.jl/blob/f21b69eae24f9be0b0667ebe381671d994b9b14f/src/distributed_vertical.jl#L1-L16
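A hedged sketch of that @eval pattern: for each per-level function, generate a `*_vertical!` wrapper containing the vertical loop, which is then the one place an `@distributed` could later be added. The kernel names here are illustrative stand-ins, not the actual SpeedyWeather.jl functions:

```julia
# Stand-in per-level kernels that mutate level k of a column
humidity!(x, k) = (x[k] += 1)
vorticity!(x, k) = (x[k] += 2)

for fname in (:humidity!, :vorticity!)
    # humidity! -> humidity_vertical!, etc.
    wrapper = Symbol(string(fname)[1:end-1], "_vertical!")
    @eval function $wrapper(x)
        for k in eachindex(x)       # the vertical loop; @distributed could go here
            $fname(x, k)
        end
    end
end
```

Usage: `x = zeros(3); humidity_vertical!(x)` then applies the kernel on every level.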

@milankl milankl added the parallel 🐎 Things being computed in parallel label May 20, 2022
milankl commented Aug 15, 2022

With #117 the @eval-based looping at the level of individual functions has been removed. For the barotropic model, for example, there is now a single place in timestep! where the loop over the vertical happens, i.e. we moved that loop as far up as possible
https://github.com/milankl/SpeedyWeather.jl/blob/4a3ad4a7659bd1ae8437b15617781420b396678e/src/time_integration.jl#L246-L252
this is where any @distributed-like parallelism should be applied, as the evaluation within the loop is conceptually independent across layers.
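Since the evaluation is independent across layers, a shared-memory alternative to `@distributed` is to multi-thread that single loop. A minimal sketch, with `layers` and `step_layer!` as illustrative stand-ins for `diagn.layers` and the per-layer tendency computation:

```julia
layers = [zeros(4) for _ in 1:8]        # stand-in for diagn.layers
step_layer!(layer) = (layer .+= 1)      # stand-in for the per-layer tendencies

Threads.@threads for layer in layers    # start Julia with `julia -t N` to use N threads
    step_layer!(layer)                  # layers are independent, so no locking needed
end
```

`Threads.@threads` avoids the inter-process communication of Distributed entirely, at the cost of being limited to one node, which matches the shared-memory assumption above.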

@milankl milankl added this to the v0.5 milestone Aug 31, 2022
@milankl milankl added the vertical ⬆️ Affecting the vertical dimension label Oct 10, 2022
milankl commented Mar 22, 2023

One thread

julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
475.47 days/day

vs 8 threads (4.1x)

julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
5.50 years/day

vs 16 threads (6.1x)

julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
8.21 years/day

@milankl milankl linked a pull request Mar 27, 2023 that will close this issue
white-alistair commented
@milankl what's the versioninfo for these tests? Where do you suspect the bottlenecks are? What kind of scaling would you be satisfied with?

One thing I noticed is that the speedups here are quite consistent with the parallel mergesort showcased in the original Julia multi-threading blogpost.


milankl commented Mar 29, 2023

That's a good resource, thanks for sharing. In contrast to mergesort the tasks here are quite a bit more expensive, e.g. we can multi-thread this part of the right-hand side completely

    @floop for layer in diagn.layers
        vertical_velocity!(layer,surface,model)     # calculate σ̇ for the vertical mass flux M = pₛσ̇
                                                    # add the RTₖlnpₛ term to geopotential
        linear_pressure_gradient!(layer,progn,model,lf_implicit)
        vertical_advection!(layer,diagn,model)      # use σ̇ for the vertical advection of u,v,T,q

        vordiv_tendencies!(layer,surface,model)     # vorticity advection, pressure gradient term
        temperature_tendency!(layer,surface,model)  # hor. advection + adiabatic term
        humidity_tendency!(layer,model)             # horizontal advection of humidity (nothing for wetcore)
        bernoulli_potential!(layer,S)               # add -∇²(E+ϕ+RTₖlnpₛ) term to div tendency
    end

across layers, which includes several spectral transforms and other expensive operations. So on this I'd expect scaling to be nearly perfect. On the other hand, there are things like the vertical integrations and the geopotential which are currently single-threaded (maybe they can be spawned off though). So in short: I haven't done enough profiling to know what parallelism potential there is and at what resolution/number of vertical levels.
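Spawning off those single-threaded pieces could be sketched with `Threads.@spawn`, so a serial reduction overlaps with other work instead of blocking it. `vertical_integral` is an illustrative placeholder, not the actual SpeedyWeather.jl function:

```julia
# Stand-in for a serial vertical reduction (e.g. a pressure/geopotential integral)
vertical_integral(column) = sum(column)

column = ones(26)                               # stand-in for 26 vertical levels
task = Threads.@spawn vertical_integral(column) # runs concurrently on another thread
# ... other, independent work could proceed here ...
result = fetch(task)                            # wait for and collect the result
```

Whether this pays off depends on whether there really is enough independent work to overlap with, which is exactly the profiling question raised above.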
