Parallelism in the vertical #20
Comments
As outlined here, written with |
With #117 the |
One thread

    julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
    475.47 days/day

vs 8 threads (4.1x)

    julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
    5.50 years/day

vs 16 threads (6.1x)

    julia> run_speedy(Float32,PrimitiveDryCore,trunc=127,nlev=26,n_days=1,orography=ZonalRidge)
    8.21 years/day
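To put the quoted speedups in context, here is a small sketch that turns them into parallel efficiency (speedup divided by thread count). It uses only the 4.1x and 6.1x figures quoted above; the name `parallel_efficiency` is made up for illustration.

```julia
# Speedups as quoted in the benchmark above (relative to one thread).
quoted = [(nthreads = 8, speedup = 4.1), (nthreads = 16, speedup = 6.1)]

# Parallel efficiency: 1.0 would mean perfect scaling.
# `parallel_efficiency` is a hypothetical helper, not part of SpeedyWeather.jl.
parallel_efficiency(speedup, nthreads) = speedup / nthreads

for run in quoted
    eff = parallel_efficiency(run.speedup, run.nthreads)
    println(run.nthreads, " threads: efficiency ≈ ", round(eff, digits = 2))
end
```

An efficiency well below 1 would be consistent with some part of the time step still running on a single thread.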
That's a good resource, thanks for sharing. In contrast to mergesort, the tasks here are quite a bit more expensive. For example, we can completely multi-thread this part of the right-hand side across layers:

    @floop for layer in diagn.layers
        vertical_velocity!(layer,surface,model)     # calculate σ̇ for the vertical mass flux M = pₛσ̇
                                                    # add the RTₖlnpₛ term to geopotential
        linear_pressure_gradient!(layer,progn,model,lf_implicit)
        vertical_advection!(layer,diagn,model)      # use σ̇ for the vertical advection of u,v,T,q
        vordiv_tendencies!(layer,surface,model)     # vorticity advection, pressure gradient term
        temperature_tendency!(layer,surface,model)  # hor. advection + adiabatic term
        humidity_tendency!(layer,model)             # horizontal advection of humidity (nothing for wetcore)
        bernoulli_potential!(layer,S)               # add -∇²(E+ϕ+RTₖlnpₛ) term to div tendency
    end

This loop includes several spectral transforms and other expensive operations, so here I'd expect scaling to be nearly perfect. On the other hand, there are things like the vertical integrations and geopotential which are currently single-threaded (maybe they can be spawned off though). So in short: I haven't done enough profiling to know what parallelism potential there is and at what resolution/number of vertical levels.
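As a rough illustration of the split described above (independent per-layer work versus a vertical integration that couples layers), here is a minimal sketch. It swaps in `Threads.@threads` from Julia's standard library for `@floop` so that it runs without extra packages. `Layer`, `layer_tendency!`, `tendencies!` and `vertical_sum` are hypothetical stand-ins, not SpeedyWeather.jl's actual types or functions.

```julia
using Base.Threads

# Hypothetical stand-in for one model layer; not SpeedyWeather.jl's actual type.
struct Layer
    u::Vector{Float64}          # some field on this layer
    tendency::Vector{Float64}   # tendency computed for this layer
end

# Hypothetical per-layer work: reads and writes only `layer`,
# so different layers can be processed independently.
function layer_tendency!(layer::Layer)
    @. layer.tendency = -layer.u^2
    return nothing
end

# Parallel part: layers are distributed across the available threads.
function tendencies!(layers::Vector{Layer})
    @threads for k in eachindex(layers)
        layer_tendency!(layers[k])
    end
    return nothing
end

# Serial part: a vertical sum couples all layers, so this simple
# version runs on a single thread after the parallel loop has finished.
function vertical_sum(layers::Vector{Layer})
    total = zeros(length(first(layers).tendency))
    for layer in layers
        total .+= layer.tendency
    end
    return total
end

layers = [Layer(fill(Float64(k), 4), zeros(4)) for k in 1:8]
tendencies!(layers)
column = vertical_sum(layers)
```

Start Julia with, for example, `julia -t 8` to give the loop more than one thread. The serial `vertical_sum` is the kind of step that would cap the achievable speedup unless it is also restructured.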
One easy way to parallelise speedy might be to distribute the calculation of the spectral transform across `n` workers in the vertical, using SharedArrays (documentation here) from Julia's standard library. This limits us to `n`x speedups from parallelisation, which might be fine as SpeedyWeather.jl will probably run on small clusters only anyway. So for T30 and `n=8` levels we can (hopefully) efficiently run on 8 cores, but for `n=48,64` levels we could get significant speedups for higher resolution versions of SpeedyWeather.jl (T100-T500). Given the shared memory of this approach, we'll be limited by the 48 cores on A64FX, but that might be absolutely sufficient for now.
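A minimal sketch of the SharedArrays idea, assuming 2 local worker processes and a made-up per-level function standing in for the spectral transform. `process_level!` and `fill_levels!` are hypothetical names, and the array sizes are arbitrary; only `addprocs`, `@everywhere`, `@distributed`, `SharedArray` and `sdata` come from Julia's standard library.

```julia
using Distributed, SharedArrays

addprocs(2)                      # assumption: 2 local workers (this would be n in the text above)
@everywhere using SharedArrays

# Hypothetical per-level work standing in for one spectral transform.
# It writes only to column k, so levels do not interfere with each other.
@everywhere function process_level!(A, k)
    for i in axes(A, 1)
        A[i, k] = i * k
    end
    return nothing
end

# Split the vertical levels across the workers. All processes map the same
# shared memory, so no results have to be copied back to the main process.
function fill_levels!(A::SharedArray)
    @sync @distributed for k in 1:size(A, 2)
        process_level!(A, k)
    end
    return A
end

npoints, nlev = 16, 8            # arbitrary sizes for illustration
field = SharedArray{Float64}(npoints, nlev)
fill_levels!(field)

result = copy(sdata(field))      # plain Array copy of the shared data
rmprocs(workers())               # shut the workers down again
```

With one level per worker at most, the number of levels bounds the useful number of workers, which is the `n`x limit mentioned above.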