Merge pull request eth-cscs#40 from eth-cscs/docs
Add docs
omlins committed Jul 3, 2023
2 parents 7a1251f + 32c4eda commit b24e3f2
Showing 30 changed files with 316 additions and 28 deletions.
16 changes: 16 additions & 0 deletions .github/workflows/CompatHelper.yml
@@ -0,0 +1,16 @@
name: CompatHelper
on:
  schedule:
    - cron: 0 0 * * *
  workflow_dispatch:
jobs:
  CompatHelper:
    runs-on: ubuntu-latest
    steps:
      - name: Pkg.add("CompatHelper")
        run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
      - name: CompatHelper.main()
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
        run: julia -e 'using CompatHelper; CompatHelper.main()'
4 changes: 3 additions & 1 deletion .github/workflows/TagBot.yml
@@ -5,7 +5,9 @@ on:
     types:
       - created
   workflow_dispatch:
-
+    inputs:
+      lookback:
+        default: 3
 jobs:
   TagBot:
     if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot'
42 changes: 36 additions & 6 deletions .github/workflows/ci.yml
@@ -1,7 +1,15 @@
 name: CI
 on:
-  - push
-  - pull_request
+  push:
+    branches:
+      - master
+    tags: '*'
+  pull_request:
+concurrency:
+  # Skip intermediate builds: always.
+  # Cancel intermediate builds: only if it is a pull request build.
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
 env:
   JULIA_NUM_THREADS: 2
 jobs:
jobs:
@@ -14,7 +22,7 @@ jobs:
         version:
           - '1.7' # Minimum required Julia version (due to dependency of AMDGPU.jl)
           - '1' # Latest stable 1.x release of Julia
-          - 'nightly'
+          # - 'nightly'
         os:
           - ubuntu-latest
           - macOS-latest
@@ -37,6 +45,28 @@ jobs:
             ${{ runner.os }}-test-${{ env.cache-name }}-
             ${{ runner.os }}-test-
             ${{ runner.os }}-
-      - uses: julia-actions/julia-buildpkg@latest
-      - uses: julia-actions/julia-runtest@latest
-
+      - uses: julia-actions/julia-buildpkg@v1
+      - uses: julia-actions/julia-runtest@v1
+      - uses: julia-actions/julia-processcoverage@v1
+      - uses: codecov/codecov-action@v2
+        with:
+          files: lcov.info
+  docs:
+    name: Documentation
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: julia-actions/setup-julia@v1
+        with:
+          version: '1'
+      - uses: julia-actions/julia-buildpkg@v1
+      - uses: julia-actions/julia-docdeploy@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
+      - run: |
+          julia --project=docs -e '
+            using Documenter: DocMeta, doctest
+            using ImplicitGlobalGrid
+            DocMeta.setdocmeta!(ImplicitGlobalGrid, :DocTestSetup, :(using ImplicitGlobalGrid); recursive=true)
+            doctest(ImplicitGlobalGrid)'
13 changes: 7 additions & 6 deletions README.md
@@ -1,16 +1,17 @@
-<h1> <img src="docs/logo/logo_ImplicitGlobalGrid.png" alt="ImplicitGlobalGrid.jl" width="50"> ImplicitGlobalGrid.jl </h1>
+<h1> <img src="docs/src/assets/logo.png" alt="ImplicitGlobalGrid.jl" width="50"> ImplicitGlobalGrid.jl </h1>

-[![Build Status](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/workflows/CI/badge.svg)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/actions)
+[![CI](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/workflows/CI/badge.svg?branch=master)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/actions/workflows/CI.yml?query=branch%3Amain)
+[![Coverage](https://codecov.io/gh/omlins/ImplicitGlobalGrid.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/omlins/ImplicitGlobalGrid.jl)

ImplicitGlobalGrid is an outcome of a collaboration of the Swiss National Supercomputing Centre, ETH Zurich (Dr. Samuel Omlin) with Stanford University (Dr. Ludovic Räss) and the Swiss Geocomputing Centre (Prof. Yuri Podladchikov). It renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs \[[1][JuliaCon19], [2][PASC19], [3][JuliaCon20a]\]:

-![Weak scaling Piz Daint](docs/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)
+![Weak scaling Piz Daint](docs/src/assets/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)

ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]) to perform halo updates close to hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU-applications. The communication can straightforwardly be hidden behind computation \[[1][JuliaCon19], [3][JuliaCon20a]\] (how this can be done automatically when using ParallelStencil.jl is shown in \[[3][JuliaCon20a]\]; a general approach particularly suited for CUDA C applications is explained in \[[4][GTC19]\]).

A particularity of ImplicitGlobalGrid is the automatic *implicit creation of the global computational grid* based on the number of processes the application is run with (and based on the process topology, which can be explicitly chosen by the user or automatically defined). As a consequence, the user only needs to write a code to solve his problem on one GPU/CPU (*local grid*); then, **as little as three functions can be enough to transform a single GPU/CPU application into a massively scaling Multi-GPU/CPU application**. See the [example](#multi-gpu-with-three-functions) below. 1-D, 2-D and 3-D grids are supported. Here is a sketch of the global grid that results from running a 2-D solver with 4 processes (P1-P4) (a 2x2 process topology is created by default in this case):

-![Implicit global grid](docs/images/implicit_global_grid.png)
+![Implicit global grid](docs/src/assets/images/implicit_global_grid.png)

## Contents
* [Multi-GPU with three functions](#multi-gpu-with-three-functions)
@@ -157,10 +158,10 @@ diffusion3D()

Here is the resulting movie when running the application on 8 GPUs, solving 3-D heat diffusion with heterogeneous heat capacity (two Gaussian anomalies) on a global computational grid of size 510x510x510 grid points. It shows the x-z-dimension plane in the middle of the dimension y:

-![Implicit global grid](docs/movies/diffusion3D_8gpus.gif)
+![Implicit global grid](docs/src/assets/videos/diffusion3D_8gpus.gif)

The simulation producing this movie - *including the in-situ visualization* - took 29 minutes on 8 NVIDIA® Tesla® P100 GPUs on Piz Daint (an optimized solution using [CUDA.jl]'s native kernel programming capabilities can be more than 10 times faster).
-The complete example can be found [here](docs/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic cpu-only example is available [here](docs/examples/diffusion3D_multicpu.jl) (no usage of multi-threading) and a movie of a simulation with 254x254x254 grid points which it produced within 34 minutes using 8 Intel® Xeon® E5-2690 v3 is found [here](docs/movies/diffusion3D_8cpus.gif) (with 8 processes, no multi-threading).
+The complete example can be found [here](docs/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic cpu-only example is available [here](docs/examples/diffusion3D_multicpu.jl) (no usage of multi-threading) and a movie of a simulation with 254x254x254 grid points which it produced within 34 minutes using 8 Intel® Xeon® E5-2690 v3 is found [here](docs/src/assets/videos/diffusion3D_8cpus.gif) (with 8 processes, no multi-threading).

## Seamless interoperability with MPI.jl
ImplicitGlobalGrid is seamlessly interoperable with [MPI.jl]. The Cartesian MPI communicator it uses is created by default when calling `init_global_grid` and can then be obtained as follows (variable `comm_cart`):
4 changes: 4 additions & 0 deletions docs/Project.toml
@@ -0,0 +1,4 @@
[deps]
ImplicitGlobalGrid = "d35fcfd7-7af4-4c67-b1aa-d78070614af4"
DocExtensions = "cbdad009-89f1-4e05-85a0-06b07b50707d"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
53 changes: 53 additions & 0 deletions docs/make.jl
@@ -0,0 +1,53 @@
using ImplicitGlobalGrid
using Documenter
using DocExtensions
using DocExtensions.DocumenterExtensions

const DOCSRC = joinpath(@__DIR__, "src")
const DOCASSETS = joinpath(DOCSRC, "assets")
const EXAMPLEROOT = joinpath(@__DIR__, "..", "examples")

DocMeta.setdocmeta!(ImplicitGlobalGrid, :DocTestSetup, :(using ImplicitGlobalGrid); recursive=true)


@info "Copy examples folder to assets..."
mkpath(DOCASSETS)
cp(EXAMPLEROOT, joinpath(DOCASSETS, "examples"); force=true)


@info "Preprocessing .MD-files..."
include("reflinks.jl")
MarkdownExtensions.expand_reflinks(reflinks; rootdir=DOCSRC)


@info "Building documentation website using Documenter.jl..."
makedocs(;
    modules = [ImplicitGlobalGrid],
    authors = "Samuel Omlin, Ludovic Räss, Ivan Utkin",
    repo = "https://github.com/eth-cscs/ImplicitGlobalGrid.jl/blob/{commit}{path}#{line}",
    sitename = "ImplicitGlobalGrid.jl",
    format = Documenter.HTML(;
        prettyurls = true,
        canonical = "https://omlins.github.io/ImplicitGlobalGrid.jl",
        collapselevel = 1,
        sidebar_sitename = true,
        edit_link = "master",
    ),
    pages = [
        "Introduction" => "index.md",
        "Usage" => "usage.md",
        "Examples" => [hide("..." => "examples.md"),
                       "examples/diffusion3D_multigpu_CuArrays_novis.md",
                       "examples/diffusion3D_multigpu_CuArrays_onlyvis.md",
                      ],
        "API reference" => "api.md",
    ],
)


@info "Deploying docs..."
deploydocs(;
    repo = "github.com/eth-cscs/ImplicitGlobalGrid.jl",
    push_preview = true,
    devbranch = "master",
)
17 changes: 17 additions & 0 deletions docs/reflinks.jl
@@ -0,0 +1,17 @@
reflinks = Dict(
    "[AMDGPU.jl]" => "https://github.com/JuliaGPU/AMDGPU.jl",
    "[CUDA.jl]" => "https://github.com/JuliaGPU/CUDA.jl",
    "[GTC19]" => "https://on-demand.gputechconf.com/gtc/2019/video/_/S9368/",
    "[IJulia]" => "https://github.com/JuliaLang/IJulia.jl",
    "[ImplicitGlobalGrid.jl]" => "https://github.com/eth-cscs/ImplicitGlobalGrid.jl",
    "[JuliaCon19]" => "https://pretalx.com/juliacon2019/talk/LGHLC3/",
    "[JuliaCon20a]" => "https://www.youtube.com/watch?v=vPsfZUqI4_0",
    "[Julia CUDA paper 1]" => "https://doi.org/10.1109/TPDS.2018.2872064",
    "[Julia CUDA paper 2]" => "https://doi.org/10.1016/j.advengsoft.2019.02.002",
    "[Julia Plots documentation]" => "http://docs.juliaplots.org/latest/backends/",
    "[Julia Plots package]" => "https://github.com/JuliaPlots/Plots.jl",
    "[Julia package manager]" => "https://docs.julialang.org/en/v1/stdlib/Pkg/",
    "[Julia REPL]" => "https://docs.julialang.org/en/v1/stdlib/REPL/",
    "[MPI.jl]" => "https://github.com/JuliaParallel/MPI.jl",
    "[PASC19]" => "https://pasc19.pasc-conference.org/program/schedule/index.html%3Fpost_type=page&p=10&id=msa218&sess=sess144.html",
)
24 changes: 24 additions & 0 deletions docs/src/api.MD
@@ -0,0 +1,24 @@
```@meta
CurrentModule = ImplicitGlobalGrid
```

# API reference

This is the official API reference of ImplicitGlobalGrid. Note that it can also be queried interactively from the [Julia REPL] using the [help mode](https://docs.julialang.org/en/v1/stdlib/REPL/#Help-mode):
```julia-repl
julia> using ImplicitGlobalGrid
julia>?
help?> ImplicitGlobalGrid
```

## Functions
#### Index
```@index
Modules = [ImplicitGlobalGrid]
Order = [:function]
```
#### Documentation
```@autodocs
Modules = [ImplicitGlobalGrid]
Order = [:function]
```
7 files renamed without changes.
9 changes: 9 additions & 0 deletions docs/src/examples.MD
@@ -0,0 +1,9 @@
# Examples

```@contents
Pages = ["examples/diffusion3D_multigpu_CuArrays_novis.md"]
```

```@contents
Pages = ["examples/diffusion3D_multigpu_CuArrays_onlyvis.md"]
```
9 changes: 9 additions & 0 deletions docs/src/examples/diffusion3D_multigpu_CuArrays_novis.MD
@@ -0,0 +1,9 @@
# [50-lines Multi-GPU example](@id example-50-lines)

This simple Multi-GPU 3-D heat diffusion solver uses ImplicitGlobalGrid. It relies fully on the broadcasting capabilities of [CUDA.jl]'s `CuArray` type to perform the stencil computations with maximal simplicity ([CUDA.jl] also enables writing explicit GPU kernels, which can lead to significantly better performance for these computations).

```@eval
Main.mdinclude(joinpath(Main.EXAMPLEROOT, "diffusion3D_multigpu_CuArrays_novis.jl"))
```

The corresponding file can be found [here](../../../assets/examples/diffusion3D_multigpu_CuArrays_novis.jl). A basic CPU-only example is available [here](../../../assets/examples/diffusion3D_multicpu_novis.jl) (no usage of multi-threading).
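
To illustrate what a stencil computation written with broadcasting looks like, here is a minimal sketch reduced to 1-D. It is not the code of the example file: plain `Array`s are used so that it runs without a GPU, and the names `diffusion_step!`, `lam`, `dt` and `dx` are assumptions made for this illustration. The same broadcast expressions would be expected to work on a `CuArray`.

```julia
# One explicit 1-D heat-diffusion step written purely with broadcasting.
# All names are illustrative assumptions, not part of ImplicitGlobalGrid.
function diffusion_step!(T::AbstractVector, lam, dt, dx)
    # Second derivative on the inner points, built from shifted slices (the stencil).
    d2T = (T[3:end] .- 2 .* T[2:end-1] .+ T[1:end-2]) ./ dx^2
    # Update the inner points only; the two boundary values stay fixed.
    T[2:end-1] .= T[2:end-1] .+ dt .* lam .* d2T
    return T
end

# A single hot point in the middle spreads to its two neighbours.
T = zeros(5)
T[3] = 1.0
diffusion_step!(T, 1.0, 0.1, 1.0)
println(T)
```

In a distributed run, the fixed boundary values of each local array would be the ones refreshed by the halo update after every step.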
14 changes: 14 additions & 0 deletions docs/src/examples/diffusion3D_multigpu_CuArrays_onlyvis.MD
@@ -0,0 +1,14 @@
# [Straightforward in-situ visualization / monitoring](@id in-situ-example)

Thanks to the function [`gather!`](@ref), ImplicitGlobalGrid enables straightforward in-situ visualization or monitoring of Multi-GPU/CPU applications using e.g. the [Julia Plots package] as shown in the following (the GR backend is used as it is particularly fast according to the [Julia Plots documentation]). It is enough to add a couple of lines to the [previous example](@ref example-50-lines) (omitted unmodified lines are represented with `#(...)`):

```@eval
Main.mdinclude(joinpath(Main.EXAMPLEROOT, "diffusion3D_multigpu_CuArrays_onlyvis.jl"))
```

Here is the resulting movie when running the application on 8 GPUs, solving 3-D heat diffusion with heterogeneous heat capacity (two Gaussian anomalies) on a global computational grid of size 510x510x510 grid points. It shows the x-z-dimension plane in the middle of the dimension y:

![Implicit global grid](../../../assets/videos/diffusion3D_8gpus.gif)

The simulation producing this movie - *including the in-situ visualization* - took 29 minutes on 8 NVIDIA® Tesla® P100 GPUs on Piz Daint (an optimized solution using [CUDA.jl]'s native kernel programming capabilities can be more than 10 times faster).
The complete example can be found [here](../../../assets/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic CPU-only example (without multi-threading) is available [here](../../../assets/examples/diffusion3D_multicpu.jl). A movie of a simulation with 254x254x254 grid points, which that example produced within 34 minutes on 8 Intel® Xeon® E5-2690 v3 CPUs (8 processes, no multi-threading), can be found [here](../../../assets/videos/diffusion3D_8cpus.gif).
34 changes: 34 additions & 0 deletions docs/src/index.MD
@@ -0,0 +1,34 @@
# [ImplicitGlobalGrid.jl] [![Star on GitHub](https://img.shields.io/github/stars/eth-cscs/ImplicitGlobalGrid.jl.svg)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/stargazers)

ImplicitGlobalGrid renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs \[[1][JuliaCon19], [2][PASC19], [3][JuliaCon20a]\]:

![Weak scaling Piz Daint](./assets/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)

ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]) to perform halo updates close to hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU-applications. The communication can straightforwardly be hidden behind computation \[[1][JuliaCon19], [3][JuliaCon20a]\] (how this can be done automatically when using ParallelStencil.jl is shown in \[[3][JuliaCon20a]\]; a general approach particularly suited for CUDA C applications is explained in \[[4][GTC19]\]).

A particularity of ImplicitGlobalGrid is the automatic *implicit creation of the global computational grid* based on the number of processes the application is run with (and based on the process topology, which can be explicitly chosen by the user or automatically defined). As a consequence, the user only needs to write code that solves the problem on a single GPU/CPU (*local grid*); then, **as few as three functions can be enough to transform a single GPU/CPU application into a massively scaling Multi-GPU/CPU application**. See the [50-lines Multi-GPU example](@ref example-50-lines) in the section [Examples](@ref). 1-D, 2-D and 3-D grids are supported. Here is a sketch of the global grid that results from running a 2-D solver with 4 processes (P1-P4) (a 2x2 process topology is created by default in this case):

![Implicit global grid](./assets/images/implicit_global_grid.png)
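
The size of the implicit global grid can be estimated from the local grid size and the process topology. The sketch below assumes non-periodic boundaries and an overlap of 2 grid points between neighbouring local grids; `global_size` is a hypothetical helper written for this illustration, not a function of ImplicitGlobalGrid, and the local sizes are assumed values.

```julia
# Global grid size along one dimension, assuming non-periodic boundaries.
# `global_size` is a hypothetical helper, not part of ImplicitGlobalGrid;
# the default overlap of 2 grid points is an assumption of this sketch.
global_size(n_local, nprocs_dim; overlap=2) = nprocs_dim * (n_local - overlap) + overlap

# Two processes per dimension (as in the 2x2 topology sketched above),
# each with an assumed local grid of 256 points per dimension:
nx_g = global_size(256, 2)
println(nx_g)  # 510

# The same topology with an assumed local grid of 128 points per dimension:
println(global_size(128, 2))  # 254
```

Neighbouring local grids share the overlapping points, which is why the global size is smaller than the plain sum of the local sizes.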


## Dependencies
ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]), the Julia CUDA package ([CUDA.jl] \[[5][Julia CUDA paper 1], [6][Julia CUDA paper 2]\]) and the Julia AMDGPU package ([AMDGPU.jl]).

## Contributors
The principal contributors to [ImplicitGlobalGrid.jl] are (ordered by the significance of the relative contributions):
- Dr. Samuel Omlin ([@omlins](https://github.com/omlins)), CSCS - Swiss National Supercomputing Centre, ETH Zurich
- Dr. Ludovic Räss ([@luraess](https://github.com/luraess)), Laboratory of Hydraulics, Hydrology, Glaciology - ETH Zurich
- Dr. Ivan Utkin ([@utkinis](https://github.com/utkinis)), Laboratory of Hydraulics, Hydrology, Glaciology - ETH Zurich

## References
\[1\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). Porting a Massively Parallel Multi-GPU Application to Julia: a 3-D Nonlinear Multi-Physics Flow Solver. JuliaCon Conference, Baltimore, USA.][JuliaCon19]

\[2\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). A Nonlinear Multi-Physics 3-D Solver: From CUDA C + MPI to Julia. PASC19 Conference, Zurich, Switzerland.][PASC19]

\[3\] [Omlin, S., Räss, L., Kwasniewski, G., Malvoisin, B., & Podladchikov, Y. Y. (2020). Solving Nonlinear Multi-Physics on GPU Supercomputers with Julia. JuliaCon Conference, virtual.][JuliaCon20a]

\[4\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). Resolving Spontaneous Nonlinear Multi-Physics Flow Localisation in 3-D: Tackling Hardware Limit. GPU Technology Conference 2019, San Jose, Silicon Valley, CA, USA.][GTC19]

\[5\] [Besard, T., Foket, C., & De Sutter, B. (2018). Effective Extensible Programming: Unleashing Julia on GPUs. IEEE Transactions on Parallel and Distributed Systems, 30(4), 827-841. doi: 10.1109/TPDS.2018.2872064][Julia CUDA paper 1]

\[6\] [Besard, T., Churavy, V., Edelman, A., & De Sutter, B. (2019). Rapid software prototyping for heterogeneous and distributed platforms. Advances in Engineering Software, 132, 29-46. doi: 10.1016/j.advengsoft.2019.02.002][Julia CUDA paper 2]