Merge pull request eth-cscs#40 from eth-cscs/docs

Add docs
marinlauber · Jul 3, 2023 · b24e3f2
2 parents 7a1251f + 32c4eda
Showing 30 changed files with 316 additions and 28 deletions.
diff --git a/.github/workflows/CompatHelper.yml b/.github/workflows/CompatHelper.yml
@@ -0,0 +1,16 @@
+name: CompatHelper
+on:
+  schedule:
+    - cron: 0 0 * * *
+  workflow_dispatch:
+jobs:
+  CompatHelper:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Pkg.add("CompatHelper")
+        run: julia -e 'using Pkg; Pkg.add("CompatHelper")'
+      - name: CompatHelper.main()
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          COMPATHELPER_PRIV: ${{ secrets.DOCUMENTER_KEY }}
+        run: julia -e 'using CompatHelper; CompatHelper.main()'
diff --git a/.github/workflows/TagBot.yml b/.github/workflows/TagBot.yml
@@ -5,7 +5,9 @@ on:
     types:
       - created
   workflow_dispatch:
-
+    inputs:
+      lookback:
+        default: 3
 jobs:
   TagBot:
     if: github.event_name == 'workflow_dispatch' || github.actor == 'JuliaTagBot'

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -1,7 +1,15 @@
 name: CI
 on:
-  - push
-  - pull_request
+  push:
+    branches:
+      - master
+    tags: '*'
+  pull_request:
+concurrency:
+  # Skip intermediate builds: always.
+  # Cancel intermediate builds: only if it is a pull request build.
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
 env:
   JULIA_NUM_THREADS: 2
 jobs:
@@ -14,7 +22,7 @@ jobs:
         version:
           - '1.7' # Minimum required Julia version (due to dependency of AMDGPU.jl)
           - '1'   # Latest stable 1.x release of Julia
-          - 'nightly'
+          # - 'nightly'
         os:
           - ubuntu-latest
           - macOS-latest
@@ -37,6 +45,28 @@ jobs:
             ${{ runner.os }}-test-${{ env.cache-name }}-
             ${{ runner.os }}-test-
             ${{ runner.os }}-
-      - uses: julia-actions/julia-buildpkg@latest
-      - uses: julia-actions/julia-runtest@latest
-
+      - uses: julia-actions/julia-buildpkg@v1
+      - uses: julia-actions/julia-runtest@v1
+      - uses: julia-actions/julia-processcoverage@v1
+      - uses: codecov/codecov-action@v2
+        with:
+          files: lcov.info
+  docs:
+    name: Documentation
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: julia-actions/setup-julia@v1
+        with:
+          version: '1'
+      - uses: julia-actions/julia-buildpkg@v1
+      - uses: julia-actions/julia-docdeploy@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }}
+      - run: |
+          julia --project=docs -e '
+            using Documenter: DocMeta, doctest
+            using ImplicitGlobalGrid
+            DocMeta.setdocmeta!(ImplicitGlobalGrid, :DocTestSetup, :(using ImplicitGlobalGrid); recursive=true)
+            doctest(ImplicitGlobalGrid)'
diff --git a/README.md b/README.md
@@ -1,16 +1,17 @@
-<h1> <img src="docs/logo/logo_ImplicitGlobalGrid.png" alt="ImplicitGlobalGrid.jl" width="50"> ImplicitGlobalGrid.jl </h1>
+<h1> <img src="docs/src/assets/logo.png" alt="ImplicitGlobalGrid.jl" width="50"> ImplicitGlobalGrid.jl </h1>
 
-[![Build Status](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/workflows/CI/badge.svg)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/actions)
+[![CI](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/workflows/CI/badge.svg?branch=master)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/actions/workflows/CI.yml?query=branch%3Amain)
+[![Coverage](https://codecov.io/gh/omlins/ImplicitGlobalGrid.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/omlins/ImplicitGlobalGrid.jl)
 
 ImplicitGlobalGrid is an outcome of a collaboration of the Swiss National Supercomputing Centre, ETH Zurich (Dr. Samuel Omlin) with Stanford University (Dr. Ludovic Räss) and the Swiss Geocomputing Centre (Prof. Yuri Podladchikov). It renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs \[[1][JuliaCon19], [2][PASC19], [3][JuliaCon20a]\]:
 
-![Weak scaling Piz Daint](docs/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)
+![Weak scaling Piz Daint](docs/src/assets/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)
 
 ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]) to perform halo updates close to hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU-applications. The communication can straightforwardly be hidden behind computation \[[1][JuliaCon19], [3][JuliaCon20a]\] (how this can be done automatically when using ParallelStencil.jl is shown in \[[3][JuliaCon20a]\]; a general approach particularly suited for CUDA C applications is explained in \[[4][GTC19]\]).
 
 A particularity of ImplicitGlobalGrid is the automatic *implicit creation of the global computational grid* based on the number of processes the application is run with (and based on the process topology, which can be explicitly chosen by the user or automatically defined). As a consequence, the user only needs to write a code to solve his problem on one GPU/CPU (*local grid*); then, **as little as three functions can be enough to transform a single GPU/CPU application into a massively scaling Multi-GPU/CPU application**. See the [example](#multi-gpu-with-three-functions) below. 1-D, 2-D and 3-D grids are supported. Here is a sketch of the global grid that results from running a 2-D solver with 4 processes (P1-P4) (a 2x2 process topology is created by default in this case):
 
-![Implicit global grid](docs/images/implicit_global_grid.png)
+![Implicit global grid](docs/src/assets/images/implicit_global_grid.png)
 
 ## Contents
 * [Multi-GPU with three functions](#multi-gpu-with-three-functions)
@@ -157,10 +158,10 @@ diffusion3D()
 
 Here is the resulting movie when running the application on 8 GPUs, solving 3-D heat diffusion with heterogeneous heat capacity (two Gaussian anomalies) on a global computational grid of size 510x510x510 grid points. It shows the x-z-dimension plane in the middle of the dimension y:
 
-![Implicit global grid](docs/movies/diffusion3D_8gpus.gif)
+![Implicit global grid](docs/src/assets/videos/diffusion3D_8gpus.gif)
 
 The simulation producing this movie - *including the in-situ visualization* - took 29 minutes on 8 NVIDIA® Tesla® P100 GPUs on Piz Daint (an optimized solution using [CUDA.jl]'s native kernel programming capabilities can be more than 10 times faster).
-The complete example can be found [here](docs/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic cpu-only example is available [here](docs/examples/diffusion3D_multicpu.jl) (no usage of multi-threading) and a movie of a simulation with 254x254x254 grid points which it produced within 34 minutes using 8 Intel® Xeon® E5-2690 v3 is found [here](docs/movies/diffusion3D_8cpus.gif) (with 8 processes, no multi-threading).
+The complete example can be found [here](docs/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic cpu-only example is available [here](docs/examples/diffusion3D_multicpu.jl) (no usage of multi-threading) and a movie of a simulation with 254x254x254 grid points which it produced within 34 minutes using 8 Intel® Xeon® E5-2690 v3 is found [here](docs/src/assets/videos/diffusion3D_8cpus.gif) (with 8 processes, no multi-threading).
 
 ## Seamless interoperability with MPI.jl
 ImplicitGlobalGrid is seamlessly interoperable with [MPI.jl]. The Cartesian MPI communicator it uses is created by default when calling `init_global_grid` and can then be obtained as follows (variable `comm_cart`):
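
For orientation, here is a minimal sketch of the three-function pattern the README text above refers to; it also shows where `comm_cart` comes from. The local grid size, the plain CPU array, and the wrapper name `three_function_sketch` are assumptions made for illustration, and running it requires a working MPI installation.

```julia
using ImplicitGlobalGrid

# Illustrative wrapper (hypothetical name); not part of the package.
function three_function_sketch()
    nx, ny, nz = 32, 32, 32   # local grid size per process (assumed values)

    # 1) Create the implicit global grid. The last return value is the
    #    Cartesian MPI communicator mentioned in the text.
    me, dims, nprocs, coords, comm_cart = init_global_grid(nx, ny, nz)

    T = zeros(nx, ny, nz)     # local array, boundary cells act as halo
    T .= me                   # placeholder standing in for a real stencil update

    # 2) Exchange boundary (halo) values with neighbouring processes.
    update_halo!(T)

    # 3) Tear down the global grid (this also finalizes MPI by default).
    finalize_global_grid()
    return me, dims, nprocs
end
```

Launched with several MPI processes, each rank runs the same code on its own local array; the global grid size follows from the number of processes rather than from anything written in the solver.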

diff --git a/docs/Project.toml b/docs/Project.toml
@@ -0,0 +1,4 @@
+[deps]
+ImplicitGlobalGrid = "d35fcfd7-7af4-4c67-b1aa-d78070614af4"
+DocExtensions = "cbdad009-89f1-4e05-85a0-06b07b50707d"
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
diff --git a/docs/make.jl b/docs/make.jl
@@ -0,0 +1,53 @@
+using ImplicitGlobalGrid
+using Documenter
+using DocExtensions
+using DocExtensions.DocumenterExtensions
+
+const DOCSRC      = joinpath(@__DIR__, "src")
+const DOCASSETS   = joinpath(DOCSRC, "assets")
+const EXAMPLEROOT = joinpath(@__DIR__, "..", "examples")
+
+DocMeta.setdocmeta!(ImplicitGlobalGrid, :DocTestSetup, :(using ImplicitGlobalGrid); recursive=true)
+
+
+@info "Copy examples folder to assets..."
+mkpath(DOCASSETS)
+cp(EXAMPLEROOT, joinpath(DOCASSETS, "examples"); force=true)
+
+
+@info "Preprocessing .MD-files..."
+include("reflinks.jl")
+MarkdownExtensions.expand_reflinks(reflinks; rootdir=DOCSRC)
+
+
+@info "Building documentation website using Documenter.jl..."
+makedocs(;
+    modules  = [ImplicitGlobalGrid],
+    authors  = "Samuel Omlin, Ludovic Räss, Ivan Utkin",
+    repo     = "https://github.com/eth-cscs/ImplicitGlobalGrid.jl/blob/{commit}{path}#{line}",
+    sitename = "ImplicitGlobalGrid.jl",
+    format   = Documenter.HTML(;
+        prettyurls       = true,
+        canonical        = "https://omlins.github.io/ImplicitGlobalGrid.jl",
+        collapselevel    = 1,
+        sidebar_sitename = true,
+        edit_link        = "master",
+    ),
+    pages   = [
+        "Introduction"  => "index.md",
+        "Usage"         => "usage.md",
+        "Examples"      => [hide("..." => "examples.md"),
+                            "examples/diffusion3D_multigpu_CuArrays_novis.md",
+                            "examples/diffusion3D_multigpu_CuArrays_onlyvis.md",
+                           ],
+        "API reference" => "api.md",
+    ],
+)
+
+
+@info "Deploying docs..."
+deploydocs(;
+    repo         = "github.com/eth-cscs/ImplicitGlobalGrid.jl",
+    push_preview = true,
+    devbranch    = "master",
+)
diff --git a/docs/reflinks.jl b/docs/reflinks.jl
@@ -0,0 +1,17 @@
+reflinks = Dict(
+    "[AMDGPU.jl]"                 => "https://github.com/JuliaGPU/AMDGPU.jl",
+    "[CUDA.jl]"                   => "https://github.com/JuliaGPU/CUDA.jl",
+    "[GTC19]"                     => "https://on-demand.gputechconf.com/gtc/2019/video/_/S9368/",
+    "[IJulia]"                    => "https://github.com/JuliaLang/IJulia.jl",
+    "[ImplicitGlobalGrid.jl]"     => "https://github.com/eth-cscs/ImplicitGlobalGrid.jl",
+    "[JuliaCon19]"                => "https://pretalx.com/juliacon2019/talk/LGHLC3/",
+    "[JuliaCon20a]"               => "https://www.youtube.com/watch?v=vPsfZUqI4_0",
+    "[Julia CUDA paper 1]"        => "https://doi.org/10.1109/TPDS.2018.2872064",
+    "[Julia CUDA paper 2]"        => "https://doi.org/10.1016/j.advengsoft.2019.02.002",
+    "[Julia Plots documentation]" => "http://docs.juliaplots.org/latest/backends/",
+    "[Julia Plots package]"       => "https://github.com/JuliaPlots/Plots.jl",
+    "[Julia package manager]" => "https://docs.julialang.org/en/v1/stdlib/Pkg/",
+    "[Julia REPL]"            => "https://docs.julialang.org/en/v1/stdlib/REPL/",
+    "[MPI.jl]"                    => "https://github.com/JuliaParallel/MPI.jl",
+    "[PASC19]"                    => "https://pasc19.pasc-conference.org/program/schedule/index.html%3Fpost_type=page&p=10&id=msa218&sess=sess144.html",
+)
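
The diff does not show what `MarkdownExtensions.expand_reflinks` does internally. One plausible reading is that it resolves labels such as `[MPI.jl]` against the `reflinks` dictionary. The sketch below illustrates that idea only: `append_reflinks` is a hypothetical helper, not the DocExtensions implementation, and it works on a string rather than on the files under `rootdir`.

```julia
# Hypothetical helper: append a Markdown reference-link definition for every
# label from `reflinks` that occurs in `text`. Not the DocExtensions code.
function append_reflinks(text::AbstractString, reflinks::AbstractDict)
    # Keep only the labels actually used on this page, in a stable order.
    used = sort([label for label in keys(reflinks) if occursin(label, text)])
    isempty(used) && return String(text)
    # One definition line per label, e.g. "[MPI.jl]: https://...".
    defs = join(("$label: $(reflinks[label])" for label in used), "\n")
    return string(rstrip(text), "\n\n", defs, "\n")
end

links = Dict(
    "[MPI.jl]"  => "https://github.com/JuliaParallel/MPI.jl",
    "[CUDA.jl]" => "https://github.com/JuliaGPU/CUDA.jl",
)
page = "ImplicitGlobalGrid relies on [MPI.jl] for halo updates."
print(append_reflinks(page, links))
```

Keeping the URLs in one dictionary means a link only has to be updated in `docs/reflinks.jl`, while the `.md` sources keep using short labels.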
diff --git a/docs/src/api.MD b/docs/src/api.MD
@@ -0,0 +1,24 @@
+```@meta
+CurrentModule = ImplicitGlobalGrid
+```
+
+# API reference
+
+This is the official API reference of ImplicitGlobalGrid. Note that it can also be queried interactively from the [Julia REPL] using the [help mode](https://docs.julialang.org/en/v1/stdlib/REPL/#Help-mode):
+```julia-repl
+julia> using ImplicitGlobalGrid
+julia>?
+help?> ImplicitGlobalGrid
+```
+
+## Functions
+#### Index
+```@index
+Modules = [ImplicitGlobalGrid]
+Order = [:function]
+```
+#### Documentation
+```@autodocs
+Modules = [ImplicitGlobalGrid]
+Order   = [:function]
+```
diff --git a/...ff_HM3D_Julia_CUDA_all_Daint_extrapol.png → ...ff_HM3D_Julia_CUDA_all_Daint_extrapol.png b/...ff_HM3D_Julia_CUDA_all_Daint_extrapol.png → ...ff_HM3D_Julia_CUDA_all_Daint_extrapol.png
diff --git a/docs/images/implicit_global_grid.png → ...rc/assets/images/implicit_global_grid.png b/docs/images/implicit_global_grid.png → ...rc/assets/images/implicit_global_grid.png
diff --git a/docs/logo/logo_ImplicitGlobalGrid.png → docs/src/assets/logo.png b/docs/logo/logo_ImplicitGlobalGrid.png → docs/src/assets/logo.png
diff --git a/docs/logo/logo_ImplicitGlobalGrid_avatar.png → docs/src/assets/logo_avatar.png b/docs/logo/logo_ImplicitGlobalGrid_avatar.png → docs/src/assets/logo_avatar.png
diff --git a/docs/movies/diffusion3D_8cpus.gif → docs/src/assets/videos/diffusion3D_8cpus.gif b/docs/movies/diffusion3D_8cpus.gif → docs/src/assets/videos/diffusion3D_8cpus.gif
diff --git a/docs/movies/diffusion3D_8cpus.mp4 → docs/src/assets/videos/diffusion3D_8cpus.mp4 b/docs/movies/diffusion3D_8cpus.mp4 → docs/src/assets/videos/diffusion3D_8cpus.mp4
diff --git a/docs/movies/diffusion3D_8gpus.gif → docs/src/assets/videos/diffusion3D_8gpus.gif b/docs/movies/diffusion3D_8gpus.gif → docs/src/assets/videos/diffusion3D_8gpus.gif
diff --git a/docs/movies/diffusion3D_8gpus.mp4 → docs/src/assets/videos/diffusion3D_8gpus.mp4 b/docs/movies/diffusion3D_8gpus.mp4 → docs/src/assets/videos/diffusion3D_8gpus.mp4
diff --git a/docs/src/examples.MD b/docs/src/examples.MD
@@ -0,0 +1,9 @@
+# Examples
+
+```@contents
+Pages = ["examples/diffusion3D_multigpu_CuArrays_novis.md"]
+```
+
+```@contents
+Pages = ["examples/diffusion3D_multigpu_CuArrays_onlyvis.md"]
+```
diff --git a/docs/src/examples/diffusion3D_multigpu_CuArrays_novis.MD b/docs/src/examples/diffusion3D_multigpu_CuArrays_novis.MD
@@ -0,0 +1,9 @@
+# [50-lines Multi-GPU example](@id example-50-lines)
+
+This simple Multi-GPU 3-D heat diffusion solver uses ImplicitGlobalGrid. It relies fully on the broadcasting capabilities of [CUDA.jl]'s `CuArray` type to perform the stencil-computations with maximal simplicity ([CUDA.jl] enables also writing explicit GPU kernels which can lead to significantly better performance for these computations).
+
+```@eval
+Main.mdinclude(joinpath(Main.EXAMPLEROOT, "diffusion3D_multigpu_CuArrays_novis.jl"))
+```
+
+The corresponding file can be found [here](../../../assets/examples/diffusion3D_multigpu_CuArrays_novis.jl). A basic CPU-only example is available [here](../../../assets/examples/diffusion3D_multicpu_novis.jl) (no usage of multi-threading).
diff --git a/docs/src/examples/diffusion3D_multigpu_CuArrays_onlyvis.MD b/docs/src/examples/diffusion3D_multigpu_CuArrays_onlyvis.MD
@@ -0,0 +1,14 @@
+# [Straightforward in-situ visualization / monitoring](@id in-situ-example)
+
+Thanks to the function [`gather!`](@ref), ImplicitGlobalGrid enables straightforward in-situ visualization or monitoring of Multi-GPU/CPU applications using e.g. the [Julia Plots package] as shown in the following (the GR backend is used as it is particularly fast according to the [Julia Plots documentation]). It is enough to add a couple of lines to the [previous example](@ref example-50-lines) (omitted unmodified lines are represented with `#(...)`):
+
+```@eval
+Main.mdinclude(joinpath(Main.EXAMPLEROOT, "diffusion3D_multigpu_CuArrays_onlyvis.jl"))
+```
+
+Here is the resulting movie when running the application on 8 GPUs, solving 3-D heat diffusion with heterogeneous heat capacity (two Gaussian anomalies) on a global computational grid of size 510x510x510 grid points. It shows the x-z-dimension plane in the middle of the dimension y:
+
+![Implicit global grid](../../../assets/videos/diffusion3D_8gpus.gif)
+
+The simulation producing this movie - *including the in-situ visualization* - took 29 minutes on 8 NVIDIA® Tesla® P100 GPUs on Piz Daint (an optimized solution using [CUDA.jl]'s native kernel programming capabilities can be more than 10 times faster).
+The complete example can be found [here](../../../assets/examples/diffusion3D_multigpu_CuArrays.jl). A corresponding basic cpu-only example is available [here](../../../assets/examples/diffusion3D_multicpu.jl) (no usage of multi-threading) and a movie of a simulation with 254x254x254 grid points which it produced within 34 minutes using 8 Intel® Xeon® E5-2690 v3 is found [here](../../../assets/videos/diffusion3D_8cpus.gif) (with 8 processes, no multi-threading).
diff --git a/docs/src/index.MD b/docs/src/index.MD
@@ -0,0 +1,34 @@
+# [ImplicitGlobalGrid.jl] [![Star on GitHub](https://img.shields.io/github/stars/eth-cscs/ImplicitGlobalGrid.jl.svg)](https://github.com/eth-cscs/ImplicitGlobalGrid.jl/stargazers)
+
+ImplicitGlobalGrid renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs \[[1][JuliaCon19], [2][PASC19], [3][JuliaCon20a]\]:
+
+![Weak scaling Piz Daint](./assets/images/fig_parEff_HM3D_Julia_CUDA_all_Daint_extrapol.png)
+
+ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]) to perform halo updates close to hardware limit and leverages CUDA-aware or ROCm-aware MPI for GPU-applications. The communication can straightforwardly be hidden behind computation \[[1][JuliaCon19], [3][JuliaCon20a]\] (how this can be done automatically when using ParallelStencil.jl is shown in \[[3][JuliaCon20a]\]; a general approach particularly suited for CUDA C applications is explained in \[[4][GTC19]\]).
+
+A particularity of ImplicitGlobalGrid is the automatic *implicit creation of the global computational grid* based on the number of processes the application is run with (and based on the process topology, which can be explicitly chosen by the user or automatically defined). As a consequence, the user only needs to write a code to solve his problem on one GPU/CPU (*local grid*); then, **as little as three functions can be enough to transform a single GPU/CPU application into a massively scaling Multi-GPU/CPU application**. See the [50-lines Multi-GPU example](@ref example-50-lines) in the section [Examples](@ref). 1-D, 2-D and 3-D grids are supported. Here is a sketch of the global grid that results from running a 2-D solver with 4 processes (P1-P4) (a 2x2 process topology is created by default in this case):
+
+![Implicit global grid](./assets/images/implicit_global_grid.png)
+
+
+## Dependencies
+ImplicitGlobalGrid relies on the Julia MPI wrapper ([MPI.jl]), the Julia CUDA package ([CUDA.jl] \[[5][Julia CUDA paper 1], [6][Julia CUDA paper 2]\]) and the Julia AMDGPU package ([AMDGPU.jl]).
+
+## Contributors
+The principal contributors to [ImplicitGlobalGrid.jl] are (ordered by the significance of the relative contributions):
+- Dr. Samuel Omlin ([@omlins](https://github.com/omlins)), CSCS - Swiss National Supercomputing Centre, ETH Zurich
+- Dr. Ludovic Räss ([@luraess](https://github.com/luraess)), Laboratory of Hydraulics, Hydrology, Glaciology - ETH Zurich
+- Dr. Ivan Utkin ([@utkinis](https://github.com/utkinis)), Laboratory of Hydraulics, Hydrology, Glaciology - ETH Zurich
+
+## References
+\[1\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). Porting a Massively Parallel Multi-GPU Application to Julia: a 3-D Nonlinear Multi-Physics Flow Solver. JuliaCon Conference, Baltimore, USA.][JuliaCon19]
+
+\[2\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). A Nonlinear Multi-Physics 3-D Solver: From CUDA C + MPI to Julia. PASC19 Conference, Zurich, Switzerland.][PASC19]
+
+\[3\] [Omlin, S., Räss, L., Kwasniewski, G., Malvoisin, B., & Podladchikov, Y. Y. (2020). Solving Nonlinear Multi-Physics on GPU Supercomputers with Julia. JuliaCon Conference, virtual.][JuliaCon20a]
+
+\[4\] [Räss, L., Omlin, S., & Podladchikov, Y. Y. (2019). Resolving Spontaneous Nonlinear Multi-Physics Flow Localisation in 3-D: Tackling Hardware Limit. GPU Technology Conference 2019, San Jose, Silicon Valley, CA, USA.][GTC19]
+
+\[5\] [Besard, T., Foket, C., & De Sutter, B. (2018). Effective Extensible Programming: Unleashing Julia on GPUs. IEEE Transactions on Parallel and Distributed Systems, 30(4), 827-841. doi: 10.1109/TPDS.2018.2872064][Julia CUDA paper 1]
+
+\[6\] [Besard, T., Churavy, V., Edelman, A., & De Sutter B. (2019). Rapid software prototyping for heterogeneous and distributed platforms. Advances in Engineering Software, 132, 29-46. doi: 10.1016/j.advengsoft.2019.02.002][Julia CUDA paper 2]