Merge tag 'v4.0-rc1' into exaconstit-dev-4.0rc1

Release candidate #1 for version 4.0
rcarson3 · Apr 24, 2019 · 0bab3cb
2 parents d07f450 + ff9819e
commit 0bab3cb

Showing 109 changed files with 9,625 additions and 1,255 deletions.
diff --git a/.gitignore b/.gitignore
@@ -169,6 +169,7 @@ miniapps/performance/sol.*
 miniapps/tools/display-basis
 miniapps/tools/load-dc
 miniapps/tools/convert-dc
+miniapps/tools/lor-transfer
 
 miniapps/nurbs/ex1
 miniapps/nurbs/ex1p

diff --git a/CHANGELOG b/CHANGELOG
@@ -8,12 +8,69 @@
                                http://mfem.org
 
 
-Version 3.4.1 (development)
-===========================
+Version 4.0-RC1, Apr 11, 2019
+=============================
+
+Requirements and Limitations
+----------------------------
+- This is a release candidate for mfem-4.0.
+- Use at your own risk -- not everything will work, the API may change.
+- We are looking for feedback from friendly users.
+- Unlike previous MFEM releases, this version requires a C++11 compiler.
+
+- GPU-related limitations:
+  * NVCC is not supported in the CMake build system yet.
+  * Element batching is currently ignored.
+  * Full-assembly (on device), element assembly, and matrix-free bilinear forms
+    are not supported yet.
+  * FunctionCoefficients do not currently work on GPUs.
+  * Partial assembly kernels are not implemented yet for simplices.
+
+GPU support
+-----------
+- Added initial support for hardware devices, such as GPUs, and programming
+  models, such as CUDA, OCCA, RAJA and OpenMP.
+
+- The GPU/device support is based on MFEM's new backends and kernels working
+  seamlessly with a new lightweight device/host memory manager. The kernels can
+  be implemented either in OCCA, or as a simple wrapper around for-loops, which
+  can then be dispatched to RAJA and native backends. See the files forall.hpp
+  and mem_manager.hpp in the general/ directory.
+
+- Several of the MFEM example codes (ex1, ex1p, ex6, and ex6p) can now take
+  advantage of GPU acceleration with the backend selectable at runtime. Many of
+  the linear algebra and finite element operations (e.g. partially assembled
+  bilinear forms) have been extended to take advantage of kernel acceleration by
+  simply replacing loops with the MFEM_FORALL() macro.
+
+- In addition to pure CUDA, the library currently supports OCCA, RAJA and OpenMP
+  kernels, which could be mixed and matched in different parts of the same
+  application. We plan on adding support for more programming models and devices
+  in the future, without the need for significant modifications in user code.
+  The list of current backends is: "occa-cuda", "raja-cuda", "cuda", "occa-omp",
+  "raja-omp", "omp", "occa-cpu", "raja-cpu", and "cpu".
+
+Discretization improvements
+---------------------------
+- Added support for a general "low-order refined"-to-"high-order" transfer of
+  GridFunction data from a "low-order refined" (LOR) space defined on a refined
+  mesh to a "high-order" (HO) finite element space defined on a coarse mesh. See
+  the new classes InterpolationGridTransfer and L2ProjectionGridTransfer and the
+  new LOR Transfer miniapp: miniapps/tools/lor-transfer.cpp.
+
+- Added support for derefinement of vector (RT + ND) spaces.
+
+- Added element flux, and flux energy computation in class ElasticityIntegrator,
+  allowing for the use of Zienkiewicz-Zhu type error estimators with the
+  integrator. For an illustration of this addition, see the new Example 22.
+
+- Added a variety of coefficients which are sums or products of existing
+  coefficients as well as grid function coefficients which return the
+  divergence, gradient, or curl of their GridFunctions.
 
 Support for wedge elements and meshes with mixed element types
 --------------------------------------------------------------
-- Added support for wedge shaped mesh elements of arbitrary order (with Geometry
+- Added support for wedge-shaped mesh elements of arbitrary order (with Geometry
   type PRISM) which have two triangular faces and three quadrilateral faces.
   Several examples of such meshes can be found in the data/ directory.
 
@@ -39,33 +96,27 @@ Other meshing improvements
   follows precisely the paper:
 
      D. Arnold, A. Mukherjee, and L. Pouly, "Locally Adapted Tetrahedral Meshes
-     Using Bisection", SIAM J. Sci. Comput., 22(2), 431–448.
+     Using Bisection", SIAM J. Sci. Comput. 22 (2000), 431–448.
 
   This guarantees that the shape regularity of the elements will be preserved
   under refinement.
 
 - Added support for parallel communication groups on non-conforming meshes.
 
+- Improved parallel partitioning of non-conforming meshes. If the coarse mesh
+  elements are ordered as a sequence of face-neighbors, the parallel partitions
+  are now guaranteed to be continuous. To that end, inline quadrilateral and
+  hexahedral meshes are now by default ordered along a space-filling curve.
+
+- A boundary in a NURBS mesh can now be connected with another boundary. Such a
+  periodic NURBS mesh is a simple way to impose periodic boundary conditions.
+
 - Added support for reading linear and quadratic 2D quadrilateral and triangular
   Cubit meshes.
 
 - The TMOP mesh optimization algorithms were extended to support user-defined
-  space-dependent limiting terms. Improved the TMOP objective functions by
-  more accurate normalization of the different terms.
-
-Discretization improvements
----------------------------
-- Added element flux, and flux energy computation in class ElasticityIntegrator,
-  allowing for the use of Zienkiewicz-Zhu type error estimators with the
-  integrator. For an illustration of this addition, see the new Example 22.
-
-- Added a variety of coefficients which are sums or products of existing
-  coefficients as well as grid function coefficients which return the
-  divergence, gradient, or curl of their GridFunctions.
-
-New and improved solvers and preconditioners
---------------------------------------------
-- Added support for parallel ILU preconditioning via hypre's Euclid solver.
+  space-dependent limiting terms. Improved the TMOP objective functions by more
+  accurate normalization of the different terms.
 
 New and updated examples and miniapps
 -------------------------------------
@@ -75,17 +126,32 @@ New and updated examples and miniapps
 - Added a new meshing miniapp, Extruder, that demonstrates the capability to
   produce 3D meshes by extruding 2D meshes.
 
+- Added a simple miniapp, LOR Transfer, for visualizing the actions of the
+  transfer operators between a high-order and a low-order refined space.
+
 - Added a new example, Example 20/20p, that solves a system of 1D ODEs derived
   from a Hamiltonian. The example demonstrates the use of the variable order,
   symplectic integration algorithm implemented in class SIAVSolver.
 
 - Added a new example, Example 22/22p, that illustrates the use of AMR to solve
   a linear elasticity problem. This is an extension of Example 2/2p.
 
+New and improved solvers and preconditioners
+--------------------------------------------
+- Added support for parallel ILU preconditioning via hypre's Euclid solver.
+
+- Added support for STRUMPACK v3 with a small API change in the class
+  STRUMPACKSolver, see "API changes" below.
+
 Miscellaneous
 -------------
 - Added unit tests based on the Catch++ library.
 
+- Renamed the option MFEM_USE_OPENMP to MFEM_USE_LEGACY_OPENMP. This legacy
+  option is deprecated and planned for removal in a future release. The original
+  option name, MFEM_USE_OPENMP, is now used to enable the new OpenMP backends in
+  the new kernels.
+
 - Altered the way FGMRES counts its iterations so that it matches GMRES.
 
 - Various other simplifications, extensions, and bugfixes in the code.
@@ -110,6 +176,8 @@ API changes
 - Removed the virtual method Element::GetRefinementFlag, it is only used by the
   derived class Tetrahedron.
 - Added new methods: Array::CopyTo, Tetrahedron::Init.
+- In class STRUMPACKSolver, the method SetMC64Job() was replaced by the new
+  methods: DisableMatching(), EnableMatching(), and EnableParallelMatching().
 
 
 Version 3.4, released on May 29, 2018
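
The "GPU support" entry above describes kernels written as "a simple wrapper around for-loops" that is then dispatched to a backend chosen at runtime. The block below is a minimal sketch of that idea only. It is not MFEM's MFEM_FORALL() macro or its device API, and the names Backend, ParseBackend, ForAll, and Axpy are hypothetical, introduced purely for illustration. Only two of the listed backend names ("cpu" and "omp") are modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical backend tag; not MFEM's actual device configuration type.
enum class Backend { Cpu, Omp };

// Map a backend name to a tag. Unknown names fall back to Cpu (an assumption
// made for this sketch, not documented MFEM behavior).
inline Backend ParseBackend(const std::string &name)
{
   return (name == "omp") ? Backend::Omp : Backend::Cpu;
}

// Run body(i) for every i in [0, n), selecting the loop flavor at runtime.
template <typename Body>
void ForAll(Backend backend, int n, Body &&body)
{
   if (backend == Backend::Omp)
   {
      // The pragma is only emitted when the compiler has OpenMP enabled;
      // otherwise this branch is an ordinary serial loop.
#ifdef _OPENMP
      #pragma omp parallel for
#endif
      for (int i = 0; i < n; i++) { body(i); }
      return;
   }
   for (int i = 0; i < n; i++) { body(i); }
}

// Example kernel built on the wrapper: y += a * x.
inline void Axpy(Backend backend, double a,
                 const std::vector<double> &x, std::vector<double> &y)
{
   assert(x.size() == y.size());
   const double *xd = x.data();
   double *yd = y.data();
   const int n = static_cast<int>(x.size());
   // Capture raw pointers by value so the body is independent per index.
   ForAll(backend, n, [=](int i) { yd[i] += a * xd[i]; });
}
```

Because the kernel body is a lambda over a single index, the same source can be routed to different loop implementations without touching the call site, which is the property the changelog attributes to the new kernels.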

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -13,6 +13,11 @@ cmake_minimum_required(VERSION 2.8.11)
 set(USER_CONFIG "${CMAKE_CURRENT_SOURCE_DIR}/config/user.cmake" CACHE PATH
   "Path to optional user configuration file.")
 
+# Require C++11 and disable compiler-specific extensions
+set(CMAKE_CXX_STANDARD 11)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
+set(CMAKE_CXX_EXTENSIONS OFF)
+
 # Load user settings before the defaults - this way the defaults will not
 # overwrite the user set options. If the user has not set all options, we still
 # have the defaults.
@@ -170,12 +175,11 @@ if (MFEM_USE_LAPACK)
 endif()
 
 # OpenMP
-if (MFEM_USE_OPENMP)
-  if (MFEM_THREAD_SAFE)
-    find_package(OpenMP REQUIRED)
-  else()
-    message(FATAL_ERROR " *** MFEM_USE_OPENMP requires MFEM_THREAD_SAFE=ON.")
+if (MFEM_USE_OPENMP OR MFEM_USE_LEGACY_OPENMP)
+  if (NOT MFEM_THREAD_SAFE AND MFEM_USE_LEGACY_OPENMP)
+    message(FATAL_ERROR " *** MFEM_USE_LEGACY_OPENMP requires MFEM_THREAD_SAFE=ON.")
   endif()
+  find_package(OpenMP REQUIRED)
 endif()
 
 # SuiteSparse (before SUNDIALS which may depend on KLU)

diff --git a/INSTALL b/INSTALL
@@ -21,6 +21,24 @@ requires an MPI C++ compiler, as well as the following external libraries:
 The METIS dependency can be disabled but that is not generally recommended, see
 the option MFEM_USE_METIS.
 
+MFEM also includes support for devices such as GPUs, and programming models such
+as CUDA, OCCA, OpenMP and RAJA.
+
+- Starting with version 4.0, MFEM requires a C++11 compiler
+
+- CUDA support requires an NVIDIA GPU and an installation of the CUDA Toolkit
+  https://developer.nvidia.com/cuda-toolkit
+
+- OCCA support requires the OCCA library
+  https://libocca.org
+
+- OpenMP support requires a compiler implementing the OpenMP API
+  https://www.openmp.org
+
+- RAJA support requires installation of the RAJA performance portability layer
+  with (optionally) support for CUDA and OpenMP
+  https://github.com/LLNL/RAJA
+
 The library supports two build systems: one based on GNU make, and a second one
 based on CMake. Both build systems are described below. Some hints for building
 without GNU make or CMake can be found at the end of this file.
@@ -47,6 +65,10 @@ Parallel build:
    (build hypre 2.10.0b in ../hypre-2.10.0b relative to mfem/)
    make parallel -j 4
 
+CUDA build:
+   make cuda -j 4
+   (build for a specific compute capability: 'make cuda -j 4 CUDA_ARCH=sm_30')
+
 Example codes (serial/parallel, depending on the build):
    cd examples
    make -j 4
@@ -57,7 +79,6 @@ Build everything (library, examples and miniapps) with current configuration:
 Quick-check the build by running Example 1/1p (optional):
    make check
 
-
 Quick start with CMake
 ======================
 Serial build:
@@ -132,6 +153,10 @@ are also defined:
    make parallel -> Builds parallel optimized version of the library
    make debug    -> Builds serial debug version of the library
    make pdebug   -> Builds parallel debug version of the library
+   make cuda     -> Builds serial CUDA optimized version of the library
+   make pcuda    -> Builds parallel CUDA optimized version of the library
+   make cudebug  -> Builds serial CUDA debug version of the library
+   make pcudebug -> Builds parallel CUDA debug version of the library
 
 Note that any of the above shortcuts accept configuration options, either at the
 command line or through a user configuration file.
@@ -193,8 +218,9 @@ Configuration options (GNU make)
 See the configuration file config/defaults.mk for the default settings.
 
 Compilers:
-   CXX    - C++ compiler, serial build
-   MPICXX - MPI C++ compiler, parallel build
+   CXX      - C++ compiler, serial build
+   MPICXX   - MPI C++ compiler, parallel build
+   CUDA_CXX - The CUDA compiler, 'nvcc'
 
 Compiler options:
    OPTIM_FLAGS - Options for optimized build
@@ -230,7 +256,7 @@ MFEM_DEBUG = YES/NO
    and consistency checks that may simplify bug-hunting.
 
 MFEM_USE_EXCEPTIONS = YES/NO
-   Enable the use of exceptions. In particular, modifies the default bahavior
+   Enable the use of exceptions. In particular, modifies the default behavior
    when errors are encountered: throw an exception, instead of aborting.
 
 MFEM_USE_LIBUNWIND = YES/NO
@@ -250,9 +276,12 @@ MFEM_THREAD_SAFE = YES/NO
    Use thread-safe implementation for some classes/methods. This comes at the
    cost of extra memory allocation and de-allocation.
 
-MFEM_USE_OPENMP = YES/NO
+MFEM_USE_LEGACY_OPENMP = YES/NO
    Enable (basic) experimental OpenMP support. Requires MFEM_THREAD_SAFE.
 
+MFEM_USE_OPENMP = YES/NO
+   Enable the OpenMP backend.
+
 MFEM_USE_MEMALLOC = YES/NO
    Internal MFEM option: enable batch allocation for some small objects.
    Recommended value is YES.
@@ -362,6 +391,29 @@ MFEM_USE_PUMI = YES/NO
    models and effectively supports automated adaptive analysis. PUMI enables
    support for parallel unstructured mesh modifications in MFEM.
 
+MFEM_USE_MM = YES/NO
+   Enables support for MFEM's memory manager (MM), which is required to
+   support devices with different memory spaces.
+
+MFEM_USE_CUDA = YES/NO
+   Enables support for CUDA devices in MFEM. CUDA is a parallel computing
+   platform and programming model for general computing on graphics processing
+   units (GPUs). This option requires MFEM_USE_MM. The variable CUDA_ARCH is
+   used to specify the CUDA compute capability used during compilation (by
+   default, CUDA_ARCH=sm_60). When enabled, this option uses the CUDA_* build
+   options, see below.
+
+MFEM_USE_RAJA = YES/NO
+   Enable support for the RAJA performance portability layer in MFEM. RAJA
+   provides a portable abstraction for loops, supporting different programming
+   model backends. When using the RAJA CUDA backend, MFEM_USE_MM is required.
+
+MFEM_USE_OCCA = YES/NO
+   Enables support for the OCCA library in MFEM. OCCA is an open-source library
+   which aims to make it easy to program different types of devices (e.g. CPU,
+   GPU, FPGA) by providing a unified API for interacting with JIT-compiled
+   backends. When using the OCCA CUDA backend, MFEM_USE_MM is required.
+
 MFEM_BUILD_TAG = (any value)
    An optional tag to characterize the build. Exported to config/config.mk.
    Can be used to identify the MFEM build from other makefiles.
@@ -397,7 +449,8 @@ The specific libraries and their options are:
        http://math-atlas.sourceforge.net (ATLAS)
   Options: LAPACK_OPT (currently not used/needed), LAPACK_LIB.
 
-- OpenMP (optional), usually part of compiler, used when MFEM_USE_OPENMP = YES.
+- OpenMP (optional), usually part of compiler, used when either MFEM_USE_OPENMP
+  or MFEM_USE_LEGACY_OPENMP is set to YES.
   Options: OPENMP_OPT, OPENMP_LIB.
 
 - High-resolution POSIX clocks: when using MFEM_TIMER_TYPE = 2, it may be
@@ -429,7 +482,8 @@ The specific libraries and their options are:
 
 - STRUMPACK (optional), used when MFEM_USE_STRUMPACK = YES. Note that STRUMPACK
   requires the PT-Scotch and Scalapack libraries as well as ParMETIS, which
-  includes METIS 5 in its distribution.
+  includes METIS 5 in its distribution. Starting with STRUMPACK v2.2.0, ParMETIS
+  and PT-Scotch are optional dependencies.
   The support for STRUMPACK was added in MFEM v3.3.2 and it requires STRUMPACK
   2.0.0 or later.
   URL: http://portal.nersc.gov/project/sparse/strumpack
@@ -475,6 +529,18 @@ The specific libraries and their options are:
   URL: https://scorec.rpi.edu/pumi
   Options: PUMI_OPT, PUMI_LIB.
 
+- CUDA, used when MFEM_USE_CUDA = YES.
+  URL: https://developer.nvidia.com/cuda-toolkit
+  Options: CUDA_CXX, CUDA_ARCH, CUDA_OPT, CUDA_LIB.
+
+- OCCA, used when MFEM_USE_OCCA = YES.
+  URL: https://libocca.org
+  Options: OCCA_DIR, OCCA_OPT, OCCA_LIB.
+
+- RAJA, used when MFEM_USE_RAJA = YES.
+  URL: https://github.com/LLNL/RAJA
+  Options: RAJA_DIR, RAJA_OPT, RAJA_LIB.
+
 - MPFR (optional), used when MFEM_USE_MPFR = YES.
   URL: http://mpfr.org, it depends on the GMP library: https://gmplib.org
   Options: MPFR_OPT, MPFR_LIB.
@@ -596,6 +662,7 @@ MFEM_USE_METIS - Set to ${MFEM_USE_MPI}, can be overwritten.
 MFEM_USE_LIBUNWIND
 MFEM_USE_LAPACK
 MFEM_THREAD_SAFE
+MFEM_USE_LEGACY_OPENMP
 MFEM_USE_OPENMP
 MFEM_USE_MEMALLOC
 MFEM_TIMER_TYPE - Set automatically, can be overwritten.
@@ -609,6 +676,13 @@ MFEM_USE_MPFR
 MFEM_USE_GZSTREAM
 MFEM_USE_PUMI
 
+The following GNU make options are not supported with CMake yet:
+
+MFEM_USE_CUDA
+MFEM_USE_OCCA
+MFEM_USE_RAJA
+MFEM_USE_MM
+
 The following options are CMake specific:
 
 MFEM_ENABLE_TESTING  - Enable the ctest framework for testing.
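
The INSTALL changes above split the former OpenMP setting into MFEM_USE_OPENMP (new backend) and MFEM_USE_LEGACY_OPENMP (deprecated), and the CMake logic allows both to be enabled at once. As a hedged illustration, downstream C++ code could report which of the two a build was configured with. The sketch assumes the macros are made visible by MFEM's generated configuration header being included first; the names kOpenMPBackend, kLegacyOpenMP, and DescribeOpenMP are hypothetical.

```cpp
#include <string>

// In a real build these macros would come from the generated config header
// (assumed to be included before this point); here they are only tested.
#ifdef MFEM_USE_OPENMP
constexpr bool kOpenMPBackend = true;
#else
constexpr bool kOpenMPBackend = false;
#endif

#ifdef MFEM_USE_LEGACY_OPENMP
constexpr bool kLegacyOpenMP = true;
#else
constexpr bool kLegacyOpenMP = false;
#endif

// Summarize both settings independently, since they are not mutually
// exclusive in the build logic shown in this diff.
inline std::string DescribeOpenMP()
{
   return std::string("backend=") + (kOpenMPBackend ? "on" : "off") +
          " legacy=" + (kLegacyOpenMP ? "on" : "off");
}
```

Reporting the two flags separately avoids assuming that enabling one disables the other.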

diff --git a/config/cmake/MFEMConfig.cmake.in b/config/cmake/MFEMConfig.cmake.in
@@ -25,6 +25,7 @@ set(MFEM_USE_LIBUNWIND @MFEM_USE_LIBUNWIND@)
 set(MFEM_USE_LAPACK @MFEM_USE_LAPACK@)
 set(MFEM_THREAD_SAFE @MFEM_THREAD_SAFE@)
 set(MFEM_USE_OPENMP @MFEM_USE_OPENMP@)
+set(MFEM_USE_LEGACY_OPENMP @MFEM_USE_LEGACY_OPENMP@)
 set(MFEM_USE_MEMALLOC @MFEM_USE_MEMALLOC@)
 set(MFEM_TIMER_TYPE @MFEM_TIMER_TYPE@)
 set(MFEM_USE_SUNDIALS @MFEM_USE_SUNDIALS@)

diff --git a/config/cmake/config.hpp.in b/config/cmake/config.hpp.in
@@ -62,9 +62,12 @@
 // allocation and de-allocation.
 #cmakedefine MFEM_THREAD_SAFE
 
-// Enable experimental OpenMP support. Requires MFEM_THREAD_SAFE.
+// Enable the OpenMP backend.
 #cmakedefine MFEM_USE_OPENMP
 
+// [Deprecated] Enable experimental OpenMP support. Requires MFEM_THREAD_SAFE.
+#cmakedefine MFEM_USE_LEGACY_OPENMP
+
 // Enable MFEM functionality based on the Mesquite library.
 #cmakedefine MFEM_USE_MESQUITE