Merge tag 'v4.0-rc1' into exaconstit-dev-4.0rc1
Release candidate #1 for version 4.0
rcarson3 committed Apr 24, 2019
2 parents d07f450 + ff9819e commit 0bab3cb
Showing 109 changed files with 9,625 additions and 1,255 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -169,6 +169,7 @@ miniapps/performance/sol.*
miniapps/tools/display-basis
miniapps/tools/load-dc
miniapps/tools/convert-dc
miniapps/tools/lor-transfer

miniapps/nurbs/ex1
miniapps/nurbs/ex1p
108 changes: 88 additions & 20 deletions CHANGELOG
@@ -8,12 +8,69 @@
http://mfem.org


Version 3.4.1 (development)
===========================
Version 4.0-RC1, Apr 11, 2019
=============================

Requirements and Limitations
----------------------------
- This is a release candidate for mfem-4.0.
- Use at your own risk -- not everything will work, the API may change.
- We are looking for feedback from friendly users.
- Unlike previous MFEM releases, this version requires a C++11 compiler.

- GPU-related limitations:
* NVCC is not supported in the CMake build system yet.
* Element batching is currently ignored.
* Full-assembly (on device), element assembly, and matrix-free bilinear forms
are not supported yet.
* FunctionCoefficients do not currently work on GPUs.
* Partial assembly kernels are not implemented yet for simplices.

GPU support
-----------
- Added initial support for hardware devices, such as GPUs, and programming
models, such as CUDA, OCCA, RAJA and OpenMP.

- The GPU/device support is based on MFEM's new backends and kernels working
seamlessly with a new lightweight device/host memory manager. The kernels can
be implemented either in OCCA, or as a simple wrapper around for-loops, which
can then be dispatched to RAJA and native backends. See the files forall.hpp
and mem_manager.hpp in the general/ directory.

- Several of the MFEM example codes (ex1, ex1p, ex6, and ex6p) can now take
advantage of GPU acceleration with the backend selectable at runtime. Many of
the linear algebra and finite element operations (e.g. partially assembled
bilinear forms) have been extended to take advantage of kernel acceleration by
simply replacing loops with the MFEM_FORALL() macro.

- In addition to pure CUDA, the library currently supports OCCA, RAJA and OpenMP
kernels, which could be mixed and matched in different parts of the same
application. We plan on adding support for more programming models and devices
in the future, without the need for significant modifications in user code.
The list of current backends is: "occa-cuda", "raja-cuda", "cuda", "occa-omp",
"raja-omp", "omp", "occa-cpu", "raja-cpu", and "cpu".
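
The "simple wrapper around for-loops" idea described above can be illustrated
with a minimal, self-contained sketch. This is not MFEM's implementation: the
names SketchBackend, sketch_forall and axpy are hypothetical, and only a serial
host path is shown where a real backend would dispatch to CUDA, RAJA or OpenMP.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend tag. Real MFEM backends are selected by strings such
// as "cuda" or "omp"; only a serial host path is implemented in this sketch.
enum class SketchBackend { Cpu };

// Hypothetical stand-in for a forall-style wrapper: the loop body is passed
// as a callable, so the dispatch point is the only place that decides how
// the iterations are executed.
template <typename Body>
void sketch_forall(SketchBackend backend, std::size_t n, Body body)
{
   switch (backend)
   {
      case SketchBackend::Cpu:
         for (std::size_t i = 0; i < n; i++) { body(i); }
         break;
   }
}

// y <- a*x + y, written once against the wrapper.
// Assumes y has at least as many entries as x.
inline void axpy(double a, const std::vector<double> &x,
                 std::vector<double> &y)
{
   const double *xp = x.data();
   double *yp = y.data();
   // Capture raw pointers by value, as a device-capable kernel would need.
   sketch_forall(SketchBackend::Cpu, x.size(),
                 [=](std::size_t i) { yp[i] += a * xp[i]; });
}
```

Because the kernel body only sees an index and captured pointers, adding
another execution path in this sketch would mean adding a case to
sketch_forall, not changing axpy.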

Discretization improvements
---------------------------
- Added support for a general "low-order refined"-to-"high-order" transfer of
GridFunction data from a "low-order refined" (LOR) space defined on a refined
mesh to a "high-order" (HO) finite element space defined on a coarse mesh. See
the new classes InterpolationGridTransfer and L2ProjectionGridTransfer and the
new LOR Transfer miniapp: miniapps/tools/lor-transfer.cpp.

- Added support for derefinement of vector (RT + ND) spaces.

- Added element flux, and flux energy computation in class ElasticityIntegrator,
allowing for the use of Zienkiewicz-Zhu type error estimators with the
integrator. For an illustration of this addition, see the new Example 22.

- Added a variety of coefficients which are sums or products of existing
coefficients as well as grid function coefficients which return the
divergence, gradient, or curl of their GridFunctions.
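
The composition idea behind sum and product coefficients can be sketched in a
few lines. The sketch below does not use MFEM's coefficient classes: the type
SketchCoefficient and the helpers sum_of and product_of are hypothetical, and
a coefficient is reduced to a plain function of a 1D point.

```cpp
#include <functional>

// Hypothetical stand-in for a scalar coefficient: a function of a 1D point.
using SketchCoefficient = std::function<double(double)>;

// Returns a coefficient whose value is a(x) + b(x).
inline SketchCoefficient sum_of(SketchCoefficient a, SketchCoefficient b)
{
   return [a, b](double x) { return a(x) + b(x); };
}

// Returns a coefficient whose value is a(x) * b(x).
inline SketchCoefficient product_of(SketchCoefficient a, SketchCoefficient b)
{
   return [a, b](double x) { return a(x) * b(x); };
}
```

Composed coefficients are themselves coefficients, so they can be nested, for
example product_of(sum_of(f, g), h).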

Support for wedge elements and meshes with mixed element types
--------------------------------------------------------------
- Added support for wedge shaped mesh elements of arbitrary order (with Geometry
- Added support for wedge-shaped mesh elements of arbitrary order (with Geometry
type PRISM) which have two triangular faces and three quadrilateral faces.
Several examples of such meshes can be found in the data/ directory.

@@ -39,33 +96,27 @@ Other meshing improvements
follows precisely the paper:

D. Arnold, A. Mukherjee, and L. Pouly, "Locally Adapted Tetrahedral Meshes
Using Bisection", SIAM J. Sci. Comput., 22(2), 431–448.
Using Bisection", SIAM J. Sci. Comput. 22 (2000), 431–448.

This guarantees that the shape regularity of the elements will be preserved
under refinement.

- Added support for parallel communication groups on non-conforming meshes.

- Improved parallel partitioning of non-conforming meshes. If the coarse mesh
elements are ordered as a sequence of face-neighbors, the parallel partitions
are now guaranteed to be continuous. To that end, inline quadrilateral and
hexahedral meshes are now by default ordered along a space-filling curve.

- A boundary in a NURBS mesh can now be connected with another boundary. Such a
periodic NURBS mesh is a simple way to impose periodic boundary conditions.

- Added support for reading linear and quadratic 2D quadrilateral and triangular
Cubit meshes.

- The TMOP mesh optimization algorithms were extended to support user-defined
space-dependent limiting terms. Improved the TMOP objective functions by
more accurate normalization of the different terms.

Discretization improvements
---------------------------
- Added element flux, and flux energy computation in class ElasticityIntegrator,
allowing for the use of Zienkiewicz-Zhu type error estimators with the
integrator. For an illustration of this addition, see the new Example 22.

- Added a variety of coefficients which are sums or products of existing
coefficients as well as grid function coefficients which return the
divergence, gradient, or curl of their GridFunctions.

New and improved solvers and preconditioners
--------------------------------------------
- Added support for parallel ILU preconditioning via hypre's Euclid solver.
space-dependent limiting terms. Improved the TMOP objective functions by more
accurate normalization of the different terms.

New and updated examples and miniapps
-------------------------------------
@@ -75,17 +126,32 @@ New and updated examples and miniapps
- Added a new meshing miniapp, Extruder, that demonstrates the capability to
produce 3D meshes by extruding 2D meshes.

- Added a simple miniapp, LOR Transfer, for visualizing the actions of the
transfer operators between a high-order and a low-order refined space.

- Added a new example, Example 20/20p, that solves a system of 1D ODEs derived
from a Hamiltonian. The example demonstrates the use of the variable order,
symplectic integration algorithm implemented in class SIAVSolver.

- Added a new example, Example 22/22p, that illustrates the use of AMR to solve
a linear elasticity problem. This is an extension of Example 2/2p.

New and improved solvers and preconditioners
--------------------------------------------
- Added support for parallel ILU preconditioning via hypre's Euclid solver.

- Added support for STRUMPACK v3 with a small API change in the class
STRUMPACKSolver, see "API changes" below.

Miscellaneous
-------------
- Added unit tests based on the Catch++ library.

- Renamed the option MFEM_USE_OPENMP to MFEM_USE_LEGACY_OPENMP. This legacy
option is deprecated and planned for removal in a future release. The original
option name, MFEM_USE_OPENMP, is now used to enable the new OpenMP backends in
the new kernels.

- Altered the way FGMRES counts its iterations so that it matches GMRES.

- Various other simplifications, extensions, and bugfixes in the code.
@@ -110,6 +176,8 @@ API changes
- Removed the virtual method Element::GetRefinementFlag, it is only used by the
derived class Tetrahedron.
- Added new methods: Array::CopyTo, Tetrahedron::Init.
- In class STRUMPACKSolver, the method SetMC64Job() was replaced by the new
methods: DisableMatching(), EnableMatching(), and EnableParallelMatching().


Version 3.4, released on May 29, 2018
14 changes: 9 additions & 5 deletions CMakeLists.txt
@@ -13,6 +13,11 @@ cmake_minimum_required(VERSION 2.8.11)
set(USER_CONFIG "${CMAKE_CURRENT_SOURCE_DIR}/config/user.cmake" CACHE PATH
"Path to optional user configuration file.")

# Require C++11 and disable compiler-specific extensions
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)

# Load user settings before the defaults - this way the defaults will not
# overwrite the user set options. If the user has not set all options, we still
# have the defaults.
@@ -170,12 +175,11 @@ if (MFEM_USE_LAPACK)
endif()

# OpenMP
if (MFEM_USE_OPENMP)
if (MFEM_THREAD_SAFE)
find_package(OpenMP REQUIRED)
else()
message(FATAL_ERROR " *** MFEM_USE_OPENMP requires MFEM_THREAD_SAFE=ON.")
if (MFEM_USE_OPENMP OR MFEM_USE_LEGACY_OPENMP)
if (NOT MFEM_THREAD_SAFE AND MFEM_USE_LEGACY_OPENMP)
message(FATAL_ERROR " *** MFEM_USE_LEGACY_OPENMP requires MFEM_THREAD_SAFE=ON.")
endif()
find_package(OpenMP REQUIRED)
endif()

# SuiteSparse (before SUNDIALS which may depend on KLU)
88 changes: 81 additions & 7 deletions INSTALL
@@ -21,6 +21,24 @@ requires an MPI C++ compiler, as well as the following external libraries:
The METIS dependency can be disabled but that is not generally recommended, see
the option MFEM_USE_METIS.

MFEM also includes support for devices such as GPUs, and programming models such
as CUDA, OCCA, OpenMP and RAJA.

- Starting with version 4.0, MFEM requires a C++11 compiler

- CUDA support requires an NVIDIA GPU and an installation of the CUDA Toolkit
https://developer.nvidia.com/cuda-toolkit

- OCCA support requires the OCCA library
https://libocca.org

- OpenMP support requires a compiler implementing the OpenMP API
https://www.openmp.org

- RAJA support requires installation of the RAJA performance portability layer
with (optionally) support for CUDA and OpenMP
https://github.com/LLNL/RAJA

The library supports two build systems: one based on GNU make, and a second one
based on CMake. Both build systems are described below. Some hints for building
without GNU make or CMake can be found at the end of this file.
@@ -47,6 +65,10 @@ Parallel build:
(build hypre 2.10.0b in ../hypre-2.10.0b relative to mfem/)
make parallel -j 4

CUDA build:
make cuda -j 4
(build for a specific compute capability: 'make cuda -j 4 CUDA_ARCH=sm_30')

Example codes (serial/parallel, depending on the build):
cd examples
make -j 4
@@ -57,7 +79,6 @@ Build everything (library, examples and miniapps) with current configuration:
Quick-check the build by running Example 1/1p (optional):
make check


Quick start with CMake
======================
Serial build:
@@ -132,6 +153,10 @@ are also defined:
make parallel -> Builds parallel optimized version of the library
make debug -> Builds serial debug version of the library
make pdebug -> Builds parallel debug version of the library
make cuda -> Builds serial CUDA optimized version of the library
make pcuda -> Builds parallel CUDA optimized version of the library
make cudebug -> Builds serial CUDA debug version of the library
make pcudebug -> Builds parallel CUDA debug version of the library

Note that any of the above shortcuts accept configuration options, either at the
command line or through a user configuration file.
@@ -193,8 +218,9 @@ Configuration options (GNU make)
See the configuration file config/defaults.mk for the default settings.

Compilers:
CXX - C++ compiler, serial build
MPICXX - MPI C++ compiler, parallel build
CXX - C++ compiler, serial build
MPICXX - MPI C++ compiler, parallel build
CUDA_CXX - The CUDA compiler, 'nvcc'

Compiler options:
OPTIM_FLAGS - Options for optimized build
@@ -230,7 +256,7 @@ MFEM_DEBUG = YES/NO
and consistency checks that may simplify bug-hunting.

MFEM_USE_EXCEPTIONS = YES/NO
Enable the use of exceptions. In particular, modifies the default bahavior
Enable the use of exceptions. In particular, modifies the default behavior
when errors are encountered: throw an exception, instead of aborting.

MFEM_USE_LIBUNWIND = YES/NO
@@ -250,9 +276,12 @@ MFEM_THREAD_SAFE = YES/NO
Use thread-safe implementation for some classes/methods. This comes at the
cost of extra memory allocation and de-allocation.

MFEM_USE_OPENMP = YES/NO
MFEM_USE_LEGACY_OPENMP = YES/NO
Enable (basic) experimental OpenMP support. Requires MFEM_THREAD_SAFE.

MFEM_USE_OPENMP = YES/NO
Enable the OpenMP backend.

MFEM_USE_MEMALLOC = YES/NO
Internal MFEM option: enable batch allocation for some small objects.
Recommended value is YES.
@@ -362,6 +391,29 @@ MFEM_USE_PUMI = YES/NO
models and effectively supports automated adaptive analysis. PUMI enables
support for parallel unstructured mesh modifications in MFEM.

MFEM_USE_MM = YES/NO
Enables support for MFEM's memory manager (MM), which is required to
support devices with different memory spaces.

MFEM_USE_CUDA = YES/NO
Enables support for CUDA devices in MFEM. CUDA is a parallel computing
platform and programming model for general computing on graphics processing
units (GPUs). This option requires MFEM_USE_MM. The variable CUDA_ARCH is
used to specify the CUDA compute capability used during compilation (by
default, CUDA_ARCH=sm_60). When enabled, this option uses the CUDA_* build
options, see below.

MFEM_USE_RAJA = YES/NO
Enable support for the RAJA performance portability layer in MFEM. RAJA
provides a portable abstraction for loops, supporting different programming
model backends. When using the RAJA CUDA backend, MFEM_USE_MM is required.

MFEM_USE_OCCA = YES/NO
Enables support for the OCCA library in MFEM. OCCA is an open-source library
which aims to make it easy to program different types of devices (e.g. CPU,
GPU, FPGA) by providing a unified API for interacting with JIT-compiled
backends. When using the OCCA CUDA backend, MFEM_USE_MM is required.

MFEM_BUILD_TAG = (any value)
An optional tag to characterize the build. Exported to config/config.mk.
Can be used to identify the MFEM build from other makefiles.
@@ -397,7 +449,8 @@ The specific libraries and their options are:
http://math-atlas.sourceforge.net (ATLAS)
Options: LAPACK_OPT (currently not used/needed), LAPACK_LIB.

- OpenMP (optional), usually part of compiler, used when MFEM_USE_OPENMP = YES.
- OpenMP (optional), usually part of compiler, used when either MFEM_USE_OPENMP
or MFEM_USE_LEGACY_OPENMP is set to YES.
Options: OPENMP_OPT, OPENMP_LIB.

- High-resolution POSIX clocks: when using MFEM_TIMER_TYPE = 2, it may be
@@ -429,7 +482,8 @@ The specific libraries and their options are:

- STRUMPACK (optional), used when MFEM_USE_STRUMPACK = YES. Note that STRUMPACK
requires the PT-Scotch and Scalapack libraries as well as ParMETIS, which
includes METIS 5 in its distribution.
includes METIS 5 in its distribution. Starting with STRUMPACK v2.2.0, ParMETIS
and PT-Scotch are optional dependencies.
The support for STRUMPACK was added in MFEM v3.3.2 and it requires STRUMPACK
2.0.0 or later.
URL: http://portal.nersc.gov/project/sparse/strumpack
@@ -475,6 +529,18 @@ The specific libraries and their options are:
URL: https://scorec.rpi.edu/pumi
Options: PUMI_OPT, PUMI_LIB.

- CUDA, used when MFEM_USE_CUDA = YES.
URL: https://developer.nvidia.com/cuda-toolkit
Options: CUDA_CXX, CUDA_ARCH, CUDA_OPT, CUDA_LIB.

- OCCA, used when MFEM_USE_OCCA = YES.
URL: https://libocca.org
Options: OCCA_DIR, OCCA_OPT, OCCA_LIB.

- RAJA, used when MFEM_USE_RAJA = YES.
URL: https://github.com/LLNL/RAJA
Options: RAJA_DIR, RAJA_OPT, RAJA_LIB.

- MPFR (optional), used when MFEM_USE_MPFR = YES.
URL: http://mpfr.org, it depends on the GMP library: https://gmplib.org
Options: MPFR_OPT, MPFR_LIB.
@@ -596,6 +662,7 @@ MFEM_USE_METIS - Set to ${MFEM_USE_MPI}, can be overwritten.
MFEM_USE_LIBUNWIND
MFEM_USE_LAPACK
MFEM_THREAD_SAFE
MFEM_USE_LEGACY_OPENMP
MFEM_USE_OPENMP
MFEM_USE_MEMALLOC
MFEM_TIMER_TYPE - Set automatically, can be overwritten.
@@ -609,6 +676,13 @@ MFEM_USE_MPFR
MFEM_USE_GZSTREAM
MFEM_USE_PUMI

The following GNU make options are not supported with CMake yet:

MFEM_USE_CUDA
MFEM_USE_OCCA
MFEM_USE_RAJA
MFEM_USE_MM

The following options are CMake specific:

MFEM_ENABLE_TESTING - Enable the ctest framework for testing.
1 change: 1 addition & 0 deletions config/cmake/MFEMConfig.cmake.in
@@ -25,6 +25,7 @@ set(MFEM_USE_LIBUNWIND @MFEM_USE_LIBUNWIND@)
set(MFEM_USE_LAPACK @MFEM_USE_LAPACK@)
set(MFEM_THREAD_SAFE @MFEM_THREAD_SAFE@)
set(MFEM_USE_OPENMP @MFEM_USE_OPENMP@)
set(MFEM_USE_LEGACY_OPENMP @MFEM_USE_LEGACY_OPENMP@)
set(MFEM_USE_MEMALLOC @MFEM_USE_MEMALLOC@)
set(MFEM_TIMER_TYPE @MFEM_TIMER_TYPE@)
set(MFEM_USE_SUNDIALS @MFEM_USE_SUNDIALS@)
5 changes: 4 additions & 1 deletion config/cmake/config.hpp.in
@@ -62,9 +62,12 @@
// allocation and de-allocation.
#cmakedefine MFEM_THREAD_SAFE

// Enable experimental OpenMP support. Requires MFEM_THREAD_SAFE.
// Enable the OpenMP backend.
#cmakedefine MFEM_USE_OPENMP

// [Deprecated] Enable experimental OpenMP support. Requires MFEM_THREAD_SAFE.
#cmakedefine MFEM_USE_LEGACY_OPENMP
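
As a hedged illustration of how downstream code might tell the two options
apart, the sketch below branches on the MFEM_USE_OPENMP and
MFEM_USE_LEGACY_OPENMP macros defined in this header. The function name
SketchOpenMPMode and the returned labels are hypothetical and not part of MFEM.

```cpp
#include <string>

// Reports which OpenMP-related option, if any, was defined at compile time.
// The two macro names come from the generated config header; everything else
// here is a hypothetical illustration.
inline std::string SketchOpenMPMode()
{
#if defined(MFEM_USE_OPENMP) && defined(MFEM_USE_LEGACY_OPENMP)
   return "backend+legacy";
#elif defined(MFEM_USE_OPENMP)
   return "backend";
#elif defined(MFEM_USE_LEGACY_OPENMP)
   return "legacy";
#else
   return "none";
#endif
}
```

Compiled on its own, without the generated config header, neither macro is
defined and the function returns "none".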

// Enable MFEM functionality based on the Mesquite library.
#cmakedefine MFEM_USE_MESQUITE
