Generalize floating point type #3922

dylan-copeland · 2023-10-09T21:21:02Z

The following are working, with exceptions listed below:

All examples 0 to 37, and all miniapps except gslib, serial and parallel (with hypre), float and double.
Device sample runs for examples, with pcuda build on GPU, serial and parallel, float and double.

The following are not working or not supported. Examples and miniapps that do not work in single precision abort with an error message.

ex10(p) has a Newton convergence failure, regardless of how tolerance rel_tol is set.
ex11p, ex12p, ex13p, ex32p, nurbs_ex11p produce reasonable results, although they output Error in LOBPCG: GEVP solver failure.
Example 33 produces nan apparently in PartialFractionExpansion, where many numbers are multiplied.
ex34(p) converges to a strange solution, and I'm not sure if it is right.
miniapps/electromagnetics/maxwell computes nan in GetMaximumTimeStep().
miniapps/nurbs/nurbs_patch_ex1 fails in lapack (NNLSSolver).
miniapps/spde/generate_random_field fails with nan.
Files involving other libraries or file types are not generalized: conduitdatacollection.cpp, fmsconvert.*, gslib.*, sidredatacollection.*, fem/moonolith/*.
Files that are optionally built are not generalized: hiop.*.
Regression tests for floating point types other than double have not yet been added.
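Several of the nan reports above are consistent with overflow in single precision, although that diagnosis is not confirmed here. The following is a minimal sketch of the mechanism only; it is not taken from PartialFractionExpansion, and `RunningProduct` and `RatioIsNan` are hypothetical names. A product of many factors can exceed the largest finite float (about 3.4e38) while still fitting in a double, and a later `inf / inf` then yields nan.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not MFEM code): multiply 1 * 2 * ... * n in type T.
template <typename T>
T RunningProduct(int n)
{
   T p = T(1);
   for (int k = 2; k <= n; k++) { p *= T(k); }
   return p;
}

// Returns true if the ratio of two such products evaluates to nan in type T.
// When both products have overflowed to infinity, inf / inf is nan.
template <typename T>
bool RatioIsNan(int n)
{
   const T num = RunningProduct<T>(n);
   const T den = RunningProduct<T>(n);
   return std::isnan(num / den);
}
```

With n = 40 the product is roughly 8e47, which overflows in float but not in double, so the same expression is finite in one build and nan in the other.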

Most of the changes are simply double -> real_t, with real_t defined as either float or double in general/globals.hpp, depending on the build flag MFEM_USE_SINGLE. Some noteworthy changes are

MPI_DOUBLE -> MPITypeMap<real_t>::mpi_type
LAPACK function name changes (starting with s or d for real, c or z for complex)
fmax -> std::max, etc.
Some hard-coded tolerances depend on the type, e.g. QuadratureFunctions1D::GaussLobatto in intrules.cpp.
CUDA changes in sparsemat.cpp: 64F -> 32F, Dcsr -> Scsr.
The floating point size is used by atomicAdd in general/backends.hpp, which now has different implementations.
Many hard-coded constants (e.g. 0.0, 1.0) must be cast to real_t, one of the inconveniences that will persist after this PR is merged.
A minor correction is made to the sample runs for autodiff/seq_example.cpp and par_example.cpp.
CONTRIBUTING.md states a policy of using real_t instead of double when possible.
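A minimal sketch of what a few of the changes listed above look like in user code, assuming only what the description states (real_t selected by MFEM_USE_SINGLE). The helpers `ClampToUnit` and `RoundoffTol` are hypothetical and are not part of MFEM; the factor of 100 in the tolerance is an arbitrary illustration, not the value used in intrules.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Stand-in for the definition described for general/globals.hpp.
#ifdef MFEM_USE_SINGLE
typedef float real_t;
#else
typedef double real_t;
#endif

// Hypothetical helper: std::max and std::min deduce one type for both
// arguments, so the literals are cast to real_t. With a bare 0.0 (a double)
// this would not compile when real_t is float.
inline real_t ClampToUnit(real_t x)
{
   return std::min(std::max(x, real_t(0.0)), real_t(1.0));
}

// Hypothetical helper: a tolerance that scales with the precision of real_t
// instead of a hard-coded double-precision value such as 1e-16.
inline real_t RoundoffTol()
{
   return real_t(100) * std::numeric_limits<real_t>::epsilon();
}
```

The casts are the inconvenience the description mentions: every mixed call such as `std::max(x, 0.0)` has to name real_t explicitly.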

Support for floating point types other than float and double is left for a future PR, which should involve many fewer lines of code.

In single-precision, the unit tests build but are not expected to run. A future PR should support single-precision unit tests.

| PR | Author | Editor | Reviewers | Assignment | Approval | Merge |
| --- | --- | --- | --- | --- | --- | --- |
| #3922 | @dylan-copeland | @tzanio | @jamiebramwell + @artv3 + @hughcars + @kmittal2 + @psocratis + @camierjs + @pazner + @cjvogl + @sebastiangrimberg + @vladotomov + @mlstowell + @acfisher + @bslazarov + @tomstitt + @sohailreddy + @v-dobrev + @YohannDudouit + @termi-official + @jandrej | 11/16/23 | 3/25/24 | 3/26/24 |

samuelpmishLLNL · 2023-11-07T21:15:23Z

Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

Also, is it the case that hypre is vending libraries with symbols of the same name but different underlying floating point precisions? If so, this is worrying/harmful and needs to be addressed in hypre!

jandrej · 2023-11-07T21:39:45Z

config/cmake/MFEMConfig.cmake.in

@@ -63,6 +63,7 @@ set(MFEM_USE_ALGOIM @MFEM_USE_ALGOIM@)
 set(MFEM_USE_BENCHMARK @MFEM_USE_BENCHMARK@)
 set(MFEM_USE_PARELAG @MFEM_USE_PARELAG@)
 set(MFEM_USE_ENZYME @MFEM_USE_ENZYME@)
+set(MFEM_USE_FLOAT @MFEM_USE_FLOAT@)


This really means "MFEM_USE_SINGLE_PRECISION", right?

tzanio · 2023-11-08

We probably need to adjust this, but basically the idea in this branch is that we can replace all doubles with another type, e.g. float.

dylan-copeland · 2023-11-08

Yes, it means to use float as the floating point type, see general/globals.hpp. We may also want to add options for long double and maybe even half precision for the most adventurous users. The changes are mostly general, so that any floating point type could be used, but a few places require specially defined constants (e.g. tolerances for Newton solvers), so we may need different flags for different precision levels. I'm sure there will be opinions about names for these flags.

jandrej · 2023-11-08

That means the MFEM_USE_XXX macros are not the right way to provide this option. It's more convenient to provide a build macro like MFEM_FP_TYPE=XXX where the person who builds MFEM puts the type. This would save quite a bit of ifdefs in general/globals.hpp.
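The suggestion above can be sketched as follows. This is an illustration under stated assumptions, not the mechanism that was merged: MFEM_FP_TYPE is the macro name proposed in the comment, the default of double is an assumption, and `DefaultRelTol` is a hypothetical helper showing how a precision-dependent constant could be derived from the type rather than from one flag per precision level.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <type_traits>

// The person building the library would pass e.g. -DMFEM_FP_TYPE=float.
// Falling back to double here is an assumption made for this sketch.
#ifndef MFEM_FP_TYPE
#define MFEM_FP_TYPE double
#endif

typedef MFEM_FP_TYPE real_t;

static_assert(std::is_floating_point<real_t>::value,
              "MFEM_FP_TYPE must name a floating point type");

// Hypothetical helper: a relative tolerance derived from the machine epsilon
// of real_t, so no per-precision #ifdef is needed for this constant.
inline real_t DefaultRelTol()
{
   return std::sqrt(std::numeric_limits<real_t>::epsilon());
}
```

A single macro keeps the type selection in one place, but code that calls type-specific external symbols (LAPACK, MPI datatypes) would still need a mapping from real_t to those names.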

tzanio · 2023-11-08T02:58:56Z

> Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

This is an interesting suggestion, how do you suggest we use it exactly?

tzanio · 2023-11-08T03:00:59Z

> Also, is it the case that hypre is vending libraries with symbols of the same name but different underlying floating point precisions? If so, this is worrying/harmful and needs to be addressed in hypre!

We are essentially doing the same thing as hypre -- typedefing the type that was double before. I agree this is not ideal, but it is a restriction that we have no control over.

I personally am not aware of applications that link with different builds of hypre, and I would expect the same to be true for MFEM.

dylan-copeland · 2023-11-08T03:27:14Z

> Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

@samuelpmishLLNL If I understood this correctly, for every class in MFEM (e.g. named A), we could add a few lines:
#ifdef MFEM_USE_FLOAT
inline namespace mfem_float {
#else
inline namespace mfem_double {
#endif
class A { /* ... */ };
} // inline namespace mfem_float or mfem_double

I'm not sure if this is necessary for every class, or if this namespace could be used for the entire file. Or why not just use a normal namespace (without inline) for every file? Or is there a better way that I'm missing?

It may be sufficient to do this just for class Vector, since nothing in MFEM builds without Vector.

samuelpmishLLNL · 2023-11-08T17:51:25Z

> This is an interesting suggestion, how do you suggest we use it exactly?

If I understand correctly, rather than having everything live in the mfem namespace

namespace mfem {
  class Vector { ... };
  class Mesh { ... };
}

you would have anything with a different ABI for single/double live in a separate inline namespace, like

// in some header, maybe config.hpp
#if mfem_uses_single_precision
#define PRECISION float32
#else
#define PRECISION float64
#endif

///////////////

namespace mfem {
inline namespace PRECISION {
  class Vector { ... };
  class Mesh { ... };
}
}

That way, the underlying symbols have different names and prevent a code that expects double-mfem from accidentally calling into a float-mfem binary.
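A compilable version of the idea above, as a sketch only. The namespace `demo` stands in for `mfem`, and DEMO_USE_SINGLE, `Vector::BytesPerEntry` and `UserCode` are hypothetical names introduced for illustration. The point it shows is that user code names `demo::Vector` without the precision tag, while the fully qualified (and therefore mangled) name contains `float32` or `float64`.

```cpp
#include <cassert>
#include <type_traits>

namespace demo // stands in for namespace mfem
{
#ifdef DEMO_USE_SINGLE // hypothetical build flag for this sketch
inline namespace float32
{
typedef float real_t;
#else
inline namespace float64
{
typedef double real_t;
#endif

class Vector
{
public:
   explicit Vector(int n) : size(n) { }
   int Size() const { return size; }
   // Size in bytes of one entry for the precision this build was made with.
   static int BytesPerEntry() { return int(sizeof(real_t)); }
private:
   int size;
};

} // inline namespace float32 or float64
} // namespace demo

// User code is written against demo::Vector and does not mention the inline
// namespace; the compiler resolves it to demo::float64::Vector (or float32).
inline int UserCode()
{
   demo::Vector v(3);
   return v.Size();
}
```

Because the inline namespace is part of the symbol name, an application compiled against the double-precision headers would fail at link time against a single-precision binary instead of silently calling into it.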

> We are essentially doing the same thing as hypre -- typedefing the type that was double before. I agree this is not ideal, but it is a restriction that we have no control over.

I don't agree with the second statement: hypre and mfem are written in different languages, so they have different tools available to them.

hypre is mostly a C library, so if hypre wants to support different floating point precisions the only choices I know of are:

1. define symbols with different names (e.g. DGESV vs SGESV)
2. reuse the same symbol name to mean different things depending on some preprocessor macros.

If it is the case that hypre went with option 2, that seems like the wrong choice for a linear algebra library. By comparison, LAPACK doesn't force users to pick exactly one kind of floating point precision, why should hypre?

In contrast, mfem is a C++ library, so it has way more options to choose from for managing different precisions (function overloads, namespaces, templates, ...). Choosing to use the C-language mechanisms is an option but not a requirement.
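As one illustration of the C++ options mentioned above, here is a sketch of the template route. It is not how MFEM is organized, and `BasicVector` and `Dot` are hypothetical names. Both precisions coexist in one program as distinct types, which is what a mixed-precision application would need.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container parameterized on the floating point type.
template <typename T>
class BasicVector
{
public:
   explicit BasicVector(std::size_t n, T value = T(0)) : data(n, value) { }
   std::size_t Size() const { return data.size(); }
   T &operator[](std::size_t i) { return data[i]; }
   const T &operator[](std::size_t i) const { return data[i]; }
private:
   std::vector<T> data;
};

// Dot product that stores entries in T but accumulates in double, a simple
// mixed-precision pattern. Works for BasicVector<float> and <double> alike.
template <typename T>
double Dot(const BasicVector<T> &x, const BasicVector<T> &y)
{
   assert(x.Size() == y.Size());
   double sum = 0.0;
   for (std::size_t i = 0; i < x.Size(); i++)
   {
      sum += double(x[i]) * double(y[i]);
   }
   return sum;
}
```

The trade-off is that templating a large existing code base moves much of it into headers, which is a far more invasive change than the typedef used in this PR.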

Also, if you feel that hypre is forcing you to write code you believe hurts mfem's usability then let's talk with the hypre developers about it, and see if we can address the underlying cause.

> I personally am not aware of applications that link with different builds of hypre, and I will expect the same to be true for MFEM.

If I'm understanding things correctly, the causality is backwards here: applications don't link against multiple builds of hypre because hypre's typedef approach here prevents them from doing so.

I would ask that you give mfem's users the benefit of the doubt when it comes to supporting mixed precision. There are a lot of interesting research opportunities in mixed precision, and it's especially relevant for GPUs since cheap consumer hardware has single-precision performance that rivals even the fanciest flagship GPUs:

NVIDIA RTX 4080 ($1000): 50 TFLOPS FP32
NVIDIA H100 ($30000): 50 TFLOPS FP32

> why not just use a normal namespace (without inline) for every file? Or is there a better way that I'm missing?

Inline namespaces don't have to be explicitly referenced by the user, so people could still write mfem::Vector and it would map to mfem::float32::Vector (for example). Using a normal namespace would require users to rewrite their code to explicitly include the ::float32:: or ::float64:: part, which would be annoying.

> It may be sufficient to do this just for class Vector, since nothing in MFEM builds without Vector.

I would bet that it's not sufficient to apply the inline namespaces to only mfem::Vector, but I would like to prototype it to make sure.

dylan-copeland · 2023-11-10T04:01:16Z

Is there a way to disable the branch history check? It fails because too many files were committed.

v-dobrev · 2023-11-10T04:07:24Z

> Is there a way to disable the branch history check? It fails because too many files were committed.

You can temporarily edit this:

mfem/tests/scripts/branch-history

Lines 63 to 64 in ecfab8f

# Maximum number of acceptable files changed in any commit in the branch
my $commit_max_files_changed = 50;

However, just so we don't forget to revert it before merging, create a TODO checkbox for it in the first comment above.

v-dobrev · 2024-03-24T03:16:09Z

Re-merged in next for testing...

Fix warnings about RAND_MAX when using single precision. Introduce an inline function `real_t rand_real()` that returns a random number in the interval [0,1) using rand(). This function better handles the single-precision case, where the expression `real_t(rand())/(real_t(RAND_MAX)+1)` can return 1.0f due to round-off when rand() returns a number close to RAND_MAX. Use `rand_real()` in a few places that previously used code similar to `real_t(rand())/(real_t(RAND_MAX)+1)`.
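The round-off problem described above can be reproduced with a small sketch. `NaiveUnit` mirrors the quoted expression with real_t fixed to float; `SafeUnit` is a hypothetical alternative written for this illustration and is not necessarily how MFEM's `rand_real()` is implemented. Both take the raw integer as an argument so the boundary case can be checked deterministically.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// The expression from the commit message, with real_t = float. When RAND_MAX
// is 2^31 - 1, float(RAND_MAX) rounds up to 2^31 and adding 1 does not change
// it, so an input close to RAND_MAX can map to exactly 1.0f.
inline float NaiveUnit(int r)
{
   return float(r) / (float(RAND_MAX) + 1);
}

// Hypothetical safer mapping: do the division in double, where it is exact
// enough to stay below 1, then clamp the float result into [0,1).
inline float SafeUnit(int r)
{
   const float x = float(double(r) / (double(RAND_MAX) + 1.0));
   return (x < 1.0f) ? x : std::nextafter(1.0f, 0.0f);
}
```

A caller would use it as `SafeUnit(rand())`. The clamp matters because converting a double just below 1 to float can itself round up to 1.0f.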

v-dobrev · 2024-03-25T20:09:50Z

Re-merged in next for testing...

tzanio

Thanks for all your hard work on this @dylan-copeland and @v-dobrev !

adam-sim-dev · 2024-03-27T11:40:42Z

My build with MUMPS now fails. I have a question. Should -lsmumps or -ldmumps be before -lmumps_common -lpord?

adam-sim-dev · 2024-03-27T12:37:01Z

FindMUMPS.cmake needs update too.

v-dobrev · 2024-03-27T19:13:56Z

> My build with MUMPS now fails. I have a question. Should -lsmumps or -ldmumps be before -lmumps_common -lpord?

Oops, I forgot to update this part:

mfem/config/defaults.mk

Lines 323 to 327 in bcdf7cc

ifeq ($(MFEM_USE_SINGLE),YES)
   MUMPS_LIB += -lsmumps
else
   MUMPS_LIB += -ldmumps
endif

Try replacing the if with

ifneq ($(filter single Single SINGLE,$(MFEM_PRECISION)),)

similar to here:

mfem/makefile

Line 212 in bcdf7cc

else ifneq ($(filter single Single SINGLE,$(MFEM_PRECISION)),)

Another option would be to move the logic for "MFEM_PRECISION -> MFEM_USE_SINGLE, MFEM_USE_DOUBLE" from the top-level makefile to config/defaults.mk.

> FindMUMPS.cmake needs update too.

You are right. If using double precision, it should still work though.

najlkin · 2024-04-29T15:28:38Z

CI build and test is done in #4262 😉

Generalized floating point type. So far, ex1 works for a serial build without lapack.

f106c03

dylan-copeland added enhancement WIP Work in Progress labels Oct 9, 2023

github-advanced-security bot found potential problems Oct 9, 2023

fem/dgmassinv.cpp (fixed)

fem/integ/bilininteg_br2.cpp (fixed)

dylan-copeland and others added 10 commits October 31, 2023 12:04

Adding build option for single-precision.

c64d785

Added support for LAPACK in single-precision. Changed fmax -> std::max, etc.

ee3a46a

Generalized for MPI and hypre.

3b4fc3b

Finished generalizing type in fem directory.

9017b97

Generalized type for PA kernels.

d02a90f

Fix a warning from Apple clang

bada9b8

Generalized CUDA for float case, so example device runs succeed with a pcuda build.

c80fdc6

Merge branch 'float' of github.com:mfem/mfem into float

69e4e13

Generalized floating point type for all remaining examples.

b892227

Generalized everything in miniapps/meshing.

9eca33b

jandrej reviewed Nov 7, 2023

dylan-copeland added 2 commits November 7, 2023 14:00

Generalized miniapps in dpg and toys.

eea5e91

Generalized miniapps in autodiff and electromagnetics.

36106d6

Generalized miniapps hdiv and hooke.

1911fbc

dylan-copeland added 2 commits November 9, 2023 13:28

Generalized more miniapps.

a3e7080

Generalized the remaining miniapps, except gslib.

686756d

dylan-copeland added 3 commits November 9, 2023 20:17

Temporary change to branch-history so CI can pass.

89f62d0

Merge branch 'master' of github.com:mfem/mfem into float

47e62a5

Style

e617238

v-dobrev added a commit that referenced this pull request Mar 24, 2024

Merge branch 'float' (PR #3922) into next

e3f4569

v-dobrev added 3 commits March 24, 2024 10:43

Update two doxygen comments

f37a814

Merge branch 'master' into float

8487564

v-dobrev added a commit that referenced this pull request Mar 25, 2024

Merge branch 'float' (PR #3922) into next

66e9e89

v-dobrev approved these changes Mar 25, 2024

v-dobrev mentioned this pull request Mar 26, 2024

Fix a synchronization issue with some HypreParMatrix constructors #4211

Merged

62 tasks

tzanio approved these changes Mar 26, 2024

tzanio merged commit bcdf7cc into master Mar 26, 2024
26 of 27 checks passed

tzanio deleted the float branch March 26, 2024 19:08

v-dobrev mentioned this pull request Mar 27, 2024

Fix MUMPS libraries link sequence #4214

Merged

61 tasks

This was referenced Mar 31, 2024

Tribol contact patch test miniapp [tribol-miniapp] #4054

Merged

Hypre runtime compute policy #3844

Merged

Dg diffusion #3904

Merged

sebastiangrimberg mentioned this pull request Apr 9, 2024

Single-precision builds awslabs/palace#227

Open

hughcars mentioned this pull request May 30, 2024

[BUG]: Mfem library not loading: single vs double precision not specified compiler-explorer/compiler-explorer#6541

Open
