Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generalize floating point type #3922

Merged
merged 92 commits into from
Mar 26, 2024
Merged

Generalize floating point type #3922

merged 92 commits into from
Mar 26, 2024

Conversation

dylan-copeland
Copy link
Member

@dylan-copeland dylan-copeland commented Oct 9, 2023

See TODO items

The following are working, with exceptions listed below:

  • All examples 0 to 37, and all miniapps except gslib, serial and parallel (with hypre), float and double.
  • Device sample runs for examples, with pcuda build on GPU, serial and parallel, float and double.

The following are not working or not supported. Examples and miniapps that do not work in single precision abort with an error message.

  • ex10(p) has a Newton convergence failure, regardless of how tolerance rel_tol is set.
  • ex11p, ex12p, ex13p, ex32p, nurbs_ex11p produce reasonable results, although they output Error in LOBPCG: GEVP solver failure.
  • Example 33 produces nan apparently in PartialFractionExpansion, where many numbers are multiplied.
  • ex34(p) converges to a strange solution, and I'm not sure if it is right.
  • miniapps/electromagnetics/maxwell computes nan in GetMaximumTimeStep().
  • miniapps/nurbs/nurbs_patch_ex1 fails in lapack (NNLSSolver).
  • miniapps/spde/generate_random_field fails with nan.
  • Files involving other libraries or file types are not generalized: conduitdatacollection.cpp, fmsconvert.*, gslib.*, sidredatacollection.*, fem/moonolith/*.
  • Files that are optionally built are not generalized: hiop.*.
  • Regression tests for floating point types other than double have not yet been added.

Most of the changes are simply double -> real_t, with real_t defined as either float or double in general/globals.hpp, depending on the build flag MFEM_USE_SINGLE. Some noteworthy changes are

  • MPI_DOUBLE -> MPITypeMap<real_t>::mpi_type
  • LAPACK function name changes (starting with s or d for real, c or z for complex)
  • fmax -> std::max, etc.
  • Some hard-coded tolerances depend on the type, e.g. QuadratureFunctions1D::GaussLobatto in intrules.cpp.
  • CUDA changes in sparsemat.cpp: 64F -> 32F, Dcsr -> Scsr.
  • The floating point size is used by atomicAdd in general/backends.hpp, which now has different implementations.
  • Many hard-coded constants (e.g. 0.0, 1.0) must be cast to real_t, one of the inconveniences that will persist after this PR is merged.
  • A minor correction is made to the sample runs for autodiff/seq_example.cpp and par_example.cpp.
  • CONTRIBUTING.md states a policy of using real_t instead of double when possible.

Support for floating point types other than float and double is left for a future PR, which should involve many fewer lines of code.

In single-precision, the unit tests build but are not expected to run. A future PR should support single-precision unit tests.

PR Author Editor Reviewers Assignment Approval Merge
#3922 @dylan-copeland @tzanio @jamiebramwell + @artv3 + @hughcars + @kmittal2 + @psocratis + @camierjs + @pazner + @cjvogl + @sebastiangrimberg + @vladotomov + @mlstowell + @acfisher + @bslazarov + @tomstitt + @sohailreddy + @v-dobrev + @YohannDudouit + @termi-official + @jandrej 11/16/23 3/25/24 3/26/24
PR Checklist
  • Code builds.
  • Code passes make style.
  • Update CHANGELOG:
    • Is this a new feature users need to be aware of? New or updated example or miniapp?
    • Does it make sense to create a new section in the CHANGELOG to group with other related features?
  • Update INSTALL:
    • Had a new optional library been added? If so, what range of versions of this library are required? (Make sure the external library is compatible with our BSD license, e.g. it is not licensed under GPL!)
    • Have the version ranges for any required or optional libraries changed?
    • Does make or cmake have a new target?
    • Did the requirements or the installation process change? (rare)
  • Update continuous integration server configurations if necessary (e.g. with new version requirements for each of MFEM's dependencies)
    • .github
    • .appveyor.yml
  • Update .gitignore:
    • Check if make distclean; git status shows any files that were generated from the source by the project (not an IDE) but we don't want to track in the repository.
    • Add new patterns (just for the new files above) and re-run the above test.
  • New examples:
    • All sample runs at the top of the example source file work.
    • Update examples/makefile:
      • Add the example code to the appropriate SEQ_EXAMPLES and PAR_EXAMPLES variables.
      • Add any files generated by it to the clean target.
      • Add the example binary and any files generated by it to the top-level .gitignore file.
    • Update examples/CMakeLists.txt:
      • Add the example code to the ALL_EXE_SRCS variable.
      • Make sure THIS_TEST_OPTIONS is set correctly for the new example.
    • List the new example in doc/CodeDocumentation.dox.
    • If new examples directory (e.g.examples/pumi), list it in doc/CodeDocumentation.conf.in
    • Companion pull request for documentation in mfem/web repo:
      • Update or add example-specific documentation, see e.g. the src/examples.md.
      • Add the description, labels and screenshots in src/examples.md and src/img.
      • In examples.md, list the example under the appropriate categories, add new categories if necessary.
      • Add a short description of the example in the "Extensive Examples" section of features.md.
  • New miniapps:
    • All sample runs at the top of the miniapp source file work.
    • Update top-level makefile and makefile in corresponding miniapp directory.
    • Add the miniapp binary and any files generated by it to the top-level .gitignore file.
    • Update CMake build system:
      • Update the CMakeLists.txt file in the miniapps directory, if the new miniapp is in a new directory.
      • Add/update the CMakeLists.txt file in the new miniapp directory.
      • Consider adding a new test for the new miniapp.
    • List the new miniapp in doc/CodeDocumentation.dox
    • If new miniapps directory (e.g.miniapps/nurbs), add it to MINIAPP_SUBDIRS in the makefile.
    • If new miniapps directory (e.g.miniapps/nurbs), list it in doc/CodeDocumentation.conf.in
    • Companion pull request for documentation in mfem/web repo:
      • Update or add miniapp-specific documentation, see e.g. the src/meshing.md and src/electromagnetics.md files.
      • Add the description, labels and screenshots in src/examples.md and src/img.
      • The miniapps go at the end of the page, and are usually listed only under a specific "Application (PDE)" category.
      • Add a short description of the miniapp in the "Extensive Examples" section of features.md.
  • New capability:
    • All new public, protected, and private classes, methods, data members, and functions have full Doxygen-style documentation in source comments. Documentation should include descriptions of member data, function arguments and return values, template parameters, and prerequisites for calling new functions.
    • Pointer arguments and return values must specify whether ownership is being transferred or lent with the call.
    • Any new functions should include descriptions of their intended use e.g. for internal use only, user-facing, etc., along with references to example code whenever possible/appropriate.
    • Consider adding new sample runs in existing examples to highlight the new capability.
    • Consider saving cool simulation pictures with the new capability in the Confluence gallery (LLNL only) or submitting them, via pull request, to the gallery section of the mfem/web repo.
    • If this is a major new feature, consider mentioning it in the short summary inside README (rare).
    • List major new classes in doc/CodeDocumentation.dox (rare).
  • Update this checklist, if the new pull request affects it.
  • Run make unittest to make sure all unit tests pass.
  • Run the tests in tests/scripts.
  • (LLNL only) After merging:
    • Update internal tests to include the new features.

@dylan-copeland dylan-copeland added enhancement WIP Work in Progress labels Oct 9, 2023
fem/dgmassinv.cpp Fixed Show resolved Hide resolved
fem/dgmassinv.cpp Fixed Show fixed Hide fixed
fem/integ/bilininteg_br2.cpp Fixed Show resolved Hide resolved
@samuelpmishLLNL
Copy link
Contributor

Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

Also, is it the case that hypre is vending libraries with symbols of the same name but different underlying floating point precisions? If so, this is worrying/harmful and needs to be addressed in hypre!

@@ -63,6 +63,7 @@ set(MFEM_USE_ALGOIM @MFEM_USE_ALGOIM@)
set(MFEM_USE_BENCHMARK @MFEM_USE_BENCHMARK@)
set(MFEM_USE_PARELAG @MFEM_USE_PARELAG@)
set(MFEM_USE_ENZYME @MFEM_USE_ENZYME@)
set(MFEM_USE_FLOAT @MFEM_USE_FLOAT@)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really means "MFEM_USE_SINGLE_PRECISION" right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably need to adjust this, but basically the idea in this branch is that we can replace all doubles with another type e.g. float

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it means to use float as the floating point type, see general/globals.hpp. We may also want to add options for long double and maybe even half precision for the most adventurous users. The changes are mostly general, so that any floating point type could be used, but a few places requires specially defined constants (e.g. tolerances for Newton solvers), so we may need different flags for different precision levels. I'm sure there will be opinions about names for these flags.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means the MFEM_USE_XXX macros are not the right way to provide this option. It's more convenient to provide a build macro like MFEM_FP_TYPE=XXX where the person who builds MFEM puts the type. This would save quite a bit of ifdefs in general/globals.hpp.

@tzanio
Copy link
Member

tzanio commented Nov 8, 2023

Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

This is an interesting suggestion, how do you suggest we use it exactly?

@tzanio
Copy link
Member

tzanio commented Nov 8, 2023

Also, is it the case that hypre is vending libraries with symbols of the same name but different underlying floating point precisions? If so, this is worrying/harmful and needs to be addressed in hypre!

We are essentially doing the same thing as hypre -- typedefing the type that was double before. I agree this is not ideal, but it is a restriction that we have no control over.

I personally am not aware of applications that link with different builds of hypre, and I will expect the same to be true for MFEM.

@dylan-copeland
Copy link
Member Author

dylan-copeland commented Nov 8, 2023

Since this PR seems to be introducing a huge ABI-breaking change, would you guys consider adopting inline namespaces (for single vs. double precision) to avoid creating ABI-incompatibility headaches for mfem's users? See, for example: https://www.youtube.com/watch?v=rUESOjhvLw0

@samuelpmishLLNL If I understood this correctly, for every class in MFEM (e.g. named A), we could add a few lines
#ifdef MFEM_USE_FLOAT
inline namespace mfem_float {
#else
inline namespace mfem_double {
#endif
class A {

I'm not sure if this is necessary for every class, or if this namespace could be used for the entire file. Or why not just use a normal namespace (without inline) for every file? Or is there a better way that I'm missing?

It may be sufficient to do this just for class Vector, since nothing in MFEM builds without Vector.

@samuelpmishLLNL
Copy link
Contributor

This is an interesting suggestion, how do you suggest we use it exactly?

If I understand correctly, rather than having everything live in the mfem namespace

namespace mfem {
  class Vector { ... };
  class Mesh { ... };
}

you would have anything with a different ABI for single/double live in a separate inline namespace, like

// in some header, maybe config.hpp
#if mfem_uses_single_precision
#define PRECISION float32
#else
#define PRECISION float64
#endif

///////////////

namespace mfem {
inline namespace PRECISION {
  class Vector { ... };
  class Mesh { ... };
}
}

That way, the underlying symbols have different names and prevent a code that expects double-mfem from accidentally calling into a float-mfem binary.


We are essentially doing the same thing as hypre -- typedefing the type that was double before. I agree this is not ideal, but it is a restriction that we have no control over.

I don't agree with the second statement: hypre and mfem are written in different languages, so they have different tools available to them.

hypre is mostly a C library, so if hypre wants to support different floating point precisions the only choices I know of are:

  1. define symbols with different names (e.g. DGESV vs SGESV)
  2. reuse the same symbol name to mean different things depending on some preprocessor macros.

If it is the case that hypre went with option 2, that seems like the wrong choice for a linear algebra library. By comparison, LAPACK doesn't force users to pick exactly one kind of floating point precision, why should hypre?

In contrast, mfem is a C++ library, so it has way more options to choose from for managing different precisions (function overloads, namespaces, templates, ...). Choosing to use the C-language mechanisms is an option but not a requirement.

Also, if you feel that hypre is forcing you to write code you believe hurts mfem's usability then let's talk with the hypre developers about it, and see if we can address the underlying cause.


I personally am not aware of applications that link with different builds of hypre, and I will expect the same to be true for MFEM.

If I'm understanding things correctly, the causality is backwards here: applications don't link against multiple builds of hypre because hypre's typedef approach here prevents them from doing so.

I would ask that you give mfem's users the benefit of the doubt when it comes to supporting mixed precision. There are a lot of interesting research opportunities in mixed precision, and it's especially relevant for GPUs since cheap consumer hardware has single-precision performance that rivals the even the fanciest flagship GPUs:

NVIDIA RTX 4080 ($1000): 50 TFLOPS FP32
NVIDIA H100 ($30000): 50 TFLOPS FP32


why not just use a normal namespace (without inline) for every file? Or is there a better way that I'm missing?

Inline namespaces don't have to be explicitly referenced by the user, so people could still write mfem::Vector and it would map to mfem::float32::Vector (for example). Using a normal namespace would require users to rewrite their code to explicitly include the ::float32:: or ::float64:: part, which would be annoying.


It may be sufficient to do this just for class Vector, since nothing in MFEM builds without Vector.

I would bet that it's not sufficient to apply the inline namespaces to only mfem::Vector, but I would like to prototype it to make sure.

@dylan-copeland
Copy link
Member Author

Is there a way to disable the branch history check? It fails because too many files were committed.

@v-dobrev
Copy link
Member

Is there a way to disable the branch history check? It fails because too many files were committed.

You can temporarily edit this:

# Maximum number of acceptable files changed in any commit in the branch
my $commit_max_files_changed = 50;

However, just so we don't forget to revert it before merging, create a TODO checkbox for it in the first comment above.

v-dobrev added a commit that referenced this pull request Mar 24, 2024
@v-dobrev
Copy link
Member

Re-merged in next for testing...

Fix warnings about RAND_MAX when using single precision.

Introduce an inline function `real_t rand_real()` that returns a
random number in the interval [0,1) using rand(). This function
handles better the case of single precision where the expression
`real_t(rand())/(real_t(RAND_MAX)+1)` can return 1.0f due to round-off
when rand() returns a number close to RAND_MAX.

Use `rand_real()` in a few places that before used code similar to
`real_t(rand())/(real_t(RAND_MAX)+1)`.
v-dobrev added a commit that referenced this pull request Mar 25, 2024
@v-dobrev
Copy link
Member

Re-merged in next for testing...

Copy link
Member

@tzanio tzanio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all your hard work on this @dylan-copeland and @v-dobrev !

@adam-sim-dev
Copy link
Contributor

My build with MUMPS now fails. I have a question. Should -lsmumps or -ldmumps be before -lmumps_common -lpord?

@adam-sim-dev
Copy link
Contributor

FindMUMPS.cmake needs update too.

@v-dobrev
Copy link
Member

My build with MUMPS now fails. I have a question. Should -lsmumps or -ldmumps be before -lmumps_common -lpord?

Oops, I forgot to update this part:

mfem/config/defaults.mk

Lines 323 to 327 in bcdf7cc

ifeq ($(MFEM_USE_SINGLE),YES)
MUMPS_LIB += -lsmumps
else
MUMPS_LIB += -ldmumps
endif

Try replacing the if with

ifneq ($(filter single Single SINGLE,$(MFEM_PRECISION)),)

similar to here:

mfem/makefile

Line 212 in bcdf7cc

else ifneq ($(filter single Single SINGLE,$(MFEM_PRECISION)),)

Another option will be to move the logic for "MFEM_PRECISION -> MFEM_USE_SINGLE, MFEM_USE_DOUBLE" from the top level makefile to config/defaults.mk.

FindMUMPS.cmake needs update too.

You are right. If using double precision, it should still work though.

@najlkin
Copy link
Contributor

najlkin commented Apr 29, 2024

CI build and test is done in #4262 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet