Simplify and inline more FE shape computations #4274

@GiudGiud

The 2D and 3D shape computations are massive switch statements on the element type and the order.
In that form it likely does not make sense to inline them for (potential) performance gains.

But we could switch (or duplicate during a transition period at least) the signature from

template <FEFamily T>
Real fe_lagrange_2D_shape(const ElemType type,
                          const Elem * elem,
                          const Order order,
                          const unsigned int i,
                          const Point & p)

to

template <FEFamily T, Order order, ElemType type>
inline
Real fe_2D_shape(const Elem * elem,
                 const unsigned int i,
                 const Point & p)

and get rid of most of the switch statements. I don't think it would be much more code overall.
If we templated on i as well, we could get rid of all the switches. Potentially cool for vectorizing and GPUs.
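To make the proposal concrete, here is a minimal standalone sketch of the templated form. It is not libMesh code: `Real`, `Point`, `ElemType` and `Order` are simplified stand-ins, the `FEFamily` parameter and the `Elem *` argument are dropped, only first-order QUAD4 and TRI3 are covered, and the names `shape_2D` and `shape_2D_runtime` are hypothetical. The QUAD4 node ordering is assumed to be counter-clockwise starting from (-1, -1). It requires C++17 for `if constexpr`.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the libMesh types, so the sketch compiles alone.
using Real = double;
enum ElemType { QUAD4, TRI3 };
enum Order { FIRST, SECOND };
struct Point { Real x, y; };

// 1D linear Lagrange shape function on the reference interval [-1, 1].
inline Real lagrange_1D_linear(const unsigned int i, const Real xi)
{
  return i == 0 ? 0.5 * (1. - xi) : 0.5 * (1. + xi);
}

// Element type and order are template parameters, so the branch below is
// resolved at compile time and no switch remains in the generated code.
template <Order order, ElemType type>
inline Real shape_2D(const unsigned int i, const Point & p)
{
  static_assert(order == FIRST, "only first order is sketched here");
  if constexpr (type == QUAD4)
  {
    // Tensor product of 1D shapes; node i maps to the 1D indices (i0, i1).
    constexpr unsigned int i0[] = {0, 1, 1, 0};
    constexpr unsigned int i1[] = {0, 0, 1, 1};
    return lagrange_1D_linear(i0[i], p.x) * lagrange_1D_linear(i1[i], p.y);
  }
  else
  {
    static_assert(type == TRI3, "only QUAD4 and TRI3 are sketched here");
    // Area coordinates on the reference triangle.
    const Real zeta[] = {1. - p.x - p.y, p.x, p.y};
    return zeta[i];
  }
}

// Transition-period wrapper: keeps a runtime signature and dispatches once
// to the templated version, so existing callers would not need to change.
inline Real shape_2D_runtime(const ElemType type,
                             const Order order,
                             const unsigned int i,
                             const Point & p)
{
  assert(order == FIRST);
  (void)order; // silence the unused warning when NDEBUG disables assert
  switch (type)
  {
    case QUAD4: return shape_2D<FIRST, QUAD4>(i, p);
    case TRI3:  return shape_2D<FIRST, TRI3>(i, p);
  }
  return 0.; // not reached for the two types above
}
```

A caller that already knows the element type at compile time would use `shape_2D<FIRST, QUAD4>(i, p)` directly and give the compiler a small, inlinable function; the runtime wrapper pays for a single switch instead of nested switches on type and order.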

The potential gains downstream in MOOSE are about 1-2% in 2D (for now; as other costs come down, that 1-2% share might go up). Possibly more in 3D, and potentially less with higher order.

      flat  flat%   sum%        cum   cum%
     9.12s 22.57% 22.57%      9.35s 23.14%  tcmalloc::CentralFreeList::Populate
     1.87s  4.63% 27.20%      2.24s  5.54%  MatSetValues_SeqAIJ
     1.69s  4.18% 31.38%     39.78s 98.44%  [libsasl2.2.dylib]
     1.61s  3.98% 35.36%      2.56s  6.34%  MooseMesh::cacheInfo
     1.42s  3.51% 38.88%      2.75s  6.81%  libMesh::BoundaryInfo::boundary_ids
     1.07s  2.65% 41.52%      1.07s  2.65%  libMesh::fe_lagrange_1D_linear_shape
     0.99s  2.45% 43.97%      0.99s  2.45%  hypre_BoomerAMGRelaxHybridGaussSeidel_core
     0.91s  2.25% 46.23%      0.97s  2.40%  libMesh::Elem::which_child_am_i
     0.85s  2.10% 48.33%      1.48s  3.66%  MooseVariableData::computeValuesInternal
     0.61s  1.51% 49.84%      0.61s  1.51%  libMesh::FEMap::compute_single_point_map
     0.60s  1.48% 51.32%      0.60s  1.48%  libMesh::H1FETransformation::map_dphi
     0.55s  1.36% 52.68%      0.60s  1.48%  libMesh::Elem::contains_vertex_of
     0.54s  1.34% 54.02%      1.61s  3.98%  (anonymous namespace)::fe_lagrange_2D_shape
     0.54s  1.34% 55.36%      0.99s  2.45%  Kernel::computeResidual
     0.47s  1.16% 56.52%      0.47s  1.16%  libMesh::Face::dim
     0.46s  1.14% 57.66%      0.46s  1.14%  MooseMesh::getNodeBlockIds
     0.40s  0.99% 58.65%      1.14s  2.82%  libMesh::FEMap::compute_affine_map

The 3.98% cumulative for fe_lagrange_2D_shape is 2.65% from the 1D shape calculation plus 1.34% from the 2D function itself (up to rounding), which is the switch statement (there's nothing else there).
