Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shift functionality for cfl grid related time step restrictions to directory for spatial operator #526

Merged
merged 1 commit into from
Aug 1, 2023

Conversation

nfehn
Copy link
Member

@nfehn nfehn commented Jul 27, 2023

I am not sure if all the code in the new file (which I have just copied) is correct for simplicial elements. @kronbichler @peterrum could you please have a look at this and let me in case something has to be changed for simplex. Then, I will address this in the other PR on simplex elements (PR #525), once the present PR has been merged.

…p restrictions to directory for spatial operator
Copy link
Contributor

@kronbichler kronbichler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all code uses either cell->center() or MatrixFree + CellIntegrator, which should be agnostic of the underlying element type, this looks fine. If we want to support mixed meshes, we will obviously need to augment some code, because then we have different CellIntegrator objects on different cells, but that should be it.

When looking through the code, I observed some stylistic issues in terms of HPC programming practices that we could change, but it all those places have pre-existed and I merely noted them now. I can open a PR elsewhere if you do not want them now.

double const exponent_fe_degree = 3.0)
{
double const time_step =
pow(global_min_cell_diameter, 2.0) / pow(fe_degree, exponent_fe_degree) / diffusivity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though this is not in the critical path, we should use an efficient coding style that tries to avoid non-integer powers if possible,

Suggested change
pow(global_min_cell_diameter, 2.0) / pow(fe_degree, exponent_fe_degree) / diffusivity;
std::pow(global_min_cell_diameter, 2) / std::pow(fe_degree, exponent_fe_degree) / diffusivity;

if(cell->is_locally_owned())
{
// calculate maximum velocity
dealii::Point<dim> point = cell->center();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering #524, shouldn't we use a mapping here? It won't probably change much, but it might help the consistency in the code base.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

Comment on lines +101 to +103
U = vel.norm();
if(U > max_U)
max_U = U;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This leads to inefficient code. Albeit it won't matter here, it is better to consistently use

Suggested change
U = vel.norm();
if(U > max_U)
max_U = U;
max_U = std::max(max_U, vel.norm());

The reason is that max typically maps to single-instruction operations available in the instruction set (maxsd and related functions in the case of x86-64), whereas if leads to a conditional branch. If we would like to be even more efficient, we could compute the maximum for vel.norm_square() and only compute the the square root in the end once we return from the function.

double const exponent_fe_degree = 2.0)
{
double const time_step =
global_min_cell_diameter / pow(fe_degree, exponent_fe_degree) / max_velocity;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
global_min_cell_diameter / pow(fe_degree, exponent_fe_degree) / max_velocity;
global_min_cell_diameter / (std::pow(fe_degree, exponent_fe_degree) * max_velocity);


// loop over vectorized array
double dt = std::numeric_limits<double>::max();
for(unsigned int v = 0; v < dealii::VectorizedArray<value_type>::size(); ++v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we discussed this elsewhere, but it would be safer to compute this as

Suggested change
for(unsigned int v = 0; v < dealii::VectorizedArray<value_type>::size(); ++v)
for(unsigned int v = 0; v < data.n_active_entries_per_cell_batch(cell); ++v)

because then we do not need to depend on the fact that delta_t_cell is valid.


double new_time_step = std::numeric_limits<double>::max();

double const cfl_p = 1.0 / pow(degree, exponent_fe_degree);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious: Do we import namespace std somewhere, or why is pow without std:: well-defined? I think we should try to be consistent here and elsewhere and use std::pow.

Comment on lines +248 to +256
if(cfl_condition_type == CFLConditionType::VelocityNorm)
{
delta_t_cell = std::min(delta_t_cell, cfl_p / ut_xi.norm());
}
else if(cfl_condition_type == CFLConditionType::VelocityComponents)
{
for(unsigned int d = 0; d < dim; ++d)
delta_t_cell = std::min(delta_t_cell, cfl_p / (std::abs(ut_xi[d])));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now again performance is not important here, but since divisions and square roots are expensive and everything else except ut_xi is constant across all quadrature point, I wonder if it would be better style to compute the maximum of the velocity (either norm_square for the directed one or std::abs(ut_xi[d])) in the loop over q and perform the division and square root outside.

Comment on lines +281 to +282
// at a slightly different time in the next time step and so on). This way, it can be ensured
// that the sequence of time step sizes is exactly reproducible with the results being
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this comment is pre-existing, but it is not ensured that the sequence of time steps is exactly reproducible: Within the gap between the truncation and roundoff, we still have the probability of making the wrong decision, as we saw in a failing test recently.

Suggested change
// at a slightly different time in the next time step and so on). This way, it can be ensured
// that the sequence of time step sizes is exactly reproducible with the results being
// at a slightly different time in the next time step and so on). This way, we get a
// high probability that the sequence of time step sizes is

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also agree here. One could even ask whether we should truncate at all.

invJ = transpose(invJ);
ut_xi = invJ * u_x;

u_va = std::max(u_va, ut_xi.norm());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, I would prefer to use

Suggested change
u_va = std::max(u_va, ut_xi.norm());
u_va = std::max(u_va, ut_xi.norm_square());

and take the square root in line 341.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure about this type of change, since it probably makes the code harder to read/understand. The question is whether this change is really that relevant regarding the overall performance. If we make the change, the variable names should be changed to _square as well.

@nfehn
Copy link
Member Author

nfehn commented Aug 1, 2023

I would suggest to merge the present PR and then address the open issues in a separate PR. I will create an issue for this.

@nfehn nfehn merged commit cc3c14a into exadg:master Aug 1, 2023
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants