-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
shift functionality for cfl grid related time step restrictions to directory for spatial operator #526
Conversation
…p restrictions to directory for spatial operator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all code uses either cell->center()
or MatrixFree
+ CellIntegrator
, which should be agnostic of the underlying element type, this looks fine. If we want to support mixed meshes, we will obviously need to augment some code, because then we have different CellIntegrator
objects on different cells, but that should be it.
When looking through the code, I observed some stylistic issues in terms of HPC programming practices that we could change, but it all those places have pre-existed and I merely noted them now. I can open a PR elsewhere if you do not want them now.
double const exponent_fe_degree = 3.0) | ||
{ | ||
double const time_step = | ||
pow(global_min_cell_diameter, 2.0) / pow(fe_degree, exponent_fe_degree) / diffusivity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even though this is not in the critical path, we should use an efficient coding style that tries to avoid non-integer powers if possible,
pow(global_min_cell_diameter, 2.0) / pow(fe_degree, exponent_fe_degree) / diffusivity; | |
std::pow(global_min_cell_diameter, 2) / std::pow(fe_degree, exponent_fe_degree) / diffusivity; |
if(cell->is_locally_owned()) | ||
{ | ||
// calculate maximum velocity | ||
dealii::Point<dim> point = cell->center(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering #524, shouldn't we use a mapping here? It won't probably change much, but it might help the consistency in the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree
U = vel.norm(); | ||
if(U > max_U) | ||
max_U = U; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads to inefficient code. Albeit it won't matter here, it is better to consistently use
U = vel.norm(); | |
if(U > max_U) | |
max_U = U; | |
max_U = std::max(max_U, vel.norm()); |
The reason is that max
typically maps to single-instruction operations available in the instruction set (maxsd
and related functions in the case of x86-64), whereas if
leads to a conditional branch. If we would like to be even more efficient, we could compute the maximum for vel.norm_square()
and only compute the the square root in the end once we return from the function.
double const exponent_fe_degree = 2.0) | ||
{ | ||
double const time_step = | ||
global_min_cell_diameter / pow(fe_degree, exponent_fe_degree) / max_velocity; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global_min_cell_diameter / pow(fe_degree, exponent_fe_degree) / max_velocity; | |
global_min_cell_diameter / (std::pow(fe_degree, exponent_fe_degree) * max_velocity); |
|
||
// loop over vectorized array | ||
double dt = std::numeric_limits<double>::max(); | ||
for(unsigned int v = 0; v < dealii::VectorizedArray<value_type>::size(); ++v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we discussed this elsewhere, but it would be safer to compute this as
for(unsigned int v = 0; v < dealii::VectorizedArray<value_type>::size(); ++v) | |
for(unsigned int v = 0; v < data.n_active_entries_per_cell_batch(cell); ++v) |
because then we do not need to depend on the fact that delta_t_cell
is valid.
|
||
double new_time_step = std::numeric_limits<double>::max(); | ||
|
||
double const cfl_p = 1.0 / pow(degree, exponent_fe_degree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious: Do we import namespace std
somewhere, or why is pow
without std::
well-defined? I think we should try to be consistent here and elsewhere and use std::pow
.
if(cfl_condition_type == CFLConditionType::VelocityNorm) | ||
{ | ||
delta_t_cell = std::min(delta_t_cell, cfl_p / ut_xi.norm()); | ||
} | ||
else if(cfl_condition_type == CFLConditionType::VelocityComponents) | ||
{ | ||
for(unsigned int d = 0; d < dim; ++d) | ||
delta_t_cell = std::min(delta_t_cell, cfl_p / (std::abs(ut_xi[d]))); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now again performance is not important here, but since divisions and square roots are expensive and everything else except ut_xi
is constant across all quadrature point, I wonder if it would be better style to compute the maximum of the velocity (either norm_square
for the directed one or std::abs(ut_xi[d])
) in the loop over q
and perform the division and square root outside.
// at a slightly different time in the next time step and so on). This way, it can be ensured | ||
// that the sequence of time step sizes is exactly reproducible with the results being |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this comment is pre-existing, but it is not ensured that the sequence of time steps is exactly reproducible: Within the gap between the truncation and roundoff, we still have the probability of making the wrong decision, as we saw in a failing test recently.
// at a slightly different time in the next time step and so on). This way, it can be ensured | |
// that the sequence of time step sizes is exactly reproducible with the results being | |
// at a slightly different time in the next time step and so on). This way, we get a | |
// high probability that the sequence of time step sizes is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also agree here. One could even ask whether we should truncate at all.
invJ = transpose(invJ); | ||
ut_xi = invJ * u_x; | ||
|
||
u_va = std::max(u_va, ut_xi.norm()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, I would prefer to use
u_va = std::max(u_va, ut_xi.norm()); | |
u_va = std::max(u_va, ut_xi.norm_square()); |
and take the square root in line 341.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not so sure about this type of change, since it probably makes the code harder to read/understand. The question is whether this change is really that relevant regarding the overall performance. If we make the change, the variable names should be changed to _square
as well.
I would suggest to merge the present PR and then address the open issues in a separate PR. I will create an issue for this. |
I am not sure if all the code in the new file (which I have just copied) is correct for simplicial elements. @kronbichler @peterrum could you please have a look at this and let me in case something has to be changed for simplex. Then, I will address this in the other PR on simplex elements (PR #525), once the present PR has been merged.