Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copying spectral radii back to main memory after blockette routines … #37

Merged
merged 2 commits into from
Apr 24, 2020
Merged

Conversation

anilyil
Copy link
Contributor

@anilyil anilyil commented Apr 17, 2020

... if updateDt is true. Without this change, reverse mode AD routines use outdated spectral radii values, which results in inaccurate sensitivities.

Purpose

This PR addresses issues #32 and #36.

With this change, the spectral radii values are copied back to main memory from cached memory after a blockette residual computation that also updates the time step. This copy is done inside this if check because we do not need to copy these values for matrix-free matrix-vector products; however, we want to copy them after every ANK and NK step.

The extra copy comes with additional cost due to the increase memory access. This extra cost will only be present with blockette calls that also update the time step, which is not required for matrix-free operations.

I have ran tests to measure the performance difference caused by this change on one residual evaluation. The test is ran on Stampede2, one skylake node, compiled with AVX2 instruction set, 48 processors, blockette size 8, test block sizes of 32^3 and 48^3 per processor. I provide a few results:

Base speed: Millions of cells processed per one processor in one second with default residual routines that operate on main arrays.
Blockette speed: Same speed metric, but with cache-blocked residual routines.
Speedup: Blockette speed / base speed

The timing results:

Old results before this change:

32^3:

Base speed: 0.676747811395
Blockette speed: 1.24815149274
Speedup: 1.844337686393324

48^3:

Base speed: 0.629510355349
Blockette speed: 1.29542792502
Speedup: 2.057834178600312

New results with this change:

32^3:

Base speed: 0.676336855482
Blockette speed: 1.21488916591
Speedup: 1.7962782244717301

48^3:

Base speed: 0.627697421135
Blockette speed: 1.25598801018
Speedup: 2.00094499019755

The addition of this extra memory access causes the speedup to decrease from 1.84 to 1.80 and from 2.06 to 2.00 for test blocks of size 32^3 and 48^3, respectively. This is a very minor decrease in performance. Furthermore, this "slowdown" will only be happening with calls that update the time step, and the main computationally intensive calls (mat-free operations and preconditioner computations) do not update this time step. Therefore, the overall effect of this change on performance will be negligible.

Type of change

  • Bugfix (non-breaking change which fixes an issue)

Testing

Regressions pass. Furthermore, the test in #32 also passes.

Checklist

  • I have run unit and regression tests which pass locally with my changes

…f updateDt is true. Without this change, reverse mode AD routines use outdated spectral radii values, which results in inaccurate sensitivities.
@anilyil anilyil requested a review from a team as a code owner April 17, 2020 21:53
@joanibal
Copy link
Collaborator

I'm going to remove myself as a reviewer because we have already discussed this issue in depth.

I approve of the changes, but don't want my review to count towards the two required

@joanibal joanibal removed their request for review April 17, 2020 22:04
@anilyil
Copy link
Contributor Author

anilyil commented Apr 18, 2020

I was wrong; the bug would not cause adjoint sensitivities to be inaccurate. The intermediate arrays are updated in the call to master in: https://github.com/mdolab/adflow/blob/master/src/adjoint/adjointUtils.F90#L592 The bug would still effect issue #32.

@joanibal
Copy link
Collaborator

Oops, I thought it wouldn't count it since I removed myself as a reviewer.
(I approved so pull panda would stop reminding me)

@Xiaosong2105 Xiaosong2105 merged commit 3ec3273 into mdolab:master Apr 24, 2020
marcomangano pushed a commit to marcomangano/adflow that referenced this pull request Aug 19, 2020
…f updateDt is true. Without this change, reverse mode AD routines use outdated spectral radii values, which results in inaccurate sensitivities. (mdolab#37)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants