corePDEs_SideSetLaplacian_3D failing on weaver after epetra removal #1030

Open
jewatkins opened this issue Mar 11, 2024 · 6 comments
jewatkins commented Mar 11, 2024

https://sems-cdash-son.sandia.gov/cdash/test/4156291

It looks like weaver is running more tests after #1028 was merged (111 -> 125), so maybe this case never ran before?

@mperego Is this a case that ran with Epetra and never ran with Tpetra? Were you planning to look into it? Anything obvious that shouldn't work on device? If not, @mcarlson801 can try to see what's going on.

mperego commented Mar 11, 2024

@jewatkins That's correct: this test was running with Epetra and never ran with Tpetra. I briefly looked into it, but I could not figure out the issue. If @mcarlson801 is willing to look into it, that would be great.

ikalash commented Jul 2, 2024

This test is now failing in the OpenMP build as well: https://sems-cdash-son.sandia.gov/cdash//test/5973004 . It looks like the comparisons are failing because the computed response value is 0. The CUDA build is failing in the same way: https://sems-cdash-son.sandia.gov/cdash//test/5969262

@mcarlson801

I did a bit of debugging on this, and apparently the field (x_data) coming into GatherSolution and GatherSolution_Side is 0 everywhere. I'm not very familiar with how this problem is set up, so I'm not sure what might cause this. @bartgol It sounds like you might be familiar with this test; any ideas how this could happen?
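For reference, a minimal sketch of the kind of device-side check behind this observation (generic Kokkos code, not the actual Albany evaluator; the view name x_data and its size are stand-ins): reduce over the solution view the gather evaluator reads and see whether it is identically zero.

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 100;
    // Stand-in for the gathered solution view; in the real evaluator this
    // would be the device view that GatherSolution reads from.
    Kokkos::View<double*> x_data("x_data", n);

    // Sum of absolute values over the view; 0 means the field is zero everywhere.
    double abs_sum = 0.0;
    Kokkos::parallel_reduce(
        "check_x_data", Kokkos::RangePolicy<>(0, n),
        KOKKOS_LAMBDA(const int i, double& lsum) {
          const double v = x_data(i);
          lsum += (v < 0.0) ? -v : v;
        },
        abs_sum);

    std::printf("sum |x_data| = %g%s\n", abs_sum,
                (abs_sum == 0.0) ? "  -> field is identically zero" : "");
  }
  Kokkos::finalize();
  return 0;
}
```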

bartgol commented Jul 3, 2024

That's odd. That should only happen at the first iteration, where the initial guess is 0. The NaN in the solver is likely preventing the solution from ever changing, so it stays 0. I'm guessing the problem is that some entry of the Jacobian that should be 1 is actually kept at 0. Maybe something is amiss with the diagonal terms. I can dig in quickly.
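To make that failure mode concrete, here is a toy 2x2 illustration (assumed values, not taken from the actual problem): if a Dirichlet-style row whose diagonal should be 1 is left at 0, the Jacobian is singular, the Newton step comes out NaN, and the solution never moves off the zero initial guess.

```cpp
#include <cstdio>

int main() {
  // Jacobian with a BC-style second row: it should read [0 1] to pin the
  // side-set dof, but here the diagonal is left at 0.
  const double a00 = 2.0, a01 = 1.0;
  const double a10 = 0.0, a11 = 0.0;   // should be 1.0
  // Residual; the BC row is already satisfied at the zero initial guess.
  const double r0 = 1.0, r1 = 0.0;

  // Newton step: solve J * dx = -r with Cramer's rule.
  const double b0 = -r0, b1 = -r1;
  const double det = a00 * a11 - a01 * a10;        // 0 -> singular Jacobian
  const double dx0 = (b0 * a11 - a01 * b1) / det;  // 0/0 -> NaN
  const double dx1 = (a00 * b1 - a10 * b0) / det;  // 0/0 -> NaN

  std::printf("det = %g, dx = (%g, %g)\n", det, dx0, dx1);
  // With a NaN step the solution stays at the zero initial guess.
  return 0;
}
```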

bartgol commented Jul 3, 2024

It looks like this test passed in the OpenMP build last night. I just ran master on my workstation, and it works just fine. I don't see any relevant commit that went in yesterday, so I don't know what to make of this. As of now, CUDA is the only build that still shows the error.

The fact that CUDA consistently fails may suggest an issue with the row/col GIDs for the side equation. Perhaps the same diagonal entry is set twice (once to 0 and once to 1), and depending on the order in which the two writes happen it can end up with the right or wrong value. I don't have time to do an in-depth debug today, and I leave for vacation on Friday, so feel free to disable the test until I get back if you feel like it. When I get back, I can debug some more.
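A minimal sketch of that hypothesis (generic Kokkos code, not the actual assembly kernel; the view diag is a stand-in for one Jacobian diagonal entry): two unordered writes to the same location, one storing 0 and one storing 1, leave a value that depends on which write lands last, which would explain backend-dependent and run-to-run behavior.

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // Stand-in for a single Jacobian diagonal entry that two assembly
    // contributions both target.
    Kokkos::View<double*> diag("diag", 1);

    // Two unordered writes to the same entry: iteration 0 stores 0, iteration 1
    // stores 1. On a serial backend the last write in program order wins; on
    // CUDA/OpenMP whichever thread happens to land last wins, so the entry can
    // end up 0 or 1 from run to run.
    Kokkos::parallel_for(
        "conflicting_writes", Kokkos::RangePolicy<>(0, 2),
        KOKKOS_LAMBDA(const int i) { diag(0) = (i == 0) ? 0.0 : 1.0; });

    auto h = Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), diag);
    std::printf("diag(0) = %g  (depends on which write lands last)\n", h(0));
  }
  Kokkos::finalize();
  return 0;
}
```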

ikalash commented Jul 3, 2024

@bartgol: thanks for looking into this. If you check the history of the test in the camobap OpenMP build, it seems to fail for a while, then pass for a while: https://sems-cdash-son.sandia.gov/cdash//test/5973004?graph=status . This suggests a heisenbug, which is disturbing. I would suspect the OpenMP issue is the same as the CUDA one, so if the CUDA one is fixed, hopefully things will be good for OpenMP as well.

I'm fine with either keeping or disabling the test. I will make it a point not to open any more duplicate issues about this test :).
