7.2.0: 57 tests fail #752

Closed
yurivict opened this issue Mar 15, 2023 · 14 comments

Comments

@yurivict
Contributor

Describe the bug
See the log.

Most failures differ only in the last digits, but there are real failures too.

Describe settings used
Regular build

Expected behavior
All tests should pass.

Additional context
FreeBSD 13.1

@edoapra
Collaborator

edoapra commented Mar 15, 2023

The number of tests that actually fail for numerical reasons appears to be 11 (10, to be precise, if you subtract the oh2 test, which checks that failures are detected). This number is acceptable: if you look at the actual failures, the errors are all in the last significant digit.

grep -i 'verifying output ... failed' nwchem-7.2.0-test.log|wc -l
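A minimal sketch of the same count with the designed-to-fail oh2 test filtered out. It assumes each "verifying output ... failed" line in the log also names the test it belongs to (an assumption about the log layout, not something shown in this thread); count_real_failures is a hypothetical helper name, not part of the NWChem QA scripts.

```shell
#!/bin/sh
# Hypothetical helper: count failed verifications in a QA log, skipping oh2,
# which is designed to fail. ASSUMPTION: the test name (e.g. tests/oh2/oh2)
# appears on the same line as "verifying output ... failed".
count_real_failures() {
    log="$1"
    # -i: case-insensitive match on the failure marker.
    # -v -w oh2: drop lines naming oh2 as a whole word (ho2 is kept).
    # -c: print the number of remaining lines.
    grep -i 'verifying output ... failed' "$log" | grep -v -c -w 'oh2'
}
```

Usage: count_real_failures nwchem-7.2.0-test.log. grep -c is used instead of wc -l because FreeBSD's wc pads its count with leading spaces, which makes the result awkward to compare in scripts.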

A couple of tests fail with "NWChem execution failed", and these seem to be due to an incorrect installation/configuration (e.g. the MD k6h2o test).

I think the bulk of the failures comes from mpirun not being set up correctly. You can see many repetitions of this error message at the bottom of the log file:

setenv MPIRUN_PATH /home/guido/bagheria/bin/mpirun
 Please make sure you have the right mpirun for your system.
 Alternatively set the number of processors to 0.

Unless you configure your setup to pick up the correct mpirun location, these QA tests are not executed properly.
If you want a quick solution that does not run the tests in parallel, you can follow the suggestion in the error message above and execute
doqmtests.mpi 0
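One way to act on this before starting the QA run is sketched below: look up mpirun on PATH and export it as MPIRUN_PATH, the variable named in the error message, and fall back to the serial run when no mpirun is found. set_mpirun_path is a hypothetical helper, not part of NWChem, and the sketch assumes the QA scripts read MPIRUN_PATH from the environment, as the error message suggests.

```shell
#!/bin/sh
# Hypothetical helper: export MPIRUN_PATH from the first mpirun on PATH.
# Returns nonzero (and leaves MPIRUN_PATH unchanged) if none is found.
set_mpirun_path() {
    # command -v is POSIX; it fails when mpirun is not on PATH.
    mpirun_bin=$(command -v mpirun) || return 1
    MPIRUN_PATH="$mpirun_bin"
    export MPIRUN_PATH
}
```

Usage, from the QA directory: if set_mpirun_path; then ./doqmtests.mpi 4; else ./doqmtests.mpi 0; fi. command -v returns whichever mpirun comes first on PATH, so that mpirun must belong to the same MPI library NWChem was built against.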

@yurivict
Contributor Author

This looks like a real numeric failure:

@@ -1,4 +1,4 @@
 Effective nuclear repulsion energy (a.u.) 9.20
-Total SCF energy = -67.01054
+Total SCF energy = -76.01054

@edoapra
Collaborator

edoapra commented Mar 15, 2023

Yes, it is. The point of the oh2 test is to check whether the QA mechanism can detect failures.
The QA reference output (as you can see from the diff above) has the incorrect energy value of -67... instead of -76...
https://github.com/nwchemgit/nwchem/tree/master/QA/tests/oh2#readme

this is the same as the ho2 test case but it is designed to fail
since the good output has -67.  instead of -76. This allows checking
of the scripts as well.

@yurivict
Contributor Author

The print output should say "ExpectedFail" instead of "Failure".

@edoapra
Collaborator

edoapra commented Mar 15, 2023

Could you open a pull request with your suggested changes?

@jbgoette

jbgoette commented Apr 5, 2023

I tested NWChem both on a single processor and with mpirun on multiple processors. The single-processor fast run from the QA directory gives me 10 failures, but on multiple processors the count increases to 36, with several segmentation faults reported.

doqmtests_mpi4.log

@jeffhammond
Collaborator

What is the complete build config? Fortran compiler and MPI version are important.

@jbgoette

jbgoette commented Apr 6, 2023

Apologies for the delay. I installed NWChem via the FreeBSD port, configured it with mpich, and as far as I understand gfortran is used.

@yurivict
Contributor Author

yurivict commented Apr 6, 2023

Yes, gfortran is used.

@jeffhammond
Collaborator

I will try to install FreeBSD in VirtualBox on my Apple M2 and debug this. However, you should build NWChem from source and verify that the errors are the same, because this binary looks rather messed up compared to what I see elsewhere.

@yurivict
Contributor Author

yurivict commented Apr 7, 2023

Why is it messed up?

@jeffhammond
Collaborator

It segfaults and gives the wrong numerical answers. These results are different from what NWChem produces on Linux machines with gfortran and MPICH. Therefore, it is reasonable to conclude that the problem is FreeBSD-related.

@jbgoette

jbgoette commented Apr 7, 2023

you should build NWChem from source and verify that the errors are the same

It was built from source via the FreeBSD port, but I will try to build it manually.

@edoapra
Collaborator

edoapra commented Apr 19, 2023

This seems to be a repeat of #463.
I built NWChem using the hotfix/release-7-2-0 branch, and parallel runs work with openmpi3 or openmpi but fail with mpich.
The build was on FreeBSD 13.2/arm64 running in QEMU.

edoapra closed this as completed Jul 21, 2023