Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated test script and GitHub actions #96

Closed
wants to merge 22 commits into from
Closed

Conversation

Madu86
Copy link
Collaborator

@Madu86 Madu86 commented Feb 28, 2021

I have implemented a dacdif based runtest script. This should fix issues #94 and #66. I have also updated the GitHub action files with necessary environment variables and creating artifacts.

@vwcruzeiro
Copy link
Collaborator

Hello @Madu86 . How are users supposed to run the tests now?

fi

# Check the accuracy
accuracy='4.0e-3'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a dangerously large threshold. Energies should probably agree up to 1.0e-6 Hartree and gradients to 1.0e-4 Hartree/Angstrom.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Different thresholds should also be used for dipole moments, Mulliken charges, and other molecular properties.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a dangerously large threshold. Energies should probably agree up to 1.0e-6 Hartree and gradients to 1.0e-4 Hartree/Angstrom.

@agoetz @vwcruzeiro How come this threshold is ok with Run.tests.amber then?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It really depends on the test. Most tests are supposed to test one thing - e.g. the printed energies (in kcal/mol) in the Amber output file along a trajectory. In other cases this is not possible. For instance, to test my DFTB3 implementation in AmberTools/test/sqm/dftb3 I test differently for single point calculations and geometry optimizations, and grep out stuff that is too sensitive from the geometry optimizations. That is far from ideal but the best I was able to do within the Amber testing framework and my limited time.

set SQM = $AMBERHOME/bin/sqm
set SPDIFF = ../../dacdif
set GODIFF = "../../dacdif -a 0.02"
set SPTESTS = (h2o.sp cysdip.sp lignin.sp)
set GOTESTS = (h2o.go cysdip.go)

# single point tests
foreach i ($SPTESTS)
    set input = $i.in
    set output = $i.out
    $SQM -O -i $input -o $output || goto error
    $SPDIFF $output.save $output
end

# geometry optimization tests
# do not check the electronic energy and
# core repulsion energy, they are too sensitive!
# Also, there may be a different number of optimization steps
# on different machines, hence remove xmin output
# Also, do not check the final geometry since small numerical
# differences on different platforms can result in different
# Cartesian coordinates although the internal coordinates are OK
foreach i ($GOTESTS)
    set input = $i.in
    set output = $i.out
    $SQM -O -i $input -o $output || goto error
    grep -v 'Electronic energy' $output > tmp
    grep -v 'Core-core repulsion' tmp > tmp2
    grep -v 'DIPOLE' tmp2 > tmp
    grep -v 'xmin' tmp > tmp2
    sed -e "/Final Structure/,/Calculation Completed/d" tmp2 > $output
    rm tmp tmp2
    $GODIFF $output.save $output
end

exit(0)

error:
echo "  ${0}:  Program error"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example of HF based QM/MM MD with Orca (amber/test/qmmm_EXTERN/QMMM_MD_Orca/Run.aladip.hf_sto-3g)

../../dacdif -t 1 $output.save $output
../../dacdif -t 3 $restrt.save $restrt

This was after testing numerical differences on different platforms. I truncate the last digit in the Amber mdout file, leaving energies up to 0.001 kcal/mol ~= 1.e-6 Hartree. And truncate Cartesian coordinates to leave an accuracy of 1.e-4 Angstrom.

Copy link
Collaborator

@agoetz agoetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes. But we need to address the major drawback of dacdif that I have always been complaining about: It employs a single numerical threshold for the entire output file while acceptable thresholds depend on the property and its units. I.e. we need different thresholds for (total) energies, gradients, (Mulliken) charges, dipole moments, etc.

The only way I can see how this can be done is to have different saved / output files for each property type. E.g. test-foo.energies, test-foo.gradients, test-foo.dipole, test-foo.popan etc. Alternatively a modified differ that applies different thresholds in different regions of the output file, with regions identified by regular expressions (ADF does something like that).

@agoetz
Copy link
Collaborator

agoetz commented Feb 28, 2021

Hello @Madu86 . How are users supposed to run the tests now?

After installing run make test or in the installation directory call runtest.

@vwcruzeiro
Copy link
Collaborator

@Madu86 , the goal in Run.tests.amber is to evaluate the output file as a whole. This is why a large threshold needs to be used in dacdif. This check should help us pick up major bugs (which is probably enough for Amber), but the main test suite in Quick should check energies and forces with a smaller threshold. My suggestion for you is the following: do exactly the same as I do in Run.tests.amber, but also test the total energy and gradients using different files and with a smaller threshold in dacdif. Perhaps we should modify Run.tests.amber to do that so we have a single test suite for Quick and Amber. I can help you with that. Just let me know.

@Madu86
Copy link
Collaborator Author

Madu86 commented Mar 1, 2021

@agoetz @vwcruzeiro I have improved the new test script based on your suggestions. We are now able to use different cutoffs for different tasks: https://github.com/Madu86/QUICK/blob/issue66/tools/runtest#L170-L217. Please feel free to play with them. I have also incorporated capability to run different types of tests: https://github.com/Madu86/QUICK/blob/issue66/tools/runtest#L79-L81. For eg. if you want to run only serial geometry optimization tests, you can do ./runtest --serial --opt.

@Madu86
Copy link
Collaborator Author

Madu86 commented Mar 1, 2021

the goal in Run.tests.amber is to evaluate the output file as a whole. This is why a large threshold needs to be used in dacdif. This check should help us pick up major bugs (which is probably enough for Amber), but the main test suite in Quick should check energies and forces with a smaller threshold. My suggestion for you is the following: do exactly the same as I do in Run.tests.amber, but also test the total energy and gradients using different files and with a smaller threshold in dacdif. Perhaps we should modify Run.tests.amber to do that so we have a single test suite for Quick and Amber. I can help you with that. Just let me know.

@vwcruzeiro I see. Thanks. Please check the latest version and let me know your suggestions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QUICK tests fail for some GNU and Intel compiler versions Use Run.tests.amber in GitHub checks
3 participants