-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated test script and GitHub actions #96
Conversation
Hello @Madu86 . How are users supposed to run the tests now? |
fi | ||
|
||
# Check the accuracy | ||
accuracy='4.0e-3' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a dangerously large threshold. Energies should probably agree up to 1.0e-6 Hartree and gradients to 1.0e-4 Hartree/Angstrom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Different thresholds should also be used for dipole moments, Mulliken charges, and other molecular properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a dangerously large threshold. Energies should probably agree up to 1.0e-6 Hartree and gradients to 1.0e-4 Hartree/Angstrom.
@agoetz @vwcruzeiro How come this threshold is ok with Run.tests.amber then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really depends on the test. Most tests are supposed to test one thing - e.g. the printed energies (in kcal/mol) in the Amber output file along a trajectory. In other cases this is not possible. For instance, to test my DFTB3 implementation in AmberTools/test/sqm/dftb3
I test differently for single point calculations and geometry optimizations, and grep out stuff that is too sensitive from the geometry optimizations. That is far from ideal but the best I was able to do within the Amber testing framework and my limited time.
set SQM = $AMBERHOME/bin/sqm
set SPDIFF = ../../dacdif
set GODIFF = "../../dacdif -a 0.02"
set SPTESTS = (h2o.sp cysdip.sp lignin.sp)
set GOTESTS = (h2o.go cysdip.go)
# single point tests
foreach i ($SPTESTS)
set input = $i.in
set output = $i.out
$SQM -O -i $input -o $output || goto error
$SPDIFF $output.save $output
end
# geometry optimization tests
# do not check the electronic energy and
# core repulsion energy, they are too sensitive!
# Also, there may be a different number of optimization steps
# on different machines, hence remove xmin output
# Also, do not check the final geometry since small numerical
# differences on different platforms can result in different
# Cartesian coordinates although the internal coordinates are OK
foreach i ($GOTESTS)
set input = $i.in
set output = $i.out
$SQM -O -i $input -o $output || goto error
grep -v 'Electronic energy' $output > tmp
grep -v 'Core-core repulsion' tmp > tmp2
grep -v 'DIPOLE' tmp2 > tmp
grep -v 'xmin' tmp > tmp2
sed -e "/Final Structure/,/Calculation Completed/d" tmp2 > $output
rm tmp tmp2
$GODIFF $output.save $output
end
exit(0)
error:
echo " ${0}: Program error"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an example of HF based QM/MM MD with Orca (amber/test/qmmm_EXTERN/QMMM_MD_Orca/Run.aladip.hf_sto-3g
)
../../dacdif -t 1 $output.save $output
../../dacdif -t 3 $restrt.save $restrt
This was after testing numerical differences on different platforms. I truncate the last digit in the Amber mdout file, leaving energies up to 0.001 kcal/mol ~= 1.e-6 Hartree. And truncate Cartesian coordinates to leave an accuracy of 1.e-4 Angstrom.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice changes. But we need to address the major drawback of dacdif that I have always been complaining about: It employs a single numerical threshold for the entire output file while acceptable thresholds depend on the property and its units. I.e. we need different thresholds for (total) energies, gradients, (Mulliken) charges, dipole moments, etc.
The only way I can see how this can be done is to have different saved / output files for each property type. E.g. test-foo.energies, test-foo.gradients, test-foo.dipole, test-foo.popan etc. Alternatively a modified differ that applies different thresholds in different regions of the output file, with regions identified by regular expressions (ADF does something like that).
After installing run |
@Madu86 , the goal in |
@agoetz @vwcruzeiro I have improved the new test script based on your suggestions. We are now able to use different cutoffs for different tasks: https://github.com/Madu86/QUICK/blob/issue66/tools/runtest#L170-L217. Please feel free to play with them. I have also incorporated capability to run different types of tests: https://github.com/Madu86/QUICK/blob/issue66/tools/runtest#L79-L81. For eg. if you want to run only serial geometry optimization tests, you can do |
@vwcruzeiro I see. Thanks. Please check the latest version and let me know your suggestions. |
I have implemented a dacdif based runtest script. This should fix issues #94 and #66. I have also updated the GitHub action files with necessary environment variables and creating artifacts.