Support for F functions #299

Merged
merged 163 commits into from
Feb 25, 2024


@Madu86 (Collaborator) commented Sep 26, 2023

In this PR, I have enabled support for F functions in the ERI and ERI gradient calculations. Specifically, I performed the following tasks:

  1. Updated the error trap for F functions. The trap activates only if the code was compiled without F function support.
  2. Debugged the F-function subroutines of the legacy CPU ERI code.
  3. Implemented the gradient calculation of ERIs containing F functions in the CPU code.
  4. Ported the implementations from items 2 and 3 to the GPU.
  5. Implemented parallel versions of items 2-4.
  6. Added the new source files from items 3-5 to the CMake build system.
  7. Optimized the performance of the F kernels in the CUDA and CUDAMPI versions.
  8. Updated the default basis set collection with cc-pVTZ, def2-TZVP and 6-311G(2df,2pd), and generated the atomic densities required for the SAD guess with these basis sets.
  9. Extended the test suite with energy, gradient, and geometry optimization tests for systems containing F functions.
  10. Updated the CI to compile the code with F function support and run the tests.

The accuracy of the implementations (energies and gradients) was tested against reference software, and the results were in excellent agreement. Please review the PR and test it thoroughly.

The code can be configured with the CMake build system for the Volta architecture, with F function support and the GNU compiler toolchain, as follows (assuming the build directory is located inside the QUICK home directory):

cmake .. -DMPI=TRUE -DCUDA=TRUE -DCMAKE_INSTALL_PREFIX=$(pwd)/../install -DCOMPILER=GNU -DQUICK_USER_ARCH=volta -DENABLEF=TRUE
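For readers who want the whole sequence around that command, here is a minimal bash sketch. It is not taken from the PR: it assumes the script is started from the QUICK home directory and that the usual "make" and "make install" steps follow the cmake call. The helpers run and build_quick and the DRY_RUN variable are hypothetical names introduced only for this illustration.

```shell
#!/usr/bin/env bash
# Sketch (assumption, not from the PR): out-of-source QUICK build with
# F-function support (MPI + CUDA, GNU toolchain, Volta by default).
# Assumes it is run from the QUICK home directory and that plain
# "make" / "make install" follow the cmake step. DRY_RUN=1 only prints commands.
set -euo pipefail

# Echo a command, then execute it unless DRY_RUN=1 (hypothetical helper)
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# build_quick [arch]: arch is forwarded to -DQUICK_USER_ARCH (default: volta)
build_quick() {
  local quick_home="$PWD"
  local arch="${1:-volta}"
  local jobs
  jobs="$(nproc 2>/dev/null || echo 4)"

  run mkdir -p "$quick_home/build"
  run cd "$quick_home/build"
  # Same options as the command above; the install prefix is written as
  # an absolute path instead of $(pwd)/../install
  run cmake .. -DMPI=TRUE -DCUDA=TRUE \
    -DCMAKE_INSTALL_PREFIX="$quick_home/install" \
    -DCOMPILER=GNU -DQUICK_USER_ARCH="$arch" -DENABLEF=TRUE
  run make -j"$jobs"
  run make install
}
```

For example, DRY_RUN=1 build_quick volta prints the five commands without executing them; calling build_quick without DRY_RUN performs the build.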

Here is an accuracy and performance comparison for a PSB3 gradient calculation at the B3LYP/cc-pVTZ level of theory. The CUDA runs used NVIDIA A100 cards.

Quantity | CPU serial | CPU parallel (2 procs) | GPU serial | GPU parallel (2 procs)
Energy (a.u.) | -249.921840664 | -249.921840664 | -249.921840510 | -249.921840510
Gradients (a.u., Hartree/Bohr) | | | |
1X | -0.010743 | -0.010743 | -0.010743 | -0.010743
1Y | -0.003115 | -0.003115 | -0.003115 | -0.003115
1Z | -0.000697 | -0.000697 | -0.000697 | -0.000697
2X | 0.002315 | 0.002315 | 0.002315 | 0.002315
2Y | 0.000594 | 0.000594 | 0.000594 | 0.000594
2Z | 0.010834 | 0.010834 | 0.010834 | 0.010834
3X | 0.001040 | 0.001040 | 0.001039 | 0.001039
3Y | 0.000329 | 0.000329 | 0.000329 | 0.000329
3Z | -0.009902 | -0.009902 | -0.009902 | -0.009902
4X | -0.001622 | -0.001622 | -0.001622 | -0.001622
4Y | -0.000155 | -0.000155 | -0.000156 | -0.000156
4Z | -0.014091 | -0.014091 | -0.014090 | -0.014090
5X | 0.016453 | 0.016453 | 0.016452 | 0.016452
5Y | 0.004450 | 0.004450 | 0.004450 | 0.004450
5Z | 0.013671 | 0.013671 | 0.013671 | 0.013671
6X | -0.007907 | -0.007907 | -0.007906 | -0.007906
6Y | -0.002272 | -0.002272 | -0.002263 | -0.002263
6Z | -0.000343 | -0.000343 | -0.000357 | -0.000357
7X | -0.001899 | -0.001899 | -0.001899 | -0.001899
7Y | -0.000576 | -0.000576 | -0.000576 | -0.000576
7Z | 0.004551 | 0.004551 | 0.004551 | 0.004551
8X | 0.003601 | 0.003601 | 0.003601 | 0.003601
8Y | 0.000919 | 0.000919 | 0.000919 | 0.000919
8Z | 0.004957 | 0.004957 | 0.004957 | 0.004957
9X | 0.003502 | 0.003502 | 0.003502 | 0.003502
9Y | 0.001099 | 0.001099 | 0.001099 | 0.001099
9Z | -0.005542 | -0.005542 | -0.005542 | -0.005542
10X | -0.002006 | -0.002006 | -0.002006 | -0.002006
10Y | -0.000531 | -0.000531 | -0.000531 | -0.000531
10Z | -0.004566 | -0.004566 | -0.004566 | -0.004566
11X | -0.003991 | -0.003991 | -0.003991 | -0.003991
11Y | -0.001023 | -0.001023 | -0.001022 | -0.001022
11Z | -0.006537 | -0.006537 | -0.006536 | -0.006536
12X | -0.003578 | -0.003578 | -0.003578 | -0.003578
12Y | -0.001116 | -0.001116 | -0.001125 | -0.001125
12Z | 0.007073 | 0.007073 | 0.007086 | 0.007086
13X | 0.003088 | 0.003088 | 0.003088 | 0.003088
13Y | 0.000925 | 0.000925 | 0.000925 | 0.000925
13Z | -0.004499 | -0.004499 | -0.004499 | -0.004499
14X | 0.001747 | 0.001747 | 0.001747 | 0.001747
14Y | 0.000471 | 0.000471 | 0.000471 | 0.000471
14Z | 0.005089 | 0.005089 | 0.005089 | 0.005089
Runtime (s) | 2997.88 | 1560.69 | 131.18 | 78.40
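To make the accuracy and performance claims concrete, the table can be reduced to a few summary numbers. The bash/awk sketch below is not part of the PR; its values are typed in by hand from the table. Because each parallel column is identical to its serial counterpart, only the CPU serial and GPU serial columns are used. It computes the largest CPU-GPU gradient deviation, the energy difference, and the speedups over the serial CPU run.

```shell
#!/usr/bin/env bash
# Sketch: summarize the PSB3 B3LYP/cc-pVTZ comparison table.
# Numbers are copied by hand from the table; parallel columns repeat the
# serial ones exactly, so only CPU serial and GPU serial gradients are listed.
set -euo pipefail

# Gradient rows: component, CPU serial, GPU serial (Hartree/Bohr)
grad_data='1X -0.010743 -0.010743
1Y -0.003115 -0.003115
1Z -0.000697 -0.000697
2X 0.002315 0.002315
2Y 0.000594 0.000594
2Z 0.010834 0.010834
3X 0.001040 0.001039
3Y 0.000329 0.000329
3Z -0.009902 -0.009902
4X -0.001622 -0.001622
4Y -0.000155 -0.000156
4Z -0.014091 -0.014090
5X 0.016453 0.016452
5Y 0.004450 0.004450
5Z 0.013671 0.013671
6X -0.007907 -0.007906
6Y -0.002272 -0.002263
6Z -0.000343 -0.000357
7X -0.001899 -0.001899
7Y -0.000576 -0.000576
7Z 0.004551 0.004551
8X 0.003601 0.003601
8Y 0.000919 0.000919
8Z 0.004957 0.004957
9X 0.003502 0.003502
9Y 0.001099 0.001099
9Z -0.005542 -0.005542
10X -0.002006 -0.002006
10Y -0.000531 -0.000531
10Z -0.004566 -0.004566
11X -0.003991 -0.003991
11Y -0.001023 -0.001022
11Z -0.006537 -0.006536
12X -0.003578 -0.003578
12Y -0.001116 -0.001125
12Z 0.007073 0.007086
13X 0.003088 0.003088
13Y 0.000925 0.000925
13Z -0.004499 -0.004499
14X 0.001747 0.001747
14Y 0.000471 0.000471
14Z 0.005089 0.005089'

# Largest absolute CPU-GPU gradient difference and the component where it occurs
max_dev="$(printf '%s\n' "$grad_data" | awk '
  BEGIN { max = -1 }
  {
    d = $2 - $3
    if (d < 0) d = -d
    if (d > max) { max = d; where = $1 }
  }
  END { printf "%s %.6f\n", where, max }')"

# GPU minus CPU total energy (Hartree)
energy_diff="$(awk -v cpu=-249.921840664 -v gpu=-249.921840510 \
  'BEGIN { printf "%.2e\n", gpu - cpu }')"

# Speedup of a run relative to the serial CPU runtime (2997.88 s)
speedup() {
  awk -v ref=2997.88 -v t="$1" 'BEGIN { printf "%.1f\n", ref / t }'
}

echo "max |CPU - GPU| gradient: $max_dev"
echo "GPU - CPU energy: $energy_diff Hartree"
echo "speedup CPU parallel: $(speedup 1560.69)x"
echo "speedup GPU serial:   $(speedup 131.18)x"
echo "speedup GPU parallel: $(speedup 78.40)x"
```

Run with bash, it reports a largest gradient deviation of 0.000014 Hartree/Bohr (component 6Z), an energy difference of 1.54e-07 Hartree, and speedups of 1.9x (CPU parallel), 22.9x (GPU serial), and 38.2x (GPU parallel) over the serial CPU run.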

The input and output files of these runs are in the attached archive:

example.tar.gz

Finally, note the following limitations of the current CUDA/CUDAMPI F implementations:

  • …ore than 10 primitive functions.
  • If the molecule contains certain atoms (e.g. V, Mn), the calculation fails with a segmentation fault before the SCF begins.
  • …e total angular momentum is less than or equal to 8
@Madu86 added the enhancement label (New feature or request) Sep 26, 2023
@Madu86 self-assigned this Sep 26, 2023
@agoetz (Collaborator) left a comment


Nice work. I have looked it over and tested the code.

CPU only (with and without F functions)

  • GNU 10.2, OpenMPI 4.0.5
  • GNU 10.2, OpenMPI 4.0.5, MKL 2024.0
  • GNU 11.4, OpenMPI 4.0.5

CPU and GPU (A100, Expanse)

  • GNU 10.2.0, OpenMPI 4.1.3, CUDA 11.7, MKL 2020.4

All tests pass.

Open shell (on CPU, only the MPI build was tested, not serial)

  • Energy looks good on CPU and GPU
  • Gradient looks good on CPU

The following still needs to be done (in separate PRs):

  • Add a trap for G functions
  • Add a trap for open shell + F functions + gradient + CUDA
  • Add a test with F functions on 4 centers (e.g. def2-TZVP basis)

@agoetz merged commit 642885c into merzlab:ffunc-gen2 on Feb 25, 2024
4 checks passed