
More trouble in LAMMPS compilation due to "LAMMPS_NS" #19

Closed
hzone3898 opened this issue Apr 5, 2023 · 5 comments

@hzone3898

Hello,
Despite reading through the past issues, I still can't manage to compile LAMMPS with pair_allegro.

My environment:
gcc: 9.4.0
CUDA: 11.2.2
cudnn: 8.1.0.77-11.2
pytorch: 1.11
cmake: 3.23.1
GPU: A100

I'm getting libtorch with "wget https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.11.0%2Bcu113.zip", and I'm using lammps-stable_29Sep2021_update2, which I got from https://github.com/lammps/lammps/releases/tag/stable_29Sep2021_update2

I run:

cmake ../cmake -DCMAKE_PREFIX_PATH=../../libtorch/ \
  -DMKL_INCLUDE_DIR=`python -c "import sysconfig;from pathlib import Path;print(Path(sysconfig.get_paths()[\"include\"]).parent)"` \
  -DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON

Then I compile with "make -j 16" and get the errors:

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: class "LAMMPS_NS::Neighbor" has no member "add_request"

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: namespace "LAMMPS_NS::NeighConst" has no member "REQ_FULL"

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: namespace "LAMMPS_NS::NeighConst" has no member "REQ_GHOST"

@Linux-cpp-lisp
Collaborator

It looks like you are trying to compile the development version of pair_allegro. As the README notes, it has been upgraded: it no longer requires stable_29Sep2021_update2 and is now compatible with LAMMPS versions released after LAMMPS made that breaking change to their neighbor-list API. You should be able to use any stable LAMMPS release from after that update, including the latest (i.e. just pull without specifying a tag). See the README at https://github.com/mir-group/pair_allegro/tree/stress
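For anyone following along, a minimal sketch of the updated build sequence (the directory layout is an assumption; the patch_lammps.sh helper and cmake flags are taken from the pair_allegro README and the original command, and may need adapting to your environment):

```shell
# Sketch: assumes libtorch is already unpacked next to the lammps checkout
git clone -b stable --depth 1 https://github.com/lammps/lammps.git
git clone https://github.com/mir-group/pair_allegro.git
cd pair_allegro && ./patch_lammps.sh ../lammps && cd ..
mkdir -p lammps/build && cd lammps/build
cmake ../cmake -DCMAKE_PREFIX_PATH=../../libtorch/ \
  -DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
make -j 16
```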

@hzone3898
Author

Thank you! Using the latest LAMMPS version made it compile.

However, when trying to run a simple NVT run or minimization for a test system (water molecules), I get a "std::out_of_range" error:

KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:106)
  will use up to 1 GPU(s) per node
  using 1 OpenMP thread(s) per MPI task
Allegro is using input precision f and output precision f
Allegro is using device cuda
Reading data file ...
  orthogonal box = (-50.378 -49.142 -3.651) to (77.723 98.154 39.909)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  1002 atoms
  read_data CPU = 0.072 seconds
Allegro: Loading model from deployed.pth
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | H | 1 | H
1 | O | 2 | O
### Equilibration NVT ###
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 4.3
  ghost atom cutoff = 4.3
  binsize = 4.3, bins = 30 35 11
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro3232/kk, perpetual
      attributes: full, newton on, ghost, kokkos_device
      pair build: full/bin/ghost/kk/device
      stencil: full/ghost/bin/3d
      bin: kk/device
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 1
terminate called after throwing an instance of 'std::out_of_range'
  what():  Argument passed to at() was not in the map.
[g1101:2612852] *** Process received signal ***
[g1101:2612852] Signal: Aborted (6)
[g1101:2612852] Signal code:  (-6)
[g1101:2612852] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x7fffbfac1ce0]
[g1101:2612852] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fff618e3a9f]
[g1101:2612852] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fff618b6e05]
[g1101:2612852] [ 3] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xa998a)[0x7fff6208598a]
[g1101:2612852] [ 4] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb51ea)[0x7fff620911ea]
[g1101:2612852] [ 5] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb5255)[0x7fff62091255]
[g1101:2612852] [ 6] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb54e9)[0x7fff620914e9]
[g1101:2612852] [ 7] /projappl/lammps/build/lmp[0xb752b9]
[g1101:2612852] [ 8] /projappl/lammps/build/lmp[0xcbc995]
[g1101:2612852] [ 9] /projappl//lammps/build/lmp[0x8c9042]
[g1101:2612852] [10] /projappl/lammps/build/lmp[0x58542c]
[g1101:2612852] [11] /projappl/peptides/lammps/build/lmp[0x48fbb6]
[g1101:2612852] [12] /projappl/peptides/lammps/build/lmp[0x48fe9e]
[g1101:2612852] [13] /projappl/lammps/build/lmp[0x44fdad]
[g1101:2612852] [14] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fff618cfcf3]
[g1101:2612852] [15] /projappl/lammps/build/lmp[0x47102e]
[g1101:2612852] *** End of error message ***

I'm using a deployed.pth Allegro model trained with nequip==0.5.6, so I use "pair_style allegro3232", as discussed in #12.

@anjohan
Collaborator

anjohan commented Apr 7, 2023

Hi,

With gdb you can pinpoint the line where it fails. Something like gdb -ex=r -ex=where --args /path/to/lammps/build/lmp -in in.script.

Keep in mind that if you use the stress branch of pair_allegro and ask LAMMPS for pressure/stress, the model needs to be trained with stress support:

if (vflag) {
  torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
  auto v = v_tensor.accessor<outputtype, 3>();
  // Convert from the 3x3 symmetric tensor format, which NequIP outputs,
  // to the flattened form LAMMPS expects.
  // The first [0] index on v is the batch index.
  this->virial[0] = v[0][0][0];
  this->virial[1] = v[0][1][1];
  this->virial[2] = v[0][2][2];
  this->virial[3] = v[0][0][1];
  this->virial[4] = v[0][0][2];
  this->virial[5] = v[0][1][2];
}
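The flattening above (a 3x3 symmetric virial tensor mapped to LAMMPS's six components in the order xx, yy, zz, xy, xz, yz) can be sanity-checked in plain Python, independent of Torch. flatten_virial and the sample tensor are hypothetical stand-ins for v[0]:

```python
# Sketch: flatten a 3x3 symmetric virial tensor into the 6-component
# order LAMMPS uses: [xx, yy, zz, xy, xz, yz].
def flatten_virial(v):
    """v is a 3x3 symmetric tensor (nested lists); returns the 6 LAMMPS components."""
    return [v[0][0], v[1][1], v[2][2], v[0][1], v[0][2], v[1][2]]

# Symmetric sample: off-diagonal entries mirror across the diagonal.
sym = [[1.0, 4.0, 5.0],
       [4.0, 2.0, 6.0],
       [5.0, 6.0, 3.0]]

print(flatten_virial(sym))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```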

@hzone3898
Author

This is the error when running with gdb:

Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 1
[New Thread 0x7ffd25fff000 (LWP 3156788)]
terminate called after throwing an instance of 'std::out_of_range'
  what():  Argument passed to at() was not in the map.

Thread 1 "lmp" received signal SIGABRT, Aborted.
0x00007fff618e3a9f in raise () from /lib64/libc.so.6
#0  0x00007fff618e3a9f in raise () from /lib64/libc.so.6
#1  0x00007fff618b6e05 in abort () from /lib64/libc.so.6
#2  0x00007fff6208598a in ?? ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#3  0x00007fff620911ea in ?? ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#4  0x00007fff62091255 in std::terminate() ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#5  0x00007fff620914e9 in __cxa_throw ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#6  0x0000000000b752b9 in ska_ordered::order_preserving_flat_hash_map<c10::IValue, c10::IValue, c10::detail::DictKeyHash, c10::detail::DictKeyEqualTo, std::allocator<std::pair<c10::IValue, c10::IValue> > >::at (key=...,
    this=<optimized out>) at /local_scratch/tuple:510
#7  c10::Dict<c10::IValue, c10::IValue>::at (this=this@entry=0x7fffffffaae8,
    key=...) at /projappl/lammps/src/ios_base.h:152
#8  0x0000000000b7c65d in LAMMPS_NS::PairAllegro<(Precision)0>::compute (
    this=0x6a5ada0, eflag=<optimized out>, vflag=2)
    at /projappl/lammps/src/stl_uninitialized.h:1144
#9  0x00000000005f571a in LAMMPS_NS::Verlet::setup (this=0x6c38ed0, flag=1)
    at /projappl/lammps/build/atom_vec.h:140
#10 0x000000000058542c in LAMMPS_NS::Run::command (this=0x4008e020, narg=1,
    arg=0xd713ee0)
    at /appl/spack/v017/install-tree/gcc-11.2.0/cuda-11.5.0-mg4ztb/include/crt/basic_string.tcc:171
#11 0x000000000048fbb6 in LAMMPS_NS::Input::execute_command (this=0x6a18420)
    at /projappl/lammps/build/kspace.h:853
#12 0x000000000048fe9e in LAMMPS_NS::Input::file (this=0x6a18420)
    at /projappl/lammps/build/kspace.h:302
#13 0x000000000044fdad in main (
    argc=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
    argv=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /projappl/lammps/src/main.cpp:105
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-189.5.el8_6.x86_64 hwloc-libs-2.2.0-3.el8.x86_64 libibverbs-56mlnx40-1.56103.x86_64 libjpeg-turbo-1.5.3-12.el8.x86_64 libnl3-3.5.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 librdmacm-56mlnx40-1.56103.x86_64 nvidia-driver-cuda-libs-525.85.12-1.el8.x86_64 openssl-libs-1.1.1k-7.el8_6.x86_64 zlib-1.2.11-19.el8_6.x86_64
(gdb) quit
A debugging session is active.

        Inferior 1 [process 3156674] will be killed.

@hzone3898
Author

After unsuccessful debugging, it turns out @anjohan was right and my input .yaml file was just wrong.
I did have stresses in the training data, but I had forgotten to change "ForceOutput" to "StressForceOutput" in the .yaml!
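For reference, this is a one-line change in the nequip training config's model_builders list (a sketch: the surrounding builder names are the nequip 0.5.x defaults and may differ from your config):

```yaml
model_builders:
  - SimpleIrrepsConfig
  - EnergyModel
  - PerSpeciesRescale
  - StressForceOutput   # was ForceOutput; adds virial/stress to the model output
  - RescaleEnergyEtc
```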
