
More trouble in LAMMPS compilation due to "LAMMPS_NS" #19

Closed
hzone3898 opened this issue Apr 5, 2023 · 5 comments

@hzone3898

Hello,
Despite reading through the past issues, I still can't manage to compile LAMMPS with pair_allegro.

My environment:
gcc: 9.4.0
CUDA: 11.2.2
cudnn: 8.1.0.77-11.2
pytorch: 1.11
cmake: 3.23.1
GPU: A100

I'm getting libtorch with "wget https://download.pytorch.org/libtorch/cu113/libtorch-cxx11-abi-shared-with-deps-1.11.0%2Bcu113.zip", and I'm using lammps-stable_29Sep2021_update2, which I got from https://github.com/lammps/lammps/releases/tag/stable_29Sep2021_update2

I run:

cmake ../cmake -DCMAKE_PREFIX_PATH=../../libtorch/ \
  -DMKL_INCLUDE_DIR=`python -c "import sysconfig;from pathlib import Path;print(Path(sysconfig.get_paths()[\"include\"]).parent)"` \
  -DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON

Then I compile with "make -j 16" and get the errors:

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: class "LAMMPS_NS::Neighbor" has no member "add_request"

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: namespace "LAMMPS_NS::NeighConst" has no member "REQ_FULL"

  • /lammps-stable_29Sep2021_update2/src/pair_allegro.cpp(129): error: namespace "LAMMPS_NS::NeighConst" has no member "REQ_GHOST"

@Linux-cpp-lisp
Collaborator

It looks like you are trying to compile the development version of pair_allegro. As the README notes, it has been upgraded: it no longer requires stable_29Sep2021_update2 and is now compatible with LAMMPS versions released after LAMMPS made that breaking change to their neighbor-list API. You should be able to use any stable LAMMPS release from after that update, including the latest (i.e. just pull without specifying a tag). See the README at https://github.com/mir-group/pair_allegro/tree/stress
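For anyone following along, a minimal sketch of the updated build sequence (the directory layout is an assumption; the patch_lammps.sh helper and cmake flags are taken from the pair_allegro README and the original command, and may need adapting to your environment):

```shell
# Sketch: assumes libtorch is already unpacked next to the lammps checkout
git clone -b stable --depth 1 https://github.com/lammps/lammps.git
git clone https://github.com/mir-group/pair_allegro.git
cd pair_allegro && ./patch_lammps.sh ../lammps && cd ..
mkdir -p lammps/build && cd lammps/build
cmake ../cmake -DCMAKE_PREFIX_PATH=../../libtorch/ \
  -DPKG_KOKKOS=ON -DKokkos_ENABLE_CUDA=ON -DKokkos_ARCH_AMPERE80=ON
make -j 16
```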

@hzone3898
Author

Thank you! Using the latest LAMMPS version made it compile.

However, when trying to run a simple NVT run or minimization for a test system (water molecules), I get a "std::out_of_range" error:

KOKKOS mode is enabled (src/KOKKOS/kokkos.cpp:106)
  will use up to 1 GPU(s) per node
  using 1 OpenMP thread(s) per MPI task
Allegro is using input precision f and output precision f
Allegro is using device cuda
Reading data file ...
  orthogonal box = (-50.378 -49.142 -3.651) to (77.723 98.154 39.909)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  1002 atoms
  read_data CPU = 0.072 seconds
Allegro: Loading model from deployed.pth
Allegro: Freezing TorchScript model...
Type mapping:
Allegro type | Allegro name | LAMMPS type | LAMMPS name
0 | H | 1 | H
1 | O | 2 | O
### Equilibration NVT ###
Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 4.3
  ghost atom cutoff = 4.3
  binsize = 4.3, bins = 30 35 11
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair allegro3232/kk, perpetual
      attributes: full, newton on, ghost, kokkos_device
      pair build: full/bin/ghost/kk/device
      stencil: full/ghost/bin/3d
      bin: kk/device
Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 1
terminate called after throwing an instance of 'std::out_of_range'
  what():  Argument passed to at() was not in the map.
[g1101:2612852] *** Process received signal ***
[g1101:2612852] Signal: Aborted (6)
[g1101:2612852] Signal code:  (-6)
[g1101:2612852] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x7fffbfac1ce0]
[g1101:2612852] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7fff618e3a9f]
[g1101:2612852] [ 2] /lib64/libc.so.6(abort+0x127)[0x7fff618b6e05]
[g1101:2612852] [ 3] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xa998a)[0x7fff6208598a]
[g1101:2612852] [ 4] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb51ea)[0x7fff620911ea]
[g1101:2612852] [ 5] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb5255)[0x7fff62091255]
[g1101:2612852] [ 6] /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6(+0xb54e9)[0x7fff620914e9]
[g1101:2612852] [ 7] /projappl/lammps/build/lmp[0xb752b9]
[g1101:2612852] [ 8] /projappl/lammps/build/lmp[0xcbc995]
[g1101:2612852] [ 9] /projappl//lammps/build/lmp[0x8c9042]
[g1101:2612852] [10] /projappl/lammps/build/lmp[0x58542c]
[g1101:2612852] [11] /projappl/peptides/lammps/build/lmp[0x48fbb6]
[g1101:2612852] [12] /projappl/peptides/lammps/build/lmp[0x48fe9e]
[g1101:2612852] [13] /projappl/lammps/build/lmp[0x44fdad]
[g1101:2612852] [14] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fff618cfcf3]
[g1101:2612852] [15] /projappl/lammps/build/lmp[0x47102e]
[g1101:2612852] *** End of error message ***

I'm using a deployed.pth Allegro model trained with nequip==0.5.6, so I use "pair_style allegro3232", as discussed in #12.

@anjohan
Collaborator

anjohan commented Apr 7, 2023

Hi,

With gdb you can pinpoint the line where it fails. Something like gdb -ex=r -ex=where --args /path/to/lammps/build/lmp -in in.script.

Keep in mind that if you use the stress branch of pair_allegro and ask LAMMPS for pressure/stress, the model needs to be trained with stress support:

if (vflag) {
  torch::Tensor v_tensor = output.at("virial").toTensor().cpu();
  auto v = v_tensor.accessor<outputtype, 3>();
  // Convert from the 3x3 symmetric tensor format, which NequIP outputs,
  // to the flattened form LAMMPS expects.
  // The first [0] index on v is the batch index.
  this->virial[0] = v[0][0][0];
  this->virial[1] = v[0][1][1];
  this->virial[2] = v[0][2][2];
  this->virial[3] = v[0][0][1];
  this->virial[4] = v[0][0][2];
  this->virial[5] = v[0][1][2];
}
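The flattening above (a 3x3 symmetric virial tensor mapped to LAMMPS's six components in the order xx, yy, zz, xy, xz, yz) can be sanity-checked in plain Python, independent of Torch. flatten_virial and the sample tensor are hypothetical stand-ins for v[0]:

```python
# Sketch: flatten a 3x3 symmetric virial tensor into the 6-component
# order LAMMPS uses: [xx, yy, zz, xy, xz, yz].
def flatten_virial(v):
    """v is a 3x3 symmetric tensor (nested lists); returns the 6 LAMMPS components."""
    return [v[0][0], v[1][1], v[2][2], v[0][1], v[0][2], v[1][2]]

# Symmetric sample: off-diagonal entries mirror across the diagonal.
sym = [[1.0, 4.0, 5.0],
       [4.0, 2.0, 6.0],
       [5.0, 6.0, 3.0]]

print(flatten_virial(sym))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```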

@hzone3898
Author

This is the error when running with gdb:

Setting up Verlet run ...
  Unit style    : lj
  Current step  : 0
  Time step     : 1
[New Thread 0x7ffd25fff000 (LWP 3156788)]
terminate called after throwing an instance of 'std::out_of_range'
  what():  Argument passed to at() was not in the map.

Thread 1 "lmp" received signal SIGABRT, Aborted.
0x00007fff618e3a9f in raise () from /lib64/libc.so.6
#0  0x00007fff618e3a9f in raise () from /lib64/libc.so.6
#1  0x00007fff618b6e05 in abort () from /lib64/libc.so.6
#2  0x00007fff6208598a in ?? ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#3  0x00007fff620911ea in ?? ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#4  0x00007fff62091255 in std::terminate() ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#5  0x00007fff620914e9 in __cxa_throw ()
   from /appl/spack/v017/install-tree/gcc-8.5.0/gcc-11.2.0-zshp2k/lib64/libstdc++.so.6
#6  0x0000000000b752b9 in ska_ordered::order_preserving_flat_hash_map<c10::IValue, c10::IValue, c10::detail::DictKeyHash, c10::detail::DictKeyEqualTo, std::allocator<std::pair<c10::IValue, c10::IValue> > >::at (key=...,
    this=<optimized out>) at /local_scratch/tuple:510
#7  c10::Dict<c10::IValue, c10::IValue>::at (this=this@entry=0x7fffffffaae8,
    key=...) at /projappl/lammps/src/ios_base.h:152
#8  0x0000000000b7c65d in LAMMPS_NS::PairAllegro<(Precision)0>::compute (
    this=0x6a5ada0, eflag=<optimized out>, vflag=2)
    at /projappl/lammps/src/stl_uninitialized.h:1144
#9  0x00000000005f571a in LAMMPS_NS::Verlet::setup (this=0x6c38ed0, flag=1)
    at /projappl/lammps/build/atom_vec.h:140
#10 0x000000000058542c in LAMMPS_NS::Run::command (this=0x4008e020, narg=1,
    arg=0xd713ee0)
    at /appl/spack/v017/install-tree/gcc-11.2.0/cuda-11.5.0-mg4ztb/include/crt/basic_string.tcc:171
#11 0x000000000048fbb6 in LAMMPS_NS::Input::execute_command (this=0x6a18420)
    at /projappl/lammps/build/kspace.h:853
#12 0x000000000048fe9e in LAMMPS_NS::Input::file (this=0x6a18420)
    at /projappl/lammps/build/kspace.h:302
#13 0x000000000044fdad in main (
    argc=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>,
    argv=<error reading variable: dwarf2_find_location_expression: Corrupted DWARF expression.>) at /projappl/lammps/src/main.cpp:105
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-189.5.el8_6.x86_64 hwloc-libs-2.2.0-3.el8.x86_64 libibverbs-56mlnx40-1.56103.x86_64 libjpeg-turbo-1.5.3-12.el8.x86_64 libnl3-3.5.0-1.el8.x86_64 libpng-1.6.34-5.el8.x86_64 librdmacm-56mlnx40-1.56103.x86_64 nvidia-driver-cuda-libs-525.85.12-1.el8.x86_64 openssl-libs-1.1.1k-7.el8_6.x86_64 zlib-1.2.11-19.el8_6.x86_64
(gdb) quit
A debugging session is active.

        Inferior 1 [process 3156674] will be killed.

@hzone3898
Author

After unsuccessful debugging, it turns out @anjohan was right and my input .yaml file was just wrong.
I did have stresses in the training data, but I had forgotten to change "ForceOutput" to "StressForceOutput" in the .yaml!
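For reference, this is a one-line change in the nequip training config's model_builders list (a sketch: the surrounding builder names are the nequip 0.5.x defaults and may differ from your config):

```yaml
model_builders:
  - SimpleIrrepsConfig
  - EnergyModel
  - PerSpeciesRescale
  - StressForceOutput   # was ForceOutput; adds virial/stress to the model output
  - RescaleEnergyEtc
```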
