New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

some updates to the GPU package #926

Merged
merged 7 commits into from Jun 13, 2018

Conversation

Projects
None yet
6 participants
@ndtrung81
Copy link
Contributor

ndtrung81 commented May 22, 2018

Purpose

  • Fixed a bug with MPI_Allgather when used for node_names dynamically allocated in lal_device.cpp
  • Added pair lj/expand/coul/long to USER-MISC and its /gpu version
  • Added pair lj/cut/dipole/long/gpu
  • Added example Makefile's (Makefile.linux_multi) for lib/gpu to build libgpu.a with multiple architectures

Author(s)

Trung Nguyen (Northwestern)

Backward Compatibility

The changes should not affect other parts of LAMMPS.

Implementation Notes

The new pair styles and their gpu version were tested by minimal input decks derived from examples/dipole.

Post Submission Checklist

Please check the fields below as they are completed

  • The feature or features in this pull request is complete
  • Suitable new documentation files and/or updates to the existing docs are included
  • One or more example input decks are included
  • The source code follows the LAMMPS formatting guidelines

Further Information, Files, and Links

Fixes #939

ndtrung81 added some commits May 20, 2018

Added pair styles lj/expand/coul/long to USER-MISC and its gpu versio…
…n to GPU package, added lj/cut/dipole/long/gpu; and added an example Makfile to build GPU package for multi-arch
@junghans

This comment has been minimized.

Copy link
Member

junghans commented May 22, 2018

@rbberger can you test this in CMake?

@ndtrung81 ndtrung81 self-assigned this May 22, 2018

@rbberger

This comment has been minimized.

Copy link
Member

rbberger commented May 23, 2018

@ndtrung81 I'm adding a small fix to the CMake build of this. We had this bug for some time.

@ndtrung81

This comment has been minimized.

Copy link
Contributor

ndtrung81 commented May 23, 2018

@rbberger Thanks! I didn't notice that.

$(OBJ_DIR)/cudpp_plan.o: cudpp_mini/cudpp_plan.cpp
$(CUDR) -o $@ -c cudpp_mini/cudpp_plan.cpp -Icudpp_mini

$(OBJ_DIR)/cudpp_maximal_launch.o: cudpp_mini/cudpp_maximal_launch.cpp

This comment has been minimized.

@junghans

junghans May 23, 2018

Member

Can we use a pattern rule?

@rbberger

This comment has been minimized.

Copy link
Member

rbberger commented May 23, 2018

@junghans we also have other problems with CMake. I tested yesterday on Ubuntu, CUDA 9 and Volta and $<$<>> doesn't expand properly, causes an NVCC compilation error and even after manually fixing it the final binary segfaults. Doesn't seem to be related to this PR. I'm looking into it.

@rbberger

This comment has been minimized.

Copy link
Member

rbberger commented May 24, 2018

ok, correction. The segfault disappears with this PR. It was the MPI_Allgather bug. I also can't reproduce the CMake issue on Arch with CUDA 9.1 and the TITAN X, so it must be something about the Ubuntu setup.

@rbberger

This comment has been minimized.

Copy link
Member

rbberger commented May 24, 2018

@junghans looks like this is only an issue with cmake 3.5.1 on Ubuntu 16.04 LTS. Installing cmake 3.11.2 fixes the issue. It wasn't expanding the conditional compilation options passed to cuda_compile and even puts some other options in quotes, which then leads to a subsequent error due to a second source file.

-- Generating dependency file: /home/rberger/GitHub/lammps/build/CMakeFiles/cuda_compile.dir/__/lib/gpu/cudpp_mini/cuda_compile_generated_scan_app.cu.o.NVCC-depend
/data/opt/tools/cuda-9.0.176/bin/nvcc -M -D__CUDACC__ /home/rberger/GitHub/lammps/cmake/../lib/gpu/cudpp_mini/scan_app.cu -o /home/rberger/GitHub/lammps/build/CMakeFiles/cuda_compile.dir/__/lib/gpu/cudpp_mini/cuda_compile_generated_scan_app.cu.o.NVCC-depend -ccbin /usr/bin/cc -m64 -DLAMMPS_SMALLBIG -DLAMMPS_MEMALIGN=64 -DLAMMPS_GZIP -Xcompiler ,\"-fopenmp\",\"-O3\",\"-DNDEBUG\" $<$<BOOL:OFF>:-Xcompiler=-fPIC> -DUNIX -O3 -Xptxas -v --use_fast_math -DUCL_CUDADR -arch=sm_70 -D_SINGLE_DOUBLE -DNVCC -I/home/rberger/GitHub/lammps/cmake/../lib/gpu -I/home/rberger/GitHub/lammps/build/lib/gpu -I/home/rberger/GitHub/lammps/cmake/../lib/gpu/cudpp_mini -I/data/opt/tools/cuda-9.0.176/include -I/home/rberger/GitHub/lammps/cmake/../src/STUBS -I/home/rberger/GitHub/lammps/cmake/../src/DIPOLE
nvcc fatal   : A single input file is required for a non-link phase when an outputfile is specified
CMake Error at cuda_compile_generated_scan_app.cu.o.cmake:207 (message):
  Error generating
  /home/rberger/GitHub/lammps/build/CMakeFiles/cuda_compile.dir/__/lib/gpu/cudpp_mini/./cuda_compile_generated_scan_app.cu.o

@ndtrung81 ndtrung81 assigned akohlmey and unassigned ndtrung81 May 30, 2018

@@ -80,7 +80,7 @@ int DeviceT::init_device(MPI_Comm world, MPI_Comm replica, const int first_gpu,
char node_name[MPI_MAX_PROCESSOR_NAME];
char *node_names = new char[MPI_MAX_PROCESSOR_NAME*_world_size];
MPI_Get_processor_name(node_name,&name_length);
MPI_Allgather(&node_name,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,&node_names,
MPI_Allgather(&node_name,MPI_MAX_PROCESSOR_NAME,MPI_CHAR,&node_names[0],

This comment has been minimized.

@ndtrung81

ndtrung81 Jun 1, 2018

Contributor

@stanmoore1 I think the segfault reported in #939 may be resolved with this change in lal_device.cpp. Can you test this for me? Thanks!

This comment has been minimized.

@stanmoore1

stanmoore1 Jun 1, 2018

Contributor

Yes, confirmed. Thanks

This comment has been minimized.

@akohlmey

akohlmey Jun 1, 2018

Member

That one was introduced by me when we removed the non-standard variable length arrays. Sorry about that.

@lammps lammps deleted a comment from stanmoore1 Jun 1, 2018

@akohlmey
Copy link
Member

akohlmey left a comment

@ndtrung81 can you please apply some cosmetic changes for consistency with the rest of LAMMPS that we also applied to the GPU library:

#include <math.h> becomes #include <cmath>
#include <stdio.h> becomes #include <cstdio>
#include <string.h> becomes #include <cstring>

and a few more. please note, that some of these changes are c++-11 only (e.g. #include <cinttypes>) and that would make the GPU package require c++-11, which i would prefer to avoid.

@ndtrung81

This comment has been minimized.

Copy link
Contributor

ndtrung81 commented Jun 2, 2018

@akohlmey I have made the changes to the source files in lib/gpu and src/GPU.

@rbberger rbberger added the cmake label Jun 4, 2018

@stanmoore1

This comment has been minimized.

Copy link
Contributor

stanmoore1 commented Jun 8, 2018

@ndtrung81 can you also update the header files for pair lj/expand/coul/long in USER-MISC? Thanks
@akohlmey I don't see where <cinttypes> is included

@ndtrung81

This comment has been minimized.

Copy link
Contributor

ndtrung81 commented Jun 9, 2018

@stanmoore1 sure, just did in commit 3c781af

@stanmoore1

This comment has been minimized.

Copy link
Contributor

stanmoore1 commented Jun 11, 2018

@akohlmey the only place I see <cinttypes> used is in Kokkos:

grep -I -r cinttypes *
lib/kokkos/core/src/impl/Kokkos_Profiling_Interface.hpp:#include <cinttypes>
lib/kokkos/core/unit_test/TestTeamVector.hpp:#include <cinttypes>
src/dump_dcd.cpp:#include <inttypes.h> // <cinttypes> requires C++-11
src/lmptype.h:#include <inttypes.h> // <cinttypes> requires C++-11

So I'm assigning this to @sjplimp to be merged.

@sjplimp sjplimp merged commit 6618481 into lammps:master Jun 13, 2018

4 checks passed

lammps/pull-requests/build-docs-pr head run ended
Details
lammps/pull-requests/kokkos_omp head run ended
Details
lammps/pull-requests/openmpi-pr head run ended
Details
lammps/pull-requests/serial-pr head run ended
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment