
PyMemoryView error and multi CUDA flag errors #404

Closed

trontrytel opened this issue Jun 29, 2020 · 14 comments

@trontrytel
Collaborator

I was trying to compile the current master version of libcloudph++ in the singularity image. I'm running into two problems:

  1. The -DLIBCLOUDPHXX_FORCE_MULTI_CUDA=1 flag makes the compilation hang.

  2. Without that flag I get the following error:

/libcloudphxx/bindings/python/lgrngn.hpp:53:63: error: 'PyMemoryView_FromMemory' was not declared in this scope
         return bp::object(bp::handle<>(PyMemoryView_FromMemory(
                                        ~~~~~~~~~~~~~~~~~~~~~~~^
           reinterpret_cast<char *>(arg->outbuf()),
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~             
           sizeof(real_t)
           ~~~~~~~~~~~~~~                                       
           * std::max(1, arg->opts_init->nx)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                    
           * std::max(1, arg->opts_init->ny)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                    
           * std::max(1, arg->opts_init->nz),
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                   
           PyBUF_READ
           ~~~~~~~~~~                                           
         ))); // TODO: this assumes Python 2 -> make it compatible with P3 or require P2 in CMake
         ~                                                      
libcloudphxx/bindings/python/lgrngn.hpp:53:63: note: suggested alternative: 'PyMemoryView_FromBuffer'
         return bp::object(bp::handle<>(PyMemoryView_FromMemory(
                                        ~~~~~~~~~~~~~~~~~~~~~~~^
                                        PyMemoryView_FromBuffer

@pdziekan
Contributor

I think that the issue is linked to the retirement of Python 2.
Python bindings now require Python 3.
I'm surprised that CMake didn't complain about the Python version.

To fix the issue, try building a singularity image that has Python 3.
An updated recipe is in the UWLCM repo.
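
For reference, PyMemoryView_FromMemory exists only in the Python 3 C API, which is why the build fails when CMake picks up Python 2.7 headers. A version guard along these lines would compile against either major version; this is a sketch only (the helper name wrap_outbuf and the len parameter are made up for illustration, and it is not the fix that was adopted, since the bindings now simply require Python 3):

  // Sketch only: wraps a raw output buffer of `len` bytes as a read-only
  // Python memoryview on either Python major version.
  #include <Python.h>
  #include <boost/python.hpp>

  namespace bp = boost::python;

  inline bp::object wrap_outbuf(char *buf, Py_ssize_t len)  // hypothetical helper
  {
  #if PY_MAJOR_VERSION >= 3
    // Python >= 3.3 provides PyMemoryView_FromMemory
    return bp::object(bp::handle<>(PyMemoryView_FromMemory(buf, len, PyBUF_READ)));
  #else
    // Python 2.7: fill a Py_buffer and use PyMemoryView_FromBuffer,
    // which is what the compiler note suggests
    Py_buffer pybuf;
    if (PyBuffer_FillInfo(&pybuf, NULL, buf, len, 1 /* readonly */, PyBUF_CONTIG_RO) != 0)
      bp::throw_error_already_set();
    return bp::object(bp::handle<>(PyMemoryView_FromBuffer(&pybuf)));
  #endif
  }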

@trontrytel
Collaborator Author

Hi @pdziekan. Thank you for taking a look. The singularity image mostly works for me. CMake still detects Python 2.7 by default, but invoking it with something like cmake -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/include/python3.6 .. pointed it in the right direction.

I'm actually trying to run the parcel model, but I'm running into issues with MPI and the Python bindings: igfuw/parcel#81. Could you take a look there as well?

@trontrytel
Collaborator Author

Actually, I'm getting the same error when I run the unit tests for libcloudphxx inside the singularity image: RuntimeError: The Python bindings of libcloudph++ Lagrangian microphysics can't be used in MPI runs.

@claresinger
Contributor

I just stumbled into this problem with the Python bindings while running the tests for libcloudph++ inside my singularity image as well.

@trontrytel
Collaborator Author

@claresinger I think the issue is more with our cluster setup, which runs MPI silently even if you don't ask for it. To get rid of that you should run unset PMI_RANK in the terminal.

That clears the environment variable that libcloudph++ uses to check whether the simulation is being run with MPI.
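
For context, that check is just an environment-variable lookup; roughly something like the following, shown as an illustration only (this is not libcloudph++'s actual detection code):

  // Illustration only -- not libcloudph++'s actual detection code.
  // MPI launchers export rank variables (e.g. PMI_RANK from MPICH/Slurm PMI,
  // OMPI_COMM_WORLD_RANK from Open MPI); if one is set, assume an MPI run.
  #include <cstdlib>

  bool launched_under_mpi()
  {
    return std::getenv("PMI_RANK") != nullptr
        || std::getenv("OMPI_COMM_WORLD_RANK") != nullptr;
  }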

@claresinger
Contributor

@trontrytel FYI, libcloudph++ is hanging again during compilation on central... I noticed that the singularity scripts got updated in UWLCM, so I will try to remake that image and see if that works.

@trontrytel
Collaborator Author

Sounds good! Let me know if it doesn't work.

@pdziekan
Contributor

pdziekan commented Sep 4, 2020

@claresinger does the compilation hang without any output?

@trontrytel could you check whether, using this branch of libcloudph++ (https://github.com/pdziekan/libcloudphxx/tree/mpi_detection), you still need to unset the PMI_RANK env var?

Also, I think that libcloudph++ is detecting Python 2.7 because of some environment variables set on your cluster.
Did you try running a singularity shell in a clean environment?
You can do this with:
env -i singularity shell --nv sng_ubuntu_18_04_cuda_10_0.sif

@claresinger
Contributor

claresinger commented Sep 4, 2020

@pdziekan yes, it starts compiling and then hangs at this step with no further output for 30+ minutes.

Singularity sng_ubuntu_18_04_cuda_10_0.sif:~/microphys/libcloudphxx/build> make -j64
Scanning dependencies of target git_revision.h
Scanning dependencies of target cloudphxx_lgrngn
[  0%] Built target git_revision.h
[  4%] Building CXX object CMakeFiles/cloudphxx_lgrngn.dir/src/lib.cpp.o
[ 19%] Building CUDA object CMakeFiles/cloudphxx_lgrngn.dir/src/lib_multicuda.cu.o
[ 19%] Building CXX object CMakeFiles/cloudphxx_lgrngn.dir/src/lib_omp.cpp.o
[ 19%] Building CUDA object CMakeFiles/cloudphxx_lgrngn.dir/src/lib_cuda.cu.o
[ 23%] Building CXX object CMakeFiles/cloudphxx_lgrngn.dir/src/lib_cpp.cpp.o

Here is the output from cmake. Should it say -- Detecting if the compiler is an MPI wrapper... - FALSE?

Singularity sng_ubuntu_18_04_cuda_10_0.sif:~/microphys> cd libcloudphxx/build/
Singularity sng_ubuntu_18_04_cuda_10_0.sif:~/microphys/libcloudphxx/build> rm -rf *
Singularity sng_ubuntu_18_04_cuda_10_0.sif:~/microphys/libcloudphxx/build> unset PMI_RANK
Singularity sng_ubuntu_18_04_cuda_10_0.sif:~/microphys/libcloudphxx/build> cmake -DPYTHON_EXECUTABLE:FILEPATH=/usr/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/include/python3.6 -DLIBCLOUDPHXX_FORCE_MULTI_CUDA=1 ..
-- The CXX compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.0.130
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
--  OpenMP found
-- Detecting if the compiler is an MPI wrapper...
-- Detecting if the compiler is an MPI wrapper... - FALSE
-- Trying to obtain CUDA capability of local hardware...
-- Detected more than 1 GPU or LIBCLOUDPHXX_FORCE_MULTI_CUDA set, the multi_CUDA backend will be built.
-- CUDA capability: 60
-- Found Thrust: /usr/local/include (found version "1.9.910") 
-- Boost version: 1.65.1
-- Testing if Boost ODEINT version >= 1.58
-- Found PythonInterp: /usr/bin/python3 (found suitable version "3.6.9", minimum required is "3") 
-- Found PythonLibs: /usr/lib/x86_64-linux-gnu/libpython3.6m.so (found version "3.6.9") 
-- boost numpy as numpy3
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   numpy3
--   python3
-- Performing Test BLITZ_FOUND
-- Performing Test BLITZ_FOUND - Success
-- Configuring done
-- Generating done
-- Build files have been written to: /home/csinger/microphys/libcloudphxx/build

@trontrytel once the compilation works with PMI_RANK unset, I'm happy to check this new branch to see if it means we don't have to unset it at all!

@pdziekan
Contributor

pdziekan commented Sep 8, 2020

@claresinger compilation may hang because you are running out of RAM.
Try assigning more memory or running compilation with a single thread only (make -j1).

-- Detecting if the compiler is an MPI wrapper... - FALSE
is fine.
If you wanted to run a simulation on a distributed memory system, you would need to compile the library with an MPI compiler.

@claresinger
Contributor

Thanks @pdziekan, using make -j1 works!

@claresinger
Contributor

I tested this branch (https://github.com/pdziekan/libcloudphxx/tree/mpi_detection) and it works without having to unset PMI_RANK. Thanks @pdziekan! I see you already have PR #406 open to merge this fix.

@pdziekan
Contributor

@claresinger great! Can I close this issue, or is there something else to resolve?

@claresinger
Contributor

@pdziekan Yes, you can close this issue now.
