Warp Drive PyCuda Error #28

Closed
mhelabd opened this issue Feb 17, 2022 · 2 comments

mhelabd commented Feb 17, 2022

I am currently running a training script using warp-drive.

I have my environment initialized in this dockerfile.

When running my training_script, I get the following error:

python training_script.py --env simple_wood_and_stone

Inside training_script.py: 1 GPUs are available.
Inside env_wrapper.py: 1 GPUs are available.
/home/miniconda/lib/python3.7/site-packages/torch/cuda/__init__.py:120: UserWarning:
    Found GPU%d %s which is of cuda capability %d.%d.
    PyTorch no longer supports this GPU because it is too old.
    The minimum cuda capability supported by this library is %d.%d.

  warnings.warn(old_gpu_warn.format(d, name, major, minor, min_arch // 10, min_arch % 10))
Initializing the CUDA data manager...
Initializing the CUDA function manager...
WARNING:root:the destination header file /home/miniconda/lib/python3.7/site-packages/warp_drive/cuda_includes/env_config.h already exists; remove and rebuild.
WARNING:root:the destination runner file /home/miniconda/lib/python3.7/site-packages/warp_drive/cuda_includes/env_runner.cu already exists; remove and rebuild.
Traceback (most recent call last):
  File "training_script.py", line 109, in <module>
    customized_env_registrar=env_registry,
  File "/home/miniconda/lib/python3.7/site-packages/ai_economist/foundation/env_wrapper.py", line 208, in __init__
    self.cuda_function_manager.initialize_functions([step_function])
  File "/home/miniconda/lib/python3.7/site-packages/warp_drive/managers/function_manager.py", line 330, in initialize_functions
    self._cuda_functions[fname] = self._CUDA_module.get_function(fname)
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found

I was wondering if anyone has run into this before or has any idea how to fix it.

Collaborator

Emerald01 commented Feb 18, 2022

I think your runtime environment looks fine. The error comes from the lines below: WarpDrive cannot find the CUDA source (.cu) for your environment's step() function, so it cannot initialize your step function.

self._cuda_functions[fname] = self._CUDA_module.get_function(fname)
pycuda._driver.LogicError: cuModuleGetFunction failed: named symbol not found

In the code, this happens in the Foundation wrapper, in the snippet that follows. You need a step kernel function named f"Cuda{self.name}Step" in a .cu source file, and that file must be registered with the environment registrar (customized_env_registrar) under its absolute path.

self.cuda_function_manager.compile_and_load_cuda(
    env_name=self.name,
    template_header_file="template_env_config.h",
    template_runner_file="template_env_runner.cu",
    customized_env_registrar=customized_env_registrar,
)
print("initialize_functions...")

step_function = f"Cuda{self.name}Step"
self.cuda_function_manager.initialize_functions([step_function])
self.env.cuda_step = self.cuda_function_manager.get_function(step_function)
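
As a quick sanity check before compiling, one could scan the .cu source for the kernel name the wrapper will ask for. The sketch below is only illustrative: the helper names, the environment name "MyEnv", and the sample CUDA source are all hypothetical and not part of WarpDrive or ai-economist. Only the naming pattern f"Cuda{name}Step" comes from the wrapper code above.

```python
import re


def expected_step_kernel(env_name: str) -> str:
    """Build the kernel name using the pattern from the Foundation wrapper."""
    return f"Cuda{env_name}Step"


def defines_kernel(cuda_source: str, kernel_name: str) -> bool:
    """Return True if the source text declares a __global__ kernel with this name.

    This is a plain text search, not a real CUDA parser (hypothetical helper).
    """
    pattern = r"__global__\s+void\s+" + re.escape(kernel_name) + r"\s*\("
    return re.search(pattern, cuda_source) is not None


# Hypothetical CUDA source for an environment named "MyEnv".
sample_source = """
extern "C" {
  __global__ void CudaMyEnvStep(float *state) {}
}
"""

has_my_env = defines_kernel(sample_source, expected_step_kernel("MyEnv"))
has_other_env = defines_kernel(sample_source, expected_step_kernel("OtherEnv"))
print(has_my_env, has_other_env)
```

If the check fails for your environment name, get_function() has no matching symbol to look up in the compiled module, which is consistent with the "named symbol not found" error.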

Please let me know if you run into any problems; I am more than happy to help.

Contributor

sunil-s commented Feb 19, 2022

Thanks for your question, @mhelabd.
Adding to @Emerald01's response:
To run the simple-wood-and-stone environment with WarpDrive, you would first need to create a CUDA version of the environment. To get started, please see our tutorial: https://github.com/salesforce/ai-economist/blob/master/tutorials/multi_agent_gpu_training_with_warp_drive.ipynb. It shows how to build and train your environment end-to-end with WarpDrive, and also points out nuances such as how to name your GPU kernels.

Also, your current training script points to "../foundation/scenarios/covid19/covid19_build.cu", which only contains the paths to the source files for the covid and economy environment, not for simple_wood_and_stone.
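
To see which sources a build file actually pulls in, a small script along these lines may help. It is a rough sketch that assumes the build file lists its sources as quoted #include directives. The helper names and the sample contents are hypothetical and were not copied from the repository.

```python
import re


def included_files(build_source: str) -> list:
    """Collect the quoted paths of all #include directives in the text."""
    return re.findall(r'^\s*#include\s+"([^"]+)"', build_source, flags=re.MULTILINE)


def mentions_env(build_source: str, keyword: str) -> bool:
    """Return True if any included path contains the keyword (hypothetical helper)."""
    return any(keyword in path for path in included_files(build_source))


# Illustrative build-file contents (assumed layout, not the real file).
sample_build = """
#include "components_step.cu"
#include "covid19_env_step.cu"
"""

paths = included_files(sample_build)
print(paths)
print(mentions_env(sample_build, "covid19"), mentions_env(sample_build, "wood"))
```

In practice you would read the real build file from disk and pass its text to included_files(). If none of the listed paths belongs to your environment, its step kernel will not be part of the compiled module.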

In fact, we do not yet have a CUDA C version of the wood-and-stone environment that can run on a GPU with WarpDrive. If you would like to contribute to that environment, we would love to add it to the repository.
Happy to answer any other questions. Thanks.
