Segfault libxc.so #188
The following things can be helpful to identify the issue:
import pyscf
from pyscf.dft import rks
atom ='''
O 0.0000000000 -0.0000000000 0.1174000000
H -0.7570000000 -0.0000000000 -0.4696000000
H 0.7570000000 0.0000000000 -0.4696000000
'''
mol = pyscf.M(atom=atom, basis='def2-tzvpp')
mf = rks.RKS(mol, xc='LDA').density_fit()
e_dft = mf.kernel() # compute total energy
Thanks for the quick response.
/var/log/messages:
Thanks,
@markperri Thanks for the info. I tried to create a similar environment, but I was not able to reproduce the issue. If possible, could you please share your Dockerfile? And you have probably tried this already, but sometimes it helps to reinstall or create a fresh conda environment to rule out package conflicts.
@wxj6000 Here is a minimal Dockerfile that gives the same error. I wonder if there's something about the way this system is set up. I'll see if I can find another CUDA application tomorrow to test the installation in general.
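The actual Dockerfile is not shown in the thread; a hypothetical reconstruction of the kind of minimal file described, based on the environment from the original report (a Jupyter base image, Ubuntu 22.04, the pip line from the issue; the image name and tag are assumptions):

```dockerfile
# Hypothetical minimal reproduction, not the reporter's actual file.
# Base image is an assumption: any Jupyter Notebook image on Ubuntu 22.04
# with Python 3.11 matches the description in the issue.
FROM jupyter/base-notebook:latest
RUN pip install pyscf gpu4pyscf-cuda12x cutensor-cu12
```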
/var/log/messages:
@wxj6000 I ran a NAMD container from NVIDIA NGC and it runs fine on the GPU, so at least we know the docker / GPU setup is working. I'm not sure what else to test.
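A lighter-weight sanity check than a full NAMD container is to run `nvidia-smi` from a CUDA base image (the exact image tag here is an assumption; any CUDA base image that ships `nvidia-smi` works):

```shell
# Verify that containers can see the GPU at all. Requires the NVIDIA
# Container Toolkit on the host; fails fast if GPU passthrough is broken.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```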
@markperri I tried the docker file you provided. The docker container works fine on my side. Let me check if there is a memory leak in the modules.
@wxj6000 I compiled gpu4pyscf from source and it still gives the same error. I'll contact the Jetstream2 staff and see if they have any ideas. Thanks,
@markperri I went through the libxc-related code and improved its memory-allocation interface. But I am not sure whether it helps on your side.
Thanks for trying. I compiled from source with 8fdfaa8, but I get the same segfault:
@markperri Can you check whether this PR resolves the issue, please? #180
Thanks, is that the libxc_overhead branch? I installed it, but it doesn't seem to help:
Right, it is the libxc_overhead branch. And I registered an account on ChemCompute, but I don't have access to JupyterHub since I no longer have an academic email. Is there any chance of getting a development environment for debugging?
Yes, this is without any gpu4pyscf installed.
Oh and @wxj6000 you should have Jupyter Notebook access now. |
@markperri Thank you for giving me permission for debugging. It seems that unified memory, which is required by libxc.so, is disabled on this device. We can switch to libxc on the CPU if unified memory is not supported on the device. We will let you know the progress.
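The unified-memory requirement can be checked directly from Python. A minimal diagnostic sketch, not from the thread, assuming CuPy is available (gpu4pyscf depends on it); on a machine without a GPU it just reports the failure instead of crashing:

```python
# Query whether the current CUDA device supports unified (managed)
# memory. On vGPU slices this attribute is often 0, which would explain
# a segfault in a library that assumes managed allocations succeed.
def managed_memory_supported():
    try:
        import cupy
        attrs = cupy.cuda.Device(0).attributes
        return bool(attrs.get("ManagedMemory", 0))
    except Exception as exc:  # no GPU or no CuPy in this environment
        print("Could not query device:", exc)
        return None

print("Unified memory supported:", managed_memory_supported())
```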
Oh I see. The way their hypervisor works with vGPUs doesn't allow unified memory. Looks like this package won't be compatible with their system then.
@markperri The issue has been fixed in v1.0.1. Most tasks can be executed on ChemCompute now. But, due to the limited memory of a GPU slice, some tasks such as Hessian calculations may raise an out-of-memory error. Thank you for your feedback and your cooperation!
Thanks! It works great now. I increased the instance size to use the entire GPU and the out-of-memory problems are fixed. But I had to install it from GitHub; there is something wrong with the package on PyPI. pip keeps downloading older versions of gpu4pyscf and then errors out.
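After switching to a GitHub install, one way to confirm which builds actually ended up in the environment (the package names come from the pip line in the original report; the check itself is a hypothetical diagnostic):

```python
# Report installed versions of the packages from the original
# `pip install` line; None means the package is absent here.
from importlib import metadata

def installed_versions(pkgs=("pyscf", "gpu4pyscf-cuda12x", "cutensor-cu12")):
    versions = {}
    for pkg in pkgs:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return versions

print(installed_versions())
```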
@markperri
Oh yes, sorry. Forgot that part! |
I installed pyscf into my environment in a Jupyter Notebook docker container running Ubuntu 22.04 and Python 3.11:
pip install pyscf gpu4pyscf-cuda12x cutensor-cu12
When I test with the given example, I get a segfault:
Segmentation fault
It looks like I have two libxc.so:
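A quick way to enumerate the copies (a diagnostic sketch, not from the thread). Two copies are not a problem by themselves: pyscf bundles a CPU libxc and gpu4pyscf bundles a GPU build, so both are expected to show up.

```python
# List every libxc shared object visible in site-packages, to see
# which bundled copies (pyscf CPU build, gpu4pyscf GPU build) exist.
import pathlib
import site

def find_libxc():
    hits = []
    for root in site.getsitepackages():
        hits.extend(str(p) for p in pathlib.Path(root).rglob("*libxc*"))
    return hits

for path in find_libxc():
    print(path)
```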
Do you have any thoughts on how to fix the segfault?