
Segmentation fault #155

Closed
Zhennan-Wu opened this issue Feb 6, 2023 · 9 comments
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

@Zhennan-Wu

Hi

I was trying to run the demo, and it terminated with the following output:

episode ended with reward -24845.0
Segmentation fault

How can I fix the segmentation fault? I am using WSL2 with Ubuntu 20.04; could that be related?

Thank you.

@ataitler
Collaborator

ataitler commented Feb 6, 2023

Hi,
Can you please post the exact code you are running (Wildfire?) and, if possible, the stack trace?
We have not tested with WSL, so it might be related; we have tested on native Linux, Windows and Apple silicon without errors.
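If running under gdb is inconvenient, Python's standard-library `faulthandler` module can dump the Python-level traceback of every thread when the interpreter receives SIGSEGV (enable it with `python -X faulthandler demo.py` or the `PYTHONFAULTHANDLER=1` environment variable). The sketch below is illustrative and not part of pyRDDLGym: `run_with_faulthandler` and `CRASHING_CHILD` are hypothetical names, and a child process that sends itself SIGSEGV stands in for the real demo script. It assumes a POSIX system, where a child killed by a signal reports `returncode == -signum`. Note that faulthandler only shows Python frames, so crashes inside native code (for example a graphics driver) still need gdb to see the native frames.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the crashing demo: a child interpreter that
# sends itself SIGSEGV. In practice you would pass your script path instead.
CRASHING_CHILD = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_with_faulthandler(args):
    """Run a Python child with faulthandler enabled; return (returncode, stderr).

    '-X faulthandler' makes the child print the Python traceback of every
    thread to stderr when it receives a fatal signal such as SIGSEGV.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", *args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_with_faulthandler(["-c", CRASHING_CHILD])
    # On POSIX, a child terminated by a signal reports returncode == -signum.
    if code == -signal.SIGSEGV:
        print("child crashed with SIGSEGV; faulthandler output:")
        print(err)
```

For the demo itself this would be `run_with_faulthandler(["demo.py"])`, with the path adjusted to wherever the script lives.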

@mike-gimelfarb
Collaborator

mike-gimelfarb commented Feb 6, 2023

Hi,
I have also observed this issue recently with WSL. I believe the error can be traced to the visualizer internals (e.g. matplotlib or Pillow), so it is very likely on their end. We will do more tests and let you know if we come up with a solution. In the meantime, a simple workaround may be to run the GymExample without visualization, which should (hopefully) not raise this error. If you still receive the error, please share the code and the stack trace with us as requested above.
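The workaround amounts to never calling the renderer. A minimal sketch, assuming only the interface that the Wildfire demo posted in this thread uses (`reset()`, `step()` returning a 4-tuple, `horizon`, `render()`, and an agent with `sample_action()`); `run_episode` is a hypothetical helper, not a pyRDDLGym function:

```python
def run_episode(env, agent, render=False):
    """Roll out one episode and return the total reward.

    Assumes the interface used in the Wildfire demo: env.reset(),
    env.step(action) -> (next_state, reward, done, info), env.horizon,
    env.render(), and agent.sample_action().
    """
    total_reward = 0.0
    env.reset()
    for _ in range(env.horizon):
        # Off by default: skipping rendering avoided the crash reported on WSL2.
        if render:
            env.render()
        action = agent.sample_action()
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward
```

With the demo's objects this would be `run_episode(myEnv, agent)`, passing `render=True` only on platforms where rendering was reported to work.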

@Zhennan-Wu
Author

Hi,
Yes, running it without rendering gets rid of the error. Thank you for the help.

@mike-gimelfarb
Collaborator

Thanks for your report. We will keep this issue open until we can find a better solution for WSL.

@Zhennan-Wu
Author

I do have a stack trace available; posting it here for reference.

(gdb) run /home/leo/ipc2023/demo.py
Starting program: /home/leo/miniconda3/envs/pyrddlgym/bin/python /home/leo/ipc2023/demo.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff486e700 (LWP 1591)]
[New Thread 0x7ffff206d700 (LWP 1592)]
[New Thread 0x7fffef86c700 (LWP 1593)]
[New Thread 0x7fffed06b700 (LWP 1594)]
[New Thread 0x7fffea86a700 (LWP 1595)]
[New Thread 0x7fffe8069700 (LWP 1596)]
[New Thread 0x7fffe5868700 (LWP 1597)]
[New Thread 0x7fffe3067700 (LWP 1598)]
[New Thread 0x7fffe2866700 (LWP 1599)]
[New Thread 0x7fffde065700 (LWP 1600)]
[New Thread 0x7fffdd864700 (LWP 1601)]
[Thread 0x7fffe2866700 (LWP 1599) exited]
[Thread 0x7fffdd864700 (LWP 1601) exited]
[Thread 0x7fffe3067700 (LWP 1598) exited]
[Thread 0x7fffe8069700 (LWP 1596) exited]
[Thread 0x7fffed06b700 (LWP 1594) exited]
[Thread 0x7fffea86a700 (LWP 1595) exited]
[Thread 0x7fffef86c700 (LWP 1593) exited]
[Thread 0x7ffff486e700 (LWP 1591) exited]
[Thread 0x7fffde065700 (LWP 1600) exited]
[Thread 0x7fffe5868700 (LWP 1597) exited]
[Thread 0x7ffff206d700 (LWP 1592) exited]
[Detaching after fork from child process 1602]
warning: Loadable section ".note.gnu.property" outside of ELF segments
[New Thread 0x7fffdd864700 (LWP 1604)]
[New Thread 0x7fffde065700 (LWP 1605)]
[New Thread 0x7fffe2866700 (LWP 1606)]
[New Thread 0x7fffe3067700 (LWP 1607)]
[New Thread 0x7fffd42e5700 (LWP 1608)]
[New Thread 0x7fffd3ae4700 (LWP 1609)]
[New Thread 0x7fffd32e3700 (LWP 1610)]
[New Thread 0x7fffd2ae2700 (LWP 1611)]
[New Thread 0x7fffd22e1700 (LWP 1612)]
[New Thread 0x7fffd1ae0700 (LWP 1613)]
[New Thread 0x7fffd12df700 (LWP 1614)]
[Thread 0x7fffd42e5700 (LWP 1608) exited]
[Thread 0x7fffd32e3700 (LWP 1610) exited]
[Thread 0x7fffd2ae2700 (LWP 1611) exited]
[Thread 0x7fffd1ae0700 (LWP 1613) exited]
[Thread 0x7fffd12df700 (LWP 1614) exited]
[Thread 0x7fffd22e1700 (LWP 1612) exited]
[Thread 0x7fffd3ae4700 (LWP 1609) exited]
[Thread 0x7fffe3067700 (LWP 1607) exited]
[Thread 0x7fffe2866700 (LWP 1606) exited]
[Thread 0x7fffde065700 (LWP 1605) exited]
[Thread 0x7fffdd864700 (LWP 1604) exited]
[Detaching after fork from child process 1615]
[New Thread 0x7fffd12df700 (LWP 1616)]
[Detaching after vfork from child process 1617]
[New Thread 0x7fffd1ae0700 (LWP 1619)]
[New Thread 0x7fffd22e1700 (LWP 1620)]
[New Thread 0x7fffd2ae2700 (LWP 1621)]
[New Thread 0x7fffd42e5700 (LWP 1622)]
[New Thread 0x7fffd3ae4700 (LWP 1623)]
[New Thread 0x7fffc2d32700 (LWP 1624)]
[New Thread 0x7fffc2531700 (LWP 1625)]
[New Thread 0x7fffc1d30700 (LWP 1626)]
[New Thread 0x7fffc152f700 (LWP 1627)]
[New Thread 0x7fffc0d2e700 (LWP 1628)]
[New Thread 0x7fffbbfff700 (LWP 1629)]
[New Thread 0x7fffbb7fe700 (LWP 1630)]
[New Thread 0x7fffbaffd700 (LWP 1631)]
[New Thread 0x7fffba7fc700 (LWP 1632)]
[New Thread 0x7fffb9ffb700 (LWP 1633)]
[New Thread 0x7fffb97fa700 (LWP 1634)]
episode ended with reward -24310.0
[Thread 0x7fffd12df700 (LWP 1616) exited]
NoneType: None
[Thread 0x7fffbb7fe700 (LWP 1630) exited]
[Thread 0x7fffba7fc700 (LWP 1632) exited]
[Thread 0x7fffbaffd700 (LWP 1631) exited]
[Thread 0x7fffb97fa700 (LWP 1634) exited]
[Thread 0x7fffb9ffb700 (LWP 1633) exited]
[Thread 0x7fffbbfff700 (LWP 1629) exited]
[Thread 0x7fffc0d2e700 (LWP 1628) exited]
[Thread 0x7fffc152f700 (LWP 1627) exited]
[Thread 0x7fffc1d30700 (LWP 1626) exited]
[Thread 0x7fffc2531700 (LWP 1625) exited]
[Thread 0x7fffc2d32700 (LWP 1624) exited]
--Type <RET> for more, q to quit, c to continue without paging--c

Thread 25 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd1ae0700 (LWP 1619)]
0x00007fffc5ac9efa in ?? () from /usr/lib/wsl/drivers/nvamig.inf_amd64_d36b3e14914fc88f/libnvwgf2umx.so
(gdb) backtrace
#0 0x00007fffc5ac9efa in ?? () from /usr/lib/wsl/drivers/nvamig.inf_amd64_d36b3e14914fc88f/libnvwgf2umx.so
#1 0x00007fffc5ac8c5e in ?? () from /usr/lib/wsl/drivers/nvamig.inf_amd64_d36b3e14914fc88f/libnvwgf2umx.so
#2 0x00007fffc5ac8bd6 in ?? () from /usr/lib/wsl/drivers/nvamig.inf_amd64_d36b3e14914fc88f/libnvwgf2umx.so
#3 0x00007ffff7fa3609 in start_thread (arg=) at pthread_create.c:477
#4 0x00007ffff7d6e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
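Reading this backtrace: frame #0, where SIGSEGV was received, is inside `libnvwgf2umx.so` under `/usr/lib/wsl/drivers` (the `nv` prefix suggests an NVIDIA driver component), and the only other named frames are thread start-up (`start_thread`, `clone`), so no Python frame appears on the crashing thread. The hypothetical stdlib helpers below extract that information from gdb `backtrace` text; they are a sketch that only handles the frame formats seen here.

```python
import re

# A gdb frame line starts with '#N', optionally '0xADDR in', then the function name.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)")
# Frames without debug info end in 'from <shared object>'; others end in 'at <file:line>'.
FROM_RE = re.compile(r"\sfrom\s+(\S+)\s*$")
AT_RE = re.compile(r"\sat\s+(\S+)\s*$")


def parse_frames(backtrace):
    """Return (frame number, function, location) for each frame line."""
    frames = []
    for line in backtrace.splitlines():
        line = line.strip()
        match = FRAME_RE.match(line)
        if match is None:
            continue  # e.g. the '(gdb) backtrace' prompt
        loc = FROM_RE.search(line) or AT_RE.search(line)
        frames.append((int(match.group(1)), match.group(2), loc.group(1) if loc else None))
    return frames


def crashing_location(backtrace):
    """Location of frame #0, i.e. where the thread received the signal."""
    frames = parse_frames(backtrace)
    return frames[0][2] if frames else None
```

Applied to the trace above, `crashing_location` returns the path of `libnvwgf2umx.so`.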

The demo I ran is Wildfire; here is the code.

from pyRDDLGym import RDDLEnv
from pyRDDLGym import ExampleManager
from pyRDDLGym.Policies.Agents import RandomAgent

ENV = 'Wildfire'

# get the environment infos
EnvInfo = ExampleManager.GetEnvInfo(ENV)

# set up the environment class, choose instance 0 because every example has at least one example instance
myEnv = RDDLEnv.RDDLEnv(domain=EnvInfo.get_domain(), instance=EnvInfo.get_instance(0))

# set up the environment visualizer
myEnv.set_visualizer(EnvInfo.get_visualizer())

# set up an example agent
agent = RandomAgent(action_space=myEnv.action_space, num_actions=myEnv.numConcurrentActions)

total_reward = 0
state = myEnv.reset()

for step in range(myEnv.horizon):
    # myEnv.render()
    action = agent.sample_action()
    next_state, reward, done, info = myEnv.step(action)
    total_reward += reward
    state = next_state
    if done:
        break

print("episode ended with reward {}".format(total_reward))
myEnv.close()

@mike-gimelfarb
Collaborator

Thanks for the stack trace. Are you by any chance using a dedicated GPU (e.g. NVIDIA) and/or any graphical extension for WSL?

This is almost certainly a graphics/driver error involving matplotlib/PIL and WSL/WSL2.
We will look into this and get back to you if we find a solution.
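Until a fix is found, a script could turn rendering off automatically on WSL. A minimal heuristic sketch, assuming that WSL kernels include `microsoft` in their release string (for example `5.15.90.1-microsoft-standard-WSL2` on WSL2 or `4.4.0-19041-Microsoft` on WSL1); `is_wsl` and `rendering_enabled` are hypothetical helpers, and the check says nothing about whether a particular GPU driver is actually affected.

```python
import platform


def is_wsl(release=None):
    """Heuristic WSL check based on the kernel release string.

    WSL kernels report releases containing 'microsoft' (case varies);
    native Linux kernels normally do not.
    """
    if release is None:
        release = platform.uname().release
    return "microsoft" in release.lower()


def rendering_enabled(requested=True, release=None):
    """Hypothetical policy: honour a render request except under WSL."""
    return requested and not is_wsl(release)
```

In the demo loop this would mean calling `myEnv.render()` only when `rendering_enabled()` returns `True`.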

@Zhennan-Wu
Author

I do have an NVIDIA GPU installed, but I am not sure about WSL. I printed out the device info; the results are below:

lspci
3ed2:00:00.0 3D controller: Microsoft Corporation Basic Render Driver
64a1:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
686a:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
9f09:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio filesystem (rev 01)
a96b:00:00.0 System peripheral: Red Hat, Inc. Virtio file system (rev 01)
cae7:00:00.0 SCSI storage controller: Red Hat, Inc. Virtio console (rev 01)
e7c8:00:00.0 3D controller: Microsoft Corporation Basic Render Driver

@mike-gimelfarb
Collaborator

Hi,
Thanks for providing this info.

We've investigated this and were able to reproduce the problem in WSL2 using a Hyper-V virtual machine.
It is likely related to a well-known problem between pygame and WSL2. You can try the solutions posted in this issue:

PyGame issue 3260

You can try updating drivers as suggested there, or (ideally) use a different VM (e.g. VMware) if you need visualization on Linux.
Since we cannot provide a definitive solution yet, we will leave this issue open for now.

@Zhennan-Wu
Author

Thank you!

mike-gimelfarb added the bug (Something isn't working) label on Feb 24, 2024
mike-gimelfarb pinned this issue on Feb 24, 2024
mike-gimelfarb added the wontfix (This will not be worked on) label on Feb 24, 2024
3 participants