Simulation models compiled with -shared abort upon failure when loaded from Python on Linux #1623

Open
umarcor opened this issue Jan 27, 2021 · 6 comments

Comments

@umarcor
Member

umarcor commented Jan 27, 2021

Description

This is a follow-up of #803 and ghdl/ghdl-cosim#15.

After the fixes in #803, all examples work on MSYS2 (MINGW64 with LLVM backend). Shared libraries produced by GHDL can be dynamically loaded from either C or Python, and ghdl_main returns cleanly regardless of simulation failures and standard version.

However, on Linux, the Python examples abort (SIGABRT). More precisely, they abort when some Python program calls another script which loads GHDL-generated shared libraries from Python. Surprisingly, if a bash script calls the Python script directly, the abort does not occur.

Expected behaviour

Examples on Linux should work as on MSYS2.

How to reproduce?

See subdir test-abort from branch test/abort of my fork of setup-ghdl-ci: https://github.com/umarcor/setup-ghdl-ci/tree/test/abort/test-abort

See logs: https://github.com/umarcor/setup-ghdl-ci/actions/runs/514033629. Compare the outputs of steps "Test Abort" and "Pytest", on Linux and Windows.

With pytest (step "Pytest"):

...
> [08] Build tb-fail.so

> [08] C load and run tb-fail.so
Call entry
Hello entry!
tb.vhd:15:5:@0ms:(report failure): Hello wrapping/exitcb [fail]!
ghdl:error: report failed
in process .tb(fail).P0
ghdl:error: simulation failed
Bye entry <1>!
Return from entry: 1
This is the exit handler.

> [08] Python load and run tb-fail.so
Hello entry!
tb.vhd:15:5:@0ms:(report failure): Hello wrapping/exitcb [fail]!
ghdl:error: report failed
in process .tb(fail).P0
SIGABRT caught 6!
Aborted (core dumped)
FAILED

Without pytest (step "Test Abort"):

> [08] Build tb-fail.so

> [08] C load and run tb-fail.so
Call entry
Hello entry!
tb.vhd:15:5:@0ms:(report failure): Hello wrapping/exitcb [fail]!
ghdl:error: report failed
in process .tb(fail).P0
ghdl:error: simulation failed
Bye entry <1>!
Return from entry: 1
This is the exit handler.

> [08] Python load and run tb-fail.so
PY RUN ENTER
PY RUN EXIT <1>
Hello entry!
tb.vhd:15:5:@0ms:(report failure): Hello wrapping/exitcb [fail]!
ghdl:error: report failed
in process .tb(fail).P0
ghdl:error: simulation failed
Bye entry <1>!
This is the exit handler.

Context

The MWE uses nightly packages.

I tested it with Python 3.8 and 3.9 on CI.

Additional context

This is an MWE of an issue found in ghdl-cosim. See the tests marked as XFAIL on jobs "🛳️ ghdl/cosim:matplotlib" and "🐧Ubuntu · nightly LLVM": https://github.com/ghdl/ghdl-cosim/actions/runs/513879907

When executed inside the container, test.py::TestExamples::test_vhpidirect_shared_py is successful. Surprisingly, in test.py::TestExamples::test_vhpidirect_shared_py_vunit the example works with VHDL 2008, but it aborts with VHDL 1993.

When executed natively, test.py::TestExamples::test_vhpidirect_shared_py aborts. test.py::TestExamples::test_vhpidirect_shared_py_vunit produces the same result as inside the container (08 works, 93 aborts).

@umarcor
Member Author

umarcor commented Jan 27, 2021

This is the backtrace obtained in gdb when running the crashing VUnit example in a local container:

Starting program: /usr/bin/python3 cosim.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
PY RUN ENTER
Hello entry!
               0 fs - default              -    INFO - Hello shared/py/vunit!
/usr/local/lib/python3.7/dist-packages/vunit/vhdl/core/src/stop_body_93-2002.vhd:10:5:@0ms:(report failure): Stopping simulation with status 0
/src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb:error: report failed
in process .tb_vunit(tb).main

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f576745b535 in __GI_abort () at abort.c:79
#2  0x00007f5767011f59 in __gnat_last_chance_handler ()
   from /src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb
#3  0x00007f5767037c02 in grt.backtraces.put_err_backtrace ()
   from /src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb
#4  0x00007f5767038fe8 in grt.errors_exec.error_e_call_stack ()
   from /src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb
#5  0x00007f576700e6e7 in grt.lib.do_report ()
   from /src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb
#6  0x00007f576700eb1e in __ghdl_report ()
   from /src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb
#7  0x00007f5766fc5a51 in vunit_lib__stop_pkg__stop ()
    at /usr/local/lib/python3.7/dist-packages/vunit/vhdl/core/src/stop_body_93-2002.vhd:10
#8  0x00007f5766fc5f19 in vunit_lib__core_pkg__stop ()
    at /usr/local/lib/python3.7/dist-packages/vunit/vhdl/core/src/core_pkg.vhd:130
#9  0x00007f5766feb331 in vunit_lib__run_pkg__test_runner_cleanup ()
    at /usr/local/lib/python3.7/dist-packages/vunit/vhdl/run/src/run.vhd:133
#10 0x00007f5766fee62c in lib__tb_vunit__ARCH__tb__main__PROC ()

So, the abort comes from the abort (); call in __gnat_last_chance_handler, after an error in put_err_backtrace.

@tgingold
Member

tgingold commented Jan 27, 2021 via email

@umarcor
Member Author

umarcor commented Jan 27, 2021

For the tests that are failing in https://github.com/umarcor/setup-ghdl-ci/actions/runs/514033629, the "python program" is a single line in the run.sh file: https://github.com/umarcor/setup-ghdl-ci/blob/test/abort/test-abort/run.sh#L31

python3 -c 'from pyaux import run; run("./tb-'"${item}.${_ext}"'", 0, None)'

pyaux.py is in the same directory: https://github.com/umarcor/setup-ghdl-ci/blob/test/abort/test-abort/pyaux.py
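For readers who do not want to open the link, here is a minimal sketch of what loading such a shared library from Python can look like with ctypes. It is not the content of pyaux.py: make_argv and run_entry are hypothetical names, and the sketch assumes the exported symbol has the C signature int f(int argc, char **argv), as ghdl_main does.

```python
import ctypes


def make_argv(args):
    """Build (argc, argv) for a C function declared as int f(int, char**)."""
    encoded = [a.encode("utf-8") for a in args]
    # One extra NULL slot so argv is terminated the way C programs expect.
    argv = (ctypes.c_char_p * (len(encoded) + 1))(*encoded, None)
    return len(encoded), argv


def run_entry(lib_path, symbol="ghdl_main", args=("tb",)):
    """Load lib_path and call `symbol` (hypothetical helper, not pyaux.run)."""
    lib = ctypes.CDLL(lib_path)
    entry = getattr(lib, symbol)
    entry.argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_char_p)]
    entry.restype = ctypes.c_int
    argc, argv = make_argv(list(args))
    # If the library calls abort(), this call never returns:
    # the whole Python process receives SIGABRT.
    return entry(argc, argv)
```

With this shape, a clean simulation failure shows up as a non-zero return value of run_entry, whereas an abort() inside the library terminates the interpreter itself.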

Unfortunately, that fails on CI, but not in a container :(. Hence, I had to use the other example for getting the backtrace, the one using VUnit.


For the VUnit example, if you have Docker/Podman:

git clone https://github.com/ghdl/ghdl-cosim
cd ghdl-cosim

docker run --rm -itv $(pwd):/src -w /src/vhpidirect/shared/py/vunit/ ghdl/cosim:matplotlib bash

Then, inside the container, execute run.sh:

root@c41d79803f07:/src/vhpidirect/shared/py/vunit# ./run.sh
> [2008] VUnit compile

> [2008] VUnit run
Re-compile not needed

Running test: lib.tb_vunit.all
Running 1 tests
...

> [93] VUnit cosim
PY RUN ENTER
Hello entry!
               0 fs - default              -    INFO - Hello shared/py/vunit!
/usr/local/lib/python3.7/dist-packages/vunit/vhdl/core/src/stop_body_93-2002.vhd:10:5:@0ms:(report failure): Stopping simulation with status 0
/src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb:error: report failed
in process .tb_vunit(tb).main
SIGABRT caught 6!
Aborted

Now, let's rerun the last step only ([93] VUnit cosim) in gdb. We install gdb in the container, we set PYTHONPATH, then we start gdb python3:

apt update -qq && apt install -y gdb

root@c41d79803f07:/src/vhpidirect/shared/py/vunit# export PYTHONPATH=$(pwd)/..

root@c41d79803f07:/src/vhpidirect/shared/py/vunit# gdb python3
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) 

Inside gdb, we run cosim.py:

(gdb) run cosim.py 
Starting program: /usr/bin/python3 cosim.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
PY RUN ENTER
Hello entry!
               0 fs - default              -    INFO - Hello shared/py/vunit!
/usr/local/lib/python3.7/dist-packages/vunit/vhdl/core/src/stop_body_93-2002.vhd:10:5:@0ms:(report failure): Stopping simulation with status 0
/src/vhpidirect/shared/py/vunit/vunit_out/test_output/lib.tb_vunit.all_dcaca1bf596fbf17c268003bd00053a997affd85/ghdl/tb_vunit-tb:error: report failed
in process .tb_vunit(tb).main

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

There you are. where or backtrace will provide the output I showed in the previous comment.

Optionally, you might want to set break __gnat_last_chance_handler before run cosim.py. I found it to produce the same backtrace. According to https://ghdl.github.io/ghdl/development/Debugging.html#gnu-debugger-gdb, break __ghdl_fatal should be triggered, but it is not.

I think that something is crashing in put_err_backtrace, which is jumping to __gnat_last_chance_handler. So, in this case, the abort() might not need to be changed, but it should not be reached. However, as a general solution, maybe abort() should be exit(134).
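To make the difference between abort() and exit(134) concrete from the caller's side, here is a small sketch, assuming a POSIX host. It does not involve GHDL at all: run_child is a hypothetical helper, and os.abort() stands in for the abort() call reached in the shared library.

```python
import signal
import subprocess
import sys


def run_child(code):
    """Run a snippet in a fresh interpreter and return its raw return code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


# The child raises SIGABRT on itself, like abort() in C.
aborted = run_child("import os; os.abort()")

# The child exits normally with status 134.
exited = run_child("import sys; sys.exit(134)")

print("abort():", aborted)
print("exit(134):", exited)
```

On POSIX, subprocess reports a child killed by a signal as a negative return code (here -signal.SIGABRT), while a shell such as bash reports the same death as 128 + 6 = 134. So the two cases look alike from bash but are distinguishable from a Python parent.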

If you don't have docker/podman, the only dependency of the vhpidirect/shared/py/vunit example from ghdl-cosim is an up-to-date VUnit, which you can install through pip3 install vunit_hdl.

@tgingold
Member

tgingold commented Jan 27, 2021 via email

@umarcor
Member Author

umarcor commented Jan 27, 2021

Does executing run.sh from https://github.com/umarcor/setup-ghdl-ci/tree/test/abort/test-abort work on your Linux host?

That is successful on Windows and on local containers, but it fails on CI (Ubuntu). Unfortunately, I don't have a Debian/Ubuntu host to try it on. I will test on Fedora.

There should be no difference between what I do locally, inside the container or in CI. In all cases, exactly the same script/command is executed and the same sources are used. The difference is the "environment" only.

@umarcor
Member Author

umarcor commented Jan 27, 2021

I ran https://github.com/umarcor/setup-ghdl-ci/tree/test/abort/test-abort on 6 containers: https://github.com/umarcor/setup-ghdl-ci/actions/runs/516447375

  • Debian Bullseye, Fedora 32 and Fedora 33 abort.
  • Debian Buster, Ubuntu 18 and Ubuntu 20 are successful.

It's exactly the same code/test, since it's defined in a matrix: https://github.com/umarcor/setup-ghdl-ci/blob/3d7bf50b54a182529f3598e22b2892da2238dc89/.github/workflows/test.yml#L11-L44
