[BUG] Segmentation fault on ARM64 upon startup #2906

Closed
laci200270 opened this issue May 18, 2019 · 9 comments

Comments

@laci200270

commented May 18, 2019

Prerequisites

Describe the bug
The program segfaults unless it is started with the --help or -v switch.
Backtrace for the bug
(gdb) bt full
#0  0x0000ffff99939a90 in rspamd_rcl_jinja_handler (parser=<optimized out>, source=0xffff99ba7000 <error: Cannot access memory at address 0xffff99ba7000>, source_len=1331, destination=0xffffee3ffed8, dest_len=0xffffee3ffee0, user_data=0xffff996f1160) at /work/community/rspamd/src/rspamd-1.9.3/src/libserver/cfg_rcl.c:3594
        tb = 0x0
        cfg = 0xffff996f1160
        L = 0x65e296fe2378
        err_idx = 4
        __func__ = "rspamd_rcl_jinja_handler"
#1  0x0000ffff99a201f0 in ucl_parser_add_chunk_full (parse_type=UCL_PARSE_UCL, strat=UCL_DUPLICATE_APPEND, priority=0, len=1331, data=0xffff99ba7000 <error: Cannot access memory at address 0xffff99ba7000>, parser=0xaaaabc249ea0) at /work/community/rspamd/src/rspamd-1.9.3/contrib/libucl/ucl_parser.c:2884
        ndata = 0x0
        nlen = 0
        nchain = <optimized out>
        chunk = 0xaaaabc248c40
        special_handler = 0xffff968d42b8
        chunk = <optimized out>
        special_handler = <optimized out>
        ndata = <optimized out>
        nlen = <optimized out>
        nchain = <optimized out>
#2  ucl_parser_add_chunk_full (parser=0xaaaabc249ea0, data=<optimized out>, len=<optimized out>, priority=0, strat=UCL_DUPLICATE_APPEND, parse_type=UCL_PARSE_UCL) at /work/community/rspamd/src/rspamd-1.9.3/contrib/libucl/ucl_parser.c:2851
        chunk = <optimized out>
        special_handler = <optimized out>
        ndata = <optimized out>
        nlen = <optimized out>
        nchain = <optimized out>
#3  0x0000ffff9993d278 in rspamd_config_parse_ucl (cfg=cfg@entry=0xffff996f1160, filename=filename@entry=0xaaaaaeb3f57b "/etc/rspamd/rspamd.conf", vars=vars@entry=0x0, inc_trace=inc_trace@entry=0x0, trace_data=trace_data@entry=0x0, skip_jinja=0, err=0xaaaabc2494a0, err@entry=0xffffee401080) at /work/community/rspamd/src/rspamd-1.9.3/src/libserver/cfg_rcl.c:3763
        st = {st_dev = 64768, st_ino = 2382160, st_mode = 33188, st_nlink = 1, st_uid = 0, st_gid = 0, st_rdev = 0, __pad = 0, st_size = 1331, st_blksize = 4096, __pad2 = 0, st_blocks = 8, st_atim = {tv_sec = 1558177284, tv_nsec = 0}, st_mtim = {tv_sec = 1558177284, tv_nsec = 0}, st_ctim = {tv_sec = 1558178843, tv_nsec = 787569357}, __unused = {0, 0}}
        fd = <optimized out>
        parser = 0xaaaabc249ea0
        keypair_path = "/etc/rspamd/rspamd.conf.key\000/usr/local/lib/lua/5.1/?.so;/usr/lib/lua/5.1/?.so;/usr/local/lib/lua/5.1/loadall.so\000it.lua;./?.lua;/usr/share/luajit-2.1.0-beta3/?.lua;/usr/local/share/lua/5.1/?.lua;/usr/l"...
        decrypt_keypair = 0x0
        data = 0xffff99ba7000 <error: Cannot access memory at address 0xffff99ba7000>
        __func__ = "rspamd_config_parse_ucl"
#4  0x0000ffff99940024 in rspamd_config_read (cfg=0xffff996f1160, filename=0xaaaaaeb3f57b "/etc/rspamd/rspamd.conf", logger_fin=0xaaaaaeb360c0 <config_logger>, logger_ud=0xffff99bb1060, vars=0x0, skip_jinja=0, lua_env=0x0) at /work/community/rspamd/src/rspamd-1.9.3/src/libserver/cfg_rcl.c:3802
        err = 0x0
        top = <optimized out>
        logger_section = <optimized out>
        logger_obj = <optimized out>
        __func__ = "rspamd_config_read"
#5  0x0000aaaaaeb35ff4 in load_rspamd_config (rspamd_main=0xffff99bb1060, cfg=0xffff996f1160, init_modules=1, opts=(RSPAMD_CONFIG_INIT_URL | RSPAMD_CONFIG_INIT_LIBS | RSPAMD_CONFIG_INIT_SYMCACHE | RSPAMD_CONFIG_INIT_VALIDATE | RSPAMD_CONFIG_INIT_PRELOAD_MAPS), reload=0) at /work/community/rspamd/src/rspamd-1.9.3/src/rspamd.c:932
        __func__ = "load_rspamd_config"
#6  0x0000aaaaaeb293f8 in main (argc=<optimized out>, argv=<optimized out>, env=<optimized out>) at /work/community/rspamd/src/rspamd-1.9.3/src/rspamd.c:1385
        i = <optimized out>
        res = 0
        signals = {__sa_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__bits = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        sigpipe_act = {__sa_handler = {sa_handler = 0xffff99bb4000, sa_sigaction = 0xffff99bb4000}, sa_mask = {__bits = {3, 281474678920872, 187650052099408, 288, 0, 281473260933120, 0, 281473260929024, 281473260929024, 128, 281473260933120, 5, 281473260924476, 0, 0, 281474678920800}}, sa_flags = -1716353352, sa_restorer = 0xffffee4016c8}
        pworker = <optimized out>
        type = 1
        control_addr = 0x0
        ev_base = <optimized out>
        term_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0xffff99683000, tqe_prev = 0xffffee4012c0}, evcb_flags = 2508, evcb_pri = 179 '\263', evcb_closure = 153 '\231', evcb_cb_union = {evcb_callback = 0xffff98d36400, evcb_selfcb = 0xffff98d36400, evcb_evfinalize = 0xffff98d36400, evcb_cbfinalize = 0xffff98d36400}, evcb_arg = 0x28}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0xfffffffffffffc00, tqe_prev = 0xffff99682000}, min_heap_idx = -1024}, ev_fd = -297790736, ev_base = 0xffff99b3042c <malloc+600>, ev_ = {ev_io = {ev_io_next = {le_next = 0xffff99571a40, le_prev = 0x8}, ev_timeout = {tv_sec = -256, tv_usec = 281473260920832}}, ev_signal = {ev_signal_next = {le_next = 0xffff99571a40, le_prev = 0x8}, ev_ncalls = -256, ev_pncalls = 0xffff99bb2000}}, ev_events = 24, ev_res = 0, ev_timeout = {tv_sec = 281473260923728, tv_usec = 281473260933120}}
        int_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0x318, tqe_prev = 0x21}, evcb_flags = 10288, evcb_pri = 187 '\273', evcb_closure = 153 '\231', evcb_cb_union = {evcb_callback = 0x308, evcb_selfcb = 0x308, evcb_evfinalize = 0x308, evcb_cbfinalize = 0x308}, evcb_arg = 0xffff99bb5000}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0xffffee401340, tqe_prev = 0xffff99b300f4}, min_heap_idx = -297790656}, ev_fd = -1730556016, ev_base = 0xffff99bb2830, ev_ = {ev_io = {ev_io_next = {le_next = 0xffff99bb2000, le_prev = 0xffff99bb1720}, ev_timeout = {tv_sec = 448, tv_usec = 0}}, ev_signal = {ev_signal_next = {le_next = 0xffff99bb2000, le_prev = 0xffff99bb1720}, ev_ncalls = 448, ev_pncalls = 0x0}}, ev_events = 160, ev_res = 0, ev_timeout = {tv_sec = 28, tv_usec = 281473246154192}}
        cld_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0xffff99bb4000, tqe_prev = 0x21}, evcb_flags = 28, evcb_pri = 0 '\000', evcb_closure = 0 '\000', evcb_cb_union = {evcb_callback = 0xffffee4013d0, evcb_selfcb = 0xffffee4013d0, evcb_evfinalize = 0xffffee4013d0, evcb_cbfinalize = 0xffffee4013d0}, evcb_arg = 0xffff99b3042c <malloc+600>}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0xffffee4013e0, tqe_prev = 0xffff99b3042c <malloc+600>}, min_heap_idx = -297790496}, ev_fd = -297790480, ev_base = 0xffff99b3042c <malloc+600>, ev_ = {ev_io = {ev_io_next = {le_next = 0xffff98d36b40, le_prev = 0x0}, ev_timeout = {tv_sec = -1, tv_usec = 281473260920832}}, ev_signal = {ev_signal_next = {le_next = 0xffff98d36b40, le_prev = 0x0}, ev_ncalls = -1, ev_pncalls = 0xffff99bb2000}}, ev_events = 24, ev_res = 0, ev_timeout = {tv_sec = 281473260923104, tv_usec = 281473260933120}}
        hup_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0xa8, tqe_prev = 0xffff98d36b40}, evcb_flags = -15264, evcb_pri = 217 '\331', evcb_closure = 152 '\230', evcb_cb_union = {evcb_callback = 0x2, evcb_selfcb = 0x2, evcb_evfinalize = 0x2, evcb_cbfinalize = 0x2}, evcb_arg = 0xffff98d36b40}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0x0, tqe_prev = 0x0}, min_heap_idx = 0}, ev_fd = -297790432, ev_base = 0xffff995a8b18, ev_ = {ev_io = {ev_io_next = {le_next = 0xffff98d9c460, le_prev = 0xc8}, ev_timeout = {tv_sec = 0, tv_usec = 0}}, ev_signal = {ev_signal_next = {le_next = 0xffff98d9c460, le_prev = 0xc8}, ev_ncalls = 0, ev_pncalls = 0x0}}, ev_events = 27424, ev_res = -26413, ev_timeout = {tv_sec = 5, tv_usec = 281474678920304}}
        usr1_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0xffff995a8f80, tqe_prev = 0xffff99682000}, evcb_flags = -15264, evcb_pri = 217 '\331', evcb_closure = 152 '\230', evcb_cb_union = {evcb_callback = 0xffff98d36b40, evcb_selfcb = 0xffff98d36b40, evcb_evfinalize = 0xffff98d36b40, evcb_cbfinalize = 0xffff98d36b40}, evcb_arg = 0xffff99bb3fa8}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0x3, tqe_prev = 0xffff99bb459c}, min_heap_idx = 3}, ev_fd = 3, ev_base = 0x1c, ev_ = {ev_io = {ev_io_next = {le_next = 0xffffee4014e0, le_prev = 0xffff998e530c <rspamd_logger_add_debug_module+216>}, ev_timeout = {tv_sec = 281474678920416, tv_usec = 281473260628100}}, ev_signal = {ev_signal_next = {le_next = 0xffffee4014e0, le_prev = 0xffff998e530c <rspamd_logger_add_debug_module+216>}, ev_ncalls = 5344, ev_pncalls = 0xffff99b6a884}}, ev_events = 0, ev_res = 0, ev_timeout = {tv_sec = 281473260929024, tv_usec = 281473260916992}}
        control_ev = {ev_evcallback = {evcb_active_next = {tqe_next = 0x0, tqe_prev = 0x3}, evcb_flags = 17820, evcb_pri = 187 '\273', evcb_closure = 153 '\231', evcb_cb_union = {evcb_callback = 0x3, evcb_selfcb = 0x3, evcb_evfinalize = 0x3, evcb_cbfinalize = 0x3}, evcb_arg = 0x1c}, ev_timeout_pos = {ev_next_with_common_timeout = {tqe_next = 0xaaaaaeb53c50, tqe_prev = 0x99b6a878}, min_heap_idx = -1363854256}, ev_fd = -297789888, ev_base = 0xffff99b6b498, ev_ = {ev_io = {ev_io_next = {le_next = 0xffff99bb4000, le_prev = 0x3}, ev_timeout = {tv_sec = 281474678920872, tv_usec = 187650052099408}}, ev_signal = {ev_signal_next = {le_next = 0xffff99bb4000, le_prev = 0x3}, ev_ncalls = 5800, ev_pncalls = 0xaaaaaeb28d50 <main>}}, ev_events = 288, ev_res = 0, ev_timeout = {tv_sec = 0, tv_usec = 281473260933120}}
        term_tv = {tv_sec = 281473260924476, tv_usec = 281473255481344}
        rspamd_main = <optimized out>
        skip_pid = 0
        valgrind_mode = 0
        __func__ = "main"

Steps to Reproduce

  1. Start rspamd
  2. Segfault

Expected behavior
Normal startup sequence

Versions

Rspamd daemon version 1.9.3

Additional Information

It worked around December; after upgrading, it now crashes.
Might be related to commit 812dfbb.
The CPU architecture is ARM64; the server runs on Scaleway.

@laci200270 laci200270 added the bug label May 18, 2019
@vstakhov

Member

commented May 18, 2019

I don't have any arm64 hardware to test. So let's say it's unsupported.

@laci200270

Author

commented May 19, 2019

If I give you access to my server, would you be able to test it?

@vielmetti

commented May 24, 2019

It's possible to build arm64 code natively on CI with Drone Cloud (see https://cloud.drone.io); that should provide a suitable test environment with a modest amount of setup effort.

Also, there are some known issues to be aware of with Lua and LuaJIT on arm64, notably the use of "lightuserdata" to store pointers. The Intel world has a 47-bit data type here, and arm64 has a 48-bit data type. There is way too much to wade through at LuaJIT/LuaJIT#49, but given that the relevant code change has Lua in it, this is the first thing that comes to mind.
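
To make the 47-bit versus 48-bit point concrete, here is a minimal Python sketch that only does the arithmetic. It takes a few pointer values from the gdb backtrace in this issue and checks whether each fits in 47 bits. The function name `fits_lightuserdata` and the x86-64 comparison address are hypothetical, chosen for illustration; this is not LuaJIT's actual check, and the trace does not show which pointer rspamd pushed as lightuserdata.

```python
# Pointer values copied from the gdb backtrace earlier in this issue (arm64).
BACKTRACE_POINTERS = {
    "cfg": 0xFFFF996F1160,
    "data": 0xFFFF99BA7000,
    "parser": 0xAAAABC249EA0,
}

# Hypothetical user-space address of the kind commonly seen on x86-64,
# included only for comparison; it does not come from this issue.
EXAMPLE_X86_64_POINTER = 0x7FFDEADBE000

# Limit discussed in this thread for LuaJIT lightuserdata.
LIGHTUSERDATA_BITS = 47


def fits_lightuserdata(addr: int, bits: int = LIGHTUSERDATA_BITS) -> bool:
    """Return True if addr is representable as an unsigned value of `bits` bits."""
    return 0 <= addr < (1 << bits)


def report(pointers: dict) -> dict:
    """Map each pointer name to whether its value fits in the 47-bit limit."""
    return {name: fits_lightuserdata(addr) for name, addr in pointers.items()}


if __name__ == "__main__":
    for name, ok in report(BACKTRACE_POINTERS).items():
        status = "fits in 47 bits" if ok else "needs more than 47 bits"
        print(f"{name} = {BACKTRACE_POINTERS[name]:#x}: {status}")
    print(f"x86-64 example fits: {fits_lightuserdata(EXAMPLE_X86_64_POINTER)}")
```

All three arm64 addresses from the trace have bit 47 set, so none of them would survive a 47-bit limit, while the x86-64 style address does.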

@vstakhov

Member

commented May 24, 2019

That's exactly what I'm working on.

@vstakhov

Member

commented May 24, 2019

In fact, the easiest workaround looks like disabling LuaJIT on arm64.

vstakhov added a commit that referenced this issue May 24, 2019
LuaJIT limits lightuserdata usage to 47 bits. On Arm64, this breaks
C <-> Lua interoperability using this type.

This rework changes the traceback function to use luaL_Buffer instead
of an opaque lightuserdata pointer (GString * in particular).

Issue: #2906
vstakhov added a commit that referenced this issue May 24, 2019
Issue: #2906
@a16bitsysop

commented May 24, 2019

I have it running on an arm64 platform with libluajit-5.1. I just upgraded to 1.9.4 with no changes to the configs: I checked out 1.9.4 and ran cmake in the build folder. I have been running it for about a year on an arm64 RK3328. If you only have 2 GB of RAM, ClamAV will use pretty much all of it.

@a16bitsysop

commented May 25, 2019

Do you have worker-normal disabled, as in the self-scan setup? I had lots of problems until I enabled it, and it didn't use noticeably more resources. I set up worker-controller, worker-normal, and worker-proxy in override.d rather than local.d, so that the remaining, non-overridden settings come from the main config files.

vstakhov added a commit that referenced this issue Jun 3, 2019
Issue: #2906
@TCB13

commented Jul 1, 2019

I was hitting a segfault as well, but apparently I wasn't compiling it properly. Installing the dependencies in this specific order and with these versions (to avoid issues) works:

apt install devscripts fakeroot debhelper dh-systemd libjemalloc-dev libunwind-dev cmake ragel libevent-dev lua5.1 liblua5.1-dev libsqlite3-0=3.16.2-5+deb9u1 libsqlite3-dev sqlite3 libmagic1=1:5.30-1+deb9u2 libmagic-dev libfann-dev libfann2 libluajit-5.1-common=2.1.0~beta3+dfsg-5.1~bpo9+1 luajit libluajit-5.1

Then

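git clone --recursive https://github.com/vstakhov/rspamd.git
cd rspamd
git checkout tags/1.9.4
# Go back up so that ../rspamd below resolves to the checkout
cd ..
mkdir rspamd.build
cd rspamd.build
cmake ../rspamd
make dist
tar xvf rspamd-1.9.4.tar.xz
cd rspamd-1.9.4
# Build unsigned Debian packages; the .deb files land in the parent directory
debuild -uc -us
cd ..
dpkg -i *.deb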

And it runs:

service rspamd status
● rspamd.service - rapid spam filtering system
   Loaded: loaded (/lib/systemd/system/rspamd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-07-01 16:03:07 UTC; 6s ago
     Docs: https://rspamd.com/doc/
 Main PID: 14799 (rspamd)
    Tasks: 7 (limit: 4915)
   CGroup: /system.slice/rspamd.service

I also put this package on hold so apt won't ever replace it: apt-mark hold rspamd

System: Debian GNU/Linux 9 (stretch) 4.4.179-rk3399
Related to: #2953

c-rosenberg added a commit to HeinleinSupport/rspamd that referenced this issue Jul 17, 2019
@stale

commented Aug 30, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Aug 30, 2019
@stale stale bot closed this Sep 6, 2019