Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External eimp process has terminated unexpectedly #3150

Closed
protist opened this issue Jan 25, 2020 · 8 comments
Closed

External eimp process has terminated unexpectedly #3150

protist opened this issue Jan 25, 2020 · 8 comments

Comments

@protist
Copy link

protist commented Jan 25, 2020

Environment

  • ejabberd version: 20.01
  • Erlang version: erl +V -> Erlang (SMP,ASYNC_THREADS,HIPE) (BEAM) emulator version 10.6
  • OS: Linux (ArchARM)
  • Installed from: distro package

Configuration (only if needed): grep -Ev '^$|^\s*#' ejabberd.yml

hosts:
  - "foo.dynu.com"
loglevel: info
certfiles:
  - /etc/ejabberd/ejabberd.pem
listen:
  -
    port: 5222
    ip: "::"
    module: ejabberd_c2s
    max_stanza_size: 262144
    shaper: c2s_shaper
    access: c2s
    starttls_required: true
  -
    port: 5269
    ip: "::"
    module: ejabberd_s2s_in
    max_stanza_size: 524288
  -
    port: 5443
    ip: "::"
    module: ejabberd_http
    tls: true
    request_handlers:
      /admin: ejabberd_web_admin
      /api: mod_http_api
      /bosh: mod_bosh
      /captcha: ejabberd_captcha
      /upload: mod_http_upload
      /ws: ejabberd_http_ws
  -
    port: 5280
    ip: "::"
    module: ejabberd_http
    request_handlers:
      /admin: ejabberd_web_admin
      /.well-known/acme-challenge: ejabberd_acme
  -
    port: 1883
    ip: "::"
    module: mod_mqtt
    backlog: 1000
s2s_use_starttls: optional
acl:
  local:
    user_regexp: ""
  loopback:
    ip:
      - 127.0.0.0/8
      - ::1/128
access_rules:
  local:
    allow: local
  c2s:
    deny: blocked
    allow: all
  announce:
    allow: admin
  configure:
    allow: admin
  muc_create:
    allow: local
  pubsub_createnode:
    allow: local
  trusted_network:
    allow: loopback
api_permissions:
  "console commands":
    from:
      - ejabberd_ctl
    who: all
    what: "*"
  "admin access":
    who:
      access:
        allow:
          acl: loopback
          acl: admin
      oauth:
        scope: "ejabberd:admin"
        access:
          allow:
            acl: loopback
            acl: admin
    what:
      - "*"
      - "!stop"
      - "!start"
  "public commands":
    who:
      ip: 127.0.0.1/8
    what:
      - status
      - connected_users_number
shaper:
  normal: 1000
  fast: 50000
shaper_rules:
  max_user_sessions: 10
  max_user_offline_messages:
    5000: admin
    100: all
  c2s_shaper:
    none: admin
    normal: all
  s2s_shaper: fast
modules:
  mod_adhoc: {}
  mod_admin_extra: {}
  mod_announce:
    access: announce
  mod_avatar: {}
  mod_blocking: {}
  mod_bosh: {}
  mod_caps: {}
  mod_carboncopy: {}
  mod_client_state: {}
  mod_configure: {}
  mod_disco: {}
  mod_fail2ban: {}
  mod_http_api: {}
  mod_http_upload:
    put_url: https://@HOST@:5443/upload
  mod_last: {}
  mod_mam:
    assume_mam_usage: true
    default: always
  mod_mqtt: {}
  mod_muc:
    access:
      - allow
    access_admin:
      - allow: admin
    access_create: muc_create
    access_persistent: muc_create
    access_mam:
      - allow
    default_room_options:
      mam: true
  mod_muc_admin: {}
  mod_offline:
    access_max_user_messages: max_user_offline_messages
  mod_ping: {}
  mod_privacy: {}
  mod_private: {}
  mod_proxy65:
    access: local
    max_connections: 5
  mod_pubsub:
    access_createnode: pubsub_createnode
    plugins:
      - flat
      - pep
    force_node_config:
      storage:bookmarks:
        access_model: whitelist
  mod_push: {}
  mod_push_keepalive: {}
  mod_register:
    ip_access: trusted_network
  mod_roster:
    versioning: true
  mod_s2s_dialback: {}
  mod_shared_roster: {}
  mod_stream_mgmt:
    resend_on_timeout: if_offline
  mod_vcard: {}
  mod_vcard_xupdate: {}
  mod_version:
    show_os: false

Errors from /var/log/ejabberd/error.log

2020-01-25 13:23:10.052145+11:00 [error] <0.180.0> External eimp process (pid=3039) has terminated unexpectedly, restarting in a few seconds
2020-01-25 13:23:10.053712+11:00 [error] <0.184.0> External eimp process (pid=3041) has terminated unexpectedly, restarting in a few seconds
2020-01-25 13:23:10.055076+11:00 [error] <0.186.0> External eimp process (pid=3042) has terminated unexpectedly, restarting in a few seconds
2020-01-25 13:23:10.057256+11:00 [error] <0.182.0> External eimp process (pid=3040) has terminated unexpectedly, restarting in a few seconds
2020-01-25 13:23:10.067424+11:00 [error] <0.477.0>@gen_server:error_info/7:889 ** Generic server memsup terminating 
** Last message in was {'EXIT',<0.478.0>,{port_died,normal}}
** When Server state == [{data,[{"Timeout",60000}]},
                         {items,{"Memory Usage",
                                 [{"Allocated",3307241472},
                                  {"Total",4104364032}]}},
                         {items,{"Worst Memory User",
                                 [{"Pid",<0.49.0>},{"Memory",27622}]}}]
** Reason for termination ==
** {port_died,normal}

2020-01-25 13:23:10.069892+11:00 [error] <0.477.0>@proc_lib:crash_report/4:508 CRASH REPORT:
  crasher:
    initial call: memsup:init/1
    pid: <0.477.0>
    registered_name: memsup
    exception exit: {port_died,normal}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 751)
    ancestors: [os_mon_sup,<0.475.0>]
    message_queue_len: 0
    messages: []
    links: [<0.476.0>]
    dictionary: [{system_memory_high_watermark,set}]
    trap_exit: true
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 41064
  neighbours:

2020-01-25 13:23:10.082059+11:00 [error] <0.476.0>@supervisor:do_restart/3:701 SUPERVISOR REPORT:
    supervisor: {local,os_mon_sup}
    errorContext: child_terminated
    reason: {port_died,normal}
    offender: [{pid,<0.477.0>},
               {id,memsup},
               {mfargs,{memsup,start_link,[]}},
               {restart_type,permanent},
               {shutdown,2000},
               {child_type,worker}]
2020-01-25 13:23:12.145859+11:00 [error] <0.86.0>@supervisor:do_terminate/2:838 SUPERVISOR REPORT:
    supervisor: {local,sasl_safe_sup}
    errorContext: shutdown_error
    reason: killed
    offender: [{pid,<0.87.0>},
               {id,alarm_handler},
               {mfargs,{alarm_handler,start_link,[]}},
               {restart_type,permanent},
               {shutdown,2000},
               {child_type,worker}]

Bug description

I can't get ejabberd to start. I previously had it working on my server, but after moving from Debian to Arch, it no longer starts. This also involved upgrading versions, and moving from a self-signed certificate to Let's Encrypt.

Troubleshooting

I found several other issues here mentioning the eimp errors.

I checked for missing libraries, but all appeared to be present.

$ ldd /usr/lib/eimp-1.0.13/priv/bin/eimp
	linux-vdso.so.1 (0xbeedf000)
	libwebp.so.7 => /usr/lib/libwebp.so.7 (0xb6f00000)
	libpng16.so.16 => /usr/lib/libpng16.so.16 (0xb6ec3000)
	libjpeg.so.8 => /usr/lib/libjpeg.so.8 (0xb6e77000)
	libgd.so.3 => /usr/lib/libgd.so.3 (0xb6e10000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0xb6de5000)
	libc.so.6 => /usr/lib/libc.so.6 (0xb6c96000)
	/lib/ld-linux-armhf.so.3 => /usr/lib/ld-linux-armhf.so.3 (0xb6f90000)
	libm.so.6 => /usr/lib/libm.so.6 (0xb6c27000)
	libz.so.1 => /usr/lib/libz.so.1 (0xb6c02000)
	libfontconfig.so.1 => /usr/lib/libfontconfig.so.1 (0xb6bb8000)
	libfreetype.so.6 => /usr/lib/libfreetype.so.6 (0xb6b03000)
	libXpm.so.4 => /usr/lib/libXpm.so.4 (0xb6ae4000)
	libX11.so.6 => /usr/lib/libX11.so.6 (0xb69c3000)
	libtiff.so.5 => /usr/lib/libtiff.so.5 (0xb693d000)
	libexpat.so.1 => /usr/lib/libexpat.so.1 (0xb68f9000)
	libbz2.so.1.0 => /usr/lib/libbz2.so.1.0 (0xb68d9000)
	libharfbuzz.so.0 => /usr/lib/libharfbuzz.so.0 (0xb67dd000)
	libxcb.so.1 => /usr/lib/libxcb.so.1 (0xb67ad000)
	libdl.so.2 => /usr/lib/libdl.so.2 (0xb679a000)
	libzstd.so.1 => /usr/lib/libzstd.so.1 (0xb66fa000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0xb66c9000)
	libglib-2.0.so.0 => /usr/lib/libglib-2.0.so.0 (0xb65b6000)
	libgraphite2.so.3 => /usr/lib/libgraphite2.so.3 (0xb658b000)
	libXau.so.6 => /usr/lib/libXau.so.6 (0xb6578000)
	libXdmcp.so.6 => /usr/lib/libXdmcp.so.6 (0xb6563000)
	libpcre.so.1 => /usr/lib/libpcre.so.1 (0xb64ea000)

I can confirm that the file is executable.

I tried running eimp directly from the binary, but this just stalled for several minutes until I hit ctrl+c.

$ /usr/lib/eimp-1.0.13/priv/bin/eimp

I tried replacing the eimp binary with one I manually built, but this didn't allow ejabberd to start, nor work when launched manually.

@licaon-kter
Copy link
Contributor

licaon-kter commented Jan 25, 2020

Does it work if you build ejabberd (not just eimp) separately (from git)?

@protist
Copy link
Author

protist commented Jan 26, 2020

Thanks @licaon-kter. I build the last stable version from source (but installed it using the package manager). i.e. I basically followed the commands here. Ejabberd still fails for me as above with eimp.

@prefiks
Copy link
Member

prefiks commented Feb 25, 2020

Hello,

You could try removing mod_avatar: {} from modules section. This will disable most of usages of eimp.

You could also try building "dummy" eimp binary if you replace vars.config file inisde eimp deps with this one:

{cflags, "-g -O2 -Wall    "}.
{ldflags, "  -lpthread"}.
{with_gcov, "false"}.
{gd, ""}.
{webp, ""}.
{jpeg, ""}.
{png, ""}.

This should build an eimp binary which will pretty much reject all image to convert, and i think this will make that binary workin in all situations

@protist
Copy link
Author

protist commented Feb 29, 2020

Thank you @prefiks . I tried commenting out mod_avatar: {} in /etc/ejabberd/ejabberd.yml, then restarted the service, but unfortunately the bug still persisted, with similar errors.

For the second suggestion, I presumed you were talking about deps/eimp/vars.config.in, so I modified that file as recommended before building the package (as an aside, it also updated from 20.01 to 20.02). ejabberd is still failing to start, but I'm getting some different errors in /var/log/ejabberd/error.log now:

2020-02-29 16:27:11.186947+11:00 [error] <0.115.0>@ejabberd_app:start_elixir_application/0:192 Elixir application not started.
2020-02-29 16:29:00.871562+11:00 [error] <0.474.0>@gen_server:error_info/7:889 ** Generic server memsup terminating 
** Last message in was {'EXIT',<0.475.0>,{port_died,normal}}
** When Server state == [{data,[{"Timeout",60000}]},
                         {items,{"Memory Usage",
                                 [{"Allocated",3287175168},
                                  {"Total",4104364032}]}},
                         {items,{"Worst Memory User",
                                 [{"Pid",<0.49.0>},{"Memory",44366}]}}]
** Reason for termination ==
** {port_died,normal}

2020-02-29 16:29:00.874063+11:00 [error] <0.474.0>@proc_lib:crash_report/4:508 CRASH REPORT:
  crasher:
    initial call: memsup:init/1
    pid: <0.474.0>
    registered_name: memsup
    exception exit: {port_died,normal}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 751)
    ancestors: [os_mon_sup,<0.472.0>]
    message_queue_len: 0
    messages: []
    links: [<0.473.0>]
    dictionary: [{system_memory_high_watermark,set}]
    trap_exit: true
    status: running
    heap_size: 2586
    stack_size: 27
    reductions: 41087
  neighbours:

2020-02-29 16:29:00.885714+11:00 [error] <0.473.0>@supervisor:do_restart/3:701 SUPERVISOR REPORT:
    supervisor: {local,os_mon_sup}
    errorContext: child_terminated
    reason: {port_died,normal}
    offender: [{pid,<0.474.0>},
               {id,memsup},
               {mfargs,{memsup,start_link,[]}},
               {restart_type,permanent},
               {shutdown,2000},
               {child_type,worker}]
2020-02-29 16:29:10.228058+11:00 [error] <0.115.0>@ejabberd_app:start_elixir_application/0:192 Elixir application not started.

N.B. there is a long pause between the first two lines.

I'm not entirely sure what memsup means, but I have plenty of free RAM (2.7 G).

@prefiks
Copy link
Member

prefiks commented Mar 2, 2020

Hello,

Ok, so it looks like some issue with your system, looks like drivers aren't working in your erlang, processes that are spawned that way do die or aren't able to communicate with parent processs. I really don't know what can cause this, possibly rule in selinux or apparmor or something similar?

@protist
Copy link
Author

protist commented Mar 2, 2020

Thanks again @prefiks. Unfortunately I don't have either selinux nor apparmor installed. I'm running Arch ARM, and neither is enabled by default.

Just to confirm, I tried running sestatus, getenforce, aa-enabled, and apparmor_status, but none of those commands were found.

@protist
Copy link
Author

protist commented May 2, 2020

I thought I might do what limited troubleshooting I was capable of, although I'm on the edge of my understanding.

I tried downgrading to ejabberd 19.02, 18.12.1, 18.01, and 17.12. The first two failed with the same eimp issue. (In all cases I had commented out mod_avatar: {}.)

With 18.01 and 17.12 however, I see no errors in /var/log/ejabberd/error.log, although I'm not convinced they started correctly, because I can't connect to the server. sudo systemctl status ejabberd.service reports them as working though.

I also independently tried downgrading erlang-nox and erlang-unixodbc from 22.2.7 to 22.0, which resulted in the same eimp errors (and perhaps more). 21.3.2 resulted in some kind of crash dump.

EDIT: I've tried the new (current) erlang-nox and erlang-unixodbc 22.3.4, but it's still failing. The errors look slightly different again, but eimp is still in there.

2020-05-11 13:38:05.920654+10:00 [error] <0.116.0>@ejabberd_app:start_elixir_application/0:192 Elixir application not started.
2020-05-11 13:39:52.884663+10:00 [error] <0.181.0> External eimp process (pid=31228) has terminated unexpectedly, restarting in a few seconds
2020-05-11 13:39:52.885657+10:00 [error] <0.185.0> External eimp process (pid=31230) has terminated unexpectedly, restarting in a few seconds
2020-05-11 13:39:52.885780+10:00 [error] <0.187.0> External eimp process (pid=31231) has terminated unexpectedly, restarting in a few seconds
2020-05-11 13:39:52.886775+10:00 [error] <0.183.0> External eimp process (pid=31229) has terminated unexpectedly, restarting in a few seconds
2020-05-11 13:39:52.907226+10:00 [error] <0.478.0>@gen_server:error_info/7:889 ** Generic server memsup terminating 
** Last message in was {'EXIT',<0.479.0>,{port_died,normal}}
** When Server state == [{data,[{"Timeout",60000}]},
                         {items,{"Memory Usage",
                                 [{"Allocated",3016982528},
                                  {"Total",4024434688}]}},
                         {items,{"Worst Memory User",
                                 [{"Pid",<0.50.0>},{"Memory",44366}]}}]
** Reason for termination ==
** {port_died,normal}

2020-05-11 13:39:52.909240+10:00 [error] <0.478.0>@proc_lib:crash_report/4:508 CRASH REPORT:
  crasher:
    initial call: memsup:init/1
    pid: <0.478.0>
    registered_name: memsup
    exception exit: {port_died,normal}
      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 751)
    ancestors: [os_mon_sup,<0.476.0>]
    message_queue_len: 0
    messages: []
    links: [<0.477.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 4185
    stack_size: 27
    reductions: 40841
  neighbours:

2020-05-11 13:39:52.925595+10:00 [error] <0.477.0>@supervisor:do_restart/3:701 SUPERVISOR REPORT:
    supervisor: {local,os_mon_sup}
    errorContext: child_terminated
    reason: {port_died,normal}
    offender: [{pid,<0.478.0>},
               {id,memsup},
               {mfargs,{memsup,start_link,[]}},
               {restart_type,permanent},
               {shutdown,2000},
               {child_type,worker}]

@protist
Copy link
Author

protist commented May 26, 2020

I'm going to close this issue. I appreciate the help, but I've been stuck for four months now, with no obvious fix. I've installed prosody instead (which works perfectly and was super-easy to setup, especially the certbot and Let's Encrypt integration). Unfortunately I'm not in a position to troubleshoot ejabberd any more. Thank you for the help rendered and I'm sorry we couldn't work this out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants