
FS#3177 - procd fails to start rpcd on 18.06.8 because of a libubox regression #8048

Closed
@openwrt-bot

Description


bjonglez:

I've been trying to debug this regression in 18.06.8: in some circumstances, rpcd fails to start.

This is most visible because it breaks LuCI, see e.g. openwrt/luci#3773 or https://forum.openwrt.org/t/luci-error-after-upgrade-to-r10949-or-r10951-etc-config-luci-seems-to-be-corrupt/56880

To reproduce on 18.06.8:

  • remove the ''rpcd'' section in ''/etc/config/rpcd''
  • reboot (this is important)
  • result: ''rpcd'' is not started, and the following log message is printed in ''logread'':
Thu Feb 27 21:26:37 2020 daemon.info procd: Not starting instance rpcd::instance1, command not set
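You can check a device for this symptom without reading the whole log. Below is a minimal POSIX shell sketch. The function name rpcd_command_not_set is hypothetical and is not part of OpenWrt. It only looks for the exact procd message quoted above in whatever text it receives on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenWrt): succeeds when the text on stdin
# contains the procd message reported above for the rpcd instance.
rpcd_command_not_set() {
    # -F: match the message as a fixed string; -q: only set the exit status
    grep -qF 'Not starting instance rpcd::instance1, command not set'
}

# Example use on the router, after the reboot step:
#   logread | rpcd_command_not_set && echo "rpcd was not started by procd"
```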

More details:

When this issue happens, it becomes impossible to start ''rpcd'' with ''procd'', even when adding back the ''rpcd'' section:

root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "\/etc\/init.d\/rpcd", "instances": { "instance1": { "command": [ "\/sbin\/rpcd" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1614 root 1200 S grep rpc

root@OpenWrt:~# uci add rpcd rpcd
cfg027c4e
root@OpenWrt:~# uci set rpcd.@rpcd[-1].timeout=30
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "/etc/init.d/rpcd", "instances": { "instance1": { "command": [ "/sbin/rpcd", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1636 root 1200 S grep rpc

root@OpenWrt:~# uci set rpcd.@rpcd[-1].socket=/var/run/ubus.sock
root@OpenWrt:~# uci commit
root@OpenWrt:~# PROCD_DEBUG=1 /etc/init.d/rpcd start
{ "name": "rpcd", "script": "/etc/init.d/rpcd", "instances": { "instance1": { "command": [ "/sbin/rpcd", "-s", "/var/run/ubus.sock", "-t", "30" ] } }, "triggers": [ ], "data": { } }
root@OpenWrt:~# ps | grep rpc
1680 root 1200 S grep rpc

However, running ''rpcd'' manually works perfectly well (and fixes LuCI):

root@OpenWrt:~# rpcd

Workaround:

To work around the issue, it is necessary to:

  • add a ''rpcd'' section with either a ''socket'' or ''timeout'' option
  • reboot

At this point, ''rpcd'' is started correctly, and everything works fine. It is even possible to delete the ''rpcd'' section afterwards and restart ''rpcd'': it will still start correctly.
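For reference, the workaround amounts to having a section like the one below in /etc/config/rpcd before rebooting. It is written in UCI file syntax. The option values are the ones set with uci in the sessions above (socket /var/run/ubus.sock, timeout 30). According to the steps listed, either option on its own should be enough. This is a sketch of the relevant section only, not a copy of the stock file.

```
config rpcd
	option socket /var/run/ubus.sock
	option timeout 30
```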

Finding the root cause:

There are very few commits between 18.06.7 and 18.06.8. None of these commits touches ''procd'' or ''rpcd''.

However, there has been a libubox fix in 82fbd85. This is currently the prime suspect: I will try reverting this commit, and also testing with the later libubox fixes that have not yet been backported.
