Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Neuron crashes when I create number of nodes (160 ~ 200). #1566

Open
itwasme-ulrich opened this issue Sep 26, 2023 · 6 comments
Open

Neuron crashes when I create number of nodes (160 ~ 200). #1566

itwasme-ulrich opened this issue Sep 26, 2023 · 6 comments
Assignees

Comments

@itwasme-ulrich
Copy link

  • System: Ubuntu 22.04
  • Architecture: amd64 x86_64 - Core I5 11300H - 8GB RAM

I've just run a few performance tests because the project I'm working on requires at least 1000 Nodes. However, I encountered some issues when creating the necessary number of nodes (just creating, without any groups or tags).

Firstly, I installed Neuron by downloading the neuron-2.5.2-linux-amd64-en.deb package from the official website. Then, I added Modbus-TCP Node Plugins using a Template via both the Post API and manually on the dashboard. When creating about 165~168 Nodes, Neuron started crashing, and I couldn't access Neuron again through any means, even after completely pkill and restarting it. If I want to use Neuron, I'm forced to delete all data and reinstall it. The last lines of the neuron.log don't seem to provide any useful information.:

2023-09-22 12:34:10 [ERROR] storage.c:128 load node164 setting fail
2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1007, epoll: 1006, ret: 0
2023-09-22 12:34:10 [NOTICE] event_linux.c:213 add timer, second: 30, millisecond: 0, timer: 1010 in epoll 1006, ret: 0
2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node164
2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node164 plugin:Modbus TCP state:1
2023-09-22 12:34:10 [NOTICE] manager.c:283 bind node node164 to pipe(681851086)
2023-09-22 12:34:10 [WARN] persist.c:1257 SQL SELECT setting FROM settings WHERE node_name=? with node165 fail: no more rows available
2023-09-22 12:34:10 [ERROR] storage.c:128 load node165 setting fail
2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1013, epoll: 1012, ret: 0
2023-09-22 12:34:10 [NOTICE] event_linux.c:213 add timer, second: 30, millisecond: 0, timer: 1016 in epoll 1012, ret: 0
2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node165
2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node165 plugin:Modbus TCP state:1
2023-09-22 12:34:10 [NOTICE] manager.c:283 bind node node165 to pipe(681851089)
2023-09-22 12:34:10 [WARN] persist.c:1257 SQL SELECT setting FROM settings WHERE node_name=? with node166 fail: no more rows available
2023-09-22 12:34:10 [ERROR] storage.c:128 load node166 setting fail
2023-09-22 12:34:10 [NOTICE] event_linux.c:256 add io, fd: 1018, epoll: 1019, ret: 0
2023-09-22 12:34:10 [NOTICE] event_linux.c:213 add timer, second: 30, millisecond: 0, timer: 1022 in epoll 1019, ret: 0
2023-09-22 12:34:10 [NOTICE] adapter.c:198 Success to create adapter: node166
2023-09-22 12:34:10 [NOTICE] storage.c:371 load adapter success type:1, name:node166 plugin:Modbus TCP state:1

I'm assuming the issue lies with the database, so I used 'Build From Source' from the GitHub repository. With Neuron version 2.6.0, I created nodes using the same method mentioned earlier, and the number of nodes stopped at 200, then it crashed completely. Here are the last lines of the neuron.log. :

2023-09-26 12:43:40:759 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node198 fail: no more rows available
2023-09-26 12:43:40:759 [WARN] storage.c:128 load node198 setting fail
2023-09-26 12:43:40:759 [NOTICE] event_linux.c:293 add io, fd: 1011, epoll: 1010, ret: 0
2023-09-26 12:43:40:759 [NOTICE] manager_internal.c:590 instantiate tmpl:tcp-template node:node198 success
2023-09-26 12:43:40:760 [NOTICE] manager.c:257 bind node node198 to pipe(106500235)
2023-09-26 12:43:40:760 [NOTICE] http.c:54 <0x6150000a0800> POST /api/v2/template/inst [200]
2023-09-26 12:43:40:828 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200]
2023-09-26 12:43:40:929 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/plugin [200]
2023-09-26 12:43:40:931 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1 [200]
2023-09-26 12:43:40:946 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/node/setting?node=node198 [200]
2023-09-26 12:43:40:946 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/schema?schema_name=modbus-tcp [200]
2023-09-26 12:43:43:016 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200]
2023-09-26 12:43:43:017 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200]
2023-09-26 12:43:43:032 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node/state [200]
2023-09-26 12:43:43:733 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/plugin [200]
2023-09-26 12:43:48:260 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/template [200]
2023-09-26 12:43:51:840 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node199 fail: no more rows available
2023-09-26 12:43:51:840 [WARN] storage.c:128 load node199 setting fail
2023-09-26 12:43:51:840 [NOTICE] event_linux.c:293 add io, fd: 1016, epoll: 1015, ret: 0
2023-09-26 12:43:51:840 [NOTICE] manager_internal.c:590 instantiate tmpl:tcp-template node:node199 success
2023-09-26 12:43:51:842 [NOTICE] manager.c:257 bind node node199 to pipe(106500237)
2023-09-26 12:43:51:842 [NOTICE] http.c:54 <0x6150000a0800> POST /api/v2/template/inst [200]
2023-09-26 12:43:51:901 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1&node=&plugin= [200]
2023-09-26 12:43:51:969 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/node?type=1 [200]
2023-09-26 12:43:51:972 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200]
2023-09-26 12:43:51:990 [NOTICE] http.c:54 <0x6150000a0800> GET /api/v2/schema?schema_name=modbus-tcp [200]
2023-09-26 12:43:51:991 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node/setting?node=node199 [200]
2023-09-26 12:43:54:058 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node?type=1&node=&plugin= [200]
2023-09-26 12:43:54:058 [NOTICE] http.c:54 <0x615000010700> GET /api/v2/plugin [200]
2023-09-26 12:43:54:072 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/node/state [200]
2023-09-26 12:43:55:053 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/plugin [200]
2023-09-26 12:43:57:818 [NOTICE] http.c:54 <0x6150000b0700> GET /api/v2/template [200]

I also encounter a similar issue when using it on Raspberry Pi 3B+.

On the Docs page, I can see performance tests with up to 1000 nodes, so I'm wondering if you have any solutions to this problem.

@fengzeroz fengzeroz added the bug Something isn't working label Sep 28, 2023
@fengzeroz
Copy link
Member

This issue may have been fixed in the 2.6 branch, and can be tested with the latest code from the 2.6 branch.
There may be a problem with nng pair1, which drops the message.

@baobuibk
Copy link

baobuibk commented Oct 8, 2023

Hi,

In version 2.6, is there a limit of 30 tag for modbus TCP?

@fengzeroz
Copy link
Member

Yes, the release package Modbus RTU/TCP will also be limited by license, you can compile it yourself using the source code.

@itwasme-ulrich
Copy link
Author

I 've just git clone -b v2.6, it seems like it hasn't been fixed yet. First, I noticed that the test results are quite similar to the compiled version 2.5.2 that I downloaded from neugates.io/downloads?os=Linux. This means that I can only create up to Node 167, and Node 168 will crash." From what I've observed, every time I run ./neuron, it reloads everything in the database - */neuron/build/persistence/sqlite.db . So, I created a prepopulated database similar to the ones Neuron generates during operation. This time, I stored around 170 Nodes, and here are the log results when running it.

2023-10-09 18:23:14:835 [NOTICE] adapter.c:252 Success to create adapter: node165, fd: 1002
2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node165 plugin:Modbus TCP state:1
2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node165 to src addr(49823)
2023-10-09 18:23:14:836 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node166 fail: no more rows available
2023-10-09 18:23:14:836 [WARN] storage.c:128 load node166 setting fail
2023-10-09 18:23:14:836 [NOTICE] event_linux.c:293 add io, fd: 1008, epoll: 1010, ret: 0
2023-10-09 18:23:14:836 [NOTICE] event_linux.c:245 add timer, second: 30, millisecond: 0, timer: 1012 in epoll 1010, ret: 0
2023-10-09 18:23:14:836 [NOTICE] adapter.c:252 Success to create adapter: node166, fd: 1008
2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node166 plugin:Modbus TCP state:1
2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node166 to src addr(51087)
2023-10-09 18:23:14:836 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with node167 fail: no more rows available
2023-10-09 18:23:14:836 [WARN] storage.c:128 load node167 setting fail
2023-10-09 18:23:14:836 [NOTICE] event_linux.c:293 add io, fd: 1014, epoll: 1016, ret: 0
2023-10-09 18:23:14:836 [NOTICE] event_linux.c:245 add timer, second: 30, millisecond: 0, timer: 1018 in epoll 1016, ret: 0
2023-10-09 18:23:14:836 [NOTICE] adapter.c:252 Success to create adapter: node167, fd: 1014
2023-10-09 18:23:14:836 [NOTICE] storage.c:383 load adapter success type:1, name:node167 plugin:Modbus TCP state:1
2023-10-09 18:23:14:836 [NOTICE] manager.c:278 bind node node167 to src addr(57048)
2023-10-09 18:23:31:856 [NOTICE] rest.c:150 Uninitialize plugin: neuron-default-dashboard
2023-10-09 18:23:31:856 [NOTICE] event_linux.c:305 del io: 10 from epoll: 12
2023-10-09 18:23:31:856 [NOTICE] event_linux.c:257 del timer: 16 from epoll: 12
2023-10-09 18:23:31:856 [NOTICE] adapter.c:1228 Stop the adapter(default-dashboard)
2023-10-09 18:23:31:856 [NOTICE] adapter.c:1087 default-dashboard send uninit msg to manager failed
2023-10-09 18:23:31:856 [NOTICE] adapter.c:1179 adapter default-dashboard destroy
2023-10-09 18:23:31:856 [NOTICE] rest.c:127 Success to free plugin: neuron-default-dashboard
2023-10-09 18:23:31:857 [WARN] persist.c:1294 SQL SELECT setting FROM settings WHERE node_name=? with default-dashboard fail: no more rows available
2023-10-09 18:23:31:857 [WARN] storage.c:128 load default-dashboard setting fail
2023-10-09 18:23:31:857 [WARN] log.c:59 directory logs contains no log files of default-dashboard
2023-10-09 18:23:32:609 [WARN] main.c:54 recv sig: 2
2023-10-09 18:23:32:609 [NOTICE] event_linux.c:257 del timer: -1 from epoll: 8
2023-10-09 18:23:32:611 [WARN] main.c:54 recv sig: 6
2023-10-09 18:23:32:647 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:648 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1

2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:649 [WARN] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 1
2023-10-09 18:23:32:652 [NOTICE] event_linux.c:305 del io: 264 from epoll: 266
2023-10-09 18:23:32:652 [NOTICE] event_linux.c:305 del io: 96 from epoll: 98
2023-10-09 18:23:32:652 [NOTICE] event_linux.c:305 del io: 204 from epoll: 206

As you can see, Neuron can only load up to Node 167, after which it cannot load anymore, resulting in a crash. In a manual approach, meaning adding Node 166 , Node 167 then Node 168 still leads to similar results. However, in these recent attempts, although Neuron still crashes, the issue of 200% CPU usage no longer occurs.

I also tried git clone -b main , and with this compilation, I could still only create up to 200 nodes. However, the difference is when I tried it with a pre-created database containing approximately 205 nodes. The log looks like this:

2023-10-09 18:48:12:778 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with Node200 fail: no more rows available
2023-10-09 18:48:12:778 [WARN] [neuron] storage.c:128 load Node200 setting fail
2023-10-09 18:48:12:778 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1012, epoll: 1014, ret: 0
2023-10-09 18:48:12:778 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:Node200 plugin:Modbus TCP state:1
2023-10-09 18:48:12:778 [NOTICE] [neuron] manager.c:253 bind node Node200 to src addr(46732)
2023-10-09 18:48:12:778 [NOTICE] [Node201] modbus_tcp.c:110 Node201 init success
2023-10-09 18:48:12:778 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with Node201 fail: no more rows available
2023-10-09 18:48:12:778 [WARN] [neuron] storage.c:128 load Node201 setting fail
2023-10-09 18:48:12:778 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1017, epoll: 1019, ret: 0
2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:Node201 plugin:Modbus TCP state:1
2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:253 bind node Node201 to src addr(52978)
2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0
2023-10-09 18:48:12:779 [NOTICE] [node202] modbus_tcp.c:110 node202 init success
2023-10-09 18:48:12:779 [WARN] [neuron] persist.c:1296 SQL SELECT setting FROM settings WHERE node_name=? with node202 fail: no more rows available
2023-10-09 18:48:12:779 [WARN] [neuron] storage.c:128 load node202 setting fail
2023-10-09 18:48:12:779 [NOTICE] [neuron] event_linux.c:293 add io, fd: 1022, epoll: -1, ret: -1
2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0
2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:node202 plugin:Modbus TCP state:1
2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:253 bind node node202 to src addr(58539)
2023-10-09 18:48:12:779 [NOTICE] [neuron] storage.c:383 load adapter success type:1, name:node203 plugin:Modbus TCP state:1
2023-10-09 18:48:12:779 [WARN] [neuron] event_linux.c:126 event loop exit, errno: Bad file descriptor(9), stop: 0
2023-10-09 18:48:12:779 [NOTICE] [neuron] event_linux.c:245 add timer, second: 0, millisecond: 10, timer: -1 in epoll 8, ret: -1
2023-10-09 18:48:12:779 [NOTICE] [neuron] manager.c:155 manager start

I thinks epoll = -1 means that Neuron has stopped functioning logically.

In additional, all of the above means that if I delete the database down to Node 166 (v2.6) or Node 199 (main), Neuron can still start up / work normally."

@fengzeroz
Copy link
Member

fengzeroz commented Oct 10, 2023

It is possible that there are not enough file descriptors, try to change the file descriptor limits. Use ulimit -n to view the current fd limit, /proc/$pid/fd directory to see the file descriptors used by processes.

@eeff
Copy link
Member

eeff commented Oct 10, 2023

It is possible that there are not enough file descriptors, try to change the file descriptor limits. Use ulimit -n to view the current fd limit, /proc/$pid/fd directory to see the file descriptors used by processes.

@itwasme-ulrich your problem is highly probable due to the file descriptor resource limit.

@fengzeroz fengzeroz removed the bug Something isn't working label Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants