Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suricata: regular lock ups in IPS mode #710

Closed
oparoz opened this issue Jan 26, 2016 · 14 comments
Closed

Suricata: regular lock ups in IPS mode #710

oparoz opened this issue Jan 26, 2016 · 14 comments

Comments

@oparoz
Copy link
Contributor

oparoz commented Jan 26, 2016

Env

  • 16.1-pre
  • LAN with multiple VLANs
  • Firewall rules to route via different WANs
  • IPS enabled on re0
  • Squid enabled. Both transparent mode and with user logins

Problem

Serious lockups which block the entire LAN. The only solutions is to have physical access and restart suricata.

Interface

re0@pci0:2:0:0: class=0x020000 card=0x012310ec chip=0x816810ec rev=0x0c hdr=0x00
    vendor     = 'Realtek Semiconductor Co., Ltd.'
    device     = 'RTL8111/8168B PCI Express Gigabit Ethernet controller'
    class      = network
    subclass   = ethernet
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=182088<VLAN_MTU,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE,NETMAP>

Logs

suricata.log

26/1/2016 -- 16:31:45 - <Notice> - This is Suricata version 3.0RC3 RELEASE
26/1/2016 -- 16:31:45 - <Warning> - [ERRCODE: SC_ERR_SYSCALL(50)] - Failure when trying to get MTU via ioctl for 're0+': Device not configured (6)
26/1/2016 -- 16:31:52 - <Warning> - [ERRCODE: SC_ERR_NO_RULES(42)] - No rules loaded from /usr/local/etc/suricata/opnsense.rules/rbn-malvertisers.rules
26/1/2016 -- 16:31:59 - <Notice> - all 2 packet processing threads, 4 management threads initialized, engine started.

procstat when "locked up"

  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
31782 100174 suricata         -                  2  120 sleep   nanslp
31782 100259 suricata         NetmapPktre01      3    8 sleep   select
31782 100260 suricata         NetmapPktre0+1     3  120 sleep   select
31782 100261 suricata         FlowManagerThre    3  120 sleep   uwait
31782 100262 suricata         FlowRecyclerThr    1  120 sleep   uwait
31782 100263 suricata         StatsWakeupThre    1  120 sleep   uwait
31782 100264 suricata         StatsMgmtThread    1  120 sleep   uwait
@oparoz oparoz changed the title Suricata IPS not properly Suricata IPS: Device not properly configured Jan 26, 2016
@fichtner
Copy link
Member

Please stop posting issues without proper descriptions. I'm going to delete this in 10 minutes if there is no hint of what this actually refers to.

@oparoz
Copy link
Contributor Author

oparoz commented Jan 26, 2016

It was a keystroke mistake which accidentally created the issue before it was ready...

@fichtner
Copy link
Member

At least SC_ERR_SYSCALL is harmless and already reported upstream. https://redmine.openinfosecfoundation.org/issues/1606

Did you indeed select no rules but still enabled IPS?

@oparoz
Copy link
Contributor Author

oparoz commented Jan 26, 2016

OK about SC_ERR_SYSCALL

I do have rules enabled, there just isn't one named rbn-malvertisers. There is a rbn ruleset in the list, but it's not enabled.

How do I trace the lockups? process jumps from 2% to 10% and there seems to be nothing in the logs and it's not related to a rule either.

That's the stats before the last restart

-------------------------------------------------------------------
Date: 1/26/2016 -- 16:28:52 (uptime: 0d, 00h 56m 43s)
-------------------------------------------------------------------
Counter                   | TM Name                   | Value
-------------------------------------------------------------------
capture.kernel_packets    | Total                     | 215375
capture.kernel_drops      | Total                     | 74
decoder.pkts              | Total                     | 215375
decoder.bytes             | Total                     | 100771964
decoder.invalid           | Total                     | 396
decoder.ipv4              | Total                     | 214502
decoder.ipv6              | Total                     | 456
decoder.ethernet          | Total                     | 215375
decoder.tcp               | Total                     | 208678
decoder.udp               | Total                     | 6150
decoder.icmpv4            | Total                     | 58
decoder.icmpv6            | Total                     | 21
decoder.vlan              | Total                     | 214468
decoder.teredo            | Total                     | 1
decoder.avg_pkt_size      | Total                     | 467
decoder.max_pkt_size      | Total                     | 2044
tcp.sessions              | Total                     | 3548
tcp.pseudo                | Total                     | 823
tcp.invalid_checksum      | Total                     | 9
tcp.syn                   | Total                     | 3695
tcp.synack                | Total                     | 3523
tcp.rst                   | Total                     | 721
tcp.stream_depth_reached  | Total                     | 3
tcp.reassembly_gap        | Total                     | 2
detect.alert              | Total                     | 4
flow_mgr.closed_pruned    | Total                     | 3386
flow_mgr.new_pruned       | Total                     | 914
flow_mgr.est_pruned       | Total                     | 589
flow.spare                | Total                     | 10000
tcp.memuse                | Total                     | 393216
tcp.reassembly_memuse     | Total                     | 12320544
http.memuse               | Total                     | 54670
flow.memuse               | Total                     | 6783232

@oparoz
Copy link
Contributor Author

oparoz commented Jan 26, 2016

procstat when "locked up" is not different than when it's running properly.

  PID    TID COMM             TDNAME           CPU  PRI STATE   WCHAN
31782 100174 suricata         -                  2  120 sleep   nanslp
31782 100259 suricata         NetmapPktre01      3    8 sleep   select
31782 100260 suricata         NetmapPktre0+1     3  120 sleep   select
31782 100261 suricata         FlowManagerThre    3  120 sleep   uwait
31782 100262 suricata         FlowRecyclerThr    1  120 sleep   uwait
31782 100263 suricata         StatsWakeupThre    1  120 sleep   uwait
31782 100264 suricata         StatsMgmtThread    1  120 sleep   uwait

@oparoz oparoz changed the title Suricata IPS: Device not properly configured Suricata: regular lock ups in IPS mode Jan 26, 2016
@oparoz
Copy link
Contributor Author

oparoz commented Jan 28, 2016

I'm still testing this. I've tried using polling, but apart from burning my CPUs, it didn't solve anything.
The last time it happened I had intr using more CPU than it should, but I don't know if it's because it was waiting on Suricata or blocking it.

@oparoz
Copy link
Contributor Author

oparoz commented Jan 29, 2016

I'm not experiencing the issue any more on 16.1 with Suricata 3.0 (release) compiled with clang37, so closing this.

@oparoz oparoz closed this as completed Jan 29, 2016
@fichtner
Copy link
Member

Good to know, thank you. 10.2 has clang 3.4 in base. I suspect any gcc would be fine as well?

@oparoz
Copy link
Contributor Author

oparoz commented Jan 29, 2016

Of course, as I wrote this, the problem happened again :D.

I've got a bit more info this time, but I don't think that's something you can fix as it's probably driver related. Any hints of a tunable which may help would be appreciated though. I've turned powerd off and I'll see if it changes anything.

So it's an interrupt which "locks up": irq259: re0

  PID    TID COMM             TDNAME           KSTACK
   12 100007 intr             swi3: vm
   12 100008 intr             swi4: clock      mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100009 intr             swi4: clock
   12 100010 intr             swi4: clock
   12 100011 intr             swi4: clock
   12 100012 intr             swi1: netisr 0   mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100025 intr             swi5: fast taskq
   12 100032 intr             swi6: task queue mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100033 intr             swi6: Giant task mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100034 intr             irq16: sdhci_pci mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100035 intr             irq256: ahci0    mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100036 intr             irq257: xhci0    mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100041 intr             irq258: hdac0    mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100042 intr             irq259: re0      mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100043 intr             irq260: re1      mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
   12 100046 intr             swi0: uart uart
   12 100049 intr             swi1: pf send
   12 100051 intr             swi1: pfsync     mi_switch+0xe1 ithread_loop+0x190 fork_exit+0x9a fork_trampoline+0xe
interrupt                          total       rate
irq16: sdhci_pci0                      1          0
cpu0:timer                        975907       1136
irq256: ahci0                      13009         15
irq257: xhci0                       4619          5
irq258: hdac0                          3          0
irq259: re0                      7172214       8349
irq260: re1                       383069        445
cpu3:timer                        287266        334
cpu1:timer                        307986        358
cpu2:timer                        285682        332
Total                            9429756      10977

@oparoz
Copy link
Contributor Author

oparoz commented Jan 29, 2016

At one point, restarting Suricata wouldn't help. All I got was

re0 Watchdog Timeout Error

A reboot was necessary.

To test I'm using torrents which create lots of connections.

@oparoz
Copy link
Contributor Author

oparoz commented Jan 30, 2016

I've played with a lot of tunables and it's always the same symptoms. At some point Suricata simply can't talk to the network card any more. It doesn't matter if the NIC is using interrupts, a fast queue or polling. Although the drivers can be blamed (I'm going to compile the ones from Realtek to see if it makes a difference), maybe netmap needs to be updated (I'v noticed lots of movement in the repository these past months) or simply Suricata is not ready yet for netmap.

@fichtner
Copy link
Member

I'm still suspecting re(4) issues over everything else.

@oparoz
Copy link
Contributor Author

oparoz commented Feb 2, 2016

It's a good thing netmap comes with a packed generator, it makes things easier to test (100% reproducible).
I've tested the drivers and netmap which come with 11-CURRENT and get the same results. I've tried to patch the official Realtek driver, but it doesn't work as expected.

The problem has been reported upstream since the driver modifications are provided by the netmap package. I suspect netmap is not piloting the card properly and is flooding it with more information that it can handle.

@oparoz
Copy link
Contributor Author

oparoz commented Feb 6, 2016

This can be temporarily fixed by enabling jumbo frames on that interface.

I suspect this workaround in the netmap header is not working

  • This device uses all the buffers in the ring, so we need
  • another termination condition in addition to RL_RDESC_STAT_OWN
  • cleared (all buffers could have it cleared). The easiest one is to stop right before nm_hwcur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants