setsockopt out of memory causes babeld failure #24

Open
bennlich opened this issue Mar 27, 2018 · 7 comments

@bennlich
Collaborator

bennlich commented Mar 27, 2018

Thanks to https://peoplesopen.net/monitor it is now easier to track this. See #8 for early observations.

The Bug

On a fresh boot of the psychz exit node, home nodes dig tunnels, babel babels, and everybody's routing tables get filled with mesh routes. But...

Over time (after about 24-48 hours), routes start to slowly disappear from the routing table, and they don't return until babeld and tunneldigger-broker are restarted on the exit node.

Debugging

This appears to be due to a memory leak in babeld. When the exit node is in the bad state, looking at /var/log/babeld.log during a tunnel connect shows:

Warning: cannot save old configuration for l2tp4061.
setsockopt(IPV6_JOIN_GROUP): Cannot allocate memory
setsockopt(IPV6_LEAVE_GROUP): Cannot assign requested address
Warning: cannot restore old configuration for l2tp4061.

i.e. babeld tries to join its IPv6 multicast group on the new interface and fails due to a memory allocation error.

When the exit node is in a healthy state, no such errors get logged to /var/log/babeld.log, and the mesh routes get added to the routing table as expected.

Conclusion

It looks like there's a socket option memory leak in babeld. I think we're only seeing this bug now in the last month because someone happens to be running a weird node that disconnects and reconnects its tunnel every 5 minutes. You can see this behavior by watching /var/log/syslog on the psychz node for 5 minutes.

Every time the rogue node destroys and recreates a tunnel, the tunneldigger up and down hooks are run, the old tunnel interface is removed from babeld (babeld -x $ifname) and the new tunnel interface is added (babeld -a $ifname).

It seems that removing an interface from babeld does not properly clean up all used memory, and eventually babeld is unable to setsockopt on new sockets.

Todo

Look into socket option memory allocation? Halp!

@jhpoelen
Contributor

jhpoelen commented Mar 27, 2018

@bennlich awesome! Perhaps we can hack on reproducing this in an isolated babeld stress test so we can easily know when future fixes are resolving the issue. Happy to help with this, although @Juul and others might have more experience with this.

jhpoelen pushed a commit to sudomesh/exitnode that referenced this issue Apr 1, 2018
@jhpoelen
Contributor

jhpoelen commented Apr 2, 2018

I have installed a babeld-monitor on both the HE and Psychz exit nodes to detect and apply a workaround for the issue reported in #24. Using a systemd timer, babeld-monitor.timer, the babeld log is scanned every 10 minutes for the specific memory error. If it is detected, babeld is restarted and all active tunnel interfaces are re-added to babeld. All of this can be observed in the systemd logs, and it is now also set up when using create_exitnode via the sudomesh/exitnode repo. Please see https://github.com/sudomesh/exitnode/tree/master/src/opt/babeld-monitor and https://github.com/sudomesh/exitnode/tree/master/src/etc/systemd/system if you'd like to learn more about this.
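
For readers who want the gist of the log check without opening the repo, here is a rough sketch in Python. It is not the actual babeld-monitor script; the function name `needs_restart` is made up for illustration, and the two error strings are simply the ones quoted in this thread.

```python
import re

# Both spellings of the failure appear in logs quoted in this thread.
JOIN_FAILURE = re.compile(
    r"setsockopt\(IPV6_JOIN_GROUP\): (Cannot allocate memory|Out of memory)"
)


def needs_restart(log_lines):
    """Return True if any line reports the IPV6_JOIN_GROUP allocation failure."""
    return any(JOIN_FAILURE.search(line) for line in log_lines)


# Sample lines copied from the log excerpt in the original report.
sample = [
    "Warning: cannot save old configuration for l2tp4061.",
    "setsockopt(IPV6_JOIN_GROUP): Cannot allocate memory",
]
print("restart needed" if needs_restart(sample) else "healthy")
```

A timer-driven job could feed the lines of /var/log/babeld.log into `needs_restart` and only then restart babeld and re-add the tunnel interfaces; that part is left out here because it depends on the local setup.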

@jhpoelen
Contributor

jhpoelen commented Apr 2, 2018

I hope we can remove this hack once the root cause of the babeld error can be found and fixed.

@bennlich
Collaborator Author

bennlich commented Apr 5, 2018

@jhpoelen nice haxxx! I'm reading up on systemd now... Would love to figure out the root cause too. Raw socket land seems like a daunting land tho. Maybe need to use a phone-a-friend.

@bennlich
Collaborator Author

bennlich commented Jun 6, 2018

bennlich changed the title from "Over time, routes disappear from the exit node routing table" to "setsockopt out of memory causes babeld failure" Sep 27, 2018
@bennlich
Collaborator Author

Tried to write a dead-simple stress test today at the software working group with @eenblam and @squeeesh, but we were unable to reproduce the bug. I think our test did not go quite deep enough--an strace of babeld showed that babeld was rarely calling setsockopt(IPV6_JOIN_GROUP).

Probably a better test would involve creating fresh network interfaces and adding to babeld instead of adding/removing my computer's default interface over and over again :-P I'm not sure what's a good way to create a bunch of functional network interfaces...
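
One possible shape for such a test, sketched in Python: generate the commands for a create/add/remove/delete cycle on throwaway dummy interfaces. This only builds command lists and runs nothing. The helper name `churn_commands`, the `stress` prefix, and the use of `type dummy` interfaces are assumptions for illustration, and it is untested whether dummy interfaces make babeld call setsockopt(IPV6_JOIN_GROUP) the way tunnel interfaces do. The `babeld -a` / `babeld -x` invocations are the ones quoted from the hooks earlier in this issue.

```python
def churn_commands(count, prefix="stress", delete_before_remove=True):
    """Yield one create/add/teardown cycle of commands per dummy interface.

    With delete_before_remove=True the interface is deleted before babeld
    is told to drop it; with False the order is reversed.
    """
    for i in range(count):
        ifname = f"{prefix}{i}"
        yield ["ip", "link", "add", ifname, "type", "dummy"]
        yield ["ip", "link", "set", ifname, "up"]
        yield ["babeld", "-a", ifname]  # flag as quoted in this issue
        teardown = [
            ["ip", "link", "delete", ifname],
            ["babeld", "-x", ifname],  # flag as quoted in this issue
        ]
        if not delete_before_remove:
            teardown.reverse()
        yield from teardown


for command in churn_commands(1):
    print(" ".join(command))
```

Each list could be handed to `subprocess.run` as root on a scratch machine. Keeping the teardown order switchable makes it easy to compare both orderings while watching babeld.log.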

Also, @eenblam noticed that in the re6stnet commit, they seem to suggest that their fix was to clean up their tunnels less aggressively. So: maybe babeld needs to setsockopt(IPV6_LEAVE_GROUP) /before/ tunneldigger obliterates the network interface. This would make some sense, as setsockopt(IPV6_LEAVE_GROUP) does expect to be passed an interface index (see https://tools.ietf.org/html/rfc3493#section-5.2).
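
To make the interface-index point concrete, here is a small Python sketch of the request structure that RFC 3493 section 5.2 describes: a 16-byte group address followed by an unsigned interface index. The helper names are invented for this example, and `ff02::1:6` is Babel's link-local multicast group. This is not babeld's code, just an illustration that join and leave both refer to the interface by index.

```python
import socket
import struct

BABEL_GROUP = "ff02::1:6"  # Babel's link-local multicast group


def ipv6_mreq(group, ifindex):
    """Pack struct ipv6_mreq: 16-byte group address, then unsigned int index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", ifindex)


def join_group(sock, ifindex):
    """Join the Babel group on the interface identified by ifindex."""
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                    ipv6_mreq(BABEL_GROUP, ifindex))


def leave_group(sock, ifindex):
    """Leave the Babel group; the same interface index is needed again."""
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_LEAVE_GROUP,
                    ipv6_mreq(BABEL_GROUP, ifindex))
```

If the interface is already gone by the time `leave_group` runs, the index no longer names anything, which would at least be consistent with the "Cannot assign requested address" line in the log. That reading is still a guess.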

And @squeeesh found this cool and terrifying network stress test lib https://github.com/dtaht/rtod.

@bennlich
Collaborator Author

Oh, and if it /is/ a matter of giving babeld a chance to LEAVE_GROUP before tunneldigger destroys the interface, tunneldigger's pre-down hook seems promising, except for the fact that:

the pre-down hook is not guaranteed to complete before the tunnel is shut down.

from https://github.com/wlanslovenija/tunneldigger/blob/master/HISTORY.rst.

(Hook scripts are executed in their own processes.)

jkilpatr added a commit to althea-net/rita that referenced this issue Feb 7, 2020
This resolves the issue described here sudomesh/bugs#24
Where babel will be unable to free its resources for the interface and run out of
memory
jkilpatr added a commit to althea-net/babeld that referenced this issue Mar 31, 2021
This pulls the latest version of the kernel_setup_interface function
in from upstream with the hope that it fixes some obscure issues we're
having.

setsockopt(IPV6_JOIN_GROUP): Out of memory
setsockopt(IPV6_LEAVE_GROUP): Address not available
Warning: cannot restore old configuration for wgA.
Warning: cannot save old configuration for wgB.

We keep seeing these sorts of error messages on long running production
nodes, presumably due to the race condition outlined here

sudomesh/bugs#24

Obviously it would be best if we could recover from these errors in
Babel rather than having to try and reduce them on the side of the
interfacing application.

That being said this isn't a well considered change, it may be that we
have to cleanup old_if in this error case in a way upstream has not
considered.
thosmos pushed a commit to thosmos/babeld that referenced this issue Aug 13, 2021
This pulls the latest version of the kernel_setup_interface function
in from upstream with the hope that it fixes some obscure issues we're
having.

setsockopt(IPV6_JOIN_GROUP): Out of memory
setsockopt(IPV6_LEAVE_GROUP): Address not available
Warning: cannot restore old configuration for wgA.
Warning: cannot save old configuration for wgB.

We keep seeing these sorts of error messages on long running production
nodes, presumably due to the race condition outlined here

sudomesh/bugs#24

Obviously it would be best if we could recover from these errors in
Babel rather than having to try and reduce them on the side of the
interfacing application.

That being said this isn't a well considered change, it may be that we
have to cleanup old_if in this error case in a way upstream has not
considered.
thosmos pushed a commit to thosmos/babeld that referenced this issue Aug 17, 2021
Update CHANGES.

Implement mandatory bits in all TLVs.

Big fixes while parsing sub-TLVs.

- Hello is not ignored if there is a mandatory sub-TLV,
- non-wildcard Updates also,
- Duplicated check for Requests,
- wrong size for the beginning of sub-TLVs for Seqno Requests,
- wrong size for source specific Requests and Seqno Requests.

Fix unlikely corner-cases (not bugs).

Fix parsing of sub-TLVs.

Update handling of sub-TLVs to comply with latest spec.

Remove keep_unfeasible, in compliance with rfc6126bis.

Ignore unicast Hellos (for now).

Implement unscheduled Hellos.

This also removes special casing of late Hellos.

Move hello history into a separate structure.

Maintain unicast Hello history.

Use unicast Hellos for reachability on wired links.

Update CHANGES.

Fix forgotten call to send_request_resend.

Take unicast Hellos into account when scheduling neighbours check.

Update CHANGES.

Fix typo in send_request.

Remove calls to send wildcard requests.

Since send_request was buggy, these weren't doing anything.  Don't change
the behaviour, sending wildcard requests at startup is not a good idea.

Fix parsing of source prefix length in filters for IPv4 routes.

Fix parser memory leaks.

Fix: ignore peer address of point-to-point interfaces.

Point-to-point interfaces are bound to two link-local addresses: the
local address and the peer address [1].  The former is advertised with
an IFA_LOCAL TLV and the latter with an IFA_ADDRESS TLV.

[1] $ip addr show [...]
    inet6 fe80::1234 peer fe80::5678/128 scope link

Improve the test scripts in tests/:

This commit improves the tests by making the rtt test almost identical
to multihop-gdb (formerly known as multihop-hand). Also, it fixes a
small grep problem in multihop-smoketest (formerly known as
multihop-basic).

Another thing which happens here is some name changes to more
descriptive names.

travis tests

A quick set of compilation, linting and integration tests to run
on patches

Improve the price/quality multiplier:

This commit intends to harden and document the price/quality tradeoff
knob better. Here's what it does:

* Change its name to quality_multiplier
* Change its type to uint16_t
* Start using strtoul() to parse it from the command line
* Add an entry for -a in the manpage and the usage string

Tidy up the Althea extensions to Babel

This commit aims to make Althea-specific changes to Babel more robust
and integrated with the implementation.

babeld.c:
* Switch the price to the uint32_t type
* Check the price and multiplier more strictly
* Add both of the new flags to the usage message

babeld.man:
* Add entries for the price and the quality multiplier

configuration.c:
* Make getuint() check for errno after the strtoul() call
* Add a config option for the quality multiplier

local.c:
* Explicitly list "full-path-rtt" in dumps (used to be just "rtt")

message.c:
* Correct variable names to explicitly mention the full path RTT

util.c:
* Remove the redundant parse_price function

xroute.c:
* Hardcode a 0 price value in add_xroute so that we don't bill our
neighbours and only for forwarding.

tests/multihop-gdb-rtt.sh:
* typo

Rename price to fee

This commit aims to make the price metric code more intuitive by
renaming a node's profits from "per_byte_cost" to "fee" so that all
price-related variables can be viewed from the running node's
perspective, e.g.:

Alice runs a node which takes a *fee* of $5 for forwarding a byte. It
receives several *prices* from her neighbours Bob, Kevin and Charlotte.
She then computes a new *price* equal to *received_price + fee* for
every non-xroute route she wants to advertise.
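
A tiny Python sketch of that rule, with made-up neighbour prices. The function name `advertised_price` is hypothetical, and the zero price for xroutes is taken from the earlier add_xroute note in this commit list, so treat that branch as an assumption rather than confirmed behaviour.

```python
def advertised_price(received_price, fee, is_xroute=False):
    """Price a node advertises for a route, seen from that node's perspective."""
    if is_xroute:
        # Assumption: locally exported routes carry a price of 0.
        return 0
    return received_price + fee


alice_fee = 5
# Hypothetical prices received from Alice's neighbours.
received = {"Bob": 10, "Kevin": 20, "Charlotte": 0}
for neighbour, price in received.items():
    print(neighbour, advertised_price(price, alice_fee))
```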

The previous approach would name nearly every aspect of the price differently:
* The CLI argument was `-P`
* The socket config value was `price`
* The C code fee variable was `per_byte_cost`

With this commit these names change as follows:
* CLI arg is now `-F`
* The socket config value is now `fee`
* The C code fee variable is now just `fee`

The price-related members of the different route structs around the code
are still named `price`, because a *price* means a "retail" advertised
(to us or by us) value with all fees included.

Weatherproof the tests:

This commit tries to make obvious errors easier to catch both in CI and
manual testing.

Note: multihop-smoketest.sh contains commented out suboptimal routes.
Even though Babel is capable of converging on the best routes in mere
seconds, the suboptimal paths in the graph are often incorrect or
possibly riddled with cycles. The suboptimal routes will get
uncommented/deleted once the problem is resolved/explained.

tests/multihop-gdb-rtt.sh:
tests/multihop-gdb.sh:
* Change netlab-4 fee to 7 - no two prices give the same sum anymore

tests/multihop-smoketest.sh:
* Extend the node layout to a 4-node diamond
* Minimize hello and update intervals to speed up convergence
* Decrease the delay to 5 seconds (12 times quicker, baby!)
* Strengthen route checking - installed route optimality is now
precisely verified

Useless initialization (do_filter do the job).

Rename price to fee in the usage string

Account for the 0% loss breaking change in netem

test.sh: typo: cppchecki -> cppcheck

Add a script for backwards compatibility testing:

This commit introduces a simple script based on multihop-smoketest.sh
which takes any two revisions of Babel and checks whether they can talk
to each other.

.gitignore:
* Ignore test-time temporary repos

test.sh:
* Add the compat test

Add reachability testing to the smoketest

Make all statements follow debugging rules

This debugging statement was wrapped in a printf instead of a debugf
resulting in noisy normal operation

Useless initialization (do_filter do the job).

Update CHANGES for 1.8.1.

Fix parsing of source length in filters.

This fixes a bug that was introduced in commit 4f4e3cb, and prevented
non-source-specific IPv4 routes from being redistributed.  Thanks
to Niklas Yann Wettengel for the detective work.

Update CHANGES for 1.8.2.

Tests: change ports to Rita integration test ones

Make test scripts automatically cd into the test dir

Fix runtime fee changes

The problem was an old name for the "fee" config value (used to be
"price") which was included in a parser if statement which would exclude
unrecognized keywords.

Harden and rename the getuint() function (getuint32_t() from now on)

Modify price/quality behavior

This commit implements the log2()-based price/quality metric:

metric(p, m, f) = log2(p) + log2(m) * f

p: The price
m: Babel's traditional quality-based metric value
f: The metric factor - decides how much we value metric (quality)
improvement vs. price improvement when comparing routes
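
As a worked illustration of the formula, here is a short Python sketch. The name `route_metric` and the division by 1000 are assumptions: the scaling is inferred from the "1900 (1.9)" default mentioned below and may not match how the C code stores or applies the factor. Inputs must be positive because log2 is undefined at zero; how the real implementation treats a zero price is not stated here.

```python
import math


def route_metric(price, quality_metric, metric_factor=1900):
    """metric(p, m, f) = log2(p) + log2(m) * f, with f given in thousandths."""
    if price <= 0 or quality_metric <= 0:
        raise ValueError("price and quality metric must be positive")
    factor = metric_factor / 1000.0  # assumed scaling: 1900 means 1.9
    return math.log2(price) + math.log2(quality_metric) * factor


# With a factor of 1.0, a price of 8 and a quality metric of 4 give 3 + 2.
print(route_metric(8, 4, metric_factor=1000))
```

A larger factor makes the quality term dominate, so a cheap but lossy route loses out to a pricier, cleaner one.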

Makefile:
* Link libm to get math.h to work

babeld.c:
* Change quality_multiplier to metric_factor, widen it to uint32_t and
change the default to 1900 (1.9)
* Change the metric factor's CLI option to 'q'

babeld.h:
disambiguation.c:
local.c:
message.c:
neighbour.c:
resend.c:
route.h:
source.c:
xroute.c
* INFINITY -> BABEL_INFINITY to resolve a conflict with math.h

configuration.c:
* INFINITY -> BABEL_INFINITY to resolve a conflict with math.h
* Add metric-factor to the weird blacklist if in parse_option()

route.c:
* INFINITY -> BABEL_INFINITY to resolve a conflict with math.h
* Implement the new metric formula

tests/multihop-smoketest.sh:
* Account for the metric factor option name change
* Add a debug mode stop before test assertions too

Don't use %d for unsigned numbers

This caused our unsigned integer prices to be interpreted and
printed as signed integers.
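
The effect can be reproduced outside C. This Python sketch reinterprets the bits of a 32-bit unsigned value as a signed one, which is roughly what passing a uint32_t to a %d conversion does on common two's-complement platforms. The helper name `as_signed_32` is made up for the example.

```python
import struct


def as_signed_32(value):
    """Reinterpret the bits of a 32-bit unsigned integer as a signed one."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


print(as_signed_32(5))              # small values are unaffected
print(as_signed_32(4_000_000_000))  # values above 2**31 - 1 come out negative
```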

Increase allowed management socket connections

FIX: NO SUCH DEVICE when adding routes

When adding unreachable routes and setting the RTNH_F_ONLINK flag, a
device is required to be specified. In Linux kernel 4.16 support for
this flag was added. Until now it was ignored.
If RTNH_F_ONLINK is specified while the device is missing, newer kernels
will respond with No such device.
The result is:
* spam in the log file
* missing routes for both ipv4 and ipv6

Pull in upstream kernel_setup_interface

This pulls the latest version of the kernel_setup_interface function
in from upstream with the hope that it fixes some obscure issues we're
having.

setsockopt(IPV6_JOIN_GROUP): Out of memory
setsockopt(IPV6_LEAVE_GROUP): Address not available
Warning: cannot restore old configuration for wgA.
Warning: cannot save old configuration for wgB.

We keep seeing these sorts of error messages on long running production
nodes, presumably due to the race condition outlined here

sudomesh/bugs#24

Obviously it would be best if we could recover from these errors in
Babel rather than having to try and reduce them on the side of the
interfacing application.

That being said this isn't a well considered change, it may be that we
have to cleanup old_if in this error case in a way upstream has not
considered.

Set MAX_INTERFACES = 2

increase max interfaces to 10000