gluon-core: babel: broken DNS #1500

Closed
blocktrron opened this issue Aug 3, 2018 · 20 comments
Labels
0. type: bug This is a bug 3. topic: babel Topic: Babel Layer 3 Routing

@blocktrron
Member

On-node DNS resolution is broken as of b3d7011. A quick fix was applied by reverting that commit in 78ed75e.

Solution by @christf is pending.

@christf christf added the 3. topic: babel Topic: Babel Layer 3 Routing label Aug 3, 2018
@christf
Member

christf commented Aug 3, 2018

Because of the revert, batman-based nodes are unaffected. Babel nodes with IPv6 on their WAN ports may be affected:
DNS requests may leave the node through the wrong interface, which breaks DNS.
As a consequence, clients will not have a working network and the autoupdater will not work.

Besides this hack in commit b3d7011 we could consider adding src stanzas to all the babel-routes. I am working on a patch for this.

@christf
Member

christf commented Aug 4, 2018

A babel patch exists now. This issue will be fixed by using a babel version based on https://github.com/christf/babeld/tree/PREFSRC and an appropriate babeld configuration.

  • merge PREFSRC support in babeld https://github.com/christf/babeld/tree/PREFSRC (in the meantime use babeldev package from ffffm package repository.)
  • implement configuration for gluon-mesh-babel utilizing the new pref-src babel feature

@rotanid rotanid changed the title gluon-core: broken DNS gluon-core: babel: broken DNS Aug 7, 2018
@rotanid
Member

rotanid commented Aug 19, 2018

@christf maybe you can post a short update on your progress and plans, i think we should talk about removing the feature again if there's no chance to get it fixed completely soon?

@christf
Member

christf commented Aug 26, 2018

Following a conversation on the babeld mailing list, I prepared a patch that is ready and working. A PR has been raised upstream in babeld, see jech/babeld#18. Unfortunately it is not merged yet. The only symptom of this feature not being merged is that on multi-homed nodes (having IPv6 on WAN) the kernel decides which outbound IP address is used when doing requests, sometimes causing DNS leaks.

IMHO not a big deal. Still, it would be nice to get this fixed, which will happen after prefsrc is merged. In babeld, tests are under way to prepare a merge of the unicast branch, on which all of this is based.

There is no need to revert anything in gluon.
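
To see which outbound address and interface the kernel would pick for a given destination on a multi-homed node, one can look at `ip -6 route get`. The following is only a minimal sketch: it assumes iproute2 (`ip-full`) output format, and the sample line and addresses (taken from the 2001:db8::/32 documentation prefix) are made up for illustration. The helper name `route_get_field` is hypothetical.

```shell
#!/bin/sh
# Extract one field ("dev" or "src") from a line of `ip -6 route get` output.
# Assumes the iproute2 output format, where each keyword is followed by its value.
route_get_field() {
    # $1: keyword to look up; the route line is read from stdin
    awk -v key="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == key) { print $(i + 1); exit }
    }'
}

# Canned example line (hypothetical addresses). On a node one would instead run:
#   ip -6 route get <nameserver-address> | route_get_field src
sample='2001:db8::53 from :: via fe80::1 dev br-wan src 2001:db8:1::2 metric 512 pref medium'

echo "$sample" | route_get_field dev   # prints br-wan
echo "$sample" | route_get_field src   # prints 2001:db8:1::2
```

If the reported `src` belongs to a different interface than the reported `dev`, requests to that destination would leave with the "wrong" source address, which is the kind of leak described above.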

@rotanid rotanid added 1. severity: blocker This issue/pr is required for the next release and removed 0. type: regression labels Aug 26, 2018
@rotanid
Member

rotanid commented Aug 26, 2018

ok, thanks for the update! i removed the "regression" label, as nothing else beside the new feature is broken anymore. i added a release blocker label instead.

@neocturne neocturne added this to the 2018.2 milestone Oct 9, 2018
@neocturne neocturne removed the 1. severity: blocker This issue/pr is required for the next release label Oct 9, 2018
@rotanid rotanid modified the milestones: 2018.2, 2019.1 Dec 15, 2018
@rotanid rotanid modified the milestones: 2019.1, 2019.2 Jun 19, 2019
@rotanid
Member

rotanid commented Jun 25, 2019

as far as i see it, the fix has been pushed to master branch of babeld

@christf
Member

christf commented Aug 29, 2019

The fix was pushed to babeld and the required configuration has been tested for two months now. Waiting for the modules update to OpenWrt master, where babeld 1.9.1, which contains the patch, has just been included.

@rotanid
Member

rotanid commented Nov 23, 2019

@christf are we now on a openwrt master status which includes this?
i think we are, as far as i can see it.

what does this mean, there is no issue anymore?

and what about the initial change which wanted to make DNS requests to leave via specific interfaces?
@NeoRaider

@rotanid
Member

rotanid commented Dec 13, 2019

@christf @NeoRaider ping?

@neocturne
Member

The original change was broken (the 'loopback' interface doesn't even have the primary node address in non-Babel setups), but it is also too specific - all protocols need to use the correct interfaces, DNS is not special in any way.

@rotanid
Member

rotanid commented Dec 18, 2019

@NeoRaider so "someone" needs to work this out "sometime" ? 🗡️

@christf are we now on a openwrt master status which includes this?
i think we are, as far as i can see it.

what does this mean, there is no issue anymore?

@christf
Member

christf commented Jan 12, 2020

For babeld we now have the ability to specify the correct src address. This was merged in November with #1877. As a result, IPv6 traffic leaves through the correct interface without other hacks.
For batman we might still have an issue when a node has IPv6 on its WAN interface.

@rotanid
Member

rotanid commented Jan 12, 2020

if you think this affects batman too, do we need a new issue? what exactly is the issue?
because this one here is about babel.

do we need something like b3d7011 for batman? @NeoRaider

@christf
Member

christf commented Jan 12, 2020

We do not need something like b3d7011. If a fix is required at all, it should be like 2389679, in the sense that pref-src should be set on the routes.

@rotanid
Member

rotanid commented Jan 12, 2020

as @christf and i just weren't able to reproduce a problem with the src address running Gluon v2019.1.1 with batman-adv, maybe there is no issue left. waiting for @NeoRaider opinion though...

The x86 node we tested with had a public IPv6 address on both br-client and br-wan and two default routes similar to this:

default via fe80::7eff:4dff:XXXX:1db1 dev br-wan  metric 512 
default via fe80::4459:c0ff:XXXX:4fd0 dev br-client  metric 512 

The node also had IPv6 nameservers for both br-wan and br-client.
We ran tcpdump -pni br-wan src $ipv6-adresse-from-br-client and tcpdump -pni br-client src $ipv6-adresse-from-br-wan in parallel and then issued multiple commands in a loop to trigger locally originated traffic, such as "autoupdater -f" and restarting sysntpd.
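
The cross-check described above can be sketched as follows. This is an illustration only, not the exact script used in the test: the helper name `cross_capture_cmd` and both addresses are hypothetical (documentation prefix 2001:db8::/32), and the helper only prints the tcpdump command lines instead of running them, since capturing requires root on the node.

```shell
#!/bin/sh
# Print the tcpdump command that watches interface $1 for packets whose
# source address is $2, i.e. an address that belongs to the *other* interface.
# tcpdump flags: -p no promiscuous mode, -n no name resolution, -i interface.
cross_capture_cmd() {
    printf 'tcpdump -pni %s src %s\n' "$1" "$2"
}

# Hypothetical addresses; replace with the node's real ones.
CLIENT_ADDR='2001:db8:c::1'   # assumed public address on br-client
WAN_ADDR='2001:db8:a::1'      # assumed public address on br-wan

cross_capture_cmd br-wan "$CLIENT_ADDR"
cross_capture_cmd br-client "$WAN_ADDR"
```

Each printed command would be run in its own shell while traffic is triggered on the node. Any packet that shows up in either capture left through an interface that does not own its source address.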

@neocturne
Member

@rotanid Are these default routes in different routing tables? The WAN default route should never appear in the main table.

I'm aware of only one serious issue regarding source address / interface selection in current Gluon (with batman-adv, but likely also affects babel): #1132. Is there still anything to fix for babel?

@rotanid
Member

rotanid commented Feb 2, 2020

@NeoRaider i did not query any routing table on purpose, afair i only did ip ro - does this help with your question?

regarding babel, as far as i understand the comment from @christf it's only about batman now

@mweinelt mweinelt modified the milestones: 2020.1, 2020.2 Mar 8, 2020
@neocturne
Member

@rotanid Hmm, I think that means that the br-wan default route was in the default table as well, weird - but I don't fully trust the busybox ip, so I might be wrong. If things are working as intended, only the mesh routes should end up in the main table (ip -6 r s table 254), while br-wan should be handled in a separate table (ip -6 r s table 1). (For some reason, "table" needs to be written out with busybox ip ...)
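
A quick way to check the table layout described above is to count routes per table. The sketch below assumes iproute2 (`ip-full`) rather than busybox ip, and assumes that `ip -6 route show table all` appends `table NAME` to routes outside the main table while omitting it for main-table (254) routes. The helper name `count_routes_by_table` and the sample routes are hypothetical.

```shell
#!/bin/sh
# Count routes per routing table from `ip -6 route show table all` output.
# Routes without a "table" keyword are attributed to the main table.
count_routes_by_table() {
    awk 'NF {
        t = "main"
        for (i = 1; i < NF; i++)
            if ($i == "table") t = $(i + 1)
        n[t]++
    }
    END { for (t in n) print t, n[t] }' | sort
}

# Canned sample input. On a node one would run:
#   ip -6 route show table all | count_routes_by_table
count_routes_by_table <<'EOF'
default via fe80::1 dev br-client metric 512
2001:db8:c::/64 dev br-client metric 256
default via fe80::2 dev br-wan table 1 metric 512
local ::1 dev lo table local metric 0
EOF
```

With a setup that works as intended, the br-wan default route would be counted under table 1 and not under main.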

@rotanid
Member

rotanid commented Apr 6, 2020

@NeoRaider with busybox ip there are 27 routes. all in the table 254 output, none in table 1 output.
with "ip-full" installed, there are 12 routes in table 254, 4 in table 1 and 22 in table "local".
one default route is in table 254, one in table 1
all other routes look like they are in the correct table, too.
(note: everything with IPv6 parameter -6 only)

any other test cases to be able to say this issue does not exist with batman-adv setups?

@mweinelt mweinelt modified the milestones: 2020.2, 2020.3 May 2, 2020
@neocturne
Member

Okay, I think this can be closed (and indeed Busybox ip is useless when it comes to multiple routing tables...).

The remaining weirdness in our routing setup is tracked in #1132.

@rotanid rotanid modified the milestones: 2020.3, 2020.2 May 7, 2020