nss-myhostname doesn't scale when using a large main routing table #11384

Open
bjo81 opened this issue Jan 10, 2019 · 12 comments
Comments

@bjo81

bjo81 commented Jan 10, 2019

systemd version the issue has been seen with

240.0
239.370

Used distribution

ArchLinux

Expected behaviour you didn't see

a working mtr

Unexpected behaviour you saw

mtr does not resolve anything; instead, it hangs in the background after Ctrl+C and eats all memory

Steps to reproduce the problem

Use the stub resolver of systemd-resolved and run mtr (0.92)

@yuwata yuwata added the resolve label Jan 12, 2019
@poettering
Member

What makes you think resolved is at fault? Can you provide a stack trace of the "hang"?

i.e. use "pstack" (from the gdb package) on the tool when it hangs.

What do you mean by "eats all memory"?

is this still an issue with current systemd-resolved versions?

@poettering poettering added the needs-reporter-feedback label Nov 6, 2020
@bjo81
Author

bjo81 commented Nov 6, 2020

At first I thought it was resolved's fault because the problem seemed to appear together with resolved. But it seems to be somehow related to "myhostname mymachines". When running mtr, resolving each hop takes about 100 MB, so there are 6-7 mtr processes consuming all the memory of a 1 GB box.
Disabling "myhostname mymachines" in nsswitch.conf seems to resolve it on that box, but the problem doesn't appear on other boxes, which makes it more confusing.
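
The workaround mentioned here amounts to dropping the `myhostname` and `mymachines` entries from the `hosts:` line of `/etc/nsswitch.conf`. A minimal sketch of that edit follows; the helper name `drop_nss_modules` and the sample `hosts:` line are illustrative assumptions, not taken from the reporter's system.

```python
def drop_nss_modules(conf_text, modules=("myhostname", "mymachines")):
    """Return nsswitch.conf text with the given modules removed from 'hosts:'.

    Assumption: no [STATUS=action] item directly follows a removed module;
    such an item would otherwise end up attached to the preceding module.
    """
    out = []
    for line in conf_text.splitlines():
        if line.lstrip().startswith("hosts:"):
            key, _, rest = line.partition(":")
            kept = [tok for tok in rest.split() if tok not in modules]
            line = key + ": " + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical input resembling a common default configuration.
sample = (
    "passwd: files systemd\n"
    "hosts: mymachines resolve [!UNAVAIL=return] files myhostname dns\n"
)
print(drop_nss_modules(sample), end="")
```

Passing `modules=("myhostname",)` removes only one of the two modules, which is one way to test which of them triggers the behaviour.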

@poettering
Member

if you turn off one of the two, and leave the other in, what happens then?

@poettering
Member

how do you determine "consume all memory"? what tool do you use?

@bjo81
Author

bjo81 commented Nov 6, 2020

It seems to be related to "myhostname".
I have attached 2 screenshots; you can see the memory usage while running mtr. The mtr processes also remain after closing it.

Screenshot from 2020-11-06 21-58-39
Screenshot from 2020-11-06 21-58-57

@poettering
Member

Do you have a particularly complex network setup? Millions of routes or interfaces or so?

@poettering poettering changed the title from "systemd-resolved: stub resolver breaks mtr" to "systemd-myhostname breaks mtr" Nov 6, 2020
@poettering poettering changed the title from "systemd-myhostname breaks mtr" to "nss-myhostname breaks mtr" Nov 6, 2020
@bjo81
Author

bjo81 commented Nov 6, 2020

5 interfaces and a kernel routing table with currently 98120 routes.
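
For anyone wanting to check their own route count, here is a rough sketch. It assumes the iproute2 `ip` tool is installed and counts only IPv4 routes in the main table; the function names are illustrative, and this is not necessarily how the number above was obtained.

```python
import subprocess


def count_routes(ip_route_output):
    """Count non-empty lines, i.e. one route per line of `ip -o route` output."""
    return sum(1 for line in ip_route_output.splitlines() if line.strip())


def main_table_route_count():
    """Run `ip` and count the IPv4 routes in the main routing table."""
    # -o prints each route on a single line, including multipath routes.
    result = subprocess.run(
        ["ip", "-o", "-4", "route", "show", "table", "main"],
        capture_output=True, text=True, check=True,
    )
    return count_routes(result.stdout)
```

Calling `main_table_route_count()` on the affected machine gives a number that can be compared against the lookup delays seen there.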

@poettering
Member

I figure the huge routing table is the problem... nss-myhostname synthesizes a special hostname _gateway that maps to the locally defined default gateway. For that it has a look at the routing table. I figure mtr does a reverse lookup of all IP addresses it sees, and each time nss-myhostname enumerates the routes to determine whether the specified IP address maps to _gateway, and that obviously doesn't scale.
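
To illustrate why this pattern scales badly, here is a toy model in Python. It is not systemd's actual code: the route tuples `(destination, prefix_len, gateway)`, the addresses, and all function names are assumptions made for the sketch. If every reverse lookup walks the whole route list, the total work is the number of lookups times the number of routes, whereas collecting the default gateways once makes each later lookup a cheap set membership test.

```python
import ipaddress


def make_routes(n):
    """Build a synthetic table: one default route plus n specific routes."""
    routes = [("0.0.0.0", 0, "192.0.2.1")]  # default route via a gateway
    base = int(ipaddress.IPv4Address("10.0.0.0"))
    for i in range(n):
        dst = str(ipaddress.IPv4Address(base + i * 256))
        routes.append((dst, 24, "192.0.2.254"))
    return routes


def is_gateway_scan(addr, routes):
    """Full scan per lookup; returns (is_default_gateway, routes_visited)."""
    visited = 0
    match = False
    for _dst, prefix_len, gateway in routes:
        visited += 1
        if prefix_len == 0 and gateway == addr:
            match = True
    return match, visited


def total_work(lookups, routes):
    """Total number of route entries visited across all lookups."""
    return sum(is_gateway_scan(addr, routes)[1] for addr in lookups)


def default_gateways(routes):
    """One pass over the table; later lookups only need a set membership test."""
    return {gateway for _dst, prefix_len, gateway in routes if prefix_len == 0}


routes = make_routes(1000)
hops = ["203.0.113.%d" % i for i in range(30)]
print(total_work(hops, routes))            # route entries visited by 30 scans
print("192.0.2.1" in default_gateways(routes))
```

In this model the cost grows linearly with both the table size and the number of distinct addresses a tool such as mtr looks up.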

@poettering
Member

Also see #13199

@poettering
Member

Actually, let's close this one as a duplicate of #13199

@felixonmars
Contributor

Sorry for replying to an old issue. But the fix in #13199 didn't fix this for me, unfortunately.

If I understand this correctly, #13199 optimizes it to dump only the main routing table. However, I do have ~40k routes in the main routing table, and it takes 20+ s for libnss_myhostname to do an unsuccessful rDNS lookup on my J4125 (which should not be that low-end).

Screenshot_20230322_102603

Screenshot_20230322_100526

A simple "ping" command to an IP address without rDNS takes ~25 s in both CPU time and user time, and perf shows that most of the time is spent in libnss_myhostname.so. Other examples: the mtr from the original post consumes gigabytes of memory and uses all CPU for as long as a few minutes; iftop continuously occupies two CPU cores because it keeps seeing new IP addresses.
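
A quick way to reproduce the delay without ping is to time a single reverse lookup. The sketch below uses Python's `socket.gethostbyaddr`, which on glibc systems goes through the NSS modules configured in nsswitch.conf; the function name `time_reverse_lookup` and the example address are illustrative assumptions.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, elapsed seconds) for one reverse lookup."""
    start = time.perf_counter()
    try:
        name = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        name = None
    return name, time.perf_counter() - start


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address and normally has no rDNS entry.
    print(time_reverse_lookup("203.0.113.7"))
```

Running it once with and once without `myhostname` in the `hosts:` line should show whether the module accounts for the delay on a given machine.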

I believe resolving localhost and the local hostname is the main purpose of this nss module, while _gateway / _outbound are just a bonus and not really widely used, especially for rDNS queries. For now I have had to remove the module from my nsswitch.conf and fall back to manually updating /etc/hosts as a workaround.

Is it possible to fix this properly by, for example, making the resource-consuming _gateway / _outbound resolution optional, either at compile time or at run time, or even splitting it out into a separate module?

@YHNdnzj YHNdnzj reopened this Mar 23, 2023
@YHNdnzj
Member

YHNdnzj commented Mar 23, 2023

Obviously the mentioned fix doesn't cover the case when the main table itself is huge. Let's hijack the thread here.

@YHNdnzj YHNdnzj removed the needs-reporter-feedback and duplicate labels Mar 23, 2023
@YHNdnzj YHNdnzj changed the title from "nss-myhostname breaks mtr" to "nss-myhostname doesn't scale when using a large main routing table" Mar 23, 2023