Conversation

@mangelajo
Contributor

mDNS was listening on internal OVN interfaces (pods) and allocating
far too much memory per server; regular mDNS queries fit in unfragmented
packets at a standard 1500 MTU.

Closes: https://issues.redhat.com/browse/USHIFT-264

Which issue(s) this PR addresses:

Closes #

@mangelajo mangelajo requested a review from zshi-redhat August 10, 2022 12:03
@openshift-ci openshift-ci bot requested review from dhellmann and ggiguash August 10, 2022 12:05
@openshift-ci openshift-ci bot added the approved label Aug 10, 2022
Contributor

Why the 4K buffer size choice? Can we add a comment?
Is the DNS packet size limited to 512 bytes?

Contributor

My understanding is that a regular DNS packet won't exceed 512 bytes; 4K is larger than the MTU and considered safe?

Contributor Author

DNS responses are recommended to stay under 512 bytes AFAIK, but I'm not sure about queries.

I did a bit of practical research:

On OSX, the longest host I could try to resolve was: tryingtocontact.a-very-long-host.in-my-network.itsactually-super-super-super-super-super-super-super-long.itsactually-super-super-super-super-super-super-super-long.itsactually-super-super-super-super-super-super-super-longhoooo.local

This resulted in the following packet:

0000   01 00 5e 00 00 fb f0 2f 4b 0a 08 a8 08 00 45 00   ..^..../K.....E.
0010   01 18 21 e5 00 00 ff 11 f4 dd c0 a8 02 6e e0 00   ..!..........n..
0020   00 fb 14 e9 14 e9 01 04 9c 4a 00 00 00 00 00 01   .........J......
0030   00 00 00 00 00 00 0f 74 72 79 69 6e 67 74 6f 63   .......tryingtoc
0040   6f 6e 74 61 63 74 10 61 2d 76 65 72 79 2d 6c 6f   ontact.a-very-lo
0050   6e 67 2d 68 6f 73 74 0d 69 6e 2d 6d 79 2d 6e 65   ng-host.in-my-ne
0060   74 77 6f 72 6b 3a 69 74 73 61 63 74 75 61 6c 6c   twork:itsactuall
0070   79 2d 73 75 70 65 72 2d 73 75 70 65 72 2d 73 75   y-super-super-su
0080   70 65 72 2d 73 75 70 65 72 2d 73 75 70 65 72 2d   per-super-super-
0090   73 75 70 65 72 2d 73 75 70 65 72 2d 6c 6f 6e 67   super-super-long
00a0   3a 69 74 73 61 63 74 75 61 6c 6c 79 2d 73 75 70   :itsactually-sup
00b0   65 72 2d 73 75 70 65 72 2d 73 75 70 65 72 2d 73   er-super-super-s
00c0   75 70 65 72 2d 73 75 70 65 72 2d 73 75 70 65 72   uper-super-super
00d0   2d 73 75 70 65 72 2d 6c 6f 6e 67 3f 69 74 73 61   -super-long?itsa
00e0   63 74 75 61 6c 6c 79 2d 73 75 70 65 72 2d 73 75   ctually-super-su
00f0   70 65 72 2d 73 75 70 65 72 2d 73 75 70 65 72 2d   per-super-super-
0100   73 75 70 65 72 2d 73 75 70 65 72 2d 73 75 70 65   super-super-supe
0110   72 2d 6c 6f 6e 67 68 6f 6f 6f 6f 05 6c 6f 63 61   r-longhoooo.loca
0120   6c 00 00 01 00 01                                 l.....

Anything longer than that, OSX would refuse to resolve. Even this maximal query is only a 280-byte IP packet (total length 0x0118 in the header above), so it stays well under 512 bytes.

On Linux, the nss-mdns stack seems to truncate queries:

# tcpdump -i any port 5353 -s 2000 -x
15:49:32.205296 enp12s0u2u1u2 Out IP lenovo.local.mdns > 224.0.0.251.mdns: 0 A (QM)? tryingtocontact.a-very-long-host.in-my-network.itsactually-supe. (81)
        0x0000:  4500 006d 59bc 4000 ff11 7f01 c0a8 011e
        0x0010:  e000 00fb 14e9 14e9 0059 a32c 0000 0000
        0x0020:  0001 0000 0000 0000 0f74 7279 696e 6774
        0x0030:  6f63 6f6e 7461 6374 1061 2d76 6572 792d
        0x0040:  6c6f 6e67 2d68 6f73 740d 696e 2d6d 792d
        0x0050:  6e65 7477 6f72 6b10 6974 7361 6374 7561
        0x0060:  6c6c 792d 7375 7065 0000 0100 01

Non-domain-related responses (service records) were sometimes bigger than that, but we don't handle
those.

So 512 bytes is probably more than enough for the specific purpose of this mDNS server; we can
add a comment saying that if we plan to handle mDNS responses too at some point, the buffer needs to be bumped.

Contributor Author

Hmm, I'm remembering now that the mDNS protocol allows multiple queries in a single request, which could grow the packet.

I propose 2048, which would be slightly above the 1500 MTU.

What do you think?
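For illustration only, a minimal Go sketch (hypothetical names, not the actual MicroShift code) of what such a constant and its explanatory comment could look like:

package main

import (
	"log"
	"net"
)

// maxMDNSPacketSize is the receive buffer size for incoming mDNS packets.
// A single-question mDNS query fits well under 512 bytes, but one query may
// carry multiple questions, so 2048 bytes (slightly above a standard
// 1500-byte MTU) leaves headroom. If this server ever needs to process mDNS
// responses (e.g. service records), this value may need to be raised.
const maxMDNSPacketSize = 2048

func main() {
	group := &net.UDPAddr{IP: net.ParseIP("224.0.0.251"), Port: 5353}
	conn, err := net.ListenMulticastUDP("udp4", nil, group)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Reuse one buffer of the chosen size for every read.
	buf := make([]byte, maxMDNSPacketSize)
	for {
		n, src, err := conn.ReadFromUDP(buf)
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("received %d-byte mDNS packet from %v", n, src)
	}
}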

Contributor

I think it doesn't make a big difference in the memory footprint, so any value we deem reasonable would be OK.
Please just add a comment in the code explaining the choice, so that we know when to update the buffer size if necessary.

Contributor

Can we add a comment with examples of interface names matching the regex?

Contributor

I was thinking of matching on the interface driver instead, but the current code changes less and is cleaner :-)

^[A-Fa-f0-9]{15} matches the OVS container interfaces (the first 15 hex characters of the container sandbox ID)
ovn.* matches the OVN management port, e.g. ovn-k8s-mp0
br-int matches the OVN-Kubernetes integration bridge
cni.* matches cni0
ovs-system$ matches the OVS system interface

A sketch of how such a filter could look follows below.
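As an illustration only (hypothetical names, not the actual MicroShift code), a small Go sketch of how such an exclusion filter could be expressed:

package main

import (
	"fmt"
	"regexp"
)

// excludedInterfaceRE matches interfaces the mDNS server should skip:
//   [A-Fa-f0-9]{15}  OVS per-pod veth interfaces (first 15 hex characters of the sandbox ID)
//   ovn.*            the OVN management port, e.g. ovn-k8s-mp0
//   br-int           the OVN-Kubernetes integration bridge
//   cni.*            CNI interfaces such as cni0
//   ovs-system       the OVS system interface
var excludedInterfaceRE = regexp.MustCompile(`^([A-Fa-f0-9]{15}|ovn.*|br-int|cni.*|ovs-system)$`)

func main() {
	// Print whether each example interface name would be excluded.
	for _, name := range []string{"eth0", "br-ex", "ovn-k8s-mp0", "br-int", "cni0", "ovs-system", "0a1b2c3d4e5f6a7"} {
		fmt.Printf("%-16s excluded=%v\n", name, excludedInterfaceRE.MatchString(name))
	}
}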

Contributor

BTW, do we listen on both br-ex and the physical interface (like eth0)?
eth0 is added as an OVS port of br-ex.

Contributor Author

BTW, do we listen on both br-ex and the physical interface (like eth0)? eth0 is added as an OVS port of br-ex.

Right, it will listen on both. But I don't think we have a simple way of filtering out the interface connected to br-ex
without digging into the Open vSwitch database, and I'm not sure it's safe to listen only on eth0 and exclude br-ex. Do you want me to try that?

The worst case is that we respond twice to mDNS queries.

Contributor Author

BTW, do we listen on both br-ex and the physical interface (like eth0)? eth0 is added as an OVS port of br-ex.

Right, it will listen on both. But I don't think we have a simple way of filtering out the interface connected to br-ex without digging into the Open vSwitch database, and I'm not sure it's safe to listen only on eth0 and exclude br-ex. Do you want me to try that?
If bonding is used on the OVS side, that could fail to work.

I tend to think that duplicated replies are better than no replies.

Contributor

Can we add this list as a comment in the code, explaining the filtering logic?

Contributor Author

Oh, of course, that makes sense.

@mangelajo
Contributor Author

I updated the patch according to the comments.

@mangelajo
Contributor Author

@zshi-redhat I verified that we don't send duplicate responses, even though we are listening on both br-ex and eth0:

[screenshot]

mDNS was listening on internal OVN interfaces (pods) and allocating
far too much memory per server; regular mDNS queries fit in unfragmented
packets at a standard 1500 MTU.

Closes: https://issues.redhat.com/browse/USHIFT-264
@mangelajo
Contributor Author

Updated with the additional comments.

@mangelajo
Contributor Author

@ggiguash @zshi-redhat does it look OK to lgtm?

@ggiguash
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm label Aug 11, 2022
@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ggiguash, mangelajo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot

/retest-required

Remaining retests: 2 against base HEAD 9466daf and 8 for PR HEAD f7542fe in total

@openshift-ci
Contributor

openshift-ci bot commented Aug 11, 2022

@mangelajo: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-merge-robot openshift-merge-robot merged commit b10d3c8 into openshift:main Aug 11, 2022
@zshi-redhat
Contributor

zshi-redhat commented Aug 15, 2022

@zshi-redhat I verified that we don't send duplicate responses, even though we are listening on both br-ex and eth0:

[screenshot]

@mangelajo I observed two reply packets when running tcpdump on the physical interface (e.g. enp1s0):

23:10:37.644628 52:54:00:c7:7e:34 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 74: 192.168.122.250.mdns > 224.0.0.251.mdns: 0 A (QM)? my-nginx.local. (32)
23:10:37.644902 52:54:00:c7:7e:34 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 84: 192.168.122.250.mdns > 224.0.0.251.mdns: 0*- [0q] 1/0/0 A 192.168.122.250 (42)
23:10:37.644988 52:54:00:c7:7e:34 > 01:00:5e:00:00:fb, ethertype IPv4 (0x0800), length 84: 192.168.122.250.mdns > 224.0.0.251.mdns: 0*- [0q] 1/0/0 A 192.168.122.250 (42)

On which interface did you capture the mDNS packets?
