New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of NLM_F_DUMP_INTR is apparently missing #874
Comments
Yep, looks like that. Thanks, investigating! |
Yep. What kind of the feedback would you expect here — a simple flag in the response, or an exception raised? |
Either of those will do, though in my opinion it will be fine to simply retry the request until it |
flags added: * NLM_F_DUMP_INTR * NLM_F_DUMP_FILTERED Bug-Url: #874
Things get more complicated. I've reproduced the 0x10 flag By default the library reads the response and returns it to the user as a list, but it's only the default historical behaviour. Having So I leave it as is for now — only adding the flag, thanks — and as a workaround I can propose checking the response's from pyroute2.netlink import NLM_F_DUMP_INTR
for msg in ipr.addr('dump'):
if msg['header']['flags'] & NLM_F_DUMP_INTR:
the_retry_logic() If you can propose any improvements and shortcuts that will not break the compatibility — pls ping me. Thanks! |
I realize that if the procedure that returns the data is a generator, than it is impossible to make retry logic invisible to the caller. However just hoping that all users of the library will know that they need to check an obscure flag, in my opinion, will not work. The next person who, like me, suddenly encounters truncated data will have no idea what's going on. In addition to that, such approach will make it impossible to use list comprehensions like this (taken from real code):
that I, for one, rather like! 😁 I think that it may be more "user-friendly" to raise an exception with a suggestive name. Then when this (rare) situation happens, at least the user will get a clue what happened, rather than struggle with mysteriously truncated result. With exception, I would write the above code as
I my eyes, this looks less ugly than checking the flag in every returned element. What do you think? P.S. I am not 100% sure, but it may be necessary to consume potentially remaining messages waiting in the socket, after the one with raised NLM_F_DUMP_INTR flag, unless you close and reopen the socket. See this patch as an example. "Exception" approach allows to do it in the library, before raising the exception, and deny the user the opportunity to make another error. |
As it seems to work now, are there plans to tag a release?.. We'd build and internal package, it's nicer to have it anchored on a tagged version than on a commit id... Thanks! |
Yep, sure. Let's tag it on Thursday — I would like to have one more feature in the release :) And hope to add it tonight + some time for testing. |
tagged |
Closing the ticket, pls don't hesitate to re-open in the case of any issues. |
Hi, in our CI we see issues with this new exception (NetlinkDumpInterrupted): It seems to be coming from here: This can be fixed with something like this diff:
As I see this will not introduce extra problems, and works in our test env also, what do you think? |
@elajkat , would it make sense to submit a pull request with this test for inclusion in pyroute2 testssuite? I think everyone would benefit from it... |
Thanks, looks perfect. |
@elajkat well, if you get this exception, it means that the data is incomplete. So yes, it's worth and I would say mandatory to retry, if you care about the accuracy of Neutron's view of the landscape. There is some example code in one of my comments above for inspiration. |
Thanks, I will add retry logic then |
pyroute 0.6.7 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. Final solution for it should be to use retry as suggested by pyroute2 developers (see [1]), a possible patch for the retry logic is [2]. [1]: svinota/pyroute2#874 (comment) [2]: https://review.opendev.org/c/openstack/neutron/+/833015 Related-Bug: #1962608 Change-Id: I757a2d03e60772c716ddaa5fa63f36cce10af984
* Update requirements from branch 'master' to 9d80db06bcaebea774dca6325a10921221ec31d4 - Merge "Limit pyroute2 to be <0.6.6" - Limit pyroute2 to be <0.6.6 pyroute 0.6.7 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. Final solution for it should be to use retry as suggested by pyroute2 developers (see [1]), a possible patch for the retry logic is [2]. [1]: svinota/pyroute2#874 (comment) [2]: https://review.opendev.org/c/openstack/neutron/+/833015 Related-Bug: #1962608 Change-Id: I757a2d03e60772c716ddaa5fa63f36cce10af984
* Update neutron from branch 'master' to ec4cf672f11a812a475d051970069e5b3a304189 - Merge "Add retry for privsep get_link_devices" - Add retry for privsep get_link_devices pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83)
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Depends-On: https://review.opendev.org/c/openstack/requirements/+/845268 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83) (cherry picked from commit 541d1fc)
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Depends-On: https://review.opendev.org/c/openstack/requirements/+/845268 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83) (cherry picked from commit 541d1fc) (cherry picked from commit ea85053) Conflicts: neutron/privileged/agent/linux/ip_lib.py requirements.txt
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Depends-On: https://review.opendev.org/c/openstack/requirements/+/845268 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83) (cherry picked from commit 541d1fc)
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Depends-On: https://review.opendev.org/c/openstack/requirements/+/845268 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83) (cherry picked from commit 541d1fc)
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). [1]: svinota/pyroute2#874 (comment) Closes-Bug: #1962608 Depends-On: https://review.opendev.org/c/openstack/requirements/+/845268 Change-Id: Ie195ad596fd148708fc30946bde964d52444afee (cherry picked from commit 74a9e83) (cherry picked from commit 541d1fc) (cherry picked from commit ea85053) Conflicts: neutron/privileged/agent/linux/ip_lib.py requirements.txt
pyroute 0.6.6 introduced a new exception NetlinkDumpInterrupted which is raised when NLM_F_DUMP_INTR is set in the flags during dump of devices. The suggestion from pyroute developers is to retry in case of this exception (see [1]). For reference, this is how it was handled in neutron side: [2] [1] svinota/pyroute2#874 (comment) [2] https://review.opendev.org/c/openstack/neutron/+/833015 Change-Id: I6b5604266f29a585bb1e0f8e605ee5df523b5951
We got a hard to reproduce condition in our tests, when a dump of network interfaces and addresses returned incomplete information. Because we have already encountered a related problem with nftables that turned out to be related to improper reaction NLM_F_DUMP_INTR condition, I suspect that the same thing is happening in pythoute2.
The point is, when the kernel is producing a dump of a kernel structure over multiple netlink messages, and the structure changes in the mid-way,
NLM_F_DUMP_INTR
is added to the header flags, that must be treated by userspace as an indication that the requested dump contains inconsistent data and must be re-requested. See functionnl_dump_check_consistent()
in the kernel sourceinclude/net/netlink.h
.I cannot find related functionality in pyroute2, which may explain the error that we encountered. I suspect that the necessary check for the flag
NLM_F_DUMP_INTR
in the received netlink messages is missing.The text was updated successfully, but these errors were encountered: