@n-sandeep
Contributor

Here select() is a blocking call because no timeout is specified, so it can be interrupted by a signal and fail with EINTR.
When EINTR is received, the switchlink_main thread exits. As a result, infrap4d stops processing netlink events and the lnp.p4 use cases fail.

With this fix, we ignore the EINTR error from the select() system call.

Signed-off-by: Sandeep N <sandeep.nagapattinam@intel.com>
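
For readers following along, the retry pattern described above can be sketched as below. This is a minimal illustration under stated assumptions, not the code merged in this PR: `select_retry_eintr`, `on_alarm`, and `demo_interrupted_wait` are hypothetical names, and the pipe plus 50 ms timer exist only to provoke an interruption in a self-contained way.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not the merged code): block until a descriptor in
 * watch_fds is readable, retrying when a signal interrupts select().
 */
static int select_retry_eintr(int num_fds, const fd_set *watch_fds,
                              fd_set *ready_fds) {
  assert(watch_fds != NULL && ready_fds != NULL);
  for (;;) {
    /* select() overwrites its fd_set, so rebuild it before every attempt. */
    *ready_fds = *watch_fds;
    int ret = select(num_fds, ready_fds, NULL, NULL, NULL);
    if (ret == -1 && errno == EINTR) {
      continue; /* interrupted by a signal: not fatal, wait again */
    }
    return ret; /* > 0: descriptors ready; -1: a real error, errno is set */
  }
}

static int demo_pipe_fds[2];

/* Signal handler: makes the pipe readable. write() is async-signal-safe. */
static void on_alarm(int signo) {
  (void)signo;
  const char byte = 'x';
  ssize_t ignored = write(demo_pipe_fds[1], &byte, 1);
  (void)ignored;
}

/* Returns 1 if the wait survived the signal and saw the pipe become readable. */
int demo_interrupted_wait(void) {
  if (pipe(demo_pipe_fds) == -1) return -1;

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = on_alarm;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGALRM, &sa, NULL) == -1) return -1;

  /* One-shot timer: deliver SIGALRM after roughly 50 ms. */
  struct itimerval timer;
  memset(&timer, 0, sizeof(timer));
  timer.it_value.tv_usec = 50000;
  if (setitimer(ITIMER_REAL, &timer, NULL) == -1) return -1;

  fd_set watch_fds, ready_fds;
  FD_ZERO(&watch_fds);
  FD_SET(demo_pipe_fds[0], &watch_fds);

  int ret = select_retry_eintr(demo_pipe_fds[0] + 1, &watch_fds, &ready_fds);
  int ok = (ret == 1 && FD_ISSET(demo_pipe_fds[0], &ready_fds));

  close(demo_pipe_fds[0]);
  close(demo_pipe_fds[1]);
  return ok ? 1 : 0;
}
```

Without the `continue` branch, the first `-1` return would look like a fatal error to the caller, which matches the thread-exit symptom reported here. On Linux, select() is not restarted automatically after a signal handler runs, even when the handler is installed with SA_RESTART, so an explicit check of this kind is needed.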


ret = select(num_fds, &read_fds, NULL, NULL, NULL);
if (ret == -1) {
  // Select system call being interrupted by a signal EINTR.

Contributor

Good comment. You explain why you are making this test.

Contributor Author

During automation we saw this issue: intermittently, the switchlink_main thread exits and none of the netlink messages are processed by infrap4d.
While debugging, we observed that the select() system call was being interrupted with EINTR, which causes the thread to exit.

While reading about this issue, we found that this can happen to any blocking call. In our case we don't specify a timeout for the select() system call, so the chances of it being interrupted are high.
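
To illustrate the point that other blocking calls are affected in the same way, here is a minimal sketch of the same retry applied to read(). The names `read_retry_eintr` and `demo_read_retry` are hypothetical and do not appear in this PR; the pipe is only there to make the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: read() that retries when a signal interrupts the
 * call before any data was transferred (read() returns -1, errno == EINTR).
 */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count) {
  assert(buf != NULL);
  ssize_t n;
  do {
    n = read(fd, buf, count);
  } while (n == -1 && errno == EINTR);
  return n; /* >= 0: bytes read; -1: a real error, errno is set */
}

/* Self-check: push two bytes through a pipe and read them back. */
int demo_read_retry(void) {
  int fds[2];
  char buf[8];
  if (pipe(fds) == -1) return -1;
  if (write(fds[1], "ok", 2) != 2) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  ssize_t n = read_retry_eintr(fds[0], buf, sizeof(buf));
  close(fds[0]);
  close(fds[1]);
  return (int)n;
}
```

The loop only repeats on EINTR, so genuine errors still reach the caller unchanged.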

@n-sandeep n-sandeep force-pushed the select_signal_fix branch 2 times, most recently from de71472 to 2576f0a Compare January 5, 2023 19:09
Here select() is a blocking call because no timeout is specified,
so it can be interrupted by a signal and fail with EINTR.
When EINTR is received, the switchlink_main thread exits.
As a result, infrap4d stops processing netlink events and the lnp.p4
use cases fail.

With this fix, we ignore the EINTR error from the select() system call.

Signed-off-by: Sandeep N <sandeep.nagapattinam@intel.com>

Contributor

@ffoulkes ffoulkes left a comment

Okay once the indicated change is made

Contributor

@ffoulkes ffoulkes left a comment

LGTM

@ffoulkes ffoulkes merged commit 031b579 into main Jan 6, 2023
@ffoulkes ffoulkes deleted the select_signal_fix branch January 6, 2023 16:59
3 participants