Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't return -1 if the socket was closed #15771

Merged
merged 3 commits into from Sep 8, 2023

Conversation

moonbreon
Copy link
Contributor

Summary

SSL_read() in netdata_ssl_read() returning 0 bytes is currently being treated as an error, and returning -1 instead. If the connection is closed on the other end, 0 bytes should be returned to the calling method, so the connection is properly closed on both sides.

Test Plan

Check that connector response methods now correctly handle a closed socket and don't treat 0 returns bytes as an error case.

Additional Information
For users: How does this change affect me?

@CLAassistant
Copy link

CLAassistant commented Aug 8, 2023

CLA assistant check
All committers have signed the CLA.

@tkatsoulas
Copy link
Contributor

cc @MrZammler @ktsaou

@ktsaou
Copy link
Member

ktsaou commented Aug 15, 2023

According to man SSL_real():

<=0: The read operation was not successful, because either the connection was closed, an error occurred or action must be taken by the calling process. Call SSL_get_error(3) with the return value ret to find out the reason.

So, I am not sure this PR does the right thing. Returning zero and -1 should both close the socket on our end, it just changes the handling in the code about error conditions and logs.

Still, I see in the code that we never actually return zero (EOF). So, we should do something about it. But I am not sure a zero return from SSL_read means EOF.

@tkatsoulas and @MrZammler please setup a test to see what exactly is happening. If netdata is not cleaning up ssl sockets properly, it is a major problem that needs to be fixed.

@ilyam8 this is related to openssl and I know you are trying to trace an SSL related problem when netdata is compiled under muslc. Please check this. Does it improve your situation?

@moonbreon
Copy link
Contributor Author

moonbreon commented Aug 15, 2023

That's fair. I think the better fix here, at least for problem I'm seeing, would be to add an explicit check for SSL_ERROR_ZERO_RETURN and return bytes (which would be 0 and that would cause the upstream call to close the connection properly).

I can update the PR with those changes if you'd like, or I can hold off until further investigation is completed?

@ilyam8
Copy link
Member

ilyam8 commented Aug 16, 2023

@ilyam8 this is related to openssl and I know you are trying to trace an SSL related problem when netdata is compiled under muslc. Please check this. Does it improve your situation?

I had the same thought and tried it - it doesn't help.

@ktsaou
Copy link
Member

ktsaou commented Aug 16, 2023

That's fair. I think the better fix here, at least for problem I'm seeing, would be to add an explicit check for SSL_ERROR_ZERO_RETURN and return bytes (which would be 0 and that would cause the upstream call to close the connection properly).

@moonbreon yes, this seems more reasonable. And I think it would be immediately mergeable.

@moonbreon
Copy link
Contributor Author

@ktsaou - I added an explicit check for zero SSL_ERROR_ZERO_RETURN. I also closed the socket in the final else case because every time I've tested this I've gotten back an error that would require closing the socket. Apart from explicitly handling every other possible error case, I think this is the best approach. The big risk with not closing it in the final else case is that with such a high probability the socket has already been closed on the other side (and nothing else to read), you basically sit in a while(true) { netdata_log_error() } and the log file blows up very quickly.

@ktsaou
Copy link
Member

ktsaou commented Aug 25, 2023

@moonbreon make this little change and this is ready to be merged

@moonbreon
Copy link
Contributor Author

moonbreon commented Aug 31, 2023

@moonbreon make this little change and this is ready to be merged

I'm not seeing any change requests?

libnetdata/socket/security.c Outdated Show resolved Hide resolved
@ktsaou
Copy link
Member

ktsaou commented Aug 31, 2023

Sorry... I didn't finalize the review.
Just change the return code to zero.

Copy link
Contributor

@thiagoftsm thiagoftsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR is working as expected. Streaming, Cloud and Exporters had expected results when I connected a child and reboot parent and exporting.
@ktsaou the last commit addressed the issue you raised, when you have a time, please, take a look.
Thank you very much @moonbreon !

@ilyam8
Copy link
Member

ilyam8 commented Sep 8, 2023

Do we need the same change for netdata_ssl_write()?

@ktsaou
Copy link
Member

ktsaou commented Sep 8, 2023

Do we need the same change for netdata_ssl_write()?

According to the write() docs:

Zero: It's technically possible for write() to return zero, indicating that zero bytes were written. However, for a
socket, this generally does not happen unless the size of the data to be written is zero.

So, I think is not needed there.

@ktsaou ktsaou merged commit 43b1a2e into netdata:master Sep 8, 2023
137 checks passed
@ktsaou
Copy link
Member

ktsaou commented Sep 8, 2023

Merged! Thank you @moonbreon !

stelfrag pushed a commit to stelfrag/netdata that referenced this pull request Sep 11, 2023
Ferroin pushed a commit that referenced this pull request Sep 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants