Dedicated event for EPOLLHUP, EPOLLERR #1038
Comments
Hi, good question, but let's look at the existing bits. (all above about
And |
Thanks for the reply -- sorry for the late follow up from my side.
Based on this, I don't think
I understand that However, I don't think this approach will work given the current implementation:
Thus, it appears if only the Based on the above, there doesn't seem to be a way to listen for This can be resolved by either setting the |
@azat Assuming I haven't misunderstood something during my trace, I think there are two options:
It would be great to get your feedback, as I'm adding socket timestamp support to a socket library that is also open source and want to ensure that the change is upstream compatible. One concern about option (1) is possible side effects on existing use cases. In addition, dependencies, such as the open source socket library I'm modifying, will need to be able to determine whether the libevent version includes the modification, or they won't be able to operate safely.
@bschlinker hi!
Yes, please! (actually I missed this paragraph, sorry) Do your changes modify all backends or only epoll?
Indeed reasonable
Yes this is tricky (actually comment for it says that it requires EV_FEATURE_EARLY_CLOSE, but who should care about comments, everything should just make sense)
Totally agree, let's go with a new event
Yes, any change (especially in the core part) should be backward compatible, so modifying EV_CLOSED is not an option
AFAICS it may be useful, not for subscribing but for distinguishing EPOLLERR from EPOLLHUP in the event callback (although one always needs to subscribe to both to check for hang up on error), for example polling of sysfs will return Also we need to check how this works in other backends, and if the behavior cannot be common for all we can add some feature like EV_FEATURE_EARLY_CLOSE to give the user information about this. So I would try adding two separate events and see how it looks...
Thanks for the detailed response! Sounds like we're on the same page.
Unfortunately I have only done this for the epoll backend -- have never looked through the code for the others. I also did my initial patch on an older version of libevent that has a fork to add event I will start creating [1]: Relevant: https://github.com/facebook/folly/blob/master/folly/io/async/EventHandler.h#L47 |
@bschlinker by the way do you have any ETA? I'm planning to make a 2.2 alpha release (#1094) and want to include this change too |
Summary: Adding support for write and socket timestamps by introducing `ByteEvent` that can be delivered to observers.

`AsyncTransport::WriteFlags` has long had timestamping related flags, such as `TIMESTAMP_TX`, but the code required to act on these flags only existed in proxygen. This diff generalizes the approach so that it works for other use cases of `AsyncSocket`.

This diff is long, but much of it is unit tests designed to prevent regressions given the trickiness of socket timestamps and `ByteEvent`.

**Each `ByteEvent` contains:**
- Type (WRITE, SCHED, TX, ACK)
- Byte stream offset that the timestamp is for (relative to the raw byte stream, which means after SSL in the case of AsyncSSLSocket)
- `steady_clock` timestamp recorded by AsyncSocket when generating the `ByteEvent`
- For SCHED, TX, and ACK events, if available, hardware and software (kernel) timestamps

**How `ByteEvent` are used:**
- Support is enabled when an observer is attached with the `byteEvents` config flag set. If the socket does not support timestamps, the observer is notified through the `byteEventsUnavailable` callback. Otherwise, `byteEventsEnabled` is called
- When the application writes to a socket with `ByteEvent` support enabled and a relevant `WriteFlag`, SCHED/TX/ACK `ByteEvent` are requested from the kernel, and WRITE `ByteEvent` are generated by the socket for the *last byte* in the write.
- If the entire write buffer cannot be written at once, then additional `ByteEvent` will also be generated for the last byte in each write.
- This means that if the application wants to timestamp a specific byte, it must break up the write buffer before handing it to `AsyncSocket` such that the byte to timestamp is the last byte in the write buffer.
- When socket timestamps arrive from the kernel via the socket error queue, they are transformed into `ByteEvent` and passed to observers

**Caveats:**
1. Socket timestamps received from the kernel contain the byte's offset in the write stream. This counter is a `uint32_t`, and thus rolls over every ~4GB. When transforming raw timestamp into `ByteEvent`, we correct for this and transform the raw offset into an offset relative to the raw byte offset stored by `AsyncSocket` (returned via `getRawBytesWritten()`).
2. At the moment, a read callback must be installed to receive socket timestamps due to epoll's behavior. I will correct this with a patch to epoll, see libevent/libevent#1038 (comment) for details
3. If a msghdr's ancillary data contains a timestamping flag (such as `SOF_TIMESTAMPING_TX_SOFTWARE`), the kernel will enqueue a socket error message containing the byte offset of the write (`SO_EE_ORIGIN_TIMESTAMPING`) even if timestamping has not been enabled by an associated call to `setsockopt`. This creates a problem:
   1. If an application was to use a timestamp `WriteFlags` such as `TIMESTAMP_TX` without enabling timestamping, and if `AsyncSocket` transformed such `WriteFlags` to ancillary data by default, it could create a situation where epoll continues to return `EV_READ` (due to items in the socket error queue), but `AsyncSocket` would not fetch anything from the socket error queue.
   2. To prevent this scenario, `WriteFlags` related to timestamping are not translated into msghdr ancillary data unless timestamping is enabled. This required adding a boolean to `getAncillaryData` and `getAncillaryDataSize`.

Differential Revision: D24094832
fbshipit-source-id: e3bec730ddd1fc1696023d8c982ae02ab9b5fb7d
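The rollover correction mentioned in the first caveat can be illustrated with a small sketch. This is not the code from the referenced diff; `unwrap_offset` and its parameters are hypothetical names, and the sketch assumes the timestamped byte has already been written and lies within 4 GiB of the current 64-bit written-byte count.

```c
#include <stdint.h>

/* Hypothetical helper (not from folly): map a 32-bit wrapped byte offset
 * reported with a timestamp onto the 64-bit count of raw bytes written.
 * Assumes the timestamped byte was written no more than 4 GiB ago. */
uint64_t unwrap_offset(uint64_t bytes_written, uint32_t wrapped)
{
    const uint64_t window = (uint64_t)1 << 32;

    /* Place the wrapped value in the 4 GiB window that currently
     * contains bytes_written. */
    uint64_t candidate = (bytes_written & ~(window - 1)) | wrapped;

    /* A candidate ahead of what has been written must belong to the
     * previous window, provided a previous window exists. */
    if (candidate > bytes_written && candidate >= window)
        candidate -= window;

    return candidate;
}
```

For example, with `bytes_written` just past the first 4 GiB boundary, a wrapped value near `UINT32_MAX` resolves to an offset just before the boundary, while a small wrapped value resolves to an offset just after it.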
@azat I started working on this issue and think I have a reasonable cut at it. This raised two questions:
Any other feedback is welcome of course. |
Sorry for such a huge delay.
Where do you see 2.8MB? P.S. it is generated via |
I'd like to add another event type that enables notification of `EPOLLHUP` and `EPOLLERR` without requiring a subscription to `EV_READ` or `EV_WRITE`. I've done a rough cut of this and would like to upstream.

Why this is useful:
- `EPOLLHUP` and `EPOLLERR` are currently delivered as `(EV_READ | EV_WRITE)`
- `EV_READ` or `EV_WRITE`, but still want to be notified of `EPOLLHUP` / `EPOLLERR`:
  - `EV_READ` messages to put back pressure on the transport and may not be subscribed to `EV_WRITE` as there are no writes pending.

Questions:
- `EV_HUPERR`, that captures EPOLLHUP and EPOLLERR. I could split into two separate events, but I think most use cases will want both. Looking for feedback on whether flexibility should be prioritized
- `EV_HUPERR`, or `EV_ERRHUP`, or anything else. There was some relevant discussion in "libevent hides POLLERR" #495

Seems related to #345; using a new issue since that one is 4 years old.
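As background for the request above: at the raw epoll level, the kernel reports `EPOLLHUP` and `EPOLLERR` regardless of the interest mask, so a descriptor registered with an empty mask still receives them. The following is a minimal Linux-only sketch that uses plain epoll rather than libevent; `hup_with_empty_mask` is a hypothetical name, and a pipe stands in for a socket purely for illustration.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: register the read end of a pipe with an EMPTY
 * interest mask, close the write end, and return the event bits that
 * epoll reports (0 on failure or timeout). */
uint32_t hup_with_empty_mask(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return 0;

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return 0;
    }

    struct epoll_event ev = { .events = 0 };  /* no EPOLLIN, no EPOLLOUT */
    ev.data.fd = fds[0];

    uint32_t seen = 0;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
        close(fds[1]);                        /* writer goes away: hang up */
        fds[1] = -1;

        struct epoll_event out;
        if (epoll_wait(ep, &out, 1, 1000) == 1)
            seen = out.events;
    }

    if (fds[1] >= 0)
        close(fds[1]);
    close(fds[0]);
    close(ep);
    return seen;
}
```

The returned mask contains `EPOLLHUP` even though neither `EPOLLIN` nor `EPOLLOUT` was requested, which is the behavior a dedicated libevent event type would expose without an `EV_READ` or `EV_WRITE` subscription.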