From d4985bbb8e2b352c47fbff3168a0ef43cb55d45f Mon Sep 17 00:00:00 2001
From: Amir Shehata
Date: Mon, 4 Mar 2024 13:32:29 -0500
Subject: [PATCH] man/linkx: Add the LINKx man page

Add the LINKx provider man page.

Signed-off-by: Amir Shehata
---
 man/fi_lnx.7.md   | 156 +++++++++++++++++++++++++++++++++++++++++
 man/man7/fi_lnx.7 | 173 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 329 insertions(+)
 create mode 100644 man/fi_lnx.7.md
 create mode 100644 man/man7/fi_lnx.7

diff --git a/man/fi_lnx.7.md b/man/fi_lnx.7.md
new file mode 100644
index 00000000000..ce0bebb6744
--- /dev/null
+++ b/man/fi_lnx.7.md
@@ -0,0 +1,156 @@
---
layout: page
title: fi_lnx(7)
tagline: Libfabric Programmer's Manual
---
{% include JB/setup %}

# NAME

fi_lnx \- The LINKx Provider

# OVERVIEW

The LINKx provider is designed to link two or more providers, allowing
applications to seamlessly use multiple providers or NICs. This provider uses
the libfabric peer infrastructure to aid in the use of the underlying providers.
This version of the provider currently supports linking the libfabric
shared memory provider for intra-node traffic and another provider for
inter-node traffic. Future releases of the provider will allow linking any
number of providers and will give users the ability to influence how the
providers are utilized for traffic load.

# SUPPORTED FEATURES

This release contains an initial implementation of the LINKx provider that
offers the following support:

*Endpoint types*
: The provider supports only endpoint type *FI_EP_RDM*.

*Endpoint capabilities*
: LINKx is a passthrough layer on the send path. On the receive path LINKx
  utilizes the peer infrastructure to create shared receive queues (SRQ).
  Receive requests are placed on the SRQ instead of on the core provider
  receive queue. When the provider receives a message, it queries the SRQ for
  a match. If one is found, the receive request is completed; otherwise the
  message is placed on the LINKx shared unexpected queue (SUQ). Subsequent
  receive requests query the SUQ for matches.
  The first release of the provider supports only tagged and RMA operations.
  Other message types will be supported in future releases.

*Modes*
: The provider does not require the use of any mode bits.

*Progress*
: LINKx utilizes the peer infrastructure to provide a shared completion
  queue. Each linked provider still needs to handle its own progress.
  Completion events are, however, placed on the shared completion queue,
  which is passed to the application for access.

*Address Format*
: LINKx wraps the linked providers' addresses in one common binary blob.
  It does not alter the linked providers' address formats; it wraps them
  into a LINKx structure, which is then flattened and returned to the
  application. This blob is passed between different nodes. The LINKx
  provider is able to parse the flattened format and operate on the
  different links. This assumes that nodes in the same group are all using
  the same version of the provider with the exact same links, i.e., you
  cannot have one node linking SHM+CXI while another links SHM+RXM.

*Message Operations*
: LINKx is designed to intercept message operations such as fi_tsenddata
  and, based on specific criteria, forward the operation to the appropriate
  provider. For the first release, LINKx only supports linking the SHM
  provider for intra-node traffic and another provider (ex: CXI) for
  inter-node traffic. The LINKx send operation looks at the destination
  and, based on whether the destination is local or remote, selects the
  provider to forward the operation to, as the sketch below illustrates.
  The receive case is described above.
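As an illustration of the send-path selection, the following minimal,
self-contained C sketch models the local-versus-remote dispatch described
above. The type and function names (`lnx_peer`, `lnx_select_provider`,
`LNX_PROV_*`) are hypothetical stand-ins for this page and do not reflect
the provider's internal symbols:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of a LINKx peer: the only property the send
 * path needs here is whether the peer lives on the same node. */
struct lnx_peer {
	bool is_local;
};

enum lnx_prov {
	LNX_PROV_SHM,		/* intra-node traffic */
	LNX_PROV_INTER_NODE,	/* ex: CXI */
};

/* Local destinations are forwarded to the shared memory provider;
 * everything else goes to the linked inter-node provider. */
static enum lnx_prov lnx_select_provider(const struct lnx_peer *peer)
{
	return peer->is_local ? LNX_PROV_SHM : LNX_PROV_INTER_NODE;
}

int main(void)
{
	struct lnx_peer local = { .is_local = true };
	struct lnx_peer remote = { .is_local = false };

	printf("local peer  -> %s\n",
	       lnx_select_provider(&local) == LNX_PROV_SHM ?
	       "shm" : "inter-node provider");
	printf("remote peer -> %s\n",
	       lnx_select_provider(&remote) == LNX_PROV_SHM ?
	       "shm" : "inter-node provider");
	return 0;
}
```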
*Using the Provider*
: In order to use the provider, the user needs to set the FI_LINKX_PROV_LINKS
  environment variable to the providers to link, in the format shm+<prov>.
  This allows LINKx to report back to the application, in the fi_getinfo()
  call, the different links which can be selected. Since there are multiple
  domains per provider, LINKx reports a permutation of all the possible
  links. For example, if there are two CXI interfaces on the machine, LINKx
  will report back shm+cxi0 and shm+cxi1. The application can then select,
  based on its own criteria, the link it wishes to use.
  The application typically uses the PCI information in the fi_info
  structure to select the interface to use. A common selection criterion is
  the interface nearest the core the process is bound to. In order to make
  this determination, the application requires the PCI information about the
  interface. For this reason LINKx forwards the PCI information for the
  inter-node provider in the link to the application, as in the example
  below.
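The following self-contained sketch shows one way an application might
enumerate the links and read the forwarded PCI information. It assumes the
provider registers under the name "lnx" and exposes each link through the
domain name; both are assumptions made for illustration, not guarantees of
this page:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <rdma/fabric.h>

int main(void)
{
	struct fi_info *hints, *info, *cur;
	int ret;

	/* Ask LINKx to link SHM with an inter-node provider.
	 * "cxi" is only an example; substitute the provider
	 * available on your fabric. */
	setenv("FI_LINKX_PROV_LINKS", "shm+cxi", 1);

	hints = fi_allocinfo();
	if (!hints)
		return EXIT_FAILURE;

	hints->ep_attr->type = FI_EP_RDM;
	/* Assumed provider name; fi_freeinfo() frees prov_name,
	 * hence the strdup(). */
	hints->fabric_attr->prov_name = strdup("lnx");

	ret = fi_getinfo(FI_VERSION(1, 20), NULL, NULL, 0, hints, &info);
	if (ret) {
		fprintf(stderr, "fi_getinfo failed: %d\n", ret);
		fi_freeinfo(hints);
		return EXIT_FAILURE;
	}

	/* Walk the returned links (ex: shm+cxi0, shm+cxi1) and print
	 * the PCI information forwarded for the inter-node provider. */
	for (cur = info; cur; cur = cur->next) {
		printf("link: %s", cur->domain_attr->name);
		if (cur->nic && cur->nic->bus_attr &&
		    cur->nic->bus_attr->bus_type == FI_BUS_PCI) {
			struct fi_pci_attr *pci =
				&cur->nic->bus_attr->attr.pci;
			printf("  pci %04x:%02x:%02x.%x",
			       pci->domain_id, pci->bus_id,
			       pci->device_id, pci->function_id);
		}
		printf("\n");
	}

	fi_freeinfo(info);
	fi_freeinfo(hints);
	return 0;
}
```

The application can compare each reported PCI address against the core it
is bound to (for example, with hwloc) and pick the nearest link.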
# LIMITATIONS AND FUTURE WORK

*Hardware Support*
: LINKx does not support hardware offload (ex: hardware tag matching). This
  is an inherent limitation of using the peer infrastructure. Because linked
  providers need to query a shared receive queue when a message is received,
  any hardware offload which requires posting the receive buffers directly
  to the hardware will not work with the shared receive queue. The shared
  receive queue provides two advantages: 1) reduced memory usage, and
  2) coordinated receive operations. The second is needed when receiving
  from FI_ADDR_UNSPEC, in which case both providers in the link can race to
  gain access to the receive buffer. It is a future effort to determine a
  way to use hardware tag matching and other hardware offload capabilities
  with LINKx.

*Limited Linking*
: This release of the provider supports linking the SHM provider for
  intra-node operations with one other provider which supports the FI_PEER
  capability for inter-node operations. It is a future effort to expand this
  to linking arbitrary sets of providers.

*Memory Registration*
: As part of the memory registration operation, varying hardware can perform
  hardware-specific steps such as memory pinning. Because the memory
  registration APIs do not specify the source or destination addresses, it
  is not possible for LINKx to determine which provider to forward the
  memory registration to. LINKx, therefore, registers the memory with all
  linked providers. This might not be efficient and might have unforeseen
  side effects. A better method is needed to support memory registration.

*Operation Types*
: This release of LINKx supports tagged and RMA operations only. Future
  releases will expand the support to other operation types.

*Multi-Rail*
: A future design effort is planned to support utilizing multiple interfaces
  for traffic simultaneously. This can be over homogeneous or heterogeneous
  interfaces.

# RUNTIME PARAMETERS

The *LINKx* provider checks for the following environment variables:

*FI_LINKX_PROV_LINKS*
: This environment variable specifies which providers to link. It must be
  set in order for the LINKx provider to return a list of fi_info blocks in
  the fi_getinfo() call. The format is: <prov1>+<prov2>+... As mentioned
  earlier, LINKx currently supports linking only two providers, the first of
  which must be SHM, followed by one other provider for inter-node
  operations.

*FI_LINKX_DISABLE_SHM*
: By default this environment variable is set to 0. The user can set it
  to 1, in which case the SHM provider will not be used. This can be useful
  for debugging and performance analysis. Conversely, since the SHM provider
  naturally handles all intra-node operations, SHM can be tested in
  isolation with LINKx by limiting the processes to a single node.

*FI_LINKX_SRQ_SUPPORT*
: Shared receive queues are an integral part of the peer infrastructure,
  but they have the limitation of not using hardware offload, such as tag
  matching. SRQ support is needed to handle the FI_ADDR_UNSPEC case. If the
  application is certain this case will never occur, it can turn off SRQ
  support by setting this environment variable to 0. It is 1 by default.
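These variables are read from the environment when libfabric initializes,
so a test harness can also set them programmatically before the first
fi_getinfo() call. A minimal sketch follows; the variable names come from
this page, while the requirement to call setenv() before any libfabric
call is an assumption about initialization order:

```c
#include <stdlib.h>

/* Debugging aid: route intra-node traffic over the inter-node
 * provider by disabling the SHM path, and turn off SRQ support
 * when FI_ADDR_UNSPEC receives are never posted. Call this before
 * any libfabric function that reads the environment. */
void lnx_debug_env(void)
{
	setenv("FI_LINKX_DISABLE_SHM", "1", 1);
	setenv("FI_LINKX_SRQ_SUPPORT", "0", 1);
}
```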
# SEE ALSO

[`fabric`(7)](fabric.7.html),
[`fi_provider`(7)](fi_provider.7.html),
[`fi_getinfo`(3)](fi_getinfo.3.html)
diff --git a/man/man7/fi_lnx.7 b/man/man7/fi_lnx.7
new file mode 100644
index 00000000000..0992463bf69
--- /dev/null
+++ b/man/man7/fi_lnx.7
@@ -0,0 +1,173 @@
.\" Automatically generated by Pandoc 2.9.2.1
.\"
.TH "fi_lnx" "7" "" "" ""
.hy
.PP
{% include JB/setup %}
.SH NAME
.PP
fi_lnx - The LINKx Provider
.SH OVERVIEW
.PP
The LINKx provider is designed to link two or more providers, allowing
applications to seamlessly use multiple providers or NICs.
This provider uses the libfabric peer infrastructure to aid in the use
of the underlying providers.
This version of the provider currently supports linking the libfabric
shared memory provider for intra-node traffic and another provider for
inter-node traffic.
Future releases of the provider will allow linking any number of
providers and will give users the ability to influence how the
providers are utilized for traffic load.
.SH SUPPORTED FEATURES
.PP
This release contains an initial implementation of the LINKx provider
that offers the following support:
.TP
\f[I]Endpoint types\f[R]
The provider supports only endpoint type \f[I]FI_EP_RDM\f[R].
.TP
\f[I]Endpoint capabilities\f[R]
LINKx is a passthrough layer on the send path.
On the receive path LINKx utilizes the peer infrastructure to create
shared receive queues (SRQ).
Receive requests are placed on the SRQ instead of on the core provider
receive queue.
When the provider receives a message, it queries the SRQ for a match.
If one is found, the receive request is completed; otherwise the message
is placed on the LINKx shared unexpected queue (SUQ).
Subsequent receive requests query the SUQ for matches.
The first release of the provider supports only tagged and RMA
operations.
Other message types will be supported in future releases.
.TP
\f[I]Modes\f[R]
The provider does not require the use of any mode bits.
.TP
\f[I]Progress\f[R]
LINKx utilizes the peer infrastructure to provide a shared completion
queue.
Each linked provider still needs to handle its own progress.
Completion events are, however, placed on the shared completion queue,
which is passed to the application for access.
.TP
\f[I]Address Format\f[R]
LINKx wraps the linked providers' addresses in one common binary blob.
It does not alter the linked providers' address formats; it wraps them
into a LINKx structure, which is then flattened and returned to the
application.
This blob is passed between different nodes.
The LINKx provider is able to parse the flattened format and operate on
the different links.
This assumes that nodes in the same group are all using the same version
of the provider with the exact same links, i.e., you cannot have one
node linking SHM+CXI while another links SHM+RXM.
.TP
\f[I]Message Operations\f[R]
LINKx is designed to intercept message operations such as fi_tsenddata
and, based on specific criteria, forward the operation to the
appropriate provider.
For the first release, LINKx only supports linking the SHM provider for
intra-node traffic and another provider (ex: CXI) for inter-node
traffic.
The LINKx send operation looks at the destination and, based on whether
the destination is local or remote, selects the provider to forward the
operation to.
The receive case is described above.
.TP
\f[I]Using the Provider\f[R]
In order to use the provider, the user needs to set the
FI_LINKX_PROV_LINKS environment variable to the providers to link, in
the format shm+<prov>.
This allows LINKx to report back to the application, in the
fi_getinfo() call, the different links which can be selected.
Since there are multiple domains per provider, LINKx reports a
permutation of all the possible links.
For example, if there are two CXI interfaces on the machine, LINKx will
report back shm+cxi0 and shm+cxi1.
The application can then select, based on its own criteria, the link it
wishes to use.
The application typically uses the PCI information in the fi_info
structure to select the interface to use.
A common selection criterion is the interface nearest the core the
process is bound to.
In order to make this determination, the application requires the PCI
information about the interface.
For this reason LINKx forwards the PCI information for the inter-node
provider in the link to the application.
.SH LIMITATIONS AND FUTURE WORK
.TP
\f[I]Hardware Support\f[R]
LINKx does not support hardware offload (ex: hardware tag matching).
This is an inherent limitation of using the peer infrastructure.
Because linked providers need to query a shared receive queue when a
message is received, any hardware offload which requires posting the
receive buffers directly to the hardware will not work with the shared
receive queue.
The shared receive queue provides two advantages: 1) reduced memory
usage, and 2) coordinated receive operations.
The second is needed when receiving from FI_ADDR_UNSPEC, in which case
both providers in the link can race to gain access to the receive
buffer.
It is a future effort to determine a way to use hardware tag matching
and other hardware offload capabilities with LINKx.
.TP
\f[I]Limited Linking\f[R]
This release of the provider supports linking the SHM provider for
intra-node operations with one other provider which supports the
FI_PEER capability for inter-node operations.
It is a future effort to expand this to linking arbitrary sets of
providers.
.TP
\f[I]Memory Registration\f[R]
As part of the memory registration operation, varying hardware can
perform hardware-specific steps such as memory pinning.
Because the memory registration APIs do not specify the source or
destination addresses, it is not possible for LINKx to determine which
provider to forward the memory registration to.
LINKx, therefore, registers the memory with all linked providers.
This might not be efficient and might have unforeseen side effects.
A better method is needed to support memory registration.
.TP
\f[I]Operation Types\f[R]
This release of LINKx supports tagged and RMA operations only.
Future releases will expand the support to other operation types.
.TP
\f[I]Multi-Rail\f[R]
A future design effort is planned to support utilizing multiple
interfaces for traffic simultaneously.
This can be over homogeneous or heterogeneous interfaces.
.SH RUNTIME PARAMETERS
.PP
The \f[I]LINKx\f[R] provider checks for the following environment
variables:
.TP
\f[I]FI_LINKX_PROV_LINKS\f[R]
This environment variable specifies which providers to link.
It must be set in order for the LINKx provider to return a list of
fi_info blocks in the fi_getinfo() call.
The format is: <prov1>+<prov2>+...
As mentioned earlier, LINKx currently supports linking only two
providers, the first of which must be SHM, followed by one other
provider for inter-node operations.
.TP
\f[I]FI_LINKX_DISABLE_SHM\f[R]
By default this environment variable is set to 0.
The user can set it to 1, in which case the SHM provider will not be
used.
This can be useful for debugging and performance analysis.
Conversely, since the SHM provider naturally handles all intra-node
operations, SHM can be tested in isolation with LINKx by limiting the
processes to a single node.
.TP
\f[I]FI_LINKX_SRQ_SUPPORT\f[R]
Shared receive queues are an integral part of the peer infrastructure,
but they have the limitation of not using hardware offload, such as tag
matching.
SRQ support is needed to handle the FI_ADDR_UNSPEC case.
If the application is certain this case will never occur, it can turn
off SRQ support by setting this environment variable to 0.
It is 1 by default.
.SH SEE ALSO
.PP
\f[C]fabric\f[R](7), \f[C]fi_provider\f[R](7), \f[C]fi_getinfo\f[R](3)