libfabric version2: Initial set of proposed changes #9384

Open: wants to merge 34 commits into main

Commits on Oct 9, 2023

  1. prov/bgq: Remove provider

    The provider is only supported by the v1.x series.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 9, 2023 (76e2999)
  2. prov/usnic: Remove provider

    The provider is only supported by the v1.x series.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 9, 2023 (e231758)
  3. prov/rstream: Remove unfinished provider

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 9, 2023 (bbcf4ac)
  4. prov/gni: Remove provider

    GNI is only supported by the v1.x series
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 9, 2023 (c1ae77e)
  5. prov/netdir: Remove provider

    NetworkDirect is supported by the verbs provider.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 9, 2023 (ac4400a)

Commits on Oct 16, 2023

  1. prov/sockets: Remove provider

    The sockets provider is only supported in the v1.x series.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (692339f)
  2. prov/tcp: Add support for FABRIC_DIRECT builds

    The sockets provider will be removed.  Add the ability to verify
    the FABRIC_DIRECT build option using the tcp provider instead, and
    provide blank direct header file templates.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (26380e4)
  3. core: Remove internally used definitions from public headers

    Several defines and values should not have been exposed in the
    public header files.  Remove or move the definitions into internal
    headers.  This avoids possible conflicts with application
    definitions and API breakage.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (964e193)
  4. core: Move FI_PRIORITY to internal flag

    The flag is only used between the rxm and verbs providers.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (57984a2)
  5. core: Remove FI_PROVIDER_SPECIFIC

    The value is constrained to 32-bit int flags, not u64 flags.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (486d48d)
  6. core: Remove unimplemented EP types

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (224195d)
  7. core: Remove unimplemented FI_VARIABLE_MSG

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (d40be4a)
  8. core: Remove unimplemented FI_XPU_TRIGGER

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (50717fc)
  9. core: Remove unused FI_RESTRICTED_COMP and FI_NOTIFY_FLAGS_ONLY

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (d3ba9f8)
  10. core/av: Simplify the AV API

    Remove support for asynchronous insertions and AV_MAP.  The
    fi_addr_t value will be index-based in the standard case, or
    provider-defined in more advanced use cases, depending on the AV
    configuration (such as when auth_keys are used).
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (ec2be57)
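
    As a rough illustration of what an index-based fi_addr_t means for
    applications, the sketch below models an address vector as a table
    in which the handle returned by a synchronous insert is simply the
    entry's index.  It does not use libfabric: the names (sketch_av,
    sketch_av_insert, sketch_av_lookup), the fixed capacity, and the
    fixed address length are hypothetical, and the provider-defined
    formats mentioned for advanced configurations are not modeled.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical model: fi_addr_t is treated as a plain uint64_t index. */
    typedef uint64_t sketch_addr_t;

    #define SKETCH_ADDR_NOTAVAIL UINT64_MAX
    #define SKETCH_AV_CAPACITY 8
    #define SKETCH_ADDR_LEN 16

    /* Hypothetical index-based AV: entry i holds the raw endpoint
     * address whose handle is i. */
    struct sketch_av {
        unsigned char addrs[SKETCH_AV_CAPACITY][SKETCH_ADDR_LEN];
        size_t count;
    };

    /* Synchronous insert: the handle is known as soon as the call
     * returns, so no asynchronous completion event is needed. */
    static sketch_addr_t sketch_av_insert(struct sketch_av *av,
                                          const void *addr, size_t len)
    {
        assert(av != NULL && addr != NULL);
        if (av->count == SKETCH_AV_CAPACITY || len > SKETCH_ADDR_LEN)
            return SKETCH_ADDR_NOTAVAIL;
        memset(av->addrs[av->count], 0, SKETCH_ADDR_LEN);
        memcpy(av->addrs[av->count], addr, len);
        return (sketch_addr_t) av->count++;
    }

    /* Lookup is a bounds check plus an array access. */
    static const void *sketch_av_lookup(const struct sketch_av *av,
                                        sketch_addr_t fi_addr)
    {
        return fi_addr < av->count ? av->addrs[fi_addr] : NULL;
    }
    ```

    In this model the first address inserted gets handle 0, the second
    handle 1, and so on, which is what lets a data-path translation
    stay a single array access.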
  11. core: Move FI_BUFFERED_RECV to internal flag

    Remove FI_BUFFERED_RECV as an exported API option.  Since it's
    currently used internally between mrail and rxm, make it an
    internal-only option.  It has a limited use case for multirail
    over rxm over connected endpoints where shared receive queues are
    not available.  With shared receive queues, the feature wouldn't
    be needed, as mrail could own the buffers outright.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (c59d16c)
  12. core: Document preferred threading model for scalable endpoints

    Recommend that applications and providers use FI_THREAD_COMPLETION
    as the preferred threading model for lockless operation when using
    scalable endpoints.  This helps align application design with the
    provider implementation.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (ce9622f)
  13. core: Simplify threading models

    Remove overly complicated threading models and focus on specific
    models to allow better alignment between application designs and
    provider implementation.
    
    Use FI_THREAD_DOMAIN as the preferred lockless threading model
    for standard endpoints.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (f3ac4bc)
  14. core: Simplify progress definition

    Combine data and control progress into one progress option.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (1413486)
  15. core: Remove comp_order attribute

    Completions are always unordered.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (1466a80)
  16. core: Remove total_buffered_recv

    The field was deprecated and only serves as a placeholder for
    compatibility.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (133d965)
  17. core: Remove fid_wait API

    Support for wait sets adds significant complexity to the provider
    implementation and is basically an abstraction around the OS
    constructs for poll/epoll (on Linux).  Remove the feature from the
    API, but keep the internal implementation for now.  This allows
    providers to move away from wait set support.
    
    Note that the API still supports blocking calls and native wait
    objects (e.g. epoll fds).  Only the wait set abstraction is
    removed, which gives providers control over creating wait objects.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (d629944)
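
    Without wait sets, an application that needs to block on several
    objects can combine their native wait objects itself.  The
    Linux-only sketch below does this with epoll.  Each eventfd is a
    hypothetical stand-in for the fd a provider would expose (for
    example through fi_control() with FI_GETWAIT, assuming the provider
    supports an fd wait object), and the helper names are illustrative.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <sys/epoll.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    /* Hypothetical stand-in for a provider wait object: an eventfd that
     * the "provider" signals when a completion arrives. */
    static int sketch_make_wait_fd(void)
    {
        return eventfd(0, EFD_NONBLOCK);
    }

    static int sketch_signal(int fd)
    {
        uint64_t one = 1;
        return write(fd, &one, sizeof(one)) == (ssize_t) sizeof(one) ? 0 : -1;
    }

    /* Block until any of the given fds is readable, the job a wait set
     * used to do.  Returns 1 and stores the ready fd, 0 on timeout,
     * or -1 on error. */
    static int sketch_wait_any(const int *fds, int nfds, int timeout_ms,
                               int *ready_fd)
    {
        assert(fds != NULL && nfds > 0 && ready_fd != NULL);
        int epfd = epoll_create1(0);
        if (epfd < 0)
            return -1;
        for (int i = 0; i < nfds; i++) {
            struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };
            if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) < 0) {
                close(epfd);
                return -1;
            }
        }
        struct epoll_event out;
        int n = epoll_wait(epfd, &out, 1, timeout_ms);
        if (n == 1)
            *ready_fd = out.data.fd;
        close(epfd);
        return n;
    }
    ```

    A real application would typically keep the epoll fd open across
    calls rather than rebuilding it on every wait; it is recreated here
    only to keep the sketch short.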
  18. core: Remove fid_poll from the public API

    Poll sets are a simple iterator around progressing multiple objects.
    Remove from the API to reduce provider complexity.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (32adc2c)
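
    The "simple iterator" that a poll set provides can be reproduced by
    the application in a few lines.  In this sketch, sketch_obj is a
    hypothetical stand-in for a CQ or counter, and sketch_drain is an
    illustrative progress function that reports simulated events; none
    of these names come from libfabric.

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical progressable object, standing in for a CQ or counter. */
    struct sketch_obj {
        int (*progress)(struct sketch_obj *obj); /* returns events found */
        int pending;                             /* simulated work */
    };

    /* Example progress function: report and clear simulated events. */
    static int sketch_drain(struct sketch_obj *obj)
    {
        int n = obj->pending;
        obj->pending = 0;
        return n;
    }

    /* A poll set reduced to its essence: progress every object and
     * record the ones that reported activity.  Every object is
     * progressed even after 'ready' is full.  Returns the number of
     * objects recorded in 'ready'. */
    static size_t sketch_poll(struct sketch_obj **objs, size_t nobjs,
                              struct sketch_obj **ready, size_t max_ready)
    {
        size_t count = 0;
        assert(objs != NULL && (ready != NULL || max_ready == 0));
        for (size_t i = 0; i < nobjs; i++) {
            if (objs[i]->progress(objs[i]) > 0 && count < max_ready)
                ready[count++] = objs[i];
        }
        return count;
    }
    ```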
  19. core: Remove FI_WAIT_MUTEX_COND support from API

    There's never been an implementation.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (ed2f32e)
  20. core: Remove deprecated MR mode options

    Remove FI_MR_BASIC/SCALABLE/UNSPEC.  These were deprecated in
    version 1.5.  Remove FI_LOCAL_MR, which was an earlier version of
    FI_MR_LOCAL and also deprecated.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (4e40418)
  21. core: Remove support for async memory registration

    Feature is not implemented natively by providers.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (e0652c1)
  22. core: Cleanup FI_ORDER flags

    Remove FI_ORDER_NONE (a flag whose value is 0) and FI_ORDER_STRICT
    (which isn't a flag and covers only a portion of the valid flags).
    
    Remove FI_ORDER_DATA, which will not be supported by version 2 in
    order to allow for greater provider optimization.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (8cf5f5a)
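
    To see why a zero-valued "flag" and a multi-bit "flag" are awkward,
    consider the usual bit tests.  The names and values below are
    hypothetical and do not match the real FI_ORDER_* definitions; they
    only illustrate the bit arithmetic.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical values, not the real FI_ORDER_* bits. */
    #define SK_ORDER_NONE 0ULL               /* "flag" whose value is 0 */
    #define SK_ORDER_RAW  (1ULL << 0)
    #define SK_ORDER_WAR  (1ULL << 1)
    #define SK_ORDER_WAW  (1ULL << 2)
    /* Hypothetical multi-bit value, like the removed FI_ORDER_STRICT. */
    #define SK_ORDER_SOME_MASK (SK_ORDER_RAW | SK_ORDER_WAW)

    /* Usual single-flag test: never true for a 0 "flag", and true for
     * a partial match when given a multi-bit mask. */
    static bool sk_has_flag(uint64_t attrs, uint64_t flag)
    {
        return (attrs & flag) != 0;
    }

    /* All-bits test: always true for a 0 "flag". */
    static bool sk_has_all(uint64_t attrs, uint64_t mask)
    {
        return (attrs & mask) == mask;
    }
    ```

    With SK_ORDER_NONE, the single-flag test is always false and the
    all-bits test is always true, so neither can tell whether the flag
    was requested.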
  23. core: Restrict endpoints to a single CQ

    Remove the option of directing transmit and receive completions
    to separate CQs for the same endpoint.  The option adds complexity
    at the provider and application levels.  This is largely the
    result of needing SW-based protocols for certain operations, such
    as tag matching.  Either the app must drive progress across
    multiple CQs, or the provider must emulate the application's CQs
    in SW.
    
    This change updates the man page only.  Provider developers are
    left to update their code bases separately.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (7ab212a)
  24. core: Require using libfabric APIs to allocate fi_info structures

    Disallow users from hand-crafting or hand-copying their own fi_info
    structs.  This allows the library to allocate hidden fields for
    internal use.  Plus, there's no need for apps to do this, given that
    the API call is much easier to use.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (39a01fe)
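
    In libfabric, fi_info structures are obtained from fi_getinfo(),
    fi_allocinfo(), or fi_dupinfo() and released with fi_freeinfo().
    The sketch below shows one common way a library can attach hidden
    fields to a public struct, and why a hand-allocated struct breaks
    that scheme.  It is a generic C pattern, not libfabric's actual
    layout; sketch_info, sketch_info_priv, and the helpers are
    hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical public structure, standing in for struct fi_info. */
    struct sketch_info {
        unsigned long caps;
    };

    /* Library-private wrapper: the public struct is embedded, and the
     * hidden fields sit alongside it where applications cannot see them. */
    struct sketch_info_priv {
        struct sketch_info pub;
        int hidden_flags;
    };

    #define SKETCH_CONTAINER_OF(ptr, type, member) \
        ((type *) ((char *) (ptr) - offsetof(type, member)))

    /* Only the library knows the full allocation size. */
    static struct sketch_info *sketch_allocinfo(void)
    {
        struct sketch_info_priv *p = calloc(1, sizeof(*p));
        return p ? &p->pub : NULL;
    }

    /* Valid only for structs returned by sketch_allocinfo(); on an
     * application-declared struct sketch_info it would access memory
     * that was never allocated. */
    static int *sketch_hidden_flags(struct sketch_info *info)
    {
        return &SKETCH_CONTAINER_OF(info, struct sketch_info_priv, pub)->hidden_flags;
    }

    static void sketch_freeinfo(struct sketch_info *info)
    {
        if (info)
            free(SKETCH_CONTAINER_OF(info, struct sketch_info_priv, pub));
    }
    ```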
  25. core: Add fi_fabric2() API

    Add a new call that takes fi_info as input, which provides
    consistency with the domain and endpoint open calls.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (5fb413c)
  26. core/log: Replace fi_log_subsys with flags

    Subsystem filtering isn't useful (or likely ever used).  Remove
    the enum fi_log_subsys.  Instead convert the API to accept int
    flags.  This maintains API compatibility.  Update all current
    FI_LOG_xxx subsys values to 0.  This avoids having to update the
    providers to the new API; their existing calls now simply pass 0
    for the flags.
    
    No flag values are defined yet; the parameter is a placeholder
    for future options.
    
    The logging checks are simplified by this change.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (1feb6d8)
  27. docs: Add information on porting applications between v1 and v2

    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (ac80575)
  28. core: Add new peer group feature

    Introduce the concept of peer groups.  A peer group is a set of
    peers that are communicating together for some specific set of tasks.
    Peer groups provide a lower-level mapping of HPC and AI communicators.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (5f3c619)
  29. core: Define new tag formats

    Allow specifying precise tag formatting options.  The mem_tag_format
    takes as input a set of bit fields.  In practice, this is
    impractical to implement, so the entire tag ends up simply being
    masked with ignore bits.
    
    When the mem_tag_format value only has the lower bits set (< 256),
    interpret the format as specific options.  Two new options are
    defined, one aligned with MPI and the other with CCLs.  This
    information can be used by providers to optimize for the separate
    use cases.
    
    Signed-off-by: Sean Hefty <sean.hefty@intel.com>
    shefty committed Oct 16, 2023 (44246f8)
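
    The dispatch rule described above (values below 256 select a named
    option; larger values keep the bit-field meaning) might look like
    the following on the provider side.  The option values, enum names,
    and function name are hypothetical: the commit only states that two
    options exist, one aligned with MPI and one with CCLs.  Values it
    does not define, including 0, are reported as unknown here.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical option values for the two new tag formats. */
    #define SKETCH_TAG_OPT_MPI 1
    #define SKETCH_TAG_OPT_CCL 2

    enum sketch_tag_kind {
        SKETCH_TAG_BITFIELD,  /* legacy bit-field description */
        SKETCH_TAG_MPI,
        SKETCH_TAG_CCL,
        SKETCH_TAG_UNKNOWN,   /* option value not defined */
    };

    /* Values with only the low 8 bits set (< 256) select a named option;
     * anything larger is interpreted as the legacy bit-field format. */
    static enum sketch_tag_kind sketch_classify_tag_format(uint64_t mem_tag_format)
    {
        if (mem_tag_format >= 256)
            return SKETCH_TAG_BITFIELD;
        switch (mem_tag_format) {
        case SKETCH_TAG_OPT_MPI:
            return SKETCH_TAG_MPI;
        case SKETCH_TAG_OPT_CCL:
            return SKETCH_TAG_CCL;
        default:
            return SKETCH_TAG_UNKNOWN;
        }
    }
    ```

    A provider could use the classification to pick a matching engine
    tuned for MPI-style or CCL-style tags, falling back to generic
    masked matching for the bit-field case.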