Backward compatibility for configs of pools/providers #844

@vinser52

Rationale

UMF should provide backward-compatible interfaces.

Description

Background

The MPI team ran into an issue after PR #692 extended the level_zero_memory_provider_params_t config structure. The root cause is how MPI instantiates and initializes the L0 provider config. They did the following:

level_zero_memory_provider_params_t l0_config = {
    .level_zero_context_handle = hContext,
    .level_zero_device_handle = hDevice,
    .memory_type = UMF_MEMORY_TYPE_DEVICE,
};

umf_memory_provider_handle_t hProvider = NULL;
umf_memory_provider_ops_t *l0_ops = umfLevelZeroMemoryProviderOps();
umfMemoryProviderCreate(l0_ops, &l0_config, &hProvider);

After PR #692, the code above started crashing because two new fields were added to the level_zero_memory_provider_params_t data structure and the example above does not initialize them:

typedef struct level_zero_memory_provider_params_t {
    ze_context_handle_t level_zero_context_handle;
    ze_device_handle_t level_zero_device_handle;
    umf_usm_memory_type_t memory_type;

    // New fields
    ze_device_handle_t *resident_device_handles;
    uint32_t resident_device_count;
} level_zero_memory_provider_params_t;

A quick fix for the issue was to zero-initialize the config data structure and then assign the required fields:

level_zero_memory_provider_params_t l0_config = { 0 }; // zero-initialize
l0_config.level_zero_context_handle = hContext;
l0_config.level_zero_device_handle = hDevice;
l0_config.memory_type = UMF_MEMORY_TYPE_DEVICE;

Open Questions

The quick fix above works only because it is OK to initialize the fields of the level_zero_memory_provider_params_t structure with zeros. But there are two related major questions we should address:

1. How to initialize the configs with default values?

In the general case, not every field of a config data structure can or should be initialized with zeros.
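
To make the point concrete, here is a minimal sketch of a config where zero is not a valid default. Everything in it (bar_params_t, its fields, the default of 64, and both helper functions) is hypothetical and is not part of UMF; it only illustrates why `= {0}` cannot serve as a general initialization rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical config structure, NOT a real UMF type.
typedef struct bar_params_t {
    size_t alignment; // must be a non-zero power of two; assumed default is 64
    size_t min_chunk; // 0 means "no minimum", so zero is a valid default here
} bar_params_t;

// Hypothetical helper that fills in the assumed defaults.
static inline void barParamsInitDefaults(bar_params_t *params) {
    assert(params != NULL);
    params->alignment = 64;
    params->min_chunk = 0;
}

// The kind of validation a provider might run on its input.
static inline bool barParamsAreValid(const bar_params_t *params) {
    assert(params != NULL);
    size_t a = params->alignment;
    return a != 0 && (a & (a - 1)) == 0; // non-zero power of two
}
```

With these definitions, a zero-initialized bar_params_t fails validation, while one filled in by barParamsInitDefaults passes.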

2. How to support backward compatibility?

Config data structures are defined in the headers. If an application was built against an old version of UMF but a newer version is installed on the system, then even if the application initializes all of the config's fields properly, it initializes only the fields that exist in the old version of UMF.
Consider an example. There is a provider_foo and a corresponding foo_params_t structure that contains a single int field in the 1st version:

// provider_foo.h:
typedef struct foo_params_t {
    int field1;
} foo_params_t;

// provider_foo.c:
umf_result_t foo_provider_initialize(void *params, void **provider) {
    foo_params_t *foo_params = (foo_params_t *)params;
    // sizeof(foo_params_t) is expected to equal sizeof(int), typically 4 bytes

    assert(foo_params->field1 == 0);
    return UMF_RESULT_SUCCESS;
}

// application code:
int main() {
    foo_params_t foo_params = {0};

    umf_memory_provider_handle_t hProvider = NULL;
    umf_memory_provider_ops_t *foo_ops = umfFooMemoryProviderOps();
    umfMemoryProviderCreate(foo_ops, &foo_params, &hProvider);
}

Now, in the 2nd version of UMF, we extend the foo_params_t structure with an additional field:

// provider_foo.h:
typedef struct foo_params_t {
    int field1;
    int field2;
} foo_params_t;

// provider_foo.c:
umf_result_t foo_provider_initialize(void *params, void **provider) {
    foo_params_t *foo_params = (foo_params_t *)params;
    // sizeof(foo_params_t) is expected to equal 2*sizeof(int), typically 8 bytes

    assert(foo_params->field1 == 0);
    // ERROR: if the application was compiled with the 1st version of the UMF lib,
    // the memory pointed to by `params` is only 4 bytes, so this is an out-of-bounds read
    assert(foo_params->field2 == 0);
    return UMF_RESULT_SUCCESS;
}
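
The mismatch can be reproduced in a single translation unit by modelling both header versions side by side. This is only an illustrative sketch: the _v1/_v2 type names and fooV2CanReadField2 are made up for the demonstration, since in reality both versions share the name foo_params_t and live in different builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// What an application built against version 1 allocates.
typedef struct foo_params_v1_t {
    int field1;
} foo_params_v1_t;

// What a version 2 library expects to find behind the void pointer.
typedef struct foo_params_v2_t {
    int field1;
    int field2;
} foo_params_v2_t;

// The two layouts really do differ in size.
static_assert(sizeof(foo_params_v1_t) < sizeof(foo_params_v2_t),
              "the v2 layout is larger than the v1 layout");

// Hypothetical check: can a v2 library safely read field2 from an object
// of `caller_struct_size` bytes? It cannot if the caller used the v1 layout.
static bool fooV2CanReadField2(size_t caller_struct_size) {
    return caller_struct_size >=
           offsetof(foo_params_v2_t, field2) + sizeof(int);
}
```

The library has no way to run such a check today, because the current interface passes only a void pointer and carries neither a size nor a version.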

Possible API Changes

Option 1: handle-based approach

Do not expose the config structure in interfaces. The config object is allocated inside libumf.so and a handle to the config is returned to the client code. Setter/getter APIs are used to set up config parameters. For example:

foo_params_handle_t foo_params = umfCreateFooParams();
umfFooParamsSetField1(foo_params, 7);

umf_memory_provider_handle_t hProvider = NULL;
umf_memory_provider_ops_t *foo_ops = umfFooMemoryProviderOps();
umfMemoryProviderCreate(foo_ops, foo_params, &hProvider);

Client code does not depend on the layout of the foo_params_t structure. Only UMF knows the layout, so it can change between versions.
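
A minimal sketch of what the library side of this could look like. Only umfCreateFooParams and umfFooParamsSetField1 appear in the example above; the int return codes, umfFooParamsGetField1, and umfDestroyFooParams are assumptions added to make the sketch complete, not a proposed final API.

```c
#include <assert.h>
#include <stdlib.h>

// ---- public header: clients only ever see an opaque handle ----
typedef struct foo_params_t *foo_params_handle_t;

foo_params_handle_t umfCreateFooParams(void);
void umfDestroyFooParams(foo_params_handle_t params);              // assumed
int umfFooParamsSetField1(foo_params_handle_t params, int value);  // 0 on success
int umfFooParamsGetField1(foo_params_handle_t params, int *out_value); // assumed

// ---- library source: the layout is private and free to grow ----
struct foo_params_t {
    int field1;
    int field2; // added in a later version; existing clients need no rebuild
};

foo_params_handle_t umfCreateFooParams(void) {
    foo_params_handle_t params = malloc(sizeof(*params));
    if (params != NULL) {
        // Every field gets its default here, so clients cannot
        // observe an uninitialized config.
        params->field1 = 0;
        params->field2 = 0;
    }
    return params; // NULL on allocation failure
}

void umfDestroyFooParams(foo_params_handle_t params) {
    free(params); // free(NULL) is a no-op
}

int umfFooParamsSetField1(foo_params_handle_t params, int value) {
    if (params == NULL) {
        return -1;
    }
    params->field1 = value;
    return 0;
}

int umfFooParamsGetField1(foo_params_handle_t params, int *out_value) {
    assert(out_value != NULL); // treated as a programming error in this sketch
    if (params == NULL) {
        return -1;
    }
    *out_value = params->field1;
    return 0;
}
```

The trade-off is an extra allocation and one exported function per field, in exchange for a layout that never crosses the library boundary.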

Option 2: explicitly pass the size of params

Data structures that define configs for the pools/providers remain in headers (the same as today). New fields can be added only at the end. The application explicitly passes the size of the config data structure, and the provider/pool implementation determines the version of the config from that size. To initialize the config data structure we need to introduce special macros or header-based inline functions. For example:

// Version 1 provider_foo.h:
typedef struct foo_params_t {
    int field1;
} foo_params_t;

static inline void umfFooParamsInit(foo_params_t *params) {
    params->field1 = 0;
}

// Version 2 provider_foo.h:
typedef struct foo_params_t {
    int field1;
    int field2;
} foo_params_t;

static inline void umfFooParamsInit(foo_params_t *params) {
    params->field1 = 0;
    params->field2 = 0;
}

// Version 2 provider_foo.c:
umf_result_t foo_provider_initialize(void *params, size_t params_size, void **provider) {
    foo_params_t foo_params;
    umfFooParamsInit(&foo_params);

    if (params_size == sizeof(foo_params_t)) {
        // current version, just copy input params as is.
        foo_params = *(foo_params_t *)params;
    } else if (params_size < sizeof(foo_params_t)) {
        // old version, copy only first `params_size` bytes and keep the rest default initialized
        memcpy(&foo_params, params, params_size);
    } else {
        // params are larger than anything this version knows; copying would overflow foo_params
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }

    assert(foo_params.field1 == 0);
    assert(foo_params.field2 == 0);
    return UMF_RESULT_SUCCESS;
}

// application code:
int main() {
    foo_params_t foo_params;
    umfFooParamsInit(&foo_params);

    umf_memory_provider_handle_t hProvider = NULL;
    umf_memory_provider_ops_t *foo_ops = umfFooMemoryProviderOps();
    umfMemoryProviderCreate(foo_ops, &foo_params, sizeof(foo_params_t), &hProvider);
}
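
The size-based scheme can be exercised without UMF by modelling both layouts in one file. The sketch below is an illustration under assumptions: the _v1/_v2 type names, fooParamsInitV2, fooResolveParams, the int return codes, and the non-zero default of 42 for field2 are all hypothetical. The non-zero default is chosen to show that a field missing from an old caller keeps its default rather than becoming zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct foo_params_v1_t {
    int field1;
} foo_params_v1_t;

typedef struct foo_params_v2_t {
    int field1;
    int field2; // appended at the end, as Option 2 requires
} foo_params_v2_t;

// Defaults as a version 2 library would define them (42 is an assumption).
static void fooParamsInitV2(foo_params_v2_t *params) {
    assert(params != NULL);
    params->field1 = 0;
    params->field2 = 42;
}

// Library side: turn caller params of any known size into the current layout.
// Returns 0 on success, -1 if the size matches no known version.
static int fooResolveParams(const void *params, size_t params_size,
                            foo_params_v2_t *out) {
    assert(out != NULL);
    if (params == NULL) {
        return -1;
    }
    if (params_size != sizeof(foo_params_v1_t) &&
        params_size != sizeof(foo_params_v2_t)) {
        return -1; // unknown (possibly newer) layout: refuse rather than guess
    }
    fooParamsInitV2(out);             // start from defaults
    memcpy(out, params, params_size); // overwrite the prefix the caller knows
    return 0;
}
```

Because fields are only ever appended, the prefix copied from an old caller always lines up with the same fields in the new layout.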

Option 3: store version in the params data structure

Similar to Option 2, but instead of using the size of the params data structure to determine the version, store the version explicitly as the first field of the params data structure.

// Version 1 provider_foo.h:
typedef struct foo_params_t {
    int version;
    int field1;
} foo_params_t;

static inline void umfFooParamsInit(foo_params_t *params) {
    params->version = 1;
    params->field1 = 0;
}

// Version 2 provider_foo.h:
typedef struct foo_params_t {
    int version;
    int field1;
    int field2;
} foo_params_t;

static inline void umfFooParamsInit(foo_params_t *params) {
    params->version = 2;
    params->field1 = 0;
    params->field2 = 0;
}

// Version 2 provider_foo.c:
umf_result_t foo_provider_initialize(void *params, void **provider) {
    foo_params_t foo_params;
    umfFooParamsInit(&foo_params);

    // the version is always the first field, so it can be read before the layout is known
    int params_version = *(int *)params;

    switch (params_version) {
        case 1:
            ...
            break;
        case 2:
            ...
            break;
        default:
            LOG_ERR("Wrong version");
            return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }

    assert(foo_params.field1 == 0);
    assert(foo_params.field2 == 0);
    return UMF_RESULT_SUCCESS;
}

// application code:
int main() {
    foo_params_t foo_params;
    umfFooParamsInit(&foo_params);

    umf_memory_provider_handle_t hProvider = NULL;
    umf_memory_provider_ops_t *foo_ops = umfFooMemoryProviderOps();
    umfMemoryProviderCreate(foo_ops, &foo_params, &hProvider);
}
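
The switch bodies are left open above. One possible way to fill them in is sketched below as a standalone model; it is an assumption about how the conversion could work, not the intended implementation. The _v1/_v2 type names, fooResolveVersioned, the int return codes, and the default of 42 for field2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct foo_params_v1_t {
    int version; // always the first field
    int field1;
} foo_params_v1_t;

typedef struct foo_params_v2_t {
    int version; // always the first field
    int field1;
    int field2;
} foo_params_v2_t;

// Library side: convert caller params of any known version to the v2 layout.
// Returns 0 on success, -1 on an unknown version.
static int fooResolveVersioned(const void *params, foo_params_v2_t *out) {
    assert(out != NULL);
    if (params == NULL) {
        return -1;
    }

    // Start from the version 2 defaults (42 is an assumed default).
    out->version = 2;
    out->field1 = 0;
    out->field2 = 42;

    // A pointer to a struct may be converted to a pointer to its first
    // member, so the version is readable before the layout is known.
    int version = *(const int *)params;

    switch (version) {
    case 1: {
        const foo_params_v1_t *v1 = params;
        out->field1 = v1->field1; // field2 keeps its default
        return 0;
    }
    case 2: {
        const foo_params_v2_t *v2 = params;
        out->field1 = v2->field1;
        out->field2 = v2->field2;
        return 0;
    }
    default:
        return -1; // unknown version: reject instead of guessing
    }
}
```

Unlike the size-based scheme, this one copies field by field, so it would also survive a later version that reorders or removes fields, as long as the version stays first.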

Design Considerations

I prefer Option 1 over Options 2 and 3. I think it provides the best ABI compatibility. Another advantage is that the config data structure is always initialized when the user calls umfCreateFooParams(). There is no way to end up with uninitialized fields in the config, whereas with Option 2 or 3 the user might forget to call the umfFooParamsInit(&params) function.

Between Options 2 and 3, I prefer Option 3 because of two things:

  • It stores the version explicitly.
  • The umfMemoryProviderCreate API remains unchanged.

Implementation details

TBD

Metadata

Labels: enhancement (New feature or request)
