fabric/cq-eq: Have application provide error buffer
The current CQ/EQ man pages document that the err_data referenced by
error events points to a provider owned data buffer.  This
forces the application to serialize access to the queue.  Allow
the application to provide a data buffer for error data,
including the size of the input buffer.  Add a domain
attribute to report the maximum size of error data that a
provider may need.

Fixes #2720

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
shefty committed Feb 15, 2017
1 parent 4d7a588 commit 2cf97ec
Showing 5 changed files with 35 additions and 7 deletions.
1 change: 1 addition & 0 deletions include/rdma/fabric.h
@@ -342,6 +342,7 @@ struct fi_domain_attr {
size_t mr_iov_limit;
uint64_t caps;
uint64_t mode;
size_t max_err_data;
};

struct fi_fabric_attr {
1 change: 1 addition & 0 deletions include/rdma/fi_eq.h
@@ -229,6 +229,7 @@ struct fi_cq_err_entry {
int prov_errno;
/* err_data is available until the next time the CQ is read */
void *err_data;
size_t err_data_size;
};

enum fi_cq_wait_cond {
18 changes: 15 additions & 3 deletions man/fi_cq.3.md
@@ -395,6 +395,7 @@ struct fi_cq_err_entry {
int err; /* positive error code */
int prov_errno; /* provider error code */
void *err_data; /* error data */
size_t err_data_size; /* size of err_data */
};
```

@@ -490,9 +491,20 @@ of these fields are the same for all CQ entry structure formats.
associated with an error. The use of this field and its meaning is
provider specific. It is intended to be used as a debugging aid. See
fi_cq_strerror for additional details on converting this error data into
a human readable string. Providers are allowed to reuse a single internal
buffer to store additional error information. As a result, error data
is only guaranteed to be available until the next time the CQ is read.
a human readable string.

*err_data_size*
: On input, err_data_size indicates the size of the err_data buffer in bytes.
On output, err_data_size will be set to the number of bytes copied to the
err_data buffer. The err_data information is typically used with
fi_cq_strerror to provide details about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the fabric
was opened with release < 1.5, err_data will be set to a data buffer
owned by the provider. The contents of the buffer will remain valid until a
subsequent read call against the CQ. Applications must serialize access
to the CQ when processing errors to ensure that the buffer referenced by
err_data does not change.
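
The input/output behavior of err_data_size described above can be modeled in a few lines of C. This is only a sketch: `cq_err_model` and `model_fill_err_data` are hypothetical names, not part of libfabric, and truncating to the caller's buffer size is an assumption drawn from "the number of bytes copied", not a statement about any provider's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two fields of struct fi_cq_err_entry
 * that matter here; the real structure has more members. */
struct cq_err_model {
    void   *err_data;       /* application-owned buffer */
    size_t  err_data_size;  /* in: buffer capacity, out: bytes copied */
};

/* Model of the provider side of a read: copy at most err_data_size
 * bytes of provider error data into the caller's buffer, then report
 * how many bytes were actually copied. Truncation is an assumption. */
static void model_fill_err_data(struct cq_err_model *entry,
                                const void *prov_data, size_t prov_len)
{
    size_t n;

    assert(entry != NULL);
    assert(entry->err_data != NULL && entry->err_data_size > 0);

    n = prov_len < entry->err_data_size ? prov_len : entry->err_data_size;
    memcpy(entry->err_data, prov_data, n);
    entry->err_data_size = n;
}
```

In an application, the corresponding step would be to set err_data and err_data_size in the struct fi_cq_err_entry passed to fi_cq_readerr, and to treat err_data_size as the valid length afterwards.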

# COMPLETION FLAGS

7 changes: 7 additions & 0 deletions man/fi_domain.3.md
@@ -130,6 +130,7 @@ struct fi_domain_attr {
size_t mr_iov_limit;
uint64_t caps;
uint64_t mode;
size_t max_err_data;
};
```

@@ -586,6 +587,12 @@ The operational mode bit related to using the domain.
to only be used with endpoints, transmit contexts, and receive contexts that
have the same set of capability flags.

## Max Error Data Size (max_err_data)

: The maximum amount of error data, in bytes, that may be returned as part of
a completion or event queue error. This value corresponds to the err_data_size
field in struct fi_cq_err_entry and struct fi_eq_err_entry.
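
One way an application might use this attribute is to size its error buffer once, up front. The sketch below assumes hypothetical stand-in types (`domain_attr_model`, `sized_err_model`) and a hypothetical helper (`model_prepare_err_entry`); only the field names max_err_data, err_data, and err_data_size come from this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical subset of struct fi_domain_attr. */
struct domain_attr_model {
    size_t max_err_data;
};

/* Hypothetical subset of an error entry (CQ or EQ). */
struct sized_err_model {
    void   *err_data;
    size_t  err_data_size;
};

/* Allocate an error buffer large enough for any error data the
 * provider reports it may return. Returns 0 on success and -1 if
 * the allocation fails. A max_err_data of 0 leaves the entry
 * without a buffer. */
static int model_prepare_err_entry(const struct domain_attr_model *attr,
                                   struct sized_err_model *entry)
{
    assert(attr != NULL && entry != NULL);

    entry->err_data = NULL;
    entry->err_data_size = 0;
    if (attr->max_err_data == 0)
        return 0;

    entry->err_data = malloc(attr->max_err_data);
    if (entry->err_data == NULL)
        return -1;

    entry->err_data_size = attr->max_err_data;
    return 0;
}
```

Because err_data_size is overwritten on output, an application that reuses the entry would need to reset it to the buffer capacity before each read.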

# RETURN VALUE

Returns 0 on success. On error, a negative value corresponding to fabric
15 changes: 11 additions & 4 deletions man/fi_eq.3.md
@@ -406,10 +406,17 @@ through the prov_errno and err_data fields. Users may call fi_eq_strerror to
convert provider specific error information into a printable string
for debugging purposes.

If err_data_size is > 0, then the buffer referenced by err_data is directly
user-accessible. The contents of the buffer will remain valid until a
subsequent read call against the EQ. Applications which read the err_data
buffer must ensure that they do not read past the end of the referenced buffer.
On input, err_data_size indicates the size of the err_data buffer in bytes.
On output, err_data_size will be set to the number of bytes copied to the
err_data buffer. The err_data information is typically used with
fi_eq_strerror to provide details about the type of error that occurred.

For compatibility purposes, if err_data_size is 0 on input, or the fabric
was opened with release < 1.5, err_data will be set to a data buffer
owned by the provider. The contents of the buffer will remain valid until a
subsequent read call against the EQ. Applications must serialize access
to the EQ when processing errors to ensure that the buffer referenced by
err_data does not change.
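
The serialization requirement on the compatibility path follows from every error entry aliasing the same provider buffer. The model below illustrates that under stated assumptions: `eq_model`, `eq_err_model`, and `model_eq_readerr` are hypothetical, the single fixed-size internal buffer is an assumption about how a provider might behave, and err_data_size is left at 0 on the compatibility path because the text above does not say what is reported there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PROV_BUF_SIZE 32

/* Hypothetical EQ that keeps one internal error buffer. */
struct eq_model {
    char prov_buf[MODEL_PROV_BUF_SIZE];
};

/* Hypothetical subset of struct fi_eq_err_entry. */
struct eq_err_model {
    void   *err_data;
    size_t  err_data_size;
};

/* Model of reading one error event whose detail is data[0..len). */
static void model_eq_readerr(struct eq_model *eq, struct eq_err_model *entry,
                             const void *data, size_t len)
{
    assert(eq != NULL && entry != NULL);

    if (entry->err_data_size == 0) {
        /* Compatibility path: hand back the provider-owned buffer.
         * The next read overwrites it. */
        if (len > MODEL_PROV_BUF_SIZE)
            len = MODEL_PROV_BUF_SIZE;
        memcpy(eq->prov_buf, data, len);
        entry->err_data = eq->prov_buf;
    } else {
        /* Application-provided buffer: copy and report the count. */
        if (len > entry->err_data_size)
            len = entry->err_data_size;
        memcpy(entry->err_data, data, len);
        entry->err_data_size = len;
    }
}
```

With two entries read back to back on the compatibility path, both err_data pointers refer to the same storage and the first event's data is lost. With application-provided buffers, each entry keeps its own copy, so no serialization around the buffer contents is needed.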

# NOTES
