-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add core and usnic support for fid_nic #4433
Conversation
info->handle is a fid, which includes a type field. We either need to narrow the scope of what prov_attr can be (e.g. a pointer to a single allocated memory region), or find a way to hand it to the provider for processing. |
f54ed9a
to
653641a
Compare
Ok, fixed the code to check for
|
653641a
to
d7efe4b
Compare
src/fabric.c
Outdated
free(nic->prov_attr); | ||
} | ||
free(nic); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this to a separate function? I haven't looked and don't remember, but I'm guessing fi_dupinfo needs to be updated as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to a subroutine. Also added handling for fi_dupinfo()
.
d7efe4b
to
5b7d74b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to see what we can do for the prov_attr. But I'm fine having whatever that is go in as separate changes.
prov/usnic/src/usdf_fabric.c
Outdated
|
||
free(nic->prov_attr); | ||
free(nic); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you call ofi_free_nic() instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup! Fixed.
40f75ca
to
81db8ef
Compare
Ok, cleaned up the comments and the commit messages. @bturrubiates usnic
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core changes look good
81db8ef
to
bbff01c
Compare
For the prov_attr, we need to extend struct fi_provider if we want to be able to print them or allow a complex structure. So, I guess the first decision is how flexible do we want provider attributes to be? My current thinking is to add tostr(), allocinfo, and freeinfo() calls to fi_provider. The provider would only be responsible for handling provider specific portions of input data. |
Did you mean allocnic and freenic? Or were you specifically being more general in terms of allocating/freeing an entire info? Can you flesh out a little more about what you're thinking for these 2 functions? |
I was considering adding the other external API calls to the provider, but the provider would only deal with the provider specific pieces. For example, today if an app calls fi_dupinfo() on an fi_info where the handle is set to FI_CLASS_CONNREQ, the handle is ignored. So, I'd like to have a more generic solution. These would be the additions:
There needs to be enough information in whatever input data there is in order to route the call to the correct provider. So, not all calls into the main API can find their way to the provider, even it provider specific data is present. For dupinfo/freeinfo, the provider only deals with the handle field. But maybe this would allow a provider to store data beyond the end of the user visible structure(s)? |
|
||
nic = calloc(1, sizeof(*nic)); | ||
if (!nic) | ||
goto nomem; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nomem
label makes a call to ofi_free_nic
if the calloc
fails. With the current version of the code, that will cause a seg fault because ofi_free_nic
is not safe if a NULL
pointer is passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed ofi_free_nic()
to handle NULL.
src/fabric.c
Outdated
@@ -481,6 +481,31 @@ FI_DESTRUCTOR(fi_fini(void)) | |||
ofi_osd_fini(); | |||
} | |||
|
|||
void ofi_free_nic(struct fid_nic *nic) | |||
{ | |||
if (nic->device_attr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to handle a NULL
nic pointer in here somehow?
Maybe wrap this in one more if
-block, or maybe make all of these something like:
if (nic && nic->device_attr) {
}
prov/usnic/src/usdf_fabric.c
Outdated
if (ret < 0) { | ||
USDF_DBG( | ||
"asprintf failed while creating device_id\n"); | ||
ret = -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're missing a goto
in this cleanup path. You set ret
but then overwrite it immediately afterward.
prov/usnic/src/usdf_fabric.c
Outdated
if (ret < 0) { | ||
USDF_DBG( | ||
"asprintf failed while creating device_version\n"); | ||
ret = -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, think you're missing a goto
.
prov/usnic/src/usdf_fabric.c
Outdated
if (ret < 0) { | ||
USDF_DBG( | ||
"asprintf failed while creating vendor_id\n"); | ||
ret = -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goto
missing.
prov/usnic/src/usdf_fabric.c
Outdated
if (!la) | ||
goto nomem; | ||
uint8_t *p = (uint8_t*) &dap->uda_ipaddr_be; | ||
ret = asprintf(&la->address, "%d.%d.%d.%d/%d", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that we use anything other than v4, but if we ever do support anything else this conversion can probably be replaced with inet_ntop
.
prov/usnic/src/usdf_fabric.c
Outdated
if (ret < 0) { | ||
USDF_DBG( | ||
"asprintf failed while creating link address\n"); | ||
ret = -ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing goto
.
src/fabric.c
Outdated
@@ -507,6 +532,10 @@ void DEFAULT_SYMVER_PRE(fi_freeinfo)(struct fi_info *info) | |||
free(info->fabric_attr->prov_name); | |||
free(info->fabric_attr); | |||
} | |||
if (info->handle && info->handle->fclass == FI_CLASS_NIC) { | |||
struct fid_nic *nic = (struct fid_nic*) info->handle; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you need this, you can just directly ofi_free_nic(info->handle);
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types are different -- a cast of some flavor or another is necessary.
I chose to make the param of ofi_free_nic()
be (struct fid_nic *)
because, well, you're freeing a struct fid_nic
, not a generic fid
. So I figured the onus should be on the caller to make sure the right type is passed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The types are different -- a cast of some flavor or another is necessary.
Ooops, I thought handle
here was void *
.
src/fabric.c
Outdated
la_dup->mtu = la->mtu; | ||
la_dup->speed = la->speed; | ||
la_dup->state = la->state; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not that it's very important, but should you #undef MEMBER_STRDUP
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; I have no strong opinions about this. Done.
src/fabric.c
Outdated
@@ -853,6 +957,14 @@ struct fi_info *DEFAULT_SYMVER_PRE(fi_dupinfo)(const struct fi_info *info) | |||
if (dup->fabric_attr->prov_name == NULL) | |||
goto fail; | |||
} | |||
if (info->handle && info->handle->fclass == FI_CLASS_NIC) { | |||
int ret; | |||
ret = ofi_dup_nic((struct fid_nic*) info->handle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're not using the return value, and you don't need the cast. This can be expressed clearer as:
if (ofi_dup_nic(info->handle, &dup->handle) < 0)
goto fail;
We discussed this some at today's ofiwg. @j-xiong made a suggestion to add function pointers to prov_attr, basically turning it into a struct fid. Along those lines, we have struct fid_nic. We can add additional ops to fid_nic for handling the prov_attr, or use the existing fid_nic::control() function to handle the prov_attr. For example, fid_nic::close() could be called to free the nic data from fi_freeinfo(). Internally, the provider can call the generic helper ofi_free_nic() to free the common structures, then free the prov_attr themselves. Similarly, we can define FI_CONTROL codes for fid_nic that would copy or print the NIC attributes. A generic helper would handle the common structures, but the provider would handle the prov_attr field. |
8a9d812
to
8795e93
Compare
prov/usnic/src/usdf_fabric.c
Outdated
ba->bus_type = FI_BUS_UNKNOWN; | ||
|
||
la = calloc(1, sizeof(*la)); | ||
if (!la) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, remove braces here.
src/fabric.c
Outdated
ofi_free_nic(nic_dup); | ||
return -FI_ENOMEM; | ||
|
||
#undef MEMBER_STRUP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo?
8795e93
to
00c050c
Compare
@bturrubiates Sigh. Where is my brain today? Fixed. |
Any idea why AppVeyor is failing? That seems to have nothing to do with this PR:
See https://ci.appveyor.com/project/shefty/libfabric/build/4384/job/j40ldvxy4097d5by#L2146 |
I restarted appveyor to see if the error re-appears. I can't recall any recent changes that would have broken windows. |
@shefty Failed again, but a different test this time -- https://ci.appveyor.com/project/shefty/libfabric/build/4387/job/i7nbp95e1w3v249x#L2037:
|
Other recent PRs passed appveyor, so maybe there's something in this change that the existing tests aren't handling, or there's some other issue. The changes you've made to the core look okay to me though. |
I am in the process of pulling in the common pieces of this PR into a separate series, with an updated proposal to handle the prov_attr component. Please allow that PR to merge first, then we can update the usnic provider accordingly. |
@shefty no problem |
Please take a look at #4443. That contains a proposal for dealing with the prov_attr by making use of the fid_nic component of the structure. |
FYI - I have been sidetracked from this, but I do plan on getting back to it. I have some patches in progress locally to add a fid_nic to fi_info. But that breaks the ABI, so it requires some additional compatibility code. |
Me too -- I was planning on refreshing this PR according to the new fid_nic code on master "soon". |
00c050c
to
c787f7e
Compare
@shefty @bturrubiates Revamped, now there's two commits:
|
60bdc3a
to
6c82f8a
Compare
I have a patch to fixup the fid tostr function. I still need to update the ABI compat area to extend fi_info. |
See PR #4495 for what I have. I also have an incomplete patch locally to address the ABI change |
In theory, the core has all the updates that are needed now. This PR should just need the usnic specific patch now (c787f7e). |
6c82f8a
to
be356c2
Compare
@bturrubiates All previous review comments addressed, and updated to utilize fid_nic core infrastructure. This PR now ready for review. Output for usnic looks like this:
|
la = nic->link_attr; | ||
|
||
socklen_t size = 32; | ||
la->address = calloc(1, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use INET_ADDRSTRLEN
.
|
||
nic = ofi_nic_dup(NULL); | ||
if (!nic) | ||
goto nomem; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nomem
label will dereference nic
, so this will likely cause a crash in the error path.
prov/usnic/src/usdf_fabric.c
Outdated
if (!da->firmware) | ||
goto nomem; | ||
|
||
ba = nic->bus_attr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, but you probably don't need to use a variable for ba
since you only use it once. Probably easier to just nic->bus_addr->bus_type = FI_BUS_UNKNOWN
.
The usnic driver does not currently export PCI information, so that data is left empty in the fid_nic. Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
be356c2
to
8d97e71
Compare
@bturrubiates Fixed. |
This PR adds support for https://ofiwg.github.io/libfabric/master/man/fi_nic.3.html.
However, there are (at least) 2 problems that need to be resolved:
EDIT: there's only 1 problem left
fid_nic.prov_attr
is only understood by the provider.tostr()
it?fi_freeinfo()
)?Do we need additional provider ops for these functions?