how to specify which provider to test #19
Comments
I'm not familiar enough with the libfabric API to know how to do it programmatically off the top of my head. You could look at how Mercury selects providers in na_ofi.c. I'm not sure how well it will work with the code as structured, but you can also set a runtime environment variable (i.e., in the job scripts) that restricts the set of providers libfabric will allow. This would be the FI_PROVIDER variable. On Polaris we want to test "verbs,rxm" (two providers are required to use verbs in reliable datagram mode), on Crusher we want "cxi", and on Theta we want "gni", for example. I guess you could try setting that (or whatever is appropriate for your test platform) and see if the tests execute.

Based on this discussion, it sounds like we really need the test output to report which provider was actually used (independent of what was requested) for validation. I don't know which provider the tests have been selecting by default so far, but if it is the tcp provider, that's not really the transport we want to be testing.
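As a rough sketch of the environment-variable approach described above, a job script could pick the provider string per machine before launching the tests. The function name `provider_for_machine`, the `MACHINE` variable, and the lowercase machine keys are hypothetical. The provider strings are the ones quoted in this thread and have not been validated here.

```shell
#!/bin/sh
# Map a machine name to the provider string quoted in this thread.
# The function name and the lowercase machine keys are hypothetical.
provider_for_machine() {
    case "$1" in
        polaris) echo "verbs,rxm" ;;
        crusher) echo "cxi" ;;
        theta)   echo "gni" ;;
        *)       return 1 ;;   # unknown machine: let the caller decide
    esac
}

# MACHINE is an assumed variable; set it per platform in the job script.
MACHINE="${MACHINE:-crusher}"
if provider=$(provider_for_machine "$MACHINE"); then
    export FI_PROVIDER="$provider"
    echo "requested libfabric provider: $FI_PROVIDER"
else
    echo "no provider mapping for '$MACHINE'; leaving FI_PROVIDER unset" >&2
fi
```

Note that this only records the provider that was requested. Reporting the provider that libfabric actually selected would still have to come from the test programs themselves.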
How are you building libfabric? (You can share your environment configuration if you are using Spack.) It might be easiest to debug these kinds of initialization problems by launching the server in an interactive session. You can set the FI_LOG_LEVEL=debug environment variable to get more detailed information out of libfabric.
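For an interactive session, the debugging setup suggested above might look like the following sketch. It assumes the `fi_info` utility that ships with libfabric may be on the PATH. `PROVIDER` is a placeholder variable, and the server launch command is left out because it depends on the test suite.

```shell
# Turn on verbose libfabric logging for this shell session.
export FI_LOG_LEVEL=debug

# PROVIDER is a placeholder; substitute the provider under test.
PROVIDER="${PROVIDER:-cxi}"

# fi_info ships with libfabric; skip the check if it is not on PATH.
if command -v fi_info >/dev/null 2>&1; then
    # -p restricts the listing to the named provider; a failure here
    # suggests the provider is not built in or not usable on this node.
    fi_info -p "$PROVIDER" || echo "provider '$PROVIDER' not available" >&2
else
    echo "fi_info not found on PATH; skipping provider check" >&2
fi
```

Running the server in the same shell afterwards inherits `FI_LOG_LEVEL`, so the initialization messages from libfabric should appear alongside the server's own output.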
For Crusher, 1.15.0 is provided. For Theta, I use
I ran the test again by specifying the
to the test/wait.slurm script. I used the system libfabric.
I could also verify that the address returned by fabtget differs from the one returned with the tcp provider. So fabtsuite does seem to be able to test a different provider.
@hyoklee can you confirm that the rest of the test suite passes on cxi?
@carns, I tested the rest of the suite today and it worked fine. Do you want me to update the Slurm job scripts to use CXI (e.g., cross.slurm)? Or just update documentation such as the FAQ?
Thanks @hyoklee. Both, if you don't mind. The script can be hardcoded to use cxi; that's likely to be the only thing we test on Crusher. The doc can describe more generically how to set the test to exercise a particular provider (cxi or otherwise).

As a side note, since we have mentioned platform-specific test scripts: the .slurm (and similar) files would be a little clearer if their names included the machine name. There are a lot of slurm, qsub, etc. systems out there, but what actually needs to be executed within the script is likely platform-specific. If the current naming is important to the overall test flow, then maybe just add a comment at the top of each one that says something like "# test script for the Polaris system @alcf".
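A minimal sketch of what the top of a Crusher-specific script could look like, combining the header comment and the hardcoded provider suggested above. The layout is an assumption, not the contents of the existing scripts, and the test launch commands are omitted.

```shell
#!/bin/bash
# test script for the Crusher system
# (header comment in the style suggested above; the rest is a sketch)

# Hardcode the provider, since cxi is the only one expected on Crusher.
export FI_PROVIDER=cxi
echo "FI_PROVIDER=$FI_PROVIDER"

# ... launch the test programs here (commands omitted; they depend on the suite) ...
```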
docs(faq): add FI_PROVIDER answer for #19
Libfabric library builds often have support for multiple providers built in. How do you control which one is tested?
Skimming through the code I don't immediately see programmatic API control over it in the C code or environment variable control over it in the job scripts.