TL collective plugin interface #156
@vspetrov I successfully created a new plugin with no major issues. However, I am not sure I understand which mpirun parameters activate the plugin during a job's execution. My understanding is the following; please let me know if I got something wrong:
Unfortunately, my plugin is not being picked up at runtime and I do not see what I am doing wrong. Could you please help? Thanks.
@gvallee your TUNE syntax looks incorrect. See the wiki: https://github.com/openucx/ucc/wiki/FAQ#6-what-is-tl-scoring-and-how-to-select-a-certain-tl. Do you see any output from UCC about wrong syntax?
No warning whatsoever, and I do not understand the documentation your link points to. There is no list of possible values, so I am sure the documentation works well for someone who already understands what needs to be done, but I personally do not get it at all. I will dig into the code.
@gvallee let's try with just UCC_TL_UCP_TLCP_EXAMPLE_TUNE=inf
It works for the example plugin but not for my plugin, so something is wrong with my plugin. I will investigate. Thanks for your help!
@vspetrov All done and it works. No suggestions for changes at this point, other than that it might be useful to add a version to guarantee compatibility with the UCC build the plugin is dropped into. I think this could wait; it is not really required at the moment. It would be nice if this PR could be merged so that I can start really working on the plugin and keep the code as close as possible to the main branch (to avoid API change issues and so on). Thanks for your work!
@Sergei-Lebedev addressed
What
This PR adds an interface for custom/closed/vendor plugins inside a TL, i.e. plugins at the collective algorithm implementation level.
Why?
The use case: given an existing open-source TL (TL/UCP is the most obvious example), provide a way for a third party to implement an algorithm that reuses TL/UCP resources and functionality, and to distribute this algorithm as a closed plugin.
How?
A TL-level interface is provided, and the build process now includes the logic needed to search for plugins. Building plugins is enabled with "--with-tlcp", where tlcp stands for "tl collective plugin".
An example reference implementation of a TL/UCP plugin is added in components/tl/ucp/coll_plugins/example.
The development flow for a third party: fork the UCC repo and add a git submodule containing the plugin code, so the plugin code base can be stored separately. Syncing with the main UCC then stays easy: git pull. Since the plugin code lives in a separate folder (the submodule), there are no merge issues.
Plugin algorithms integrate naturally into the score-based selection logic. A plugin provides a "get_scores" interface and thus reports its scores to the TL during team creation. This lets a vendor define which collectives, message ranges, and memory types their plugin runs on. These scores can be altered at runtime with the same "SCORE" parameter; for the example plugin above it is UCC_TL_UCP_TLCP_EXAMPLE_SCORE.
A plugin may implement several algorithms for different collectives if needed, so a single vendor never needs more than one plugin.