Is this a new feature, an improvement, or a change to existing functionality?
Change
How would you describe the priority of this feature request
High
Please provide a clear description of the problem this feature solves
Currently, Morpheus hardcodes the user cpuset as the range from 0 to the number of threads (processors) minus one when it sets the options for an MRC/SRF Executor in a pipeline. However, some environments (e.g., Slurm) create cpusets that restrict which CPUs a user's processes may access, so that range is not guaranteed to be available.
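To make the mismatch concrete, the sketch below contrasts the hardcoded range with the CPUs the current process is actually allowed to run on. It assumes Linux, where Python's `os.sched_getaffinity(0)` returns the affinity mask imposed by Slurm or a cgroup cpuset. `hardcoded_cpus` and `usable_cpus` are hypothetical helper names for illustration, not Morpheus APIs.

```python
import os
from typing import Set


def hardcoded_cpus(num_threads: int) -> Set[int]:
    # The assumption described above: CPUs 0 .. num_threads - 1 are always usable.
    return set(range(num_threads))


def usable_cpus() -> Set[int]:
    # On Linux, the affinity mask reflects cpuset/cgroup restrictions (e.g. Slurm).
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    # Hypothetical fallback for platforms without sched_getaffinity: assume all CPUs.
    return set(range(os.cpu_count() or 1))


# Example: a Slurm job restricted to CPUs 8-15 while the pipeline asks for 8 threads.
allocated = set(range(8, 16))
print(sorted(hardcoded_cpus(8) & allocated))  # prints []
```

Whenever the allocation does not start at CPU 0, the hardcoded range can miss it entirely, which is the situation that triggers the error below.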
MRC does not make this assumption; instead, it evaluates the hwloc topology of the visible CPUs and compares it with the configured user_cpuset.
If the intersection of the two sets is empty, MRC raises an error and the pipeline fails with a stack trace (see Additional context).
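The check MRC performs can be approximated in Python as below. This is a simplified illustration of the behavior described above, not MRC's implementation (which works on hwloc topology data in C++, in topology.cpp). `parse_cpuset` and `check_cpuset` are hypothetical names, and the cpuset string format (comma-separated CPU ids and inclusive ranges such as `0-3,8`) is an assumption.

```python
from typing import Set


def parse_cpuset(spec: str) -> Set[int]:
    # Parse a cpuset list such as "0-3,8,10-11" into a set of CPU ids.
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_cpuset(user_cpuset: str, topo_cpus: Set[int]) -> Set[int]:
    # Keep only the requested CPUs that the topology actually exposes.
    usable = parse_cpuset(user_cpuset) & topo_cpus
    if not usable:
        # Same message as the RuntimeError in the traceback under Additional context.
        raise RuntimeError("intersection between user_cpuset and topo_cpuset is null")
    return usable
```

With a visible topology of CPUs 8-15, `check_cpuset("0-11", set(range(8, 16)))` keeps `{8, 9, 10, 11}`, while `check_cpuset("0-7", set(range(8, 16)))` raises, mirroring the failure reported here.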
Describe your ideal solution
I'm not sure which is ideal, but possible approaches include:
- Provide an option for passing a usable cpuset to Morpheus (choosing it is the user's responsibility).
- Have Morpheus skip cpuset configuration entirely and defer the decision to MRC, possibly guided by a handful of configurable selection algorithms.
- Have MRC expose an interface for the topology queries it already performs before an Executor is built, so that Morpheus can fail more gracefully and tell the user to choose a usable cpuset based on the query results.
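As a rough illustration of the first and third approaches, the helper below derives a cpuset string from the CPUs a process may use (instead of `0` to `num_threads - 1`) and fails early with a descriptive message when the requested thread count exceeds what is available. It assumes Linux's `os.sched_getaffinity` for the default; `format_cpuset` and `select_cpuset` are hypothetical names, and how the resulting string would be passed to the executor options is left open, since that interface is what this request proposes.

```python
import os
from typing import Iterable, List, Optional


def format_cpuset(cpus: Iterable[int]) -> str:
    # Compress CPU ids into ranges, e.g. [8, 9, 10, 12] -> "8-10,12".
    ids = sorted(set(cpus))
    parts: List[str] = []
    i = 0
    while i < len(ids):
        start = ids[i]
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def select_cpuset(num_threads: int, available: Optional[Iterable[int]] = None) -> str:
    # Default to the CPUs this process may run on (Linux only).
    if available is None:
        available = os.sched_getaffinity(0)
    cpus = sorted(set(available))
    if num_threads > len(cpus):
        # Fail before building the executor, listing what is actually usable.
        raise ValueError(
            f"requested {num_threads} threads but only CPUs "
            f"{format_cpuset(cpus)} are available to this process"
        )
    # Hypothetical policy: take the lowest-numbered usable CPUs.
    return format_cpuset(cpus[:num_threads])


print(select_cpuset(4, [8, 9, 10, 11, 12, 13, 14, 15]))  # prints 8-11
```

Taking the lowest-numbered usable CPUs is only one possible policy; the "configurable algorithms" in the second approach could replace it with, for example, NUMA-aware selection.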
Describe any alternatives you have considered
No response
Additional context
====Registering Pipeline====
Error occurred during Pipeline.build(). Exiting.
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 277, in build_and_start
    self.build()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 175, in build
    self._srf_executor = srf.Executor(self._exec_options)
RuntimeError: intersection between user_cpuset and topo_cpuset is null
Traceback (most recent call last):
  File "/data/sdp/cybersecurity_ai/files/pass_thru/run_passthru.py", line 40, in <module>
Exception occurred in pipeline. Rethrowing
Traceback (most recent call last):
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 251, in join
    await self._srf_executor.join_async()
AttributeError: 'NoneType' object has no attribute 'join_async'
====Pipeline Complete====
    run_pipeline()
  File "/data/sdp/cybersecurity_ai/files/pass_thru/run_passthru.py", line 37, in run_pipeline
    pipeline.run()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 517, in run
    asyncio.run(self._do_run())
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/envs/morpheus/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 495, in _do_run
    await self.join()
  File "/opt/conda/envs/morpheus/lib/python3.8/site-packages/morpheus/pipeline/pipeline.py", line 251, in join
    await self._srf_executor.join_async()
AttributeError: 'NoneType' object has no attribute 'join_async'
Code of Conduct
I agree to follow this project's Code of Conduct
I have searched the open feature requests and have found no duplicates for this feature request
Related code:
Morpheus hardcoded cpuset: https://github.com/nv-morpheus/Morpheus/blob/branch-23.01/morpheus/pipeline/pipeline.py#L70
MRC topology/user_cpuset comparison: https://github.com/nv-morpheus/MRC/blob/branch-23.01/cpp/mrc/src/internal/system/topology.cpp#L141