-
-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature Request] [Submitit-Plugin] (Potentially a bug) Impossible to set certain flags in submitit launcher #1366
Comments
Thanks for reporting @Queuecumber ! This sounds like a issue with the Hydra default config not having the params as Optional Could you share a minimal repro of your configuration? |
It's more than that unfortunately, I tried making them optional and that's
where I run into the described problems with the auto executor.
…On Mon, Feb 1, 2021, 20:14 Jieru Hu ***@***.***> wrote:
Thanks for reporting @Queuecumber <https://github.com/Queuecumber> !
This sounds like a issue with the Hydra default config not having the
params as Optional
<https://github.com/facebookresearch/hydra/blob/master/plugins/hydra_submitit_launcher/hydra_plugins/hydra_submitit_launcher/config.py#L17-L23>
Could you share a minimal repro of your configuration?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1366 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABMX7KYRJ4ZA37KLBPG4G33S45GXXANCNFSM4W35YCAA>
.
|
Got it, thanks. It would still be helpful if we can get a minimal repro here when you get a chance. |
Yeah for sure I'll post one tomorrow
…On Mon, Feb 1, 2021, 20:29 Jieru Hu ***@***.***> wrote:
It's more than that unfortunately, I tried making them optional and that's
where I run into the described problems with the auto executor.
… <#m_2891848001268731531_>
On Mon, Feb 1, 2021, 20:14 Jieru Hu *@*.***> wrote: Thanks for reporting
@Queuecumber <https://github.com/Queuecumber>
https://github.com/Queuecumber ! This sounds like a issue with the Hydra
default config not having the params as Optional
https://github.com/facebookresearch/hydra/blob/master/plugins/hydra_submitit_launcher/hydra_plugins/hydra_submitit_launcher/config.py#L17-L23
Could you share a minimal repro of your configuration? — You are receiving
this because you were mentioned. Reply to this email directly, view it on
GitHub <#1366 (comment)
<#1366 (comment)>>,
or unsubscribe
https://github.com/notifications/unsubscribe-auth/ABMX7KYRJ4ZA37KLBPG4G33S45GXXANCNFSM4W35YCAA
.
Got it, thanks. It would still be helpful if we can get a minimal repro
here when you get a chance.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1366 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABMX7K5DVXIGYHXWQQRD6YTS45IRFANCNFSM4W35YCAA>
.
|
To me it sounds more like a design issue with the plugin/Submitit's AutoExecutor. Once we have a concrete example of what you are trying to do it will be easier to discuss and involve people from the submititit side if we need to. |
Here's an example config that should reproduce:
with a simple python file:
As of current hydra, this will throw an error that a required parameter was set to None (or something like that), if you modify: hydra/plugins/hydra_submitit_launcher/hydra_plugins/hydra_submitit_launcher/config.py Line 23 in 51275ab
to read
because it is hardcoded to want My view on it is that the AutoExecutor is fine for simple things but anything requireing fine-grained control over the executor should call the executors directly. |
It looks like the plugin configuration is faithfully reflecting the intention of the library authors. You should probably open this discussion in the submitit repo. |
I disagree because the slurm executor allows these to all be optional, the auto executor just wasn't updated to support that and it isnt clear how best to do it. |
Another way to phrase it is that the separate local and slurm executors support a lot more features than the auto executor so by tying the submitit plugin to the auto executor hydra can't support these more advanced features |
The AutoExecutor is a part of submitit. |
Rodger that, I'll await his feedback |
Actually there is hardly any, most of the logic (especially for mem_per_node/mem_per_gpu) is provided by slurm. As you actually said, we don't want te reimplement something that is already done somewhere else ;). |
Actually I am partly wrong since it seems you (@Queuecumber) already encoded the logic to make |
It's a little more than that, the main issue seems to be that
if you allow it to be Optional in the hydra plugin. If you set it to zero, you again get
from slurm, I'm not sure how this works for the CPU settings. So there's two ways to proceed:
I think either option seems reasonable, probably it does make sense to be more lax in the auto executor and just let the slurm executor complain if something goes wrong. |
Please don't go for 1, this is what we try to avoid within And 2 seems more complicated than needed, I don't see any code in |
Because it checks that the value of
Because it actually passes to sbatch |
Also why is it called |
Here's the generated sbatch for reference
Note
|
it checks it if you provide it to the executor, what I mean is you could just not forward
it ignores 0 values, but it's passed as "0GB" which is not ignored, I fixed this behavior in the PR facebookincubator/submitit#1606
It was an attempt to unirformize naming (and add an explicit unit) from the time we supported several internal clusters (including a non-slurm cluster) |
OK I see the distinction, let me quickly try adding something to hydra which doesnt pass the None value and see if that fixes it |
Confirmed that fixes it, in that case I think the easiest path forward is to just PR this into the hydra launcher, its a very small change, I'll verify it works with the cpu options too |
You may not need facebookincubator/submitit#1606 in light of this |
@omry While I'm doing this, would you be interested in me updating the |
Confirmed that it all looks like it's working with the other parameters, heres the example config I am using:
and the generated sbatch
looks perfect |
Yup. That would be good. |
I'm using
But, I'm still getting the error: Any suggestion? |
Can you try with an integer for |
@Queuecumber Thanks for the reply. No luck even after trying it with 32. :( |
Can you post the generated sbatch file? That might help illustrate what's going wrong |
This is the full config:
And this is the auto-generated
|
what about the sbatch? It should be in the .submitit folder |
|
that sbatch looks correct in that it only sets |
I am facing the same issue. Was this successfully resolved @sumanabasu? My admin also confirmed that this should actually be running (and replacing the srun with a |
@timruhkopf After leaving it aside for many months I just got it working this morning (after spending many hours yesterday!)! Setting the My hydra launcher at this moment looks like this :
This is how my hydra generated
Good Luck! |
🚀 Feature Request
Motivation
This could be read as a feature request or a bug report, I'm not sure how you want to consider it, I'm going with feature request. Currently theres some flags that you cannot set in the submitit launcher. I practice "gpu centric" scheduling, so I like to specify
mem_per_gpu
andcpus_per_gpu
and then I can just usegpus_per_task
to always get the optimal settings.For example on "mystery cluster" (you know the one), we can use 10 cpus and 64GB RAM per GPU. These settings allow me to only have to change
gpus_per_task
for example if I need to have 2 GPUs for 2 different models, then I'll automatically get 20 cpus and 128GB RAM for each of the tasks without having to change all of the settings. I've already PRed stuff related to this into submitit.The problem occurs when you try to set, via additional parameters, something like
mem_per_gpu
. You can't setmem
andmem_per_gpu
, slurm just crashes when you do that. Similarly, if you try to setcpus_per_gpu
viaadditional_parameters
you'll wind up setting that in addition tocpus_per_task
.Pitch
I tried making a simple patch that fixes this, but it hits issues with the
AutoExecutor
which I never updated in submitit to be aware of the options which conflict with each other. I think in general auto is missing some of the recent work in submitit, and it feels like a semi-hacky workaround to me.Submitit already has pretty good validation logic so my pitch is to (1) let submitit handle everything by calling the correct executor for the job instead of using Auto. This is going to require a revamp of how the parameters are named/passed however and will likely be a breaking change to the API. (2) It would be nice if we could allow people to pass whatever parameters are supported by submitit without needing to update the hydra schema each time. These are my two major goals, so they sound reasonable/feasible? Point (2) may not be possible I guess.
Other options:
Are you willing to open a pull request? (See CONTRIBUTING)
Yes, but I want feedback first on the best way to go about it
Additional context
Add any other context or screenshots about the feature request here.
The text was updated successfully, but these errors were encountered: