GPUs parameter doesn't seem to be working #307

Open
aristizabal95 opened this issue Apr 24, 2023 · 3 comments

Comments

@aristizabal95

When I run a basic MLCube with the --gpus parameter, I get the error mlcube.errors.ConfigurationError: Unknown keys: ['--gpus']. Namespace = runner.

I've tried several ways of passing the parameter and its value, in case the form mattered: --gpus="all", --gpus=all, --gpus all, --gpus=1, --gpus="1". I get the same error every time.

I also tried setting accelerator_count to different values, but nothing changed.

Full traceback:

HelloWorld/data_preparator/mlcube % mlcube run --task=prepare --gpus="all" -Pdocker.build_strategy=always
Traceback (most recent call last):
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/bin/mlcube", line 8, in <module>
    sys.exit(cli())
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/mlcube/__main__.py", line 272, in run
    runner_cls, mlcube_config = parse_cli_args(
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/mlcube/cli.py", line 60, in parse_cli_args
    mlcube_config = MLCubeConfig.create_mlcube_config(
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/mlcube/config.py", line 158, in create_mlcube_config
    runner_cls.CONFIG.validate(mlcube_config)
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/mlcube_docker/docker_run.py", line 100, in validate
    _ = validator.check_unknown_keys(Config.DEFAULT.keys())\
  File "/Users/alejandroaristizabal/opt/anaconda3/envs/medperf/lib/python3.9/site-packages/mlcube/validate.py", line 78, in check_unknown_keys
    raise ConfigurationError(f"Unknown keys: {unknown_keys}.{self._namespace_msg()}")
mlcube.errors.ConfigurationError: Unknown keys: ['--gpus']. Namespace = runner.

@sergey-serebryakov
Contributor

Thanks for reporting this. I'm working on it today.

@sergey-serebryakov
Contributor

@aristizabal95 I can't reproduce this error on my machine using several example MLCubes. Could you please point me to the repository with this HelloWorld example? Thanks!

@aristizabal95
Author

aristizabal95 commented Apr 25, 2023

Never mind, this might be related to my machine. I totally overlooked compatibility with Mac. I was naively expecting things to run silently if they didn't request GPU access.

In any case, this is the repository: https://github.com/mlcommons/medperf, and the HelloWorld example can be found in examples/HelloWorld.

I still find the error message unrelated to the actual issue, and it made me think the problem was in MLCube rather than in machine-runner compatibility. I'm not sure if there's a way to identify these kinds of scenarios and provide a more descriptive error message. Feel free to close this if it's too much of an ask!
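
To make the request concrete, here is a minimal sketch of one way an unknown-key check could add a hint when a key looks like a command-line flag. It is not MLCube's actual code: ConfigurationError is re-declared as a stand-in, check_unknown_keys is a hypothetical free function (the traceback only shows that a method of that name exists), and the known keys passed in the usage example are illustrative. The base message mirrors the format seen in the traceback above.

```python
class ConfigurationError(Exception):
    """Stand-in for mlcube.errors.ConfigurationError (re-declared for this sketch)."""


def check_unknown_keys(config, known_keys, namespace):
    """Raise ConfigurationError if `config` has keys that are not in `known_keys`.

    Hypothetical helper: keys that look like CLI flags (start with '-') get an
    extra hint, so the user is not led to suspect their configuration file.
    """
    unknown = [key for key in config if key not in known_keys]
    if not unknown:
        return None

    # Same shape as the message in the reported traceback.
    message = f"Unknown keys: {unknown}. Namespace = {namespace}."

    flag_like = [key for key in unknown if str(key).startswith("-")]
    if flag_like:
        message += (
            f" {flag_like} look like command-line flags, not configuration keys;"
            " the installed runner may not support them in this version"
            " or on this platform."
        )
    raise ConfigurationError(message)


# Usage: the known keys below are illustrative, not the runner's real defaults.
try:
    check_unknown_keys(
        {"build_strategy": "always", "--gpus": "all"},
        {"image", "build_strategy"},
        "runner",
    )
except ConfigurationError as err:
    print(err)
```

Whether a flag is unsupported because of the platform or because of the installed version cannot be told from the key alone, so the hint names both possibilities instead of guessing.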
