AML deployment error due to missing az cli arguments #115

Open
aponte411 opened this issue Dec 8, 2022 · 5 comments

@aponte411

When trying to run the AML example, e.g. bloom aml, it tries to run get_acr_name() but fails because it's missing the resource group name argument. Is there a way to pass in user arguments such as the resource group, subscription, etc.? It would also be nice to expose more arguments for the AML online endpoints, such as auth_mode; e.g. we aren't allowed to use keys, only aml_tokens, in production environments. I can also imagine other deployment attributes/arguments being useful, such as instance_count or type.

[2022-12-08 10:53:37,253] [INFO] [deployment.py:87:deploy] ************* MII is using DeepSpeed Optimizations to accelerate your model *************
ERROR: the following arguments are required: --resource-group/-g, --name/-n

Examples from AI knowledge base:
https://aka.ms/cli_ref
Read more about the command in reference docs

 ------------------------------ 

Unable to obtain ACR name from Azure-CLI. Please verify that you:
        - Have Azure-CLI installed (https://learn.microsoft.com/en-us/cli/azure/install-azure-cli)
        - Are logged in to an active account on Azure-CLI ($az login)
        - Have Azure-CLI ML plugin installed ($az extension add --name ml)

 ------------------------------ 

Traceback (most recent call last):
  File "/mnt/c/Users/davidaponte/Documents/CS677-DeepLearning/deeplearning/deeplearning/deep_learning/text_to_image/deepspeed_mii/bloom560m-aml.py", line 7, in <module>
    mii.deploy(task='text-generation',
  File "/home/bambam/.pyenv/versions/deeplearning/lib/python3.9/site-packages/mii/deployment.py", line 112, in deploy
    _deploy_aml(deployment_name=deployment_name, model_name=model, version=version)
  File "/home/bambam/.pyenv/versions/deeplearning/lib/python3.9/site-packages/mii/deployment.py", line 124, in _deploy_aml
    acr_name = mii.aml_related.utils.get_acr_name()
  File "/home/bambam/.pyenv/versions/deeplearning/lib/python3.9/site-packages/mii/aml_related/utils.py", line 31, in get_acr_name
    raise (e)
  File "/home/bambam/.pyenv/versions/deeplearning/lib/python3.9/site-packages/mii/aml_related/utils.py", line 13, in get_acr_name
    acr_name = subprocess.check_output(
  File "/home/bambam/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/home/bambam/.pyenv/versions/3.9.0/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['az', 'ml', 'workspace', 'show', '--query', 'container_registry']' returned non-zero exit status 2.

Setup:
deepspeed==0.7.6
deepspeed-mii==0.0.4
py3.9.0
Ubuntu 20.04.4 LTS (Focal Fossa)

@aponte411 aponte411 changed the title AML deployment error due to missing resource group requirements AML deployment error due to missing resource group arguments Dec 8, 2022
@aponte411 aponte411 changed the title AML deployment error due to missing resource group arguments AML deployment error due to missing az cli arguments Dec 8, 2022
@mrwyattii
Contributor

@aponte411 Currently we expect the user to set the resource group and subscription from the Azure-CLI, like so:
az account set --subscription "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

I agree that exposing more options and expanding the AML deployment capabilities would be nice. Let me know if you have some time to help test/debug/expand these capabilities!
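For anyone hitting the `--resource-group/-g, --name/-n` error above, one possible way to supply those values without changing MII is to set Azure CLI defaults before deploying. The sketch below only builds and prints the commands. `build_az_default_cmds` and `apply_az_defaults` are hypothetical helpers, not part of MII, and the resource group and workspace names are placeholders. It assumes the `ml` extension picks up the `group` and `workspace` defaults written by `az configure --defaults`.

```python
import subprocess


def build_az_default_cmds(subscription, resource_group, workspace):
    """Return the az commands that set the subscription and CLI defaults.

    Hypothetical helper (not part of MII). Assumes the `ml` extension reads
    the `group` and `workspace` defaults written by `az configure`.
    """
    return [
        ["az", "account", "set", "--subscription", subscription],
        ["az", "configure", "--defaults",
         f"group={resource_group}", f"workspace={workspace}"],
    ]


def apply_az_defaults(subscription, resource_group, workspace):
    """Run each command; raises CalledProcessError if az reports a failure."""
    for cmd in build_az_default_cmds(subscription, resource_group, workspace):
        subprocess.run(cmd, check=True)


# Placeholder values; substitute your own subscription, group, and workspace.
cmds = build_az_default_cmds(
    "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx", "my-resource-group", "my-workspace")
for cmd in cmds:
    print(" ".join(cmd))
```

Calling `apply_az_defaults(...)` with real values would run the two commands. Whether that is enough for `az ml workspace show` to succeed inside MII has not been confirmed in this thread.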

@buswrecker

buswrecker commented Mar 9, 2023

Hi, I resolved this: I had an issue with torch (had to pip uninstall nvidia_cublas_cu11), and I also wasn't on a GPU VM. I managed to build the folder with deploy.sh and am deploying to a managed endpoint now.

@aponte411 - did you make any progress? I'm getting the same error in a Jupyter notebook.

deepspeed==0.8.2
deepspeed-mii==0.05+unknown
python==3.8.0
Ubuntu==20.04.1

@TahaBinhuraib
Contributor

@buswrecker I can run DeepSpeed-MII from the GPU VM, but I still can't deploy. I get the same error:
subprocess.CalledProcessError: Command '['az', 'ml', 'workspace', 'show', '--query', 'container_registry']' returned non-zero exit status 2.

@jakemanger

I also could not get this working after following the instructions in the readme. The only way I could use AML was to override the get_acr_name() function so that it returns my ACR name instead of calling the az CLI command. Is there a way to set a default --name argument for this command, so that it returns the correct ACR name?

The command I'm talking about is:

["az", "ml", "workspace", "show", "--query", "container_registry"]

Note: I also tried adding --name myworkspacename as an argument, and it just returned ------
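A workaround along these lines could pass the resource group and workspace name explicitly instead of hard-coding the registry name. This is a sketch, not MII's implementation: `get_acr_name_explicit` and `parse_acr_name` are hypothetical names, and it assumes `container_registry` comes back as an Azure resource ID whose last path segment is the registry name.

```python
import subprocess


def parse_acr_name(raw):
    """Extract the registry name from the CLI output.

    Assumption: the value is a resource ID ending in
    .../registries/<acr-name>, possibly wrapped in JSON quotes.
    """
    value = raw.strip().strip('"')
    if not value:
        raise ValueError("az returned an empty container_registry value")
    return value.rstrip("/").rsplit("/", 1)[-1]


def get_acr_name_explicit(resource_group, workspace_name):
    """Query the workspace with the arguments the error message asks for."""
    raw = subprocess.check_output(
        ["az", "ml", "workspace", "show",
         "--resource-group", resource_group,
         "--name", workspace_name,
         "--query", "container_registry",
         "--output", "tsv"],
        text=True,
    )
    return parse_acr_name(raw)


# Made-up resource ID, used only to exercise the parsing step.
example_id = ('"/subscriptions/0000/resourceGroups/my-rg/providers/'
              'Microsoft.ContainerRegistry/registries/myregistry"\n')
print(parse_acr_name(example_id))
```

Raising on an empty value makes the case where the CLI returns nothing useful fail loudly instead of producing a blank registry name.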

@ShuntaIto

ShuntaIto commented Nov 2, 2023

I'm facing the same problem on a GPU VM.

Maybe adding "shell=True" will resolve this problem?

acr_name = subprocess.check_output(
            ["az",
             "ml",
             "workspace",
             "show",
             "--query",
             "container_registry"],
            text=True, shell=True)
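One caveat with this suggestion, sketched below with `echo` standing in for `az` (an assumption, so the example runs without Azure CLI): on POSIX systems, passing a list together with `shell=True` hands only the first item to the shell as the command, and the remaining items become arguments to the shell itself. The call above would therefore run a bare `az` rather than `az ml workspace show ...`. The sketch is POSIX-only; Windows handles this case differently.

```python
import subprocess

# `echo` stands in for `az` so this runs without Azure CLI installed.
cmd = ["echo", "hello"]

# List without a shell: every item reaches the program as an argument.
without_shell = subprocess.check_output(cmd, text=True)

# List with shell=True on POSIX: runs `sh -c "echo" "hello"`, so only
# `echo` is executed and "hello" is consumed by the shell itself.
with_shell = subprocess.check_output(cmd, text=True, shell=True)

# A single string is the form shell=True expects.
as_string = subprocess.check_output("echo hello", text=True, shell=True)

print(repr(without_shell), repr(with_shell), repr(as_string))
```

If `shell=True` is wanted at all, the command would need to be one string. Either way, it does not add the `--resource-group` and `--name` arguments that the original error message asks for.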
