Specifying Artifact Registry images with tags fails #2181

Open
khaerensml6 opened this issue May 8, 2023 · 7 comments
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@khaerensml6

Creating a PipelineJob from an Artifact Registry URI that uses a tag instead of a version raises an internal server error.
This is pretty annoying.

to be clear:

All of this combined makes it seem like there's a bug for executing tagged artifacts.

Environment details

  • OS type and version: Ubuntu 22.04.2 LTS
  • Python version: Python 3.9.16
  • pip version: pip 22.0.4
  • google-cloud-aiplatform version: 1.24.1

Steps to reproduce

  1. Create a PipelineJob with an Artifact Registry URI using a tag instead of the hash
  2. Run the pipeline job

Code example

This fails with a 500 internal server error on the "run" call.

    import datetime

    from google.cloud import aiplatform as aip

    service_account = ... 
    pipeline_name = ...
    job_id = f"{pipeline_name}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

    compiled_job = f"https://{region}-kfp.pkg.dev/{PROJECTID}/{REPOSITORY_NAME}/{PIPELINE_NAME}/{TAG}"
    pipeline_job = aip.PipelineJob(
        display_name="test-name",
        job_id=job_id,
        template_path=compiled_job,
    )

    pipeline_job.run(network=None,
                     service_account=service_account,
                     sync=True)

This runs without problem:

    import datetime

    from google.cloud import aiplatform as aip

    service_account = ... 
    pipeline_name = ...
    job_id = f"{pipeline_name}-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}"

    compiled_job = f"https://{region}-kfp.pkg.dev/{PROJECTID}/{REPOSITORY_NAME}/{PIPELINE_NAME}/sha:...."
    pipeline_job = aip.PipelineJob(
        display_name="test-name",
        job_id=job_id,
        template_path=compiled_job,
    )

    pipeline_job.run(network=None,
                     service_account=service_account,
                     sync=True)
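
The two snippets differ only in the last path segment of template_path: a tag in the failing case, a digest in the working one. As a minimal sketch for checking which kind of reference a URL carries before submitting, the helper below splits the URL with the standard library. The function name describe_template_path is hypothetical, and treating any segment that starts with "sha256:" as a digest is an assumption based on the digest prefix used elsewhere in this thread, not on documented API behaviour.

```python
from urllib.parse import urlparse

_HOST_SUFFIX = "-kfp.pkg.dev"


def describe_template_path(template_path: str) -> dict:
    """Split a KFP Artifact Registry URL into its parts.

    Hypothetical helper: assumes the layout
    https://{region}-kfp.pkg.dev/{project}/{repository}/{package}/{reference}
    """
    parsed = urlparse(template_path)
    parts = [p for p in parsed.path.split("/") if p]
    if not parsed.netloc.endswith(_HOST_SUFFIX) or len(parts) != 4:
        raise ValueError(f"Not a KFP Artifact Registry URL: {template_path!r}")
    project, repository, package, reference = parts
    return {
        "region": parsed.netloc[: -len(_HOST_SUFFIX)],
        "project": project,
        "repository": repository,
        "package": package,
        "reference": reference,
        # Assumption: digests are written as "sha256:<hex>"; anything else is a tag.
        "is_digest": reference.startswith("sha256:"),
    }
```

Calling it on a tagged URL should report is_digest as False, which corresponds to the case that fails in this report.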

Stack trace

Traceback (most recent call last):
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 72, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/grpc/_channel.py", line 1030, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/grpc/_channel.py", line 910, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Internal error encountered."
        debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B2a00:1450:400e:800::200a%5D:443 {created_time:"2023-05-09T00:09:30.434205595+02:00", grpc_status:13, grpc_message:"Internal error encountered."}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/.../Desktop/test/mlpipelines/monthly_pipeline.py", line 98, in <module>
    pipeline_job.run(network=None,
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 314, in run
    self._run(
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 814, in wrapper
    return method(*args, **kwargs)
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 345, in _run
    self.submit(
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 419, in submit
    self._gca_resource = self.api_client.create_pipeline_job(
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/pipeline_service/client.py", line 1347, in create_pipeline_job
    response = rpc(
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 113, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/.../Desktop/test/venv/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 74, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InternalServerError: 500 Internal error encountered.
@product-auto-label product-auto-label bot added the api: vertex-ai Issues related to the googleapis/python-aiplatform API. label May 8, 2023
@matthew29tang matthew29tang added the type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. label May 9, 2023
@matthew29tang
Contributor

Thanks for the detailed report! I've filed this as an internal bug and I'll get back to you when I have further updates about this.

@xRagnorokx

xRagnorokx commented Jun 23, 2023

For anyone finding this via Google: I encountered a very similar error when trying to run a template as a pipeline job.

It turned out that I was not specifying the service account in the job submit call. Adding the service account to that call (i.e., job.submit(service_account=pipeline_service_account)) fixed the issue for me.
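
A rough sketch of that fix, written as a small guard so a missing service account is caught before the request is sent. The wrapper name submit_with_service_account is hypothetical; the only library call it relies on is PipelineJob.submit(service_account=...), as quoted in the comment above. Whether this resolves the tag-specific failure from the original report is not confirmed in this thread.

```python
def submit_with_service_account(job, service_account: str) -> None:
    """Submit a PipelineJob, refusing to proceed without an explicit service account.

    `job` is expected to be a google.cloud.aiplatform.PipelineJob
    (or any object exposing the same submit() method).
    """
    if not service_account:
        raise ValueError("service_account must be set explicitly")
    job.submit(service_account=service_account)


# Usage sketch (needs google-cloud-aiplatform and valid credentials):
#
#   from google.cloud import aiplatform as aip
#   job = aip.PipelineJob(display_name="test-name", template_path=template_path)
#   submit_with_service_account(job, pipeline_service_account)
```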

@kab840

kab840 commented Jul 5, 2023

I also have a similar problem.
In my case, the service account passed to job.submit needed roles/artifactregistry.reader on the Artifact Registry repository that holds the uploaded pipeline template.

That said, my understanding was that the service account passed to job.submit is used for executing the Vertex AI pipeline, not for fetching the pipeline template.

@ghost

ghost commented Aug 14, 2023

Any update on this? I seem to have the same issue

@ghost

ghost commented Aug 14, 2023

BTW downloading the pipeline YAML from Artifact Registry via tag works fine using the KFP SDK registry functions
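
For reference, a download by tag with the KFP SDK might look roughly like the sketch below. It assumes kfp.registry.RegistryClient and its download_pipeline method accepting package_name, tag, and file_name; the wrapper name download_pipeline_by_tag and its optional client parameter are hypothetical and exist only so the call can be exercised without network access.

```python
def download_pipeline_by_tag(host, package_name, tag, file_name, client=None):
    """Download the pipeline YAML that `tag` points to and return the client's result.

    host is the repository URL, e.g.
    https://{region}-kfp.pkg.dev/{project}/{repository}
    """
    if client is None:
        # Imported lazily so the helper can be defined without kfp installed.
        from kfp.registry import RegistryClient

        client = RegistryClient(host=host)
    return client.download_pipeline(
        package_name=package_name,
        tag=tag,
        file_name=file_name,
    )
```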

@ghost

ghost commented Aug 14, 2023

Similarly it works fine using a curl request as per the docs

@ghost

ghost commented Aug 14, 2023

A workaround for anyone with the same issue: first use the KFP SDK to resolve the tag to an exact version, then pass the exact version as template_path:

import re
from kfp.registry import RegistryClient
from google.cloud import aiplatform

_VALID_AR_URL = re.compile(
    r"https://([\w\-]+)-kfp\.pkg\.dev/([\w\-]+)/([\w\-]+)/([\w\-]+)/([\w\-.]+)",
    re.IGNORECASE,
)

template_path = f"https://{region}-kfp.pkg.dev/{PROJECTID}/{REPOSITORY_NAME}/{PIPELINE_NAME}/{TAG}"

# Only resolve when the URL matches and is not already pinned to a digest.
match = _VALID_AR_URL.match(template_path)
if match and "sha256:" not in template_path:
    region, project, repo, package_name, tag = match.groups()
    host = f"https://{region}-kfp.pkg.dev/{project}/{repo}"
    client = RegistryClient(host=host)
    # Look up the tag; its metadata names the version it points to.
    metadata = client.get_tag(package_name, tag)
    # Keep only the trailing "sha256:..." part of the version name.
    version = metadata["version"][metadata["version"].find("sha256:") :]
    template_path = f"{host}/{package_name}/{version}"

# Instantiate PipelineJob object
pl = aiplatform.pipeline_jobs.PipelineJob(
    template_path=template_path,
    ...
)
