samples.snippets.create_training_pipeline_tabular_regression_sample_test: test_ucaip_generated_create_training_pipeline_sample failed #413

Closed
flaky-bot bot opened this issue May 19, 2021 · 7 comments · Fixed by #508
Labels
api: aiplatform (Issues related to the AI Platform API.)
flakybot: flaky (Tells the Flaky Bot not to close or comment on this issue.)
flakybot: issue (An issue filed by the Flaky Bot. Should not be added manually.)
priority: p1 (Important issue which blocks shipping the next release. Will be fixed prior to next release.)
🚨 This issue needs some love.
samples (Issues that are directly related to samples.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)

Comments

flaky-bot bot commented May 19, 2021

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 7b7c950
buildURL: Build Status, Sponge
status: failed

Test output
shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/5986541746077564928'}
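pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f8064052990>
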
@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

    # Waiting for training pipeline to be in CANCELLED state
    helpers.wait_for_job_state(
        get_job_method=pipeline_client.get_training_pipeline,
        name=shared_state["training_pipeline_name"],
    )

conftest.py:168:


get_job_method = <bound method PipelineServiceClient.get_training_pipeline of <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f8064052990>>
name = 'projects/580378083368/locations/us-central1/trainingPipelines/5986541746077564928'
expected_state = 'CANCELLED', timeout = 90, freq = 1.5

def wait_for_job_state(
    get_job_method: Callable[[str], "proto.Message"],  # noqa: F821
    name: str,
    expected_state: str = "CANCELLED",
    timeout: int = 90,
    freq: float = 1.5,
) -> None:
    """ Waits until the Job state of provided resource name is a particular state.

    Args:
        get_job_method: Callable[[str], "proto.Message"]
            Required. The GAPIC getter method to poll. Takes 'name' parameter
            and has a 'state' attribute in its response.
        name (str):
            Required. Complete uCAIP resource name to pass to get_job_method
        expected_state (str):
            The state at which this method will stop waiting.
            Default is "CANCELLED".
        timeout (int):
            Maximum number of seconds to wait for expected_state. If the job
            state is not expected_state within timeout, a TimeoutError will be raised.
            Default is 90 seconds.
        freq (float):
            Number of seconds between calls to get_job_method.
            Default is 1.5 seconds.
    """

    for _ in range(int(timeout / freq)):
        response = get_job_method(name=name)
        if expected_state in str(response.state):
            return None
        time.sleep(freq)

    raise TimeoutError(
        f"Job state did not become {expected_state} within {timeout} seconds"
        "\nTry increasing the timeout in sample test"
        f"\nLast recorded state: {response.state}"
    )

E TimeoutError: Job state did not become CANCELLED within 90 seconds
E Try increasing the timeout in sample test
E Last recorded state: 5

helpers.py:55: TimeoutError
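
As the listing above shows, the helper only checks for `expected_state`, so it keeps polling until the timeout even when the pipeline has already settled in some other final state. A minimal sketch of a loop that stops early in that case follows. It is not the repository's `helpers.wait_for_job_state`: the names `wait_for_state`, `get_state`, `terminal_states`, and the injectable `sleep` are hypothetical, and the set of terminal state names is an assumption.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    expected_state: str = "CANCELLED",
    # Assumed names of states the service will never leave again.
    terminal_states: Iterable[str] = ("SUCCEEDED", "FAILED", "CANCELLED"),
    timeout: float = 90.0,
    freq: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until it reports expected_state or any terminal state.

    Returns the last state string so the caller can decide whether a
    terminal state other than expected_state is acceptable.
    Raises TimeoutError if no such state is seen within `timeout` seconds.
    """
    terminal = tuple(terminal_states)
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        # Substring match, mirroring the `in str(response.state)` check above.
        if expected_state in state or any(t in state for t in terminal):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"State did not become {expected_state} within {timeout} seconds"
                f"\nLast recorded state: {state}"
            )
        sleep(freq)
```

In a teardown like the one above, `get_state` could be something along the lines of `lambda: str(pipeline_client.get_training_pipeline(name=name).state)`, with the caller treating a returned failed state as "nothing left to cancel" rather than as a test failure.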

flaky-bot bot added the flakybot: issue, priority: p1, and type: bug labels May 19, 2021
product-auto-label bot added the api: aiplatform and samples labels May 19, 2021

flaky-bot bot commented May 20, 2021

commit: b2ed51e
buildURL: Build Status, Sponge
status: failed

Test output
shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/6609727344514957312'}
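pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7fb5226b80d0>
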
@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

    # Waiting for training pipeline to be in CANCELLED state
    helpers.wait_for_job_state(
        get_job_method=pipeline_client.get_training_pipeline,
        name=shared_state["training_pipeline_name"],
    )

conftest.py:168:


get_job_method = <bound method PipelineServiceClient.get_training_pipeline of <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7fb5226b80d0>>
name = 'projects/580378083368/locations/us-central1/trainingPipelines/6609727344514957312'
expected_state = 'CANCELLED', timeout = 90, freq = 1.5

def wait_for_job_state(
    get_job_method: Callable[[str], "proto.Message"],  # noqa: F821
    name: str,
    expected_state: str = "CANCELLED",
    timeout: int = 90,
    freq: float = 1.5,
) -> None:
    """ Waits until the Job state of provided resource name is a particular state.

    Args:
        get_job_method: Callable[[str], "proto.Message"]
            Required. The GAPIC getter method to poll. Takes 'name' parameter
            and has a 'state' attribute in its response.
        name (str):
            Required. Complete uCAIP resource name to pass to get_job_method
        expected_state (str):
            The state at which this method will stop waiting.
            Default is "CANCELLED".
        timeout (int):
            Maximum number of seconds to wait for expected_state. If the job
            state is not expected_state within timeout, a TimeoutError will be raised.
            Default is 90 seconds.
        freq (float):
            Number of seconds between calls to get_job_method.
            Default is 1.5 seconds.
    """

    for _ in range(int(timeout / freq)):
        response = get_job_method(name=name)
        if expected_state in str(response.state):
            return None
        time.sleep(freq)

    raise TimeoutError(
        f"Job state did not become {expected_state} within {timeout} seconds"
        "\nTry increasing the timeout in sample test"
        f"\nLast recorded state: {response.state}"
    )

E TimeoutError: Job state did not become CANCELLED within 90 seconds
E Try increasing the timeout in sample test
E Last recorded state: 5

helpers.py:55: TimeoutError

flaky-bot bot commented May 21, 2021

commit: 15552ce
buildURL: Build Status, Sponge
status: failed

Test output
shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/4639121032563654656'}
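pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f5f93780f90>
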
@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

    # Waiting for training pipeline to be in CANCELLED state
    helpers.wait_for_job_state(
        get_job_method=pipeline_client.get_training_pipeline,
        name=shared_state["training_pipeline_name"],
    )

conftest.py:168:


get_job_method = <bound method PipelineServiceClient.get_training_pipeline of <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f5f93780f90>>
name = 'projects/580378083368/locations/us-central1/trainingPipelines/4639121032563654656'
expected_state = 'CANCELLED', timeout = 90, freq = 1.5

def wait_for_job_state(
    get_job_method: Callable[[str], "proto.Message"],  # noqa: F821
    name: str,
    expected_state: str = "CANCELLED",
    timeout: int = 90,
    freq: float = 1.5,
) -> None:
    """ Waits until the Job state of provided resource name is a particular state.

    Args:
        get_job_method: Callable[[str], "proto.Message"]
            Required. The GAPIC getter method to poll. Takes 'name' parameter
            and has a 'state' attribute in its response.
        name (str):
            Required. Complete uCAIP resource name to pass to get_job_method
        expected_state (str):
            The state at which this method will stop waiting.
            Default is "CANCELLED".
        timeout (int):
            Maximum number of seconds to wait for expected_state. If the job
            state is not expected_state within timeout, a TimeoutError will be raised.
            Default is 90 seconds.
        freq (float):
            Number of seconds between calls to get_job_method.
            Default is 1.5 seconds.
    """

    for _ in range(int(timeout / freq)):
        response = get_job_method(name=name)
        if expected_state in str(response.state):
            return None
        time.sleep(freq)

    raise TimeoutError(
        f"Job state did not become {expected_state} within {timeout} seconds"
        "\nTry increasing the timeout in sample test"
        f"\nLast recorded state: {response.state}"
    )

E TimeoutError: Job state did not become CANCELLED within 90 seconds
E Try increasing the timeout in sample test
E Last recorded state: 5

helpers.py:55: TimeoutError

flaky-bot bot commented May 22, 2021

commit: f40f322
buildURL: Build Status, Sponge
status: failed

Test output
args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.0')]}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f0e4d13b250>
request = name: "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736"

timeout = None
metadata = [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.0')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
    return _end_unary_response_blocking(state, call, False, None)

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f0e4f453f10>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f0e4f4180a0>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
        raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.FAILED_PRECONDITION
E details = "The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736" is in state PIPELINE_STATE_FAILED and cannot be canceled."
E debug_error_string = "{"created":"@1621678195.348348367","description":"Error received from peer ipv4:74.125.199.95:443","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736" is in state PIPELINE_STATE_FAILED and cannot be canceled.","grpc_status":9}"
E >

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736'}
pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f0e4d102550>

@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

conftest.py:162:


../../google/cloud/aiplatform_v1/services/pipeline_service/client.py:803: in cancel_training_pipeline
request, retry=retry, timeout=timeout, metadata=metadata,
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)


args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.0')]}

@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)
    except grpc.RpcError as exc:
        six.raise_from(exceptions.from_grpc_error(exc), exc)

E google.api_core.exceptions.FailedPrecondition: 400 The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/889874342777716736" is in state PIPELINE_STATE_FAILED and cannot be canceled.

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:69: FailedPrecondition
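
The traceback above shows the teardown failing because the pipeline had already reached `PIPELINE_STATE_FAILED` by the time `cancel_training_pipeline` was called. One way a teardown could tolerate that is sketched below, assuming an already-finished pipeline is an acceptable outcome for cleanup. This is not the fix that was merged for this issue; `cancel_quietly` is a hypothetical helper name, and the fallback exception class exists only so the sketch runs where `google-api-core` is not installed.

```python
try:
    from google.api_core.exceptions import FailedPrecondition
except ImportError:
    # Stand-in so this sketch runs without google-api-core installed.
    class FailedPrecondition(Exception):
        """Local placeholder for google.api_core.exceptions.FailedPrecondition."""


def cancel_quietly(cancel, name):
    """Call cancel(name=name), ignoring "cannot be canceled" rejections.

    Returns True if the cancel request was accepted and False if the
    service rejected it with FailedPrecondition, for example because the
    resource had already failed or finished.
    """
    try:
        cancel(name=name)
    except FailedPrecondition:
        return False
    return True
```

With a helper like this, the fixture would call `cancel_quietly(pipeline_client.cancel_training_pipeline, shared_state["training_pipeline_name"])` and wait for the `CANCELLED` state only when it returns `True`.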

yoshi-automation added the 🚨 This issue needs some love. label May 26, 2021

flaky-bot bot commented May 27, 2021

commit: 987ce3e
buildURL: Build Status, Sponge
status: failed

Test output
args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7fd4d02b1350>
request = name: "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000"

timeout = None
metadata = [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
    return _end_unary_response_blocking(state, call, False, None)

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7fd4d0359b90>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7fd4d008de60>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
        raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.FAILED_PRECONDITION
E details = "The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000" is in state PIPELINE_STATE_FAILED and cannot be canceled."
E debug_error_string = "{"created":"@1622110295.414087554","description":"Error received from peer ipv4:74.125.142.95:443","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000" is in state PIPELINE_STATE_FAILED and cannot be canceled.","grpc_status":9}"
E >

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000'}
pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7fd4d01b9b10>

@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

conftest.py:162:


../../google/cloud/aiplatform_v1/services/pipeline_service/client.py:803: in cancel_training_pipeline
request, retry=retry, timeout=timeout, metadata=metadata,
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)


args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]}

@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)
    except grpc.RpcError as exc:
        six.raise_from(exceptions.from_grpc_error(exc), exc)

E google.api_core.exceptions.FailedPrecondition: 400 The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/7330444022382592000" is in state PIPELINE_STATE_FAILED and cannot be canceled.

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:69: FailedPrecondition

flaky-bot bot reopened this Jun 2, 2021
flaky-bot bot added the flakybot: flaky label Jun 2, 2021

flaky-bot bot commented Jun 2, 2021

Looks like this issue is flaky. 😟

I'm going to leave this open and stop commenting.

A human should fix and close this.


commit: fdc968f
buildURL: Build Status, Sponge
status: failed

Test output
args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]}
@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:67:


self = <grpc._channel._UnaryUnaryMultiCallable object at 0x7f79af490190>
request = name: "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776"

timeout = None
metadata = [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]
credentials = None, wait_for_ready = None, compression = None

def __call__(self,
             request,
             timeout=None,
             metadata=None,
             credentials=None,
             wait_for_ready=None,
             compression=None):
    state, call, = self._blocking(request, timeout, metadata, credentials,
                                  wait_for_ready, compression)
    return _end_unary_response_blocking(state, call, False, None)

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:946:


state = <grpc._channel._RPCState object at 0x7f79b58d5bd0>
call = <grpc._cython.cygrpc.SegregatedCall object at 0x7f79af482dc0>
with_call = False, deadline = None

def _end_unary_response_blocking(state, call, with_call, deadline):
    if state.code is grpc.StatusCode.OK:
        if with_call:
            rendezvous = _MultiThreadedRendezvous(state, call, None, deadline)
            return state.response, rendezvous
        else:
            return state.response
    else:
        raise _InactiveRpcError(state)

E grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
E status = StatusCode.FAILED_PRECONDITION
E details = "The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776" is in state PIPELINE_STATE_FAILED and cannot be canceled."
E debug_error_string = "{"created":"@1622628695.039564708","description":"Error received from peer ipv4:74.125.20.95:443","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776" is in state PIPELINE_STATE_FAILED and cannot be canceled.","grpc_status":9}"
E >

.nox/py-3-7/lib/python3.7/site-packages/grpc/_channel.py:849: _InactiveRpcError

The above exception was the direct cause of the following exception:

shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776'}
pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7f79af4a96d0>

@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

conftest.py:162:


../../google/cloud/aiplatform_v1/services/pipeline_service/client.py:803: in cancel_training_pipeline
request, retry=retry, timeout=timeout, metadata=metadata,
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py:145: in call
return wrapped_func(*args, **kwargs)


args = (name: "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776"
,)
kwargs = {'metadata': [('x-goog-request-params', 'name=projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776'), ('x-goog-api-client', 'gl-python/3.7.10 grpc/1.38.0 gax/1.28.0 gapic/1.0.1')]}

@six.wraps(callable_)
def error_remapped_callable(*args, **kwargs):
    try:
        return callable_(*args, **kwargs)
    except grpc.RpcError as exc:
        six.raise_from(exceptions.from_grpc_error(exc), exc)

E google.api_core.exceptions.FailedPrecondition: 400 The TrainingPipeline "projects/580378083368/locations/us-central1/trainingPipelines/5224742915749707776" is in state PIPELINE_STATE_FAILED and cannot be canceled.

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/grpc_helpers.py:69: FailedPrecondition

@munkhuushmgl
Contributor

Quotas have increased. This should be fixed now.

flaky-bot bot commented Jun 11, 2021

Oops! Looks like this issue is still flaky. It failed again. 😬

I reopened the issue, but a human will need to close it again.


commit: 481d172
buildURL: Build Status, Sponge
status: failed

Test output
shared_state = {'training_pipeline_name': 'projects/580378083368/locations/us-central1/trainingPipelines/3442232256985300992'}
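pipeline_client = <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7efd164bd210>
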
@pytest.fixture()
def teardown_training_pipeline(shared_state, pipeline_client):
    yield

    pipeline_client.cancel_training_pipeline(
        name=shared_state["training_pipeline_name"]
    )

    # Waiting for training pipeline to be in CANCELLED state
    helpers.wait_for_job_state(
        get_job_method=pipeline_client.get_training_pipeline,
        name=shared_state["training_pipeline_name"],
    )

conftest.py:168:


get_job_method = <bound method PipelineServiceClient.get_training_pipeline of <google.cloud.aiplatform_v1.services.pipeline_service.client.PipelineServiceClient object at 0x7efd164bd210>>
name = 'projects/580378083368/locations/us-central1/trainingPipelines/3442232256985300992'
expected_state = 'CANCELLED', timeout = 90, freq = 1.5

def wait_for_job_state(
    get_job_method: Callable[[str], "proto.Message"],  # noqa: F821
    name: str,
    expected_state: str = "CANCELLED",
    timeout: int = 90,
    freq: float = 1.5,
) -> None:
    """ Waits until the Job state of provided resource name is a particular state.

    Args:
        get_job_method: Callable[[str], "proto.Message"]
            Required. The GAPIC getter method to poll. Takes 'name' parameter
            and has a 'state' attribute in its response.
        name (str):
            Required. Complete uCAIP resource name to pass to get_job_method
        expected_state (str):
            The state at which this method will stop waiting.
            Default is "CANCELLED".
        timeout (int):
            Maximum number of seconds to wait for expected_state. If the job
            state is not expected_state within timeout, a TimeoutError will be raised.
            Default is 90 seconds.
        freq (float):
            Number of seconds between calls to get_job_method.
            Default is 1.5 seconds.
    """

    for _ in range(int(timeout / freq)):
        response = get_job_method(name=name)
        if expected_state in str(response.state):
            return None
        time.sleep(freq)

    raise TimeoutError(
        f"Job state did not become {expected_state} within {timeout} seconds"
        "\nTry increasing the timeout in sample test"
        f"\nLast recorded state: {response.state}"
    )

E TimeoutError: Job state did not become CANCELLED within 90 seconds
E Try increasing the timeout in sample test
E Last recorded state: 6

helpers.py:55: TimeoutError
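
The timeout reports in this thread give the last state only as an integer (5 in the earlier runs, 6 here), which makes them hard to compare with the `PIPELINE_STATE_FAILED` name that appears in the cancel errors. The sketch below decodes those integers. The value table is reproduced from memory of the `PipelineState` enum in `aiplatform_v1` and should be treated as an assumption to verify against the installed library; `state_name` is a hypothetical helper.

```python
from enum import IntEnum


class PipelineState(IntEnum):
    """Local copy of the state values (assumed, not imported from the library)."""

    PIPELINE_STATE_UNSPECIFIED = 0
    PIPELINE_STATE_QUEUED = 1
    PIPELINE_STATE_PENDING = 2
    PIPELINE_STATE_RUNNING = 3
    PIPELINE_STATE_SUCCEEDED = 4
    PIPELINE_STATE_FAILED = 5
    PIPELINE_STATE_CANCELLING = 6
    PIPELINE_STATE_CANCELLED = 7
    PIPELINE_STATE_PAUSED = 8


def state_name(value: int) -> str:
    """Translate a numeric state from a log line into its symbolic name."""
    try:
        return PipelineState(value).name
    except ValueError:
        return f"UNKNOWN({value})"


print(state_name(5))  # PIPELINE_STATE_FAILED
print(state_name(6))  # PIPELINE_STATE_CANCELLING
```

If that mapping holds, the earlier timeouts ended with a pipeline that had failed, while this run was still cancelling when the 90 seconds ran out, which would point at the timeout value rather than at the pipeline itself.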

4 participants