Error running cifar10_tensorflow_tpu example #27

Open
akolesnikov opened this issue Jul 15, 2022 · 0 comments

I tried to run the cifar10_tensorflow_tpu example on GCP and got this error:

  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 445, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm/core.py", line 824, in launch
    await experiment_unit.add(job, args, identity=identity)
  File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm_local/experiment.py", line 211, in _launch_job_group
    launch_result = await self._submit_jobs_for_execution(job_group)
  File "/home/koles/.local/lib/python3.9/site-packages/xmanager/xm_local/experiment.py", line 83, in _submit_jobs_for_execution
    vertex_handles = vertex.launch(self._experiment_title,
  File "/home/koles/.local/lib/python3.9/site-packages/xmanager/cloud/vertex.py", line 335, in launch
    job_name = get_default_client().launch(
  File "/home/koles/.local/lib/python3.9/site-packages/xmanager/cloud/vertex.py", line 181, in launch
    custom_job.wait_for_resource_creation()
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/jobs.py", line 1026, in wait_for_resource_creation
    self._wait_for_resource_creation()
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 1246, in _wait_for_resource_creation
    self._raise_future_exception()
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 214, in _raise_future_exception
    raise self._exception
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 226, in _complete_future
    future.result()  # raises
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 438, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 316, in wait_for_dependencies_and_invoke
    result = method(*args, **kwargs)
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform/jobs.py", line 1496, in run
    self._gca_resource = self.api_client.create_custom_job(
  File "/home/koles/.local/lib/python3.9/site-packages/google/cloud/aiplatform_v1/services/job_service/client.py", line 794, in create_custom_job
    response = rpc(
  File "/home/koles/.local/lib/python3.9/site-packages/google/api_core/gapic_v1/method.py", line 154, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/koles/.local/lib/python3.9/site-packages/google/api_core/grpc_helpers.py", line 52, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.NotFound: 404 custom_job.job_spec.service_account must be specified when uploading to TensorBoard.

I followed the xmanager setup instructions and then ran the example from a clean GCP VM:

xmanager launch examples/cifar10_tensorflow_tpu/launcher.py

Thank you for the help.
