Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot create cluster #65

Closed
eschnett opened this issue Aug 2, 2020 · 5 comments
Closed

Cannot create cluster #65

eschnett opened this issue Aug 2, 2020 · 5 comments
Assignees

Comments

@eschnett
Copy link
Contributor

eschnett commented Aug 2, 2020

I want to create a GKE cluster following your instructions. I have set up the cloud tools, authentication, etc. I receive this error message:

$ caliban cluster create --cluster_name einsteintoolkit-cluster --zone us-east1-b
I0801 23:07:37.349657 4416302528 cli.py:185] creating cluster einsteintoolkit-cluster in project fifth-curve-272318 in us-east1-b...
I0801 23:07:37.349900 4416302528 cli.py:186] please be patient, this may take several minutes
I0801 23:07:37.349989 4416302528 cli.py:188] visit https://console.cloud.google.com/kubernetes/clusters/details/us-east1-b/einsteintoolkit-cluster?project=fifth-curve-272318 to monitor cluster creation progress
E0801 23:07:37.582320 4416302528 util.py:68] exception in call <function Cluster.create at 0x7ffc08ba4a70>:
<HttpError 400 when requesting https://container.googleapis.com/v1beta1/projects/fifth-curve-272318/zones/us-east1-b/clusters?alt=json returned "Resource_limit.maximum must be greater than 0.">
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site-packages/caliban-0.3.0+8.gaf9dd99-py3.7.egg/caliban/platform/gke/util.py", line 65, in wrapper
    response = fn(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/caliban-0.3.0+8.gaf9dd99-py3.7.egg/caliban/platform/gke/cluster.py", line 1178, in create
    rsp = request.execute()
  File "/opt/anaconda3/lib/python3.7/site-packages/google_api_python_client-1.10.0-py3.7.egg/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.7/site-packages/google_api_python_client-1.10.0-py3.7.egg/googleapiclient/http.py", line 907, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://container.googleapis.com/v1beta1/projects/fifth-curve-272318/zones/us-east1-b/clusters?alt=json returned "Resource_limit.maximum must be greater than 0.">

When I use --dry_run, I see these details:

$ caliban cluster create --cluster_name einsteintoolkit-cluster --zone us-east1-b --dry_run
I0801 23:08:27.893903 4585823680 cli.py:175] request:
{'cluster': {'autoscaling': {'autoprovisioningNodePoolDefaults': {'oauthScopes': ['https://www.googleapis.com/auth/compute',
                                                                                  'https://www.googleapis.com/auth/cloud-platform']},
                             'enableNodeAutoprovisioning': 'true',
                             'resourceLimits': [{'maximum': '24',
                                                 'resourceType': 'cpu'},
                                                {'maximum': '1536',
                                                 'resourceType': 'memory'},
                                                {'maximum': '1',
                                                 'resourceType': 'nvidia-tesla-k80'},
                                                {'maximum': '1',
                                                 'resourceType': 'nvidia-tesla-p100'},
                                                {'maximum': '1',
                                                 'resourceType': 'nvidia-tesla-v100'},
                                                {'maximum': '1',
                                                 'resourceType': 'nvidia-tesla-p4'},
                                                {'maximum': '1',
                                                 'resourceType': 'nvidia-tesla-t4'},
                                                {'maximum': '0',
                                                 'resourceType': 'nvidia-tesla-a100'}]},
             'enable_tpu': 'true',
             'ipAllocationPolicy': {'useIpAliases': 'true'},
             'locations': ['us-east1-b', 'us-east1-c', 'us-east1-d'],
             'name': 'einsteintoolkit-cluster',
             'nodePools': [{'config': {'oauthScopes': ['https://www.googleapis.com/auth/devstorage.read_only',
                                                       'https://www.googleapis.com/auth/logging.write',
                                                       'https://www.googleapis.com/auth/monitoring',
                                                       'https://www.googleapis.com/auth/service.management.readonly',
                                                       'https://www.googleapis.com/auth/servicecontrol',
                                                       'https://www.googleapis.com/auth/trace.append']},
                            'initialNodeCount': '3',
                            'name': 'default-pool'}],
             'releaseChannel': {'channel': 'REGULAR'},
             'zone': 'us-east1-b'},
 'parent': 'projects/fifth-curve-272318/locations/us-east1-b'}

There is indeed a resource request with a maximum of 0.

I am using the current master branch.

@sritchie
Copy link
Collaborator

sritchie commented Aug 2, 2020

Thanks for the report, @eschnett ! @ajslone, our resident GKE expert, should be able to help out here.

@eschnett
Copy link
Contributor Author

eschnett commented Aug 2, 2020

This change avoids the error:

diff --git a/caliban/platform/gke/util.py b/caliban/platform/gke/util.py
index 9a754ea..0d70c45 100644
--- a/caliban/platform/gke/util.py
+++ b/caliban/platform/gke/util.py
@@ -395,9 +395,10 @@ def resource_limits_from_quotas(
     gd = gpu_match.groupdict()
     gpu_type = gd['gpu']

-    limits.append({
-        'resourceType': 'nvidia-tesla-{}'.format(gpu_type.lower()),
-        'maximum': str(limit)
+    if limit > 0:
+      limits.append({
+          'resourceType': 'nvidia-tesla-{}'.format(gpu_type.lower()),
+          'maximum': str(limit)
     })

   return limits

@ajslone
Copy link
Collaborator

ajslone commented Aug 2, 2020

Erik, thanks for the bug report, and for the fix. I'll merge that in shortly.

ajslone added a commit that referenced this issue Aug 2, 2020
This addresses an error where we specify resource maxima for GKE clusters using quota information returned from the GCP compute API. The API can return a quota of zero for a given resource type, but requesting a limit of zero in the cluster creation API causes an error, so this change prevents setting a limit of zero for any resource.
@ajslone
Copy link
Collaborator

ajslone commented Aug 2, 2020

Erik, I have merged this change, please let me know if this does not resolve your issue. Thanks again for the bug report and for using Caliban + GKE.

@eschnett
Copy link
Contributor Author

eschnett commented Aug 2, 2020

This change resolves the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants