Instructions for GCP are not working for machine-type = f1-micro #1375

Closed
nscozzaro opened this issue Aug 24, 2019 · 8 comments

@nscozzaro

nscozzaro commented Aug 24, 2019

When I follow the Zero to JupyterHub instructions for Google Cloud, they work perfectly for me. However, when I change the machine type from n1-standard-2 to the smallest machine type, f1-micro (https://cloud.google.com/compute/docs/machine-types), with num-nodes=8, the hub is not created and the setup fails. My motivation, which I think may be shared by others, is to set up the most cost-efficient JupyterHub possible, at least for an initial setup.

The section of the instructions I'm referring to (https://zero-to-jupyterhub.readthedocs.io/en/latest/google/step-zero-gcp.html) is:

gcloud container clusters create \
  --machine-type n1-standard-2 \
  --num-nodes 2 \
  --zone us-central1-b \
  --cluster-version latest \
  <CLUSTERNAME>

# Trying --machine-type f1-micro doesn't work for me.
# With f1-micro, gcloud tells me I must have at least 3 nodes, so I set --num-nodes to 8.

If someone knows why this might fail, or can confirm what I'm experiencing, I'd appreciate it. Thank you!

@manics
Member

manics commented Aug 24, 2019

Could it be due to insufficient RAM or some other resource? What does kubectl describe pod show?
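For example, something like the following should show whether a pod is stuck pending on resources (the jhub namespace and the pod name placeholder are assumptions based on the z2jh guide; adjust to whatever you used):

kubectl get pods --namespace jhub
kubectl describe pod --namespace jhub <hub-pod-name>
kubectl get events --namespace jhub --sort-by='.lastTimestamp'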

@nscozzaro
Author

nscozzaro commented Aug 27, 2019

I created a new cluster, and then when I run kubectl get node (as in step 5 on this page: https://zero-to-jupyterhub.readthedocs.io/en/latest/google/step-zero-gcp.html) I see that not all of the nodes are Ready, even after an hour:
[screenshot: kubectl get node output showing several nodes not Ready]

I then tried to understand why they're not ready by running kubectl describe node <nodename>, but I can't figure it out... (sorry if it's too much info below).

nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl describe node gke-studyhub-micro-default-pool-3722df19-32cb
Name:               gke-studyhub-micro-default-pool-3722df19-32cb
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/fluentd-ds-ready=true
                    beta.kubernetes.io/instance-type=f1-micro
                    beta.kubernetes.io/os=linux
                    cloud.google.com/gke-nodepool=default-pool
                    cloud.google.com/gke-os-distribution=cos
                    failure-domain.beta.kubernetes.io/region=us-central1
                    failure-domain.beta.kubernetes.io/zone=us-central1-b
                    kubernetes.io/hostname=gke-studyhub-micro-default-pool-3722df19-32cb
Annotations:        container.googleapis.com/instance_id: 4020851013522631237
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 26 Aug 2019 22:26:24 -0400
Taints:             node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Conditions:
  Type                          Status    LastHeartbeatTime                 LastTransitionTime                Reason                      Message
  ----                          ------    -----------------                 ------------------                ------                      -------
  ReadonlyFilesystem            False     Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:26:16 -0400   FilesystemIsNotReadOnly     Filesystem is not read-only
  FrequentUnregisterNetDevice   Unknown   Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:37:43 -0400   UnregisterNetDevice         Timeout when running plugin "/home/kubernetes/bin/log-counter": state - signal:
  FrequentKubeletRestart        Unknown   Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:37:43 -0400   FrequentKubeletRestart      Timeout when running plugin "/home/kubernetes/bin/log-counter": state - signal:
  FrequentDockerRestart         Unknown   Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:38:51 -0400   FrequentDockerRestart       Timeout when running plugin "/home/kubernetes/bin/log-counter": state - signal:
  FrequentContainerdRestart     Unknown   Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:39:59 -0400   FrequentContainerdRestart   Timeout when running plugin "/home/kubernetes/bin/log-counter": state - signal:
  CorruptDockerOverlay2         Unknown   Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:38:04 -0400   CorruptDockerOverlay2       Timeout when running plugin "/home/kubernetes/bin/log-counter": state - signal:
  KernelDeadlock                False     Mon, 26 Aug 2019 23:11:41 -0400   Mon, 26 Aug 2019 22:26:16 -0400   KernelHasNoDeadlock         kernel has no deadlock
  NetworkUnavailable            False     Mon, 26 Aug 2019 22:26:41 -0400   Mon, 26 Aug 2019 22:26:41 -0400   RouteCreated                RouteController created a route
  MemoryPressure                Unknown   Mon, 26 Aug 2019 22:55:06 -0400   Mon, 26 Aug 2019 22:55:49 -0400   NodeStatusUnknown           Kubelet stopped posting node status.
  DiskPressure                  Unknown   Mon, 26 Aug 2019 22:55:06 -0400   Mon, 26 Aug 2019 22:55:49 -0400   NodeStatusUnknown           Kubelet stopped posting node status.
  PIDPressure                   Unknown   Mon, 26 Aug 2019 22:55:06 -0400   Mon, 26 Aug 2019 22:55:49 -0400   NodeStatusUnknown           Kubelet stopped posting node status.
  Ready                         Unknown   Mon, 26 Aug 2019 22:55:06 -0400   Mon, 26 Aug 2019 22:55:49 -0400   NodeStatusUnknown           Kubelet stopped posting node status.
  OutOfDisk                     Unknown   Mon, 26 Aug 2019 22:26:24 -0400   Mon, 26 Aug 2019 22:55:49 -0400   NodeStatusNeverUpdated      Kubelet never posted node status.
Addresses:
  InternalIP:   10.128.0.34
  ExternalIP:   35.238.110.199
  InternalDNS:  gke-studyhub-micro-default-pool-3722df19-32cb.us-central1-b.c.studyhub-jupyterlab.internal
  Hostname:     gke-studyhub-micro-default-pool-3722df19-32cb.us-central1-b.c.studyhub-jupyterlab.internal
Capacity:
 attachable-volumes-gce-pd:  16
 cpu:                        1
 ephemeral-storage:          98868448Ki
 hugepages-2Mi:              0
 memory:                     600668Ki
 pods:                       110
Allocatable:
 attachable-volumes-gce-pd:  16
 cpu:                        940m
 ephemeral-storage:          47093746742
 hugepages-2Mi:              0
 memory:                     237148Ki
 pods:                       110
System Info:
 Machine ID:                 0298c99d7a519363c2a7cadea8cc794f
 System UUID:                0298C99D-7A51-9363-C2A7-CADEA8CC794F
 Boot ID:                    3c9a4a9a-98f8-47bb-97b7-da5548264f44
 Kernel Version:             4.14.137+
 OS Image:                   Container-Optimized OS from Google
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://18.9.7
 Kubelet Version:            v1.13.7-gke.24
 Kube-Proxy Version:         v1.13.7-gke.24
PodCIDR:                     10.32.7.0/24
ProviderID:                  gce://studyhub-jupyterlab/us-central1-b/gke-studyhub-micro-default-pool-3722df19-32cb
Non-terminated Pods:         (5 in total)
  Namespace                  Name                                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                        ------------  ----------  ---------------  -------------  ---
  kube-system                fluentd-gcp-v3.2.0-ccw8z                                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         46m
  kube-system                heapster-v1.6.1-67f9f6f96c-fbhg5                            50m (5%)      50m (5%)    93560Ki (39%)    93560Ki (39%)  38m
  kube-system                kube-dns-6987857fdb-pzpt4                                   260m (27%)    0 (0%)      110Mi (47%)      170Mi (73%)    38m
  kube-system                kube-proxy-gke-studyhub-micro-default-pool-3722df19-32cb    100m (10%)    0 (0%)      0 (0%)           0 (0%)         46m
  kube-system                prometheus-to-sd-6r9pn                                      1m (0%)       3m (0%)     20Mi (8%)        20Mi (8%)      46m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        411m (43%)      53m (5%)
  memory                     226680Ki (95%)  288120Ki (121%)
  ephemeral-storage          0 (0%)          0 (0%)
  attachable-volumes-gce-pd  0               0
Events:
  Type    Reason                     Age                From                                                            Message
  ----    ------                     ----               ----                                                            -------
  Normal  FrequentDockerRestart      58m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentDockerRestart is now: Unknown, reason: FrequentDockerRestart
  Normal  FrequentContainerdRestart  57m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentContainerdRestart is now: Unknown, reason: FrequentContainerdRestart
  Normal  Starting                   46m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Starting kubelet.
  Normal  NodeHasSufficientMemory    46m (x2 over 46m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure      46m (x2 over 46m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID       46m (x2 over 46m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientPID
  Normal  NodeAllocatableEnforced    46m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Updated Node Allocatable limit across pods
  Normal  NodeReady                  46m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeReady
  Normal  Starting                   46m                kube-proxy, gke-studyhub-micro-default-pool-3722df19-32cb       Starting kube-proxy.
  Normal  CorruptDockerOverlay2      41m                docker-monitor, gke-studyhub-micro-default-pool-3722df19-32cb   Node condition CorruptDockerOverlay2 is now: False, reason: CorruptDockerOverlay2
  Normal  FrequentKubeletRestart     41m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentKubeletRestart is now: False, reason: FrequentKubeletRestart
  Normal  UnregisterNetDevice        41m                kernel-monitor, gke-studyhub-micro-default-pool-3722df19-32cb   Node condition FrequentUnregisterNetDevice is now: False, reason: UnregisterNetDevice
  Normal  FrequentDockerRestart      41m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentDockerRestart is now: False, reason: FrequentDockerRestart
  Normal  FrequentContainerdRestart  41m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentContainerdRestart is now: False, reason: FrequentContainerdRestart
  Normal  Starting                   37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Starting kubelet.
  Normal  NodeNotReady               37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeNotReady
  Normal  NodeHasSufficientMemory    37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientMemory
  Normal  NodeHasSufficientPID       37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientPID
  Normal  NodeHasNoDiskPressure      37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasNoDiskPressure
  Normal  NodeAllocatableEnforced    37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Updated Node Allocatable limit across pods
  Normal  NodeReady                  37m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeReady
  Normal  FrequentKubeletRestart     35m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentKubeletRestart is now: Unknown, reason: FrequentKubeletRestart
  Normal  UnregisterNetDevice        35m                kernel-monitor, gke-studyhub-micro-default-pool-3722df19-32cb   Node condition FrequentUnregisterNetDevice is now: Unknown, reason: UnregisterNetDevice
  Normal  CorruptDockerOverlay2      35m                docker-monitor, gke-studyhub-micro-default-pool-3722df19-32cb   Node condition CorruptDockerOverlay2 is now: Unknown, reason: CorruptDockerOverlay2
  Normal  FrequentDockerRestart      34m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentDockerRestart is now: Unknown, reason: FrequentDockerRestart
  Normal  FrequentContainerdRestart  33m                systemd-monitor, gke-studyhub-micro-default-pool-3722df19-32cb  Node condition FrequentContainerdRestart is now: Unknown, reason: FrequentContainerdRestart
  Normal  Starting                   18m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Starting kubelet.
  Normal  NodeHasSufficientMemory    18m (x2 over 18m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientMemory
  Normal  NodeHasNoDiskPressure      18m (x2 over 18m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasNoDiskPressure
  Normal  NodeHasSufficientPID       18m (x2 over 18m)  kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeHasSufficientPID
  Normal  NodeNotReady               18m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeNotReady
  Normal  NodeAllocatableEnforced    18m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Updated Node Allocatable limit across pods
  Normal  NodeReady                  18m                kubelet, gke-studyhub-micro-default-pool-3722df19-32cb          Node gke-studyhub-micro-default-pool-3722df19-32cb status is now: NodeReady
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$

@manics
Member

manics commented Aug 27, 2019

Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests        Limits
  --------                   --------        ------
  cpu                        411m (43%)      53m (5%)
  memory                     226680Ki (95%)  288120Ki (121%)

Almost all the memory is used, and f1-micro is limited to an average of 0.2 CPU, which has already been exceeded by the k8s system pods alone: https://cloud.google.com/compute/docs/machine-types#sharedcore
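A quick way to see that summary for every node (just a sketch; the number of context lines may need adjusting for your kubectl version):

kubectl describe nodes | grep -A 8 'Allocated resources'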

@nscozzaro
Author

Thanks @manics, just so I understand, are you suggesting it might not be possible to run Kubernetes on f1-micro?
I tried Googling and found this issue (kubernetes/kubernetes#44273 (comment)), but it seems I'm already using an adequately up-to-date cluster version, so I don't know how to explain why it's not working.

Tonight when I tried, the nodes were momentarily ready, but then running helm init --service-account tiller --wait timed out, after which the nodes were no longer ready:

nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ gcloud container clusters create --machine-type f1-micro --num-nodes 3 --zone us-central1-b --cluster-version 1.13.7-gke.24 studyhub-micro
Creating cluster studyhub-micro in us-central1-b... Cluster is being health-checked (master is healthy)...done.
Created [https://container.googleapis.com/v1/projects/studyhub-jupyterlab/zones/us-central1-b/clusters/studyhub-micro].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-central1-b/studyhub-micro?project=studyhub-jupyterlab
kubeconfig entry generated for studyhub-micro.
NAME            LOCATION       MASTER_VERSION  MASTER_IP     MACHINE_TYPE  NODE_VERSION   NUM_NODES  STATUS
studyhub-micro  us-central1-b  1.13.7-gke.24   34.66.217.57  f1-micro      1.13.7-gke.24  3          RUNNING
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl get node
NAME                                            STATUS   ROLES    AGE   VERSION
gke-studyhub-micro-default-pool-d1bf9b21-2x12   Ready    <none>   21s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-pg4c   Ready    <none>   21s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-s3wr   Ready    <none>   21s   v1.13.7-gke.24
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=nscozzaro1@gmail.com
clusterrolebinding.rbac.authorization.k8s.io/cluster-admin-binding created
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl --namespace kube-system create serviceaccount tiller
serviceaccount/tiller created
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
clusterrolebinding.rbac.authorization.k8s.io/tiller created
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl get node
NAME                                            STATUS   ROLES    AGE   VERSION
gke-studyhub-micro-default-pool-d1bf9b21-2x12   Ready    <none>   68s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-pg4c   Ready    <none>   68s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-s3wr   Ready    <none>   68s   v1.13.7-gke.24
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ helm init --service-account tiller --wait
$HELM_HOME has been configured at /home/nscozzaro1/.helm.

Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.

Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
To prevent this, run `helm init` with the --tiller-tls-verify flag.
For more information on securing your installation see: https://docs.helm.sh/using_helm/#securing-your-helm-installation
Error: tiller was not found. polling deadline exceeded
nscozzaro1@cloudshell:~ (studyhub-jupyterlab)$ kubectl get node                                                                                                   
NAME                                            STATUS     ROLES    AGE     VERSION
gke-studyhub-micro-default-pool-d1bf9b21-2x12   Ready      <none>   7m51s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-pg4c   NotReady   <none>   7m51s   v1.13.7-gke.24
gke-studyhub-micro-default-pool-d1bf9b21-s3wr   NotReady   <none>   7m51s   v1.13.7-gke.24

@consideRatio
Member

I tried the same once; I concluded that it was a bit of a stretch to go for the micro node.

I settled on an n1-standard-1 for the basic JupyterHub stuff and then allowed things to autoscale.

Making that single node preemptible could be a good way to reduce cost further, I think, but that would also allow the service to go down now and then.
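As a sketch, assuming the rest of the flags from the z2jh guide, a preemptible single-node cluster would look roughly like this (note that preemptible VMs are reclaimed at least once every 24 hours):

gcloud container clusters create \
  --machine-type n1-standard-1 \
  --num-nodes 1 \
  --preemptible \
  --zone us-central1-b \
  --cluster-version latest \
  <CLUSTERNAME>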

When I got the 1-core setup functional, I removed some resource requests, I think.

I'm on mobile on my way to work, so it's hard for me to provide an example right now, but I think there are notes about this in the z2jh.jupyter.org guide. Note that you want to reduce the resource requests for the hub, proxy, and singleuser pods, not only one of them (a rough sketch follows below).
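As a rough sketch of what that could look like in config.yaml (the exact keys should be checked against the chart's values reference for your version; the numbers are only illustrative):

hub:
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
proxy:
  chp:
    resources:
      requests:
        cpu: 50m
        memory: 64Mi
singleuser:
  cpu:
    guarantee: 0.05
  memory:
    guarantee: 128M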

Also, you may want to trim the requests of system pods like kube-dns etc. by running kubectl edit on the deployment, or whatever the underlying controller of those pods is.
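A hedged sketch of that (on GKE the addon manager may revert changes to managed system workloads, so this may not stick):

kubectl --namespace kube-system get deployment kube-dns -o yaml | grep -A 4 resources
kubectl --namespace kube-system edit deployment kube-dns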

@manics
Member

manics commented Aug 28, 2019

just so I understand, are you suggesting it might not be possible to run kubernetes on f1-micro?

That's what the resource usage and limits suggest. f1-micro is burstable, which means you can use more than 0.2 CPU for short periods, but it'll then be throttled again; this may explain why things temporarily work.

@nscozzaro
Author

@consideRatio thank you for those insights... I looked into preemptible nodes, but it seems they only last 24 hours? Since you seem to have good experience with this, would you mind sharing the exact steps (commands) to set up the cheapest JupyterHub on GCP/Kubernetes that can run permanently?

For reference, below is a summary of the instructions from the z2jh docs that work for me for launching the current site I have at studyhub.co (I've left out customizing my config.yaml). How would these change to implement your suggestions?

gcloud container clusters create --machine-type n1-standard-2 --num-nodes 1 --zone us-central1-b --cluster-version latest studyhub

kubectl create clusterrolebinding cluster-admin-binding --clusterrole=cluster-admin --user=nscozzaro1@gmail.com
kubectl --namespace kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller

helm init --service-account tiller --wait

kubectl patch deployment tiller-deploy --namespace=kube-system --type=json --patch='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["/tiller", "--listen=localhost:44134"]}]'

helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
helm upgrade --install jhub jupyterhub/jupyterhub --namespace jhub --version=0.9-470ec04 --values config.yaml

@consideRatio
Member

@nscozzaro sorry for not following up at the time. I'd say it's fully possible to get a 1 CPU deployment. The procedure is pretty much to shut down all containers that aren't needed, and minimize the resource requests on all containers.

While working to do so, it's relevant to know that there is sometimes a k8s resource that influences the default requests pods make: the LimitRange resource. Run kubectl get limitrange, and see https://kubernetes.io/docs/concepts/policy/limit-range/ for more info.
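For example (a sketch; GKE clusters of this vintage typically create a LimitRange named "limits" in the default namespace with a 100m default CPU request, but verify on your own cluster before deleting anything):

kubectl get limitrange --all-namespaces
kubectl describe limitrange limits --namespace default
kubectl delete limitrange limits --namespace default    # only if its defaults are higher than you want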
