
kubectl wait on arbitrary jsonpath #83094

Closed
dims opened this issue Sep 25, 2019 · 52 comments · Fixed by #105776
Labels: kind/feature · sig/cli · triage/accepted

Comments

@dims
Member

dims commented Sep 25, 2019

kubectl currently can wait on condition or delete
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/kubectl/pkg/cmd/wait/wait.go#L110-L111

In addition to the above, can we please add the ability to wait on the high-level summary that we have in phase?
https://cs.k8s.io/?q=Phase%20.*json%3A%22phase&i=nope&files=&repos=kubernetes/kubernetes

Why? It looks like we are not adding conditions anymore, so CRDs seem to have picked up phase at least. It would help avoid having to write loops like this (from https://github.com/kubernetes-sigs/cluster-api-provider-gcp/pull/175/files):

  # Wait till all machines are running (bail out at 30 mins)
  attempt=0
  while true; do
    kubectl get machines --kubeconfig=$(kind get kubeconfig-path --name="clusterapi")
    read running total <<< $(kubectl get machines --kubeconfig=$(kind get kubeconfig-path --name="clusterapi") \
      -o json | jq -r '.items[].status.phase' | awk 'BEGIN{count=0} /running/ {count++} END{print count " " NR}') ;
    if [[ $total == "5" && $running == "5" ]]; then
      return
    fi
    timestamp=$(date +"[%H:%M:%S]")
    if [ $attempt -gt 180 ]; then
      echo "cluster did not start in 30 mins ... bailing out!"
      exit 1
    fi
    echo "$timestamp Total machines : $total / Running : $running .. waiting for 10 seconds"
    sleep 10
    attempt=$((attempt+1))
  done
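
For contrast, a sketch of what the requested feature might reduce that loop to (hypothetical syntax; nothing like this existed when this issue was filed, and the flag shape is an assumption):

  # Hypothetical: block until every machine reports .status.phase == running
  kubectl wait machines --all \
    --kubeconfig=$(kind get kubeconfig-path --name="clusterapi") \
    --for=jsonpath='{.status.phase}'=running \
    --timeout=30m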

Another example is: https://github.com/kubernetes-incubator/kube-aws/blob/master/contrib/cluster-backup/restore.sh#L80-L86

@dims dims added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 25, 2019
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Sep 25, 2019
@dims
Member Author

dims commented Sep 25, 2019

/sig cli

@k8s-ci-robot k8s-ci-robot added sig/cli Categorizes an issue or PR as relevant to SIG CLI. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 25, 2019
@liggitt
Member

liggitt commented Sep 25, 2019

Looks like we are not adding conditions anymore

The reports of conditions' death have been greatly exaggerated :)

Rather than a specific phase field, a jsonpath evaluation and expected value might be interesting

@dims
Member Author

dims commented Sep 25, 2019

something like the following?

wait --for='jsonpath={.status.phase == "Running"}'
wait --for='jsonpath={.items..status.phase == "Ready"}'

@dims
Member Author

dims commented Sep 25, 2019

I guess what I want to see is: wait until all the items have reached the phase, or wait until the single item I specified has reached the phase. Right?

@dims
Member Author

dims commented Sep 26, 2019

cc @hpandeycodeit

@dims
Member Author

dims commented Sep 26, 2019

@deads2k Can you please suggest some command lines that will be useful for a "generic wait"?

@hpandeycodeit

/assign

@deads2k
Contributor

deads2k commented Oct 16, 2019

Conditions are generally better than phase. Phases were a mistake in our original pod implementation because they implied a single unified state machine, but we found that instead we had multiple orthogonal states that are better represented as separate conditions.

You could try kubectl wait --for=jsonpath=.status.phase=wish-i-was-a-condition

@dims
Member Author

dims commented Oct 16, 2019

A bit more from chat:

dims 10:18 AM if the resource.group/resource.name | resource.group has multiple items, then we ensure that the jsonpath is true for each? what happens when there is nothing selected?
deads2k 10:19 AM I think the jsonpath must result in a single node. If it results in multiple nodes, it should fail. If it results in zero nodes, it waits but we shouldn't codify phase
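
A sketch of what those semantics would mean in practice (hypothetical at this point in the thread; pod example for concreteness):

  # Would be rejected on a multi-container pod: the expression selects several nodes
  kubectl wait pod/mypod --for=jsonpath='{.status.containerStatuses[*].ready}'=true
  # Fine: the expression resolves to a single node per object
  kubectl wait pod/mypod --for=jsonpath='{.status.phase}'=Running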

@aparedero

@deads2k But how can you capture when a pod is up and running, but not ready?
Just a small example with the HashiCorp Vault Helm chart, managed with Ansible.

Putting Ansible aside, this Helm manifest deploys a pod with Vault, but it won't be ready until the application has been configured; so it is 0/1 but in the Running state (due to a readiness probe).

NAME      READY   STATUS    RESTARTS   AGE
vault-0   0/1     Running   0          3m55s

Currently it's required to execute some kubectl exec commands in this pod for it to become ready.

As I'm automating the process with Ansible, I've tried creating a task with the bare command
kubectl wait --for=condition=Initialized pod/vault-0 --timeout=90s, but it's not working because the Initialized condition does not reflect that the pod is still being created, so the phase is the most accurate state to wait on.

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2019-11-04T10:54:55Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2019-11-04T10:54:55Z"
    message: 'containers with unready status: [vault]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2019-11-04T10:54:55Z"
    message: 'containers with unready status: [vault]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2019-11-04T10:54:55Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - image: vault:1.2.2
    imageID: ""
    lastState: {}
    name: vault
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating
  hostIP: 172.16.16.97
  phase: Pending
  qosClass: BestEffort
  startTime: "2019-11-04T10:54:55Z"

As a workaround, I'm handling the not-ready error by adding retries in Ansible, but I think it would be interesting to have more control from K8s.
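
With the jsonpath wait proposed in this issue, that case could presumably be expressed as something like this (a sketch; the flag shape is hypothetical at this point in the thread):

  # Wait on the pod phase itself rather than on a condition.
  kubectl wait pod/vault-0 --for=jsonpath='{.status.phase}'=Running --timeout=90s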

@nicorikken

I have a similar use case that could benefit from this feature. I'm waiting on a Kubernetes Job that runs an integration test, and I want to wait for its output in my Jenkins CI/CD tool.

Now we wait for completion:

kubectl wait --for=condition=complete --timeout=${timeoutMinutes}m job/${jobName} --namespace ${namespace}

But if the job fails, the waiting continues until the timeout, as the failed condition is not equal to the complete condition. I'd prefer an early return from the wait, so I can retrieve the logs, mark the CI/CD job as failed, and notify the developers.

Design

Logical conditions

I was expecting the CLI to take multiple arguments as conditions:

kubectl wait --for=condition=failed,complete

But that doesn't work (yet 🤞). As a Job can only have one condition, treating multiple options as OR logic seems sufficient for all use cases.

Generic

Going by this thread, I could use something like:

kubectl wait --for='jsonpath={ (.status.conditions.type == "Complete") or (.status.conditions.type == "Failed") }'

If I understand correctly, logical operators are currently lacking in kubectl's JSONPath syntax.
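
As an aside, a common workaround for the OR case is to race two waits in the shell and take whichever returns first (a sketch; wait -n needs bash 4.3+, and jobName/timeoutMinutes/namespace are the variables from the command above):

  # Race the two waits; proceed as soon as either condition is met.
  kubectl wait --for=condition=complete --timeout="${timeoutMinutes}m" "job/${jobName}" --namespace "${namespace}" &
  kubectl wait --for=condition=failed --timeout="${timeoutMinutes}m" "job/${jobName}" --namespace "${namespace}" &
  wait -n                   # exits with the status of the first wait to finish
  kill %1 %2 2>/dev/null    # stop whichever background wait is still running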

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 18, 2020
@nicorikken

/remove-lifecycle stale

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 17, 2020
@qalinn

qalinn commented Jul 16, 2021

Any update on this thread?

@AjithPanneerselvam

/assign

@MadhavJivrajani
Contributor

MadhavJivrajani commented Aug 11, 2021

/unassign @hpandeycodeit
Un-assigning due to inactivity.
Please feel free to provide input/feedback, and re-assign it to yourself at a later point if @AjithPanneerselvam can't work on this. Thanks! :)

@joca-bt

joca-bt commented Sep 15, 2021

We also need this. It's flexible enough to cover lots of use cases.

@dims
Member Author

dims commented Sep 15, 2021

It's been more than a month.
/unassign AjithPanneerselvam

@lauchokyip
Member

lauchokyip commented Sep 17, 2021

Will work on this, incorporating PRs #92277 and #83959.
I am new to kubectl wait and JSON handling, so I'll definitely need some time to come up with a PR. Please be patient :)

/assign

Update (10/4): A bit busy, but making a little progress. I have tested the JSONPath function (https://github.com/kubernetes/client-go/blob/master/util/jsonpath/jsonpath.go) and will start the implementation soon.

Update (10/19): Finished implementation, will clean up and add more tests soon.

Update (10/25): Adding unit test cases

Update (10/27): Finished testing. Ready for review and merge.

@ldemailly

which version of kubectl has this? any example btw?

@lauchokyip
Member

which version of kubectl has this? any example btw?

@ldemailly it's available from kubectl version 1.23; you can run kubectl wait --help to see examples.
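
For instance, the 1.23 syntax takes a JSONPath expression plus an expected value (the pod name here is illustrative):

  # Block until the pod's .status.phase equals Running, waiting up to 60 seconds.
  kubectl wait pod/busybox1 --for=jsonpath='{.status.phase}'=Running --timeout=60s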

@minherz
Contributor

minherz commented Mar 11, 2023

It seems that the current --for=jsonpath syntax does not support waiting for any ("non-empty") value.
In my use case, I would like to wait until an instance of the external load balancer is provisioned. I cannot do it on the load balancer instance itself, because I cannot identify it using K8s. So that leaves me with the readiness indicator, which is the status.loadBalancer.ingress field of the service. If I wait on the value of status.loadBalancer.ingress[0].ip, I need to provide a concrete "value", but I cannot know the IP of the load balancer in advance.

Is it possible to modify the existing --for=jsonpath='{}'=value syntax to also support returning on any non-empty value when only the jsonpath is given: --for=jsonpath='{}'?

chrisd8088 added a commit to chrisd8088/mongo-express that referenced this issue Mar 15, 2023
As the Kubernetes load balancer service for the mongo-express application
does not require redeployment on every application change, add a dispatch
workflow to perform the deployment of the load balancer service only
when triggered by an administrative dispatch event.

This GitHub Actions workflow expects an Azure Kubernetes Service cluster
to exist and for its name and resource group to be configured as
GitHub Actions secrets, as well as the Azure login credentials.

After deployment of the persistent volumes manifest, the workflow
uses kubectl to wait for the service's ingress IP to be assigned.
Unfortunately this cannot be done exclusively with the "kubectl wait"
command, as its JSON condition matching requires a fixed string and
the IP address cannot be known in advance. Instead, use "kubectl get"
in a loop and wait for the load balancer's status to contain an
"ingress" key, as suggested in:

kubernetes/kubernetes#80828 (comment)

See also this concern regarding how JSON condition matching was
implemented for "kubectl wait":

kubernetes/kubernetes#83094 (comment)
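
The loop that commit describes looks roughly like this (a sketch; the service name mongo-express-lb is assumed):

  # Poll until the service's load-balancer status gains an ingress entry.
  until kubectl get service mongo-express-lb \
      -o jsonpath='{.status.loadBalancer.ingress}' | grep -q .; do
    sleep 5
  done
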
@jonashackt

jonashackt commented Mar 28, 2023

It would be highly appreciated to have kubectl wait usable with status.loadBalancer.ingress[0].ip or status.loadBalancer.ingress[0].hostname (as also discussed here and here). Allowing an empty value like @minherz suggested would be a great option. Maybe this needs another issue, since this one is closed?

@minherz
Contributor

minherz commented Apr 8, 2023

@lauchokyip can you please advise if the implementation that you merged can be used to reach the goal mentioned in these two comments (1 and 2)?

@lauchokyip
Member

Hi @minherz, thanks for bringing it up. I used to contribute during my free time, but I am currently looking for new opportunities, so I don't have the resources to do that. Would you be able to bring it up at a sig-cli meeting to get their attention? Much appreciated.

@minherz
Contributor

minherz commented Apr 12, 2023

@lauchokyip thank you. I've submitted a discussion topic for the next sig-cli bi-weekly meeting. I will post the results later.

@minherz
Contributor

minherz commented May 3, 2023

Following today's SIG CLI meeting, I have created a new feature request #117761.

@minherz
Contributor

minherz commented Jul 4, 2023

@jonashackt I hope that the change in version 1.28 allows waiting for a K8s service to get its IPs. The syntax is supposed to look like:

kubectl wait --for=jsonpath='{.status.loadBalancer.ingress}' service/my_lb_service

or the old style:

kubectl wait --for=jsonpath='{.status.loadBalancer.ingress[0].ip}' service/my_lb_service

@volatilemolotov

@jonashackt I hope that the change in version 1.28 allows waiting for a K8s service to get its IPs. The syntax is supposed to look like:

kubectl wait --for=jsonpath='{.status.loadBalancer.ingress}' service/my_lb_service

or the old style:

kubectl wait --for=jsonpath='{.status.loadBalancer.ingress[0].ip}' service/my_lb_service

This works in 1.28
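
For reference, a fuller invocation of the value-less form on 1.28+ (the 5-minute timeout is an arbitrary example):

  # Block until .status.loadBalancer.ingress is populated, up to 5 minutes.
  kubectl wait service/my_lb_service \
    --for=jsonpath='{.status.loadBalancer.ingress}' \
    --timeout=5m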

@minherz
Contributor

minherz commented Sep 21, 2023

@volatilemolotov thank you for confirming that!

mauriciovasquezbernal added a commit to inspektor-gadget/inspektor-gadget that referenced this issue Feb 12, 2024
Tests on ARO were failing with: "error: no matching resources found";
this is because it's not possible to watch a resource that hasn't been
created.

See kubernetes/kubernetes#83094 & radius-project/radius#6914

Fixes: 4e1bb64 ("integration: Disable installing Security Profile Operator")

Signed-off-by: Mauricio Vásquez <mauriciov@microsoft.com>