[Bug] Misleading error message in RayService when upgrading to KubeRay v1.1.0 #2088

Open
1 of 2 tasks
kevin85421 opened this issue Apr 18, 2024 · 3 comments
Labels
bug Something isn't working rayservice

Comments

@kevin85421
Member

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

  {
    "error": "strconv.Atoi: parsing \"\": invalid syntax",
    "level": "error",
    "logger": "controllers.RayService",
    "msg": "Failed to serialize new RayCluster config. Manual config updates will NOT be tracked accurately. Please manually tear down the cluster and apply a new config.",
    "stacktrace": "github.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayServiceReconciler).shouldPrepareNewRayCluster\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayservice_controller.go:556\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayServiceReconciler).reconcileRayCluster\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayservice_controller.go:397\ngithub.com/ray-project/kuberay/ray-operator/controllers/ray.(*RayServiceReconciler).Reconcile\n\t/home/runner/work/kuberay/kuberay/ray-operator/controllers/ray/rayservice_controller.go:126\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.16.3/pkg/internal/controller/controller.go:227",
    "ts": "2024-03-30T00:53:31.584Z"
  }
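
The `error` field in the log entry above is the exact string Go's `strconv.Atoi` returns when it is given an empty string. The sketch below only reproduces that string; it does not show where the empty value comes from inside KubeRay, which this issue has not established. The helper `parseIntValue` and the key `example.com/some-count` are hypothetical and exist purely for illustration, on the assumption that an integer is read from a string map (such as object annotations) where the key may be absent on objects created by an older operator version.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseIntValue is a hypothetical helper (not a KubeRay function).
// It looks up key in a string map and parses the value as an int.
// A missing key yields "", which strconv.Atoi rejects.
func parseIntValue(m map[string]string, key string) (int, error) {
	return strconv.Atoi(m[key])
}

func main() {
	// "example.com/some-count" is an invented key, not a real KubeRay annotation.
	const key = "example.com/some-count"

	// Map without the key: the lookup returns "" and parsing fails.
	old := map[string]string{}
	_, err := parseIntValue(old, key)
	fmt.Println(err) // strconv.Atoi: parsing "": invalid syntax

	// Map with the key set: parsing succeeds.
	updated := map[string]string{key: "3"}
	n, err := parseIntValue(updated, key)
	fmt.Println(n, err) // 3 <nil>
}
```

If the cause is of this kind, checking for the empty value before parsing would let the operator log a more specific message than the generic serialization failure.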

Reproduction script

A user upgraded (1) the KubeRay operator and (2) the CRD with ArgoCD while a RayService was running. After the upgrade, the KubeRay operator logs the error message above, but the RayService still functions well (e.g., in-place updates still work). We need to figure out why this misleading message is logged.

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421 kevin85421 added bug Something isn't working triage labels Apr 18, 2024
@kevin85421 kevin85421 self-assigned this Apr 18, 2024
@kevin85421 kevin85421 changed the title from "[Bug]" to "[Bug] Misleading error message in RayService when upgrading to KubeRay v1.1.0" Apr 18, 2024
@tmyhu

tmyhu commented Apr 25, 2024

The error message may not be misleading. I ran into the same error after updating the KubeRay operator to v1.1.0, and from then on no updates to the Ray cluster config (e.g., a later Ray version change) were applied. I had to delete and recreate the whole RayService so that a new cluster would be created. Only after that were further updates reconciled as expected, and this error message no longer appeared.
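
The log message says that config updates "will NOT be tracked accurately" when serialization fails. One general pattern that would behave this way is change detection by comparing a hash of the serialized desired spec with a stored hash. The sketch below illustrates that pattern only. It is not taken from the KubeRay source, and `clusterSpec`, `specHash`, and `needsNewCluster` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// clusterSpec is a hypothetical, simplified stand-in for a cluster config.
type clusterSpec struct {
	RayVersion string `json:"rayVersion"`
	Replicas   int    `json:"replicas"`
}

// specHash serializes the spec to JSON and returns its SHA-256 hex digest.
func specHash(s clusterSpec) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

// needsNewCluster reports whether the desired spec differs from the one
// whose hash was stored earlier. If hashing fails, the caller cannot tell
// whether anything changed, so the error is returned instead of a guess.
func needsNewCluster(storedHash string, desired clusterSpec) (bool, error) {
	h, err := specHash(desired)
	if err != nil {
		return false, err
	}
	return h != storedHash, nil
}

func main() {
	running := clusterSpec{RayVersion: "2.10.0", Replicas: 2}
	stored, _ := specHash(running)

	same, _ := needsNewCluster(stored, running)
	fmt.Println(same) // false

	upgraded := clusterSpec{RayVersion: "2.22.0", Replicas: 2}
	changed, _ := needsNewCluster(stored, upgraded)
	fmt.Println(changed) // true
}
```

Under this assumed pattern, a reconcile loop that returns early whenever the hash step errors would never reach the comparison, which would be consistent with updates not being applied until the RayService is recreated.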

@sfrolich

I am also using ArgoCD. I upgraded the KubeRay operator from 1.0.0 to 1.1.1 and then updated my RayService from 2.10.0 to 2.22.0, but the RayService is not upgrading. The existing 2.10.0 cluster is still there, and no pods for a new 2.22.0 cluster are created. I think this is the same problem that @tmyhu described above.

For some customers, deleting and re-creating the RayService is not possible because we are running production applications that cannot have downtime.

@sfrolich

@kevin85421 should I open a new bug for the RayService not upgrading, or is this one good enough (even though the title doesn't match my problem)?

3 participants