[Bug]: Launcher only reports Pod errors/log whereas others can happen #793

Closed
1 task done
guimou opened this issue Nov 21, 2022 · 7 comments · Fixed by #1703
Assignees
Labels
feature/ds-projects Data Science Projects feature (formerly Data Science Groupings - DSG) feature/notebook-controller KubeFlow NoteBook Controller (KFNBC) Feature field-priority Flag to track improvements that are for stability -- effort to put in front of new functionality kind/bug Something isn't working priority/high Important issue that needs to be resolved asap. Releases should not have too many of these. rhods-1.33

Comments

@guimou
Member

guimou commented Nov 21, 2022

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

The notebook launcher reports errors and can display the event log from the Pod being launched.
However, other errors can prevent the notebook from starting, with the Pod itself not even being launched.

For example, a LimitRange set on the namespace can prevent the Pod from being scheduled. In this case, the request from the Notebook Controller is simply filtered and nothing happens. The launcher stays on the modal window, stating "Waiting for server request to start..." indefinitely.

Expected Behavior

The launcher should detect and display other errors or events that happen all along the chain: Pod, StatefulSet Controller, Notebook Controller, Dashboard backend.

Steps To Reproduce

  1. On the opendatahub namespace, set a LimitRange that will prevent the default "Small" environment from being launched. Example:
spec:
  limits:
    - type: Container
      max:
        cpu: '4'
        memory: 6Gi
      default:
        cpu: 500m
        memory: 1536Mi
      defaultRequest:
        cpu: 50m
        memory: 256Mi
    - type: Pod
      max:
        cpu: '4'
        memory: 12Gi

The default "Small" environment sets a memory limit of 8Gi, which exceeds the 6Gi container maximum allowed above, so the Pod request is blocked.
  2. Try to launch a notebook using the "Small" environment.
  3. You stay stuck on "Waiting for server request to start..." indefinitely.
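
To make the arithmetic in step 1 concrete, here is a minimal Python sketch that compares a container's limits against the `max` values of a LimitRange. The helper names `parse_quantity` and `exceeded_limits` are hypothetical and written only for this illustration; they are not part of Kubernetes or the dashboard, and the suffix table covers only the common quantity suffixes.

```python
from decimal import Decimal
from typing import Dict, List

# Common Kubernetes quantity suffixes (binary and decimal). Not exhaustive.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
    "m": Decimal("0.001"),
    "k": Decimal(10) ** 3,
    "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9,
    "T": Decimal(10) ** 12,
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as '8Gi', '500m' or '4' to a plain number."""
    # Check two-letter suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def exceeded_limits(limits: Dict[str, str], range_max: Dict[str, str]) -> List[str]:
    """Return the resource names whose limit is above the LimitRange max."""
    return [
        name
        for name, value in limits.items()
        if name in range_max
        and parse_quantity(value) > parse_quantity(range_max[name])
    ]


# Container max from the LimitRange in step 1, and the 8Gi memory limit
# of the "Small" environment mentioned in this issue.
container_max = {"cpu": "4", "memory": "6Gi"}
small_env_limits = {"memory": "8Gi"}

violations = exceeded_limits(small_env_limits, container_max)
print(violations)
```

A non-empty result means the Pod creation request would be rejected, which is the case the launcher currently does not surface.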

Workaround (if any)

Look at the events and logs from the different controllers to figure out what's happening. This requires cluster permissions that users don't have.

What browsers are you seeing the problem on?

No response

Open Data Hub Version

Any version using the KF Notebook Controller

Anything else

No response

@guimou guimou added kind/bug Something isn't working untriaged Indicates the newly create issue has not been triaged yet labels Nov 21, 2022
@andrewballantyne andrewballantyne added feature/notebook-controller KubeFlow NoteBook Controller (KFNBC) Feature priority/normal An issue with the product; fix when possible feature/ds-projects Data Science Projects feature (formerly Data Science Groupings - DSG) and removed untriaged Indicates the newly create issue has not been triaged yet labels Feb 1, 2023
@andrewballantyne
Member

We currently look up the Pod by UID. We could consider also looking at the Notebook and a collection of other resources, but we'll need to make sure we limit what we get by time; otherwise past instances, or other instances that touch the same k8s items, will start flooding the log.

Perhaps some UX can be done here to handle specific use-cases instead of just adding to the log... we will need to see what the cost of reporting these logs will be.
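
As an illustration of the time-limiting idea above, here is a minimal Python sketch that keeps only events for a set of watched resources that occurred at or after the launch started. It is an assumption-laden sketch, not the dashboard's implementation: `relevant_events` and `parse_time` are hypothetical helpers, the events are plain dicts shaped like core/v1 Event objects (`involvedObject.kind`, `involvedObject.name`, `lastTimestamp`), and the sample names and messages are invented.

```python
from datetime import datetime, timezone
from typing import Iterable, List, Set, Tuple


def parse_time(timestamp: str) -> datetime:
    """Parse a second-precision UTC timestamp such as '2023-09-25T10:15:00Z'."""
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )


def relevant_events(
    events: Iterable[dict],
    watched: Set[Tuple[str, str]],
    since: datetime,
) -> List[dict]:
    """Keep events for watched (kind, name) pairs that happened at or after `since`."""
    selected = [
        event
        for event in events
        if (event["involvedObject"]["kind"], event["involvedObject"]["name"]) in watched
        and parse_time(event["lastTimestamp"]) >= since
    ]
    # Oldest first, so the log reads in the order things happened.
    return sorted(selected, key=lambda event: parse_time(event["lastTimestamp"]))


# Hypothetical sample data: names, reasons and messages are invented.
sample_events = [
    {
        "involvedObject": {"kind": "StatefulSet", "name": "my-notebook"},
        "lastTimestamp": "2023-09-25T10:15:05Z",
        "reason": "FailedCreate",
        "message": "example: pod creation rejected by LimitRange",
    },
    {
        "involvedObject": {"kind": "StatefulSet", "name": "my-notebook"},
        "lastTimestamp": "2023-09-24T08:00:00Z",
        "reason": "SuccessfulCreate",
        "message": "example: event from a previous launch",
    },
    {
        "involvedObject": {"kind": "Pod", "name": "other-notebook-0"},
        "lastTimestamp": "2023-09-25T10:15:10Z",
        "reason": "Started",
        "message": "example: event from an unrelated notebook",
    },
]

launch_started = parse_time("2023-09-25T10:15:00Z")
watched_resources = {
    ("Notebook", "my-notebook"),
    ("StatefulSet", "my-notebook"),
    ("Pod", "my-notebook-0"),
}

for event in relevant_events(sample_events, watched_resources, launch_started):
    print(event["involvedObject"]["kind"], event["reason"], event["message"])
```

Filtering on both the resource identity and the launch start time is what keeps past instances and unrelated notebooks out of the log in this sketch.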

@andrewballantyne andrewballantyne added this to the Upcoming Release milestone May 16, 2023
@andrewballantyne andrewballantyne added priority/high Important issue that needs to be resolved asap. Releases should not have too many of these. and removed priority/normal An issue with the product; fix when possible labels May 16, 2023
@andrewballantyne
Member

I'm bumping this up -- we probably don't have time this sprint to work on it, but there are several issues from RHODS as well as this one in ODH that note the same thing. We need to be able to detect state beyond the pod logs as the pod is not always created.

@andrewballantyne andrewballantyne added the field-priority Flag to track improvements that are for stability -- effort to put in front of new functionality label Jun 22, 2023
@christianvogt christianvogt self-assigned this Aug 22, 2023
@andrewballantyne andrewballantyne removed this from the Current Release milestone Sep 15, 2023
@lugi0
Contributor

lugi0 commented Sep 25, 2023

I think this issue is the same one as #1849, which was still present in 1.33 RC1. I'll retest in RC2 but there's a possibility this hasn't been fixed @lucferbux @andrewballantyne @christianvogt

@andrewballantyne
Member

Apparently we did not catch the full scenario. We improved it some and will need to readdress your feedback in 1849. Thanks for logging that issue.

@lugi0
Contributor

lugi0 commented Sep 25, 2023

@andrewballantyne I retested in RC2 and it appears to work?
image

@andrewballantyne
Member

We didn't do anything -- I imagine this is a flake or something. Keep an eye out for it, but I am glad you have it working again.

@lugi0
Contributor

lugi0 commented Sep 26, 2023

Retested on another cluster, with the same build, and the event does show up in the modal. I'm not sure what happened the first time I hit this issue :/ As you suggest I'll keep an eye on this and come back if I figure it out
image

Projects
Archived in project
Status: Dashboard

5 participants