This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

When user's job config is too large, OpenPAI job will be in Waiting or Stopping forever #5093

Open
hzy46 opened this issue Nov 18, 2020 · 1 comment


hzy46 commented Nov 18, 2020

  1. To reproduce this issue, submit a job config with 1000+ lines. Defining a large number of taskroles is an easy way to get there.

  2. The job stays in Waiting status. The POST request to the K8S API server reports Resource Exhausted.

image

  3. The PATCH request also fails, with an "entity too large" error rather than a 404.

image

  4. If the user tries to stop the job, it stays in Stopping status forever.

image

  5. This may be related to "make DBController tolerant to wrong framework request" (#4889).
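
For anyone trying to reproduce this, a config of the size described in step 1 can be generated rather than written by hand. The following is a minimal sketch, not a verified repro script: the field names are assumed to follow the OpenPAI v2 job protocol layout and should be checked against the protocol spec, and the job name, image, command, and the count of 300 taskroles are all hypothetical values chosen only to push the config well past 1000 lines.

```python
import json


def build_large_job_config(num_task_roles: int = 300) -> dict:
    """Build a job config dict with many near-identical taskroles.

    All names and values here are illustrative assumptions, not taken
    from the original report.
    """
    task_roles = {
        f"taskrole{i}": {
            "instances": 1,
            "dockerImage": "docker_image_0",
            "resourcePerInstance": {"cpu": 1, "memoryMB": 1024, "gpu": 0},
            "commands": ["sleep 60"],
        }
        for i in range(num_task_roles)
    }
    return {
        "protocolVersion": 2,
        "name": "large_config_repro",  # hypothetical job name
        "type": "job",
        "prerequisites": [
            {"type": "dockerimage", "uri": "ubuntu:18.04", "name": "docker_image_0"}
        ],
        "taskRoles": task_roles,
    }


config = build_large_job_config()
# JSON is used here to avoid a third-party YAML dependency.
text = json.dumps(config, indent=2)
print(len(text.splitlines()), "lines,", len(text.encode("utf-8")), "bytes")
```

Raising `num_task_roles` grows the config further, which may help find the size at which the POST starts to fail on a given cluster.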

@yqwang-ms (Member)

Besides #4889, for this specific case we should also limit the job config size to a reasonable value.
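
One way such a limit could look is a size check at submission time, before anything is sent to the K8S API server. This is only a sketch of the idea: the function name, the exception type, and the 256 KiB threshold are hypothetical and not taken from the OpenPAI code base; a real limit would need to be chosen against what the API server and its backing store actually accept.

```python
import json

# Hypothetical limit; the right value depends on the cluster's real request limits.
MAX_JOB_CONFIG_BYTES = 256 * 1024


class JobConfigTooLargeError(ValueError):
    """Raised when a serialized job config exceeds the allowed size."""


def check_job_config_size(config: dict, max_bytes: int = MAX_JOB_CONFIG_BYTES) -> int:
    """Return the serialized size in bytes, or raise if it exceeds max_bytes."""
    size = len(json.dumps(config).encode("utf-8"))
    if size > max_bytes:
        raise JobConfigTooLargeError(
            f"job config is {size} bytes, limit is {max_bytes} bytes"
        )
    return size
```

Rejecting the config up front would give the user an immediate, explicit error instead of a job that sits in Waiting or Stopping.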
