harden flux-core processing of jobspec for rabbit storage use case #5767

Open
grondo opened this issue Mar 4, 2024 · 0 comments
grondo commented Mar 4, 2024

Problem: When a jobspec is submitted with DWS directives in the CORAL2 environment, a flux-coral2 jobtap plugin modifies it after submission to add the resource information Fluxion needs to schedule rabbits. This may add vertices to the resources section of the jobspec that flux-core internals do not currently support. Specifically, libjj, a very simple flux-core internal convenience library, may raise an error when trying to compute its simplified resource counts from such a jobspec.

As a motivating example, for testing purposes @jameshcorbett was submitting a pre-modified jobspec to a Flux system instance with the novalidate flag, and the job was still rejected with the error:

flux-job: Unsupported resource type 'rack'

It turns out that the limit-job-size plugin uses libjj, which was the source of this error.

Since the jobspec will be modified after limits are checked in the real use case, this particular failure is not critical. However, there may be other parts of flux-core that use libjj, so that library should perhaps be made more forgiving when parsing jobspec.
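
To make "more forgiving" concrete, here is a rough Python sketch of the idea. It is not libjj's code or API (libjj is a C library); the function name `count_resources`, the `strict` switch, the set of known types, and the sample jobspec (including the `ssd` vertex) are all assumptions made for illustration. It also assumes every `count` is a plain integer. In strict mode an unknown vertex type raises an error like the one above; in lenient mode the unknown vertex is not counted, but its children are still walked.

```python
# Hypothetical sketch only: not libjj's actual implementation or API.
# Assumes each vertex has an integer "count" and an optional "with" list.
KNOWN_TYPES = ("node", "slot", "core", "gpu")  # assumed set of known types


def count_resources(resources, strict=True):
    """Return total counts of the known resource types in a resources list."""
    totals = dict.fromkeys(KNOWN_TYPES, 0)

    def walk(vertices, multiplier):
        for vertex in vertices:
            rtype = vertex["type"]
            total = multiplier * vertex["count"]
            if rtype in totals:
                totals[rtype] += total
            elif strict:
                # Strict behavior: reject any type we do not know about.
                raise ValueError(f"Unsupported resource type '{rtype}'")
            # Lenient behavior: skip counting the unknown vertex itself,
            # but still descend so nested nodes/slots/cores are counted.
            walk(vertex.get("with", []), total)

    walk(resources, 1)
    return totals


# Hypothetical pre-modified jobspec resources section with a rack vertex.
resources = [
    {
        "type": "rack",
        "count": 1,
        "with": [
            {
                "type": "node",
                "count": 2,
                "with": [
                    {
                        "type": "slot",
                        "count": 1,
                        "label": "task",
                        "with": [{"type": "core", "count": 4}],
                    }
                ],
            },
            {"type": "ssd", "count": 512},
        ],
    }
]

lenient_counts = count_resources(resources, strict=False)

try:
    count_resources(resources, strict=True)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)
```

Whether unknown vertices should be ignored entirely or treated as transparent containers, as they are here, is a design choice this sketch does not settle.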
