RFC31: Jobspec V2 #229

Open · wants to merge 8 commits into master
Conversation

@SteVwonder (Member) commented Feb 28, 2020

Add a new Jobspec version, mainly designed to handle the flexibility provided by the new `flux run` and `flux submit` interfaces in flux-framework/flux-core#2323.

Major changes from V1:

  • relaxes restrictions on the types of resources
  • relaxes restrictions on the ordering of resources, especially with regard to the slot (i.e., a node can be inside or outside a slot)
  • adds the exclusive flag to more resources (i.e., those outside a slot)
  • adds a per_resource count key for late binding of tasks to resources
  • adds job dependencies
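
For illustration, here is a minimal sketch of a request that V1 would reject but that this version is meant to allow: a node placed inside a slot, an exclusive flag on it, and a task count late-bound with per_resource. The key names and the exact shape of the per_resource count are assumptions based on the bullets above, not the final RFC wording.

```yaml
# Hypothetical V2-style jobspec fragment (not the RFC's normative example).
# V1 forbids a node inside a slot; this version is meant to allow it.
version: 2
resources:
  - type: slot
    count: 2
    label: default
    with:
      - type: node
        count: 1
        exclusive: true      # exclusive now allowed on more resource types
tasks:
  - command: ["myapp"]       # hypothetical application name
    slot: default
    count:
      # per_resource late-binds the task count to a resource type; the exact
      # spelling/structure shown here is an assumption, not settled RFC text.
      per_resource:
        type: core
        count: 1
```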

Still TODO:

  • Things that need more discussion (also marked within the text as a TODO warning):

    • should we restrict this jobspec version to just Nodes, CPUs, and GPUs like V1?
    • do we allow resource requests that do not contain core (e.g., slot>node)?
    • do we allow adding exclusive to any resource (i.e., within a slot)?
  • Things within the RFC text that I still need to go back and handle (also marked within the text as a TODO warning):

    • More tightening up of the jsonschema (not sure ultimately how important this is now that we are using the Python Jobspec class for validation in flux-core).
    • Add additional compelling use-case examples that demonstrate the difference between this version and V1; in particular, how the loosening of the ordering is beneficial.

@SteVwonder (Member, Author) commented:

> should we restrict this jobspec version to just Nodes, CPUs, and GPUs like V1?

Based on discussion in #624, if we do restrict the resource set, it should also include hardware threads.

@dongahn (Member) commented Mar 18, 2020

At the risk of scope creep, do we want to add memory capacity into V2? Our workflow users already want to specify memory as constraints.

How about sockets?

How about SSDs? (probably still far-fetched?)

@SteVwonder (Member, Author) commented:
Yeah, I'm partial to not restricting the resource set, for two reasons:

  1. The flux run interface and resource scheduler are both powerful and generic enough to support arbitrary resource types.
  2. It would generically solve all of the use cases you just mentioned.

The downside is that sched-simple wouldn't easily be able to claim complete V2 compliance. It would probably have to carry an asterisk next to V2 for a while (e.g., "* only supports the following resources: core, node, socket, gpu, slot"). I think this logic also applies to any future schedulers that might be developed for Flux; the jump in complexity from Jobspec V1 to V2 would be quite large.


If we do restrict the resource set, I don't think adding sockets would be too much of a burden. AFAIK, sockets can only be children of nodes and parents of cores. This prevents an explosion of the resource ordering space. That being said, the interplay between socket and memory may be complex if we decide to include memory.

If we explicitly add memory, I'm wondering whether we should allow it generically or be prescriptive about the types of memory layouts V2 would support. For example, do we want to support only node-granularity pool(s) of memory, or allow socket-level/NUMA-level pools as well? Do we allow L1/L2/L3 cache, or just main memory? If we are restricting the resource set for simplicity and feasibility of implementation, then I suggest we limit the memory that V2 supports to node-granularity main memory (i.e., memory can only be a child of node or slot). This would handle the memory-constrained application use case (i.e., don't OOM the node) while minimizing the total added complexity. Unfortunately, it would also prevent explicit requests for an application's cores and NUMA-local memory to be co-scheduled.

@SteVwonder (Member, Author) commented:
Per the coffee discussion:

  • Restricting the resource types to a finite set that we enumerate will make this version more tractable. It also makes the extension of R easier.
  • Proposed resource type set: node, socket, core, hardware-thread, gpu, memory, slot
  • We should make sure that the slot->(core,memory) use case works. This would be the most portable jobspec option for memory-constrained applications.
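
A rough sketch of the slot->(core, memory) request mentioned in the last bullet, i.e., the portable shape for memory-constrained applications. The counts, the memory unit convention, and the command name are illustrative assumptions, not RFC text.

```yaml
# Hypothetical memory-constrained request: each slot bundles cores with a
# pool of main memory so the scheduler can avoid overcommitting a node.
version: 2
resources:
  - type: slot
    count: 8
    label: task
    with:
      - type: core
        count: 4
      - type: memory
        count: 16     # assumed unit (e.g., GiB); units are not settled in this thread
tasks:
  - command: ["memory-hungry-app"]   # hypothetical application name
    slot: task
    count:
      per_slot: 1
```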

@SteVwonder (Member, Author) commented:
We have decided to restrict the resource set to: node, socket, core, hardware-thread, gpu, memory, and slot. Do we also want to impose some sort of limited ordering on those resources? For example, do we want to say that:

  • hardware-thread, memory, and gpu cannot have children
  • the only valid child for core is hardware-thread
  • valid children for socket, node, and slot include core, memory, hardware-thread, and gpu
  • node and slot also have socket as a valid child
  • core and hardware-thread cannot be siblings

Would it make sense to just say that the ordering is limited to valid orderings produced by hwloc?

@SteVwonder (Member, Author) commented:
During a coffee hour, @eleon requested that slot->node be considered a valid jobspec in V2.

@SteVwonder (Member, Author) commented Jan 11, 2021

During a ☕ hour call, we discussed pushing this forward again. At a September ExaWorks meeting, I made a slide that attempts to summarize the V2 changes as follows:

Jobspec V2 opens up the supported resource set and orderings:

  • Resource set: node, socket, core, hardware-thread, gpu, memory, storage, and slot
  • Compute hardware “spine” (intermediate resources optional):
    node->socket->core->hardware-thread
  • “Auxiliary” resources (i.e., memory, storage, gpu) can “hang off” anywhere:
    (node->(socket->(core,gpu), memory), storage)
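
As a concrete (but hypothetical) rendering of the shape above: gpu hangs off the socket, memory off the node, and storage off the enclosing level. Counts, units, and the surrounding slot placement are illustrative assumptions.

```yaml
# Hypothetical example of the "spine + auxiliary resources" layout:
# node -> socket -> core is the spine; gpu, memory, and storage hang off
# of the socket, node, and slot levels respectively.
version: 2
resources:
  - type: slot
    count: 1
    label: default
    with:
      - type: node
        count: 1
        with:
          - type: socket
            count: 2
            with:
              - type: core
                count: 8
              - type: gpu
                count: 1
          - type: memory
            count: 128   # assumed unit (e.g., GiB)
      - type: storage
        count: 1         # e.g., a node-local or job-level storage allocation
tasks:
  - command: ["app"]     # hypothetical application name
    slot: default
    count:
      per_slot: 1
```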

@dongahn (Member) commented Jan 11, 2021

Yes, it feels like this will be something that can immediately impact our collaborations with 1) HPE, 2) OpenShift, and 3) the workflow community, including ExaWorks.

@SteVwonder changed the title from "[WIP] Jobspec V2" to "RFC31: Jobspec V2" on Feb 23, 2021
Commits included in this PR (message excerpts):

  • Also copy most use cases from spec_25's data. The last example (2.5) comes from spec_14.
  • Mainly designed to handle the flexibility provided by the new `flux run` and `flux submit` interfaces:
    - relaxes restrictions on the ordering of resources, especially in regards to the slot (i.e., a node can be inside or outside a slot)
    - adds exclusive flag
    - adds per_resource count key for late-binding of tasks to resources
    - adds job dependencies
  • Update examples to meet the Jobspec V2 specification, including adding a new use-case example to highlight the capabilities of V2.
  • Improve the separation between use cases by making the use-case headers distinct from the headers for "existing examples" and "jobspec YAML".
  • Problem: RFC25 references a section that only exists in the canonical jobspec specification (RFC14). Solution: make an explicit label for the section and reference it from RFC25 with `:ref:`.
  @@ -64,7 +66,7 @@
        { "$ref": "#/definitions/resource_vertex_base" },
        {
          "properties": {
  -         "type": { "not": { "enum": ["slot"] } }
  +         "type": { "enum": ["node", "gpu", "core"] }
@SteVwonder (Member, Author) commented on this diff:

Need to expand this list of valid types.
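
For reference, a sketch (not part of the diff) of what the expanded enum might look like, assuming the resource set agreed on above and keeping slot excluded from this branch of the schema:

```yaml
# Hypothetical expansion of the enum above; exact membership, spellings, and
# ordering are assumptions based on the resource set discussed in this thread.
"type": { "enum": ["node", "socket", "core", "hardware-thread", "gpu", "memory", "storage"] }
```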

@SteVwonder marked this pull request as ready for review on February 23, 2021 at 23:32
@SteVwonder (Member, Author) commented:
Took another pass on this PR today. Flipping this out of draft status since I think the wording is ready to be reviewed. As the OP states, the last big remaining TODOs (that I'm aware of) are to add an additional example or two that is compelling for what V2 can do over V1, and to tighten up the jsonschema file a bit.
