Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate one-of support with Goa #80

Closed
stefannica opened this issue Apr 26, 2021 · 6 comments
Closed

Investigate one-of support with Goa #80

stefannica opened this issue Apr 26, 2021 · 6 comments
Labels
area/core research Investigating tool and technologies

Comments

@stefannica
Copy link
Member

We need to model mutually exclusive attributes in the fuseml REST API, but goa doesn't seem to offer this out-of-the-box.

Example: runnable and workflow REST API resources can have several types of inputs. Depending on its type, the input's attributes are different, e.g.:

...
inputs:
  - parameter: myparam
    optional: true
    defaultValue: myvalue
  - codeset: mycodeset
    store: gitea
    project: myproject
    version: 1.0
...

This could be modeled using a one-of type of validation:

var RunnableInputParameter = Type("RunnableInputParameter", func() {
	Field(1, "parameter", String, "The name of an input parameter", func() {
		Pattern(`^[A-Za-z0-9-_]+$`)
	})
	Field(7, "description", String, "Input description", func() {
		MaxLength(1000)
	})
	Field(8, "optional", Boolean, "Optional input", func() {
		Default(false)
	})
	Field(9, "defaultValue", String, "Default value for optional input parameters")
})

var RunnableInputCodeset = Type("RunnableInputCodeset", func() {
	Field(1, "codeset", String, "The name of an input codeset artifact", func() {
		Pattern(`^[A-Za-z0-9-_]+$`)
	})
	Field(2, "description", String, "Input description", func() {
		MaxLength(1000)
	})
	Field(3, "store", String, "Artifact store name")
	Field(4, "project", String, "Artifact project name or wildcard")
	Field(5, "version", String, "Artifact versino or wildcard")
})
var Runnable = Type("Runnable", func() {
        ...
	Field(6, "inputs", ArrayOf(OneOf(RunnableInputParameter, RunnableInputCodeset, ...), "List of inputs (artifacts, parameters) accepted by this runnable")
        ...
})

, but this isn't implemented in goa, so we use a single type for all inputs, and have all the attributes dumped there:

var RunnableInput = Type("RunnableInput", func() {
	Field(1, "parameter", String, "The name of an input parameter", func() {
		Pattern(`^[A-Za-z0-9-_]+$`)
	})
	Field(2, "codeset", String, "The name of an input codeset artifact", func() {
		Pattern(`^[A-Za-z0-9-_]+$`)
	})
	Field(3, "description", String, "Input description", func() {
		MaxLength(1000)
	})
	Field(4, "optional", Boolean, "Optional input", func() {
		Default(false)
	})
	Field(5, "defaultValue", String, "Default value for optional input parameters")
	Field(6, "store", String, "Artifact store name")
	Field(7, "project", String, "Artifact project name or wildcard")
	Field(8, "version", String, "Artifact versino or wildcard")
})
var Runnable = Type("Runnable", func() {
        ...
	Field(6, "inputs", ArrayOf(RunnableInput), "List of inputs (artifacts, parameters) accepted by this runnable")
        ...
})
@project-bot project-bot bot added this to Backlog in FuseML Project Board Apr 26, 2021
@stefannica stefannica added area/core research Investigating tool and technologies labels Apr 26, 2021
@evrardjp
Copy link
Contributor

evrardjp commented Apr 26, 2021

I have yet to understand why would someone use those input parameters. Could you clarify how it's gonna be exposed to the API user?

Shouldn't fuseml instead focus on provide a generic input payload or more specific apis for clarity?
Reminder: Validation can also be done in the business logic of the api, not in the API definition itself.
I like the fact we are generic, but adding to many arguments make it harder to maintain.

I still will have a look at optional arguments in goa, that might be useful.

@evrardjp
Copy link
Contributor

Also: What is the exclusive part: If parameter is given, codesets can't be given? Can multiple codesets be given? Can multiple parameters be given? (Those questions would be clarified by explaining how/why someone would use those input parameters)

@stefannica
Copy link
Member Author

I have yet to understand why would someone use those input parameters. Could you clarify how it's gonna be exposed to the API user?

I can explain the background, maybe a wider context is relevant to solve this particular case: when you're running a container, as input you may need to provide either scalar values (named parameters here), or more complex artifacts (file or set of files, directories, archive etc. to be mounted as volumes). Artifacts are represented in the example here as a very specific type of artifact: a codeset. Scalar values and artifacts have different attributes. Moreover, other types of artifacts will be supported and they will have yet different attributes then codesets: ML models, datasets etc. In the business logic, an input would be represented as a C/C++ union of different types, or as a golang interface with different implementations.

Shouldn't fuseml instead focus on provide a generic input payload or more specific apis for clarity?

I don't understand the question. Could you please be more specific, or even give some example of what you're referring to ? Those concepts (codesets, models, datasets) are the particular concepts used in machine learning. How much more generic would you want them to be ?

Reminder: Validation can also be done in the business logic of the api, not in the API definition itself.

Good point, that's exactly the approach I'm taking now, precisely because one-of support is missing in goa. Validation is done in the business logic.

I like the fact we are generic, but adding to many arguments make it harder to maintain.

I don't understand this comment either, more so as it comes in conflict with the earlier question about not being generic about the input payload. If you have an idea about reducing the number of arguments without losing expressivity, I'd love to hear more about it. The only other variant I explored was having an input type attribute to code the type of input (parameter, codeset, etc) and, depending on its value, model the "payload" as different structures, but again, not something goa can help with.

I still will have a look at optional arguments in goa, that might be useful.

@evrardjp
Copy link
Contributor

evrardjp commented Apr 27, 2021

I was indeed thinking that there are "types" of input:

input:
  codesets:
    - name: x
      store: gitea
      project: y
    -name: y
     store: gitea
     project: y
  container_parameters:
    - name: param1
      value: 1
    - name: param2
      value: 2

another example would be:

input:
  codesets: []
  container_parameters:
    - name: param1
      value: 1
    - name: param2
      value: 2

The inputs would then be a dict, containing lists of pre-determined objects. If we want the dict to be extensible, we can have known keys with pre-defined structure, and unknown payloads in text, which needs proper serializing, like this:

input:
  codesets: []
  container_parameters:
    - name: param1
      value: 1
    - name: param2
      value: 2
  other: []

This could help on maintainability, and doesn't hurt usability too much, thanks to the fact we generate the client.
I am not fond of this approach, but it has validation and readability, and is very explicit (makes some arguments "first class" arguments. I will still analyze the optional fields, as it might be an useful alternative.

What I meant above with the "generic input payload" is the following:
While I like the idea of having a flexible API with the "other" or flexible types, it seems a bit "off" for an API, instead you might want to have arbitrary input:

input: []

for which no structure is given, and all validation is done on the business logic. That seems by far the more flexible and readable bit. Reason: When you unmarshal, you will most likely use a client library, that will throw an error on wrong content, and you can return it to the user. This is not possible if the data structure is statically defined in the API. At the end, it will be less expressive and more restricted.

However, one might wonder why we have an API that is so abstracted away (accept any inpyt). For that, we might want to "rethink" the API to be more precise.

Similarly, we might want to think if we really need a gRPC interface, as this won't make our life easier.

@stefannica
Copy link
Member Author

I was indeed thinking that there are "types" of input:

input:
  codesets:
    - name: x
      store: gitea
      project: y
    -name: y
     store: gitea
     project: y
  container_parameters:
    - name: param1
      value: 1
    - name: param2
      value: 2

Yes, I have to say I like this approach, maybe even more than the current one I settled for in my ongoing PR (fuseml/fuseml-core#24), which uses a type attribute to differentiated between types of inputs/output.

The inputs would then be a dict, containing lists of pre-determined objects. If we want the dict to be extensible, we can have known keys with pre-defined structure, and unknown payloads in text, which needs proper serializing, like this:

I agree with the general idea of "other", but I can't see any use for "unknown input type" in this particular case, given that fuseml needs to explicitly translate all inputs and outputs into Tekton concepts.

What I meant above with the "generic input payload" is the following:
While I like the idea of having a flexible API with the "other" or flexible types, it seems a bit "off" for an API, instead you might want to have arbitrary input:

input: []

for which no structure is given, and all validation is done on the business logic. That seems by far the more flexible and readable bit. Reason: When you unmarshal, you will most likely use a client library, that will throw an error on wrong content, and you can return it to the user. This is not possible if the data structure is statically defined in the API. At the end, it will be less expressive and more restricted.

Yes, this is a trade-off between convenience and flexibility. From a domain driven design, the business logic isn't (and shouldn't) be concerned with marshalling, serialization, data type validation etc or basically anything that deals with I/O. So either we use a code generation tools to offload some of that effort, or we write our own marshalling/unmarshalling logic in addition to the business logic. Currently, we're harvesting the convenience of goa to get results fast, while focusing on business logic, but I don't see any reason why we couldn't switch to a different approach at a later time. The fact that we're using domain driven design to decouple our business logic from I/O details makes this possible with minimal impact to the existing code.

However, one might wonder why we have an API that is so abstracted away (accept any inpyt). For that, we might want to "rethink" the API to be more precise.

Similarly, we might want to think if we really need a gRPC interface, as this won't make our life easier.

Yes, we could disable gRPC for now, we don't really use it anyway, and it's a transport detail.

@stefannica
Copy link
Member Author

No longer relevant. We went with the approach described in #80 (comment)

FuseML Project Board automation moved this from Backlog to Done Sep 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/core research Investigating tool and technologies
Projects
Development

No branches or pull requests

2 participants