Model "file type"? #5

Open
m-mohr opened this issue Nov 15, 2021 · 3 comments
Labels
question Further information is requested

Comments

@m-mohr
Contributor

m-mohr commented Nov 15, 2021

Disclaimer: I'm not so much into ML models, but have a use case ;-)

So we will have several providers that aim to implement this extension for models they generate with specific software in their infrastructure (e.g. random forest). Other providers may then read these results. I've been told that, depending on the software that generated a model, the resulting model file may come in different formats, so that some software can read it and other software cannot. For example:

  • Software A does RF and can read/write it in model type X
  • Software B does RF and can read/write it in model type X and Y
  • Software C does RF and can read/write it in model type Y

How can I know from the model metadata whether I can read the exposed model file with my software? @duckontheweb
Maybe this is easy to answer and comes down to reading a different media type or similar, but I want to make sure this is considered. :-)
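To make the scenario concrete, here is a minimal sketch of the compatibility question in Python. The table simply restates the three bullets above; the names `SUPPORTED_TYPES`, `readers_for`, and `can_exchange` are made up for illustration and are not part of any spec.

```python
# Capability table restating the scenario above: which model types each
# piece of software can read and write. All names are placeholders.
SUPPORTED_TYPES = {
    "Software A": {"X"},
    "Software B": {"X", "Y"},
    "Software C": {"Y"},
}


def readers_for(model_type: str) -> list[str]:
    """Return the software names that can read the given model type."""
    return sorted(
        name for name, types in SUPPORTED_TYPES.items() if model_type in types
    )


def can_exchange(writer: str, reader: str) -> bool:
    """Return True if writer and reader share at least one model type."""
    return bool(SUPPORTED_TYPES[writer] & SUPPORTED_TYPES[reader])
```

The open question is which metadata field should carry the "X" or "Y" so that a reader can run a check like this without downloading the file.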

Related issue: Open-EO/openeo-processes#300

@m-mohr m-mohr added the question Further information is requested label Nov 15, 2021
@m-mohr
Contributor Author

m-mohr commented Dec 13, 2021

Thoughts @duckontheweb ?

@duckontheweb
Contributor

Sorry I missed this the first time around @m-mohr!

I think the easiest way to handle this is through some combination of media types and roles or relation types (depending on whether we are dealing with an Asset or a Link). In some ways, using media types would be preferable because it would work for both Assets and Links. However, it seems like most model artifacts do not have an official IANA media type, so we would have to define our own within the spec.
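As a rough sketch of the media type idea: since both Assets and Links carry a `type` field, a client could filter on it in one pass. The media type strings below are hypothetical placeholders (no such types are registered or defined by this extension), and `readable_entries` is an illustrative helper, not an existing API.

```python
# Hypothetical, non-IANA media types. The extension would have to define
# the real strings; these are placeholders for illustration only.
READABLE_MEDIA_TYPES = {"application/x-example-model-x"}


def readable_entries(item: dict) -> list[str]:
    """Return hrefs of assets and links whose media type this client can read."""
    entries = list(item.get("assets", {}).values()) + list(item.get("links", []))
    return [e["href"] for e in entries if e.get("type") in READABLE_MEDIA_TYPES]


# A stripped-down Item-like dict with two assets and one link.
item = {
    "assets": {
        "model-x": {
            "href": "https://example.com/model.x",
            "type": "application/x-example-model-x",
        },
        "model-y": {
            "href": "https://example.com/model.y",
            "type": "application/x-example-model-y",
        },
    },
    "links": [
        {
            "rel": "related",
            "href": "https://example.com/other.x",
            "type": "application/x-example-model-x",
        },
    ],
}
```

Here `readable_entries(item)` picks up the first asset and the link and skips the type Y asset, which is the behavior a reader with "type X only" software would want.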

I recently added the "ml-model:checkpoint" role to handle the case of PyTorch checkpoint files as assets, but if we continue with this approach we would need to define a new role for each type of model file, which could be cumbersome.
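To illustrate why the role-per-format approach could grow unwieldy, here is a small sketch of what a client-side lookup might look like. Only "ml-model:checkpoint" comes from the extension; the second role and the `ROLE_TO_FORMAT` / `formats_of` names are hypothetical.

```python
# Mapping from asset role to model file format. Every new format would
# need a new entry here and a new role in the spec.
ROLE_TO_FORMAT = {
    "ml-model:checkpoint": "PyTorch checkpoint",
    # Hypothetical role, shown only to illustrate how the list would grow.
    "ml-model:example-format-y": "hypothetical format Y",
}


def formats_of(asset: dict) -> list[str]:
    """Return the model formats implied by an asset's roles."""
    return [
        ROLE_TO_FORMAT[role]
        for role in asset.get("roles", [])
        if role in ROLE_TO_FORMAT
    ]
```

Roles unrelated to the model format, such as a generic "data" role, are simply ignored by the lookup.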

I will put some more thought into this, but I'm curious if others have any insight into a better approach.

@m-mohr
Contributor Author

m-mohr commented Dec 14, 2021

I just had the idea to use "processing:software" (on assets?) to specify the software that wrote the file, but it could also be additional roles or media types. I guess we need to investigate this a bit more. We will probably also experiment with it in openEO Platform, see what works for us, and propose that as a potential solution.
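A minimal sketch of how the "processing:software" idea could be checked on the reading side, assuming the field sits on the asset and maps software names to versions as in the processing extension. The software names, the `COMPATIBLE_WRITERS` set, and the helper function are placeholders for illustration.

```python
# Software whose output this (hypothetical) reader knows how to load.
# The names are placeholders, not real packages.
COMPATIBLE_WRITERS = {"software-a", "software-b"}


def written_by_compatible_software(asset: dict) -> bool:
    """Return True if any software listed on the asset is one we can read from."""
    software = asset.get("processing:software", {})
    return any(name in COMPATIBLE_WRITERS for name in software)


# Example asset: the field maps software name to version.
asset = {
    "href": "https://example.com/model.bin",
    "processing:software": {"software-b": "2.1.0"},
}
```

One limitation of this approach is that the reader has to know which writers it is compatible with, rather than which formats, so Software B from the original example (which writes both X and Y) would still be ambiguous without a version or format hint.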


2 participants