Restructure pipeline JSON to prevent custom components from breaking #1882

kiersten-stokes · 2021-07-08T19:09:56Z

Fixes #1797
Fixes #1748
Fixes #1773
Fixes #1907
Fixes #1912
Fixes #1892
Fixes #1932

What changes were proposed in this pull request?

This PR will restructure the format of the pipeline JSON and how it is parsed on the backend. Frontend changes and changes to the palette/properties JSON that is sent to the frontend may also require changes. Will definitely require changes to tests.

Change palette/properties JSON sent from backend
Change pipeline JSON sent to backend (frontend work)
- Adjust for new properties JSON format sent from backend component parsers (see comment below)
- Adjust for new pipeline JSON format expected by backend processors (see comment below)
Incorporate changes to fix Pipeline JSON Format Change - Version Next #1773
Change parsing of pipeline JSON in backend processors
Fix backend tests
Migration work
Manual testing

Removes the runtime-specific parameters component_source_type and runtime image from the UI, and changes the component_source parameter to a relative path if the component specification is filesystem-based.

Changes the component id of notebooks from notebooks to notebook for consistency.

Adds a GenericOperation subclass to the Operation class for generic components, which all have the same attributes/properties (filename, runtime image, env_vars, etc.).

How was this pull request tested?

Tests will have to be rewritten.

Acceptance criteria

Please validate by migrating and properly running:

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

elyra-bot · 2021-07-08T19:09:57Z

Thanks for making a pull request to Elyra!

To try out this branch on binder, follow this link:

kiersten-stokes · 2021-07-12T16:48:52Z

Based on some offline discussions, we're updating the pipeline JSON sent from the frontend to the backend in the following way. The first example is a generic component and the second example is a runtime-specific component.

{
"id": "4fcb7b56-7450-454f-9f4a-be454847c6cf",
"type": "execution_node",
"op": "execute-notebook-node",
"app_data": {
     "label": "",
-    "filename": "elyra/my_pipeline/node1.ipynb",
-    "runtime_image": "continuumio/anaconda3:2020.07",
-    "cpu": "",
-    "gpu": "",
-    "memory": "",
-    "outputs": [],
-    "env_vars": [], 
-    "dependencies": [],
-    "include_subdirectories": false
+    "component_parameters": {
+       "filename": "elyra/my_pipeline/node1.ipynb",
+       "runtime_image": "continuumio/anaconda3:2020.07",
+       "cpu": "",
+       "gpu": "",
+       "memory": "",
+       "outputs": [],
+       "env_vars": [], 
+       "dependencies": [],
+       "include_subdirectories": false
+    },
    "ui_data": {
              "label": "node1.ipynb",
              "description": "Notebook"
     }
},
{
"id": "ca95fd8a-776b-43ae-a73e-e99906b80935",
"type": "execution_node",
"op": "serve-pytorch-model-seldon-core",
"app_data": {
     "label": "",
-    "model_id": "dddd",
-    "deployment_name": "dddd",
-    "model_class_name": "ModelClass",
-    "model_class_file": "model_class.py",
-    "serving_image": "aipipeline/seldon-pytorch:0.1",
+    "component_parameters": {
+        "model_id": "dddd",
+        "deployment_name": "dddd",
+        "model_class_name": "ModelClass",
+        "model_class_file": "model_class.py",
+        "serving_image": "aipipeline/seldon-pytorch:0.1"
+    }
      "ui_data": {
              "label": "Serve PyTorch Model - Seldon Core",
              "description": "Serve PyTorch Models remotely as web service using Seldon Core"
       }
}

kiersten-stokes · 2021-07-12T16:57:17Z

Additionally, the properties JSON sent from the backend to the frontend for display on the canvas is changing in the following way. The parameter_ref and the group_info id for every property besides the 'system-wide properties' (only label currently) will be prefixed with elyra_, e.g. elyra_filename, elyra_serving_image.

This is to prevent collisions between our 'system-wide properties' and the properties given in the runtime-specific component. When displaying these properties and when sending them to the backend on pipeline submit/export, the elyra_ prefix should be removed.

bourdakos1 · 2021-07-20T20:15:30Z

@kiersten-stokes oh is "operation name" supposed to be the label?

If so, you should check app_data.label then app_data.component_parameters[<filehandler-param-ref>] if both of those are empty, it should fall back to the component name (e.g. Notebook)

…to pipeline-params

kevin-bates · 2021-07-20T21:29:03Z

If so, you should check app_data.label then app_data.component_parameters[] if both of those are empty, it should fall back to the component name (e.g. Notebook)

@kiersten-stokes and I just had a sidebar and have discovered that the pipeline being submitted in this particular test is invalid. It's version is listed as '4', yet it still holds an app_data.ui_data.label value that has not been transferred to app_data.label.

            "nodes": [
              {
                "id": "a6d8506d-2faf-46f1-8b9e-bee82895fc29",
                "type": "execution_node",
                "op": "execute-notebook-node",
                "app_data": {
                  "component_parameters": {
                    "filename": "examples/data 2.ipynb",
                    "runtime_image": "amancevice/pandas:1.1.1",
                    "outputs": [],
                    "env_vars": [],
                    "dependencies": [],
                    "include_subdirectories": false
                  },
                  "label": "",
                  "ui_data": {
                    "label": "data 2.ipynb",
                    "image": "data:image/svg+xml;utf8,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3.org%2F2000%2Fsvg%22%20width%3D%2216%22%20viewBox%3D%220%200%2022%2022%22%3E%0A%20%20%3Cg%20class%3D%22jp-icon-warn0%20jp-icon-selectable%22%20fill%3D%22%23EF6C00%22%3E%0A%20%20%20%20%3Cpath%20d%3D%22M18.7%203.3v15.4H3.3V3.3h15.4m1.5-1.5H1.8v18.3h18.3l.1-18.3z%22%2F%3E%0A%20%20%20%20%3Cpath%20d%3D%22M16.5%2016.5l-5.4-4.3-5.6%204.3v-11h11z%22%2F%3E%0A%20%20%3C%2Fg%3E%0A%3C%2Fsvg%3E%0A",
                    "x_pos": 102,
                    "y_pos": 105.5,
                    "description": "Run notebook file"
                  }
                },

We also discussed having the server go and dereference the parameter_ref.filehandler pointer that you suggested, and that is not viable today with the current structure of the software.

Since the server must assume that all pipeline payloads are at the current version and all current versions have a value for app_data.label, I think we need to take closer look at how this particular test pipeline was "migrated". If that was accomplished manually, the pipeline should be fixed. If migrated via the migration utility, then I believe there's more work involved.

There's also the distinct possibility that we're simply caught in between of evolving software and may need to ignore this until all the dust has settled.

cc: @ajbozarth

ajbozarth · 2021-07-20T21:33:36Z

app_data.ui_data.label is not being removed during migration, app_data.label is just being set to it. Since ui_data is transient I didn't both to clean up it's content during migration.

…elyra into pipeline-params

lresende · 2021-07-22T02:42:03Z

elyra/pipeline/component_parser_airflow.py

-                                                value=default_value,
-                                                description=description,
-                                                control_id=control_id))
+            properties.append(ComponentParameter(ref=arg,


why are we calling this ref? and not id? I know this might be overriding things, etc but we should be consistent...

I have a local change to rename this back to id, but will not push before discussing it.

elyra/pipeline/component.py

lresende · 2021-07-22T03:30:53Z

elyra/pipeline/component_registry.py

@@ -62,22 +79,22 @@ def __init__(self, component_registry_location: str, parser: ComponentParser):
    def registry_location(self) -> str:
        return self._component_registry_location

-    def get_all_components(self) -> List[Component]:
+    def get_all_components(self) -> Tuple[List[Component], List[Dict]]:


What is being returned here? Is this really the best design to implement?

Is this really the best design to implement?

This seems a little harsh - no? Here's why this was changed to a tuple is due to the late-breaking discovery of a malformed palette that existed since custom components were introduced into master a few weeks ago: #1882 (comment)

However, I didn't fully convey how best to implement this workaround. Now, after looking closer, the solution will be more involved due to the additional parsing that is required for components and the "adjusted_id" - that appears to only apply to airflow components. That said, since only the CachedComponentRegistry is used in this "design", the callers get the correct result despite the tuple. I think what should happen is the parser code necessary only in get_all_components() but not get_all_categories() should be moved into the read method so that it returns "final results". Then the base ComponentRegistry will just continue reading the file, once for each call to get_all_components() and get_all_categories(), while the CachedComponentRegistry will call the read method directly so that the file only gets read once (every 60 seconds). What is causing trouble with that "workaround" is with the way get_component() works with its need to handle adjusted_id.

I do have a question: do we also need to handle adjusted_id in get_all_components()?

This is worth further discussion, preferably via a virtual call.

Sorry, I didn't mean to be harsh, but the API seemed a little strange, and the unpacking requirement... wanted basically to start a discussion and agree we could handle it on a call (maybe as an item on the scrum one).

lresende

Left some comments, but want to understand the choice of design with the support for categories

lresende · 2021-07-22T03:36:37Z

Overall functionality looks good, have tested some scenarios with CLI and fixed an issue there and have tested a few pipelines created in 2.2.4 and migrated and run locally and on kfp. I had a few questions above around some implementation details.

elyra/pipeline/component.py

kevin-bates · 2021-07-22T14:22:01Z

elyra/pipeline/component.py

        if not name:
            raise ValueError("Invalid component: Missing field 'name'.")

-        self._ref = ref
+        self._ref = id


Shouldn't the attribute and property names be updated to reflect id rather than ref?

…1882) Co-authored-by: Kevin Bates <kevin.bates@us.ibm.com> Co-authored-by: Alex Bozarth <ajbozart@us.ibm.com> Co-authored-by: Nick Bourdakos <nicholas.bourdakos@ibm.com> Co-authored-by: Luciano Resende <lresende@us.ibm.com>

kiersten-stokes added the status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. label Jul 8, 2021

ptitzler added this to the 3.0.0 milestone Jul 8, 2021

ptitzler added impact:blocker help wanted Extra attention is needed labels Jul 8, 2021

kevin-bates mentioned this pull request Jul 9, 2021

Enable migration from previous pipeline versions #1860

Merged

lresende removed the help wanted Extra attention is needed label Jul 12, 2021

lresende mentioned this pull request Jul 12, 2021

Pipeline JSON Format Change - Version Next #1773

Closed

kevin-bates mentioned this pull request Jul 13, 2021

Validation service initial implementation #1664

Merged

31 tasks

kiersten-stokes mentioned this pull request Jul 13, 2021

Airflow runtime components are showing runtime image as text field #1912

Closed

kiersten-stokes force-pushed the pipeline-params branch from 2fbd4c1 to fb885a5 Compare July 13, 2021 23:26

kiersten-stokes and others added 17 commits July 13, 2021 18:35

Start changing parser/processors to handle new proposed pipeline format

47c90e8

Update tests to reflect component_parameters stanza

1d80562

Change structure of component_params object and properties

86ded12

Split parameters into separate groups in component parser

326d07f

Update cli to reflect component_parameters stanza

7c607a8

Move properties and palette json to test resources

478a0c7

Prototype properties.json using elyra_ prefix

f6ddcd8

Update component parser tests with prefixing approach

56f88fa

Add prefixes to group_info entries

400ca00

Remove nesting from properties JSON and add prefixing to template

eab6f1c

Add elyra_ prefix to group_info in properties JSON

77f3b78

Get more tests working

5d5d844

More test updates, fix a couple of issues in Operator

e9567de

fix indentation

b900319

Update test pipelines, get tests working

3efd12b

Add GenericOperation subclass, more type hints

af1a5c8

Complete type hints on Operation and Pipeline classes

c52f204

bourdakos1 and others added 3 commits July 20, 2021 16:47

use optional chaining

6e25761

Merge branch 'pipeline-params' of github.com:kiersten-stokes/elyra in…

0684050

…to pipeline-params

fix tests

c863735

kiersten-stokes added 3 commits July 20, 2021 17:07

Add missing kfp test resources back in

9f7e0ca

Merge branch 'pipeline-params' of https://github.com/kiersten-stokes/…

740dc61

…elyra into pipeline-params

Add kfp bootstrapper test resources back in (take 2)

f5cb9a6

lresende removed the status:Work in Progress Development in progress. A PR tagged with this label is not review ready unless stated otherwise. label Jul 21, 2021

bump pipeline-editor

a48dd98

karlaspuldaro mentioned this pull request Jul 21, 2021

Update pipeline properties error message theme #1939

Merged

Make pipeline cli to validdate version as first step

94c37a0

lresende reviewed Jul 22, 2021

View reviewed changes

elyra/pipeline/component.py Outdated Show resolved Hide resolved

lresende reviewed Jul 22, 2021

View reviewed changes

elyra/pipeline/component.py Show resolved Hide resolved

lresende reviewed Jul 22, 2021

View reviewed changes

lresende requested changes Jul 22, 2021

View reviewed changes

kevin-bates reviewed Jul 22, 2021

View reviewed changes

elyra/pipeline/component.py Outdated Show resolved Hide resolved

kevin-bates and others added 2 commits July 21, 2021 23:25

Remove mutable default value

d395e1f

Consistently use id parameter name for unids

27d3ab8

kevin-bates reviewed Jul 22, 2021

View reviewed changes

lresende approved these changes Jul 22, 2021

View reviewed changes

bourdakos1 approved these changes Jul 22, 2021

View reviewed changes

lresende merged this pull request into elyra-ai:master Jul 22, 2021

kiersten-stokes mentioned this pull request Jul 22, 2021

Minor changes to component & category parsing as a followup to palette restructuring #1948

Merged

4 tasks

kiersten-stokes deleted the pipeline-params branch August 20, 2021 18:38

ajbozarth mentioned this pull request Oct 7, 2021

Pipeline migration produces invalid container image reference in pipeline file #2208

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure pipeline JSON to prevent custom components from breaking #1882

Restructure pipeline JSON to prevent custom components from breaking #1882

kiersten-stokes commented Jul 8, 2021 •

edited by ajbozarth

Loading

elyra-bot bot commented Jul 8, 2021

kiersten-stokes commented Jul 12, 2021

kiersten-stokes commented Jul 12, 2021

bourdakos1 commented Jul 20, 2021

kevin-bates commented Jul 20, 2021

ajbozarth commented Jul 20, 2021

lresende Jul 22, 2021

lresende Jul 22, 2021

lresende Jul 22, 2021

kevin-bates Jul 22, 2021

lresende Jul 22, 2021

lresende left a comment

lresende commented Jul 22, 2021

kevin-bates Jul 22, 2021

Restructure pipeline JSON to prevent custom components from breaking #1882

Restructure pipeline JSON to prevent custom components from breaking #1882

Conversation

kiersten-stokes commented Jul 8, 2021 • edited by ajbozarth Loading

What changes were proposed in this pull request?

How was this pull request tested?

Acceptance criteria

elyra-bot bot commented Jul 8, 2021

kiersten-stokes commented Jul 12, 2021

kiersten-stokes commented Jul 12, 2021

bourdakos1 commented Jul 20, 2021

kevin-bates commented Jul 20, 2021

ajbozarth commented Jul 20, 2021

lresende Jul 22, 2021

Choose a reason for hiding this comment

lresende Jul 22, 2021

Choose a reason for hiding this comment

lresende Jul 22, 2021

Choose a reason for hiding this comment

kevin-bates Jul 22, 2021

Choose a reason for hiding this comment

lresende Jul 22, 2021

Choose a reason for hiding this comment

lresende left a comment

Choose a reason for hiding this comment

lresende commented Jul 22, 2021

kevin-bates Jul 22, 2021

Choose a reason for hiding this comment

kiersten-stokes commented Jul 8, 2021 •

edited by ajbozarth

Loading