Generalize component reading and processing for BYO catalog-types #2241
I will provide review comments based on a proof of concept catalog connector implementation for MLX. My initial main evaluation focus:
Follow up:
Schema/UI for directory catalogs
Addendum for directory catalog type processing:
This looks good and will be a super useful feature - thank you.
It appears that … As a result, non-GUI clients can only display partial information (catalog type information is "lost"):
So does the GUI if there is no matching catalog registered yet:
catalog entries in parallel in read_component_definitions(); default 3
"""
).tag(config=True)
max_readers = Integer(3, config=True, allow_none=True,
This looks good. Since we see more containerized installations, I think it would be good to generally provide env variable support for configurable values. This can be in a follow-up, but I think we'd add an `ELYRA_MAX_CATALOG_CONNECTOR_READERS` (or something named similarly) that provides the "default value" (and this env defaults to 3 if not present). An example of this can be found where we deal with the cache ttl value.
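A minimal sketch of the environment-variable default suggested here, using only the standard library. The variable name `ELYRA_MAX_CATALOG_CONNECTOR_READERS` is the one floated in this comment; the helper name `default_max_readers` and the choice to fall back on malformed or non-positive values are assumptions for illustration, not existing Elyra code.

```python
import os

# Name proposed in this review comment; not an existing setting.
ENV_VAR = "ELYRA_MAX_CATALOG_CONNECTOR_READERS"
# Matches the current hard-coded default of 3.
FALLBACK_MAX_READERS = 3


def default_max_readers(environ=None):
    """Return the default reader count, preferring the env variable when it is a positive integer."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR)
    if raw is None:
        return FALLBACK_MAX_READERS
    try:
        value = int(raw)
    except ValueError:
        # Assumption: ignore malformed values rather than fail at startup.
        return FALLBACK_MAX_READERS
    return value if value > 0 else FALLBACK_MAX_READERS
```

The returned value could then seed the default of the `max_readers` trait, so that explicit configuration would still override whatever the environment provides.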
Sounds good! I can open an issue for that
LGTM - thank you @kiersten-stokes!
Closes #2220. This PR generalizes the `ComponentReader` class in order to better support additional catalog types for which we may not know the structure.
Related:
What changes were proposed in this pull request?
This PR touches the following elements:
- `setup.py`
  - An entrypoint group is added (`elyra.component.catalog_types`) and contains one name for each type of catalog (see schemas section below)
- `ComponentRegistrySchemas` class
  - `component-registry` schema name has been removed in favor of creating one schema per component catalog type
  - `schema_name` must match the entrypoint name in order to be found during the call to `get_schemas()`
  - `component-registry` schema in order to avoid some larger migration issues
  - `schema_name` field still must change to reflect the new schemas -- more discussion needed
  - `local-file-catalog` and `local-directory-catalog` schema called `base_path`, which allows a user to enter an optional base directory from which the rest of the `paths` values may be resolved (otherwise, absolute directories are assumed)
  - `local-directory-catalog` type called `include_subdirs`, which allows a user to indicate whether they want to search subdirectories for additional component specs
  - `local-file-catalog` instance (note that `Base Path` doesn't line up with `Paths` since the paths array type required some different formatting options)
  - (`title`, `display_name`, and `uihints.title`)
- `ComponentRegistry`
  - `schema_name` field now determines the appropriate reader class by comparing with entrypoint keys
  - `component_entry` `SimpleNamespace` object includes a few changes:
    - `component_identifier` field is added, which is passed on to the `Component` object after parsing and saved for later access
    - `location` is removed from the top-level of the `component_entry` object, as it now optionally resides in the `component_metadata` value
    - `component_id` value is added, the value of which now is determined in and returned from the reader (see below)
- `ComponentReader` classes
  - Moved out of `component.py` and into its own file `component_reader.py` since it got pretty bulky
  - `max_readers` configurable Integer has been converted to one of potentially/eventually several override-able settings (stored in a configurable dictionary)
  - `read_component_definitions()` still drives the below functionality using threads
  - `get_catalog_entries()` (formerly `get_absolute_locations()`) takes the registry/catalog instance metadata and, for each component in that registry, builds a dictionary containing all the relevant data that will be required in order to read a component in the call to `read_component_definition()`; these dictionaries are returned in a list
  - In the `read_component_definitions()` function, the catalog entry dictionaries are added to the thread queue
  - `<catalog-type-name>:hash(<hashing_key1>:<hashing_key2>:<...>)`
  - `get_hash_keys()` to return a list of keys available in the `component_entry_data` object returned from `get_catalog_entries()` for each component
  - `read_component_definition()` is passed the catalog entry data and registry metadata used to return the catalog entry's definition in string form
  - Adds some logging functions so that classes can implement custom messages -- this might be overkill and will likely change when we finalize what we are doing with the `component_source` property
- `ComponentParser` classes
  - `component_id` is removed from the parser classes and takes place in the reader
  - `identifier` field and other field changes in the constructors
  - `AirflowComponentParser` is refactored to only include the portion of the operator definition for a component of a single class
- `Component` object class
  - `definition` attribute that can be used during processing to access the component definition without re-reading/parsing
  - `location_type` attribute has been changed to `catalog_type`
  - Consider renaming `location` to something like `location_identifier` or something that better conveys where/how a particular component can be defined (consider the MLX example where the "location" is something like the component access id instead)
    - `component_identifier` is another naming option for this attribute
  - `Component Source` property in the node properties panel
- `RuntimePipelineProcessor` classes
  - `load_component_from_text()` is now used for every type of component, where the `definition` attribute of the `Component` object is given as the argument
  - `import` statement, so we need a way for all catalog types to be able to import the correct modules

With these changes, the existing `component-registry` schema is replaced by three schemas reflecting the previous `location_type` property. Any user-maintained instances will need to be adjusted in the following ways (note, factory instances will immediately reflect the new forms):
- Based on the `location_type` value, the appropriate schema name is applied
- `location_type` property is then removed
- `version` property is added with a value of `1`
- If `paths` is renamed to `urls` and `directories` for `url-catalog` and `local-directory-catalog` instances, respectively, then those properties will be "renamed".

- `component-registry` schema. Since the `ComponentRegistry` initialization accesses all instances of the `ComponentRegistries` schemaspace, each instance is essentially "touched".
- For `component-registry` instances, they are transitioning to a new schema entirely, dropping and adding a property (as noted previously). In this case, the migration will hook the `post_load()` method and perform an update of the updated instance prior to its return. As a result, what is returned is an instance corresponding to the new schema that has also been persisted in the metadata storage (filesystem).
- Because the `component-registry` schema is still required to be present, support for schema deprecation has been added and a `deprecated = true` property has been added to the `component-registry` schema. The behavior is that deprecated schemas can be accessed by name only and their instances retrieved, but deprecated schemas are NOT present when retrieving the set of schemas of a given schemaspace. This way, they will not show up in metadata-driven applications (such as Elyra's frontend) but internal code can still reference them.
- A `version` property has been added to all `component-registries` schemaspace schemas. We may want a meta-version property at some point that reflects the version of the schema itself, but, for now, `version` is meant to reflect the version of instances that must adhere to the schema.
- `component-registry` schema. If we deliver these changes in 3.3, then perhaps 3.4 could be a removal point. This way, anyone jumping from 3.2 to 3.4 will be asked to go through 3.3 first.

Still Left To Do
- Update `$id` fields to be the appropriate path closer to merge time

Frontend concerns (not must haves):
- `display_name` property references from schema and frontend logic #2255
- Formatting for schema fields that appear side-by-side (this will be coming in JL 4.0)

Docs will be updated in a separate PR; see #2249
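For the instance migration described in the schema changes above (apply the new schema name based on `location_type`, remove `location_type`, add `version` with a value of `1`), a rough sketch might look like the following. This is not the migration code in this PR: the old `location_type` values (`Filename`, `Directory`, `URL`), the instance layout (top-level `schema_name` and `version`, properties under `metadata`), and the function name `migrate_instance` are all assumptions made for illustration.

```python
import copy

# Assumption: the previous location_type values and the schema each one maps to.
LOCATION_TYPE_TO_SCHEMA = {
    "Filename": "local-file-catalog",
    "Directory": "local-directory-catalog",
    "URL": "url-catalog",
}


def migrate_instance(instance):
    """Return a migrated copy of a component-registry instance; other instances pass through."""
    if instance.get("schema_name") != "component-registry":
        return instance
    migrated = copy.deepcopy(instance)
    metadata = migrated.setdefault("metadata", {})
    # The location_type property is removed from the instance.
    location_type = metadata.pop("location_type", None)
    if location_type not in LOCATION_TYPE_TO_SCHEMA:
        raise ValueError(f"Unrecognized location_type: {location_type!r}")
    # The appropriate schema name is applied based on the old location_type.
    migrated["schema_name"] = LOCATION_TYPE_TO_SCHEMA[location_type]
    # Assumption: version is stored at the top level of the instance.
    migrated["version"] = 1
    return migrated
```

In the flow described above, logic of this kind would run from the `post_load()` hook, with the result persisted before the instance is returned.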
How to implement a new catalog type
- Add a schema to the `elyra/metadata/schemas` directory
  - `schema_name` field in the JSON file, e.g. url-catalog.json and `'schema_name': 'url-catalog'`
  - `schemaspace_name` is `component-registries` and the `schemaspace_id` is `ae79159a-489d-4656-83a6-1adfbc567c70`
  - `ComponentRegistries` schemaspace that makes the schema available via its `get_schemas()` method.
- Create a `ComponentReader` subclass, including the following functions for...
  - `get_catalog_entries()` -- gets a dictionary of relevant data from each component in the catalog that will be used to access component definitions
  - `read_catalog_entry()` -- reads and returns a component definition (in string form) given the data for that catalog entry/component
  - `get_hash_keys()` -- provides a list of keys available in the 'catalog_entry_data' dictionary, the values of which will be used in constructing a unique hash id for each entry with the given catalog type
- Add the `schema_name` to the entrypoint for `elyra.component.catalog_types` in `setup.py`
  - `schema_name` defined in the schema: `'my-type-catalog = elyra.pipeline.component_reader:MyTypeComponentReader'`
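To make the three reader functions concrete, here is a self-contained sketch of what a subclass could look like. It does not import Elyra: `ComponentReaderSketch` is a hypothetical stand-in for the real `ComponentReader` base class, the method signatures are guesses based on the descriptions above, and the use of SHA-256 truncated to 12 characters for the `<catalog-type-name>:hash(...)` id is an assumption. The toy "catalog" is just an in-memory mapping of names to definitions.

```python
import hashlib
from typing import Any, Dict, List, Optional


class ComponentReaderSketch:
    """Hypothetical stand-in for the ComponentReader base class; not Elyra's actual class."""

    catalog_type: str = ""

    def get_catalog_entries(self, catalog_metadata: Dict[str, Any]) -> List[Dict[str, Any]]:
        raise NotImplementedError

    def read_catalog_entry(self, catalog_entry_data: Dict[str, Any],
                           catalog_metadata: Dict[str, Any]) -> Optional[str]:
        raise NotImplementedError

    def get_hash_keys(self) -> List[str]:
        raise NotImplementedError

    def get_component_id(self, catalog_entry_data: Dict[str, Any]) -> str:
        """Build '<catalog-type-name>:<hash>' from the values named by get_hash_keys()."""
        hash_input = ":".join(str(catalog_entry_data[key]) for key in self.get_hash_keys())
        # Assumption: any stable hash would do; SHA-256 truncated to 12 hex characters here.
        digest = hashlib.sha256(hash_input.encode("utf-8")).hexdigest()[:12]
        return f"{self.catalog_type}:{digest}"


class MyTypeComponentReader(ComponentReaderSketch):
    """Toy reader whose catalog is an in-memory mapping of component names to definitions."""

    catalog_type = "my-type-catalog"

    def get_catalog_entries(self, catalog_metadata):
        # One dictionary per component, holding whatever is needed to read it later.
        return [{"name": name} for name in sorted(catalog_metadata.get("definitions", {}))]

    def read_catalog_entry(self, catalog_entry_data, catalog_metadata):
        # Return the definition in string form, or None if the entry is missing.
        return catalog_metadata.get("definitions", {}).get(catalog_entry_data["name"])

    def get_hash_keys(self):
        # Keys of the catalog entry dictionary whose values identify a component.
        return ["name"]
```

A reader like this would then be registered under the entrypoint name that matches its `schema_name`, as in the `my-type-catalog` example above.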
How was this pull request tested?
This PR will definitely require updates to the tests, but I am holding off on making those until we've finalized the design elements.
Developer's Certificate of Origin 1.1