Align Katalog with taxonomy #920
@roee88 There may be different catalog connectors: not all of them live in the fybrik repo, and not all of them are written in Go. Yet we want to reuse the same structure in the manager and in the catalog connectors. How would any of these approaches work for a catalog connector written in Java?
I discussed with @roee88 something we may call A', which could be a short-term (and relatively easy) intermediate step before B.
The idea behind the proposal is that the starting point for the OpenAPI code is a JSON document rather than Go structs directly. This way we can use the CRD structs as a first step, and later move to proposal B, where the structs are no longer directly part of the CRD structs.
Katalog defines the Asset CRD for managing data assets. It is used as a reference data catalog in the fybrik project. The current implementation predates the work on taxonomy and takes a different approach. We would like to align it with the current approach of using OpenAPI and JSON schema objects for validation.
Below are proposals on how to do that.
🏷️ All proposals require that we first have a design of the taxonomy for a dataset resource. This includes adding related definitions to the base taxonomy and having a written design of the structure with fields for authentication, connection, tags, etc. @rohithdv is currently working on that.
🏷️ In all proposals a webhook can validate the Asset CRD using the taxonomy validation object, but we can also rely on the manager to validate the connector response. Note that doing validation in a controller/connector has no benefits. This discussion explicitly ignores where the validation occurs because it's orthogonal.
👍 In all proposals the Asset CRD can adapt to taxonomy.json changes and is not coded against any specific taxonomy.
👍 In all proposals the katalog connector will be implemented using OpenAPI, like what we did for the opa connector. It should have almost no logic compared to what we have now.
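To illustrate what "almost no logic" could look like, here is a minimal sketch of a pass-through connector handler, assuming the Asset already stores its details in the taxonomy-aligned shape. The route, the `assetID` query parameter, the in-memory `assets` map (standing in for reading Asset resources from the cluster), and the sample payload are all hypothetical and not taken from the fybrik code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// assets stands in for Asset resources read from the cluster.
// The key and the payload are invented for illustration.
var assets = map[string]json.RawMessage{
	"demo/paysim": json.RawMessage(`{"connection":{"name":"s3"},"tags":{"finance":true}}`),
}

// getDatasetDetails returns the stored details verbatim. No translation
// is needed because the stored Asset is assumed to follow the taxonomy.
func getDatasetDetails(w http.ResponseWriter, r *http.Request) {
	details, ok := assets[r.URL.Query().Get("assetID")]
	if !ok {
		http.Error(w, "asset not found", http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	w.Write(details)
}

// fetch calls the handler in-process and returns the status code and body.
func fetch(assetID string) (int, string) {
	target := "/getDatasetDetails?assetID=" + url.QueryEscape(assetID)
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	getDatasetDetails(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	code, body := fetch("demo/paysim")
	fmt.Println(code, body)
}
```

Validation is deliberately absent from the handler, in line with the note above that where validation occurs is orthogonal to this discussion.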
Proposal A
The Asset CRD is defined directly in Go code like other CRDs in this repository. It will be aligned with the written design for the dataset resource. Any field that is dynamically configured (i.e., taken from taxonomy.json#/definitions) will be defined in the CRD according to the more generic base taxonomy definition.
This makes the Asset CRD part of the fybrik model. In this approach we must ensure that the taxonomy validation object is reusable for validating the response of getDatasetDetails from any catalog connector. But it could be okay.
Proposal B
A similar but better approach is to define the fybrik model for a dataset resource in the manager Go code (or in /pkg) and only reuse it via import in the Katalog CRD definition. This makes it much cleaner without sacrificing anything.
This will require a bit more implementation effort:
I personally think that it's worth it because these things are needed in other parts of the project too.
Proposal C
Similar to how the katalog connector is currently implemented but using the new taxonomy approach:
The details of how this will work are still unclear to me, so we will just need to try it and hope for the best here. I'm not a fan of keeping openapi2crd as a dependency (although it works for our current code).
cc @tomersolomon1 @ronenkat