Example URN with above config: urn:kafka:prod-example-logstream:topic:my-service-log.
When we extract any asset type that has Kafka as an upstream, we are not able to construct the URN of the upstream topic. This problem became apparent while working on #417, where a Feature Table could have a Kafka topic as an upstream. Only the Kafka broker information is available in the Feature Store, not the scope used in the topic URN.
The same problem can apply to other extractor and URN format combinations as well.
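To make the mismatch concrete, here is a rough sketch in Go. The URN layout is taken from the example URN above; the function name `kafkaTopicURN` and the broker address are hypothetical and only for illustration, not the actual extractor code.

```go
package main

import "fmt"

// kafkaTopicURN builds a topic URN in the layout of the example above:
// urn:kafka:<scope>:topic:<topic name>.
// The function name is hypothetical, not part of the existing code base.
func kafkaTopicURN(scope, topic string) string {
	return fmt.Sprintf("urn:kafka:%s:topic:%s", scope, topic)
}

func main() {
	// The Kafka extractor knows the scope from its own recipe.
	fmt.Println(kafkaTopicURN("prod-example-logstream", "my-service-log"))

	// Another extractor only sees the broker address (hypothetical value),
	// so the URN it could build would not match the one above.
	fmt.Println(kafkaTopicURN("broker-1.example.internal:9092", "my-service-log"))
}
```

The two URNs refer to the same topic but do not match, so the lineage edge would point at an asset that does not exist.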
Describe the solution you'd like
Introduce a mapping processor that would run after the lineage is built and emitted by the extractor. The mapping processor would use a scripting language to run a user-provided script, defined in the recipe, inside a sandbox.
The API of the Processor would not change; we would pass in the *v1beta2.Asset, which can then be transformed by the user-defined script. The scripting language should ideally support:
Running the script as a black box, with the capability to detect and recover from errors
Assigning a computed or literal value to a field
Iterating over a list and transforming each element in the list
Basic string manipulation functions such as replace, join, split, lowercase, uppercase, etc.
Conditionals such as if, case, etc., with boolean logic and arithmetic
Custom helper functions that could be added in the future, e.g. resolve DNS for an IP
Additionally, it is preferred if the Go type information is retained while running the script.
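A minimal sketch of how the processor could wrap a user-defined mapping, assuming a simplified `Asset` struct in place of *v1beta2.Asset and a plain Go function in place of the script. No real scripting engine is used here, so this is not an actual sandbox; it only illustrates error recovery, iterating over a list and string manipulation. All names (`Asset`, `MapFunc`, `runMapping`, `brokerToScope`, `rewriteKafkaScopes`) and the broker address are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a simplified, hypothetical stand-in for *v1beta2.Asset.
type Asset struct {
	URN       string
	Upstreams []string // URNs of upstream assets
}

// MapFunc stands in for the user-defined script from the recipe.
type MapFunc func(a *Asset) error

// runMapping runs fn on a copy of the asset and turns panics into errors,
// so a failing mapping never leaves the input half-modified.
func runMapping(a *Asset, fn MapFunc) (out *Asset, err error) {
	cp := *a
	cp.Upstreams = append([]string(nil), a.Upstreams...)

	defer func() {
		if r := recover(); r != nil {
			out, err = nil, fmt.Errorf("mapping panicked: %v", r)
		}
	}()

	if e := fn(&cp); e != nil {
		return nil, fmt.Errorf("mapping failed: %w", e)
	}
	return &cp, nil
}

// brokerToScope is a hypothetical, user-supplied lookup from broker address
// to the scope used by the Kafka extractor.
var brokerToScope = map[string]string{
	"broker-1.example.internal:9092": "prod-example-logstream",
}

// rewriteKafkaScopes replaces the broker address in upstream Kafka URNs
// with the matching scope.
func rewriteKafkaScopes(a *Asset) error {
	for i, urn := range a.Upstreams {
		for broker, scope := range brokerToScope {
			oldPrefix := "urn:kafka:" + broker + ":topic:"
			if strings.HasPrefix(urn, oldPrefix) {
				a.Upstreams[i] = "urn:kafka:" + scope + ":topic:" + strings.TrimPrefix(urn, oldPrefix)
				break
			}
		}
	}
	return nil
}

func main() {
	in := &Asset{
		URN:       "urn:example:feature-table:my-features", // hypothetical URN
		Upstreams: []string{"urn:kafka:broker-1.example.internal:9092:topic:my-service-log"},
	}

	out, err := runMapping(in, rewriteKafkaScopes)
	fmt.Println(out.Upstreams, err)

	// A buggy mapping is reported as an error instead of crashing the run.
	_, err = runMapping(in, func(a *Asset) error {
		a.Upstreams[len(a.Upstreams)] = "x" // index out of range
		return nil
	})
	fmt.Println(err)
}
```

With a real embedded scripting language, `MapFunc` would be replaced by script evaluation, but the copy-then-recover wrapper could stay roughly the same.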
Describe alternatives you've considered
Change the scope to the broker DNS name for the Kafka extractor. This is quite fragile, since we would need to account for the naming pattern of URNs while defining the Kafka extractor, or any other extractor. Additionally, it still wouldn't work when the upstream lists Kafka broker IPs instead of a DNS name.
Additional context
The idea of a mapping layer could be used in other scenarios as well, such as an HTTP extractor.
sudo-suhas changed the title from "feat(processor): add mapping processor" to "feat: add mapping processor" on Oct 10, 2022
Is your feature request related to a problem? Please describe.
Background:
In a recipe for the Kafka extractor, we use a urn-scope for constructing the URN for the topic.