feat: add mapping processor #420

sudo-suhas · 2022-10-10T12:24:48Z

Is your feature request related to a problem? Please describe.

Background:
In a recipe for the Kafka extractor, we use the URN scope to construct the URN for each topic. Example config:

name: prod-example-logstream
version: v1beta1
source:
  name: kafka
  scope: prod-example-logstream
  config:
    broker: production-example-logstream-kafka.example-company.io:9993
sinks:
  # ...

Example URN with the above config: urn:kafka:prod-example-logstream:topic:my-service-log.
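To make the URN layout concrete, here is a minimal sketch in Go. It assumes the format is urn:<service>:<scope>:<kind>:<name>, which is inferred only from the example above; buildURN is a hypothetical helper, not a function taken from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// buildURN is a hypothetical helper. It joins the parts in the order seen in
// the example URN: urn:<service>:<scope>:<kind>:<name>.
func buildURN(service, scope, kind, name string) string {
	return strings.Join([]string{"urn", service, scope, kind, name}, ":")
}

func main() {
	// Values taken from the example recipe and URN above.
	fmt.Println(buildURN("kafka", "prod-example-logstream", "topic", "my-service-log"))
	// prints "urn:kafka:prod-example-logstream:topic:my-service-log"
}
```

The scope comes from the recipe, so any other system that wants to point at this topic needs to know that recipe value and not just the broker address.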

When extracting any asset type that has Kafka as an upstream, we cannot construct the URN of the upstream topic, because the source system does not know the scope configured in the Kafka recipe. This problem became apparent while working on #417, where a Feature Table could have a Kafka topic as an upstream: only the Kafka broker information is available in the Feature Store.

The same problem can apply to other combinations of extractor and URN format.

Describe the solution you'd like

Introduce a mapping processor that would run after the lineage is built and emitted by the extractor. The mapping processor would use a scripting language to run a user-provided script from the recipe in a sandbox.

The API of the Processor would not change. We would pass in the *v1beta2.Asset, which can then be transformed by the user-defined script. The scripting language should ideally support:

Running the script as a black box, with the capability to detect and recover from errors
Assigning a computed or literal value to a field
Iterating over a list and transforming each element in the list
Basic string manipulation functions such as replace, join, split, lowercase, uppercase, etc.
Conditionals such as if, case, etc., with boolean logic and arithmetic
Custom helper functions that could be added in the future, e.g. resolving DNS for an IP

Additionally, it is preferred if the Go type information is retained while running the script.
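The shape of such a processor can be sketched in plain Go. This is only an illustration under stated assumptions: Asset is a simplified stand-in for *v1beta2.Asset, a Go function stands in for the user-provided script, and the names runGuarded, brokerToScope and rewriteKafkaUpstreams, as well as the pre-mapping upstream URN format that embeds the broker address, are all hypothetical. It shows error recovery, iterating over a list, and string manipulation; it does not use a scripting engine.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a simplified, hypothetical stand-in for *v1beta2.Asset.
type Asset struct {
	URN       string
	Upstreams []string // URNs of upstream assets
}

// MapFunc stands in for the user-defined script.
type MapFunc func(a *Asset) error

// runGuarded applies fn to the asset and converts a panic inside fn into an
// error, so a faulty mapping cannot crash the caller.
func runGuarded(fn MapFunc, a *Asset) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("mapping panicked: %v", r)
		}
	}()
	return fn(a)
}

// brokerToScope is an illustrative lookup table; the values mirror the
// example recipe. Where such a table would live is an assumption.
var brokerToScope = map[string]string{
	"production-example-logstream-kafka.example-company.io:9993": "prod-example-logstream",
}

// rewriteKafkaUpstreams replaces the broker address in each Kafka upstream
// URN with the scope configured for that broker.
func rewriteKafkaUpstreams(a *Asset) error {
	const prefix = "urn:kafka:"
	for i, u := range a.Upstreams {
		if !strings.HasPrefix(u, prefix) {
			continue // not a Kafka upstream
		}
		rest := strings.TrimPrefix(u, prefix)
		idx := strings.Index(rest, ":topic:")
		if idx < 0 {
			continue // not a topic URN
		}
		broker := rest[:idx]
		scope, ok := brokerToScope[broker]
		if !ok {
			return fmt.Errorf("no scope configured for broker %q", broker)
		}
		a.Upstreams[i] = prefix + scope + rest[idx:]
	}
	return nil
}

func main() {
	a := &Asset{
		URN: "urn:featurestore:example:table:demo", // illustrative value
		Upstreams: []string{
			"urn:kafka:production-example-logstream-kafka.example-company.io:9993:topic:my-service-log",
		},
	}
	if err := runGuarded(rewriteKafkaUpstreams, a); err != nil {
		fmt.Println("mapping failed:", err)
		return
	}
	fmt.Println(a.Upstreams[0])
	// prints "urn:kafka:prod-example-logstream:topic:my-service-log"
}
```

A scripting language would replace the hard-coded rewriteKafkaUpstreams with logic supplied in the recipe, while the guarded call around it keeps the same role.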

Describe alternatives you've considered

Change the scope to the broker DNS name for the Kafka extractor. This is quite fragile: we would need to account for the naming pattern of URNs while defining Kafka, or any other, extractors. Additionally, it still wouldn't work when there is a list of Kafka broker IPs instead of a DNS name.

Additional context

The idea of a mapping layer could be used in other scenarios as well, such as an HTTP extractor.


sudo-suhas · 2022-10-27T10:22:13Z

Based on the analysis done in https://github.com/sudo-suhas/play-script-engine, we are proceeding with using Tengo for the scripting layer.

sudo-suhas changed the title ~~feat(processor): add mapping processor~~ feat: add mapping processor Oct 10, 2022

sudo-suhas mentioned this issue Nov 3, 2022

feat: add script processor using Tengo #428

Merged

sudo-suhas closed this as completed in #428 Nov 8, 2022
