feat: add mapping processor #420

Closed
sudo-suhas opened this issue Oct 10, 2022 · 2 comments · Fixed by #428

sudo-suhas commented Oct 10, 2022

Is your feature request related to a problem? Please describe.

Background:
In a recipe for the Kafka extractor, we use a URN scope to construct the URN for each topic. Example config:

name: prod-example-logstream
version: v1beta1
source:
  name: kafka
  scope: prod-example-logstream
  config:
    broker: production-example-logstream-kafka.example-company.io:9993
sinks:
  # ...

Example URN with above config: urn:kafka:prod-example-logstream:topic:my-service-log.
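To make the URN shape concrete, here is a minimal Go sketch that assembles the example above. It assumes the layout is `urn:<service>:<scope>:<kind>:<name>`, inferred only from this one example. `BuildURN` and its parameter names are hypothetical and are not part of the extractor's API.

```go
package main

import (
	"fmt"
	"strings"
)

// BuildURN joins the URN parts in the order seen in the example:
// urn:<service>:<scope>:<kind>:<name>.
// Hypothetical helper: the name and signature are assumptions.
func BuildURN(service, scope, kind, name string) string {
	return strings.Join([]string{"urn", service, scope, kind, name}, ":")
}

func main() {
	// The scope comes from the recipe's `scope` field, not from the broker address.
	fmt.Println(BuildURN("kafka", "prod-example-logstream", "topic", "my-service-log"))
	// prints: urn:kafka:prod-example-logstream:topic:my-service-log
}
```

Note that the scope is the only part that cannot be derived from the Kafka cluster itself, which is why another extractor that knows only the broker address cannot reproduce this URN.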

When extracting any asset type that has a Kafka topic as an upstream, we would not be able to construct the topic's URN. This became apparent while working on #417, where a Feature Table could have a Kafka topic as an upstream: the Feature Store exposes only the Kafka broker information, not the scope used in the Kafka recipe.

The same problem can apply to other combinations of extractor and URN format.

Describe the solution you'd like

Introduce a mapping processor that runs after the extractor has built and emitted the lineage. The processor would use a scripting language to execute, in a sandbox, a script that the user provides in the recipe.

The Processor API would not change. We would pass in the *v1beta2.Asset, which the user-defined script can then transform. The scripting language should ideally support:

  • Running the script in a sandbox, with the ability to detect and recover from errors
  • Assigning a computed or literal value to a field
  • Iterating a list and transforming each element in the list
  • Basic string manipulation functions such as replace, join, split, lowercase, uppercase etc.
  • Conditionals such as if and case, with boolean logic and arithmetic
  • Custom helper functions that could be added in the future, e.g. resolving the DNS name for an IP

Additionally, the scripting layer should preferably retain Go type information while the script runs.
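The requirements above can be sketched in plain Go, with a Go function standing in for the user script. This is only an illustration under stated assumptions: `Asset` is a trimmed-down stand-in for *v1beta2.Asset, the `scopeByBroker` table is invented, and the sketch assumes the upstream URN arrives with the broker address where the scope belongs. A real implementation would run a script in the chosen scripting engine instead of `MapFunc`.

```go
package main

import (
	"fmt"
	"strings"
)

// Asset is a hypothetical, trimmed-down stand-in for *v1beta2.Asset.
type Asset struct {
	URN       string
	Upstreams []string // URNs of upstream assets
}

// MapFunc stands in for the user-defined script from the recipe.
type MapFunc func(a *Asset) error

// Apply runs fn on a copy of the asset. An error or a panic in fn is
// reported as an error and the original asset is returned unchanged.
func Apply(fn MapFunc, in Asset) (out Asset, err error) {
	out = in
	out.Upstreams = append([]string(nil), in.Upstreams...) // copy the list
	defer func() {
		if r := recover(); r != nil {
			out, err = in, fmt.Errorf("mapping panicked: %v", r)
		}
	}()
	if err = fn(&out); err != nil {
		return in, err
	}
	return out, nil
}

// scopeByBroker is an assumed lookup table from broker address to URN scope.
var scopeByBroker = map[string]string{
	"production-example-logstream-kafka.example-company.io:9993": "prod-example-logstream",
}

// RewriteKafkaUpstreams iterates the upstream list and replaces a known
// broker address in each Kafka topic URN with the configured scope.
func RewriteKafkaUpstreams(a *Asset) error {
	const prefix, sep = "urn:kafka:", ":topic:"
	for i, u := range a.Upstreams {
		if !strings.HasPrefix(u, prefix) {
			continue // not a Kafka URN
		}
		broker, topic, ok := strings.Cut(strings.TrimPrefix(u, prefix), sep)
		if !ok {
			return fmt.Errorf("unexpected Kafka URN %q", u)
		}
		if scope, known := scopeByBroker[broker]; known {
			a.Upstreams[i] = prefix + scope + sep + topic
		}
	}
	return nil
}

func main() {
	in := Asset{
		URN:       "urn:hypothetical:feature-table", // placeholder URN
		Upstreams: []string{"urn:kafka:production-example-logstream-kafka.example-company.io:9993:topic:my-service-log"},
	}
	out, err := Apply(RewriteKafkaUpstreams, in)
	if err != nil {
		fmt.Println("mapping failed:", err)
		return
	}
	fmt.Println(out.Upstreams[0])
}
```

Working on a copy and recovering from panics means a faulty mapping leaves the emitted asset untouched, which is one way to read the "detect and recover from errors" requirement.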

Describe alternatives you've considered

Change the scope to the broker DNS name for the Kafka extractor. This is quite fragile: we would need to account for the URN naming pattern when defining Kafka, or any other, extractors. It also wouldn't work when there is a list of Kafka broker IPs instead of the DNS name.

Additional context

A mapping layer could also be useful in other scenarios, such as an HTTP extractor.

sudo-suhas changed the title from "feat(processor): add mapping processor" to "feat: add mapping processor" on Oct 10, 2022
@sudo-suhas

Based on the analysis done in https://github.com/sudo-suhas/play-script-engine, we are proceeding with Tengo for the scripting layer.
