[Feature]: Datetime handling in streammaps #986

Closed
qbatten opened this issue Sep 20, 2022 · 2 comments · Fixed by #1175
Labels
kind/Feature New feature or request valuestream/SDK

Comments

@qbatten
Contributor

qbatten commented Sep 20, 2022

Feature scope

Taps (catalog, state, stream maps, etc.)

Description

Being able to reference datetime (or equivalent) Python objects in streammaps statements would be very helpful.

The simple way to do this, mentioned here, is to graft a few custom datetime functions into streammaps, similar to the way it was done with md5. I think anything more complex than that would dramatically increase the lift, because you'd likely have to change the way statements are evaluated (currently with simpleeval).

A few cases where this would be useful:

  • Filtering out rows that were created less than 24 hours ago (would need something like "datetime.now()")
  • Filtering out rows that were created on some specific date or date range (e.g. date of a known bug that created bad data)
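To make the two cases concrete, here is a minimal sketch of the checks such a stream map expression would have to evaluate, written as plain Python. The field name `created_at`, the helper names, and the choice to treat naive timestamps as UTC are all assumptions for illustration; none of them come from the SDK.

```python
from datetime import date, datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 string; naive values are treated as UTC (assumption)."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def created_within(record: dict, window: timedelta, now: datetime) -> bool:
    """True if the record was created less than `window` before `now`."""
    return now - parse_ts(record["created_at"]) < window


def created_between(record: dict, start: date, end: date) -> bool:
    """True if the record's creation date falls in [start, end], inclusive."""
    return start <= parse_ts(record["created_at"]).date() <= end
```

A filter that drops recent rows would keep a record only when `created_within(record, timedelta(hours=24), datetime.now(timezone.utc))` is false. Both checks need a current time or a date constructor, which is why the expression language needs access to something like datetime.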
qbatten added the kind/Feature and valuestream/SDK labels Sep 20, 2022
@qbatten
Contributor Author

qbatten commented Sep 29, 2022

Continuing to think through this, I see two paths forward:

  1. The most flexible way to do this would be to just pass the datetime.datetime class into simpleeval as a function. This allows users to do essentially anything they want with datetimes. I would prefer this; it opens up a ton of use cases and is a small, simple change to the code.
  2. A less flexible alternative is to add a few smaller, more locked-down functions that always return a string, rather than giving the end user an object to play around with.
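A rough sketch of how the two options differ in what they expose to the expression evaluator. The helper names `now_iso` and `hours_ago_iso` and the two dictionaries are hypothetical; how the SDK actually wires names and functions into simpleeval is not shown in this thread, so treat the dictionaries as placeholders for that step.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def _utc_now(now: Optional[datetime] = None) -> datetime:
    """Return `now` if given (useful for tests), else the current UTC time."""
    return datetime.now(timezone.utc) if now is None else now


def now_iso(now: Optional[datetime] = None) -> str:
    """Option 2 helper (hypothetical): current UTC time as an ISO 8601 string."""
    return _utc_now(now).isoformat(timespec="seconds")


def hours_ago_iso(hours: float, now: Optional[datetime] = None) -> str:
    """Option 2 helper (hypothetical): UTC time `hours` ago as an ISO 8601 string."""
    return (_utc_now(now) - timedelta(hours=hours)).isoformat(timespec="seconds")


# Option 1: hand the class itself to the evaluator, so an expression can call
# datetime.now(), datetime.fromisoformat(...), and so on.
OPTION_1_NAMES = {"datetime": datetime}

# Option 2: expose only locked-down helpers that always return strings.
OPTION_2_FUNCTIONS = {"now_iso": now_iso, "hours_ago_iso": hours_ago_iso}
```

With Option 2, a comparison such as `created_at < hours_ago_iso(24)` only behaves correctly if `created_at` is a string in the same format and UTC offset, since the comparison is lexicographic. That is part of the flexibility trade-off between the two options.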

Option 1 opens up a ton of use cases, is highly flexible for end-users, and is a very simple change to make. Some concerns I can think of with option 1 are:

  • Security concerns? I am not sure exactly how to evaluate this. I guess the main thing to be concerned about is if giving the user access to the datetime library could allow some sort of escalation of privilege. I assume that giving the user more complex libraries/objects to interact with results in an increased attack surface.
  • The increased flexibility adds complexity to the environment that the user is working with, and it gives the user more room to make weird mistakes that aren't handled well by the plugin or Meltano.

Does the Meltano team have thoughts about these two options?

@edgarrmondragon
Collaborator

I like Option 1, and it's similar to what Airflow provides in their templates context.
