Conditional Merges [REQUEST] #97

Open
angelosantos4 opened this issue Sep 20, 2023 · 1 comment
Labels
enhancement New feature or request

Comments

Contributor

angelosantos4 commented Sep 20, 2023

Is your feature request related to a problem? Please describe.
When we ingest the same data type from different sources, we may ingest one record from an old source and another from a recent source, with different properties. The current implementation builds a merge query that simply overwrites the properties on matched nodes. As a result, the final values depend on ingestion order: whichever record is ingested last wins, even if it is the older one. I would like a way to change properties on a node conditionally.

Describe the solution you'd like
When I create an interpretation within my pipeline, I would like to declare the following:

```yaml
merge_condition: latest  # Conflict strategy; with "latest" the greater value wins. Default: None (always overwrite)
condition_key: date_created  # Property on the existing node to compare against
condition_value: !!python/jmespath date_created  # Value taken from the incoming record
```
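A minimal sketch of how these three options might be validated once the YAML has been loaded, assuming they arrive as a plain dict. `MergeCondition`, `from_config`, and `SUPPORTED_STRATEGIES` are hypothetical names for illustration, not part of the project's API, and `condition_value` is kept as an unevaluated expression string.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical set of strategies; only "latest" is proposed in this issue.
SUPPORTED_STRATEGIES = {"latest"}


@dataclass(frozen=True)
class MergeCondition:
    """Hypothetical container for the proposed interpretation options."""

    strategy: Optional[str] = None          # merge_condition; None keeps today's overwrite behavior
    key: Optional[str] = None               # condition_key: property on the existing node
    value_expression: Optional[str] = None  # condition_value: expression evaluated per record

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "MergeCondition":
        strategy = config.get("merge_condition")
        if strategy is None:
            return cls()  # no condition configured: plain overwrite on match
        if strategy not in SUPPORTED_STRATEGIES:
            raise ValueError(f"unsupported merge_condition: {strategy!r}")
        key = config.get("condition_key")
        value_expression = config.get("condition_value")
        if not key or not value_expression:
            raise ValueError("condition_key and condition_value are required with merge_condition")
        return cls(strategy, key, value_expression)


if __name__ == "__main__":
    condition = MergeCondition.from_config(
        {"merge_condition": "latest", "condition_key": "date_created", "condition_value": "date_created"}
    )
    print(condition)
```

Failing fast on an unknown strategy keeps a typo in the pipeline file from silently falling back to overwrite semantics.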

This would then change the merge query, which currently does the following for source nodes:

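```cypher
// NodeType is a placeholder for the node type's label; MERGE matches on a property map, not a WHERE clause
UNWIND $params AS param
MERGE (node:NodeType {key: param.key})
ON CREATE
    SET node.param = param.value
ON MATCH
    SET node.param = param.value
```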
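To make the ordering problem concrete, here is a small in-memory model of this query, assuming both the ON CREATE and ON MATCH branches copy every incoming property onto the node. `merge_overwrite` and the dict-based store are illustrative stand-ins, not the project's code.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def merge_overwrite(store: Dict[str, Node], key: str, properties: Node) -> None:
    """Model of MERGE ... ON CREATE SET / ON MATCH SET with identical assignments."""
    node = store.setdefault(key, {"key": key})  # MERGE: match on key or create the node
    node.update(properties)  # both branches copy the incoming values unconditionally


old = {"date_created": "2021-01-01", "status": "draft"}
new = {"date_created": "2023-09-01", "status": "published"}

# Ingest the recent source first and the old source second.
store: Dict[str, Node] = {}
merge_overwrite(store, "doc-1", new)
merge_overwrite(store, "doc-1", old)
print(store["doc-1"]["status"])  # the older record wins because it arrived last
```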

I would like it to generate the following instead:

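```cypher
UNWIND $params AS param
MERGE (node:NodeType {key: param.key})
ON CREATE
    SET node.param = param.value
ON MATCH
    // Evaluate the condition inline, so no temporary node.condition property has to be set and later removed.
    // The condition key itself is updated last, so both comparisons see the stored value.
    SET node.param = CASE WHEN param.date_created > node.date_created THEN param.value ELSE node.param END,
        node.date_created = CASE WHEN param.date_created > node.date_created THEN param.date_created ELSE node.date_created END
```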

Here, the condition would be `param.date_created > node.date_created`, i.e. the incoming record is newer than the stored node.
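The intended "latest" behavior can be modeled with a small in-memory store. This sketch assumes the incoming `date_created` is compared with the stored one (ISO-8601 date strings compare correctly as text) and, as an assumption the issue does not settle, lets the incoming record win when the existing node has no value for the condition key. `merge_latest` is a hypothetical helper.

```python
from typing import Any, Dict

Node = Dict[str, Any]


def merge_latest(store: Dict[str, Node], key: str, properties: Node, condition_key: str) -> None:
    """Model of the proposed conditional merge with the 'latest' strategy."""
    node = store.get(key)
    if node is None:
        # ON CREATE: no existing node, so take every incoming property.
        store[key] = {"key": key, **properties}
        return
    existing = node.get(condition_key)
    incoming = properties.get(condition_key)
    # ON MATCH: overwrite only if the incoming record is strictly newer.
    # Assumption: a node missing the condition value is always updated.
    if existing is None or (incoming is not None and incoming > existing):
        node.update(properties)


old = {"date_created": "2021-01-01", "status": "draft"}
new = {"date_created": "2023-09-01", "status": "published"}

for order in ([old, new], [new, old]):
    store: Dict[str, Node] = {}
    for record in order:
        merge_latest(store, "doc-1", record, "date_created")
    print(store["doc-1"]["status"])  # the newer record wins in both orders
```

The outcome no longer depends on pipeline scheduling, which is exactly what the first alternative below tries to guarantee by hand.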

Describe alternatives you've considered
One alternative for ensuring that the most recent data wins is to schedule my pipelines so that the recent data is ingested after the old data.
Another is to create an interpreter in my pipeline that queries the database for the current value and then conditionally writes to the database, but this extra round trip takes too long.

Additional context

angelosantos4 added the enhancement (New feature or request) label on Sep 20, 2023
Contributor

zprobst commented Oct 19, 2023

I agree that this could be handy, although I have some question marks around whether an ETL framework needs to provide it. I'm definitely willing to take PRs on this.

One major challenge is going to be retaining the abstraction over the underlying graph databases.
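One possible way to keep the condition database-agnostic, sketched here purely as an assumption about design: describe the condition declaratively and let each backend translate it into its own query language. `ConditionalSet`, `QueryRenderer`, `CypherRenderer`, and `build_query` are hypothetical names; the Cypher output follows the shape of the conditional query proposed in this issue, treats a missing stored value as "older" (an assumption), and a renderer for a non-Cypher backend is left out.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence


@dataclass(frozen=True)
class ConditionalSet:
    """Backend-neutral description: update `properties` only if `condition_key` increases."""

    label: str
    key_property: str
    properties: Sequence[str]
    condition_key: str


class QueryRenderer(Protocol):
    def render(self, spec: ConditionalSet) -> str: ...


class CypherRenderer:
    """Renders the spec as a Cypher MERGE that reads rows from a $params list."""

    def render(self, spec: ConditionalSet) -> str:
        k = spec.condition_key
        # Assumption: a node without a stored value is treated as older than any incoming record.
        newer = f"(node.{k} IS NULL OR param.{k} > node.{k})"
        # Write the condition key last so earlier CASE expressions compare against the stored value.
        ordered = [p for p in spec.properties if p != k] + [k]
        on_create = ", ".join(f"node.{p} = param.{p}" for p in ordered)
        on_match = ", ".join(
            f"node.{p} = CASE WHEN {newer} THEN param.{p} ELSE node.{p} END" for p in ordered
        )
        return (
            "UNWIND $params AS param\n"
            f"MERGE (node:{spec.label} {{{spec.key_property}: param.{spec.key_property}}})\n"
            f"ON CREATE SET {on_create}\n"
            f"ON MATCH SET {on_match}"
        )


def build_query(renderer: QueryRenderer, spec: ConditionalSet) -> str:
    # Pipeline code depends only on the Protocol, not on a specific database.
    return renderer.render(spec)


spec = ConditionalSet("Document", "key", ["status", "date_created"], "date_created")
print(build_query(CypherRenderer(), spec))
```

The pipeline configuration would only ever produce a `ConditionalSet`; whether a given backend can express the comparison natively then becomes that backend's concern.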

Projects
Status: Backlog
Development

No branches or pull requests

2 participants