feat: multiple parents for a child stream #2290

Open
rubenvereecken opened this issue Mar 7, 2024 · 2 comments

Comments

@rubenvereecken

Feature scope

Taps (catalog, state, stream maps, tests, etc.)

Description

We already have 1:many relationships (1 parent, many children): #97

It would be incredibly useful to have many:many relationships while still maintaining a DAG. This would allow a child stream to gather input from multiple parents, combinatorially.

My concrete use case: fetch events for countries and categories, which first have to be fetched themselves.

Current possible workarounds that I'm contemplating:

  • Fetch all countries and categories (the parent streams) during setup of the events (child) stream, paginating through all of them.
  • Create an artificial combinatorial countries-and-categories parent stream. Disregard its output, but use it to drive the child stream.
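
A rough sketch of the second workaround, in plain Python rather than against the SDK's stream classes. The record shape (`{"id": ...}`), the context keys, and the `combined_contexts` helper are all hypothetical, picked only to illustrate the combinatorial step:

```python
from itertools import product


def combined_contexts(countries, categories):
    """Yield one child context per (country, category) pair.

    Assumes both parent record lists are already fully fetched.
    """
    for country, category in product(countries, categories):
        yield {"country_id": country["id"], "category_id": category["id"]}


# Hypothetical parent records, for illustration only.
countries = [{"id": "BE"}, {"id": "NL"}]
categories = [{"id": "music"}, {"id": "sports"}]

contexts = list(combined_contexts(countries, categories))
```

Each yielded dict would then serve as the context for one run of the events stream. The number of child runs is the product of the parent sizes, so this grows quickly.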
@edgarrmondragon
Collaborator

Interesting idea. Thanks for filing @rubenvereecken!

I think there are a few open questions, but the main one is: when is the child sync triggered?

Right now, a child sync is triggered after each parent record is extracted. With more than one parent, we'd need to store a queue of parent contexts and schedule the child to sync either after all the parents have finished syncing or once the queue is full. Depending on the volume coming from the parents, the child stream could then take a long time to start syncing.
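
To make the two trigger conditions concrete, here is a minimal sketch of such a queue. `ParentContextBuffer`, `max_size`, and the method names are hypothetical and not part of the SDK; it only models the "when is the child ready?" decision, not the sync itself:

```python
class ParentContextBuffer:
    """Hypothetical buffer of parent contexts for one child stream."""

    def __init__(self, parent_names, max_size):
        self.queues = {name: [] for name in parent_names}
        self.finished = set()
        self.max_size = max_size

    def add(self, parent, context):
        """Store a context emitted by one of the parents."""
        self.queues[parent].append(context)

    def mark_finished(self, parent):
        """Record that a parent stream has completed its sync."""
        self.finished.add(parent)

    def ready(self):
        """True once all parents are done or any queue has hit max_size."""
        all_done = self.finished == set(self.queues)
        full = any(len(q) >= self.max_size for q in self.queues.values())
        return all_done or full


buffer = ParentContextBuffer(["countries", "categories"], max_size=2)
buffer.add("countries", {"country_id": "BE"})
ready_after_one = buffer.ready()
buffer.add("countries", {"country_id": "NL"})
ready_when_full = buffer.ready()
```

What should happen to the queued contexts after a "queue full" trigger (drop them, keep them for later combinations) is exactly the part that is still open.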

Another question may be how multiple parent contexts should be merged.

Any thoughts, ideas, suggestions?

@rubenvereecken
Author

I think letting a sync manager store up contexts is the way to go. For each child, store the contexts from each of its parents. Once the queue for every parent is non-empty, start generating combinations.
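
Roughly what I have in mind, as a sketch only. `CombinationTracker` is a made-up name, and this assumes every parent context is kept in memory for the duration of the sync:

```python
from itertools import product


class CombinationTracker:
    """Hypothetical per-child store of the contexts seen from each parent."""

    def __init__(self, parent_names):
        self.seen = {name: [] for name in parent_names}

    def add(self, parent, context):
        """Record a context and return the new combinations it unlocks.

        The new context is paired with everything already seen from the
        other parents, so each combination is produced exactly once.
        Nothing is returned until every parent has at least one context.
        """
        pools = [
            [context] if name == parent else self.seen[name]
            for name in self.seen
        ]
        self.seen[parent].append(context)
        return [tuple(combo) for combo in product(*pools)]


tracker = CombinationTracker(["countries", "categories"])
first = tracker.add("countries", {"country_id": "BE"})
second = tracker.add("categories", {"category_id": "music"})
third = tracker.add("countries", {"country_id": "NL"})
```

Here `first` is empty because no category has arrived yet, while `second` and `third` each contain one new combination. This lets the child start as soon as every parent has emitted something, instead of waiting for all parents to finish.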

Don't see a way around the child stream having to wait a long time... unless the sync manager optimises for that, but that would be very different behaviour to what we have now.

As to combining parent contexts – a simple merge should cover most use cases. With the option to override it in the sync manager?
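
A minimal sketch of what "simple merge, with an override" could look like. `merge_contexts` and its `merge` parameter are hypothetical names, not an existing SDK hook:

```python
def merge_contexts(contexts, merge=None):
    """Combine one context per parent into a single child context.

    By default this is a plain dict merge, so on a key collision the
    later parent wins. Pass `merge` to override that behaviour.
    """
    if merge is not None:
        return merge(contexts)
    merged = {}
    for context in contexts:
        merged.update(context)
    return merged


combo = ({"country_id": "BE"}, {"category_id": "music"})
default_merged = merge_contexts(combo)

# Example override: keep the parent contexts separate under fixed keys.
custom_merged = merge_contexts(
    combo,
    merge=lambda ctxs: {"country": ctxs[0], "category": ctxs[1]},
)
```

The default silently overwrites colliding keys, which is the main reason an override seems worth having.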
