Bug: If one stream has an error, then all streams fail and no STATE messages are ever sent #1064

Closed
spacecowboy opened this issue Oct 11, 2022 · 10 comments · Fixed by #1192

@spacecowboy
Contributor

Singer SDK Version

0.11.1

Python Version

3.9

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

Linux

Description

Scenario

  • Your target is using incremental replication
  • Your tap is sending multiple streams
  • Each stream gets sent to its own Sink
  • After a while, one of the Sinks throws an exception, for example because the tap sent a bad value in a record
  • This exception stops everything: the Target and all Sinks stop

What is the actual result

Not a single STATE message is emitted by the target during this process, even though it might already have spent hours processing data. This means you need to start over from scratch, for ALL streams.

What is the expected result

The Target or Sinks should emit STATE messages periodically so state is not lost.

Code

No response

@spacecowboy spacecowboy added kind/Bug Something isn't working valuestream/SDK labels Oct 11, 2022
@spacecowboy spacecowboy changed the title [Bug]: If one stream has an error, then in effect all streams fail [Bug]: If one stream has an error, then all streams fail and no STATE messages are ever sent Oct 11, 2022
@spacecowboy
Contributor Author

I can mention that I'm using a BatchSink. It would be great if a state message was emitted after each batch.
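To make the setup concrete, here is a minimal sketch of the kind of BatchSink in play (class names are hypothetical, not the actual target-bigquery code); by default the SDK does not emit a STATE message after each batch processed this way:

```python
# Hypothetical example, not the actual target-bigquery sink: a BatchSink
# buffers records and loads them one batch at a time in process_batch().
from singer_sdk.sinks import BatchSink


class ExampleBatchSink(BatchSink):
    """Hypothetical sink that loads records one batch at a time."""

    def process_batch(self, context: dict) -> None:
        records = context["records"]
        # A real sink would load `records` into the destination here.
        # Emitting a STATE message right after this point is what is being
        # requested above.
        self.logger.info("Loaded a batch of %d records", len(records))
```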

@spacecowboy
Contributor Author

I propose changing the default _MAX_RECORD_AGE_IN_MINUTES in Target to something much smaller than 30 minutes.
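As a sketch of what this means for a target author today (all names below are hypothetical, and this shows the per-target workaround rather than the proposed SDK default change), the class-level value can simply be shadowed on the target:

```python
# A sketch of the workaround available to target authors today: shadow the
# SDK's class-level default so buffered records are drained, and STATE
# emitted, well before the 30-minute mark. All names here are hypothetical.
from singer_sdk import Target
from singer_sdk.sinks import BatchSink


class NoopSink(BatchSink):
    """Hypothetical sink that simply discards each batch."""

    def process_batch(self, context: dict) -> None:
        pass  # a real sink would load context["records"] somewhere


class ExampleTarget(Target):
    """Hypothetical target with a much shorter max record age."""

    name = "target-example"
    default_sink_class = NoopSink

    # Drain all sinks (and emit a STATE message) once the oldest buffered
    # record is more than 5 minutes old, instead of the SDK's 30-minute default.
    _MAX_RECORD_AGE_IN_MINUTES = 5
```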

@tayloramurphy
Collaborator

@edgarrmondragon and @kgpayne, thoughts here, since you've been working on BATCH support and implementation?

@edgarrmondragon
Collaborator

@tayloramurphy this is independent of BATCH messages; there's a bit of term overloading (see #963 (comment) and #1026).

@spacecowboy what's the target you're working with and are you able to override the value there? It might be a good idea to get heuristics or make the target batch size and wait limit configurable, similar to target-snowflake's batch_wait_limit_seconds and batch_size_rows.

@spacecowboy
Contributor Author

> @spacecowboy what's the target you're working with and are you able to override the value there? It might be a good idea to get heuristics or make the target batch size and wait limit configurable, similar to target-snowflake's batch_wait_limit_seconds and batch_size_rows.

I have my own target-bigquery. I have overridden the value (in fact made it part of the config).

But in my opinion it just makes sense for the default value to be a lot smaller.
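A rough sketch of what "made it part of the config" could look like, in the spirit of target-snowflake's batch_wait_limit_seconds; the setting name, default, and class names below are assumptions for illustration, not target-bigquery's actual options:

```python
# A sketch, with assumed setting names, of exposing the max record age via
# target config so each deployment can tune how often STATE is emitted.
from singer_sdk import Target
from singer_sdk import typing as th
from singer_sdk.sinks import BatchSink


class NoopSink(BatchSink):
    """Hypothetical sink that simply discards each batch."""

    def process_batch(self, context: dict) -> None:
        pass  # a real sink would load context["records"] somewhere


class ConfigurableTarget(Target):
    """Hypothetical target that reads the flush interval from its config."""

    name = "target-example"
    default_sink_class = NoopSink
    config_jsonschema = th.PropertiesList(
        th.Property(
            "max_record_age_minutes",
            th.IntegerType,
            default=5,
            description="Drain sinks and emit STATE once buffered records "
            "are older than this many minutes.",
        ),
    ).to_dict()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Shadow the SDK's class-level default with the configured value,
        # falling back to 5 minutes if the setting is absent.
        self._MAX_RECORD_AGE_IN_MINUTES = self.config.get(
            "max_record_age_minutes", 5
        )
```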

@aaronsteers
Contributor

aaronsteers commented Nov 1, 2022

@spacecowboy - re:

> I propose changing the default _MAX_RECORD_AGE_IN_MINUTES in Target to something much smaller than 30 minutes.

Yes, this sounds like a good approach to me as well. Flushing every 10 or 15 minutes is not so frequent that we'd be spending a large percentage of time in the batch flush operations. Even 5 minutes might be okay here.

If we find that the max record age is not triggering as expected, that would be a separate bug we'd want to also address.

In your own target, what do you think of overriding _MAX_RECORD_AGE_IN_MINUTES to confirm the fix, and then we can merge back here as a PR if the STATE messages are emitted as expected?

Nm. I see above you've already confirmed. Yes, I'm in favor. 👍

@tayloramurphy
Collaborator

@aaronsteers what's your view on priority here? Would love a weight estimate as well.

@aaronsteers aaronsteers changed the title [Bug]: If one stream has an error, then all streams fail and no STATE messages are ever sent Bug: If one stream has an error, then all streams fail and no STATE messages are ever sent Nov 16, 2022
@aaronsteers
Contributor

aaronsteers commented Nov 16, 2022

@tayloramurphy and @spacecowboy - some scenarios affected here would be resolved by this change, which is merged and soon to be released:

The relative priority for this would be low, since all is functioning as designed - or will be when the above releases. That said, I've added the Accepting Pull Requests label to invite contributions if anyone has time to pick this up.

Weight is probably 1 or 2 if we only reduce the trigger cadence defined in _MAX_RECORD_AGE_IN_MINUTES from 30 minutes to 15 or 10 - so we could also just decide to pick it up as low-hanging fruit.

@spacecowboy
Contributor Author

See #1192

@tayloramurphy
Collaborator

Even better with the PR @spacecowboy 😄

@aaronsteers I've added the community contributed PR label and added it to the engineering board as well.

edgarrmondragon pushed a commit that referenced this issue Nov 16, 2022
…tead of 30 mins (#1192)

Co-authored-by: Edgar R. M <edgar@meltano.com>
Fixes #1064