Bug: If one stream has an error, then all streams fail and no STATE messages are ever sent #1064

Closed
spacecowboy opened this issue Oct 11, 2022 · 10 comments · Fixed by #1192

@spacecowboy
Contributor

Singer SDK Version

0.11.1

Python Version

3.9

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

Linux

Description

Scenario

  • Your target is using incremental replication
  • Your tap is sending multiple streams
  • Each stream gets sent to its own Sink
  • After a while, one of the Sinks throws an exception, for example because the tap sent a bad value in a record
  • This exception stops everything: the Target and all Sinks stop

What is the actual result

Not a single STATE message is emitted by the target during this process, even though it might already have spent hours processing data. This means you need to start over from scratch, for ALL streams.

What is the expected result

The Target or Sinks should emit STATE messages periodically so state is not lost.

Code

No response

@spacecowboy spacecowboy added kind/Bug Something isn't working valuestream/SDK labels Oct 11, 2022
@spacecowboy spacecowboy changed the title [Bug]: If one stream has an error, then in effect all streams fail [Bug]: If one stream has an error, then all streams fail and no STATE messages are ever sent Oct 11, 2022
@spacecowboy
Contributor Author

I can mention that I'm using a BatchSink. It would be great if a state message was emitted after each batch.
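To make the setup concrete, here is a minimal sketch of the kind of BatchSink in play (class names are hypothetical, not the actual target-bigquery code); by default the SDK does not emit a STATE message after each batch processed this way:

```python
# Hypothetical example, not the actual target-bigquery sink: a BatchSink
# buffers records and loads them one batch at a time in process_batch().
from singer_sdk.sinks import BatchSink


class ExampleBatchSink(BatchSink):
    """Hypothetical sink that loads records one batch at a time."""

    def process_batch(self, context: dict) -> None:
        records = context["records"]
        # A real sink would load `records` into the destination here.
        # Emitting a STATE message right after this point is what is being
        # requested above.
        self.logger.info("Loaded a batch of %d records", len(records))
```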

@spacecowboy
Contributor Author

I propose changing the default _MAX_RECORD_AGE_IN_MINUTES in Target to something much smaller than 30 minutes.
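As a sketch of what this means for a target author today (all names below are hypothetical, and this shows the per-target workaround rather than the proposed SDK default change), the class-level value can simply be shadowed on the target:

```python
# A sketch of the workaround available to target authors today: shadow the
# SDK's class-level default so buffered records are drained, and STATE
# emitted, well before the 30-minute mark. All names here are hypothetical.
from singer_sdk import Target
from singer_sdk.sinks import BatchSink


class NoopSink(BatchSink):
    """Hypothetical sink that simply discards each batch."""

    def process_batch(self, context: dict) -> None:
        pass  # a real sink would load context["records"] somewhere


class ExampleTarget(Target):
    """Hypothetical target with a much shorter max record age."""

    name = "target-example"
    default_sink_class = NoopSink

    # Drain all sinks (and emit a STATE message) once the oldest buffered
    # record is more than 5 minutes old, instead of the SDK's 30-minute default.
    _MAX_RECORD_AGE_IN_MINUTES = 5
```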

@tayloramurphy
Collaborator

@edgarrmondragon and @kgpayne, thoughts here, since you've been working on BATCH support and implementation?

@edgarrmondragon
Collaborator

@tayloramurphy this is independent of BATCH messages; there's a bit of term overloading (see #963 (comment) and #1026).

@spacecowboy what's the target you're working with and are you able to override the value there? It might be a good idea to get heuristics or make the target batch size and wait limit configurable, similar to target-snowflake's batch_wait_limit_seconds and batch_size_rows.

@spacecowboy
Contributor Author

> @spacecowboy what's the target you're working with and are you able to override the value there? It might be a good idea to get heuristics or make the target batch size and wait limit configurable, similar to target-snowflake's batch_wait_limit_seconds and batch_size_rows.

I have my own target-bigquery. I have overridden the value (in fact made it part of the config).

But in my opinion it just makes sense for the default value to be a lot smaller.
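A rough sketch of what "made it part of the config" could look like, in the spirit of target-snowflake's batch_wait_limit_seconds; the setting name, default, and class names below are assumptions for illustration, not target-bigquery's actual options:

```python
# A sketch, with assumed setting names, of exposing the max record age via
# target config so each deployment can tune how often STATE is emitted.
from singer_sdk import Target
from singer_sdk import typing as th
from singer_sdk.sinks import BatchSink


class NoopSink(BatchSink):
    """Hypothetical sink that simply discards each batch."""

    def process_batch(self, context: dict) -> None:
        pass  # a real sink would load context["records"] somewhere


class ConfigurableTarget(Target):
    """Hypothetical target that reads the flush interval from its config."""

    name = "target-example"
    default_sink_class = NoopSink
    config_jsonschema = th.PropertiesList(
        th.Property(
            "max_record_age_minutes",
            th.IntegerType,
            default=5,
            description="Drain sinks and emit STATE once buffered records "
            "are older than this many minutes.",
        ),
    ).to_dict()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Shadow the SDK's class-level default with the configured value,
        # falling back to 5 minutes if the setting is absent.
        self._MAX_RECORD_AGE_IN_MINUTES = self.config.get(
            "max_record_age_minutes", 5
        )
```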

@aaronsteers
Contributor

aaronsteers commented Nov 1, 2022

@spacecowboy - re:

> I propose changing the default _MAX_RECORD_AGE_IN_MINUTES in Target to something much smaller than 30 minutes.

Yes, this sounds like a good approach to me as well. Flushing every 10 or 15 minutes is not so frequent that we'd be spending a large percentage of time in the batch flush operations. Even 5 minutes might be okay here.

If we find that the max record age is not triggering as expected, that would be a separate bug we'd want to also address.

In your own target, what do you think of overriding _MAX_RECORD_AGE_IN_MINUTES to confirm the fix, and then we can merge back here as a PR if the STATE messages are emitted as expected?

Nm. I see above you've already confirmed. Yes, I'm in favor. 👍

@tayloramurphy
Collaborator

@aaronsteers what's your view on priority here? Would love a weight estimate as well.

@aaronsteers aaronsteers changed the title [Bug]: If one stream has an error, then all streams fail and no STATE messages are ever sent Bug: If one stream has an error, then all streams fail and no STATE messages are ever sent Nov 16, 2022
@aaronsteers
Contributor

aaronsteers commented Nov 16, 2022

@tayloramurphy and @spacecowboy - some scenarios affected here would be resolved by this change, which is merged and soon to be released:

The relative priority for this would be low, since all is functioning as designed - or will be when the above releases. That said, I've added the Accepting Pull Requests label to invite contributions if anyone has time to pick this up.

Weight is probably 1 or 2 if we only reduce the trigger cadence defined in _MAX_RECORD_AGE_IN_MINUTES from 30 minutes to 15 or 10 - so we could also just decide to pick it up as low-hanging fruit.

@spacecowboy
Contributor Author

See #1192

@tayloramurphy
Collaborator

Even better with the PR @spacecowboy 😄

@aaronsteers I've added the community contributed PR label and added it to the engineering board as well.

edgarrmondragon pushed a commit that referenced this issue Nov 16, 2022
…tead of 30 mins (#1192)

Co-authored-by: Edgar R. M <edgar@meltano.com>
Fixes #1064