Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: possible regression in state messages and progress_markers handling #1704

Closed
1 task done
laurentS opened this issue May 15, 2023 · 8 comments · Fixed by #1708
Closed
1 task done

bug: possible regression in state messages and progress_markers handling #1704

laurentS opened this issue May 15, 2023 · 8 comments · Fixed by #1708
Assignees
Labels
kind/Bug Something isn't working valuestream/SDK

Comments

@laurentS
Copy link
Contributor

Singer SDK Version

0.23.0

Is this a regression?

  • Yes

Python Version

NA

Bug scope

Taps (catalog, state, etc.)

Operating System

linux

Description

CI checks in MeltanoLabs/tap-github#209 are now failing on a test related to progress_markers.

I've narrowed down the regression (?) to between sdk v0.22.1 (pass) and v0.23.0 (fail), and I think it is specifically #1436 which caused the regression. Tagging @aaronsteers as the author of that PR.

The test that fails was added to tap-github in this PR after this related PR was merged in the SDK. In short, progress_markers now appears in the last state message issued by the sdk, which used to signify that the message wasn't "final".

It looks like there are some changes in how state is handled, so I'm opening this as a discussion to understand if there was indeed a regression, or if the test in tap-github needs to be modified to accomodate the new behaviour in the sdk.

Code

No response

@laurentS laurentS added kind/Bug Something isn't working valuestream/SDK labels May 15, 2023
@aaronsteers
Copy link
Contributor

@laurentS - thanks for reporting. 👍👍

Cc @kgpayne for extra pair of eyes and since he contributed also on the noted PR.

@aaronsteers
Copy link
Contributor

aaronsteers commented May 15, 2023

@laurentS - it's possible that this is functioning as designed and that the test should be modified... it will depend on the desired output of the test.

Essentially, the key question is whether the test actually did "complete" the steam that is being tested - or if the stream sync was "aborted abnormally" due to max record abort signal. If the latter, then the correct result would be an unfinalized state.

If you want to update the stream to deal with abort scenarios cleanly, you can mark the stream as sorted/resumable, and/or you can add handling to deal with abort-type exceptions.

Probably there are other ways to address as well, and it's possible that there's another issue going on, but this at least should kick off the conversation... 😁

@laurentS
Copy link
Contributor Author

Thanks for clarifying @aaronsteers!

My understanding so far:

  • the test in tap-github (test_last_state_message_is_valid) was added to check that we do get a valid final state message in cases where state_partitioning_keys is overridden in a stream. With sdk v0.13.1, we were having problems with streams restarting on each run because of a missing final state message. I think this test needs to remain as is.

  • in fix: handle sync abort, reduce duplicate STATE messages, rename _MAX_RECORD_LIMIT as ABORT_AT_RECORD_COUNT #1436, you mention:

    Tests which actually want a sync_all() behavior can call sync_all() directly, as this change reverts the behavior so that sync_all() will once again sync all records. This is the only way to ensure tests will get valid and finalized state messages. Otherwise, the aborted streams will have bookmarks left in an unfinalized, non-resumeable state.

    This is what the test does (call sync_all()), so I'd expect it to pass based on the info I have so far. My understanding is that ABORT_AT_RECORD_COUNT is aimed at speeding up tests when lots of records are expected, but this isn't really what we're testing here. Did I get this right?

@kgpayne kgpayne self-assigned this May 16, 2023
@kgpayne
Copy link
Contributor

kgpayne commented May 16, 2023

@laurentS @aaronsteers @edgarrmondragon I think the ABORT_AT_RECORD_COUNT related changes may be a red herring - as part of those works we also 'fixed' the noisy duplicate STATE messages. I believe that either there is a bug in our record_index logic for choosing when to write state messages, or a missing self._is_state_flushed = False somewhere that would be keeping the final state from firing 🤔 tap-github uses state partitioning, which also affects how/when state is updated.

Looking into this now 👍

@edgarrmondragon
Copy link
Collaborator

Looking into this now 👍

Thanks @kgpayne!

@aaronsteers
Copy link
Contributor

I think the ABORT_AT_RECORD_COUNT related changes may be a red herring...

Yeah, makes sense. Your draft PR #1708 looks like it addresses the issue. 👍 I added a comment there in the PR about some internals, but nothing that would block your PR.

@tayloramurphy
Copy link
Collaborator

@edgarrmondragon reassigning this to you since Ken is out. Can you take a look and let me know if we need to make progress on this?

@edgarrmondragon
Copy link
Collaborator

edgarrmondragon commented Jul 14, 2023

I'll be taking a look at this next week Done this already

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/Bug Something isn't working valuestream/SDK
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants