Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Change restriction to OffsetByteRange to allow functioning with runnerv2. #674

Merged
merged 3 commits into from
Jun 15, 2021

Conversation

dpcollins-google
Copy link
Collaborator

@dpcollins-google dpcollins-google commented Jun 8, 2021

Fixes #676

@dpcollins-google dpcollins-google requested a review from a team as a code owner June 8, 2021 03:58
@product-auto-label product-auto-label bot added the api: pubsublite Issues related to the googleapis/java-pubsublite API. label Jun 8, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jun 8, 2021
@dpcollins-google dpcollins-google added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 8, 2021
Copy link

@boyuanzz boyuanzz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to fix the coder issue with setPartition() in this PR or a separate PR?

import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

@AutoValue
@DefaultSchema(AutoValueSchema.class)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just FYI this change will introduce backward incompatibility for updating a Dataflow pipeline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood.

@dpcollins-google
Copy link
Collaborator Author

Are you going to fix the coder issue with setPartition() in this PR or a separate PR?

It has already been removed since it does not provide an access pattern we want to expose.

@boyuanzz
Copy link

boyuanzz commented Jun 9, 2021

Are you going to fix the coder issue with setPartition() in this PR or a separate PR?

It has already been removed since it does not provide an access pattern we want to expose.

Then you should remove this from Beam as well. Or if you decide to keep your own version, you should consider removing your code from Beam.

new OffsetRange(currentRestriction().getRange().getFrom(), nextOffset()),
range.getByteCount());
return SplitResult.of(
this.range, OffsetByteRange.of(new OffsetRange(nextOffset(), Long.MAX_VALUE), 0));
Copy link

@boyuanzz boyuanzz Jun 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By returning OffsetByteRange.of(new OffsetRange(nextOffset(), Long.MAX_VALUE), 0), your getSize() function will use 'MAX_LONG - nextOffset' as backlog. Bad backlog estimation will have side-effect on dataflow autoscaling strategy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so? GetSize delegates to the underlying OffsetByteRangeTracker if the range is unbounded on the right (like this is) and so will call out to this.backlogReader.computeMessageStats(range.getRange().getFrom()) to compute forward looking progress stats. For the existing range, it ships the byte size along with it which will be used since the already-finished range is closed. Is this analysis incorrect given how the runner interacts with these methods?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct.

@dpcollins-google
Copy link
Collaborator Author

Are you going to fix the coder issue with setPartition() in this PR or a separate PR?

It has already been removed since it does not provide an access pattern we want to expose.

Then you should remove this from Beam as well. Or if you decide to keep your own version, you should consider removing your code from Beam.

Yes, I've already created apache/beam#14976 to copy this and other changes over.

@boyuanzz
Copy link

Changes which are related to beam look good to me.

@dpcollins-google dpcollins-google added the kokoro:run Add this label to force Kokoro to re-run the tests. label Jun 15, 2021
@yoshi-kokoro yoshi-kokoro removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Jun 15, 2021
@dpcollins-google dpcollins-google merged commit 1749ca9 into master Jun 15, 2021
@dpcollins-google dpcollins-google deleted the errorprone branch June 15, 2021 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: pubsublite Issues related to the googleapis/java-pubsublite API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crashes under runnerv2 due to project tracking semantics
4 participants