-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #42] Add initial batch read support for Flink Pravega Connectors #54
Conversation
12a2fe3
to
84c6078
Compare
84c6078
to
bb3089a
Compare
@tzulitai We will be able to handle better end of input once we implement pravega/pravega#1916. If you want this merged before we complete that work, then I suggest we create an issue to track it and fix it later. |
throw new IOException("Failed to read next event.", e); | ||
} | ||
|
||
// TODO this "end of input" marker is too brittle, as the timeout could easily be a temporary hiccup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See pravega/pravega#1916. We can fix this here once we complete the work of that issue.
|
||
private static final long serialVersionUID = 1L; | ||
|
||
private static final long DEFAULT_EVENT_READ_TIMEOUT = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but on the optimistic side, perhaps a value like 10s would suit better. If this is a batch read and all the stream data is there, then we should in the absence of any glitches never time out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout also affects job completion, since we don't have a true EOF. Elsewhere a code comment suggests using a retry to disambiguate EOF, which would be better than an increased timeout.
this.startTime = startTime; | ||
this.readerGroupName = generateRandomReaderGroupName(); | ||
|
||
// TODO: This will require the client to have access to the pravega controller and handle any temporary errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a general suggestion, I like the idea of creating issues for TODOs in the code so that we can track what we need to fix in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested this, looks good. Let's merge!
|
||
private static final long serialVersionUID = 1L; | ||
|
||
private static final long DEFAULT_EVENT_READ_TIMEOUT = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This timeout also affects job completion, since we don't have a true EOF. Elsewhere a code comment suggests using a retry to disambiguate EOF, which would be better than an increased timeout.
I am going to merge this now and ensure that any unfinished aspect is tracked in a follow-up issue. |
Change log description
FlinkPravegaInputFormat
Purpose of the change
Add batch support to the connector. Users can add the
FlinkPravegaInputFormat
as a source to Flink batch jobs.Known pitfalls / limitations
InitializeOnMaster
andFinalizeOnMaster
hooks only currently works forOutputFormat
s, so we can't use them for the one-time reader group creation / deletion.null
on of the last read eventEventRead.getEvent()
as an indicator. We might want to add some retry logic there to strengthen that, and avoid cases where thenull
return was just caused by some timeout caused by temporary read hiccups.How to verify it
Added integration test:
FlinkPravegaInputFormatTest
. The test basically writes some events to Pravega, and uses a batch Flink job to read all written events and verifies that everything is properly read when the batch job finishes.