Use Pravega batch read API with batch connector #65

Closed
fpj opened this issue Nov 2, 2017 · 1 comment · Fixed by #90

fpj (Contributor) commented Nov 2, 2017

Problem description
We have merged an experimental API for batch reads of a stream:

pravega/pravega@493a9f4

The general idea is that a batch job gets access to all of a stream's segments in parallel, on the assumption that batch jobs do not care about the order of events in the stream.

This issue proposes incorporating that API into the preliminary batch connector developed in PR #54.

Problem location
Batch connectors.

Suggestions for an improvement
Use the experimental batch read API
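For reference, here is a minimal sketch of what a full-stream batch read looks like. It is written against the batch API as it later stabilized in the Pravega client (BatchClientFactory, SegmentRange, SegmentIterator); the experimental API linked above exposes the same idea through a different entry point, and the scope, stream name, and controller URI below are placeholders.

import java.net.URI;

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.JavaSerializer;

public class BatchReadSketch {

    public static void main(String[] args) {
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))   // placeholder controller URI
                .build();

        try (BatchClientFactory batchClient = BatchClientFactory.withScope("myScope", config)) {
            // Enumerate every segment of the stream; ordering across segments is undefined,
            // which is exactly the relaxation the batch connector can exploit.
            batchClient.getSegments(Stream.of("myScope", "myStream"),
                                    StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                       .getIterator()
                       .forEachRemaining(range -> readSegment(batchClient, range));
        }
    }

    private static void readSegment(BatchClientFactory batchClient, SegmentRange range) {
        // Each segment range can be read independently from start to end, and therefore in parallel.
        try (SegmentIterator<String> events = batchClient.readSegment(range, new JavaSerializer<String>())) {
            while (events.hasNext()) {
                System.out.println(events.next());
            }
        }
    }
}

Because each SegmentRange can be handed to a different worker, the natural read parallelism of a batch job becomes the number of segments rather than the size of a reader group.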

EronWright (Contributor) commented Dec 8, 2017

@tzulitai thanks again for picking this up. I think it is quite urgent, because we're finding that the current implementation is not workable. As we saw with #77, the reader group imposes a single pass over the data, which is (I believe) a violation of the InputFormat contract.

Please ensure that the solution is compatible with Flink iteration and with multiple executions. For example, the following should work:

DataSource<Integer> source = env.createInput(new FlinkPravegaInputFormat<>(...));
// each count() triggers a separate execution, so the format must support repeated passes over the stream
Assert.assertEquals("count is incorrect (first pass)", expectedCount, source.count());
Assert.assertEquals("count is incorrect (second pass)", expectedCount, source.count());

Within a single job, I find that a single pass does suffice for most scenarios; the exception is when a failure occurs, as discussed in #56, because of intermediate result caching.
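To make that concrete, the sketch below (not the code that eventually landed in PR #90) shows how mapping one input split to one Pravega segment range can satisfy the InputFormat contract: createInputSplits() is invoked for every execution, so both count() calls above see the full stream, and a failed split can simply be re-read from the start of its segment range. The names PravegaBatchInputFormat and PravegaSplit are invented for this sketch, and SegmentRange is assumed to be serializable.

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Serializer;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;

public class PravegaBatchInputFormat<T> extends RichInputFormat<T, PravegaBatchInputFormat.PravegaSplit> {

    // One input split per Pravega segment range.
    public static class PravegaSplit implements InputSplit {
        private final int splitNumber;
        private final SegmentRange range;   // assumed Serializable for this sketch

        PravegaSplit(int splitNumber, SegmentRange range) {
            this.splitNumber = splitNumber;
            this.range = range;
        }

        @Override
        public int getSplitNumber() { return splitNumber; }
    }

    private final String scope;
    private final String streamName;
    private final ClientConfig clientConfig;
    private final Serializer<T> serializer;

    private transient BatchClientFactory batchClient;
    private transient SegmentIterator<T> events;

    public PravegaBatchInputFormat(String scope, String streamName,
                                   ClientConfig clientConfig, Serializer<T> serializer) {
        this.scope = scope;
        this.streamName = streamName;
        this.clientConfig = clientConfig;
        this.serializer = serializer;
    }

    @Override
    public void configure(Configuration parameters) { }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) { return cachedStatistics; }

    @Override
    public PravegaSplit[] createInputSplits(int minNumSplits) {
        // Called again for every execution, so a second source.count() re-enumerates the
        // segments instead of hitting an exhausted reader group.
        List<PravegaSplit> splits = new ArrayList<>();
        try (BatchClientFactory factory = BatchClientFactory.withScope(scope, clientConfig)) {
            factory.getSegments(Stream.of(scope, streamName), StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                   .getIterator()
                   .forEachRemaining(range -> splits.add(new PravegaSplit(splits.size(), range)));
        }
        return splits.toArray(new PravegaSplit[0]);
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(PravegaSplit[] inputSplits) {
        return new DefaultInputSplitAssigner(inputSplits);
    }

    @Override
    public void open(PravegaSplit split) {
        // A fresh client and iterator per split keeps the sketch simple; each split always
        // reads its segment range from the beginning, so re-execution and retries just work.
        batchClient = BatchClientFactory.withScope(scope, clientConfig);
        events = batchClient.readSegment(split.range, serializer);
    }

    @Override
    public boolean reachedEnd() { return !events.hasNext(); }

    @Override
    public T nextRecord(T reuse) { return events.next(); }

    @Override
    public void close() {
        if (events != null) { events.close(); }
        if (batchClient != null) { batchClient.close(); }
    }
}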
