Problem description
We have merged an experimental API for batch reads of a stream:
pravega/pravega@493a9f4
The general idea is that a job has access to all segments in parallel, on the assumption that a batch job does not care about the order of events in a stream.
The idea is to incorporate this API into the preliminary batch connector developed in PR #54.
Problem location
Batch connectors.
Suggestions for an improvement
Use the experimental batch read API.
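For illustration, here is a minimal sketch of reading a stream through such a batch API. It assumes the batch-client shape of later Pravega releases (BatchClientFactory, getSegments, readSegment); the names in the experimental commit may differ, and the scope, stream name, and controller URI below are placeholders.

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.UTF8StringSerializer;

import java.net.URI;
import java.util.Iterator;

public class BatchReadSketch {
    public static void main(String[] args) {
        // Placeholder scope/stream/controller values.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))
                .build();
        try (BatchClientFactory batchClient = BatchClientFactory.withScope("myScope", config)) {
            // Enumerate every segment range of the stream, unconstrained at both ends.
            Iterator<SegmentRange> ranges = batchClient
                    .getSegments(Stream.of("myScope", "myStream"),
                                 StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                    .getIterator();
            while (ranges.hasNext()) {
                // Read one segment range at a time; a real job would do this in parallel.
                try (SegmentIterator<String> events =
                             batchClient.readSegment(ranges.next(), new UTF8StringSerializer())) {
                    while (events.hasNext()) {
                        System.out.println(events.next());
                    }
                }
            }
        }
    }
}

Because each SegmentRange can be read independently, a parallel job can hand one range to each subtask instead of funneling all reads through a single reader group.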
@tzulitai thanks again for picking this up. I think it is quite urgent, because we're finding that the current implementation is not workable. As we saw with #77, the reader group imposes a single pass over the data, which is (I believe) a violation of the InputFormat contract.
Please ensure that the solution is compatible with Flink iterations and with multiple executions of the same plan. For example, the following should work:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSource<Integer> source = env.createInput(new FlinkPravegaInputFormat<>(...));
// Each count() triggers a separate execution, i.e. a fresh pass over the stream.
Assert.assertEquals("count is incorrect (first pass)", expectedCount, source.count());
Assert.assertEquals("count is incorrect (second pass)", expectedCount, source.count());
I find that a single pass does suffice for most scenarios within a single job, thanks to intermediate result caching; the exception is when a failure occurs and the data must be re-read, as discussed in #56.
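One way to satisfy the InputFormat contract is to expose one Flink input split per Pravega segment range, so that createInputSplits() re-enumerates the stream on every execution and a second pass or an iteration naturally re-reads the data. The sketch below rests on the same assumptions about the Pravega batch API as above; SegmentBatchInputFormat is a hypothetical name, not the connector's actual FlinkPravegaInputFormat.

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SegmentBatchInputFormat
        extends RichInputFormat<String, SegmentBatchInputFormat.SegmentSplit> {

    // One split per Pravega segment range; SegmentRange is assumed Serializable.
    public static class SegmentSplit implements InputSplit {
        private final int splitNumber;
        final SegmentRange range;
        SegmentSplit(int splitNumber, SegmentRange range) {
            this.splitNumber = splitNumber;
            this.range = range;
        }
        @Override
        public int getSplitNumber() {
            return splitNumber;
        }
    }

    private final URI controllerUri = URI.create("tcp://localhost:9090"); // placeholder

    private transient BatchClientFactory client;
    private transient SegmentIterator<String> events;

    private ClientConfig clientConfig() {
        return ClientConfig.builder().controllerURI(controllerUri).build();
    }

    @Override
    public void configure(Configuration parameters) {
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
        return cachedStatistics;
    }

    @Override
    public SegmentSplit[] createInputSplits(int minNumSplits) {
        // Called once per execution of the job, so a second source.count()
        // or a Flink iteration simply re-enumerates the segments.
        List<SegmentSplit> splits = new ArrayList<>();
        try (BatchClientFactory factory = BatchClientFactory.withScope("myScope", clientConfig())) {
            Iterator<SegmentRange> it = factory
                    .getSegments(Stream.of("myScope", "myStream"),
                                 StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                    .getIterator();
            int i = 0;
            while (it.hasNext()) {
                splits.add(new SegmentSplit(i++, it.next()));
            }
        }
        return splits.toArray(new SegmentSplit[0]);
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(SegmentSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override
    public void open(SegmentSplit split) {
        // Each split reads exactly one segment range through the batch client.
        client = BatchClientFactory.withScope("myScope", clientConfig());
        events = client.readSegment(split.range, new UTF8StringSerializer());
    }

    @Override
    public boolean reachedEnd() {
        return !events.hasNext();
    }

    @Override
    public String nextRecord(String reuse) {
        return events.next();
    }

    @Override
    public void close() {
        if (events != null) events.close();
        if (client != null) client.close();
    }
}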