Problem description
We have merged an experimental API for batch reads of a stream:
pravega/pravega@493a9f4
The general idea is that a job has access to all segments in parallel, on the assumption that a batch job does not care about the order of events in a stream.
The idea is to incorporate this API into the preliminary batch connector developed in PR #54.
Problem location
Batch connectors.
Suggestions for an improvement
Use the experimental batch read API.
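For illustration, here is a minimal sketch of reading a stream through such a batch API. It assumes the batch-client shape of later Pravega releases (BatchClientFactory, getSegments, readSegment); the names in the experimental commit may differ, and the scope, stream name, and controller URI below are placeholders.

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.UTF8StringSerializer;

import java.net.URI;
import java.util.Iterator;

public class BatchReadSketch {
    public static void main(String[] args) {
        // Placeholder scope/stream/controller values.
        ClientConfig config = ClientConfig.builder()
                .controllerURI(URI.create("tcp://localhost:9090"))
                .build();
        try (BatchClientFactory batchClient = BatchClientFactory.withScope("myScope", config)) {
            // Enumerate every segment range of the stream, unconstrained at both ends.
            Iterator<SegmentRange> ranges = batchClient
                    .getSegments(Stream.of("myScope", "myStream"),
                                 StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                    .getIterator();
            while (ranges.hasNext()) {
                // Read one segment range at a time; a real job would do this in parallel.
                try (SegmentIterator<String> events =
                             batchClient.readSegment(ranges.next(), new UTF8StringSerializer())) {
                    while (events.hasNext()) {
                        System.out.println(events.next());
                    }
                }
            }
        }
    }
}

Because each SegmentRange can be read independently, a parallel job can hand one range to each subtask instead of funneling all reads through a single reader group.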
@tzulitai thanks again for picking this up. I think it is quite urgent, because we're finding that the current implementation is not workable. As we saw with #77, the reader group imposes a single pass over the data, which is (I believe) a violation of the InputFormat contract.
Please ensure that the solution is compatible with Flink iterations and with multiple executions of the same plan. For example, the following should work:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
DataSource<Integer> source = env.createInput(new FlinkPravegaInputFormat<>(...));
// Each count() triggers a separate execution, i.e. a fresh pass over the stream.
Assert.assertEquals("count is incorrect (first pass)", expectedCount, source.count());
Assert.assertEquals("count is incorrect (second pass)", expectedCount, source.count());
I find that a single pass does suffice for most scenarios within a single job, thanks to intermediate result caching; the exception is when a failure occurs and the data must be re-read, as discussed in #56.
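One way to satisfy the InputFormat contract is to expose one Flink input split per Pravega segment range, so that createInputSplits() re-enumerates the stream on every execution and a second pass or an iteration naturally re-reads the data. The sketch below rests on the same assumptions about the Pravega batch API as above; SegmentBatchInputFormat is a hypothetical name, not the connector's actual FlinkPravegaInputFormat.

import io.pravega.client.BatchClientFactory;
import io.pravega.client.ClientConfig;
import io.pravega.client.batch.SegmentIterator;
import io.pravega.client.batch.SegmentRange;
import io.pravega.client.stream.Stream;
import io.pravega.client.stream.StreamCut;
import io.pravega.client.stream.impl.UTF8StringSerializer;
import org.apache.flink.api.common.io.DefaultInputSplitAssigner;
import org.apache.flink.api.common.io.RichInputFormat;
import org.apache.flink.api.common.io.statistics.BaseStatistics;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.core.io.InputSplit;
import org.apache.flink.core.io.InputSplitAssigner;

import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SegmentBatchInputFormat
        extends RichInputFormat<String, SegmentBatchInputFormat.SegmentSplit> {

    // One split per Pravega segment range; SegmentRange is assumed Serializable.
    public static class SegmentSplit implements InputSplit {
        private final int splitNumber;
        final SegmentRange range;
        SegmentSplit(int splitNumber, SegmentRange range) {
            this.splitNumber = splitNumber;
            this.range = range;
        }
        @Override
        public int getSplitNumber() {
            return splitNumber;
        }
    }

    private final URI controllerUri = URI.create("tcp://localhost:9090"); // placeholder

    private transient BatchClientFactory client;
    private transient SegmentIterator<String> events;

    private ClientConfig clientConfig() {
        return ClientConfig.builder().controllerURI(controllerUri).build();
    }

    @Override
    public void configure(Configuration parameters) {
    }

    @Override
    public BaseStatistics getStatistics(BaseStatistics cachedStatistics) {
        return cachedStatistics;
    }

    @Override
    public SegmentSplit[] createInputSplits(int minNumSplits) {
        // Called once per execution of the job, so a second source.count()
        // or a Flink iteration simply re-enumerates the segments.
        List<SegmentSplit> splits = new ArrayList<>();
        try (BatchClientFactory factory = BatchClientFactory.withScope("myScope", clientConfig())) {
            Iterator<SegmentRange> it = factory
                    .getSegments(Stream.of("myScope", "myStream"),
                                 StreamCut.UNBOUNDED, StreamCut.UNBOUNDED)
                    .getIterator();
            int i = 0;
            while (it.hasNext()) {
                splits.add(new SegmentSplit(i++, it.next()));
            }
        }
        return splits.toArray(new SegmentSplit[0]);
    }

    @Override
    public InputSplitAssigner getInputSplitAssigner(SegmentSplit[] splits) {
        return new DefaultInputSplitAssigner(splits);
    }

    @Override
    public void open(SegmentSplit split) {
        // Each split reads exactly one segment range through the batch client.
        client = BatchClientFactory.withScope("myScope", clientConfig());
        events = client.readSegment(split.range, new UTF8StringSerializer());
    }

    @Override
    public boolean reachedEnd() {
        return !events.hasNext();
    }

    @Override
    public String nextRecord(String reuse) {
        return events.next();
    }

    @Override
    public void close() {
        if (events != null) events.close();
        if (client != null) client.close();
    }
}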