
[BEAM-5724] [STRM-1519] Create a fixed number of sdk_worker processes #10

Merged
merged 5 commits on Oct 24, 2018

Conversation


@mwylde mwylde commented Oct 13, 2018

Changes the behavior of --sdk-worker-parallelism=stage to create only a fixed number of SDK workers (currently hard-coded to 16).
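
The fixed-pool idea described above can be sketched as follows: instead of spawning one SDK worker per stage, incoming work is routed to one of a fixed number of long-lived worker slots. This is a minimal, self-contained illustration; the names WorkerPool and assignSlot are hypothetical and are not Beam APIs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch only: route work to a fixed number of worker slots
// instead of creating a new SDK worker per stage.
class WorkerPool {
    private final int parallelism;          // e.g. the hard-coded 16 from this PR
    private final AtomicLong counter = new AtomicLong();

    WorkerPool(int parallelism) {
        this.parallelism = parallelism;
    }

    // Round-robin assignment of incoming work to one of `parallelism` slots.
    int assignSlot() {
        return (int) (counter.getAndIncrement() % parallelism);
    }
}
```

With parallelism 16, the pool cycles through slots 0..15 and then wraps around, so the number of worker processes stays fixed regardless of how many stages the pipeline has.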


Follow this checklist to help us incorporate your contribution quickly and easily:

  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

It will help us expedite review of your Pull Request if you tag someone (e.g. @username) to look at it.

Post-Commit Tests Status (on master branch): build-status badge matrix for the Go, Java, and Python SDKs across the Apex, Dataflow, Flink, Gearpump, Samza, and Spark runners (badges omitted).

@mwylde mwylde changed the title Micah one worker per task [BEAM-5724] [STRM-1519] Create a fixed number of sdk_worker processes Oct 13, 2018

mwylde commented Oct 13, 2018

cc @tweise. I wanted to send this out internally first for feedback on the overall approach before sending it to Beam.

tweise pushed a commit that referenced this pull request Oct 17, 2018
@mwylde mwylde changed the base branch from release-2.8.0-lyft to release-2.9.0-lyft October 17, 2018 18:50
@@ -90,9 +90,9 @@ String getFlinkMasterUrl() {
   name = "--sdk-worker-parallelism",
   usage = "Default parallelism for SDK worker processes (see portable pipeline options)"
 )
-String sdkWorkerParallelism = PortablePipelineOptions.SDK_WORKER_PARALLELISM_PIPELINE;
+private Long sdkWorkerParallelism = 1L;
Don't change the modifiers (that was a recent change in master that affects several fields).

@Description(
"SDK worker/harness process parallelism. Currently supported options are "

Description that explains the option and default is required.

mwylde (Author) replied:
Added default information

@@ -67,19 +74,47 @@ public void close() throws Exception {
jobBundleFactory.close();
}

-enum ReferenceCountingFactory implements Factory {
-  REFERENCE_COUNTING;
+private static class JobFactoryState {
This doesn't work across jobs because they don't share the same class loader. It shouldn't be necessary, because the job factory isn't shared between jobs, only within a job.

What is the intention?

@mwylde mwylde Oct 22, 2018

I'm not sure what the best option is here. It's true that the map that holds these should only ever have a single instance (corresponding to one job). However that's not how the APIs for FlinkExecutableStageContext.Factory are set up. We have to implement a get(JobInfo) method which could (in theory) get passed different JobInfos.

Since we get the configuration for the JobFactoryState (i.e., the parallelism) from the JobInfo, it's not clear how we'd handle that case without having support for multiple jobs. I think the only other alternative would be to throw an exception if it's called with a different JobInfo (or refactor the Factory interface).

My preference would be to leave the code as is along with a comment explaining this.
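
The situation described above — a factory that should only ever hold one job's state, but whose get(JobInfo) API could in theory be called with different jobs — can be sketched with a per-job map plus reference counting. This is a hedged sketch under stated assumptions: PerJobStateRegistry and JobState are illustrative stand-ins, not Beam's actual FlinkExecutableStageContext.Factory or JobFactoryState classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of per-job factory state with reference counting.
class PerJobStateRegistry {
    // Stand-in for a JobFactoryState: tracks how many stages hold a reference.
    static class JobState {
        final AtomicInteger refCount = new AtomicInteger();
        final long parallelism;
        JobState(long parallelism) { this.parallelism = parallelism; }
    }

    private final Map<String, JobState> states = new ConcurrentHashMap<>();

    // In practice this map should only ever see one job id (the factory is
    // per-job), but the API can't guarantee it, so we key by job id anyway.
    JobState acquire(String jobId, long parallelism) {
        JobState state = states.computeIfAbsent(jobId, id -> new JobState(parallelism));
        state.refCount.incrementAndGet();
        return state;
    }

    // Release a reference; tear the state down when the last holder closes.
    void release(String jobId) {
        JobState state = states.get(jobId);
        if (state != null && state.refCount.decrementAndGet() == 0) {
            states.remove(jobId);
        }
    }

    boolean hasState(String jobId) { return states.containsKey(jobId); }
}
```

Keying by job id sidesteps the "different JobInfos" question: repeated acquire calls for the same job share one state, and a second job would simply get its own entry rather than an exception.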

Sounds good. The jobInfo needs to be passed on because it is needed to construct the final context, not to identify the job.

-@Nullable
-String getSdkWorkerParallelism();
+Long getSdkWorkerParallelism();
We should have a magic value to indicate auto scale.

@mwylde mwylde Oct 22, 2018

Added 0 as an auto-scale value which uses num_cores - 1.
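
The auto-scale behavior described above (0 meaning num_cores - 1) can be resolved with a small helper. This is an illustrative sketch, not the actual Beam code; ParallelismResolver is a hypothetical name.

```java
// Sketch of resolving the configured parallelism, where 0 is the
// "auto-scale" magic value discussed in this thread (num_cores - 1).
class ParallelismResolver {
    static long resolve(long configured) {
        if (configured == 0) {
            // Auto-scale: use one worker per core, leaving one core free,
            // but never fewer than one worker.
            return Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        }
        return configured;
    }
}
```

A positive configured value passes through unchanged, so the hard-coded 16 from the PR description and the auto-scale default can share one code path.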


tweise commented Oct 19, 2018

We should also add test coverage; see ReferenceCountingFlinkExecutableStageContextFactoryTest.


mwylde commented Oct 23, 2018

Updated for review comments + added a test.


@tweise tweise left a comment


Looks good. There are some minor check violations, such as a missing license header, that can be dealt with when opening a PR against Beam.
