There are very few documentation on how to run Apache Beam with Flink Runner, especially for how to configure the setup. As a result, I'd like to provide an example on how we set up our infrastructure.
The example contains 2 approaches:
-
Kubernetes: please ensure you have the official flink operator installed on your kubernetes cluster in order to run this.
-
Docker Compose: please ensure you have docker-compose installed
In our use cases, the kubernetes is used for production, the docker-compose is for local testing/development.
-
Beam Staging Artifacts: Although Beam supports using S3 as staging artifacts. However, it seems like if we are using x-language support, the job server will not honor that settings. As a result, this example set up another PVC for job manager and task manager.
-
Separate Python Harness SDK Harness to its own container. It seems hard to control and evaluate the python usage, therefore we decide to separate the python harness so that we can just configure the container resources to control the memory assigned to python code.
- first start the cluster:
docker-compose -f docker-compose.yaml up [-d]
- Trigger the app
python example.py \
--topic test --group test-group --bootstrap-server host.docker.internal:9092 \
--job_endpoint host.docker.internal:8099 \
--artifact_endpoint host.docker.internal:8098 \
--environment_type=EXTERNAL \
--environment_config=host.docker.internal:50000
- The example is tested in a m1 laptop. The issue for docker on mac is that I cannot use
host
and thus have to use the workaround withhost.docker.internal
to point to the local machine port. To ensure this work properly so that you can submit the job locally, please update/etc/hosts
with extra line:
127.0.0.1 host.docker.internal
- Also for M1 (or other apple chips), please ensure to enable docker settings
Use Rosetta for x86_64/amd64 emulation on Apple Silicon
-
Ensure you have installed the flink operator
-
If you're not using Docker desktop, make sure you build the image and upload the image to the repo that your k8s can access.
docker build --platform linux/amd64 -t example_image:v1.0 docker/
- If you've updated the docker image location, make sure you update the
image
in k8s.yaml before apply it
kubectl apply -f k8s.yaml