
Initial support for python executable #165

Merged: 10 commits merged into spotify:master on Dec 22, 2021

Conversation

moonsub-kim (Contributor)

#164

I implemented support for a Python executable in the CRD.

To allow running either a Python file or a JAR file, JobSpec.JarFile is changed to an optional field, and validation ensures that only one of them is specified (sketched just after the field list below).

The updated field documentation becomes:

- **jarFile** (optional): JAR file of the job. It could be a local file or remote URI, depending on which
  protocols (e.g., `https://`, `gs://`) are supported by the Flink image. Either `jarFile` or `python` is required.
- **className** (optional): Fully qualified Java class name of the job.
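
For illustration, a minimal sketch of that mutual-exclusion check, assuming the optional fields are plain string pointers as in the Go snippet reviewed further down; the package and function names here are hypothetical, not the actual validator in this repo:

package v1beta1

import "fmt"

// validateJobEntrypoint is a hypothetical helper, not the repo's actual
// validator: it enforces that exactly one of jarFile / python is set.
func validateJobEntrypoint(jarFile, python *string) error {
	if jarFile == nil && python == nil {
		return fmt.Errorf("either jarFile or python must be specified")
	}
	if jarFile != nil && python != nil {
		return fmt.Errorf("only one of jarFile and python may be specified")
	}
	return nil
}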

@moonsub-kim (Contributor Author)

@regadas could you review it?
thx!

@regadas (Contributor) left a comment:

Thank you for this @moonsub-kim; I'll have a better look at this later today.


// Fully qualified Java class name of the job.
ClassName *string `json:"className,omitempty"`

// Python file of the job.
Python *string `json:"python,omitempty"`
regadas (Contributor):

wdyt of PythonFile inline with JarFile

moonsub-kim (Contributor Author):

That seems good; I'll apply your suggestion.
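
A sketch of what the renamed field could look like, kept in the same optional-pointer style as JarFile; other JobSpec fields are omitted, and the final field name and JSON tag may differ (the example manifest later in this thread uses pythonFile):

package v1beta1

// JobSpec sketch: only the entrypoint-related fields are shown; the real
// struct has many more fields, and the merged names may differ.
type JobSpec struct {
	// JAR file of the job.
	JarFile *string `json:"jarFile,omitempty"`

	// Fully qualified Java class name of the job.
	ClassName *string `json:"className,omitempty"`

	// Python file of the job (renamed from Python per the suggestion above).
	PythonFile *string `json:"pythonFile,omitempty"`
}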

@regadas (Contributor) left a comment:

Overall this looks good! @moonsub-kim Could you add an example to the examples folder?

@moonsub-kim (Contributor Author)

Since Flink does not provide an image that includes Python, users have to build their own Dockerfile, push the image to a Docker registry, and reference it in the FlinkCluster. Instead of uploading an example that contains a temporary Dockerfile, I will write a guide on how the FlinkCluster runs Python without a Beam server.

In my case, the Dockerfile copied from Docker Setup - Enabling Python is uploaded to my Docker Hub registry.
Then I copied config/samples/flinkoperator_v1beta_flinkjobcluster.yaml and changed some lines, including .spec.image.name, as shown below:

apiVersion: flinkoperator.k8s.io/v1beta1
kind: FlinkCluster
metadata:
  name: flinkjobcluster-sample
spec:
  flinkVersion: "1.14"
  image:
    name: gos16052/pink-squirrel:0.1.0 # customized flink image with python
  jobManager:
    accessScope: Cluster
    ports:
      ui: 8081
    resources:
      limits:
        memory: "1024Mi"
        cpu: "200m"
  taskManager:
    replicas: 2
    resources:
      limits:
        memory: "1024Mi"
        cpu: "200m"
  job:
    pythonFile: "apps/example/word_count.py" # The image contains example python file
    parallelism: 2
    restartPolicy: "Never"
  flinkProperties:
    taskmanager.numberOfTaskSlots: "1"
    taskmanager.memory.flink.size: "560mb"
    taskmanager.memory.process.size: "1024mb"

@regadas (Contributor) left a comment:

@moonsub-kim I think it would be cool if we include support for these 2 use cases, wdyt?

1. Run a PyFlink job which will reference Java UDFs or external connectors. The JAR file specified in --jarfile will be uploaded to the cluster:

$ ./bin/flink run \
      --python examples/python/table/word_count.py \
      --jarfile <jarFile>

2. Run a PyFlink job with pyFiles and the main entry module specified in --pyModule:

$ ./bin/flink run \
      --pyModule table.word_count \
      --pyFiles examples/python/table

@moonsub-kim (Contributor Author)

It would be good to support pyModule and pyFiles, in addition to python and jarfile, as arguments. I will implement it.
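
A rough sketch of the extra fields this could add, mirroring the flink run flags above; the field names, types, and JSON tags are illustrative guesses rather than the merged API:

package v1beta1

// JobSpec sketch showing only the proposed flag-related fields; existing
// fields (jarFile, className, pythonFile, args, ...) are omitted.
type JobSpec struct {
	// Python files/packages to ship with the job, passed to flink run as --pyFiles.
	PyFiles *string `json:"pyFiles,omitempty"`

	// Main entry module of the job, passed to flink run as --pyModule.
	PyModule *string `json:"pyModule,omitempty"`
}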

@eddiewang

would this work with Beam python jobs?

@moonsub-kim (Contributor Author) commented Dec 15, 2021

@eddiewang
Sorry, I am not sure about your comment, since I have never used Beam before.
But this PR only extends the argument requirements to support a Python executable, so it will probably work fine with Apache Beam.

@moonsub-kim (Contributor Author)

I have extended the PR to support the additional arguments and added the guide doc.
I also removed support for pulling remote files, since I'm not using images/flink/entrypoint.sh from the repository and so cannot test it.

@moonsub-kim (Contributor Author)

@regadas could you review this?

@regadas (Contributor) left a comment:

LGTM; it would be good to have remote file support too, though. I think we can tackle it in a separate PR.

@moonsub-kim Thanks for this.

regadas changed the title from "Support python executable" to "Initial support for python executable" on Dec 22, 2021
regadas merged commit 5d2cd73 into spotify:master on Dec 22, 2021
wseaton pushed a commit to wseaton/flink-on-k8s-operator that referenced this pull request Jan 17, 2022