Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update wordcount example to use docker-flink image #135 #136

Merged
merged 1 commit into from
Nov 27, 2019

Conversation

tweise
Copy link
Contributor

@tweise tweise commented Nov 25, 2019

No description provided.

flinkVersion: "1.8"
jarName: "wordcount-operator-example-1.0.0-SNAPSHOT.jar"
parallelism: 3
parallelism: 2
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert to 3 before merging.

@@ -7,25 +7,30 @@ metadata:
labels:
environment: development
spec:
image: docker.io/lyft/wordcount-operator-example:{sha}
#image: docker.io/lyft/wordcount-operator-example:{sha}
image: wordcount-operator-example
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert after publishing container.

elif [ "$COMMAND" = "local" ]; then
echo "Starting local cluster"
exec $(drop_privs_cmd) "$FLINK_HOME/bin/jobmanager.sh" start-foreground local
#export FLINK_PROPERTIES=`echo "${OPERATOR_FLINK_CONFIG}" | envsubst`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to propose the addition of envsubst in docker-flink as follow-up.

@tweise
Copy link
Contributor Author

tweise commented Nov 25, 2019

@anandswaminathan below the output from running the example locally:

I also noticed that after the job is finished, deleting the application fails due to attempt to take a savepoint. That should probably be fixed separately.

$ kubectl describe flinkapp wordcount-operator-example -n flink-operator
Name:         wordcount-operator-example
Namespace:    flink-operator
Labels:       environment=development
Annotations:  <none>
API Version:  flink.k8s.io/v1beta1
Kind:         FlinkApplication
Metadata:
  Creation Timestamp:  2019-11-25T23:35:21Z
  Finalizers:
    job.finalizers.flink.k8s.io
  Generation:        2
  Resource Version:  60240
  Self Link:         /apis/flink.k8s.io/v1beta1/namespaces/flink-operator/flinkapplications/wordcount-operator-example
  UID:               3f260ad6-0fdc-11ea-9d76-025000000001
Spec:
  Delete Mode:  None
  Entry Class:  org.apache.flink.WordCount
  Flink Config:
    State . Backend . Fs . Checkpointdir:       file:///checkpoints/flink/checkpoints
    State . Checkpoints . Dir:                  file:///checkpoints/flink/externalized-checkpoints
    State . Savepoints . Dir:                   file:///checkpoints/flink/savepoints
    Taskmanager . Heap . Size:                  200
    Taskmanager . Network . Memory . Fraction:  0.1
    Taskmanager . Network . Memory . Min:       10m
    Web . Upload . Dir:                         /opt/flink
  Flink Version:                                1.8
  Force Rollback:                               false
  Image:                                        wordcount-operator-example
  Jar Name:                                     wordcount-operator-example-1.0.0-SNAPSHOT.jar
  Job Manager Config:
    Env Config:
    Replicas:  1
    Resources:
      Requests:
        Cpu:      100m
        Memory:   200Mi
  Parallelism:    2
  Restart Nonce:  
  Savepoint Info:
  Task Manager Config:
    Env Config:
    Resources:
      Requests:
        Cpu:     100m
        Memory:  200Mi
    Task Slots:  2
Status:
  Cluster Status:
    Available Task Slots:     2
    Health:                   Green
    Healthy Task Managers:    1
    Number Of Task Managers:  1
    Number Of Task Slots:     2
  Deploy Hash:                37c74f2b
  Failed Deploy Hash:         
  Job Status:
    Completed Checkpoint Count:  0
    Entry Class:                 org.apache.flink.WordCount
    Failed Checkpoint Count:     0
    Health:                      Green
    Jar Name:                    wordcount-operator-example-1.0.0-SNAPSHOT.jar
    Job ID:                      576550d82c73f1c3d448820f53c713e0
    Job Restart Count:           0
    Last Checkpoint Time:        <nil>
    Last Failing Time:           <nil>
    Parallelism:                 2
    Restore Path:                
    Restore Time:                <nil>
    Start Time:                  2019-11-25T23:36:02Z
    State:                       FINISHED
  Last Seen Error:               <nil>
  Last Updated At:               2019-11-25T23:36:31Z
  Phase:                         Running
  Retry Count:                   0
  Rollback Hash:                 
Events:
  Type    Reason           Age   From              Message
  ----    ------           ----  ----              -------
  Normal  CreatingCluster  119s  flinkK8sOperator  Creating Flink cluster for deploy 37c74f2b
  Normal  JobSubmitted     79s   flinkK8sOperator  Flink job submitted to cluster with id 576550d82c73f1c3d448820f53c713e0

@anandswaminathan
Copy link
Contributor

@tweise Looks good. Can you also log into the job/taskmanager (You can exec -it bash) and confirm that the config values are set correctly as desired.

@tweise
Copy link
Contributor Author

tweise commented Nov 26, 2019

@anandswaminathan here the appended config:

blob.server.port: 6124
query.server.port: 6125
blob.server.port: 6125
jobmanager.heap.size: 102400k
jobmanager.rpc.port: 6123
jobmanager.web.port: 8081
metrics.internal.query-service.port: 50101
query.server.port: 6124
state.backend.fs.checkpointdir: file:///checkpoints/flink/checkpoints
state.checkpoints.dir: file:///checkpoints/flink/externalized-checkpoints
state.savepoints.dir: file:///checkpoints/flink/savepoints
taskmanager.heap.size: 102400k
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 10m
taskmanager.numberOfTaskSlots: 2
web.upload.dir: /opt/flink

jobmanager.rpc.address: wordcount-operator-example-37c74f2b
taskmanager.host: 10.1.0.127

Copy link
Contributor

@anandswaminathan anandswaminathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once test passes

@tweise tweise closed this Nov 26, 2019
@tweise tweise reopened this Nov 26, 2019
@@ -8,23 +8,27 @@ metadata:
environment: development
spec:
image: docker.io/lyft/wordcount-operator-example:{sha}
deleteMode: None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this added?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The job finishes and the operator tries to take a savepoint. That's a bug that needs to be fixed separately.

For the demo, the application should just be deleted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it has a bounded input. We should probably fix that, as the operator doesn't (intentionally) support jobs that complete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually don't see why the operator shouldn't support such jobs. Created #138 to follow up on this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I mean any support currently in the operator is accidental; we haven't tested this use case or thought deeply about how to support it. Would be good to have better support but it's not a use case internally.

@tweise
Copy link
Contributor Author

tweise commented Nov 27, 2019

FYI, describe still does not present the URL, even when the job is RUNNING and I can access it via http://localhost:8001/api/v1/namespaces/flink-operator/services/wordcount-operator-example:8081/proxy/#/overview

$ kubectl describe flinkapp wordcount-operator-example -n flink-operator
Name:         wordcount-operator-example
Namespace:    flink-operator
Labels:       environment=development
Annotations:  <none>
API Version:  flink.k8s.io/v1beta1
Kind:         FlinkApplication
Metadata:
  Creation Timestamp:  2019-11-27T16:45:41Z
  Finalizers:
    job.finalizers.flink.k8s.io
  Generation:        2
  Resource Version:  107606
  Self Link:         /apis/flink.k8s.io/v1beta1/namespaces/flink-operator/flinkapplications/wordcount-operator-example
  UID:               58deaadd-1135-11ea-9d76-025000000001
Spec:
  Delete Mode:  None
  Entry Class:  org.apache.flink.WordCount
  Flink Config:
    State . Backend . Fs . Checkpointdir:       file:///checkpoints/flink/checkpoints
    State . Checkpoints . Dir:                  file:///checkpoints/flink/externalized-checkpoints
    State . Savepoints . Dir:                   file:///checkpoints/flink/savepoints
    Taskmanager . Heap . Size:                  200
    Taskmanager . Network . Memory . Fraction:  0.1
    Taskmanager . Network . Memory . Min:       10m
    Web . Upload . Dir:                         /opt/flink
  Flink Version:                                1.8
  Force Rollback:                               false
  Image:                                        wordcount-operator-example
  Jar Name:                                     wordcount-operator-example-1.0.0-SNAPSHOT.jar
  Job Manager Config:
    Env Config:
    Replicas:  1
    Resources:
      Requests:
        Cpu:      100m
        Memory:   200Mi
  Parallelism:    2
  Restart Nonce:  
  Savepoint Info:
  Task Manager Config:
    Env Config:
    Resources:
      Requests:
        Cpu:     100m
        Memory:  200Mi
    Task Slots:  2
Status:
  Cluster Status:
    Available Task Slots:     0
    Health:                   Yellow
    Healthy Task Managers:    0
    Number Of Task Managers:  1
    Number Of Task Slots:     2
  Deploy Hash:                37c74f2b
  Failed Deploy Hash:         
  Job Status:
    Completed Checkpoint Count:  0
    Entry Class:                 org.apache.flink.WordCount
    Failed Checkpoint Count:     0
    Health:                      Green
    Jar Name:                    wordcount-operator-example-1.0.0-SNAPSHOT.jar
    Job ID:                      f73eae36a2f86444aecfd82f327a6f48
    Job Restart Count:           0
    Last Checkpoint Time:        <nil>
    Last Failing Time:           <nil>
    Parallelism:                 2
    Restore Path:                
    Restore Time:                <nil>
    Start Time:                  2019-11-27T16:46:04Z
    State:                       RUNNING
  Last Seen Error:               <nil>
  Last Updated At:               2019-11-27T16:46:05Z
  Phase:                         Running
  Retry Count:                   0
  Rollback Hash:                 
Events:
  Type    Reason           Age   From              Message
  ----    ------           ----  ----              -------
  Normal  CreatingCluster  36s   flinkK8sOperator  Creating Flink cluster for deploy 37c74f2b
  Normal  JobSubmitted     13s   flinkK8sOperator  Flink job submitted to cluster with id f73eae36a2f86444aecfd82f327a6f48

@tweise tweise merged commit c2158e7 into lyft:master Nov 27, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants