
[Sample] update XGBoost sample #2220

Merged 27 commits into kubeflow:master on Oct 15, 2019

Conversation

@numerology (Author) commented Sep 24, 2019

Update the XGBoost sample to adopt the new GCP components (a brief sketch of the component-loading pattern follows the TODO list).
TODO:

  • Wait for the updated create_cluster component.
  • Add back visualization; planned for a future PR, since it requires changes to the components.

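For context, a minimal sketch of how a GCP Dataproc component is loaded from its published component.yaml (the URL below is illustrative, not the exact release pinned by this PR):

from kfp import components

# Illustrative URL -- the sample pins a specific release of the Dataproc
# submit_pyspark_job component; the exact path/tag here is an assumption.
dataproc_submit_pyspark_job_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/master/'
    'components/gcp/dataproc/submit_pyspark_job/component.yaml')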

@numerology (Author)

cc @gaoning777
One nit question: is there any way to specify or modify the component name after loading a component from a URL? Right now four steps in the pipeline are named 'dataproc_submit_(py)spark_job', which is not very informative.

@numerology (Author)

/test kubeflow-pipeline-sample-test

@gaoning777 (Contributor)

> cc @gaoning777
> One nit question: is there any way to specify or modify the component name after loading a component from a URL? Right now four steps in the pipeline are named 'dataproc_submit_(py)spark_job', which is not very informative.

@Ark-kun do you know if there is a way to change the name in the task factory?

Merge commit: …enew-xgboost-sample (conflicts in samples/core/xgboost_training_cm/xgboost_training_cm.py)

@Ark-kun (Contributor) commented Oct 11, 2019

> Is there any way to specify or modify the component name after loading a component from a URL? Right now four steps in the pipeline are named 'dataproc_submit_(py)spark_job', which is not very informative.
>
> @Ark-kun do you know if there is a way to change the name in the task factory?

Yes! You can use set_display_name:

my_task = my_op(...).set_display_name('Custom name')
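
A slightly fuller, illustrative sketch of how the four 'dataproc_submit_(py)spark_job' tasks could each get a distinct display name (the *_op factories and *_args dictionaries are placeholders, not this sample's actual code):

# Each task keeps its component name but shows a readable name in the UI.
analyze_task = dataproc_submit_pyspark_job_op(**analyze_args).set_display_name('Analyze')
transform_task = dataproc_submit_pyspark_job_op(**transform_args).set_display_name('Transform')
train_task = dataproc_submit_spark_job_op(**train_args).set_display_name('Train XGBoost')
predict_task = dataproc_submit_spark_job_op(**predict_args).set_display_name('Predict')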

schema=schema,
train_data=train_data,
output=output_template
).after(create_cluster_op).apply(gcp.use_gcp_secret('user-gcp-sa'))
@Ark-kun (Contributor), Oct 11, 2019:

It might be nicer to apply the secret to all ops in one line at the end of the pipeline function:

kfp.dsl.get_pipeline_conf().add_op_transformer(use_gcp_secret('user-gcp-sa'))
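
As a sketch, that op_transformer approach looks roughly like this inside the pipeline function (the pipeline name is made up and the individual tasks are elided placeholders):

from kfp import dsl, gcp

@dsl.pipeline(name='xgboost-trainer')
def xgb_train_pipeline(project='my-project', region='us-central1'):
    # ... create_cluster / analyze / transform / train / predict tasks go here ...
    # One transformer applies the GCP service-account secret to every op,
    # replacing the per-task .apply(gcp.use_gcp_secret('user-gcp-sa')) calls.
    dsl.get_pipeline_conf().add_op_transformer(gcp.use_gcp_secret('user-gcp-sa'))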

Contributor:

Good point

target=target,
analysis=analyze_output,
output=output_template
).after(analyze_op).apply(gcp.use_gcp_secret('user-gcp-sa'))
Contributor:

Manual execution-order control (.after) is usually not needed, since task dependencies should normally be data dependencies. Needing it can be a sign of component or pipeline design issues.

@numerology (Author):

Agreed. But currently we're lacking a way for those GCP components to output arbitrary artifacts that downstream steps can consume, so this is just a workaround.
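
To illustrate the trade-off being discussed (a generic sketch, not code from this PR; task and argument names are made up):

# Preferred: a data dependency orders the tasks implicitly, because the
# downstream task consumes an output of the upstream one.
transform_task = transform_op(analysis=analyze_task.outputs['output'])

# Workaround used here: the Dataproc components don't expose such an output,
# so the ordering has to be declared by hand with .after().
transform_task = transform_op(analysis=analyze_output_uri).after(analyze_task)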

analyze_op.output,
output_template
).apply(gcp.use_gcp_secret('user-gcp-sa'))
project=project,
Contributor:

This is a good change. Explicitly named arguments make the pipeline robust against signature changes.
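
For illustration (argument names are placeholders), keyword arguments keep the call correct even if the component's parameter order changes, whereas positional calls can silently bind values to the wrong inputs:

# Fragile: breaks or mis-binds if the component reorders its inputs.
train_task = train_op(project, region, cluster_name, train_data)

# Robust: each value is tied to the input name declared in component.yaml.
train_task = train_op(
    project=project,
    region=region,
    cluster_name=cluster_name,
    train_data=train_data,
)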

@numerology (Author)

/hold cancel

region='us-central1',
train_data='gs://ml-pipeline-playground/sfpd/train.csv',
eval_data='gs://ml-pipeline-playground/sfpd/eval.csv',
schema='gs://ml-pipeline-playground/sfpd/schema.json',
target='resolution',
rounds=200,
rounds=5,
@gaoning777 (Contributor), Oct 14, 2019:

If this change is for fast testing, could you set it in the test config?

@gaoning777 (Contributor)

/lgtm
/approve

@numerology (Author)

Memo: since the new Dataproc components no longer support custom output, visualization components like the confusion matrix and ROC curve are no longer available. See #2177
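
For background on what that custom output enables: the confusion-matrix and ROC visualizations appear when a step writes a KFP UI metadata file, which the new Dataproc components cannot do. A generic sketch of that mechanism (not this sample's code; the GCS path is a placeholder):

import json

# Written inside the step's container; the KFP UI reads this file and
# renders a confusion-matrix visualization from the referenced CSV.
metadata = {
    'outputs': [{
        'type': 'confusion_matrix',
        'format': 'csv',
        'schema': [
            {'name': 'target', 'type': 'CATEGORY'},
            {'name': 'predicted', 'type': 'CATEGORY'},
            {'name': 'count', 'type': 'NUMBER'},
        ],
        'source': 'gs://my-bucket/confusion_matrix.csv',  # placeholder
    }]
}
with open('/mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(metadata, f)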

@gaoning777 (Contributor)

/hold
Moving the public dataset.

@k8s-ci-robot removed the lgtm label on Oct 14, 2019
@numerology (Author)

> /hold
> Moving the public dataset.

Done.

@numerology (Author)

/retest

@numerology (Author)

> /hold
> Moving the public dataset.

Just remembered: another to-do is to make sure it's compatible with the post-submit tests as well.

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: gaoning777

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gaoning777 (Contributor)

/hold cancel

@gaoning777 (Contributor)

/lgtm

@numerology (Author)

Auto-merging does not seem to work. Merged manually.

@numerology merged commit dbac974 into kubeflow:master on Oct 15, 2019
@numerology deleted the renew-xgboost-sample branch on Oct 15, 2019, 15:03
"""

# Remove existing [output]/train and [output]/eval if they exist.
delete_directory_from_gcs(os.path.join(output, 'train'))
Contributor:

Are you sure this is going to work?

delete_directory_from_gcs is not a component, so it won't be executed as part of the pipeline.
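
For reference, one way cleanup like this could run as an actual pipeline step (not what the sample does; everything below is a hedged sketch with assumed names) is to wrap the function in a lightweight Python component:

from kfp import components

def delete_gcs_directory(dir_uri: str):
    """Delete every blob under a gs://<bucket>/<prefix> 'directory'."""
    # Imported inside the function so the generated component only needs this package.
    from google.cloud import storage
    client = storage.Client()
    bucket_name, _, prefix = dir_uri[len('gs://'):].partition('/')
    for blob in client.bucket(bucket_name).list_blobs(prefix=prefix):
        blob.delete()

# func_to_container_op turns the function into a component, so the deletion
# runs as its own pipeline step instead of only at pipeline-compile time.
delete_gcs_directory_op = components.func_to_container_op(
    delete_gcs_directory,
    packages_to_install=['google-cloud-storage'],
)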
