Add Katib Pipeline Step #77

mbu93 · 2020-05-14T07:23:46Z

- A Katib component was added to pipeline.py
- The number of layers and units are the initial tuning parameters
- train.py, nlp.py and data.py have been adapted to be compatible with the new step
- The CLI has been extended

sascha-bot · 2020-05-14T07:23:50Z

Hey @mbu93 👋,

I predicted that this release note qualifies as kind/bug to 16.12%.

A release note with the kind/bug needs a prediction rate with at least 60%.

sascha-bot · 2020-05-14T07:24:01Z

Hi @mbu93. Thanks for your PR.

I'm waiting for a kubernetes-analysis member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sascha-bot · 2020-05-14T07:24:35Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbu93
To complete the pull request process, please assign saschagrunert
You can assign the PR to them by writing /assign @saschagrunert in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

saschagrunert · 2020-05-14T07:27:56Z

Thank you for the PR, please rebase your changes on top of the latest master branch.

saschagrunert · 2020-05-14T07:28:08Z

/hold

saschagrunert

Let me check if the trigger plugin is correctly configured. This is the first time an external PR comes in.

saschagrunert · 2020-05-14T07:39:54Z

src/pipeline.py

@@ -201,6 +227,7 @@ def container(
                file_outputs[k] = out
                output_artifact_copy_args += dedent("""
                    mkdir -p {d}
+                    echo copying outputs


Suggested change

echo copying outputs

saschagrunert · 2020-05-14T07:40:40Z

src/pipeline.py

+        # ssh_key = "ssh-key"
+        # ctr.add_volume(
+        #     k8s.V1Volume(name=ssh_key,
+        #                  secret=k8s.V1SecretVolumeSource(default_mode=0o600,
+        #                                                  secret_name=ssh_key)))
+        # ctr.container.add_volume_mount(
+        #     k8s.V1VolumeMount(name=ssh_key,
+        #                       read_only=True,
+        #                       mount_path="/root/.ssh"))


This will be needed to work on the master branch

saschagrunert · 2020-05-14T07:41:40Z

/ok-to-test

saschagrunert · 2020-05-14T07:44:06Z

Prow is not able to apply the patch, please rebase and squash your commits into a single one via git rebase -i master.

saschagrunert · 2020-05-14T13:36:47Z

/ok-to-test

saschagrunert · 2020-05-14T13:38:57Z

The pipeline within this commit is not up to date. Please run make pipeline and commit your changes.

mbu93 · 2020-05-14T13:41:53Z

Should I take care of the linting errors as well? What standard do you use? Pylint pep8?

sascha-bot · 2020-05-14T13:45:37Z

@mbu93: The following test failed, say /retest to rerun all failed tests:

Test name	Commit	Details	Rerun command
lint	`be7d04a`	link	`/test lint`

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

saschagrunert · 2020-05-14T13:45:59Z

> Should I take care of the linting errors as well? What standard do you use? Pylint pep8?

make lint executes the ci/lint script which runs with the provided requirements.txt, aka within the quay.io/saschagrunert/kubernetes-analysis:latest container image

mbu93 · 2020-05-14T13:48:22Z

> > Should I take care of the linting errors as well? What standard do you use? Pylint pep8?
>
> make lint executes the ci/lint script which runs with the provided requirements.txt, aka within the quay.io/saschagrunert/kubernetes-analysis:latest container image

Awesome!

saschagrunert · 2020-05-14T13:57:04Z

The workflow is marked as unschedulable because the PVC does not exist:

data/pipeline.yaml

saschagrunert · 2020-05-15T18:19:58Z

/wip

saschagrunert · 2020-05-15T18:20:37Z

/hold cancel

saschagrunert · 2020-05-15T18:20:45Z

/hold

saschagrunert · 2020-05-16T09:35:20Z

src/katib.py

+    def create(self):
+        self.op = dedent("""
+        rm -rf {outdir}/{repo}
+        git clone https://github.com/saschagrunert/{repo} {outdir}/{repo}


I recommend speeding up the clone with --depth=1 and just applying the patch (see pipeline.py).

We also have to clone into a unique directory on the PV to make parallel runs possible.
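
A minimal sketch of what these two suggestions could look like together, assuming the step keeps building its shell script with dedent. The helper name clone_script, the run_id parameter and the directory naming scheme are hypothetical and not taken from the PR; applying the patch is not covered here.

```python
import uuid
from textwrap import dedent


def clone_script(outdir: str, repo: str, run_id: str = "") -> str:
    """Build a shell snippet that shallow-clones `repo` into a per-run directory."""
    # A random suffix (assumption) keeps parallel runs from sharing one checkout
    # on the persistent volume.
    run_id = run_id or uuid.uuid4().hex
    target = "{}/{}-{}".format(outdir, repo, run_id)
    # --depth=1 fetches only the most recent commit instead of the full history.
    return dedent("""
        rm -rf {target} &&
        git clone --depth=1 https://github.com/saschagrunert/{repo} {target}
    """).format(target=target, repo=repo).strip()
```

Calling clone_script("/data", "kubernetes-analysis") twice yields two different target directories, so two experiments started at the same time would not overwrite each other's checkout.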

saschagrunert · 2020-05-16T09:36:18Z

src/katib.py

+        rm -rf {outdir}/{repo}
+        git clone https://github.com/saschagrunert/{repo} {outdir}/{repo}
+        python /ml/launch_experiment.py \
+--name {} \


I think you can still indent here.
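
A small sketch of why the indentation is harmless, assuming the script is a regular (non-raw) triple-quoted string as in the diff. The {name} placeholder and the value passed to it are chosen for illustration only.

```python
import shlex
from textwrap import dedent

# A backslash directly before a newline inside a regular Python string literal
# joins the two source lines, so indenting the continuation line only adds
# spaces between the arguments.
unindented = dedent("""
    python /ml/launch_experiment.py \
--name {name}
""").format(name="katib-experiment")

indented = dedent("""
    python /ml/launch_experiment.py \
        --name {name}
""").format(name="katib-experiment")

# shlex splits on runs of whitespace like a POSIX shell does, so both variants
# produce the same argument list.
same_argv = shlex.split(unindented) == shlex.split(indented)
```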

saschagrunert · 2020-05-16T09:37:23Z

src/katib.py

+        self.output = output
+        self.name = name
+        self.image = image
+        self.controller_img = "mbu93/katib-launcher:latest"


Please commit the container file (Dockerfile) into the PR as well as the source code of the ml_runner.py.

saschagrunert · 2020-05-16T09:39:31Z

src/katib.py

+def trial_template(name, image, replicas, command, mount_path, pr=None):
+    target = "pull/{pr}/head:{pr}".format(pr=pr) if pr else "master"
+    revision = pr if pr else "master"
+    args = ["cd data/kubernetes-analysis ;",


You have to use && here instead of ; to fail early.

I recommend combining the clone and checkout to have them in one place from a source code perspective.
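
A sketch of both points, under the assumption that the trial command is assembled as a list of shell fragments. The function name checkout_commands and the exact git invocations are illustrative; only the pull/{pr}/head:{pr} refspec and the master fallback come from the diff.

```python
def checkout_commands(repo_url, workdir, pr=None):
    """Return one shell line that clones the repository and checks out the revision."""
    commands = [
        "git clone --depth=1 {} {}".format(repo_url, workdir),
        "cd {}".format(workdir),
    ]
    if pr:
        # Fetch the pull request head into a local branch named after the PR
        # number, then switch to it. Without a PR the clone stays on the
        # default branch.
        commands.append("git fetch --depth=1 origin pull/{pr}/head:{pr}".format(pr=pr))
        commands.append("git checkout {pr}".format(pr=pr))
    # With && the shell stops at the first failing command. With ; it would
    # continue and report only the exit status of the last command.
    return " && ".join(commands)
```

Keeping clone, fetch and checkout in one helper also means there is a single place to change when the checkout logic has to be adjusted.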

saschagrunert · 2020-05-16T09:44:25Z

src/pipeline.py

@@ -85,6 +97,19 @@ def __run(pr: str = "", commit: str = ""):
        update_data.after(update_api)
        data = update_data_outputs["data"]

+        katib_op = KatibOp(image=Pipeline.IMAGE, name="katib-experiment",
+                           output=Pipeline.PV_DIR, repo="kubernetes-analysis", pr="")


Please use Pipeline.REPO for repo and pass the pr

saschagrunert · 2020-05-16T09:46:46Z

src/pipeline.py

+        )
+        katib.add_pvolumes({Pipeline.PV_DIR: PipelineVolume(pvc="pipeline-pv")})
+        katib.after(update_data)
+        tune_params = func_to_container_op(to_args)(katib.outputs["params"])


Please add a TODO comment here that we have to find a better solution for this.

saschagrunert · 2020-05-16T09:47:47Z

src/nlp.py

-                            layers, units)
+                accuracy, _ = self.__train(
+                    layers=layers, units=units, epochs=epochs, learning_rate=learning_rate)
+                print(


Please use the logger here.
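
A minimal sketch of the replacement using Python's standard logging module. The project may already provide its own logger, so the logger name "nlp", the helper names and the message format are assumptions. The handler writes to stdout here in case something downstream reads the values from that stream.

```python
import logging
import sys

logger = logging.getLogger("nlp")  # hypothetical logger name


def configure_logging(stream=sys.stdout):
    """Attach a handler that writes to `stream` (stdout by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)


def report_trial(layers, units, epochs, learning_rate, accuracy):
    """Log the hyperparameters of a trial together with its accuracy."""
    # Lazy %-style arguments are only formatted when the record is emitted.
    logger.info(
        "layers=%d units=%d epochs=%d learning_rate=%g accuracy=%.4f",
        layers, units, epochs, learning_rate, accuracy,
    )
```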

saschagrunert · 2020-05-16T09:53:13Z

src/nlp.py

@@ -9,6 +10,9 @@
 from sklearn.feature_extraction.text import TfidfVectorizer
 from sklearn.feature_selection import SelectKBest, f_classif

+TuneParams = namedtuple("TuneParams", ["layers", "units"])


Can we move those into the Nlp class?
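
One possible shape for this, sketched with a stripped-down stand-in for the Nlp class. The real class has more members, and the default_params method with its values is purely illustrative.

```python
from collections import namedtuple


class Nlp:
    # Defined as a class attribute so callers refer to it as Nlp.TuneParams
    # instead of importing a module-level name.
    TuneParams = namedtuple("TuneParams", ["layers", "units"])

    def default_params(self):
        # Hypothetical defaults, only here to show the type in use.
        return Nlp.TuneParams(layers=2, units=64)
```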

saschagrunert · 2020-05-16T09:54:55Z

src/nlp.py

-        for _ in range(layers - 1):
-            model.add(tf.keras.layers.Dense(units=units, activation="relu"))
+        for scale in range(1, units // 2):
+            model.add(tf.keras.layers.Dense(units=units // (2**scale), activation="relu"))


Why do we need this change?
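
To make the question concrete, the following sketch reproduces only the loop arithmetic from the two versions of the diff and lists the resulting layer widths. The function names are hypothetical, and no TensorFlow model is built.

```python
def widths_before(layers, units):
    """Widths of the hidden Dense layers in the removed loop."""
    return [units for _ in range(layers - 1)]


def widths_after(units):
    """Widths of the hidden Dense layers in the added loop."""
    return [units // (2 ** scale) for scale in range(1, units // 2)]
```

In the added loop the depth depends on units alone and no longer on layers. For units = 8 the widths are 4, 2, 1. For units = 16 the loop runs seven times and the last three widths are 0, because 16 // 32 and the following divisions round down to zero.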

saschagrunert · 2020-05-16T09:56:26Z

Please fix the linter errors as well. 😀 Give me a hint when we can go for a retest, I’ll create the PVC in the meantime.

saschagrunert · 2020-05-16T09:58:19Z

Besides that, let’s clean up the PV after the Katib run (delete the repository) so that it does not fill up endlessly.
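
A sketch of one way to guarantee the cleanup, assuming the checkout lives in a per-run directory on the PV and the run is driven from Python. The context manager name run_directory and the directory prefix are hypothetical. In the shell-based step the same effect would need an equivalent cleanup command at the end of the script.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def run_directory(pv_dir, repo):
    """Create a unique working directory below `pv_dir` and remove it afterwards."""
    path = Path(tempfile.mkdtemp(prefix=repo + "-", dir=pv_dir))
    try:
        yield path
    finally:
        # Runs on success and on failure, so the volume does not accumulate
        # old checkouts.
        shutil.rmtree(path, ignore_errors=True)
```

Used as `with run_directory(pv_dir, repo) as workdir:`, the clone and the experiment would run inside the block, and the directory is gone once the block exits.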

sascha-bot added the release-note label May 14, 2020

sascha-bot added the needs-ok-to-test label May 14, 2020

sascha-bot added the size/L label May 14, 2020

sascha-bot added the do-not-merge/hold label May 14, 2020

saschagrunert reviewed May 14, 2020

sascha-bot added ok-to-test and removed needs-ok-to-test labels May 14, 2020

saschagrunert added needs-ok-to-test and removed ok-to-test labels May 14, 2020

mbu93 force-pushed the master branch 2 times, most recently from d367d02 to cbad958 Compare May 14, 2020 13:34

sascha-bot added ok-to-test and removed needs-ok-to-test labels May 14, 2020

Add all the required code to integrate katib into the pipeline

be7d04a

mbu93 force-pushed the master branch from cbad958 to be7d04a Compare May 14, 2020 13:44

saschagrunert reviewed May 14, 2020

saschagrunert added needs-ok-to-test and removed ok-to-test labels May 14, 2020

sascha-bot removed the do-not-merge/hold label May 15, 2020

sascha-bot added the do-not-merge/hold label May 15, 2020

saschagrunert reviewed May 16, 2020
