Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Katib Pipeline Step #77

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

mbu93
Copy link

@mbu93 mbu93 commented May 14, 2020

- Katib Component was added to the pipeline.py
- initially considering number of layers and units as tuning parameters
- train.py nlp.py and data.py have been adapted to be compatible with the new step
- cli has been extended

@sascha-bot
Copy link

sascha-bot commented May 14, 2020

Hey @mbu93 👋,

I predicted that this release note qualifies as kind/bug to 16.12%.

A release note with the kind/bug needs a prediction rate with at least 60%.

@sascha-bot
Copy link

Hi @mbu93. Thanks for your PR.

I'm waiting for a kubernetes-analysis member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sascha-bot
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mbu93
To complete the pull request process, please assign saschagrunert
You can assign the PR to them by writing /assign @saschagrunert in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saschagrunert
Copy link
Member

Thank you for the PR, please rebase your changes on top of the latest master branch.

@saschagrunert
Copy link
Member

/hold

Copy link
Member

@saschagrunert saschagrunert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check if the trigger plugin is correctly configured. This is the first time an external PR comes in.

@@ -201,6 +227,7 @@ def container(
file_outputs[k] = out
output_artifact_copy_args += dedent("""
mkdir -p {d}
echo copying outputs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
echo copying outputs

src/pipeline.py Outdated
Comment on lines 305 to 313
# ssh_key = "ssh-key"
# ctr.add_volume(
# k8s.V1Volume(name=ssh_key,
# secret=k8s.V1SecretVolumeSource(default_mode=0o600,
# secret_name=ssh_key)))
# ctr.container.add_volume_mount(
# k8s.V1VolumeMount(name=ssh_key,
# read_only=True,
# mount_path="/root/.ssh"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be needed to work on the master branch

@saschagrunert
Copy link
Member

/ok-to-test

@saschagrunert
Copy link
Member

Prow is not able to apply the patch, please rebase and squash your commits into a single one via git rebase -i master.

@saschagrunert
Copy link
Member

/ok-to-test

@saschagrunert
Copy link
Member

The pipeline within this commit is not up to date. Please run make pipeline and commit your changes.

@mbu93
Copy link
Author

mbu93 commented May 14, 2020

Should I take care of the linting errors as well? What standard do you use? Pylint pep8?

@sascha-bot
Copy link

sascha-bot commented May 14, 2020

@mbu93: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
lint be7d04a link /test lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@saschagrunert
Copy link
Member

saschagrunert commented May 14, 2020

Should I take care of the linting errors as well? What standard do you use? Pylint pep8?

make lint executes the ci/lint script which runs with the provided requirements.txt, aka within the quay.io/saschagrunert/kubernetes-analysis:latest container image

@mbu93
Copy link
Author

mbu93 commented May 14, 2020

Should I take care of the linting errors as well? What standard do you use? Pylint pep8?

make lint executes the ci/lint script which runs with the provided requirements.txt, aka within the quay.io/saschagrunert/kubernetes-analysis:latest container image

Awesome!

@saschagrunert
Copy link
Member

saschagrunert commented May 14, 2020

The workflow is marked as unschedulable because the PVC does not exist:
screenshot

@saschagrunert
Copy link
Member

/wip

@saschagrunert
Copy link
Member

/hold cancel

@saschagrunert
Copy link
Member

/hold

def create(self):
self.op = dedent("""
rm -rf {outdir}/{repo}
git clone https://github.com/saschagrunert/{repo} {outdir}/{repo}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to speed up the clone with --depth=1 and just apply the patch (see pipeline.py)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also have to clone into a unique directory on the PV to make parallel runs possible.

rm -rf {outdir}/{repo}
git clone https://github.com/saschagrunert/{repo} {outdir}/{repo}
python /ml/launch_experiment.py \
--name {} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can still indent here.

self.output = output
self.name = name
self.image = image
self.controller_img = "mbu93/katib-launcher:latest"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please commit the container file (Dockerfile) into the PR as well as the source code of the ml_runner.py.

def trial_template(name, image, replicas, command, mount_path, pr=None):
target = "pull/{pr}/head:{pr}".format(pr=pr) if pr else "master"
revision = pr if pr else "master"
args = ["cd data/kubernetes-analysis ;",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to && here instead of ; to fail early.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend to combine the clone and checkout to have them in one place from a source code perspective.

@@ -85,6 +97,19 @@ def __run(pr: str = "", commit: str = ""):
update_data.after(update_api)
data = update_data_outputs["data"]

katib_op = KatibOp(image=Pipeline.IMAGE, name="katib-experiment",
output=Pipeline.PV_DIR, repo="kubernetes-analysis", pr="")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use Pipeline.REPO for repo and pass the pr

)
katib.add_pvolumes({Pipeline.PV_DIR: PipelineVolume(pvc="pipeline-pv")})
katib.after(update_data)
tune_params = func_to_container_op(to_args)(katib.outputs["params"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a TODO comment here that we have to find a better solution for this.

layers, units)
accuracy, _ = self.__train(
layers=layers, units=units, epochs=epochs, learning_rate=learning_rate)
print(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the logger here.

@@ -9,6 +10,9 @@
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif

TuneParams = namedtuple("TuneParams", ["layers", "units"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move those into the Nlp class?

for _ in range(layers - 1):
model.add(tf.keras.layers.Dense(units=units, activation="relu"))
for scale in range(1, units // 2):
model.add(tf.keras.layers.Dense(units=units // (2**scale), activation="relu"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this change?

@saschagrunert
Copy link
Member

Please fixup the linter as well. 😀 Give me a hint if we can go for a retest, I’ll create the pvc in the meanwhile.

@saschagrunert
Copy link
Member

Beside that, let’s cleanup the PV after the Katib run (deletion of the repository) that it does not endlessly fill-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants