Helm chart for Flyte #550

Closed
wants to merge 5 commits

Conversation

rstanevich
Contributor

This PR contains a Helm chart for Flyte with sandbox and EKS configurations.

The sandbox configuration (values-sandbox.yaml) is ready to deploy on Minikube. The EKS configuration (values-eks.yaml) must be edited before installing in the cloud: the S3 bucket, RDS hosts, IAM roles, secrets, etc. need to be configured.
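
For reference, a minimal install sketch; the chart path (./helm), release name, and namespace below are assumptions, not taken from this PR:

# Sandbox install on Minikube; chart path, release name, and namespace are assumptions
helm install flyte ./helm -f ./helm/values-sandbox.yaml --namespace flyte --create-namespace

# EKS install, only after editing values-eks.yaml (S3 bucket, RDS hosts, IAM roles, secrets)
helm install flyte ./helm -f ./helm/values-eks.yaml --namespace flyte --create-namespace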

kumare3 linked an issue on Oct 15, 2020 that may be closed by this pull request.
@kumare3
Contributor

kumare3 commented Jan 8, 2021

@rstanevich I have been looking at Helm now and I am liking it. I will review your PR and I feel we can build on it. One of the problems seems to be how to use remote configurations, like the pytorch operator, etc.

@@ -0,0 +1,136 @@
{{- if .Values.contour.enabled }}
Contributor

On EKS you can now use ALB.

Contributor Author

Oh yes, I've just found this announcement: https://aws.amazon.com/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/
It looks like it is possible now; it also requires aws-load-balancer-controller 2.0+ installed in Kubernetes.
Thank you!

Contributor

Yeah, I already set it up on a personal account and it works really well. I will be updating the EKS manifests.
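
For illustration, switching an existing ALB Ingress to gRPC comes down to a couple of controller annotations. A sketch, assuming a hypothetical Ingress named flyteadmin in a flyte namespace:

# Requires aws-load-balancer-controller v2.x; ALB gRPC needs IP targets
kubectl -n flyte annotate ingress flyteadmin \
  alb.ingress.kubernetes.io/backend-protocol-version=GRPC \
  alb.ingress.kubernetes.io/target-type=ip \
  --overwrite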

@sbrunk
Member

sbrunk commented Jan 25, 2021

Since we're stuck with Helm for the time being we'd like to contribute here. We could help test the chart and perhaps work on the GKE config.

@rstanevich
Contributor Author

So, I still haven't tried the new AWS ALB feature with gRPC support. Provisioning it in K8s requires the new AWS load balancer controller for Kubernetes. I need some time to set up my own devbox with the new controller to test this.
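
A sketch of installing that controller from the eks-charts repository; the cluster name is a placeholder, and the IAM/IRSA wiring the chart also needs is out of scope here:

helm repo add eks https://aws.github.io/eks-charts
helm repo update

# v2.x of the controller is the one that adds ALB gRPC support
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --set clusterName=my-eks-cluster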

@kumare3
Contributor

kumare3 commented Feb 7, 2021

> Since we're stuck with Helm for the time being we'd like to contribute here. We could help test the chart and perhaps work on the GKE config.

@sbrunk I would love to help you with some testing as well.

@sbrunk
Member

sbrunk commented Feb 8, 2021

@kumare3 @rstanevich what do you think about an approach that minimizes the diff between kustomize output and helm template output first for core Flyte without dependencies, and then iterate from there?

That way we can make sure the helm installation is on par with what we have right now and it could also provide a smoother upgrade path.

My first crude try looks like this:

gh pr checkout 550
kustomize build kustomize/base/single_cluster/complete > base_deployment.yaml
kubectl apply -f base_deployment.yaml
helm template . -f values-sandbox.yaml | kubectl diff -f - 

Then incrementally work through the errors (and the diff later), change the helm chart accordingly to minimize the diff and run helm template again.

A slightly better approach could be to use a structural diff of the rendered YAML output, because kubectl diff checks some API constraints (immutable fields, etc.) that can slow us down here. I just haven't tried that yet because I couldn't find good tooling at first sight.
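
A minimal sketch of such an offline comparison, with arbitrary file names; dyff is just one structural-diff option and is not assumed here:

# Render both deployments to files without touching the cluster
kustomize build kustomize/base/single_cluster/complete > kustomize_rendered.yaml
helm template . -f values-sandbox.yaml > helm_rendered.yaml

# Plain textual diff, or a structure-aware one if dyff happens to be installed
diff -u kustomize_rendered.yaml helm_rendered.yaml | less
# dyff between kustomize_rendered.yaml helm_rendered.yaml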

@rstanevich
Contributor Author

> @kumare3 @rstanevich what do you think about an approach that minimizes the diff between kustomize output and helm template output first for core Flyte without dependencies, and then iterate from there? […]

@sbrunk, do you mean we just need to compare the generated Helm manifest with flyte_generated.yaml? Do we need to run this once for this PR, or set up a script to check it regularly? At first glance, I see one evident problem:

  • The ConfigMaps generated by Kustomize have a hash suffix in their names, but Helm's do not, so the diff won't work for ConfigMaps.

If the main goal is just to verify a smooth upgrade from the kustomize installation to the Helm installation, I can check it out.

And an obvious note: if we'd like to use helm install (I don't like this option :) ), we cannot override existing k8s resources; we'll get something like "resource already exists".
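
One possible workaround for the "resource already exists" error, assuming Helm 3.2+ and a hypothetical existing ConfigMap named flyte-admin-config in the flyte namespace, is to label and annotate the object so the new release can adopt it:

# Let a later `helm install flyte ...` take ownership of the existing object instead of failing
kubectl -n flyte label configmap flyte-admin-config app.kubernetes.io/managed-by=Helm --overwrite
kubectl -n flyte annotate configmap flyte-admin-config \
  meta.helm.sh/release-name=flyte \
  meta.helm.sh/release-namespace=flyte \
  --overwrite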

@sbrunk
Member

sbrunk commented Feb 9, 2021

@rstanevich yes, I meant to use the diff only to help during development of the chart. It actually came up when I was looking into this PR to see how far you got compared with the kustomize-based deployment, i.e. whether the sandbox is on par. I guess this is something you can answer, too. 😉

For us the upgrade path is actually not important because we don't run Flyte in prod yet, but I guess for most people running Flyte in prod it's quite important.

@rstanevich
Contributor Author

Resolved in #916.

rstanevich closed this Jun 14, 2021
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 20, 2022
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Dec 20, 2022
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Jul 24, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 9, 2023
eapolinario pushed a commit to eapolinario/flyte that referenced this pull request Aug 21, 2023

Successfully merging this pull request may close these issues.

[Feature] Convert Flyte deployment from Kustomize to Helm!
3 participants