Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve robustness of Orchest Operator and update mechanism #966

Merged
merged 105 commits into from May 27, 2022

Conversation

nhaghighat
Copy link
Contributor

@nhaghighat nhaghighat commented May 19, 2022

Description

This PR fixes order of deployment by introducing new internal CRD named OrchestComponent. The OrchestCluster will be created by the user or the orchest-cli, then OrchestCluster controller creates different OrchestComponent for each service, then a dedicated component controller for each component, controls the status of the underlying objects of OrchestComponent

Fixes: #952, #991

Testing the PR

minikube addons enable ingress
eval $(minikube docker-env)
scripts/build_container.sh -M -t "v2022.05.3" -o "v2022.05.3"
pip install -e orchest-cli

# yes, this is ALL that is needed now to install Orchest
orchest install --dev

Testing orchest update through the CLI

orchest uninstall
scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
orchest install --dev
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
orchest update --dev --version=v2022.04.5

Testing orchest update through the UI

NOTE: You need to have created the minikube with the orchest-dev-repo mount

orchest uninstall
scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
orchest install --dev
orchest patch --dev
pnpm run dev
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
INVOKE THROUGH UI (go to http://localorchest.io/update)
scripts/build_container.sh -M -t "v2022.04.6" -o "v2022.04.6"
INVOKE THROUGH UI (go to http://localorchest.io/update)
... (as often as you like)

Checklist

  • The documentation reflects the changes.
  • The PR branch is set up to merge into dev instead of master.
  • In case I changed code in the orchest-cli I followed its release
    checklist
  • I haven't introduced breaking changes that would disrupt existing jobs, i.e. backwards
    compatibility is maintained.
  • In case I changed the dependencies in any requirements.in I have run pip-compile to update
    the corresponding requirements.txt
  • Start reports that Orchest is successfully started, but some deployments still have to start.
  • webserver hangs then restarts after a while on start if it's started concurrently w.r.t. to the orchest-api.
  • Celery fails to boot on fresh install
  • reliably report the status from orchest start about availability
  • check Ingress/deployment and service status of each component.
  • Updating the default pvc sizes: 50 GiB, 25 GiB, 25 GiB (userdir, registry, builder cache)
  • make possible through CLI to specify it as well as singleNode installation. --> To be done later so we can get this PR merged.
  • Enable to update all controller manifest in update.
    • Update CRD changes through update.
    • Add all manifests into one big file
    • orchest-cli use the manifest from release assets
    • GitHub Action to add the yaml files as assets to the release
    • Update GitHub Action of updating controller image on manifests (removed)
  • Add annotation in the namespace manifest. to avoid kubectl warning (Not needed)
  • Testing behavior
    • orchest update through CLI
    • orchest update through UI
      • The UpdateView.tsx needs to parse the response from the controller endpoint correctly and show it as logs
    • orchest restart through CLI and UI
    • orchest patch
    • orchest install
  • Update documentation
    • Update installation docs
    • mention in the docs that ingress controller needs to be present to move Running state
    • Note that the docker-registry is managed through Helm and not the orchest-controller.
    • Document about how to run the operator outside of the cluster for easy debugging
    • Improve docstrings/comments around key functionality of the controller
    • Controller readme.
    • Remove unused Helm charts, e.g. orchest-api ones, from services/orchest-controller/deploy
    • Mention in docs that pvc size can only be increased & that the default storage class is used if not specified. --> Will be added to the CLI in another PR
    • Update internal document about the release process given the changes that took place
  • renaming pause/unpause to start/stop
  • Migrate the CLI to install in one namespace only. Needs to parse release asset to set the namespace.
    • Check whether K8s installation into pre-existing namespace? #866 is resolved. --> Not yet, the Helm deployer in the orchest-controller doesn't seem to be picking up custom namespaces
    • Depending on the choice, the CLI orchest uninstall might need to change slightly. If we choose to go for it, then there should be a default flag.

@nhaghighat nhaghighat force-pushed the feat/controller-fix-deployment-order-readiness branch from 56b6947 to 31de0ee Compare May 22, 2022 14:24
nhaghighat and others added 12 commits May 25, 2022 19:52
In case you want to test it:
```bash
orchest uninstall
scripts/build_container.sh -M -t "v2022.04.4" -o "v2022.04.4"
orchest install --dev
orchest patch --dev
pnpm run dev
scripts/build_container.sh -M -t "v2022.04.5" -o "v2022.04.5"
INVOKE THROUGH UI (go to http://localorchest.io/update)
```
This change does result in very poor logging messages on update. It does
still satisfy the main requirement of indicating an update so that the
user knows it is not stuck.
…ub.com:orchest/orchest into feat/controller-fix-deployment-order-readiness
Otherwise the install and the controller manifest will both try to
create the namespace, which won't work.
…ub.com:orchest/orchest into feat/controller-fix-deployment-order-readiness
@fruttasecca
Copy link
Member

Fixes: #991

@nhaghighat nhaghighat added the improvement An improvement or enhancement to an existing feature. label May 26, 2022
@yannickperrenet yannickperrenet merged commit 47b6e28 into dev May 27, 2022
@yannickperrenet yannickperrenet deleted the feat/controller-fix-deployment-order-readiness branch May 27, 2022 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change PR introduces breaking changes improvement An improvement or enhancement to an existing feature.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants