Migrate from minikube to k3d as backend implementation #29

Merged
merged 1 commit into from
Jan 27, 2023

@renatolfc renatolfc commented Jan 27, 2023

The main change proposed in this PR is migrating our Kubernetes cluster implementation from minikube to k3d. After running large workflows for a while, we noticed that minikube with the Docker driver was too unstable, causing cascading crashes of both Kubernetes services and our own services.

We've been experimenting with k3d internally for more than a month now, and it seems to be much more stable as a Docker-based Kubernetes platform. We believe FarmVibes.AI users will benefit from that change.

When run, the farmvibes-ai.sh script automatically backs up workflow state and suggests migrating an existing cluster from minikube to k3d.

Apart from that, we've made the following changes:

Client

  • [📈 IMPROVEMENT] We've improved the monitor client library to display correct durations for workflows that run for more than 24h
  • [📈 IMPROVEMENT] The client now differentiates between tasks that have been queued, and tasks that have actually started running
  • [📈 IMPROVEMENT] We now add the status of the workflow run to the header of the monitor method
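
To illustrate the kind of issue behind the duration fix: formatting an elapsed time with clock-style helpers tends to wrap around at 24 hours. The following is a minimal sketch of one way to avoid that, not the actual client code; the name `format_duration` is hypothetical and the sketch assumes non-negative durations.

```python
from datetime import timedelta


def format_duration(delta: timedelta) -> str:
    """Format a non-negative duration as HH:MM:SS.

    Hours are derived from the total number of seconds, so they keep
    growing past 24 instead of wrapping around like a wall clock.
    """
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# A run that took 1 day, 2 hours, 3 minutes and 4 seconds.
print(format_duration(timedelta(days=1, hours=2, minutes=3, seconds=4)))
```

Using `delta.total_seconds()` rather than `delta.seconds` matters here, because `timedelta.seconds` only holds the part of the duration that is left after whole days are removed.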

Notebooks

  • [🎉 NEW] Added a notebook for computing a timelapse of NDVI values using SpaceEye
  • [🎉 NEW] Added a notebook for calculating various spectral indices (all indices listed in the Awesome Spectral Indices are supported)
  • [🎉 NEW] Added a notebook that illustrates how to use the chunk_onnx workflow to run time-series analysis over raster images (in parallel)
  • [📈 IMPROVEMENT] Updated the DeepMC notebooks to download recent and archived numerical weather prediction (NWP) model outputs from different cloud archive sources
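
For context on the spectral index notebooks: NDVI is defined as (NIR - Red) / (NIR + Red). The sketch below computes it for a single pixel to show the formula only. The function name `ndvi` is hypothetical, and the notebooks themselves operate on whole rasters through the workflows rather than on scalar values like this.

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index for one pixel.

    Returns NaN when both reflectances are zero, since the index is
    undefined in that case.
    """
    denominator = nir + red
    if denominator == 0:
        return math.nan
    return (nir - red) / denominator


# Healthy vegetation reflects strongly in near-infrared and weakly in red,
# which pushes the index toward 1.
print(ndvi(nir=0.6, red=0.2))
```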

Workflows

  • [🎉 NEW] Added a workflow to download forecast data using the herbie Python package
  • [🎉 NEW] Added workflows for downloading LIDAR/GEDI data
  • [🎉 NEW] Added a workflow for downloading ERA5 monthly data
  • [🎉 NEW] Added a workflow for applying ONNX models to chunks of raster images that span multiple time steps (to enable, for example, time-series analyses over raster images)
  • [📈 IMPROVEMENT] We now list Sentinel-1 products from the Planetary Computer, as opposed to SciHub
  • [📈 IMPROVEMENT] Updated the index workflow to support more spectral indices

Backend

  • [⚒️ FIX] We fixed the workflow spec validator to check for destination ports also being a source
  • [⚒️ FIX] The Kubernetes backend has become much more stable due to the port to k3d
  • [⚒️ FIX] We changed how we distribute work to farmvibes.ai workers, reducing the overhead of computation
  • [⚒️ FIX] We now use Dapr distributed locks to prevent duplication of work, making workflows run faster
  • [📈 IMPROVEMENT] The history of workflow execution is now maintained between clusters
    • As a consequence, migrating from minikube to k3d will keep all the workflow data users already had
  • [📈 IMPROVEMENT] The workflow validator now checks whether parameters have multiple default values
  • [📈 IMPROVEMENT] We now resolve workflow parameters to infer default values and descriptions from tasks (ops)
  • [📈 IMPROVEMENT] We now persist all pod logs to ~/.cache/farmvibes-ai/logs
  • [📈 IMPROVEMENT] Logs are persisted in a JSON-like format
  • [📈 IMPROVEMENT] We now propagate the failed and cancelled states to all ops that did not run when we cancel/fail a workflow
  • [📈 IMPROVEMENT] We now support fan-out of workflow sources, enabling parallel execution of source nodes
  • [📈 IMPROVEMENT] We added an ACKnowledgement message type for differentiating between started and queued tasks
  • [📈 IMPROVEMENT] Allow STAC query parameters in Planetary Computer collections
  • [📈 IMPROVEMENT] We've changed how tasks are assigned to workers, reducing the number of messages exchanged between the worker and orchestrator components
  • [📈 IMPROVEMENT] Added the option to skip chips that have no data when running models over large rasters
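
As an illustration of the validator fix for destination ports that are also sources, here is a rough sketch of such a check. It is not the validator's actual code: the function name is hypothetical, and the spec layout (a `sources` mapping from names to `task.port` strings, plus an `edges` list with `origin` and `destination` keys) is an assumption made for this example.

```python
from typing import Dict, List


def destinations_that_are_sources(
    sources: Dict[str, List[str]],
    edges: List[dict],
) -> List[str]:
    """Return the ports that are fed by an edge and also declared as a source.

    Such a port would receive input from two places at once, so a
    validator would typically reject the spec.
    """
    source_ports = {port for ports in sources.values() for port in ports}
    clashes = {
        destination
        for edge in edges
        for destination in edge["destination"]
        if destination in source_ports
    }
    return sorted(clashes)


# Hypothetical spec fragment: "index.raster" is both a workflow source
# and the destination of an edge, which is the situation being checked.
sources = {"user_input": ["s2.user_input"], "raster": ["index.raster"]}
edges = [{"origin": "s2.raster", "destination": ["index.raster"]}]

print(destinations_that_are_sources(sources, edges))
```

An empty result means no destination port doubles as a source under these assumptions.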

Co-authored-by: Bruno Silva brunosilva@microsoft.com
Co-authored-by: Eduardo Rodrigues edrodrigues@microsoft.com
Co-authored-by: Naga Bilwanth Gangarapu Naga@zensa.co
Co-authored-by: Rafael Padilha rpadilha@microsoft.com
Co-authored-by: Renato Luiz de Freitas Cunha renato.cunha@microsoft.com
Co-authored-by: Roberto de Moura Estevão Filho robertode@microsoft.com
Co-authored-by: Sara Malvar saramalvar@microsoft.com

@renatolfc renatolfc requested a review from lonnes January 27, 2023 00:13
@renatolfc renatolfc marked this pull request as ready for review January 27, 2023 13:08
@renatolfc renatolfc merged commit 978803b into main Jan 27, 2023
@renatolfc renatolfc deleted the from-devops-release-186 branch February 24, 2023 13:44
4 participants