add support for flux option flags (#49)
* First testing workflows added for unit tests (not fully written yet) and headless workflow tests.
  A headless workflow test means that we can add a file to examples/tests and then automate
  running it with a simple script. The GitHub CI can now start the minicluster, install the operator,
  apply the configs to it, and then shut it down.
* testing workflows include lammps, hello-world, and a pokemon workflow
* template wait.sh now is properly rendered with named variables instead of string formatting
* testing workflow attempts to cache go deps to (hopefully) not go over rate limits. We might need
  additional caching - TBA.
* Addition of TestMode, SleepTime, and FluxRestful.Branch to CRD. The test mode silences all
  output except for the job, SleepTime lets the user customize the sleep time when a job takes
  longer to set up, and branch is the branch to clone the Flux Restful API from (if being used).
* Support for PreCommand in the CRD! This is huge because it allows for finer tuning of the
  running logic in the wait.sh script, e.g., to source an environment or modify the command to
  run things as the flux user with sudo.
* wait script and associated logic is moved to the level of the container. The reason is that we
  eventually might want custom logic associated with each container, and this means rendering from
  different sources.
* clear definition of the rules for a valid running container in the docs. We now require
  sudo to be installed along with flux, better control creation of the flux user, and use the correct
  id if the user already exists.
* add support for flux option flags
* ensure we do not erase existing flags in the container
* addition of the examples/flux-restful and examples/tests folders that we can provide to the user base, and
  ensure they continue working with tests. I'd like to eventually turn this into a gallery.
* entirely new docs section that explains running these examples and the tests.
* testing scripts associated with the above to clean the namespace, run the operator, apply the example,
  wait for output, compare output with expected, and check return codes of the job containers.

this adds support for fluxOptionFlags to be defined on the level
of the cluster. We check if the variable is defined, and if not,
we do not attempt to export in the environment, as the base container
might already specify a preference. To test this fully I am waiting
for the osu benchmarks container to build, and then I can test it
here with an example workflow.

* updates so precommand and flux option args are on the level of the container

this has been tested with and without option args and seems
to work for both! We are also using proper Go templating
instead of what I was doing before. Next I need to be
able to write tests to ensure these workflows do not break.

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
vsoch committed Dec 22, 2022
1 parent 5bf0152 commit 073b4ef
Showing 30 changed files with 968 additions and 72 deletions.
95 changes: 90 additions & 5 deletions .github/workflows/main.yml
@@ -1,14 +1,11 @@
name: test flux-operator

on:
  # This should run on a push to any branch except main, gh-pages, and binoc
  push:
    branches-ignore:
    - main
    - gh-pages
  pull_request: []

jobs:
  formatting:
    name: Formatting
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
@@ -20,3 +17,91 @@ jobs:
      uses: crate-ci/typos@7ad296c72fa8265059cc03d1eda562fbdfcd6df2 # v1.9.0
      with:
        files: ./docs/*/*.md ./docs/*.md ./README.md ./config/samples ./docs/*.md

  unit-tests:
    name: Unit Tests
    runs-on: ubuntu-latest
    outputs:
      go-mod: ${{ steps.go-cache-paths.outputs.go-mod }}
      go-build: ${{ steps.go-cache-paths.outputs.go-build }}

    steps:
    - uses: actions/checkout@v3
    - name: Setup Go
      uses: actions/setup-go@v3
      with:
        go-version: ^1.18
    - name: fmt check
      run: make fmt

    - id: go-cache-paths
      run: |
        echo go-build=$(go env GOCACHE) >> $GITHUB_OUTPUT
        echo go-mod=$(go env GOMODCACHE) >> $GITHUB_OUTPUT
    - name: Build and Install Operator
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        make
        make manifests
        make kustomize
    # These aren't written yet
    - name: Run Unit tests
      run: make test

    # Cache the operator build because it can use up GITHUB API
    - name: Cache Dependencies
      uses: actions/cache@v3
      with:
        path: ${{ steps.go-cache-paths.outputs.go-mod }}
        key: ${{ runner.os }}-go-mod-${{ hashFiles('**/go.sum') }}

  test-jobs:
    needs: [unit-tests]
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        test: [["hello-world", "ghcr.io/flux-framework/flux-restful-api:latest", 10],
               ["lammps", "ghcr.io/rse-ops/lammps:flux-sched-focal-v0.24.0", 15],
               ["pokemon", "ghcr.io/rse-ops/pokemon:app-latest", 10]]

    steps:
    - name: Clone the code
      uses: actions/checkout@v3

    - name: Setup Go
      uses: actions/setup-go@v3
      with:
        go-version: ^1.18

    - name: Start minikube
      uses: medyagh/setup-minikube@697f2b7aaed5f70bf2a94ee21a4ec3dde7b12f92 # v0.0.9

    - name: Create the namespace
      run: kubectl create namespace flux-operator

    - name: Add Cached Dependencies
      uses: actions/cache@v3
      with:
        path: ${{ needs.unit-tests.outputs.go-mod }}
        key: ${{ runner.os }}-go-mod-${{ hashFiles('**/go.sum') }}

    - name: Pull Docker Containers to MiniKube
      env:
        container: ${{ matrix.test[1] }}
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      run: |
        export SHELL=/bin/bash
        eval $(minikube -p minikube docker-env)
        minikube ssh docker pull ${container}
        make
        make install
    - name: Test ${{ matrix.test[0] }}
      env:
        name: ${{ matrix.test[0] }}
        jobtime: ${{ matrix.test[2] }}
      run: /bin/bash ./script/test.sh ${name} ${jobtime}
1 change: 1 addition & 0 deletions .gitignore
@@ -15,6 +15,7 @@ bundle_*

# Output of the go coverage tool, specifically when used with LiteIDE
*.out
*.err

# Kubernetes Generated files - skip generated files, except for vendored files

53 changes: 44 additions & 9 deletions Makefile
@@ -30,10 +30,14 @@ BUNDLE_METADATA_OPTS ?= $(BUNDLE_CHANNELS) $(BUNDLE_DEFAULT_CHANNEL)
# For example, running 'make bundle-build bundle-push catalog-build catalog-push' will build and push both
# flux-framework.org/operator-bundle:$VERSION and flux-framework.org/operator-catalog:$VERSION.
IMAGE_TAG_BASE ?= ghcr.io/flux-framework/flux-operator
KIND_VERSION=v0.11.1
# This kubectl version supports -k for kustomization, taken from mpi
KUBECTL_VERSION=v1.21.4

# BUNDLE_IMG defines the image:tag used for the bundle.
# You can use it as an arg. (E.g make bundle-build BUNDLE_IMG=<some-registry>/<project-name-bundle>:<tag>)
BUNDLE_IMG ?= $(IMAGE_TAG_BASE)-bundle:v$(VERSION)
IMG_BUILDER=docker

# BUNDLE_GEN_FLAGS are the flags passed to the operator-sdk generate bundle command
BUNDLE_GEN_FLAGS ?= -q --overwrite --version $(VERSION) $(BUNDLE_METADATA_OPTS)
@@ -131,25 +135,56 @@ clean:
	kubectl delete -n flux-operator MiniCluster --all
	rm -rf yaml/*.yaml

applyall:
	bin/kustomize build config/samples | kubectl apply -f -

# This applies the basic minicluster (and not extended examples)
apply:
	kubectl apply -f config/samples/flux-framework.org_v1alpha1_minicluster.yaml
	kubectl apply -f examples/flux-restful/minicluster-lammps.yaml

applyui:
	kubectl apply -f examples/flux-restful/minicluster-$(name).yaml

applytest:
	kubectl apply -f examples/tests/${name}/minicluster-$(name).yaml

applymulti:
	kubectl apply -f config/samples/flux-framework.org_v1alpha1_minicluster_multiple_containers.yaml
example:
	kubectl apply -f examples/flux-restful/minicluster-$(name).yaml

# Clean, apply and run, and apply the job
redo: clean apply run

# Clean, apply and run multiple container example
multi: clean applymulti run
redo_example: clean example run
redo_test: clean applytest run

log:
	kubectl logs -n flux-operator job.batch/flux-sample $@

##@ Test
# NOTE these are not fully developed yet

bin/kubectl:
	mkdir -p bin
	curl -L -o bin/kubectl https://dl.k8s.io/release/${KUBECTL_VERSION}/bin/linux/amd64/kubectl
	chmod +x bin/kubectl

.PHONY: test_e2e
test_e2e: export TEST_FLUX_OPERATOR_IMAGE = ${IMAGE_TAG_BASE}:latest
test_e2e: bin/kubectl kind images dev_manifest
	go test -tags e2e ./tests/e2e/...

.PHONY: dev_manifest
dev_manifest:
	# Use `~` instead of `/` because image name might contain `/`.
	sed -e "s~%IMAGE_NAME%~${IMAGE_TAG_BASE}~g" -e "s~%IMAGE_TAG%~${VERSION}~g" config/manifests/overlays/dev/kustomization.yaml.template > config/manifests/overlays/dev/kustomization.yaml

.PHONY: kind
kind:
	go install sigs.k8s.io/kind@${KIND_VERSION}


# TODO add build arg for version
.PHONY: images
images:
	@echo "VERSION: ${VERSION}"
	${IMG_BUILDER} build -t ${IMAGE_TAG_BASE}:local .
##@ Build

.PHONY: build
4 changes: 3 additions & 1 deletion README.md
@@ -29,11 +29,13 @@ And the following external resources might be useful:

- [Flux HPC Examples](https://github.com/rse-ops/flux-hpc) containers and CRD for the operator to run Flux with HPC workloads (under development)

**Note** this project is actively under development, and you can expect change and improvements!
We apologize for bugs you run into, and hope you tell us soon so we can work on resolving them.

#### License

This work is licensed under the [Apache-2.0](https://github.com/kubernetes-sigs/kueue/blob/ec9b75eaadb5c78dab919d8ea6055d33b2eb09a2/LICENSE) license.

SPDX-License-Identifier: Apache-2.0

LLNL-CODE-764420
LLNL-CODE-764420
43 changes: 43 additions & 0 deletions api/v1alpha1/minicluster_types.go
@@ -30,6 +30,22 @@ type MiniClusterSpec struct {
	// There should only be one container to run flux with runFlux
	Containers []MiniClusterContainer `json:"containers"`

	// Test mode silences all output so the job only shows the test running
	// +kubebuilder:default=false
	// +optional
	TestMode bool `json:"test"`

	// Customize sleep time if job takes longer to setup
	// This isn't ideal, but we have to control order that workers start
	// +kubebuilder:default=-1
	// +optional
	SleepTime int `json:"sleeptime"`

	// Customization to Flux Restful API
	// There should only be one container to run flux with runFlux
	// +optional
	FluxRestful FluxRestful `json:"fluxRestful"`

	// Size (number of jobs to run)
	// +kubebuilder:default=1
	// +optional
@@ -64,6 +80,14 @@ type MiniClusterStatus struct {
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}

type FluxRestful struct {

	// Branch to clone Flux Restful API from
	// +kubebuilder:default="main"
	// +optional
	Branch string `json:"branch"`
}

type MiniClusterContainer struct {

	// Container image must contain flux and flux-sched install
@@ -101,6 +125,19 @@ type MiniClusterContainer struct {
	// +optional
	FluxRunner bool `json:"runFlux"`

	// Flux option flags, usually provided with -o
	// optional - if needed, default option flags for the server
	// These can also be set in the user interface to override here.
	// This is only valid for a FluxRunner
	// +optional
	FluxOptionFlags string `json:"fluxOptionFlags"`

	// Special command to run at beginning of script, directly after asFlux
	// is defined as sudo -u flux -E (so you can change that if desired.)
	// This is only valid if FluxRunner is set (that writes a wait.sh script)
	// +optional
	PreCommand string `json:"preCommand"`

	// Lifecycle can handle post start commands, etc.
	// +optional
	LifeCyclePostStartExec string `json:"postStartExec"`
@@ -130,6 +167,12 @@ func (f *MiniCluster) Validate() bool {
	valid := true
	fluxRunners := 0

	// If we only have one container, assume we want to run flux with it
	// This makes it easier for the user to not require the flag
	if len(f.Spec.Containers) == 1 {
		f.Spec.Containers[0].FluxRunner = true
	}

	for i, container := range f.Spec.Containers {
		name := fmt.Sprintf("MiniCluster.Container.%d", i)
		fmt.Printf("🤓 %s.Image %s\n", name, container.Image)
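
The single-container default added to `Validate()` can be tried out in isolation with a small sketch. The `Container` and `Spec` types below are trimmed-down stand-ins for the CRD types, and the final "exactly one flux runner" check is an assumption drawn from the comment "There should only be one container to run flux with runFlux"; the rest of the real `Validate()` is not shown in this diff.

```go
package main

import "fmt"

// Container and Spec are hypothetical, simplified stand-ins for
// MiniClusterContainer and MiniClusterSpec.
type Container struct {
	Image      string
	FluxRunner bool
}

type Spec struct {
	Containers []Container
}

// validate marks a lone container as the flux runner, then checks that
// exactly one container is flagged (the second rule is an assumption).
func validate(spec *Spec) bool {
	// Assign through the index: a range loop variable is a copy, so
	// setting the field on it would not change the stored container.
	if len(spec.Containers) == 1 {
		spec.Containers[0].FluxRunner = true
	}
	fluxRunners := 0
	for _, c := range spec.Containers {
		if c.FluxRunner {
			fluxRunners++
		}
	}
	return fluxRunners == 1
}

func main() {
	single := Spec{Containers: []Container{{Image: "fluxrm/flux-sched:focal"}}}
	fmt.Println(validate(&single), single.Containers[0].FluxRunner)

	// Two containers with neither flagged: no default is applied.
	double := Spec{Containers: []Container{{Image: "a"}, {Image: "b"}}}
	fmt.Println(validate(&double))
}
```

With one container the user can leave `runFlux` unset; with several, the flag still has to be set explicitly on one of them.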
16 changes: 16 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

31 changes: 31 additions & 0 deletions config/crd/bases/flux-framework.org_miniclusters.yaml
@@ -48,6 +48,12 @@ spec:
                        IMPORTANT: This is left here, but not used in favor of exposing
                        Flux via a Restful API. We Can remove this when that is finalized.'
                      type: string
                    fluxOptionFlags:
                      description: Flux option flags, usually provided with -o optional
                        - if needed, default option flags for the server These can
                        also be set in the user interface to override here. This is
                        only valid for a FluxRunner
                      type: string
                    image:
                      default: fluxrm/flux-sched:focal
                      description: Container image must contain flux and flux-sched
@@ -62,6 +68,12 @@ spec:
                    postStartExec:
                      description: Lifecycle can handle post start commands, etc.
                      type: string
                    preCommand:
                      description: Special command to run at beginning of script,
                        directly after asFlux is defined as sudo -u flux -E (so you
                        can change that if desired.) This is only valid if FluxRunner
                        is set (that writes a wait.sh script)
                      type: string
                    pullAlways:
                      default: false
                      description: Allow the user to dictate pulling By default we
@@ -88,6 +100,15 @@ spec:
              diagnostics:
                description: Run flux diagnostics on start instead of command
                type: boolean
              fluxRestful:
                description: Customization to Flux Restful API There should only be
                  one container to run flux with runFlux
                properties:
                  branch:
                    default: main
                    description: Branch to clone Flux Restful API from
                    type: string
                type: object
              localDeploy:
                default: false
                description: localDeploy should be true for development, or deploying
@@ -100,6 +121,16 @@ spec:
                description: Size (number of jobs to run)
                format: int32
                type: integer
              sleeptime:
                default: -1
                description: Customize sleep time if job takes longer to setup This
                  isn't ideal, but we have to control order that workers start
                type: integer
              test:
                default: false
                description: Test mode silences all output so the job only shows the
                  test running
                type: boolean
            required:
            - containers
            type: object
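
To see how the new schema properties line up with the JSON tags on the Go types, here is a small decoding sketch. The struct definitions are abbreviated copies containing only the fields touched by this commit, and the `example` document and its values are made up for illustration. Note that plain `json.Unmarshal` does not apply the `+kubebuilder:default` values; those defaults live in the CRD schema and are filled in by the API server, so unset fields here are simply Go zero values.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Abbreviated, illustrative copies of the API types (new fields only).
type FluxRestful struct {
	Branch string `json:"branch"`
}

type Container struct {
	Image           string `json:"image"`
	FluxOptionFlags string `json:"fluxOptionFlags"`
	PreCommand      string `json:"preCommand"`
}

type Spec struct {
	Containers  []Container `json:"containers"`
	TestMode    bool        `json:"test"`
	SleepTime   int         `json:"sleeptime"`
	FluxRestful FluxRestful `json:"fluxRestful"`
}

// example is a hypothetical spec body; every value is a sample.
const example = `{
  "test": true,
  "sleeptime": 10,
  "fluxRestful": {"branch": "main"},
  "containers": [
    {
      "image": "fluxrm/flux-sched:focal",
      "fluxOptionFlags": "-ompi=openmpi@5",
      "preCommand": "source /etc/profile"
    }
  ]
}`

// parseSpec decodes a JSON spec body into the abbreviated Spec type.
func parseSpec(raw string) (Spec, error) {
	var spec Spec
	err := json.Unmarshal([]byte(raw), &spec)
	return spec, err
}

func main() {
	spec, err := parseSpec(example)
	if err != nil {
		panic(err)
	}
	fmt.Printf("test=%v sleeptime=%d branch=%s\n", spec.TestMode, spec.SleepTime, spec.FluxRestful.Branch)
	fmt.Printf("flags=%q pre=%q\n", spec.Containers[0].FluxOptionFlags, spec.Containers[0].PreCommand)
}
```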