Commit
* summary of changes
  - We have a strategy for /etc/hosts
  - _One_ curve cert is generated by the main pod and shared with the other pods via a persistent volume
  - If we use a cloud K8s we might be able to use a persistent volume claim, and should be able to switch between the two
  - A flux user is created in each container. This assumes a user creation command, which might not exist depending on the base OS (we might have different templates for different OSes and ask the user as part of the MiniCluster config)
  - Events are moved into events.go and can be deleted if we don't need them. In that case we can also change the Client from an attribute on the Reconciler to being inherited (e.g., r.Get vs r.Client.Get)
  - My first attempt at a podExec function is moved into extra.go. If we need/want this we can debug further; otherwise it's not used and safe to delete.
  - Having the wait / startup script generate the certificate, given only the main pod hostname, made the initContainers redundant (and I removed them).
  - Configs/templates are moved into templates.go so they are easier to find.
  - GetHostfileConfig is now GetConfigMap (and more generalized)
  - Listed pods are now sorted by name so they are returned consistently
  - A ConfigMap volume at `/flux_operator` is where we write the entrypoint script (wait.sh), the start script, and the update_hosts.sh script.
* adding update to /etc/hosts
  This is a bit of a hack that adds a script wrapper to the pod start. The wrapper waits until it sees a file populated with IP addresses (or more specifically, echo commands that update /etc/hosts). I think we can do this because when the pod is re-created, the IP address does not change! While the wrapper is waiting, a config map is updated with the (now known) IP addresses. This seems to allow /etc/hosts to be correctly populated (determined by ping working). Next I need to debug why the broker still thinks it is waiting.
* clean up unused functions
* separating main node (to generate cert and start) from workers
  My flux is still not connecting ("Unable to connect to Flux"), so I need to debug this. But I think it is more correct that only one of the nodes runs the start command and generates the certificate. Technically, if this node knows that it can use the other ones, we should not need to run it multiple times.
* ensure we use a persistent volume for curve
  With the emptydir strategy each node had its own mount. If we use a persistent volume claim, each node has access to the same certificate, and we do not need to worry about a race because it is written by just one hostname.
* good state - we have flux almost running, quorum delayed

Signed-off-by: vsoch <vsoch@users.noreply.github.com>
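The commit message describes sorting the listed pods by name and writing their (now known) IP addresses into a config map that the wrapper uses to update /etc/hosts. A minimal sketch of that rendering step might look like the following. It is not the operator's code: `podInfo`, `hostsEntries`, and the pod names and addresses are hypothetical stand-ins, and the real implementation works with Kubernetes pod objects and emits shell commands rather than plain lines.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// podInfo is a hypothetical stand-in for the two pod fields used here.
type podInfo struct {
	Name string
	IP   string
}

// hostsEntries sorts a copy of pods by name and renders one
// "IP<TAB>hostname" line per pod, skipping pods without an address.
func hostsEntries(pods []podInfo) string {
	sorted := make([]podInfo, len(pods))
	copy(sorted, pods)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })

	var b strings.Builder
	for _, p := range sorted {
		if p.IP == "" {
			continue // address not assigned yet, nothing to write
		}
		fmt.Fprintf(&b, "%s\t%s\n", p.IP, p.Name)
	}
	return b.String()
}

func main() {
	// Hypothetical pod names and addresses, listed out of order on purpose.
	pods := []podInfo{
		{Name: "flux-sample-1", IP: "10.244.0.6"},
		{Name: "flux-sample-0", IP: "10.244.0.5"},
		{Name: "flux-sample-2"}, // no IP assigned yet
	}
	fmt.Print(hostsEntries(pods))
}
```

Sorting before rendering keeps the generated content identical across reconcile loops, so the config map only changes when a pod name or address actually changes.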
Showing 14 changed files with 691 additions and 385 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
/* | ||
Copyright 2022 Lawrence Livermore National Security, LLC | ||
(c.f. AUTHORS, NOTICE.LLNS, COPYING) | ||
This is part of the Flux resource manager framework. | ||
For details, see https://github.com/flux-framework. | ||
SPDX-License-Identifier: Apache-2.0 | ||
*/ | ||
|
||
package controllers | ||
|
||
// Events are added to the Reconciler directly. If we don't need them: | ||
// 1. Delete this file | ||
// 2. Delete the AddEventFilter(r) | ||
// 3. (Optionally) the Reconciler Client can be inherited directly | ||
|
||
import ( | ||
jobctrl "flux-framework/flux-operator/pkg/job" | ||
|
||
api "flux-framework/flux-operator/api/v1alpha1" | ||
|
||
"k8s.io/klog/v2" | ||
"sigs.k8s.io/controller-runtime/pkg/event" | ||
) | ||
|
||
// Notify watchers (the FluxSetup) that we have a new job request | ||
func (r *MiniClusterReconciler) notifyWatchers(job *api.MiniCluster) { | ||
for _, watcher := range r.watchers { | ||
watcher.NotifyMiniClusterUpdate(job) | ||
} | ||
} | ||
|
||
// Called when a new job is created | ||
func (r *MiniClusterReconciler) Create(e event.CreateEvent) bool { | ||
|
||
// Only respond to job events! | ||
job, match := e.Object.(*api.MiniCluster) | ||
if !match { | ||
return true | ||
} | ||
|
||
// Add conditions - they should never exist for a new job | ||
job.Status.Conditions = jobctrl.GetJobConditions() | ||
|
||
// We will tell FluxSetup there is a new job request | ||
defer r.notifyWatchers(job) | ||
r.log.Info("🌀 MiniCluster create event", "Name:", job.Name) | ||
|
||
// Continue to creation event | ||
r.log.Info("🌀 MiniCluster was added!", "Name:", job.Name, "Condition:", jobctrl.GetCondition(job)) | ||
return true | ||
} | ||
|
||
func (r *MiniClusterReconciler) Delete(e event.DeleteEvent) bool { | ||
|
||
job, match := e.Object.(*api.MiniCluster) | ||
if !match { | ||
return true | ||
} | ||
|
||
defer r.notifyWatchers(job) | ||
log := r.log.WithValues("job", klog.KObj(job)) | ||
log.Info("🌀 MiniCluster delete event") | ||
|
||
// TODO should trigger a delete here | ||
// Reconcile should clean up resources now | ||
return true | ||
} | ||
|
||
func (r *MiniClusterReconciler) Update(e event.UpdateEvent) bool { | ||
oldMC, match := e.ObjectOld.(*api.MiniCluster) | ||
if !match { | ||
return true | ||
} | ||
|
||
// Figure out the state of the old job | ||
mc := e.ObjectNew.(*api.MiniCluster) | ||
|
||
r.log.Info("🌀 MiniCluster update event") | ||
|
||
// If the job hasn't changed, continue reconcile | ||
// There aren't any explicit updates beyond conditions | ||
if jobctrl.JobsEqual(mc, oldMC) { | ||
return true | ||
} | ||
|
||
// TODO: check if ready or running, shouldn't be able to update | ||
// OR if we want update, we need to completely delete and recreate | ||
return true | ||
} | ||
|
||
func (r *MiniClusterReconciler) Generic(e event.GenericEvent) bool { | ||
r.log.V(3).Info("Ignore generic event", "obj", klog.KObj(e.Object), "kind", e.Object.GetObjectKind().GroupVersionKind()) | ||
return false | ||
} |
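In `Update` above, the old object is checked with the comma-ok form, but `e.ObjectNew.(*api.MiniCluster)` is asserted directly and would panic if the new object were ever a different type. A minimal sketch of checking both sides is shown below. The types `object`, `miniCluster`, `otherKind`, and `updateEvent` and the helper `asMiniClusters` are hypothetical stand-ins so the example runs without controller-runtime; they are not part of the operator.

```go
package main

import "fmt"

// object is a stand-in for the controller-runtime object interface.
type object interface{ GetName() string }

// miniCluster is a stand-in for api.MiniCluster.
type miniCluster struct{ Name string }

func (m *miniCluster) GetName() string { return m.Name }

// otherKind represents any unrelated resource type.
type otherKind struct{ Name string }

func (o *otherKind) GetName() string { return o.Name }

// updateEvent mirrors the two fields of event.UpdateEvent used in Update.
type updateEvent struct {
	ObjectOld object
	ObjectNew object
}

// asMiniClusters returns both objects as *miniCluster, or false if either
// side has a different type (or is nil). It never panics.
func asMiniClusters(e updateEvent) (*miniCluster, *miniCluster, bool) {
	oldMC, okOld := e.ObjectOld.(*miniCluster)
	newMC, okNew := e.ObjectNew.(*miniCluster)
	if !okOld || !okNew {
		return nil, nil, false
	}
	return oldMC, newMC, true
}

func main() {
	good := updateEvent{ObjectOld: &miniCluster{Name: "a"}, ObjectNew: &miniCluster{Name: "a"}}
	mixed := updateEvent{ObjectOld: &miniCluster{Name: "a"}, ObjectNew: &otherKind{Name: "b"}}

	_, _, ok := asMiniClusters(good)
	fmt.Println("both MiniClusters:", ok)
	_, _, ok = asMiniClusters(mixed)
	fmt.Println("mixed types:", ok)
}
```

In a predicate like `Update`, a failed check would return `true` (let the event through), matching how the existing functions treat non-matching objects.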
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
package controllers | ||
|
||
// This file has extra (not used) functions that might be useful | ||
// (and I didn't want to delete just yet) | ||
|
||
import ( | ||
"context" | ||
"os" | ||
|
||
corev1 "k8s.io/api/core/v1" | ||
"k8s.io/apimachinery/pkg/runtime" | ||
"k8s.io/client-go/tools/remotecommand" | ||
|
||
api "flux-framework/flux-operator/api/v1alpha1" | ||
) | ||
|
||
// podExec executes a command in a named pod | ||
// This is not currently in use. It seems to run, but I don't see the expected output | ||
func (r *MiniClusterReconciler) podExec(pod corev1.Pod, ctx context.Context, cluster *api.MiniCluster) error { | ||
|
||
command := []string{ | ||
"/bin/sh", | ||
"-c", | ||
"echo", | ||
"hello", | ||
"world", | ||
} | ||
|
||
// Prepare a request to execute to the pod in the statefulset | ||
execReq := r.RESTClient.Post().Namespace(cluster.Namespace).Resource("pods"). | ||
Name(pod.Name). | ||
Namespace(cluster.Namespace). | ||
SubResource("exec"). | ||
VersionedParams(&corev1.PodExecOptions{ | ||
Command: command, | ||
Container: pod.Spec.Containers[0].Name, | ||
Stdin: true, | ||
Stdout: true, | ||
Stderr: true, | ||
TTY: true, | ||
}, runtime.NewParameterCodec(r.Scheme)) | ||
|
||
exec, err := remotecommand.NewSPDYExecutor(r.RESTConfig, "POST", execReq.URL()) | ||
if err != nil { | ||
r.log.Error(err, "🌀 Error preparing command to execute to pod", "Name:", pod.Name) | ||
return err | ||
} | ||
|
||
// This is just for debugging for now :) | ||
err = exec.Stream(remotecommand.StreamOptions{ | ||
Stdin: os.Stdin, | ||
Stdout: os.Stdout, | ||
Stderr: nil, | ||
Tty: true, | ||
}) | ||
r.log.Info("🌀 PodExec", "Container", pod.Spec.Containers[0].Name) | ||
return err | ||
} |
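One possible explanation for the missing output noted in the `podExec` comment is the shape of the command slice. With `sh -c`, only the first argument after `-c` is the command string, and later arguments become `$0`, `$1`, and so on. So `"echo", "hello", "world"` runs a bare `echo`. The sketch below shows the difference by running both forms locally with `os/exec`. It assumes `/bin/sh` is available and does not use the Kubernetes exec API, so it only illustrates the shell's argument handling; the `run` helper is hypothetical.

```go
package main

import (
	"fmt"
	"os/exec"
)

// run executes argv on the local machine and returns its standard output.
func run(argv []string) (string, error) {
	out, err := exec.Command(argv[0], argv[1:]...).Output()
	return string(out), err
}

func main() {
	// Same shape as the command in podExec: "hello" and "world" become
	// positional parameters of the shell, not arguments to echo.
	split := []string{"/bin/sh", "-c", "echo", "hello", "world"}

	// The whole command line passed as a single string after -c.
	joined := []string{"/bin/sh", "-c", "echo hello world"}

	for _, argv := range [][]string{split, joined} {
		out, err := run(argv)
		fmt.Printf("%q -> %q (err: %v)\n", argv, out, err)
	}
}
```

If this is the cause, passing the full command as one string after `-c` in `PodExecOptions.Command` would be the corresponding change. The `Stderr: nil` stream option may also hide errors, so other causes cannot be ruled out from this sketch alone.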