NETOBSERV-473 - Loki and strimzi operator installation #172
Conversation
Skipping CI for Draft Pull Request.
We should keep in mind that we will probably also need a similar Prometheus operator installation to expose the FLP metrics.
Yes @KalmanMeth, these are follow-ups. And Grafana could also be a good candidate:
I tried, but the following pods are stuck in ContainerCreating status:
- netobserv-ebpf-agent
- flowlogs-pipeline-transformer
- netobserv-plugin
Due to these errors in the events list of the pod description:
Normal Scheduled 2m53s default-scheduler Successfully assigned netobserv-privileged/netobserv-ebpf-agent-99tzk to ip-10-0-144-117.ec2.internal by ip-10-0-169-224
Warning FailedMount 50s kubelet Unable to attach or mount volumes: unmounted volumes=[kafka-certs-ca kafka-certs-user], unattached volumes=[kube-api-access-2zbnc kafka-certs-ca kafka-certs-user]: timed out waiting for the condition
Warning FailedMount 46s (x9 over 2m53s) kubelet MountVolume.SetUp failed for volume "kafka-certs-ca" : secret "kafka-cluster-cluster-ca-cert" not found
Warning FailedMount 46s (x9 over 2m53s) kubelet MountVolume.SetUp failed for volume "kafka-certs-user" : secret "flp-kafka" not found
Warning FailedMount 4m10s (x7 over 4m41s) kubelet MountVolume.SetUp failed for volume "kafka-certs-ca" : secret "kafka-cluster-cluster-ca-cert" not found
Warning FailedMount 3m38s (x8 over 4m41s) kubelet MountVolume.SetUp failed for volume "loki-certs-ca" : configmap "lokistack-ca-bundle" not found
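Mapping the volume names in those events back to their sources, the agent pod spec presumably declares mounts like the following (reconstructed from the event messages above, not copied from the PR):

```yaml
volumes:
  - name: kafka-certs-ca
    secret:
      secretName: kafka-cluster-cluster-ca-cert   # missing per the events
  - name: kafka-certs-user
    secret:
      secretName: flp-kafka                       # missing per the events
  - name: loki-certs-ca
    configMap:
      name: lokistack-ca-bundle                   # missing per the events
```

The pods stay in ContainerCreating until those secrets/configmaps exist in the pod's namespace.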
To be addressed in another PR/task: there are some ugly stack traces in the manager logs, due to trying to reconcile elements while the related CRD has not been applied yet:
1.6649732503408134e+09 INFO controller.flowcollector checking for lokistack in ns netobserv ... {"reconciler group": "flows.netobserv.io", "reconciler kind": "FlowCollector", "name": "cluster", "namespace": "", "component": "ClientHelper", "function": "ApplyWithNamespaceOverride"}
1.6649732503408515e+09 ERROR controller.flowcollector Can't apply embed/loki_instance.yaml yaml {"reconciler group": "flows.netobserv.io", "reconciler kind": "FlowCollector", "name": "cluster", "namespace": "", "component": "OperatorsController", "function": "manageOperator", "error": "no matches for kind \"LokiStack\" in version \"loki.grafana.com/v1\""}
github.com/netobserv/network-observability-operator/controllers/operators.(*Reconciler).Reconcile
/opt/app-root/controllers/operators/operators_reconciler.go:124
github.com/netobserv/network-observability-operator/controllers.(*FlowCollectorReconciler).Reconcile
/opt/app-root/controllers/flowcollector_controller.go:165
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/opt/app-root/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/opt/app-root/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/opt/app-root/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
1.66497323313885e+09 ERROR controller.flowcollector Can't apply embed/loki_instance.yaml yaml {"reconciler group": "flows.netobserv.io", "reconciler kind": "FlowCollector", "name": "cluster", "namespace": "", "component": "OperatorsController", "function": "manageOperator", "error": "no matches for kind \"LokiStack\" in version \"loki.grafana.com/v1\""}
github.com/netobserv/network-observability-operator/controllers/operators.(*Reconciler).Reconcile
/opt/app-root/controllers/operators/operators_reconciler.go:124
github.com/netobserv/network-observability-operator/controllers.(*FlowCollectorReconciler).Reconcile
/opt/app-root/controllers/flowcollector_controller.go:165
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
/opt/app-root/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/opt/app-root/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
objectStorageType: s3
size: 1x.extra-small
storageClassName: gp2
retentionDays: 1
Must it be in days? Maybe if the storage usage is too high, some users might want to set the retention in hours.
Yes, it is in days on the loki-operator side. We can sync with the logging team if we really need less.
Loki roles are now automatically set: 552e3f5
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
PR has been rebased. No more changes on my side.
@jpinsonneau I tried to install it but I had some problems with the PersistentVolume claims:
That's ok. I just changed the storage size and reapplied the FlowCollector (also decreasing the number of instances for Kafka and ZooKeeper), but the changes didn't take effect. Then I tried deleting the FlowCollector, but the reconciliation of Kafka was still blocked:
I had to completely undeploy the operator to be able to remove the pods.
@jpinsonneau I redeployed with smaller persistent volume sizes and the status of the flowcollector is
Tested in:
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: netobserv-dependend-operators
typo: dependend => dependent
Also, what is it / how does that work?
It's a mandatory resource that generates the RBACs:
https://olm.operatorframework.io/docs/tasks/install-operator-with-olm/#prerequisites
https://docs.openshift.com/container-platform/4.8/operators/understanding/olm/olm-understanding-operatorgroups.html
You can't create a Subscription in a particular namespace without it.
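For context, a minimal OperatorGroup plus Subscription pair looks roughly like this (a sketch; the namespace, channel, and catalog source names are assumptions for illustration, not taken from this PR):

```yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: netobserv-dependent-operators
  namespace: netobserv            # namespace assumed for illustration
spec:
  targetNamespaces:
    - netobserv                   # operators installed here watch this namespace
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: loki-operator
  namespace: netobserv            # must be a namespace covered by an OperatorGroup
spec:
  channel: stable                 # assumed channel name
  name: loki-operator
  source: redhat-operators        # assumed catalog source
  sourceNamespace: openshift-marketplace
```

Without the OperatorGroup, OLM cannot generate the RBAC the subscribed operator needs, and the Subscription stays pending.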
I've finished reviewing the code. I have some remarks, but I think they can be addressed in a follow-up: we should probably go ahead and iterate from there, in order to have this feature tested more before GA, at least for non-regressions. I just hope it will not bring too much maintenance cost for the benefit, because it is quite complex and depends on external factors. I haven't tested, except for some non-regression tests; I leave it to the people who test to put the lgtm mark :) What I'd like to see in a follow-up:
@@ -630,7 +630,7 @@ func GetReadyCR(key types.NamespacedName) *flowsv1alpha1.FlowCollector {
 		return err
 	}
 	cond := meta.FindStatusCondition(cr.Status.Conditions, conditions.TypeReady)
-	if cond.Status == metav1.ConditionFalse {
+	if cond != nil && cond.Status == metav1.ConditionFalse {
@jpinsonneau maybe the problem is here: if cond is nil, then it should return an error ?
thanks it seems to work as expected 👍
/hold
@jpinsonneau: PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
This PR adds an option for both loki & kafka to install dependent operators automatically using:
It creates Subscriptions based on the environment, currently forced to `openshift` for testing. Then it applies the related yaml from the `controllers/operators/embed/` folder, overriding optional configuration provided in `instanceSpec`.

/!\ you still need to manage some manual tasks:
- create `loki-secret` for storage access
- ~~apply loki role (currently returning an error since the controller service account doesn't have these roles)~~ loki roles are automatically applied 552e3f5
- ~~copy kafka secrets in the `netobserv-privileged` namespace (`make fix-ebpf-kafka-tls`)~~ kafka secrets are automatically copied 55dbebf
- `loki-operator` until release 5.6.x (Nov 15th)

Create the catalog using the following yaml:
Then specify it in our CRD:
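The catalog yaml itself is not reproduced in this conversation; a generic CatalogSource for a pre-release operator index typically looks like the sketch below (every name and image here is a hypothetical placeholder, not from the PR):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: loki-operator-catalog                         # hypothetical name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/example/loki-operator-index:latest   # placeholder index image
  displayName: Loki Operator (pre-release)
```

A Subscription can then point its `source`/`sourceNamespace` at this catalog instead of the default one.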