-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate to operator-sdk 1.16.0 #108
Conversation
Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to operator-sdk 1.16.0 for OpenShift v4.9 and later. A lot of changes here are boilerplate as generated by operator-sdk init and operator-sdk create api commands. Once the boilerplate was created, then the existing Ansible role was moved over. As the Ansible Operator framework is now using collections, the name of the k8s module and filters were updated to reflect this in the Ansible roles/ directory. RBAC changes were also implemented to satisfy for the existing RBAC rules that were in place prior to the migration. A new smart-gateway role and service account were created in order to satisfy for the running sg-core workload along with the lated oauth-proxy image shipping with OCP v4.9. GitHub actions have been updated to reflect the changes to the ./build/generate_bundle.sh script, which was updated to use the new Makefile approach to building and running the Operator. Additionally, the sample CustomResource for a SmartGateway has been changed to leverage the dummy metrics plugin removing the requirement to have a running AMQ Interconnect router system and data source. This should make testing via molecule (or otherwise) easier in the future, allowing for component testing outside of system e2e testing.
config/manifests/bases/smart-gateway-operator.clusterserviceversion.yaml
Outdated
Show resolved
Hide resolved
Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some initial comments while this is still draft.
I'm going to need to get up to speed with kustomize, since I've avoided it thus far.
Do the molecule tests work now? I don't have them working locally, but it's due to a problem with kustomize that I haven't started looking into.
My only concern so far is I think the RBAC on the new SG service account is over-broad.
@@ -0,0 +1,182 @@ | |||
# VERSION defines the project version for the bundle. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this file based off boiler-plate generated by the new operator-sdk?
ETA: Yes it is https://sdk.operatorframework.io/docs/cli/operator-sdk_init/
@@ -0,0 +1,16 @@ | |||
domain: infra.watch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New boilerplate as well I guess? Answering my own questions here https://sdk.operatorframework.io/docs/cli/operator-sdk_init/
- name: smartgateway-sample | ||
ports: | ||
- name: prom-http | ||
port: 8083 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the benefit of other reviewers... This looks wrong (8081 != 8083) but is actually correct since that endpoint is protected by the oauth-proxy on port 8083
@@ -34,15 +35,15 @@ spec: | |||
- -tls-cert=/etc/tls/private/tls.crt | |||
- -tls-key=/etc/tls/private/tls.key | |||
- -cookie-secret-file=/etc/proxy/secrets/session_secret | |||
- -openshift-service-account=NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll chalk this up to a change in the oauth container. It kinda felt weird to me that it worked without the extra RBAC to begin with.
Agreed on the RBAC. It's definitely overly broad and I'd like to review again and make sure things are appropriate before merge. The molecule tests may or may not work. I will need to test them, but we have never used them previously, so I consider that a new feature. The old molecule tests probably didn't work either. |
I'd cut it down to just the tokenreviews and subjectaccessreviews perms and see if it works like that. I don't recall off-hand if the service account itself needs read perms to mount secrets or configmaps to the pod, but I don't think so.
They definitely didn't work before, which is why I was excited to see them changing. |
Initial testing looks really good. Need to do some more, but was able to build and deploy everything, and metrics SGs are operational and receiving data. No errors in logs wrt RBAC etc. I have not made any changes to the RBAC yet to bring scope back into compliance. git diff
diff --git a/config/default/kustomization.yaml b/config/default/kustomization.yaml
index 65c4160..1828ce5 100644
--- a/config/default/kustomization.yaml
+++ b/config/default/kustomization.yaml
@@ -1,5 +1,5 @@
# Adds namespace to all resources.
-namespace: smart-gateway-operator-system
+namespace: service-telemetry
# Value of this field is prepended to the
# names of all resources, e.g. a deployment named make docker-build docker-push VERSION=5.0.0-dev5
make deploy VERSION=5.0.0-dev5
oc get pods
NAME READY STATUS RESTARTS AGE
alertmanager-default-0 3/3 Running 0 9d
default-cops04-ceil-meter-smartgateway-9899557d5-rmpng 3/3 Running 0 11m
default-cops04-coll-meter-smartgateway-77ccb555d5-bbln7 3/3 Running 0 11m
default-cops04-sens-meter-smartgateway-7ffbcc5557-6nhbv 3/3 Running 0 11m
default-interconnect-675dd97bc4-bfkkn 1/1 Running 0 9d
elastic-operator-7489cf8bd6-56c8x 1/1 Running 152 14d
interconnect-operator-6bf74c4ffb-p9qdf 1/1 Running 0 14d
prometheus-default-0 3/3 Running 1 9d
prometheus-operator-7f6948966b-h47px 1/1 Running 0 14d
service-telemetry-operator-5f6488bbbf-t42vf 1/1 Running 0 3d22h
smart-gateway-operator-controller-manager-5895b86b59-df9ct 2/2 Running 0 12m
# ran various oc logs ... commands to look at contents of pods, no errors, everything loaded properly |
|
||
- name: Generate bundle | ||
run: WORKING_DIR=/tmp/bundle ./build/generate_bundle.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed use of WORKING_DIR here because it's not easily controlled without making changes to the Makefile, so I'm just going to use the default locations. It seems to be working fine in CI so I don't see the need to extend Makefile to allow separate directories.
Dropping the Edit: at least once #109 merges into this branch. |
@@ -1,3 +1,6 @@ | |||
.cache/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* Test via molecule with KinD Implement a basic molecule test with KinD in order to load the operator and the workload, then perform a basic assertion test to verify that everything is operational. More work will be required here as the test is effectively a stub at this point. In the molecule test needed to fake the creation of a TLS certificate that is normally dealt with in OpenShift without this effort. Created a locally self-signed certificate that is then loaded as the smartgateway-sample-proxy-tls Secret that is mounted by the Smart Gateway Deployment as a mounted volume for oauth-proxy. In an OpenShift environment the oauth-proxy is available in the local registry. Within a KinD environment that is not available. The existing oauth_proxy_image path is defaulted in the role, but is now exposed as a parameter to the SmartGateway custom resource. Because of the migration to CRD/v1 unknown parameters are no longer preserved by default. This new parameter is exposed primarily for testing but should be useful in the future as well. Adds another Makefile option called kustomize-local which doesn't have the same checks as the standard kustomize command. In the testing environment an incompatible version of kustomize may be installed, and it's not always possible to remove it. Using kustomize-local results in a specific kustomize version being added locally and then referenced by the testing infrastructure for consistency. * Remove extra --namespace parameter from molecule test
Update the README.md to match the currently known methods to install Smart Gateway Operator using the new operator-sdk migration. Update the metadata to cover the new version of the Operator from 4.0 to 5.0 major release number.
I think I'm happy enough with this now that I'm going to merge it down. Some additional clean up and changes may be necessary but at this point it looks like everything is working as intended. |
Revert changes to migrate Smart Gateway Operator to operator-sdk 1.16.0 until the same set of changes can be made to Service Telemetry Operator. Changing how the operator is built causes a regression in the testing framework for STF. Instead of making temporary changes to the testing framework to deal with the changes in Operator-SDK 1.16.0, we'll make changes to the testing framework when migrating Service Telemetry Operator to operator-sdk 1.16.0 as well. In the meantime, revert these changes which will just end up landing in conjunction with the changes to Service Telemetry Operator as a bundled change set. This reverts commit 8328ed8.
Revert changes to migrate Smart Gateway Operator to operator-sdk 1.16.0 until the same set of changes can be made to Service Telemetry Operator. Changing how the operator is built causes a regression in the testing framework for STF. Instead of making temporary changes to the testing framework to deal with the changes in Operator-SDK 1.16.0, we'll make changes to the testing framework when migrating Service Telemetry Operator to operator-sdk 1.16.0 as well. In the meantime, revert these changes which will just end up landing in conjunction with the changes to Service Telemetry Operator as a bundled change set. This reverts commit 8328ed8.
* Migrate to operator-sdk 1.16.0 Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to operator-sdk 1.16.0 for OpenShift v4.9 and later. A lot of changes here are boilerplate as generated by operator-sdk init and operator-sdk create api commands. Once the boilerplate was created, then the existing Ansible role was moved over. As the Ansible Operator framework is now using collections, the name of the k8s module and filters were updated to reflect this in the Ansible roles/ directory. RBAC changes were also implemented to satisfy for the existing RBAC rules that were in place prior to the migration. A new smart-gateway role and service account were created in order to satisfy for the running sg-core workload along with the lated oauth-proxy image shipping with OCP v4.9. GitHub actions have been updated to reflect the changes to the ./build/generate_bundle.sh script, which was updated to use the new Makefile approach to building and running the Operator. Additionally, the sample CustomResource for a SmartGateway has been changed to leverage the dummy metrics plugin removing the requirement to have a running AMQ Interconnect router system and data source. This should make testing via molecule (or otherwise) easier in the future, allowing for component testing outside of system e2e testing. * Bump maxOpenShiftVersion to 4.10 Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF. * Resolve issues with CI testing * Ignore bin/ directory holding local kustomize binary * Resolve indentation inconsistency * Reduce RBAC permission for Smart Gateway instances * Drop superfluous resource from authentication API group * Bump operator version annotation * Test via molecule with KinD (#109) * Test via molecule with KinD Implement a basic molecule test with KinD in order to load the operator and the workload, then perform a basic assertion test to verify that everything is operational. More work will be required here as the test is effectively a stub at this point. In the molecule test needed to fake the creation of a TLS certificate that is normally dealt with in OpenShift without this effort. Created a locally self-signed certificate that is then loaded as the smartgateway-sample-proxy-tls Secret that is mounted by the Smart Gateway Deployment as a mounted volume for oauth-proxy. In an OpenShift environment the oauth-proxy is available in the local registry. Within a KinD environment that is not available. The existing oauth_proxy_image path is defaulted in the role, but is now exposed as a parameter to the SmartGateway custom resource. Because of the migration to CRD/v1 unknown parameters are no longer preserved by default. This new parameter is exposed primarily for testing but should be useful in the future as well. Adds another Makefile option called kustomize-local which doesn't have the same checks as the standard kustomize command. In the testing environment an incompatible version of kustomize may be installed, and it's not always possible to remove it. Using kustomize-local results in a specific kustomize version being added locally and then referenced by the testing infrastructure for consistency. * Remove extra --namespace parameter from molecule test * Update documentation and metadata Update the README.md to match the currently known methods to install Smart Gateway Operator using the new operator-sdk migration. Update the metadata to cover the new version of the Operator from 4.0 to 5.0 major release number. * Update LICENSE copyright date range * Adjust documentation and bundle script generator
* Migrate to operator-sdk 1.16.0 Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to operator-sdk 1.16.0 for OpenShift v4.9 and later. A lot of changes here are boilerplate as generated by operator-sdk init and operator-sdk create api commands. Once the boilerplate was created, then the existing Ansible role was moved over. As the Ansible Operator framework is now using collections, the name of the k8s module and filters were updated to reflect this in the Ansible roles/ directory. RBAC changes were also implemented to satisfy for the existing RBAC rules that were in place prior to the migration. A new smart-gateway role and service account were created in order to satisfy for the running sg-core workload along with the lated oauth-proxy image shipping with OCP v4.9. GitHub actions have been updated to reflect the changes to the ./build/generate_bundle.sh script, which was updated to use the new Makefile approach to building and running the Operator. Additionally, the sample CustomResource for a SmartGateway has been changed to leverage the dummy metrics plugin removing the requirement to have a running AMQ Interconnect router system and data source. This should make testing via molecule (or otherwise) easier in the future, allowing for component testing outside of system e2e testing. * Bump maxOpenShiftVersion to 4.10 Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF. * Resolve issues with CI testing * Ignore bin/ directory holding local kustomize binary * Resolve indentation inconsistency * Reduce RBAC permission for Smart Gateway instances * Drop superfluous resource from authentication API group * Bump operator version annotation * Test via molecule with KinD (#109) * Test via molecule with KinD Implement a basic molecule test with KinD in order to load the operator and the workload, then perform a basic assertion test to verify that everything is operational. More work will be required here as the test is effectively a stub at this point. In the molecule test needed to fake the creation of a TLS certificate that is normally dealt with in OpenShift without this effort. Created a locally self-signed certificate that is then loaded as the smartgateway-sample-proxy-tls Secret that is mounted by the Smart Gateway Deployment as a mounted volume for oauth-proxy. In an OpenShift environment the oauth-proxy is available in the local registry. Within a KinD environment that is not available. The existing oauth_proxy_image path is defaulted in the role, but is now exposed as a parameter to the SmartGateway custom resource. Because of the migration to CRD/v1 unknown parameters are no longer preserved by default. This new parameter is exposed primarily for testing but should be useful in the future as well. Adds another Makefile option called kustomize-local which doesn't have the same checks as the standard kustomize command. In the testing environment an incompatible version of kustomize may be installed, and it's not always possible to remove it. Using kustomize-local results in a specific kustomize version being added locally and then referenced by the testing infrastructure for consistency. * Remove extra --namespace parameter from molecule test * Update documentation and metadata Update the README.md to match the currently known methods to install Smart Gateway Operator using the new operator-sdk migration. Update the metadata to cover the new version of the Operator from 4.0 to 5.0 major release number. * Update LICENSE copyright date range * Adjust documentation and bundle script generator
Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to
operator-sdk 1.16.0 for OpenShift v4.9 and later.
A lot of changes here are boilerplate as generated by operator-sdk init
and operator-sdk create api commands. Once the boilerplate was created,
then the existing Ansible role was moved over. As the Ansible Operator
framework is now using collections, the name of the k8s module and
filters were updated to reflect this in the Ansible roles/ directory.
RBAC changes were also implemented to satisfy for the existing RBAC
rules that were in place prior to the migration. A new smart-gateway
role and service account were created in order to satisfy for the
running sg-core workload along with the lated oauth-proxy image shipping
with OCP v4.9.
GitHub actions have been updated to reflect the changes to the
./build/generate_bundle.sh script, which was updated to use the new
Makefile approach to building and running the Operator.
Additionally, the sample CustomResource for a SmartGateway has been
changed to leverage the dummy metrics plugin removing the requirement to
have a running AMQ Interconnect router system and data source. This
should make testing via molecule (or otherwise) easier in the future,
allowing for component testing outside of system e2e testing.