Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to operator-sdk 1.16.0 #108

Merged
merged 12 commits into from
Mar 4, 2022
Merged

Migrate to operator-sdk 1.16.0 #108

merged 12 commits into from
Mar 4, 2022

Conversation

leifmadsen
Copy link
Member

Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to
operator-sdk 1.16.0 for OpenShift v4.9 and later.

A lot of changes here are boilerplate as generated by operator-sdk init
and operator-sdk create api commands. Once the boilerplate was created,
then the existing Ansible role was moved over. As the Ansible Operator
framework is now using collections, the name of the k8s module and
filters were updated to reflect this in the Ansible roles/ directory.

RBAC changes were also implemented to satisfy for the existing RBAC
rules that were in place prior to the migration. A new smart-gateway
role and service account were created in order to satisfy for the
running sg-core workload along with the lated oauth-proxy image shipping
with OCP v4.9.

GitHub actions have been updated to reflect the changes to the
./build/generate_bundle.sh script, which was updated to use the new
Makefile approach to building and running the Operator.

Additionally, the sample CustomResource for a SmartGateway has been
changed to leverage the dummy metrics plugin removing the requirement to
have a running AMQ Interconnect router system and data source. This
should make testing via molecule (or otherwise) easier in the future,
allowing for component testing outside of system e2e testing.

Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to
operator-sdk 1.16.0 for OpenShift v4.9 and later.

A lot of changes here are boilerplate as generated by operator-sdk init
and operator-sdk create api commands. Once the boilerplate was created,
then the existing Ansible role was moved over. As the Ansible Operator
framework is now using collections, the name of the k8s module and
filters were updated to reflect this in the Ansible roles/ directory.

RBAC changes were also implemented to satisfy for the existing RBAC
rules that were in place prior to the migration. A new smart-gateway
role and service account were created in order to satisfy for the
running sg-core workload along with the lated oauth-proxy image shipping
with OCP v4.9.

GitHub actions have been updated to reflect the changes to the
./build/generate_bundle.sh script, which was updated to use the new
Makefile approach to building and running the Operator.

Additionally, the sample CustomResource for a SmartGateway has been
changed to leverage the dummy metrics plugin removing the requirement to
have a running AMQ Interconnect router system and data source. This
should make testing via molecule (or otherwise) easier in the future,
allowing for component testing outside of system e2e testing.
Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF.
@leifmadsen leifmadsen self-assigned this Feb 24, 2022
@leifmadsen leifmadsen added do-not-merge Do Not Merge (WIP) needs testing Needs to be tested prior to merge labels Feb 24, 2022
Copy link
Contributor

@csibbitt csibbitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some initial comments while this is still draft.

I'm going to need to get up to speed with kustomize, since I've avoided it thus far.

Do the molecule tests work now? I don't have them working locally, but it's due to a problem with kustomize that I haven't started looking into.

My only concern so far is I think the RBAC on the new SG service account is over-broad.

.github/workflows/main.yml Show resolved Hide resolved
LICENSE Outdated Show resolved Hide resolved
@@ -0,0 +1,182 @@
# VERSION defines the project version for the bundle.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this file based off boiler-plate generated by the new operator-sdk?

ETA: Yes it is https://sdk.operatorframework.io/docs/cli/operator-sdk_init/

@@ -0,0 +1,16 @@
domain: infra.watch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New boilerplate as well I guess? Answering my own questions here https://sdk.operatorframework.io/docs/cli/operator-sdk_init/

.gitignore Show resolved Hide resolved
config/rbac/sg_role.yaml Show resolved Hide resolved
config/rbac/sg_role.yaml Outdated Show resolved Hide resolved
- name: smartgateway-sample
ports:
- name: prom-http
port: 8083
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the benefit of other reviewers... This looks wrong (8081 != 8083) but is actually correct since that endpoint is protected by the oauth-proxy on port 8083

config/rbac/role.yaml Outdated Show resolved Hide resolved
@@ -34,15 +35,15 @@ spec:
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -cookie-secret-file=/etc/proxy/secrets/session_secret
- -openshift-service-account=NA
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll chalk this up to a change in the oauth container. It kinda felt weird to me that it worked without the extra RBAC to begin with.

@leifmadsen
Copy link
Member Author

Just some initial comments while this is still draft.

I'm going to need to get up to speed with kustomize, since I've avoided it thus far.

Do the molecule tests work now? I don't have them working locally, but it's due to a problem with kustomize that I haven't started looking into.

My only concern so far is I think the RBAC on the new SG service account is over-broad.

Agreed on the RBAC. It's definitely overly broad and I'd like to review again and make sure things are appropriate before merge.

The molecule tests may or may not work. I will need to test them, but we have never used them previously, so I consider that a new feature. The old molecule tests probably didn't work either.

@csibbitt
Copy link
Contributor

Agreed on the RBAC. It's definitely overly broad and I'd like to review again and make sure things are appropriate before merge.

I'd cut it down to just the tokenreviews and subjectaccessreviews perms and see if it works like that. I don't recall off-hand if the service account itself needs read perms to mount secrets or configmaps to the pod, but I don't think so.

The molecule tests may or may not work. I will need to test them, but we have never used them previously, so I consider that a new feature. The old molecule tests probably didn't work either.

They definitely didn't work before, which is why I was excited to see them changing.

@leifmadsen
Copy link
Member Author

Initial testing looks really good. Need to do some more, but was able to build and deploy everything, and metrics SGs are operational and receiving data. No errors in logs wrt RBAC etc. I have not made any changes to the RBAC yet to bring scope back into compliance.

git diff

diff --git a/config/default/kustomization.yaml b/config/default/kustomization.yaml
index 65c4160..1828ce5 100644
--- a/config/default/kustomization.yaml
+++ b/config/default/kustomization.yaml
@@ -1,5 +1,5 @@
 # Adds namespace to all resources.
-namespace: smart-gateway-operator-system
+namespace: service-telemetry
 
 # Value of this field is prepended to the
 # names of all resources, e.g. a deployment named
make docker-build docker-push VERSION=5.0.0-dev5
make deploy VERSION=5.0.0-dev5

oc get pods
NAME                                                         READY   STATUS    RESTARTS   AGE
alertmanager-default-0                                       3/3     Running   0          9d
default-cops04-ceil-meter-smartgateway-9899557d5-rmpng       3/3     Running   0          11m
default-cops04-coll-meter-smartgateway-77ccb555d5-bbln7      3/3     Running   0          11m
default-cops04-sens-meter-smartgateway-7ffbcc5557-6nhbv      3/3     Running   0          11m
default-interconnect-675dd97bc4-bfkkn                        1/1     Running   0          9d
elastic-operator-7489cf8bd6-56c8x                            1/1     Running   152        14d
interconnect-operator-6bf74c4ffb-p9qdf                       1/1     Running   0          14d
prometheus-default-0                                         3/3     Running   1          9d
prometheus-operator-7f6948966b-h47px                         1/1     Running   0          14d
service-telemetry-operator-5f6488bbbf-t42vf                  1/1     Running   0          3d22h
smart-gateway-operator-controller-manager-5895b86b59-df9ct   2/2     Running   0          12m

# ran various oc logs ... commands to look at contents of pods, no errors, everything loaded properly


- name: Generate bundle
run: WORKING_DIR=/tmp/bundle ./build/generate_bundle.sh
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed use of WORKING_DIR here because it's not easily controlled without making changes to the Makefile, so I'm just going to use the default locations. It seems to be working fine in CI so I don't see the need to extend Makefile to allow separate directories.

@leifmadsen leifmadsen removed the needs testing Needs to be tested prior to merge label Mar 1, 2022
@leifmadsen
Copy link
Member Author

leifmadsen commented Mar 1, 2022

Dropping the needs-testing label since this is has been testing in-cluster with existing STF deployment and is now automatically tested (minimally) via KinD in GitHub Actions using molecule.

Edit: at least once #109 merges into this branch.

build/generate_bundle.sh Outdated Show resolved Hide resolved
@@ -1,3 +1,6 @@
.cache/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

* Test via molecule with KinD

Implement a basic molecule test with KinD in order to load the operator
and the workload, then perform a basic assertion test to verify that
everything is operational. More work will be required here as the test
is effectively a stub at this point.

In the molecule test needed to fake the creation of a TLS certificate
that is normally dealt with in OpenShift without this effort. Created a
locally self-signed certificate that is then loaded as the
smartgateway-sample-proxy-tls Secret that is mounted by the Smart
Gateway Deployment as a mounted volume for oauth-proxy.

In an OpenShift environment the oauth-proxy is available in the local
registry. Within a KinD environment that is not available. The existing
oauth_proxy_image path is defaulted in the role, but is now exposed as a
parameter to the SmartGateway custom resource. Because of the migration
to CRD/v1 unknown parameters are no longer preserved by default. This
new parameter is exposed primarily for testing but should be useful in
the future as well.

Adds another Makefile option called kustomize-local which doesn't have
the same checks as the standard kustomize command. In the testing
environment an incompatible version of kustomize may be installed, and
it's not always possible to remove it. Using kustomize-local results in
a specific kustomize version being added locally and then referenced by
the testing infrastructure for consistency.

* Remove extra --namespace parameter from molecule test
Update the README.md to match the currently known methods to install
Smart Gateway Operator using the new operator-sdk migration. Update the
metadata to cover the new version of the Operator from 4.0 to 5.0 major
release number.
@leifmadsen leifmadsen marked this pull request as ready for review March 4, 2022 20:25
@leifmadsen
Copy link
Member Author

I think I'm happy enough with this now that I'm going to merge it down. Some additional clean up and changes may be necessary but at this point it looks like everything is working as intended.

@leifmadsen leifmadsen merged commit 8328ed8 into master Mar 4, 2022
@leifmadsen leifmadsen deleted the operator-sdk-1.16 branch March 4, 2022 20:40
leifmadsen added a commit that referenced this pull request Mar 9, 2022
Revert changes to migrate Smart Gateway Operator to operator-sdk 1.16.0
until the same set of changes can be made to Service Telemetry Operator.

Changing how the operator is built causes a regression in the testing
framework for STF. Instead of making temporary changes to the testing
framework to deal with the changes in Operator-SDK 1.16.0, we'll make
changes to the testing framework when migrating Service Telemetry
Operator to operator-sdk 1.16.0 as well.

In the meantime, revert these changes which will just end up landing in
conjunction with the changes to Service Telemetry Operator as a bundled
change set.

This reverts commit 8328ed8.
leifmadsen added a commit that referenced this pull request Mar 9, 2022
Revert changes to migrate Smart Gateway Operator to operator-sdk 1.16.0
until the same set of changes can be made to Service Telemetry Operator.

Changing how the operator is built causes a regression in the testing
framework for STF. Instead of making temporary changes to the testing
framework to deal with the changes in Operator-SDK 1.16.0, we'll make
changes to the testing framework when migrating Service Telemetry
Operator to operator-sdk 1.16.0 as well.

In the meantime, revert these changes which will just end up landing in
conjunction with the changes to Service Telemetry Operator as a bundled
change set.

This reverts commit 8328ed8.
leifmadsen added a commit that referenced this pull request Mar 10, 2022
* Migrate to operator-sdk 1.16.0

Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to
operator-sdk 1.16.0 for OpenShift v4.9 and later.

A lot of changes here are boilerplate as generated by operator-sdk init
and operator-sdk create api commands. Once the boilerplate was created,
then the existing Ansible role was moved over. As the Ansible Operator
framework is now using collections, the name of the k8s module and
filters were updated to reflect this in the Ansible roles/ directory.

RBAC changes were also implemented to satisfy for the existing RBAC
rules that were in place prior to the migration. A new smart-gateway
role and service account were created in order to satisfy for the
running sg-core workload along with the lated oauth-proxy image shipping
with OCP v4.9.

GitHub actions have been updated to reflect the changes to the
./build/generate_bundle.sh script, which was updated to use the new
Makefile approach to building and running the Operator.

Additionally, the sample CustomResource for a SmartGateway has been
changed to leverage the dummy metrics plugin removing the requirement to
have a running AMQ Interconnect router system and data source. This
should make testing via molecule (or otherwise) easier in the future,
allowing for component testing outside of system e2e testing.

* Bump maxOpenShiftVersion to 4.10

Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF.

* Resolve issues with CI testing

* Ignore bin/ directory holding local kustomize binary

* Resolve indentation inconsistency

* Reduce RBAC permission for Smart Gateway instances

* Drop superfluous resource from authentication API group

* Bump operator version annotation

* Test via molecule with KinD (#109)

* Test via molecule with KinD

Implement a basic molecule test with KinD in order to load the operator
and the workload, then perform a basic assertion test to verify that
everything is operational. More work will be required here as the test
is effectively a stub at this point.

In the molecule test needed to fake the creation of a TLS certificate
that is normally dealt with in OpenShift without this effort. Created a
locally self-signed certificate that is then loaded as the
smartgateway-sample-proxy-tls Secret that is mounted by the Smart
Gateway Deployment as a mounted volume for oauth-proxy.

In an OpenShift environment the oauth-proxy is available in the local
registry. Within a KinD environment that is not available. The existing
oauth_proxy_image path is defaulted in the role, but is now exposed as a
parameter to the SmartGateway custom resource. Because of the migration
to CRD/v1 unknown parameters are no longer preserved by default. This
new parameter is exposed primarily for testing but should be useful in
the future as well.

Adds another Makefile option called kustomize-local which doesn't have
the same checks as the standard kustomize command. In the testing
environment an incompatible version of kustomize may be installed, and
it's not always possible to remove it. Using kustomize-local results in
a specific kustomize version being added locally and then referenced by
the testing infrastructure for consistency.

* Remove extra --namespace parameter from molecule test

* Update documentation and metadata

Update the README.md to match the currently known methods to install
Smart Gateway Operator using the new operator-sdk migration. Update the
metadata to cover the new version of the Operator from 4.0 to 5.0 major
release number.

* Update LICENSE copyright date range

* Adjust documentation and bundle script generator
leifmadsen added a commit that referenced this pull request Mar 10, 2022
* Migrate to operator-sdk 1.16.0

Migrate the Smart Gateway Operator from operator-sdk 0.19.4 to
operator-sdk 1.16.0 for OpenShift v4.9 and later.

A lot of changes here are boilerplate as generated by operator-sdk init
and operator-sdk create api commands. Once the boilerplate was created,
then the existing Ansible role was moved over. As the Ansible Operator
framework is now using collections, the name of the k8s module and
filters were updated to reflect this in the Ansible roles/ directory.

RBAC changes were also implemented to satisfy for the existing RBAC
rules that were in place prior to the migration. A new smart-gateway
role and service account were created in order to satisfy for the
running sg-core workload along with the lated oauth-proxy image shipping
with OCP v4.9.

GitHub actions have been updated to reflect the changes to the
./build/generate_bundle.sh script, which was updated to use the new
Makefile approach to building and running the Operator.

Additionally, the sample CustomResource for a SmartGateway has been
changed to leverage the dummy metrics plugin removing the requirement to
have a running AMQ Interconnect router system and data source. This
should make testing via molecule (or otherwise) easier in the future,
allowing for component testing outside of system e2e testing.

* Bump maxOpenShiftVersion to 4.10

Bump the maxOpenShiftVersion to OCP 4.10 which is the target for the next y-stream release of STF.

* Resolve issues with CI testing

* Ignore bin/ directory holding local kustomize binary

* Resolve indentation inconsistency

* Reduce RBAC permission for Smart Gateway instances

* Drop superfluous resource from authentication API group

* Bump operator version annotation

* Test via molecule with KinD (#109)

* Test via molecule with KinD

Implement a basic molecule test with KinD in order to load the operator
and the workload, then perform a basic assertion test to verify that
everything is operational. More work will be required here as the test
is effectively a stub at this point.

In the molecule test needed to fake the creation of a TLS certificate
that is normally dealt with in OpenShift without this effort. Created a
locally self-signed certificate that is then loaded as the
smartgateway-sample-proxy-tls Secret that is mounted by the Smart
Gateway Deployment as a mounted volume for oauth-proxy.

In an OpenShift environment the oauth-proxy is available in the local
registry. Within a KinD environment that is not available. The existing
oauth_proxy_image path is defaulted in the role, but is now exposed as a
parameter to the SmartGateway custom resource. Because of the migration
to CRD/v1 unknown parameters are no longer preserved by default. This
new parameter is exposed primarily for testing but should be useful in
the future as well.

Adds another Makefile option called kustomize-local which doesn't have
the same checks as the standard kustomize command. In the testing
environment an incompatible version of kustomize may be installed, and
it's not always possible to remove it. Using kustomize-local results in
a specific kustomize version being added locally and then referenced by
the testing infrastructure for consistency.

* Remove extra --namespace parameter from molecule test

* Update documentation and metadata

Update the README.md to match the currently known methods to install
Smart Gateway Operator using the new operator-sdk migration. Update the
metadata to cover the new version of the Operator from 4.0 to 5.0 major
release number.

* Update LICENSE copyright date range

* Adjust documentation and bundle script generator
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
do-not-merge Do Not Merge (WIP)
Development

Successfully merging this pull request may close these issues.

None yet

3 participants