Support EFS on ROSA for ReadWriteMany PVCs (#387)
Closes #386
ahus1 committed Jun 26, 2023
1 parent 4d46884 commit 45df050
Showing 14 changed files with 374 additions and 12 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/rosa-cluster-create.yml
@@ -5,12 +5,12 @@ on:
inputs:
clusterName:
description: 'Name of the cluster'
type: text
type: string
computeMachineType:
description: 'Instance type for the compute nodes'
required: true
default: m5.xlarge
type: text
type: string
multiAz:
description: 'Deploy to multiple availability zones in the region'
required: true
@@ -19,8 +19,8 @@ on:
replicas:
description: 'Number of worker nodes to provision'
required: true
default: 2
type: text
default: '2'
type: string

env:
OPENSHIFT_VERSION: 4.12.21
10 changes: 8 additions & 2 deletions .github/workflows/rosa-cluster-delete.yml
@@ -5,12 +5,12 @@ on:
inputs:
clusterName:
description: 'Name of the cluster'
type: text
type: string
deleteAll:
description: 'Delete all clusters'
required: true
default: 'no'
type: text
type: string
jobs:

delete:
@@ -28,12 +28,18 @@ jobs:
aws-default-region: ${{ vars.AWS_DEFAULT_REGION }}
rosa-token: ${{ secrets.ROSA_TOKEN }}

- name: Login to OpenShift cluster
uses: ./.github/actions/oc-keycloak-login
with:
clusterName: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}

- name: Delete a ROSA Cluster
if: ${{ inputs.deleteAll == 'no' }}
run: ./rosa_delete_cluster.sh
working-directory: provision/aws
env:
CLUSTER_NAME: ${{ inputs.clusterName || format('gh-{0}', github.repository_owner) }}
REGION: ${{ vars.AWS_DEFAULT_REGION }}

- name: Delete all ROSA Clusters
if: ${{ inputs.deleteAll == 'yes' }}
@@ -1,9 +1,16 @@
= Installing OpenShift on AWS
:description: OpenShift is a pre-requisite if the setup is about to be tested on OpenShift.
:description: Red Hat OpenShift Service on AWS (ROSA) provides an OpenShift instance to run Keycloak.

{description}

== About

This module is intended to automate tasks around provisioning OpenShift clusters on AWS via the ROSA tool, as described in the https://console.redhat.com/openshift/create/rosa/getstarted[ROSA installation guide].
The scripts are located in the folder `provision/aws` in this repository.
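
For illustration, a cluster could be provisioned along these lines (the cluster name is a hypothetical value, and the call assumes the same environment/`.env` conventions as the EFS scripts described below):

----
cd provision/aws
CLUSTER_NAME=my-cluster ./rosa_create_cluster.sh
----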

It will also install EFS as the storage provider for ReadWriteMany PersistentVolumeClaims with the storage class `efs-sc`.
See <<aws-efs-as-readwritemany-storage>> for more information.

== Prerequisites

. Install the https://aws.amazon.com/cli/[AWS CLI]
@@ -15,6 +22,11 @@ curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
----
.. Enable https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-completion.html[shell auto-completion] for bash by adding the following to your `~/.bashrc`:
+
----
complete -C '/usr/local/bin/aws_completer' aws
----
.. Create Access keys in https://us-east-1.console.aws.amazon.com/iamv2/home?region=us-east-1#/users[AWS Identity and Access Management]
... Click on your user account
... Click on *Security credentials*
@@ -108,9 +120,27 @@ rosa describe cluster -c _cluster-name_

The above installation script creates an admin user automatically.
In case the user needs to be re-created, this can be done via the `rosa_recreate_admin.sh` script, providing the `CLUSTER_NAME` and optionally the `ADMIN_PASSWORD` parameter.
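
For illustration (the cluster name and password are hypothetical values):

----
cd provision/aws
CLUSTER_NAME=my-cluster ./rosa_recreate_admin.sh
# or, to set a specific password:
CLUSTER_NAME=my-cluster ADMIN_PASSWORD=my-password ./rosa_recreate_admin.sh
----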

[#aws-efs-as-readwritemany-storage]
== AWS Elastic File System (EFS) as ReadWriteMany storage

This setup installs EFS as the storage provider for ReadWriteMany PersistentVolumeClaims with the storage class `efs-sc`.
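
As a minimal sketch, a claim requesting this storage class could look as follows (the claim name and size are hypothetical, for illustration only):

----
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # hypothetical name, for illustration only
  name: example-rwx-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: efs-sc
  resources:
    requests:
      storage: 1Gi
EOF
----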

Using the scripts `rosa_efs_create.sh` and `rosa_efs_delete.sh`, the EFS configuration can be added and removed.
Those are intended to be called from `rosa_create_cluster.sh` and `rosa_delete_cluster.sh` respectively.
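
They can also be invoked standalone, for example (hypothetical values; `oc` must already be logged in to the cluster):

----
cd provision/aws
CLUSTER_NAME=my-cluster REGION=eu-central-1 ./rosa_efs_create.sh
----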

Even when the scripts have completed, it might take a little while until the DNS name used by the PVC resolves to the new IP address of the mount point.
In the meantime, you might see an error message like "`Failed to resolve server _file-system-id_.efs._aws-region_.amazonaws.com`".
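
To check whether all mount targets are ready, the file system ID can be read from the `efs-sc` storage class in the same way `rosa_efs_delete.sh` does; each mount target should eventually report the lifecycle state `available`:

----
EFS=$(oc get sc/efs-sc -o jsonpath='{.parameters.fileSystemId}')
aws efs describe-mount-targets --file-system-id $EFS --region $REGION --output json \
  | jq -r '.MountTargets[].LifeCycleState'
----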

The following docs have been used to set up EFS:

* https://docs.openshift.com/rosa/storage/container_storage_interface/osd-persistent-storage-aws-efs-csi.html[Official OpenShift docs: Setting up AWS Elastic File Service CSI Driver Operator]
* https://mobb.ninja/docs/rosa/aws-efs/[Community docs: Enabling the AWS EFS CSI Driver Operator on ROSA]
* https://access.redhat.com/articles/6966373[Red Hat knowledge base article: AWS EFS CSI Driver Operator installation guide in OCP]

== Rotate admin user password

The admin user password can be rotated via the `rosa_rotate_admin_password.sh` script. Note admin password for existing clusters are not updated. The new password can be applied using script `rosa_recreate_admin.sh` with corresponding `CLUSTER_NAME` variable.
The admin user password can be rotated via the `rosa_rotate_admin_password.sh` script.
Note that admin passwords for existing clusters are not updated.
The new password can be applied using the `rosa_recreate_admin.sh` script with the corresponding `CLUSTER_NAME` variable.
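
For illustration (hypothetical cluster name):

----
cd provision/aws
./rosa_rotate_admin_password.sh
CLUSTER_NAME=my-cluster ./rosa_recreate_admin.sh
----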

== Uninstallation

2 changes: 2 additions & 0 deletions provision/aws/efs/.gitignore
@@ -0,0 +1,2 @@
manifests
ccoctl
19 changes: 19 additions & 0 deletions provision/aws/efs/aws-efs-csi-driver-operator.yaml
@@ -0,0 +1,19 @@
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
generateName: openshift-cluster-csi-drivers-
namespace: openshift-cluster-csi-drivers
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
labels:
operators.coreos.com/aws-efs-csi-driver-operator.openshift-cluster-csi-drivers: ""
name: aws-efs-csi-driver-operator
namespace: openshift-cluster-csi-drivers
spec:
channel: stable
installPlanApproval: Automatic
name: aws-efs-csi-driver-operator
source: redhat-operators
sourceNamespace: openshift-marketplace
@@ -0,0 +1,20 @@
apiVersion: cloudcredential.openshift.io/v1
kind: CredentialsRequest
metadata:
name: openshift-aws-efs-csi-driver
namespace: openshift-cloud-credential-operator
spec:
providerSpec:
apiVersion: cloudcredential.openshift.io/v1
kind: AWSProviderSpec
statementEntries:
- action:
- elasticfilesystem:*
effect: Allow
resource: '*'
secretRef:
name: aws-efs-cloud-credentials
namespace: openshift-cluster-csi-drivers
serviceAccountNames:
- aws-efs-csi-driver-operator
- aws-efs-csi-driver-controller-sa
6 changes: 6 additions & 0 deletions provision/aws/efs/efs-csi-aws-com-cluster-csi-driver.yaml
@@ -0,0 +1,6 @@
apiVersion: operator.openshift.io/v1
kind: ClusterCSIDriver
metadata:
name: efs.csi.aws.com
spec:
managementState: Managed
6 changes: 4 additions & 2 deletions provision/aws/rosa_create_cluster.sh
@@ -35,8 +35,8 @@ else

ROSA_CMD="rosa create cluster \
--sts \
--cluster-name "${CLUSTER_NAME}" \
--version "${VERSION}" \
--cluster-name ${CLUSTER_NAME} \
--version ${VERSION} \
--role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Installer-Role \
--support-role-arn arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-Support-Role \
--controlplane-iam-role arn:aws:iam::${AWS_ACCOUNT}:role/ManagedOpenShift-ControlPlane-Role \
@@ -69,3 +69,5 @@ echo "Cluster installation complete."
echo

./rosa_recreate_admin.sh

./rosa_efs_create.sh
3 changes: 3 additions & 0 deletions provision/aws/rosa_delete_cluster.sh
@@ -8,6 +8,9 @@ fi
CLUSTER_NAME=${CLUSTER_NAME:-$(whoami)}
if [ -z "$CLUSTER_NAME" ]; then echo "Variable CLUSTER_NAME needs to be set."; exit 1; fi

# Cleanup might fail if EFS hasn't been configured for the cluster. Ignore any failures and continue
./rosa_efs_delete.sh || true

CLUSTER_ID=$(rosa describe cluster --cluster "$CLUSTER_NAME" | grep -oPm1 "^ID:\s*\K\w+")
echo "CLUSTER_ID: $CLUSTER_ID"

119 changes: 119 additions & 0 deletions provision/aws/rosa_efs_create.sh
@@ -0,0 +1,119 @@
#!/bin/bash
# This automates the setup of EFS as RWX storage in ROSA. It is based on the following information:
# * https://access.redhat.com/articles/6966373
# * https://mobb.ninja/docs/rosa/aws-efs/
# * https://docs.openshift.com/rosa/storage/container_storage_interface/osd-persistent-storage-aws-efs-csi.html

set -xeo pipefail

if [ -f ./.env ]; then
source ./.env
fi

AWS_REGION=${REGION}
OIDC_PROVIDER=$(oc get authentication.config.openshift.io cluster -o json \
| jq -r .spec.serviceAccountIssuer| sed -e "s/^https:\/\///")
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)

cd efs

oc create -f aws-efs-csi-driver-operator.yaml

CCO_POD_NAME=$(oc get po -n openshift-cloud-credential-operator -l app=cloud-credential-operator -o jsonpath='{.items[*].metadata.name}')

oc cp -c cloud-credential-operator openshift-cloud-credential-operator/${CCO_POD_NAME}:/usr/bin/ccoctl ./ccoctl --retries=999

chmod 775 ./ccoctl

./ccoctl aws create-iam-roles --name=${CLUSTER_NAME} --region=${AWS_REGION} --credentials-requests-dir=credentialRequests/ --identity-provider-arn=arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}

oc create -f manifests/openshift-cluster-csi-drivers-aws-efs-cloud-credentials-credentials.yaml

oc create -f efs-csi-aws-com-cluster-csi-driver.yaml

kubectl wait --for=condition=AWSEFSDriverNodeServiceControllerAvailable --timeout=300s clustercsidriver.operator.openshift.io/efs.csi.aws.com
kubectl wait --for=condition=AWSEFSDriverControllerServiceControllerAvailable --timeout=300s clustercsidriver.operator.openshift.io/efs.csi.aws.com

NODE=$(oc get nodes --selector=node-role.kubernetes.io/worker \
-o jsonpath='{.items[0].metadata.name}')
VPC=$(aws ec2 describe-instances \
--filters "Name=private-dns-name,Values=$NODE" \
--output json \
--query 'Reservations[*].Instances[*].{VpcId:VpcId}' \
--region $AWS_REGION \
| jq -r '.[0][0].VpcId')
CIDR=$(aws ec2 describe-vpcs \
--filters "Name=vpc-id,Values=$VPC" \
--query 'Vpcs[*].CidrBlock' \
--output json \
--region $AWS_REGION \
| jq -r '.[0]')
SG=$(aws ec2 describe-instances --filters \
"Name=private-dns-name,Values=$NODE" \
--query 'Reservations[*].Instances[*].{SecurityGroups:SecurityGroups}' \
--output json \
--region $AWS_REGION \
| jq -r '.[0][0].SecurityGroups[0].GroupId')
echo "CIDR - $CIDR, SG - $SG"

aws ec2 authorize-security-group-ingress \
--group-id $SG \
--protocol tcp \
--port 2049 \
--output json \
--region $AWS_REGION \
--cidr $CIDR | jq .

SUBNET=$(aws ec2 describe-subnets \
--filters Name=vpc-id,Values=$VPC Name=tag:Name,Values='*-private*' \
--query 'Subnets[*].{SubnetId:SubnetId}' \
--output json \
--region $AWS_REGION \
| jq -r '.[0].SubnetId')
AWS_ZONE=$(aws ec2 describe-subnets --filters Name=subnet-id,Values=$SUBNET \
--output json \
--region $AWS_REGION | jq -r '.Subnets[0].AvailabilityZone')

EFS=$(aws efs create-file-system --creation-token efs-token-${CLUSTER_NAME} \
--availability-zone-name $AWS_ZONE \
--output json \
--tags Key=Name,Value=${CLUSTER_NAME} \
--region $AWS_REGION \
--encrypted | jq -r '.FileSystemId')
echo $EFS

cat <<EOF | oc apply -f -
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: efs-sc
provisioner: efs.csi.aws.com
parameters:
provisioningMode: efs-ap
fileSystemId: $EFS
directoryPerms: "700"
gidRangeStart: "1000"
gidRangeEnd: "2000"
basePath: "/dynamic_provisioning"
EOF

while true; do
LIFECYCLE_STATE="$(aws efs describe-file-systems --file-system-id $EFS --region $AWS_REGION --output json | jq -r '.FileSystems[0].LifeCycleState')"
if [[ "${LIFECYCLE_STATE}" == "available" ]]; then break; fi
sleep 1
echo -n '.'
done

for SUBNET in $(aws ec2 describe-subnets \
--filters Name=vpc-id,Values=$VPC Name=tag:Name,Values='*-private*' \
--query 'Subnets[*].{SubnetId:SubnetId}' \
--output json \
--region $AWS_REGION \
| jq -r '.[].SubnetId'); do \
MOUNT_TARGET=$(aws efs create-mount-target --file-system-id $EFS \
--subnet-id $SUBNET --security-groups $SG \
--output json \
--region $AWS_REGION \
| jq -r '.MountTargetId'); \
echo $MOUNT_TARGET; \
done
82 changes: 82 additions & 0 deletions provision/aws/rosa_efs_delete.sh
@@ -0,0 +1,82 @@
#!/bin/bash
# don't use 'set -e' here, as we also want to clean up half-installed EFS setups

if [ -f ./.env ]; then
source ./.env
fi

export AWS_REGION=${REGION}

EFS=$(oc get sc/efs-sc -o jsonpath='{.parameters.fileSystemId}')

for MOUNT_TARGET in $(aws efs describe-mount-targets \
--region=$AWS_REGION \
--file-system-id=$EFS \
--output json \
| jq -r '.MountTargets[].MountTargetId'); do
aws efs delete-mount-target --mount-target-id $MOUNT_TARGET --region $AWS_REGION
done

while true; do
# Wait until all mount targets have been removed before deleting the file system
REMAINING_MOUNT_TARGETS="$(aws efs describe-mount-targets \
--region=$AWS_REGION \
--file-system-id=$EFS \
--output json \
| jq -r '.MountTargets[].MountTargetId')"
if [[ "${REMAINING_MOUNT_TARGETS}" == "" ]]; then break; fi
sleep 1
echo -n '.'
done

aws efs delete-file-system --file-system-id $EFS --region $AWS_REGION

NODE=$(oc get nodes --selector=node-role.kubernetes.io/worker \
-o jsonpath='{.items[0].metadata.name}')
VPC=$(aws ec2 describe-instances \
--filters "Name=private-dns-name,Values=$NODE" \
--output json \
--query 'Reservations[*].Instances[*].{VpcId:VpcId}' \
--region $AWS_REGION \
| jq -r '.[0][0].VpcId')
CIDR=$(aws ec2 describe-vpcs \
--filters "Name=vpc-id,Values=$VPC" \
--query 'Vpcs[*].CidrBlock' \
--output json \
--region $AWS_REGION \
| jq -r '.[0]')
SG=$(aws ec2 describe-instances --filters \
"Name=private-dns-name,Values=$NODE" \
--query 'Reservations[*].Instances[*].{SecurityGroups:SecurityGroups}' \
--output json \
--region $AWS_REGION \
| jq -r '.[0][0].SecurityGroups[0].GroupId')
echo "CIDR - $CIDR, SG - $SG"

aws ec2 revoke-security-group-ingress \
--group-id $SG \
--protocol tcp \
--region $AWS_REGION \
--port 2049 \
--cidr $CIDR

cd efs

CCO_POD_NAME=$(oc get po -n openshift-cloud-credential-operator -l app=cloud-credential-operator -o jsonpath='{.items[*].metadata.name}')

oc cp -c cloud-credential-operator openshift-cloud-credential-operator/${CCO_POD_NAME}:/usr/bin/ccoctl ./ccoctl --retries=999

chmod 775 ./ccoctl

./ccoctl aws delete --name=${CLUSTER_NAME} --region=${AWS_REGION}

oc delete storageclass efs-sc

oc delete -n openshift-cluster-csi-drivers Subscription aws-efs-csi-driver-operator

oc delete -n openshift-cluster-csi-drivers Secret aws-efs-cloud-credentials

oc delete ClusterCSIDriver efs.csi.aws.com

for OPERATOR_GROUP in $(oc get -n openshift-cluster-csi-drivers OperatorGroup -o name); do
oc delete -n openshift-cluster-csi-drivers $OPERATOR_GROUP
done