Active/Active XSite fencing. Resolves keycloak/keycloak#29303
- User alert routing enabled on ROSA clusters

- PrometheusRule used to trigger an AWS Lambda webhook in the event of a
  split-brain so that only a single site remains in the Global Accelerator endpoints

- Global Accelerator scripts refactored to use OpenTofu when creating
  AWS resources

- Task created to deploy/undeploy Active/Active

- Task created to simulate split-brain scenarios

- 'active-active' flag added to GH actions to differentiate between
  active/passive and active/active deployments

- 'active-active' and 'active-passive' tags added to crossdc-tests to
  allow different behaviours/tests to be executed for the given
  deployment type.

- Active/Active specific test cases added. Testsuite now interacts
  directly with k8s clusters in order to have greater control over
  deployments being tested. This is necessary so that we can simulate
  split-brain scenarios between sites.

- Daily scheduled job updated to run tests against both active/passive
  and active/active deployments

Signed-off-by: Ryan Emerson <remerson@redhat.com>
Co-authored-by: Michal Hajas <mhajas@redhat.com>
Co-authored-by: Pedro Ruivo <pruivo@users.noreply.github.com>
Signed-off-by: Ryan Emerson <remerson@redhat.com>
3 people committed Jun 10, 2024
1 parent 6e81de2 commit d612cda
Showing 49 changed files with 1,703 additions and 279 deletions.
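The fencing flow described in the commit message (a PrometheusRule alert fires, Alertmanager calls a Lambda webhook, and the Lambda leaves a single site in the Global Accelerator endpoint group) can be sketched as two pure functions. This is a minimal sketch, not the contents of `stonith_lambda.py`: the alert name `SiteOffline`, the `site` label and the site-to-NLB mapping are hypothetical, and only the general shape of an Alertmanager webhook payload (`alerts[].status`, `alerts[].labels`) is assumed.

```python
from typing import Any

# Hypothetical names: the real PrometheusRule and stonith_lambda.py may use
# a different alert name and a different label for the offline site.
SPLIT_BRAIN_ALERT = "SiteOffline"
SITE_LABEL = "site"


def sites_to_fence(payload: dict[str, Any]) -> set[str]:
    """Collect the sites reported offline by firing alerts in an Alertmanager webhook payload."""
    sites = set()
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        if alert.get("status") == "firing" and labels.get("alertname") == SPLIT_BRAIN_ALERT:
            sites.add(labels[SITE_LABEL])
    return sites


def fence(endpoints: list[dict[str, Any]], site_to_nlb: dict[str, str],
          offline_sites: set[str]) -> list[dict[str, Any]]:
    """Return the endpoint list with the offline sites' NLBs removed.

    If every site was reported offline, keep only the first endpoint so that
    exactly one site keeps serving clients instead of none.
    """
    offline_arns = {site_to_nlb[site] for site in offline_sites}
    remaining = [e for e in endpoints if e["EndpointId"] not in offline_arns]
    return remaining if remaining else endpoints[:1]
```

Keeping the decision logic free of AWS calls makes it easy to test; a Lambda handler would then pass the resulting list to the Global Accelerator API.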
40 changes: 39 additions & 1 deletion .github/workflows/rosa-cluster-auto-provision-on-schedule.yml
@@ -53,11 +53,49 @@ jobs:
createCluster: false
secrets: inherit

run-scaling-benchmark-with-peristent-sessions:
run-scaling-benchmark-with-persistent-sessions:
needs: keycloak-deploy-with-persistent-sessions
uses: ./.github/workflows/rosa-scaling-benchmark.yml
with:
clusterName: gh-keycloak-a # ${{ env.CLUSTER_PREFIX }}-a -- unfortunately 'env.' doesn't work here ${{ env.CLUSTER_PREFIX }}-a
skipCreateDataset: true
outputArchiveSuffix: 'persistent-sessions'
secrets: inherit

keycloak-undeploy-with-persistent-sessions:
needs: run-scaling-benchmark-with-persistent-sessions
name: Undeploy Keycloak deployment on the multi-az cluster
if: github.event_name != 'schedule' || github.repository == 'keycloak/keycloak-benchmark'
uses: ./.github/workflows/rosa-multi-az-cluster-undeploy.yml
with:
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
skipAuroraDeletion: true
secrets: inherit

keycloak-deploy-active-active:
needs: keycloak-undeploy-with-persistent-sessions
name: ROSA Scheduled Create Active/Active cluster with Persistent Sessions
if: github.event_name != 'schedule' || github.repository == 'keycloak/keycloak-benchmark'
uses: ./.github/workflows/rosa-multi-az-cluster-create.yml
with:
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
enablePersistentSessions: true
createCluster: false
activeActive: true
secrets: inherit

run-functional-tests-active-active:
needs: keycloak-deploy-active-active
uses: ./.github/workflows/rosa-run-crossdc-func-tests.yml
with:
activeActive: true
clusterPrefix: gh-keycloak # ${{ env.CLUSTER_PREFIX }} -- unfortunately 'env.' doesn't work here
secrets: inherit

run-scaling-benchmark-active-active:
needs: run-functional-tests-active-active
uses: ./.github/workflows/rosa-scaling-benchmark.yml
with:
clusterName: gh-keycloak-a # ${{ env.CLUSTER_PREFIX }}-a -- unfortunately 'env.' doesn't work here ${{ env.CLUSTER_PREFIX }}-a
outputArchiveSuffix: 'active-active'
secrets: inherit
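Each job in the scheduled workflow `needs` its predecessor, so the daily run is strictly sequential: the persistent-sessions deployment is benchmarked and undeployed before `keycloak-deploy-active-active` reuses the existing clusters (`createCluster: false`). A minimal sketch of that ordering using Python's standard `graphlib`, with the `needs` edges copied from the hunk above; the dependencies of `keycloak-deploy-with-persistent-sessions` lie outside the hunk, so it is assumed to be the root here.

```python
from graphlib import TopologicalSorter

# job id -> set of job ids it `needs`, as shown in the scheduled workflow hunk.
# keycloak-deploy-with-persistent-sessions is treated as a root (its own needs are not shown).
NEEDS = {
    "run-scaling-benchmark-with-persistent-sessions": {"keycloak-deploy-with-persistent-sessions"},
    "keycloak-undeploy-with-persistent-sessions": {"run-scaling-benchmark-with-persistent-sessions"},
    "keycloak-deploy-active-active": {"keycloak-undeploy-with-persistent-sessions"},
    "run-functional-tests-active-active": {"keycloak-deploy-active-active"},
    "run-scaling-benchmark-active-active": {"run-functional-tests-active-active"},
}


def run_order(needs: dict[str, set[str]]) -> list[str]:
    """Return the jobs in an order that respects every `needs` dependency."""
    return list(TopologicalSorter(needs).static_order())
```

Because `needs` matches job ids exactly, renaming `run-scaling-benchmark-with-peristent-sessions` to the correctly spelled id is what lets `keycloak-undeploy-with-persistent-sessions` reference it.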
63 changes: 58 additions & 5 deletions .github/workflows/rosa-multi-az-cluster-create.yml
@@ -16,6 +16,10 @@ on:
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
activeActive:
description: 'When true deploy an Active/Active Keycloak deployment'
type: boolean
default: false
enablePersistentSessions:
description: 'To enable Persistent user and client sessions to the DB'
type: boolean
@@ -32,16 +36,20 @@ on:
description: 'The AWS region to create both clusters in. Defaults to "vars.AWS_DEFAULT_REGION" if omitted.'
type: string
createCluster:
description: 'Check to Create Cluster'
description: 'Check to Create Cluster.'
type: boolean
default: true
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
activeActive:
description: 'When true deploy an Active/Active Keycloak deployment'
type: boolean
default: false
enablePersistentSessions:
description: 'To enable Persistent user and client sessions to the DB'
type: boolean
default: false
keycloakRepository:
description: 'The repository to deploy Keycloak from. If not set nightly image is used'
type: string
keycloakBranch:
description: 'The branch to deploy Keycloak from. If not set nightly image is used'
type: string
@@ -109,6 +117,11 @@ jobs:
- name: Checkout repository
uses: actions/checkout@v4

- name: Setup OpenTofu
uses: opentofu/setup-opentofu@v1
with:
tofu_wrapper: false

- name: Setup ROSA CLI
uses: ./.github/actions/rosa-cli-setup
with:
@@ -140,6 +153,7 @@ jobs:
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Create Route53 Loadbalancer
if: ${{ !inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: |
task route53 > route53
@@ -150,10 +164,49 @@ jobs:
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Deploy
- name: Deploy Active/Passive
if: ${{ !inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: task
env:
AURORA_CLUSTER: ${{ env.CLUSTER_PREFIX }}
AURORA_REGION: ${{ env.REGION }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b
KC_ACTIVE_ACTIVE: ${{ inputs.activeActive }}
KC_CPU_REQUESTS: 6
KC_INSTANCES: 3
KC_DISABLE_STICKY_SESSION: true
KC_PERSISTENT_SESSIONS: ${{ env.KC_PERSISTENT_SESSIONS }}
KC_MEMORY_REQUESTS_MB: 3000
KC_MEMORY_LIMITS_MB: 4000
KC_DB_POOL_INITIAL_SIZE: 30
KC_DB_POOL_MAX_SIZE: 30
KC_DB_POOL_MIN_SIZE: 30
KC_DATABASE: "aurora-postgres"
MULTI_AZ: "true"
KC_REPOSITORY: ${{ inputs.keycloakRepository }}
KC_BRANCH: ${{ inputs.keycloakBranch }}

- name: Create Accelerator Loadbalancer
if: ${{ inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: |
task global-accelerator-create 2>&1 | tee accelerator
echo "ACCELERATOR_DNS=$(grep -Po 'ACCELERATOR DNS: \K.*' accelerator)" >> $GITHUB_ENV
echo "ACCELERATOR_WEBHOOK=$(grep -Po 'ACCELERATOR WEBHOOK: \K.*' accelerator)" >> $GITHUB_ENV
env:
ACCELERATOR_NAME: ${{ env.CLUSTER_PREFIX }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
ROSA_CLUSTER_NAME_2: ${{ env.CLUSTER_PREFIX }}-b

- name: Deploy Active/Active
if: ${{ inputs.activeActive }}
working-directory: provision/rosa-cross-dc
run: task active-active
env:
ACCELERATOR_DNS: ${{ env.ACCELERATOR_DNS }}
ACCELERATOR_WEBHOOK_URL: ${{ env.ACCELERATOR_WEBHOOK }}
AURORA_CLUSTER: ${{ env.CLUSTER_PREFIX }}
AURORA_REGION: ${{ env.REGION }}
ROSA_CLUSTER_NAME_1: ${{ env.CLUSTER_PREFIX }}-a
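The "Create Accelerator Loadbalancer" step captures the output of `task global-accelerator-create` and uses `grep -Po '... \K.*'` to pull the DNS name and webhook URL into `$GITHUB_ENV`. The same extraction, sketched in Python to make the format explicit; it assumes each marker line is printed once, and the example values in the test are made up.

```python
import re

# Marker printed by the task output -> environment variable the workflow exports.
MARKERS = {
    "ACCELERATOR_DNS": "ACCELERATOR DNS: ",
    "ACCELERATOR_WEBHOOK": "ACCELERATOR WEBHOOK: ",
}


def parse_accelerator_output(text: str) -> dict[str, str]:
    """Mimic `grep -Po 'MARKER\\K.*'`: take the rest of the first line after each marker."""
    env = {}
    for var, marker in MARKERS.items():
        # `.` does not match newlines, so the capture stops at the end of the line, as with grep.
        match = re.search(re.escape(marker) + r"(.*)", text)
        if match:
            env[var] = match.group(1)
    return env


def to_github_env(env: dict[str, str]) -> str:
    """Render NAME=value lines in the format appended to $GITHUB_ENV."""
    return "".join(f"{name}={value}\n" for name, value in env.items())
```

`ACCELERATOR_WEBHOOK` is then passed to the "Deploy Active/Active" step as `ACCELERATOR_WEBHOOK_URL`.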
19 changes: 16 additions & 3 deletions .github/workflows/rosa-multi-az-cluster-delete.yml
@@ -11,7 +11,7 @@ on:
type: string

jobs:
route53:
loadbalancer:
runs-on: ubuntu-latest
steps:
- name: Checkout repository
@@ -40,19 +40,32 @@ jobs:
echo "SUBDOMAIN=$(echo $KEYCLOAK_URL | grep -oP '(?<=client.).*?(?=.keycloak-benchmark.com)')" >> $GITHUB_ENV
- name: Delete Route53 Records
run: |
./provision/aws/route53/route53_delete.sh
run: ./provision/aws/route53/route53_delete.sh
env:
SUBDOMAIN: ${{ env.SUBDOMAIN }}

- name: Set ACCELERATOR_DNS env variable for Global Accelerator processing
run: |
echo "ACCELERATOR_DNS=${KEYCLOAK_URL#"https://"}" >> $GITHUB_ENV
- name: Delete Global Accelerator
run: ./provision/aws/global-accelerator/accelerator_multi_az_delete.sh
env:
ACCELERATOR_DNS: ${{ env.ACCELERATOR_DNS }}
CLUSTER_1: ${{ inputs.clusterPrefix }}-a
CLUSTER_2: ${{ inputs.clusterPrefix }}-b
KEYCLOAK_NAMESPACE: runner-keycloak

cluster1:
needs: loadbalancer
uses: ./.github/workflows/rosa-cluster-delete.yml
with:
clusterName: ${{ inputs.clusterPrefix }}-a
deleteAll: no
secrets: inherit

cluster2:
needs: loadbalancer
uses: ./.github/workflows/rosa-cluster-delete.yml
with:
clusterName: ${{ inputs.clusterPrefix }}-b
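The renamed `loadbalancer` job derives both identifiers from `KEYCLOAK_URL` with shell tools: `grep -oP` for the Route53 `SUBDOMAIN`, and the parameter expansion `${KEYCLOAK_URL#"https://"}` for `ACCELERATOR_DNS`. A Python sketch of the same string handling, reusing the workflow's regular expression unchanged; the hostnames in the test are hypothetical.

```python
import re
from typing import Optional

# Same pattern the workflow passes to `grep -oP`; the unescaped dots match any character.
SUBDOMAIN_PATTERN = re.compile(r"(?<=client.).*?(?=.keycloak-benchmark.com)")


def accelerator_dns(keycloak_url: str) -> str:
    """Equivalent of ${KEYCLOAK_URL#"https://"}: drop a leading "https://" if present."""
    return keycloak_url.removeprefix("https://")


def subdomain(keycloak_url: str) -> Optional[str]:
    """Text between "client." and ".keycloak-benchmark.com", or None for other URLs."""
    match = SUBDOMAIN_PATTERN.search(keycloak_url)
    return match.group(0) if match else None
```

For a Global Accelerator URL the subdomain lookup finds nothing, which is why the job needs both branches to clean up either load balancer type.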
38 changes: 22 additions & 16 deletions .github/workflows/rosa-run-crossdc-func-tests.yml
@@ -6,12 +6,20 @@ on:
clusterPrefix:
description: 'The prefix used when creating the Cross DC clusters'
type: string
activeActive:
description: 'Must be true when testing against an Active/Active Keycloak deployment'
type: boolean
default: false

workflow_dispatch:
inputs:
clusterPrefix:
description: 'The prefix used when creating the Cross DC clusters'
type: string
activeActive:
description: 'Must be true when testing against an Active/Active Keycloak deployment'
type: boolean
default: false

concurrency:
# Only run once for the latest commit per ref and cancel other (previous) runs.
@@ -32,6 +40,7 @@ jobs:
distribution: 'temurin'
java-version: '17'
cache: 'maven'

- name: Cache Maven Wrapper
uses: actions/cache@v4
with:
@@ -40,37 +49,34 @@ jobs:
key: ${{ runner.os }}-maven-wrapper-${{ hashFiles('**/maven-wrapper.properties') }}
restore-keys: |
${{ runner.os }}-maven-wrapper-
- name: Setup ROSA CLI
uses: ./.github/actions/rosa-cli-setup
with:
aws-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-default-region: ${{ vars.AWS_DEFAULT_REGION }}
rosa-token: ${{ secrets.ROSA_TOKEN }}

- name: Login to OpenShift cluster A
uses: ./.github/actions/oc-keycloak-login
with:
clusterName: ${{ inputs.clusterPrefix }}-a
- name: Get DC1 URLs

- name: Get DC1 Context
shell: bash
run: |
KEYCLOAK_DC1_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" aws-health-route -o jsonpath='{.spec.host}')
echo "KEYCLOAK_DC1_URL=$KEYCLOAK_DC1_URL" >> "$GITHUB_ENV"
LOAD_BALANCER_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=keycloak -o jsonpath='{.items[*].spec.host}')
echo "LOAD_BALANCER_URL=$LOAD_BALANCER_URL" >> "$GITHUB_ENV"
ISPN_DC1_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=infinispan-service-external -o jsonpath='{.items[*].spec.host}')
echo "ISPN_DC1_URL=$ISPN_DC1_URL" >> "$GITHUB_ENV"
run: echo "KUBERNETES_1_CONTEXT=$(kubectl config current-context)" >> "$GITHUB_ENV"

- name: Login to OpenShift cluster B
uses: ./.github/actions/oc-keycloak-login
with:
clusterName: ${{ inputs.clusterPrefix }}-b
- name: Get DC2 URLs

- name: Get DC2 Context
shell: bash
run: |
KEYCLOAK_DC2_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" aws-health-route -o jsonpath='{.spec.host}')
echo "KEYCLOAK_DC2_URL=$KEYCLOAK_DC2_URL" >> "$GITHUB_ENV"
ISPN_DC2_URL=https://$(kubectl get routes -n "${{ env.PROJECT }}" -l app=infinispan-service-external -o jsonpath='{.items[*].spec.host}')
echo "ISPN_DC2_URL=$ISPN_DC2_URL" >> "$GITHUB_ENV"
run: echo "KUBERNETES_2_CONTEXT=$(kubectl config current-context)" >> "$GITHUB_ENV"

- name: Run CrossDC functional tests
run: |
./provision/rosa-cross-dc/keycloak-benchmark-crossdc-tests/run-crossdc-tests.sh
run: ./provision/rosa-cross-dc/keycloak-benchmark-crossdc-tests/run-crossdc-tests.sh
env:
ACTIVE_ACTIVE: ${{ inputs.activeActive }}
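The boolean `activeActive` input reaches the test script as the string `"true"` or `"false"` in `ACTIVE_ACTIVE`, and the commit tags tests `active-active` and `active-passive`. How `run-crossdc-tests.sh` maps one to the other is not shown here; the sketch below assumes, hypothetically, a Maven Surefire run filtered with the `excludedGroups` property.

```python
import os
from typing import Mapping


def is_active_active(env: Mapping[str, str] = os.environ) -> bool:
    """Workflow booleans arrive in environment variables as the strings "true"/"false"."""
    return env.get("ACTIVE_ACTIVE", "false").strip().lower() == "true"


def surefire_tag_args(env: Mapping[str, str] = os.environ) -> list[str]:
    """Hypothetical mapping: exclude the tag of the *other* deployment type."""
    other = "active-passive" if is_active_active(env) else "active-active"
    return [f"-DexcludedGroups={other}"]
```

Excluding the other deployment's tag, rather than including only the current one, keeps untagged tests running against both deployment types.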
1 change: 1 addition & 0 deletions .gitignore
@@ -101,3 +101,4 @@ provision/environment_data.json
**/*.tfstate*
**/*.terraform*
!**/*.terraform.lock.hcl
provision/opentofu/modules/aws/accelerator/builds/*
1 change: 1 addition & 0 deletions doc/kubernetes/modules/ROOT/examples/stonith_lambda.py
@@ -76,9 +76,10 @@ oc login https://api.**<domain name>**:6443 -u **<username>**

NOTE: The session will expire approximately once a day, and you'll need to re-login.

== Enable user workload monitoring
== Enable alert routing for user-defined projects

By default, OpenShift HCP doesn't enable alert routing for user-defined projects.

By default, OpenShift doesn't monitor user workloads.
Apply the ConfigMap link:{github-files}/provision/openshift/cluster-monitoring-config.yaml[cluster-monitoring-config.yaml], located in the `/provision/openshift` folder, to OpenShift:

[source,bash]
@@ -93,14 +94,11 @@ After this has been deployed, several new pods spin up in the *openshift-user-wo
kubectl get pods -n openshift-user-workload-monitoring
----

The metrics and targets are then available in the menu entry *Observe* in the OpenShift console.

Additional steps are necessary to enable persistent volumes for the recorded metrics.
Alerts defined in `PrometheusRule` CRs are then available to view in the menu entry *Observe->Alerting* in the OpenShift console.

Further reading:

* https://docs.openshift.com/container-platform/4.12/monitoring/configuring-the-monitoring-stack.html[Configure OpenShift monitoring stack]
* https://docs.openshift.com/container-platform/4.12/monitoring/enabling-monitoring-for-user-defined-projects.html[Enabling monitoring for user-defined projects]
* https://docs.openshift.com/rosa/observability/monitoring/enabling-alert-routing-for-user-defined-projects.html[Enabling alert routing for user-defined projects]

[#switching-between-different-kubernetes-clusters]
== Switching between different Kubernetes clusters
@@ -0,0 +1,73 @@
= Bring Active/Active site online
:description: This guide describes how to bring an Active/Active site online so that it can process client requests.

{description}

== When to use this procedure

This procedure describes how to re-add a Keycloak site to the AWS Global Accelerator after it has been taken offline,
so that it can once again service client requests.

== Procedure

Follow these steps to re-add a Keycloak site to the AWS Global Accelerator so that it can handle client requests.

=== Global Accelerator

. Determine the ARN of the Network Load Balancer (NLB) associated with the site to be brought online
+
include::partial$nlb-arn.adoc[]
+
. Update the Accelerator EndpointGroup to include both sites
+
include::partial$accelerator-endpoint-group.adoc[]
+
.Output:
[source,bash]
----
{
"EndpointGroups": [
{
"EndpointGroupArn": "arn:aws:globalaccelerator::606671647913:accelerator/d280fc09-3057-4ab6-9330-6cbf1f450748/listener/8769072f/endpoint-group/a30b64ec1700",
"EndpointGroupRegion": "eu-west-1",
"EndpointDescriptions": [
{
"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:606671647913:loadbalancer/net/a3c75f239541c4a6e9c48cf8d48d602f/5ba333e87019ccf0",
"Weight": 50,
"HealthState": "HEALTHY",
"ClientIPPreservationEnabled": false
}
],
"TrafficDialPercentage": 100.0,
"HealthCheckPort": 443,
"HealthCheckProtocol": "TCP",
"HealthCheckIntervalSeconds": 30,
"ThresholdCount": 3
}
]
}
----
+
.. Update the EndpointGroup to include the existing Endpoint and the NLB retrieved in step 1.
+
.Command:
[source,bash]
----
aws globalaccelerator update-endpoint-group \
--endpoint-group-arn arn:aws:globalaccelerator::606671647913:accelerator/d280fc09-3057-4ab6-9330-6cbf1f450748/listener/8769072f/endpoint-group/a30b64ec1700 \
--region us-west-2 \
--endpoint-configurations '
[
{
"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:606671647913:loadbalancer/net/a3c75f239541c4a6e9c48cf8d48d602f/5ba333e87019ccf0",
"Weight": 50,
"ClientIPPreservationEnabled": false
},
{
"EndpointId": "arn:aws:elasticloadbalancing:eu-west-1:606671647913:loadbalancer/net/a49e56e51e16843b9a3bc686327c907b/9b786f80ed4eba3d",
"Weight": 50,
"ClientIPPreservationEnabled": false
}
]
'
----
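Building the `--endpoint-configurations` value by hand is error-prone once ARNs are long. A minimal sketch that derives it from the `.Output:` JSON shown above, carrying over the existing endpoint and appending the NLB from step 1; the default weight of 50 and `ClientIPPreservationEnabled: false` mirror the example command and are assumptions for other setups.

```python
import json
from typing import Any


def endpoint_configurations(groups_output: dict[str, Any], nlb_arn: str, weight: int = 50) -> str:
    """Return the JSON list for --endpoint-configurations: existing endpoints plus the NLB."""
    group = groups_output["EndpointGroups"][0]
    configs = [
        {
            "EndpointId": e["EndpointId"],
            "Weight": e["Weight"],
            "ClientIPPreservationEnabled": e["ClientIPPreservationEnabled"],
        }
        for e in group["EndpointDescriptions"]
    ]
    # Only append the NLB if it is not already registered, so re-running is harmless.
    if all(c["EndpointId"] != nlb_arn for c in configs):
        configs.append({"EndpointId": nlb_arn, "Weight": weight, "ClientIPPreservationEnabled": False})
    return json.dumps(configs, indent=2)
```

The returned string can be passed directly as the `--endpoint-configurations` argument of `aws globalaccelerator update-endpoint-group`.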