Merged
88 changes: 88 additions & 0 deletions content/en/docs/release-oversight/backend_queries.md
@@ -0,0 +1,88 @@
---
title: "Testing Backends For Availability"
description: An overview of how backends are queried for their availability status.
---

### Overview Diagram

This diagram shows how backends are queried to determine their availability:

![Query Backends1](/query_backends1.png)


* (1) Starting from a call to
[StartAllAPIMonitoring](https://github.com/openshift/origin/blob/08eb7795276c45f2be16e980a9687e34f6d2c8ec/test/extended/util/disruption/controlplane/known_backends.go#L13),
one of several BackendSamplers is created:

{{% card-code header="[origin/test/extended/util/disruption/controlplane/known_backends.go](https://github.com/openshift/origin/blob/08eb7795276c45f2be16e980a9687e34f6d2c8ec/test/extended/util/disruption/controlplane/known_backends.go#L54)" %}}

```go
backendSampler, err := createKubeAPIMonitoringWithNewConnections(clusterConfig)
```

{{% /card-code %}}

* (2) Then a [disruptionSampler](https://github.com/openshift/origin/blob/08eb7795276c45f2be16e980a9687e34f6d2c8ec/pkg/monitor/backenddisruption/disruption_backend_sampler.go#L410)
is created with that BackendSampler:

{{% card-code header="[origin/pkg/monitor/backenddisruption/disruption_backend_sampler.go](https://github.com/openshift/origin/blob/08eb7795276c45f2be16e980a9687e34f6d2c8ec/pkg/monitor/backenddisruption/disruption_backend_sampler.go#L410)" %}}

```go
disruptionSampler := newDisruptionSampler(b)
go disruptionSampler.produceSamples(producerContext, interval)
go disruptionSampler.consumeSamples(consumerContext, interval, monitorRecorder, eventRecorder)
```

{{% /card-code %}}

* (3) The `produceSamples` function is called to produce the `disruptionSamples`. This function is built around
a [`Ticker`](https://go.dev/src/time/tick.go) that fires at the given `interval` (every 1 second). On each iteration,
the `checkConnection` function is called in its own goroutine to send an HTTP GET to the backend and wait for a response.

{{% card-code header="[origin/pkg/monitor/backenddisruption/disruption_backend_sampler.go](https://github.com/openshift/origin/blob/08eb7795276c45f2be16e980a9687e34f6d2c8ec/pkg/monitor/backenddisruption/disruption_backend_sampler.go#L506)" %}}


```go
func (b *disruptionSampler) produceSamples(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		// the sampleFn may take a significant period of time to run. In such a case, we want our start interval
		// for when a failure started to be the time when the request was first made, not the time when the call
		// returned. Imagine a timeout set on a DNS lookup of 30s: when the GET finally fails and returns, the outage
		// was actually 30s before.
		currDisruptionSample := b.newSample(ctx)
		go func() {
			sampleErr := b.backendSampler.checkConnection(ctx)
			currDisruptionSample.setSampleError(sampleErr)
			close(currDisruptionSample.finished)
		}()

		select {
		case <-ticker.C:
		case <-ctx.Done():
			return
		}
	}
}
```

{{% /card-code %}}

* (4) The result of the `checkConnection` function is recorded in a `disruptionSample`, which holds the start time of
the HTTP GET and an associated `sampleErr` that tracks whether the HTTP GET succeeded (`sampleErr` set to `nil`) or
failed (the error is saved). The `disruptionSamples` are stored in a slice referenced by the `disruptionSampler`.

* (5) The `consumeSamples` function takes the `disruptionSamples` and determines when disruption started and stopped. It
then records Events and Intervals/Conditions on the `monitorRecorder`.


{{% card-code header="[origin/pkg/monitor/backenddisruption/disruption_backend_sampler.go](https://github.com/openshift/origin/blob/master/pkg/monitor/backenddisruption/disruption_backend_sampler.go#L504)" %}}

```go
func (b *disruptionSampler) consumeSamples(ctx context.Context, interval time.Duration, monitorRecorder Recorder, eventRecorder events.EventRecorder) {
```

{{% /card-code %}}

* (6) Intervals on the monitorRecorder are used by the synthetic tests.
4 changes: 2 additions & 2 deletions content/en/docs/release-oversight/disruption-testing.md
@@ -31,14 +31,14 @@ TBD


### Disruption test framework overview
To check for disruptions while upgrading OCP clusters
To check for disruptions while upgrading OCP clusters
* The tests are defined by [AllTests](https://github.com/neisw/origin/blob/46f376386ab74ecfe0091552231d378adf24d5ea/test/e2e/upgrade/upgrade.go#L53)
* The disruption is defined by [clusterUpgrade](https://github.com/neisw/origin/blob/46f376386ab74ecfe0091552231d378adf24d5ea/test/e2e/upgrade/upgrade.go#L270)
* These are passed into [disruption.Run](https://github.com/neisw/origin/blob/2a97f51d4981a12f0cadad53db133793406db575/test/extended/util/disruption/disruption.go#L81)
* Which creates a new [Chaosmonkey](https://github.com/neisw/origin/blob/59599fad87743abf4c84f05952552e6d42728781/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey/chaosmonkey.go#L48) and [executes](https://github.com/neisw/origin/blob/59599fad87743abf4c84f05952552e6d42728781/vendor/k8s.io/kubernetes/test/e2e/chaosmonkey/chaosmonkey.go#L78) the disruption monitoring tests and the disruption
* The [backendDisruptionTest](https://github.com/neisw/origin/blob/0c50d9d8bedbd2aa0af5c8a583418601891ee9d4/test/extended/util/disruption/backend_sampler_tester.go#L34) is responsible for
    * Creating the event broadcaster, recorder and monitor
    * Attempting to query the backend and timing out after the max interval (1 second typically)
    * [Attempting to query the backend](../backend_queries) and timing out after the max interval (1 second typically)
    * Analyzing the disruption events for disruptions that exceed allowable values
* When the disruption is complete the disruptions tests are validated via Matches / BestMatcher to find periods that exceed allowable thresholds
    * [Matches](https://github.com/neisw/origin/blob/43d9e9332d5fb148b2e68804200a352a9bc683a5/pkg/synthetictests/allowedbackenddisruption/matches.go#L11) will look for an entry in [query_results](https://github.com/openshift/origin/blob/master/pkg/synthetictests/allowedbackenddisruption/query_results.json) if an exact match is not found it will utilize [BestMatcher](https://github.com/neisw/origin/blob/4e8f0ba818ed5e89cf09bf2902be857859a2125c/pkg/synthetictests/historicaldata/types.go#L128) to look for data with the closest variants match
2 changes: 1 addition & 1 deletion layouts/shortcodes/card-code.html
@@ -7,4 +7,4 @@
<div class="card-body code" style="padding:0.0em 0.85em;">
{{ $.Inner }}
</div>
</div>
</div>
Binary file added static/query_backends1.png