Skip to content

Commit

Permalink
server: Retry cert reloading & test case step
Browse files Browse the repository at this point in the history
In workflow runs like this:
https://github.com/open-policy-agent/opa/actions/runs/7803493290/job/21283458848#step:3:317

We can see two problems. This commit is meant to address them.

First, the test failed with this message:

```
expected unknown certificate authority error but got: Get "https://127.0.0.1:38699/v1/data": write tcp 127.0.0.1:52786->127.0.0.1:38699: write: connection reset by peer
```

Now this step in the test is retried like the other steps in the test
since it can fail too.

Second, the error `failed to reload TLS config` appears many times in
the logs for that test. This issue is caused by the server attempting to
read the new cert, key, and CA contents from disk while they are still
being written to. This PR also introduces a 100ms pause between upto 5
attempts to reload the config for any given change to the state on disk.
This should mean that the error is seen only when is is actually an
issue and the reload has failed after a reasonable time. In most cases,
running locally, the reload happens without error on the first run.

Signed-off-by: Charlie Egan <charlie@styra.com>
  • Loading branch information
charlieegan3 committed Apr 4, 2024
1 parent d1821df commit a1cefe2
Show file tree
Hide file tree
Showing 2 changed files with 40 additions and 12 deletions.
22 changes: 18 additions & 4 deletions server/certs.go
Original file line number Diff line number Diff line change
Expand Up @@ -170,12 +170,26 @@ func (s *Server) certLoopNotify(logger logging.Logger) Loop {
for evt := range watcher.Events {
removalMask := fsnotify.Remove | fsnotify.Rename
mask := fsnotify.Create | fsnotify.Write | removalMask
if (evt.Op & mask) != 0 {
if (evt.Op & mask) == 0 {
continue
}

// retry logic here handles cases where the files are still being written to as events are triggered.
retries := 0
for {
err = s.reloadTLSConfig(s.manager.Logger())
if err != nil {
logger.Error("failed to reload TLS config: %s", err)
if err == nil {
logger.Info("TLS config reloaded")
break
}
logger.Info("TLS config reloaded")

retries++
if retries >= 5 {
logger.Error("Failed to reload TLS config after retrying: %s", err)
break
}

time.Sleep(100 * time.Millisecond)
}
}

Expand Down
30 changes: 22 additions & 8 deletions server/server_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -5146,14 +5146,28 @@ func TestCertPoolReloading(t *testing.T) {
},
}

req, err := http.NewRequest("GET", fmt.Sprintf("https://%s/v1/data", serverAddress), nil)
if err != nil {
t.Fatal(err)
}
// make a request and check that the server doesn't trust the client cert yet since it has no CA cert
retries = 10
expectedError := "remote error: tls"
for {
if retries == 0 {
t.Fatal("server didn't return expected error before deadline")
}

_, err = client.Do(req)
if !strings.Contains(err.Error(), "remote error: tls") {
t.Fatalf("expected unknown certificate authority error but got: %s", err)
req, err := http.NewRequest("GET", fmt.Sprintf("https://%s/v1/data", serverAddress), nil)
if err != nil {
t.Fatal(err)
}

_, err = client.Do(req)
if !strings.Contains(err.Error(), expectedError) {
t.Log("retrying, expected error:", expectedError, "but got:", err)
retries--
time.Sleep(300 * time.Millisecond)
continue
}

break
}

// update the cert pool file to include the CA cert
Expand Down Expand Up @@ -5255,7 +5269,7 @@ func TestCertPoolReloading(t *testing.T) {
}

if !strings.Contains(err.Error(), "remote error: tls") {
t.Fatalf("expected unknown certificate authority error but got: %s", err)
t.Fatalf("expected unknown certificate authority error (server has different CA) but got: %s", err)
}

break
Expand Down

0 comments on commit a1cefe2

Please sign in to comment.