Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add HTTP timeout to Splunk logger, fixes #43000 #43001

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
13 changes: 13 additions & 0 deletions daemon/logger/splunk/splunk.go
Expand Up @@ -42,6 +42,7 @@ const (
splunkGzipCompressionKey = "splunk-gzip"
splunkGzipCompressionLevelKey = "splunk-gzip-level"
splunkIndexAcknowledgment = "splunk-index-acknowledgment"
splunkHTTPTimeout = "splunk-timeout"
envKey = "env"
envRegexKey = "env-regex"
labelsKey = "labels"
Expand Down Expand Up @@ -228,12 +229,23 @@ func New(info logger.Info) (logger.Logger, error) {
}
}

var timeout time.Duration
if timeoutStr, ok := info.Config[splunkHTTPTimeout]; ok {
value, err := strconv.ParseInt(timeoutStr, 10, 64)

if err != nil {
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should wrap (errors.Wrap()) the error to give the error a bit more context. Currently it would produce something like time: unknown unit "msec" in duration "100msec", what describes what's wrong with the value, but still makes it difficult for the user to find "what value was this?". Something like:

return nil, errors.Wrapf(err, "invalid value for %s", splunkHTTPTimeout)

Which would produce:

invalid value for splunk-timeout: time: unknown unit "msec" in duration "100msec"

}
timeout = time.Duration(value) * time.Millisecond
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating. Erm, I thought I commented, but looks like I never did 😬

Similar to my comment on #43100 (comment), I'm wondering if we should make the splunk-timeout option accept a "duration" string, instead of millisecond. Let me quote that comment here, as it may clarify why I think that would be good to do;

Perhaps we should consider to parse the value as a time.Duration (time.ParseDuration()) to make it easier (and less error-prone) for the user to set the right interval. Having to specify the value in ms works "ok(ish)" if you want to, say, specify a second (1000 ms), but can become a bit awkward after that. It's also error-prone, for example, if a user wants to specify 1 second, but specifies 1 and ends up with 1 ms.

And looking at the code, I see we already use duration strings for fluentd-retry-wait, so it's a bit more consistent.

So I would suggest to;

  • use time.ParseDuration()
  • validate the no negative values are specified
  • perhaps also a minimum / maximum (not sure what reasonable values are? between 100ms and 10s ? (open to suggestions)

Would be good to have a minimal unit test (I see there's currently no tests in this package, which is a bit unfortunate, but let's not continue that "tradition" 😅)

Happy to hear your thoughts!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, that sounds like a good suggestion. Sorry for missing it. I will get to it asap.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the duration string support and am testing for negative values. I didn't enforce min/max, given that the default is no timeout, it doesn't make sense to bound max value.

}

transport := &http.Transport{
TLSClientConfig: tlsConfig,
Proxy: http.ProxyFromEnvironment,
}
client := &http.Client{
Transport: transport,
Timeout: timeout,
atoulme marked this conversation as resolved.
Show resolved Hide resolved
}

source := info.Config[splunkSourceKey]
Expand Down Expand Up @@ -584,6 +596,7 @@ func ValidateLogOpt(cfg map[string]string) error {
case splunkGzipCompressionKey:
case splunkGzipCompressionLevelKey:
case splunkIndexAcknowledgment:
case splunkHTTPTimeout:
case envKey:
case envRegexKey:
case labelsKey:
Expand Down
57 changes: 57 additions & 0 deletions daemon/logger/splunk/splunk_test.go
Expand Up @@ -4,6 +4,7 @@ import (
"compress/gzip"
"context"
"fmt"
"net"
"net/http"
"os"
"runtime"
Expand Down Expand Up @@ -1392,3 +1393,59 @@ func TestDeadlockOnBlockedEndpoint(t *testing.T) {
case <-done:
}
}

func TestCustomHttpTimeout(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be named TestCustomHTTPTimeout to follow Go's naming conventions

tcpAddr := &net.TCPAddr{IP: []byte{127, 0, 0, 1}, Port: 0, Zone: ""}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Port and Zone fields are redundant here (0 is the default value for Port (int), and "" the default for Zone (string)). Perhaps a more idiomatic way to write this is to use net.IPv4 instead of []byte{};

tcpAddr := &net.TCPAddr{IP: net.IPv4(127, 0, 0, 1)}

tcpListener, err := net.ListenTCP("tcp", tcpAddr)
if err != nil {
t.Fatal(err)
}
counter := 0
Comment on lines +1402 to +1403
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps would be slightly cleaner to move the tcpListener.Close() from the end of the test to defer here

}
defer tcpListener.Close()
conter := 0

(I think it's ok to ignore error handling)

go http.Serve(tcpListener, http.HandlerFunc(func(writer http.ResponseWriter, request *http.Request) {
if request.Method == "POST" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're already importing the net/http package, let's use the http.MethodPost const here;

if request.Method == http.MethodPost {

counter++
// block so connection will timeout.
time.Sleep(5000 * time.Millisecond)
}
writer.WriteHeader(200)
}))

info := logger.Info{
Config: map[string]string{
splunkURLKey: "http://" + tcpListener.Addr().String(),
splunkTokenKey: "1234",
splunkHTTPTimeout: "100",
},
ContainerID: "containeriid",
ContainerName: "/container_name",
ContainerImageID: "contaimageid",
ContainerImageName: "container_image_name",
}

loggerDriver, err := New(info)
if err != nil {
t.Fatal(err)
}
batchSendTimeout = 1 * time.Second
start := time.Now()
if err := loggerDriver.Log(&logger.Message{Line: []byte("message1"), Source: "stdout", Timestamp: time.Now()}); err != nil {
t.Fatal(err)
}
err = loggerDriver.Close()
if err != nil {
t.Fatal(err)
}

duration := time.Since(start)
if duration > 1*time.Second {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should likely take the value we set above, to prevent this check from accidentally being out of sync if that value would ever be changed;

if duration > batchSendTimeout {

I'm a bit worried that this could become a bit "flaky"; time taken could possibly be "slightly more than 1 second" and make the test randomly fail. For such cases it may be good to give it a "bit" more time (and leave a comment that describes we're doing so). Something like;

// check that the timeout is used, but give it a bit more time to prevent
// the test to become flaky.
if duration > batchSendTimeout+75*time.Millisecond {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm.... so looking at this... I'm now wondering if this PR is doing the right thing (I'll leave another comment describing what I mean)

t.Fatal("Sending didn't use the custom timeout")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps include the actual time in the error;

t.Fatalf("Sending didn't use the custom timeout: took %v", duration)

}
if counter == 0 {
t.Fatal("No messages received")
}

err = tcpListener.Close()
if err != nil {
t.Fatal(err)
}
}