httpcache vs. Conditional Request vs. X-Varied-Authorization header #437

@andygrunwald

Context

I run go-githubapp with github.com/gregjones/httpcache (github.com/gregjones/httpcache/diskcache in particular). I run it in a "GitHub App context", meaning: the user installs a GitHub App and, based on that installation, I get permission to make requests:

[...]
clientCreator, err := githubapp.NewDefaultCachingClientCreator(
	config.Github,
	githubapp.WithClientTimeout(60*time.Second),
	githubapp.WithClientCaching(false, func() httpcache.Cache { return httpTransportCache }),
	githubapp.WithClientMiddleware(
		githubapp.ClientLogging(zerolog.InfoLevel, githubapp.LogRateLimitInformation(&githubapp.RateLimitLoggingOption{
			Limit:     true,
			Remaining: true,
			Used:      true,
			Reset:     true,
			Resource:  true,
		})),
	),
)
[...]

followed by ...

installationClient, err := clientCreator.NewInstallationClient(<ID>)

followed by ...

issue, resp, err := installationClient.Issues.Get(context.Background(), "andygrunwald", "go-jira", 687)

This works great, as expected.
Caching is enabled, and the client writes to and reads from the cache. However, I also activated logging via

loggerConsoleOutput := zerolog.ConsoleWriter{Out: os.Stdout, TimeFormat: time.RFC3339}
logger := zerolog.New(loggerConsoleOutput).Level(zerolog.DebugLevel).With().Timestamp().Logger()
zerolog.DefaultContextLogger = &logger

to get details about the usage of the cache:

2025-03-06T23:11:40+01:00 INF github_request cached=false elapsed=400.52325 method=GET path=https://api.github.com/repos/andygrunwald/go-jira/issues/687 ratelimit={"limit":7950,"remaining":7943,"reset":"2025-03-06T23:19:40+01:00","resource":"core","used":7} size=-1 status=200
2025-03-06T23:11:40+01:00 INF github_request cached=true elapsed=0.086375 method=GET path=https://api.github.com/repos/andygrunwald/go-jira/issues/687 ratelimit={"limit":7950,"remaining":7943,"reset":"2025-03-06T23:19:40+01:00","resource":"core","used":7} size=-1 status=200

I discovered that the cache is not working as I expected, nor as described in GitHub's "Use conditional requests if appropriate" guidance.

Caching deep dive

A cache entry for a request contains response headers like these (stripped version):

HTTP/2.0 200 OK
Cache-Control: private, max-age=60, s-maxage=60
Etag: W/"7640648674d954ec24d5ee4c94bae93674a9a85852d6f7192cbee73f83c78da0"
Last-Modified: Mon, 14 Oct 2024 01:13:57 GMT
Vary: Accept, Authorization, Cookie, X-GitHub-OTP,Accept-Encoding, Accept, X-Requested-With
X-Varied-Accept: application/vnd.github.squirrel-girl-preview
X-Varied-Authorization: token <SOME TOKEN>

httpcache respects (and compares) all headers listed in Vary in https://github.com/gregjones/httpcache/blob/901d90724c7919163f472a9812253fb26761123d/httpcache.go#L160
This includes Authorization, whose request value is stored in the cache entry as X-Varied-Authorization.
X-Varied-Authorization contains the token, which has a limited lifetime.

Once the token expires and a new one is used, httpcache makes an uncached (not conditional) request, because the request's Authorization value no longer matches X-Varied-Authorization, even though the content itself has not changed at all. When you crawl bigger repositories, you can hit the rate limit pretty quickly.

As far as I understand, this behavior is in line with RFC 7234.

Question on X-Varied-Authorization

In this use case (crawling repositories with a GitHub App), I was wondering:

Is there any (negative) side effect of discarding (read: removing) the X-Varied-Authorization header when writing the cache?

If the X-Varied-Authorization header were not compared in https://github.com/gregjones/httpcache/blob/901d90724c7919163f472a9812253fb26761123d/httpcache.go#L121, the cache hit rate would increase by a lot.

Another benefit: you would not cache an auth secret (here, I am not 100% sure whether this is a real issue, as the token is no longer valid).

Implementation

The next thought would be "How would this be implemented?".
The simplest solution would be to "hack this in" by forking httpcache and adding an if-condition to varyMatches or a related function.

Another option: Can we somehow "hook" into the process somewhere? I would like to avoid copying the diskcache part, as it receives only a []byte which would require a lot of parsing.

Small runnable example

If requested, I can provide a minimal code example to reproduce it.

Disclaimer

I do understand that this is not exactly a go-githubapp issue, but rather a caching issue. However, it is closely related to the use case of go-githubapp.

I would appreciate your thoughts on this.
