Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gateway: Cache-Control, max-age=ttl, ETag for /ipns requests #329

Open
ion1 opened this issue Oct 9, 2015 · 17 comments · Fixed by #459
Open

Gateway: Cache-Control, max-age=ttl, ETag for /ipns requests #329

ion1 opened this issue Oct 9, 2015 · 17 comments · Fixed by #459
Assignees
Labels
help wanted Seeking public contribution on this issue kind/bug A bug in existing code (including security flaws) topic/gateway Issues related to HTTP Gateway

Comments

@ion1
Copy link

ion1 commented Oct 9, 2015

The gateway does not seem to provide a Cache-Control/max-age header or an ETag for /ipns URLs.

It might be useful to assume some reasonable non-zero max-age for content on IPNS. A minute? Fifteen seconds?

It would also be good to serve an ETag corresponding to whatever IPFS hash the IPNS path resolved to.

@jbenet
Copy link
Member

jbenet commented Oct 9, 2015

It might be useful to assume some reasonable non-zero max-age for content
on IPNS. A minute? Fifteen seconds?

It could be determined by the IPNS record -- some will have a calculable
TTL.

@ghost
Copy link

ghost commented Oct 9, 2015

Until then, we could set must-revalidate and a quasi infinite max-age

@ion1
Copy link
Author

ion1 commented Oct 9, 2015

I’m not sure if I have understood this correctly, but I’m under the impression that must-revalidate only applies if the content has already expired. http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.4

@ghost
Copy link

ghost commented Oct 9, 2015

You're right, thanks -- so that means must-revalidate + max-age=0 = cache-indefinitely-unless-updated

@ghost
Copy link

ghost commented Oct 9, 2015

Note that for /ipfs/, we can still set a quasi-infinite max-age.

@ion1
Copy link
Author

ion1 commented Oct 9, 2015

It seems to me that must-revalidate means that given an object that expired according to max-age, a cache must return an error instead of a stale copy if it fails to connect to the origin server. I’m not sure must-revalidate is of any use here. Unless I’m misinterpreting it, of course.

@whyrusleeping whyrusleeping added the topic/gateway Issues related to HTTP Gateway label Oct 11, 2015
@ion1
Copy link
Author

ion1 commented Oct 14, 2015

DNS lets you say explicitly for how long a record can be cached. Perhaps we should do that with IPNS names? ipfs name publish --ttl=3h which would be taken advantage of by IPFS nodes who have resolved the name as well as by the gateway in the Cache-Control header.

@ion1
Copy link
Author

ion1 commented Oct 28, 2015

It was mentioned on IRC that resolve does not provide TTL information to be used for max-age.

resolveOnce could be changed to return a TTL value. resolve could start at infinity-ish and compute the minimum of the previous TTL and what resolveOnce returned at each iteration.

Given that the DNS lookup functions in the net package for Go do not seem to provide TTL information at the moment, the DNS resolveOnce could return a reasonable guess like one minute. If a DNS record points to /ipns/<keyhash>, that IPNS record can still decrement the TTL according to the iteration above if a low cache time is desired.

If resolve returned a TTL, the if strings.HasPrefix(urlPath, ipfsPathPrefix) check in gateway_handler.go could go away and the header be set unconditionally.

@ion1
Copy link
Author

ion1 commented Oct 29, 2015

Would this be a reasonable starting point?

diff --git a/namesys/interface.go b/namesys/interface.go
index 09c296c..b0b740d 100644
--- a/namesys/interface.go
+++ b/namesys/interface.go
@@ -47,6 +47,14 @@ const (
    // trust resolution to eventually complete and can't put an upper
    // limit on how many steps it will take.
    UnlimitedDepth = 0
+
+   // ImmutableTTL is the time-to-live value to be reported for immutable
+   // objects.
+   ImmutableTTL = 10 * 365 * 24 * time.Hour
+
+   // UnknownTTL is the time-to-live value to be reported for mutable
+   // paths when accurate information is not available.
+   UnknownTTL = time.Minute
 )

 // ErrResolveFailed signals an error when attempting to resolve.
@@ -88,7 +96,10 @@ type Resolver interface {
    // There is a default depth-limit to avoid infinite recursion.  Most
    // users will be fine with this default limit, but if you need to
    // adjust the limit you can use ResolveN.
-   Resolve(ctx context.Context, name string) (value path.Path, err error)
+   //
+   // Resolve also returns a time-to-live value which indicates the
+   // maximum amount of time the result may be cached.
+   Resolve(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)

    // ResolveN performs a recursive lookup, returning the dereferenced
    // path.  The only difference from Resolve is that the depth limit
@@ -97,7 +108,7 @@ type Resolver interface {
    //
    // Most users should use Resolve, since the default limit works well
    // in most real-world situations.
-   ResolveN(ctx context.Context, name string, depth int) (value path.Path, err error)
+   ResolveN(ctx context.Context, name string, depth int) (value path.Path, ttl time.Duration, err error)
 }

 // Publisher is an object capable of publishing particular names.
diff --git a/namesys/base.go b/namesys/base.go
index e552fce..29ef997 100644
--- a/namesys/base.go
+++ b/namesys/base.go
@@ -2,6 +2,7 @@ package namesys

 import (
    "strings"
+   "time"

    context "github.com/ipfs/go-ipfs/Godeps/_workspace/src/golang.org/x/net/context"

@@ -9,27 +10,36 @@ import (
 )

 type resolver interface {
-   // resolveOnce looks up a name once (without recursion).
-   resolveOnce(ctx context.Context, name string) (value path.Path, err error)
+   // resolveOnce looks up a name once (without recursion).  It also
+   // returns a time-to-live value which indicates the maximum amount of
+   // time the result may be cached.
+   resolveOnce(ctx context.Context, name string) (value path.Path, ttl time.Duration, err error)
 }

 // resolve is a helper for implementing Resolver.ResolveN using resolveOnce.
-func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, error) {
+func resolve(ctx context.Context, r resolver, name string, depth int, prefixes ...string) (path.Path, time.Duration, error) {
+   // Start with a long TTL.
+   ttl := ImmutableTTL
+
    for {
-       p, err := r.resolveOnce(ctx, name)
+       p, ttlOnce, err := r.resolveOnce(ctx, name)
+       // Use the lowest TTL reported by the resolveOnce invocations.
+       if ttlOnce < ttl {
+           ttl = ttlOnce
+       }
        if err != nil {
            log.Warningf("Could not resolve %s", name)
-           return "", err
+           return "", ttl, err
        }
-       log.Debugf("Resolved %s to %s", name, p.String())
+       log.Debugf("Resolved %s to %s (TTL %v -> %v)", name, p.String(), ttlOnce, ttl)

        if strings.HasPrefix(p.String(), "/ipfs/") {
            // we've bottomed out with an IPFS path
-           return p, nil
+           return p, ttl, nil
        }

        if depth == 1 {
-           return p, ErrResolveRecursion
+           return p, ttl, ErrResolveRecursion
        }

        matched := false
@@ -44,7 +54,7 @@ func resolve(ctx context.Context, r resolver, name string, depth int, prefixes .
        }

        if !matched {
-           return p, nil
+           return p, ttl, nil
        }

        if depth > 1 {

@whyrusleeping
Copy link
Member

@ion1 yeah, we could return a ttl with every resolution... I'm generally against more return types than 1+error, but this is likely a cleaner approach than others

@ion1
Copy link
Author

ion1 commented Oct 31, 2015

<jbenet> […] i'm overall -1 on returning the TTL on every Resolve call. i think that should be a separate function call.
<jbenet> most uses of Resolve do not care.
<jbenet> and its important to keep the interface as clean as we can possible have it.
<ion> jbenet: Fair enough. So it would be better to add a new function to the Resolver interface?
<jbenet> ion: yeah a new function SGTM.

I’m going to do that then.

ion1 referenced this issue in ion1/go-ipfs Nov 5, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 5, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 5, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 5, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 5, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 7, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 7, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 7, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
ion1 referenced this issue in ion1/go-ipfs Nov 8, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
@ion1 ion1 mentioned this issue Nov 9, 2015
11 tasks
ion1 referenced this issue in ion1/go-ipfs Nov 13, 2015
Closes ipfs/go-ipfs#1818.

License: MIT
Signed-off-by: Johan Kiviniemi <devel@johan.kiviniemi.name>
@lidel
Copy link
Member

lidel commented Jan 2, 2019

Just for the record, this is still an open issue for /ipns/ responses returned by go-ipfs v0.4.18.

2019-01-02--14-08-56

Etag is present, but we are missing an explicit IPNS expiration header.

AFAIK it could be either:

  • Cache-Control header with max-age
  • or Expires header with a date

Why?
There is a Last-Modified header so most browsers will use "Heuristic Freshness" to determine how long to cache that asset for. This is a weak cache hint, as RFC7234 suggests something like (current time - last modified time) / 10, however, there are subtle differences between Firefox and Chrome.

Due to this we should provide explicit cache expiration headers for /ipns/ to unify experience across all vendors.

@magik6k magik6k added the help wanted Seeking public contribution on this issue label Jan 2, 2019
@magik6k
Copy link
Member

magik6k commented Jan 2, 2019

This will sort-of depend on ipfs/kubo#5884, as ipns supports dns resolution too. 'Real' IPNS records support TTL too, but AFAIK it's currently not used (basically it's set to 0).

@lidel
Copy link
Member

lidel commented Jan 18, 2022

actionable task extracted from ipfs/kubo#8074 (review), seems to be a natural continuation fo this old issue here

TLDR: we are not leveraging TTLs in Gateway responses, which means things are not cached for as long as they could be. This is getting critical for our gateway infra, without this we are unable to cache things on Gateways as efficiently as we could.

What we need

  • Gateway responses for /ipns/* should return Cache-Control based on:
    • TTL from IPNS record
    • TTL of a DNSLink TXT record
  • Remove Last-Modified header (right now returns now timestamp to leverage browser heuristycs: disable-last-modified-behaviour-ipns-routes kubo#8074, but we can remove it only when proper Cache-Control is in place)
  • Etag based on the content path from the record. Together with Cache-control it enables cache at edge and user agents to revalidate cache instead of re-downloading data that did not change and which they already have. We already do this for files and dirs, just mentioning here for completeness.

Implementation notes

  • iiuc we need to extend the namesys interface to return TTL of both record types in addition to Path (as suggested earlier)
  • wire ttl up in gateway code, so cache-control header is returned
  • have tests that confirm /ipns/ content paths have proper cache controls on all gateway types: path, subdomain, and dnslink
  • We already have DNS.MaxCacheTTL in Kubo config, adding Ipns.MaxCacheTTL sounds sensible (to allow gateway operators force check for new IPNS record after some sane time)

cc ipfs/kubo#5884, ipfs/kubo#7564

@hacdias hacdias self-assigned this May 31, 2023
@hacdias hacdias transferred this issue from ipfs/kubo Jun 1, 2023
@ipfs ipfs deleted a comment from welcome bot Jun 1, 2023
@hacdias hacdias changed the title max-age, ETag for /ipns requests Gateway: max-age, ETag for /ipns requests Jun 1, 2023
@BigLep BigLep mentioned this issue Nov 9, 2023
11 tasks
@lidel
Copy link
Member

lidel commented Mar 13, 2024

Reopening. We have spec at https://specs.ipfs.tech/http-gateways/path-gateway/#cache-control-response-header, but boxo/gateway does not implement it fully.

This is not fixed fully yet.

IPNS Records are handled correctly thanks to #459 but we don't set Cache-Control with max-age based on TTL as noted in #459 (comment). I remember we did not have TTL there because Golang standard library is not exposing access to original TTL of DNS record when resolver from operating system is used.

Remaining work here

  • Always return Cache-Control on /ipns/{dnslink} responses. Either correctly wire up TTL from DNS TXT record of DNSLink, or use 5 minute as default fallback when it is not available.
  • port Cache-Control implementation from routing/http/server: add cache control #584, where value is public, max-age=<ttl>, stale-while-revalidate=<dht-or-other-expiration-window>, stale-if-error=<dht-or-other-expiration-window>
  • update gateway-conformance tests to test for presence of cache-control on DNSLink response

@lidel lidel reopened this Mar 13, 2024
@lidel lidel added the kind/bug A bug in existing code (including security flaws) label Mar 13, 2024
@lidel lidel changed the title Gateway: max-age, ETag for /ipns requests Gateway: Cache-Control, max-age=ttl, ETag for /ipns requests Mar 13, 2024
@2color
Copy link
Member

2color commented Mar 14, 2024

port Cache-Control implementation from #584, where value is public, max-age=, stale-while-revalidate=, stale-if-error=

For DNSLink, what would you recommend the stale-while-revalidate be, since it has nothing to do with the DHT?

@lidel
Copy link
Member

lidel commented Mar 14, 2024

Good question. Unless there is a better value, I suggest using Amino DHT expiration window (48h) in cases like this, where we need a sensible hard ceiling on caching mutable things.

Rationale: DNSLink resolves to some CID. Returning a cached HTTP response body is ok as long the the data behind that CID could be (in theory) retrievable from the public swarm provider. So if we count from the moment we resolved dnslink and successfully retrieved it, 48h is the max window we can assume the retrieval of the same payload would be successful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Seeking public contribution on this issue kind/bug A bug in existing code (including security flaws) topic/gateway Issues related to HTTP Gateway
Projects
Status: 🥞 Todo
Development

Successfully merging a pull request may close this issue.

8 participants
@ion1 @jbenet @lidel @whyrusleeping @2color @magik6k @hacdias and others