
resolving a node's own /ipns entry: trade-offs between consistency and partition tolerance #2278

Closed
hackergrrl opened this issue Feb 2, 2016 · 4 comments

Comments

@hackergrrl
Contributor

On a recent issue, @jbenet wrote:

I think the resolve here should return immediately and correctly. Since there is no network, the lookup fails to find any records beyond the node's own, and thus uses that one.

The solution here (using an offline routing system) is not the right one; we should be fixing the underlying problem of why resolve is not returning our own record as valid.

Presently, we write our own pubkey to our local DHT on ipfs init, so that it gets replicated to other nodes on the network.

However, when our node tries to resolve its own key, it queries the network, choosing not to trust its local cache in case the value has changed.
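A minimal sketch of that network-only behaviour, using hypothetical `Routing` and `Record` stand-ins rather than the real go-ipfs interfaces:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Record is a placeholder for an IPNS record; the real one carries a signed
// value, sequence number, and validity data.
type Record struct {
	Value string
}

// Routing is a hypothetical stand-in for the DHT interface.
type Routing interface {
	// GetValue queries the network for the freshest record it can find.
	GetValue(ctx context.Context, key string) (*Record, error)
}

// resolveOwnKey mirrors the behaviour described above: it always asks the
// network and never consults the locally stored record, so it fails when the
// node is offline or its record has not yet propagated to other peers.
func resolveOwnKey(ctx context.Context, dht Routing, key string) (*Record, error) {
	rec, err := dht.GetValue(ctx, key)
	if err != nil {
		return nil, fmt.Errorf("resolving %s: %w", key, err) // no local fallback
	}
	return rec, nil
}

// offlineRouting models a node with no peers.
type offlineRouting struct{}

func (offlineRouting) GetValue(context.Context, string) (*Record, error) {
	return nil, errors.New("no peers to query")
}

func main() {
	_, err := resolveOwnKey(context.Background(), offlineRouting{}, "/ipns/QmSelf")
	fmt.Println("resolve failed:", err)
}
```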

This behaviour results in some painful UX:

  1. we cannot resolve our own key when offline
  2. when a brand new node is created and then immediately attempts to resolve its own key (e.g. our sharness tests), it will fail to resolve, since the network has not yet been populated with the node's key

At the root of this is the CAP theorem:

The current approach puts consistency first: we make every possible effort to resolve a key to its newest value, and fail if we don't feel we can make that guarantee.

As a result, we fail to be partition tolerant: a node that is offline, or one connected only to peers that don't [yet] have its key in their DHTs, cannot resolve its own name.


@whyrusleeping and I discussed this a bit on IRC, but I'm hoping to get thoughts from @jbenet (and anyone else interested!) on this.

@hackergrrl
Contributor Author

Proposals

If we care more about consistency, let's inform the user immediately that they cannot mount /ipns because they are offline, or because their node is too young. It's not pleasant UX, but it's consistent with an overarching effort to put consistency first.

If we care more about partition tolerance, let's happily resolve from the local DHT cache when the network is unavailable or does not provide a response.
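A rough sketch of that fallback ordering, again with hypothetical `record` and `networkResolver` stand-ins for whatever the real resolver interfaces end up being:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real go-ipfs types; only the ordering of
// the two lookups matters here.
type record struct{ value string }

type networkResolver interface {
	GetValue(ctx context.Context, key string) (*record, error)
}

// resolveWithFallback puts partition tolerance first: ask the network, and if
// that fails for any reason, serve the record from the node's own local cache.
func resolveWithFallback(ctx context.Context, net networkResolver, cache map[string]*record, key string) (*record, error) {
	if rec, err := net.GetValue(ctx, key); err == nil {
		return rec, nil // the network confirmed a fresh value
	}
	if rec, ok := cache[key]; ok {
		return rec, nil // offline or unpopulated network: trust our own cache
	}
	return nil, fmt.Errorf("no record found for %s, online or offline", key)
}

// noPeers models an offline node.
type noPeers struct{}

func (noPeers) GetValue(context.Context, string) (*record, error) {
	return nil, errors.New("no peers to query")
}

func main() {
	cache := map[string]*record{"/ipns/QmSelf": {value: "/ipfs/QmTarget"}}
	rec, err := resolveWithFallback(context.Background(), noPeers{}, cache, "/ipns/QmSelf")
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println("resolved to", rec.value) // served from the local cache despite being offline
}
```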

It seems important to treat IPNS the same way throughout ipfs though. (I hear @jbenet is a strong proponent of ensuring IPNS returns correct information above all else.)

@Kubuxu
Member

Kubuxu commented Feb 2, 2016

Couldn't we split the difference (for me, the most important thing is that records resolve fast and seamlessly for end users) and fix some of IPNS's issues until the Record System gets implemented (at some undefined point in the future)? Currently IPNS is really unusable for some applications, such as hosting websites via IPNS; even ipfs.io uses dnslink=/ipfs/Qm...

The problem arises because best-possible consistency is not important for every application, and due to CAP, insisting on it means performance suffers.

Proposal

Introduce a time to old and use it in conjunction with the existing expiration time (24h) to allow regulated caching of resolved IPNS records. The time to old would specify how long a node can use an IPNS entry without confirming that it is current. This cache would be separate from the DHT, as passive updates (without a full network resolution) shouldn't reset the timer. If a cached IPNS record is used and more than half of its time to old has elapsed, a full network resolution of that record should be performed.
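A rough sketch of how such a check might look; the `cachedEntry` type, its fields, and the half-window refresh rule are assumptions drawn from this description, not an existing go-ipfs structure:

```go
package main

import (
	"fmt"
	"time"
)

// cachedEntry is a hypothetical cache slot for a resolved IPNS record.
type cachedEntry struct {
	value      string
	resolvedAt time.Time     // when the last full network resolution happened
	timeToOld  time.Duration // how long we may serve the entry without re-confirming it
	eol        time.Time     // existing expiration (24h) after which the record is invalid
}

// get returns the cached value if it is still usable, and reports whether a
// full network resolution should be performed because more than half of the
// time-to-old window has elapsed.
func (e *cachedEntry) get(now time.Time) (value string, ok bool, refresh bool) {
	age := now.Sub(e.resolvedAt)
	if e.timeToOld == 0 || now.After(e.eol) || age > e.timeToOld {
		return "", false, true // unusable: the caller must resolve over the network
	}
	return e.value, true, age > e.timeToOld/2
}

func main() {
	e := &cachedEntry{
		value:      "/ipfs/QmTarget",
		resolvedAt: time.Now().Add(-40 * time.Second),
		timeToOld:  time.Minute,
		eol:        time.Now().Add(24 * time.Hour),
	}
	v, ok, refresh := e.get(time.Now())
	fmt.Println(v, ok, refresh) // still served from cache, but a refresh is due
}
```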

This gives us a known, predictable caching time and still allows best-possible consistency (time to old = 0), in which case no caching is performed.

It also creates a two-stage cache (if the time to old is at least twice the resolution time) where entries that are actively being used would never be unavailable to the user. That unavailability happens now: resolving a page from the gateway via IPNS takes 3-10s at first, then for about a minute to a minute and a half it is almost instant, and then it takes 3-10s again, depending on many factors.

In conjunction with #1921, it would allow IPNS to be used in applications where response latency is important.

This also implies that publishing an IPNS entry with a time to old greater than 0 and then resolving it locally is instant until that time elapses.

@hackergrrl
Contributor Author

Ref: a similar, recent discussion: #2178


@Kubuxu: I'm not sure I see the difference between the existing TTL (EOL) and the proposed time to old. TTL already means "trust this value until it expires". They both seem to answer the same question: "how long can I trust this record to be valid?"


I spoke with @jbenet a bit about this. I'll try to summarize what he said and my own understanding.

For the sake of retrieving a record that we have cached locally, there are two relevant checks that happen:

  1. The DHT asks itself "can I retrieve the value for this key from 16 (this is hardcoded ATM) different sources to verify its newness?". If it can't, or if the DHT has no peers, this fails.
  2. If the DHT returns an entry that it feels is sufficiently fresh, the record is checked for validity. Validity is (or will be) an application-specific predicate. It could exist in time (valid for N hours) or in space. It likely also involves checking a signature against the record itself.

My concern was that the DHT was placing such a harsh constraint on record retrieval: getting the latest record from 16 sources. This is an unreasonable requirement to place on the retrieval of all records -- not all applications require this level of rigour.

@jbenet's explanation was that we should eventually have a record system layer above (or maybe below) the DHT layer. This would allow a separation of concerns between record retrieval and DHT value retrieval. The new algorithm for retrieving a record would instead become:

  1. The record system asks the DHT for a value, given a key.
  2. The DHT sees it in its local cache, but asks its peers, attempting like before to get 15 sources to confirm the value as freshest.
  3. This fails for some reason (not enough peers, key is too new to be widespread on the network), and so the DHT returns a failure.
  4. The record system instead looks at its local record store for the record.
  5. It finds it, and then checks it for validity (TTL/EOL, signature, etc).
  6. If it passes, return it.

This sounds reasonable: records should push their validity requirements onto the record system, not onto the DHT. It's a subtle difference, but I think this abstraction resolves the concerns mentioned here and in #2178.
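A minimal sketch of that layering, with hypothetical `recordSystem`, `dht`, and `record` types standing in for whatever the eventual record-system interfaces look like:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the eventual record-system and DHT interfaces;
// nothing here is the real go-ipfs API.
type record struct {
	value string
	valid bool // stands in for the TTL/EOL and signature checks
}

// dht performs the quorum-based network lookup (the "16 sources" check).
type dht interface {
	GetValue(ctx context.Context, key string) (*record, error)
}

// recordSystem sits above the DHT: it owns the local record store and the
// validity predicate.
type recordSystem struct {
	routing dht
	local   map[string]*record
}

// Get follows the algorithm above: ask the DHT first, and if the DHT cannot
// confirm a fresh value, fall back to the local store and check validity.
func (rs *recordSystem) Get(ctx context.Context, key string) (*record, error) {
	if rec, err := rs.routing.GetValue(ctx, key); err == nil {
		return rec, nil
	}
	rec, ok := rs.local[key]
	if !ok {
		return nil, fmt.Errorf("no local record for %s", key)
	}
	if !rec.valid {
		return nil, errors.New("local record failed validity check")
	}
	return rec, nil
}

// offlineDHT models the failure case: not enough peers to reach the quorum.
type offlineDHT struct{}

func (offlineDHT) GetValue(context.Context, string) (*record, error) {
	return nil, errors.New("could not reach quorum")
}

func main() {
	rs := &recordSystem{
		routing: offlineDHT{},
		local: map[string]*record{
			"/ipns/QmSelf": {value: "/ipfs/QmTarget", valid: true},
		},
	}
	rec, err := rs.Get(context.Background(), "/ipns/QmSelf")
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	fmt.Println("resolved to", rec.value) // succeeds from the local record store
}
```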

@radfish

radfish commented Mar 4, 2019

For those landing here via a search: since 0.4.19, offline resolution is supported via the --offline flag, passed either to the resolve command or globally.
