Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for Named Records #6

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

aschmahmann
Copy link

Motivation

libp2p protocols have been pretty successful despite there not being a global namespace of protocol IDs. People can just segment them by namespace and develop without requiring a centralized point of friction. This lets people experiment with their own protocols, rename them, modify them, etc. and let the spec evolve more naturally.

Given that there are tons of data transfer protocols that could (and do) exist and people can easily spin up new ones without committing to a global table, it'd be nice for them to be able to use the IPNI as a routing system also without committing to a global table

Proposal

Have a "protocol" called "Named Record" which is just a combination of Name and Record fields and let people figure plug in whatever they want.

Impact

Shouldn't change the IPNI protocol itself since it will accept arbitrary metadata bytes. However, it has two advantages:

  1. There is a defined way for people to add new protocols without going through the global table for a PR
  2. We have a defined way of mapping metadata not currently defined in this document (IPNI.md) to any routing API that is abstracted beyond the IPNI internals of "metadata are arbitrary bytes that may or may not be associated with one or more protocols"

@@ -292,6 +292,11 @@ Specified protocols are expected to be ordered in increasing order.
* http
* the proposed `uvarint` protocol is `0x3D0000`.
* the following bytes are not yet defined.
* Named Record
* Protocol `uvarint` is <TBD> in the multicodec table
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was originally going to go for something like 0x0920 but given these are strings we can certainly add another byte and go further up the table. Once we're reasonably happy here I'll open a PR on the code table and get the number.

IPNI.md Outdated
* Protocol `uvarint` is <TBD> in the multicodec table
* the following bytes should be a dag-cbor encoded struct of:
* Name, a string
* Record, a valid dag-cbor type
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any objections here? We could make this bytes if it'll make people more comfortable but given we're already in dag-cbor land this flexibility seems fine.

IPNI.md Outdated
@@ -292,6 +292,11 @@ Specified protocols are expected to be ordered in increasing order.
* http
* the proposed `uvarint` protocol is `0x3D0000`.
* the following bytes are not yet defined.
* Named Record
* Protocol `uvarint` is <TBD> in the multicodec table
* the following bytes should be a dag-cbor encoded struct of:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on if we should be saving on bytes here by making this a CBOR tuple where element 0 is the name and 1 is the Record? I took this format ro start with to mimic the GraphSync-FILv1 one.

* Protocol `uvarint` is <TBD> in the multicodec table
* the following bytes should be a dag-cbor encoded struct of:
* Name, a string
* Record, a valid dag-cbor type

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side thread: What's the reason for "Specified protocols are expected to be ordered in increasing order." and will there be problems if the same protocol is specified in the same metadata multiple times? Trying to understand if this restriction will interact with this PR in a meaningful way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea with the order is that you do not have to parse the entire metadata to find if the protocol ID you are looking for is supported or not as long as a protocol with larger ID is encountered.

@aschmahmann aschmahmann marked this pull request as ready for review January 4, 2023 14:43
@willscott
Copy link
Member

I think we're happy to support this

  • the other entries defined in multicodecs are https://github.com/multiformats/multicodec/blob/master/table.csv#L141-L142
    so maybe 0x920 is good for this one?
  • the relevant thing here is going to be the note that you're losing some of your metadata bytes to the named record. the ~100bytes total space given to metadata will include the length of the name in that allocation here.

@aschmahmann
Copy link
Author

Yeah, if it's 100 bytes then probably want to save every byte we can. My understanding from @masih was that the spec was out of date and it's now 1KiB (where the extra 10 bytes won't really matter), did I misunderstand and https://github.com/ipni/storetheindex/blob/93eb46eebb83b377c3802ac5f831ab5cb2db8023/api/v0/ingest/schema/schema.go#L20-L21 is really implementation-specific rather than an out-of-date spec?

@willscott
Copy link
Member

I guess we don't cut off until 1k currently, but most records are <50bytes, and we have to store all of them. Our scale calculations were at 100b, so i think we'd like to not encourage larger metadata records if we can avoid it. The difference between 100TB and 1PB is significant

@masih
Copy link
Member

masih commented Jan 9, 2023

A small note on the max 1KiB metadata size: the discussion on the max metadata size was in context of Naam. I agree that we do want to avoid large metadata records in general. In the context of IPNS records, the number of those records in the system as a whole are far smaller than the number of advertisements for regular content.

Therefore, I do think we should be careful with size of metadata here because this is about regular content advertisements.

@aschmahmann
Copy link
Author

I changed the record format to preserve more bytes. Any additional recommendations or thoughts? Should there be some recommendation on namespacing? If so any thoughts on what that recommendation would be? Perhaps something like myapp/myrecord/1 if we're feeling short on bytes and don't want to emulate the libp2p convention with /myapp/myprotocol/x.y.z.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants