
feat: Arweave offsets as subdomains. #753

Merged
samcamwilliams merged 11 commits into impr/gun from feat/offset-names on Mar 16, 2026

@samcamwilliams (Collaborator) commented Mar 15, 2026

This PR introduces an additional ~name@1.0 resolver, enabled by default, that interprets subdomains containing only numbers as references to byte offsets in Arweave.

This allows users to resolve subdomains like these:
https://89269846458279.arweave.net
http://353022643731662.localhost:8734/
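
As a rough illustration of the basic rule (a purely numeric leftmost label is read as a byte offset), here is a minimal Erlang sketch. The module and function names (`offset_host`, `offset_from_host/1`) are hypothetical and not part of HyperBEAM. It assumes the host value may carry a `:port` suffix, as in the localhost example, and it does not handle IPv6 literals.

```erlang
-module(offset_host).
-export([offset_from_host/1]).

%% Hypothetical helper (not HyperBEAM code): return {ok, Offset} when the
%% leftmost label of a Host value is purely numeric, otherwise not_offset.
offset_from_host(Host) when is_binary(Host) ->
    offset_from_host(binary_to_list(Host));
offset_from_host(Host) when is_list(Host) ->
    %% Drop an optional ":port" suffix (IPv6 literals are not handled).
    [Name | _] = string:split(Host, ":"),
    %% Split off the leftmost label; a bare "localhost" has no parent
    %% domain and is rejected, as is an empty leftmost label.
    case string:split(Name, ".") of
        [Label, _Parent] when Label =/= [] ->
            case lists:all(fun is_digit/1, Label) of
                true -> {ok, list_to_integer(Label)};
                false -> not_offset
            end;
        _ ->
            not_offset
    end.

is_digit(C) -> C >= $0 andalso C =< $9.
```

Under these assumptions, `offset_from_host("353022643731662.localhost:8734")` yields `{ok, 353022643731662}`, while `"arweave.net"` yields `not_offset` because its leftmost label is not numeric.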

Additionally, the syntax supports unit specifiers from kb and mb, and so on through to yb. The binary-prefix form (kibibyte, mebibyte, etc.) is also supported by adding an i after the primary specifier. In either case, the b can be dropped at the user's preference:
https://89269846460k.arweave.net
http://353022643732kb.localhost:8734/
http://160399273m.localhost:8734/
http://101t.localhost:8734/
http://100tib.localhost:8734/
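
The unit syntax above can be sketched as a small parser. This is a hypothetical helper (`offset_name:parse/1`), not HyperBEAM's resolver, and it rests on assumptions the text does not spell out: plain specifiers are decimal (k = 1000, m = 1000^2, ...), the i forms are binary (ki = 1024, mi = 1024^2, ...), names are case-insensitive, and a bare b with no prefix letter is rejected.

```erlang
-module(offset_name).
-export([parse/1]).

%% Hypothetical parser for names like "101t", "353022643732kb" or "100tib".
%% Assumes plain prefixes are decimal (k = 1000^1 ... y = 1000^8), an added
%% "i" makes them binary (ki = 1024^1 ... yi = 1024^8), and a trailing "b"
%% is optional.
parse(Name) when is_binary(Name) ->
    parse(binary_to_list(Name));
parse(Name) when is_list(Name) ->
    IsDigit = fun(C) -> C >= $0 andalso C =< $9 end,
    {Digits, Suffix} = lists:splitwith(IsDigit, string:lowercase(Name)),
    case {Digits, multiplier(Suffix)} of
        {[], _} -> error;          % no leading number at all
        {_, error} -> error;       % unknown or malformed suffix
        {_, Mult} -> {ok, list_to_integer(Digits) * Mult}
    end.

%% A bare number has multiplier 1; otherwise strip an optional trailing "b"
%% ("kb" -> "k", "tib" -> "ti") and look up the remaining prefix.
multiplier("") -> 1;
multiplier(Suffix) ->
    Base = case lists:reverse(Suffix) of
               [$b | Rest] when Rest =/= [] -> lists:reverse(Rest);
               _ -> Suffix
           end,
    case Base of
        [P] -> scale(P, 1000);
        [P, $i] -> scale(P, 1024);
        _ -> error
    end.

%% Exponent for each prefix letter: k = 1, m = 2, ..., y = 8.
scale(P, Radix) ->
    Exponents = lists:zip("kmgtpezy", lists:seq(1, 8)),
    case lists:keyfind(P, 1, Exponents) of
        {P, N} -> pow(Radix, N);
        false -> error
    end.

pow(_, 0) -> 1;
pow(B, N) when N > 0 -> B * pow(B, N - 1).
```

Under these assumptions, `offset_name:parse("101t")` returns `{ok, 101000000000000}` and `offset_name:parse("100tib")` returns `{ok, 109951162777600}`.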

Previously, if one did not have a human-readable name for their Arweave page, visitors would have to load and share a URL that was often over 100 characters long, even before any site-specific components (subpaths, query parameters, etc.) were added. This is unnecessarily off-putting for newbies.

Because of Arweave's linear address space, every piece of data has a specific byte offset at which its content starts and ends. This PR simply makes that address space practically accessible via the URL bar on HyperBEAM nodes.

These 'names' ('alternative identifiers' might be more appropriate?) are not super friendly to look at, but they do come with a bunch of neat advantages:

  1. They are short: New data today gets a base identifier of 15 digits, compared with the 43 characters of an Arweave ID plus the 52 characters used for gateway secure sandbox subdomains. Most data crosses at least one kilobyte boundary, and so can be expressed with a k suffix, dropping 2 characters. Larger data often crosses a gigabyte boundary, allowing 8 characters to be dropped. Lucky uploaders will occasionally get a t boundary, giving them 4-character names. The uploader of the petabyte'th byte will gain a 2-character name: 1p.
  2. They are free: Any data uploaded to Arweave gets an offset assigned without any additional charge.
  3. They are permanent: Once assigned and out of Arweave's fork recovery depth (~19 blocks), an offset's content will never change.
  4. Data age is evident from the domain alone: Older data has lower byte offsets.
  5. Serving requires zero indexing: All Arweave nodes are able to look up the chunks and proofs associated with a given offset from their internal state. Consequently, all that is needed to serve these addresses is a functioning Arweave node. By contrast, serving data items inside bundles by their traditional 43-character IDs requires the node to have built the (relatively heavy) indexes that map ID -> Arweave offset.
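
Point 1 can be made concrete. If any offset inside an item resolves to that item (as the "crosses a kilobyte boundary" argument implies), the item's shortest name comes from the roundest offset, the one with the most trailing zeros, inside its byte range. The sketch below is hypothetical (`offset_short:shortest_name/2` is not HyperBEAM code), assumes a half-open range [Start, End), and considers only the decimal suffixes.

```erlang
-module(offset_short).
-export([shortest_name/2]).

%% Hypothetical helper: shortest decimal-suffix name for an item occupying the
%% half-open byte range [Start, End), assuming any offset inside the item
%% resolves to it.
shortest_name(Start, End) when is_integer(Start), Start >= 0, End > Start ->
    Units = lists:zip("kmgtpezy", lists:seq(1, 8)),
    Suffixed =
        [integer_to_list(Multiple div Unit) ++ [P]
         || {P, N} <- Units,
            Unit <- [pow10(3 * N)],
            %% Smallest multiple of Unit at or after Start.
            Multiple <- [ceil_div(Start, Unit) * Unit],
            Multiple < End],
    Candidates = [integer_to_list(Start) | Suffixed],
    %% Keep the first candidate with the fewest characters.
    lists:foldl(fun(C, Best) when length(C) < length(Best) -> C;
                   (_, Best) -> Best
                end, hd(Candidates), tl(Candidates)).

ceil_div(A, B) -> (A + B - 1) div B.

pow10(0) -> 1;
pow10(N) when N > 0 -> 10 * pow10(N - 1).
```

For example, an item occupying [999500, 1000600) would get the name "1m", because the 1,000,000th byte falls inside it, while an item that crosses no round boundary keeps its plain numeric offset.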

You can find the offset for any existing Arweave data with the following AO-Core hashpath: http://localhost:8734/~arweave@2.9/raw=1rL73ctmqTVv7qkqAsD4jIz5tU6WxgOAqABYuMHg5mQ/offset.

```erlang
%% the `tx_path` for the chunk to find the size of the bundle that contains the
%% item. We then use the `note` attached to the Merkle leaf of the `data_path`
%% for the chunk to find the offset of the end of the chunk inside the bundle.
item_size_from_offset(StartOffset, ChunkJSON, Opts) ->
```

Collaborator

2 questions:

  1. Can we use the Merkle notes/offsets to recompute the chunk's absolute end offset? If so, we can switch HB to the chunk2 binary interface rather than the chunk JSON interface. The main reason we're still using the JSON interface is that I was able to add the absolute end offset to the JSON response in a backward-compatible manner (something that's harder to do with the binary serialization). But if getting the offset from the data path allows us to compute the absolute end offset (maybe Merkle note + TX start offset?), then we can migrate to chunk2 without compatibility issues.

  2. Will this logic handle Merkle rebasing? It's been a while since I reviewed that code, but I think we add a special identifier when we merge two Merkle trees so that we know how far to shift the right-hand subtree. I don't think the current logic handles that shift? (This may be largely moot, as I'm not sure anyone has used Merkle rebasing in production, so the only rebased TXs in the weave might be dev transactions.)

Collaborator Author

  1. From the note we can get the TX-relative end-of-chunk offset, but not the global one. Unfortunately, I'm not sure that alone solves your problem. That said, HTTP headers should! Why not just add an absolute-end-offset in outbound headers on chunk2?
  2. Good point! It did not, but I think the updated version should. I also agree, though, that it shouldn't stop us experimenting either way, as there are no live uses of rebasing.

Collaborator

> That said, HTTP headers should! Why not just add an absolute-end-offset in outbound headers on chunk2?

Ah right, I think you mentioned that before as well. I keep forgetting that workaround. Okay, when I get a moment I'll look into adding that info to the Arweave node's response headers.

```erlang
%% @doc Load an ANS-104 item whose header begins at the given global offset.
load_item_at_offset(StartOffset, Length, Opts) ->
    maybe
        {ok, ChunkJSON, FirstChunk} ?= item_chunk_from_offset(StartOffset, Opts),
```

Collaborator

Occasionally we have bundle headers that span more than one chunk. I've only seen this for L1 TX bundles (e.g. the 15,000-item Turbo bundles). This is where that case is handled in copycat: https://github.com/permaweb/HyperBEAM/blob/impr/gun/src/dev_copycat_arweave.erl#L379-L428

Collaborator Author

Fixed!
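
For context on why a bundle header can spill past its first chunk: under the ANS-104 layout, a bundle header is a 32-byte item count followed by a 64-byte entry (32-byte size plus 32-byte ID) per data item, and Arweave chunks are at most 256 KiB. The hypothetical helper below (not HyperBEAM or copycat code) estimates how many chunks a header spans, assuming the header starts on a chunk boundary and every chunk it touches is full-sized.

```erlang
-module(bundle_header).
-export([header_size/1, header_chunks/1]).

%% ANS-104 bundle header: a 32-byte item count, then one 64-byte entry
%% (32-byte size + 32-byte ID) per data item.
header_size(ItemCount) when is_integer(ItemCount), ItemCount >= 0 ->
    32 + 64 * ItemCount.

%% Estimated number of 256 KiB chunks the header spans, assuming it starts on
%% a chunk boundary and every chunk it touches is full-sized.
header_chunks(ItemCount) ->
    ChunkSize = 256 * 1024,
    (header_size(ItemCount) + ChunkSize - 1) div ChunkSize.
```

Under these assumptions, a 15,000-item bundle has a 960,032-byte header spanning 4 chunks, and a header first needs a second chunk at 4,096 items.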

@samcamwilliams merged commit 9414b31 into impr/gun on Mar 16, 2026
@samcamwilliams deleted the feat/offset-names branch on March 16, 2026 22:40
3 participants