feat: Arweave offsets as subdomains. #753
src/dev_arweave_offset.erl
Outdated
| %% the `tx_path` for the chunk to find the size of the bundle that contains the | ||
| %% item. We then use the `note` attached to the Merkle leaf of the `data_path` | ||
| %% for the chunk to find the offset of the end of the chunk inside the bundle. | ||
| item_size_from_offset(StartOffset, ChunkJSON, Opts) -> |
There was a problem hiding this comment.
2 questions:
- Can we use the Merkle notes/offsets to recompute the chunk's absolute end offset? If so, we can switch HB to using the `chunk2` binary interface rather than the `chunk` JSON interface. The main reason we're still using the JSON interface is that I was able to add the absolute end offset to the JSON response in a backward-compatible manner (something that's harder to do with the binary serialization). But if getting the offset from the data path allows us to compute the absolute end offset (maybe Merkle note + TX start offset?), then we can migrate to `chunk2` without compatibility issues.
- Will this logic handle Merkle rebasing? It's been a while since I reviewed that code, but I think we add a special identifier when we merge two Merkle trees so that we know how far to shift the right-hand subtree. I think the current logic doesn't handle that shift? (This may be largely moot, as I'm not sure anyone has used Merkle rebasing in production, so the only Merkle-rebased TXs in the weave might be dev transactions.)
There was a problem hiding this comment.
- We can get the TX-relative end-of-chunk offset, but not the global one from the note. Not sure that alone solves your problem, unfortunately? That said, HTTP headers should! Why not just add an `absolute-end-offset` in outbound headers on `chunk2`?
- Good point! It did not, but I think the updated version should. Although I also agree that it shouldn't stop us experimenting either way, as there are no live uses of rebasing.
There was a problem hiding this comment.
> That said, HTTP headers should! Why not just add an `absolute-end-offset` in outbound headers on `chunk2`?
Ah right, I think you mentioned that before as well. I keep forgetting that workaround. Okay when I get a moment I'll see about adding that info to the arweave node headers.
src/dev_arweave_offset.erl
Outdated
| %% @doc Load an ANS-104 item whose header begins at the given global offset. | ||
| load_item_at_offset(StartOffset, Length, Opts) -> | ||
| maybe | ||
| {ok, ChunkJSON, FirstChunk} ?= item_chunk_from_offset(StartOffset, Opts), |
There was a problem hiding this comment.
Occasionally we have bundle headers that require 2 chunks. I've only seen it for the L1 TX bundles (e.g. the 15,000-item Turbo bundles require it). This is where that case is handled in copycat: https://github.com/permaweb/HyperBEAM/blob/impr/gun/src/dev_copycat_arweave.erl#L379-L428
…used as an offset name
…n of mid-message offsets
Co-authored-by: Rani <charmful0x@gmail.com>
bea4a36 to 6d90c64
This PR introduces an additional `~name@1.0` resolver, enabled by default, that interprets subdomains containing only numbers as references to byte offsets in Arweave. This allows users to resolve subdomains like these:

- https://89269846458279.arweave.net
- http://353022643731662.localhost:8734/

Additionally, the syntax supports unit specifiers in the form of `kb`, `mb`, through to `yb`. The `*ibibyte` form is also supported through the addition of an `i` after the primary specifier. In either case, the `b` can be dropped at the user's preference:

- https://89269846460k.arweave.net
- http://353022643732kb.localhost:8734/
- http://160399273m.localhost:8734/
- http://101t.localhost:8734/
- http://100tib.localhost:8734/

Previously, if one did not have a human-readable name for their Arweave page, visitors would have to load/share a URL that was often over 100 characters -- excluding any site-specific components (subpaths, query parameters, etc.). This is unnecessarily off-putting for newbies.
Because of Arweave's linear address space, every piece of data has a specific byte offset at which its content starts and ends. This PR simply makes that address space practically accessible via the URL bar on HyperBEAM nodes.
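To make the subdomain syntax concrete, here is a minimal Python sketch of how such a label might be parsed. It is not the PR's Erlang implementation: the function names are hypothetical, and the multipliers are assumptions (powers of 1000 for `k` through `y`, powers of 1024 when an `i` follows), inferred from the unit names rather than confirmed by the source.

```python
import re
from typing import Optional

# Assumption: k, m, g, t, p, e, z, y are successive powers of 1000,
# and the "i" variants (ki, mi, ...) are successive powers of 1024.
_PREFIXES = "kmgtpezy"

# Digits, then an optional unit: prefix letter, optional "i", optional "b".
_LABEL = re.compile(r"([0-9]+)(?:([kmgtpezy])(i?)b?)?")


def parse_offset_label(label: str) -> Optional[int]:
    """Return the byte offset named by a subdomain label, or None if it is not one."""
    match = _LABEL.fullmatch(label.lower())
    if match is None:
        return None
    digits, prefix, binary = match.groups()
    if prefix is None:
        # Bare number: already a byte offset.
        return int(digits)
    power = _PREFIXES.index(prefix) + 1
    base = 1024 if binary else 1000
    return int(digits) * base ** power


def offset_from_host(host: str) -> Optional[int]:
    """Parse the left-most DNS label of a host such as '101t.localhost:8734'."""
    return parse_offset_label(host.split(".", 1)[0])
```

Under these assumptions, `offset_from_host("89269846460k.arweave.net")` yields `89269846460000`, while a non-numeric label such as `example` yields `None`, which lets other name resolvers handle it.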
These 'names' (alternative identifiers might be more appropriate?) are not super friendly to look at, but they do come with a bunch of neat advantages:

`kb` boundary, and as such can be expressed with a `k` post-fix, dropping 2 characters. Larger data often passes a GB boundary, allowing 8 characters to be dropped. Lucky uploaders will occasionally get a `t` boundary, giving them 4 character names. The uploader of the petabyte'th byte will gain a 2 character name: `1p`.

You can find the offset for any existing Arweave data with the following AO-Core hashpath:
http://localhost:8734/~arweave@2.9/raw=1rL73ctmqTVv7qkqAsD4jIz5tU6WxgOAqABYuMHg5mQ/offset.
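
Once an offset is known, the character savings described above can be reproduced with a short sketch. This is illustrative Python, not code from the PR; `shortest_label` is a hypothetical helper, and it assumes decimal multipliers (each post-fix letter stands for another factor of 1000), which matches the character counts quoted in the description.

```python
# Assumption: each successive letter stands for another factor of 1000.
_PREFIXES = "kmgtpezy"


def shortest_label(offset: int) -> str:
    """Return the shortest decimal-unit label that names the given byte offset."""
    if offset < 0:
        raise ValueError("offsets are non-negative")
    digits = str(offset)
    power = 0
    # Swap each trailing group of three zeros for the next larger unit.
    while power < len(_PREFIXES) and len(digits) > 3 and digits.endswith("000"):
        digits = digits[:-3]
        power += 1
    return digits if power == 0 else digits + _PREFIXES[power - 1]
```

For example, `shortest_label(89269846460000)` gives `89269846460k` (two characters shorter than the bare number), and `shortest_label(10**15)` gives `1p`.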