New Index format that contains the hashing algorithm information #214

Closed
masih opened this issue Aug 27, 2021 · 0 comments · Fixed by #217

masih commented Aug 27, 2021

The current IndexSorted format contains the digests of the CIDs within a CAR and the offsets at which the corresponding data is located. This format does not contain enough information to reconstruct the list of multihashes in the CAR from the index alone, because it does not store the hash function used to generate the digests.

Data providers to indexer nodes should ideally be able to use the CAR index alone to supply the list of CIDs in a CAR, without having to scan the entire CAR file. When data providers run as part of Filecoin miners, there will indeed be cases where the only accessible information about a CAR is its detached index.

Since the codec information in a CID is ignored for indexing purposes (see ipfs/kubo#6815), the CAR index should at least expose the hash function used to generate the digests within the index. This way, data providers are able to reconstruct the list of CAR multihashes.
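
To illustrate why the hash function code is the missing piece: a multihash is the varint-encoded hash function code, followed by the varint-encoded digest length, followed by the digest itself. The Go sketch below uses only the standard library and is not taken from go-car; it rebuilds a multihash from a bare digest once the code is known. The helper name `encodeMultihash` and the constant name `sha2256Code` are hypothetical; 0x12 is the multicodec code for SHA2-256.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"fmt"
)

// sha2256Code is the multicodec code for SHA2-256.
// The constant name is an illustration, not a go-car identifier.
const sha2256Code = 0x12

// encodeMultihash (hypothetical helper) prepends the varint-encoded hash
// function code and digest length to a raw digest, producing the layout
// <varint code><varint digest length><digest>.
func encodeMultihash(code uint64, digest []byte) []byte {
	var tmp [binary.MaxVarintLen64]byte
	buf := make([]byte, 0, 2*binary.MaxVarintLen64+len(digest))

	n := binary.PutUvarint(tmp[:], code)
	buf = append(buf, tmp[:n]...)

	n = binary.PutUvarint(tmp[:], uint64(len(digest)))
	buf = append(buf, tmp[:n]...)

	return append(buf, digest...)
}

func main() {
	// A digest on its own, as stored in an index without hash information.
	digest := sha256.Sum256([]byte("hello"))

	// With the hash function code available, the multihash can be rebuilt.
	mh := encodeMultihash(sha2256Code, digest[:])
	fmt.Println(hex.EncodeToString(mh[:2])) // prints "1220": code 0x12, length 0x20
	fmt.Println(len(mh))                    // prints 34: 2 prefix bytes + 32 digest bytes
}
```

Without the code, the same 32 bytes could belong to any hash function that produces a 32-byte digest, which is why the digest alone is not enough to recover the multihash.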

masih added a commit to multiformats/multicodec that referenced this issue Sep 2, 2021
Define a new codec for CARv2 `MultihashIndexSorted`.

See:
- ipld/go-car#217
- ipld/go-car#214
masih added a commit that referenced this issue Sep 2, 2021
Implement a new CARv2 index that contains enough information to
reconstruct the multihashes of the data payload, since `CarIndexSorted`
only includes multihash digests. Note, this index intentionally ignores
any given record with `multihash.IDENTITY` CID hash.

Add a test that asserts offsets for the same CID across sorted index and
new multihash sorted index are consistent.

Add tests that assert marshalling and unmarshalling of the new index type
work as expected, and that it does not load records with a
`multihash.IDENTITY` digest.

Note, there is a need for a multicodec to be defined for the new index
type. For now TODOs are left since it requires coordination across
repos.

Relates to:
- multiformats/multicodec#227

Fixes:
- #214
@masih masih self-assigned this Sep 2, 2021
masih added a commit to multiformats/go-multicodec that referenced this issue Sep 2, 2021
Update submodule to `1bcdc083898abb3e92b132f951e0a2fe0dcd485b`.

Run `go generate`.

Note, the code generation includes other changes to the codec table that
have been merged but not generated here.

See:
- multiformats/multicodec#227
- multiformats/multicodec@1bcdc08
- ipld/go-car#214
masih added a commit that referenced this issue Sep 2, 2021
Implement a new CARv2 index that contains enough information to
reconstruct the multihashes of the data payload, since `CarIndexSorted`
only includes multihash digests. Note, this index intentionally ignores
any given record with `multihash.IDENTITY` CID hash.

Add a test that asserts offsets for the same CID across sorted index and
new multihash sorted index are consistent.

Add tests that assert marshalling and unmarshalling of the new index type
work as expected, and that it does not load records with a
`multihash.IDENTITY` digest.

Relates to:
- multiformats/multicodec#227

Fixes:
- #214
masih added a commit that referenced this issue Sep 7, 2021
Implement a new CARv2 index that contains enough information to
reconstruct the multihashes of the data payload, since `CarIndexSorted`
only includes multihash digests. The new index builds on top of the
existing `IndexSorted` by adding an additional layer of grouping the
multi-width indices by their multihash code.

Note, this index intentionally ignores
any given record with `multihash.IDENTITY` CID hash.

Add a test that asserts offsets for the same CID across sorted index and
new multihash sorted index are consistent.

Add tests that assert marshalling and unmarshalling of the new index type
work as expected, and that it does not load records with a
`multihash.IDENTITY` digest.

Relates to:
- multiformats/multicodec#227

Fixes:
- #214
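
The grouping described in this commit message can be pictured roughly as follows. This is a minimal in-memory sketch under stated assumptions, not the go-car implementation or its serialized layout: the `record` and `entry` types and the `groupRecords` function are hypothetical names, and "width" is assumed here to mean the digest length in bytes. Records are bucketed by multihash code, then by digest width, and sorted by digest within each bucket; records with the identity code (0x00) are skipped.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// identityCode is the multihash code for the identity "hash".
const identityCode = 0x00

// record is a hypothetical input: one block in a CAR.
type record struct {
	code   uint64 // multihash function code, e.g. 0x12 for SHA2-256
	digest []byte // raw digest bytes
	offset uint64 // offset of the block within the CAR
}

// entry is what a digest-only sorted index would keep per block.
type entry struct {
	digest []byte
	offset uint64
}

// groupRecords buckets records by multihash code, then by digest width,
// and sorts each bucket by digest. Identity records are ignored.
func groupRecords(records []record) map[uint64]map[int][]entry {
	grouped := make(map[uint64]map[int][]entry)
	for _, r := range records {
		if r.code == identityCode {
			continue // identity multihashes are intentionally not indexed
		}
		byWidth, ok := grouped[r.code]
		if !ok {
			byWidth = make(map[int][]entry)
			grouped[r.code] = byWidth
		}
		width := len(r.digest)
		byWidth[width] = append(byWidth[width], entry{digest: r.digest, offset: r.offset})
	}
	for _, byWidth := range grouped {
		for _, entries := range byWidth {
			// Sorting in place is visible through the map because the
			// slice shares its backing array with the stored value.
			sort.Slice(entries, func(i, j int) bool {
				return bytes.Compare(entries[i].digest, entries[j].digest) < 0
			})
		}
	}
	return grouped
}

func main() {
	grouped := groupRecords([]record{
		{code: 0x12, digest: []byte{0x02, 0x02}, offset: 10},
		{code: identityCode, digest: []byte{0x09}, offset: 20},
		{code: 0x12, digest: []byte{0x01, 0x01}, offset: 30},
	})
	for code, byWidth := range grouped {
		for width, entries := range byWidth {
			fmt.Printf("code=0x%x width=%d entries=%d\n", code, width, len(entries))
		}
	}
}
```

Because the multihash code is the outer key, every digest in a bucket can be turned back into a full multihash, which is the information the digest-only index could not provide.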
@masih masih closed this as completed in #217 Sep 7, 2021
Jorropo pushed a commit to ipfs/boxo that referenced this issue Mar 22, 2023
Implement a new CARv2 index that contains enough information to
reconstruct the multihashes of the data payload, since `CarIndexSorted`
only includes multihash digests. The new index builds on top of the
existing `IndexSorted` by adding an additional layer of grouping the
multi-width indices by their multihash code.

Note, this index intentionally ignores
any given record with `multihash.IDENTITY` CID hash.

Add a test that asserts offsets for the same CID across sorted index and
new multihash sorted index are consistent.

Add tests that assert marshalling and unmarshalling of the new index type
work as expected, and that it does not load records with a
`multihash.IDENTITY` digest.

Relates to:
- multiformats/multicodec#227

Fixes:
- ipld/go-car#214


This commit was moved from ipld/go-car@42b9e28