identity hash #4697

ivan386 · 2018-02-13T12:20:52Z

Version information:

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/windows
Golang version: go1.9.2

Type: Feature

Description:

With identity hash we can inline data in link. Useful for short or zero length data.

Trivial hash function

If the data to be hashed is small enough, one can use the data itself (reinterpreted as an integer) as the hashed value. The cost of computing this "trivial" (identity) hash function is effectively zero. This hash function is perfect, as it maps each input to a distinct hash value.

The meaning of "small enough" depends on the size of the type that is used as the hashed value. For example, in Java, the hash code is a 32-bit integer. Thus the 32-bit integer Integer and 32-bit floating-point Float objects can simply use the value directly; whereas the 64-bit integer Long and 64-bit floating-point Double cannot use this method.

Other types of data can also use this perfect hashing scheme. For example, when mapping character strings between upper and lower case, one can use the binary encoding of each character, interpreted as an integer, to index a table that gives the alternative form of that character ("A" for "a", "8" for "8", etc.). If each character is stored in 8 bits (as in extended ASCII[7] or ISO Latin 1), the table has only 28 = 256 entries; in the case of Unicode characters, the table would have 17×216 = 1114112 entries.

The same technique can be used to map two-letter country codes like "us" or "za" to country names (262 = 676 table entries), 5-digit zip codes like 13083 to city names (100000 entries), etc. Invalid data values (such as the country code "xx" or the zip code 00000) may be left undefined in the table or mapped to some appropriate "null" value.

Additional information:

z3vosKrgsh -> 01 55 00 03 00 00 00 is valid 3 bytes length "identity" hash of raw block with same length and data (00 00 00)

>ipfs block get z3vosKrgsh
Error: blockservice: key not found

>ipfs get z3vosKrgsh
Error: merkledag: not found

The text was updated successfully, but these errors were encountered:

ghost · 2018-02-13T13:58:43Z

All hashes in IPFS must be based on cryptographic hash functions. Multihash can (and does) support non-cryptographic hashes as well, but IPFS and IPLD hashes MUST be cryptographically verifiable. See #4371.

ivan386 · 2018-02-13T15:34:39Z

It data itself. No need to publish it and no need to retrieve it. Data already there. See here why we need it : #4680 (comment)

kevina · 2018-02-13T21:18:39Z

This is something I could work on assuming it is something that is wanted. cc @Stebalien

Stebalien · 2018-02-13T21:29:18Z

Yes, this is definitely something we want (well, something I want).

kevina · 2018-02-13T21:31:08Z

Okay I work on in within a week.

jbenet · 2018-02-22T11:21:55Z

All hashes in IPFS must be based on cryptographic hash functions. Multihash can (and does) support non-cryptographic hashes as well, but IPFS and IPLD hashes MUST be cryptographically verifiable. See #4371.

I think it's OK to allow the identity hash function. You could argue it's cryptographically safe for the purposes of #4371 -- we may only need CRH?

Kubuxu · 2018-02-23T02:48:37Z

There are two main issue with not safe hashes: collisions allowing to fake the data and breaking principle of DAG (weak hashes allow creation of cyclic graphs by collisions). Those issues are not present in case of identity hash so I also think it is safe to use.

mib-kd743naq · 2018-04-01T10:28:38Z

Reposting a question from the internets:

mib_kd743naq: hi! it's been a while
              the identity "hash" thing, allowing for embedding of very short data directly as the "pointer"
              is this available in a release or at least in master, or it is still WiP?

Stebalien · 2018-04-01T17:09:36Z

@mib-kd743naq Multihash supports it but it isn't supported in go-ipfs for storing blocks (yet).

mib-kd743naq · 2018-04-01T18:14:53Z

@Stebalien does the backlog label imply this is downgraded to "nice to have... someday"? Without making promises, do you have a ballpark realistic guess which version of go-ipfs may contain this feature?

Stebalien · 2018-04-01T18:27:54Z

@mib-kd743naq it means "we want this but nobody's actively working on it". Honestly, it shouldn't be that hard to implement, someone just needs to take it up and do it (I guess it warrants a "help wanted" in that case as well).

kevina · 2018-04-01T19:06:03Z

I have thought a bit about this and can likely implement it in the next one or two weeks if it is something that is wanted now.

mib-kd743naq · 2018-04-02T10:07:57Z

@kevina It'd be amazing if you can do this. Here is a simple test case for you:

xxd -r -p <<<0a0408011804121f0a080155000444415441121153686f7274203420627974652066696c651804 \
  | ipfs block put --format=protobuf

This should be a directory with a single 4 byte file containing DATA as its sole line

kevina · 2018-04-02T21:45:15Z

I think the best way to implement this would be at the blockstore level. In particular I propose this is implemented as a wrapper blockstore (in the same way the cacheing blockstores are implemented) with the following semantics (when I say "pass on the request" I mean pass on the request to the underlying blockstore):

Get(*cid.Cid) (blocks.Block, error) - if an id-hash extract the contents from the hash and return that, otherwise pass on the request
Has(*cid.Cid) (bool, error) - if an id-hash then always return true, otherwise pass on the request
Put(blocks.Block) error - - if an id-hash and HashOnRead is enabled verify that the contents match the hash, if HashOnRead is not enabled then a no-op; if not a id-hash pass on the request
PutMany([]blocks.Block) error - extract the id-hashes and call Put on each, pass on the rest
DeleteBlock(*cid.Cid) error - if an id-hash then a no-op, otherwise pass on the request
AllKeysChan(ctx context.Context) (<-chan *cid.Cid, error) - pass on the request (it is impossible to list the identify hashes as they are never stored)
HashOnRead(enabled bool) - toggle the HashOnRead flag and pass on the request

If this sounds good I will go ahead and implement it.

CC @Stebalien

Stebalien · 2018-04-02T21:54:16Z

SGTM.

mib-kd743naq · 2018-04-03T06:30:18Z

if an id-hash and HashOnRead is enabled verify that the contents match the hash

But the content is the hash, that line should be "noop if id-hash"...

Stebalien · 2018-04-03T06:59:58Z

But the content is the hash, that line should be "noop if id-hash"...

Yeah, that threw me as well. He's talking about verifying that the bytes returned by Block.Data() match the bytes in the identity hash CID returned by Block.Cid(). Someone could theoretically construct a Block where the CID doesn't match the data (even with the identity hash).

However, we can define special block type for identity Cids so we can check this invariant by doing a simple type check and only fall back on comparing bytes if we get a type that we're not expecting.

kevina · 2018-04-03T07:47:41Z

Note: Working on this right now but have not pushed anything.

Note that HashOnRead only applied when reading a hash from the BlockStore not storing it, so the new semantics of Put and PutMany is:

Put(blocks.Block) error - - if an id-hash then a no-op; if not a id-hash pass on the request
PutMany([]blocks.Block) error - remove the id-hashes and pass on the rest

Sorry for the confusion.

kevina self-assigned this Feb 13, 2018

Stebalien mentioned this issue Apr 1, 2018

Use identity multihash for small objects #4450

Closed

Stebalien added topic/perf Performance status/deferred Conscious decision to pause or backlog kind/feature A new feature labels Apr 1, 2018

Stebalien added the help wanted Seeking public contribution on this issue label Apr 1, 2018

kevina mentioned this issue Apr 3, 2018

Add initial support for identity hashes. #4910

Closed

ghost added status/in-progress In progress and removed status/deferred Conscious decision to pause or backlog labels Apr 3, 2018

kevina removed the help wanted Seeking public contribution on this issue label Apr 3, 2018

ivan386 mentioned this issue Apr 11, 2018

Length Limit for CID/Multihash? #4918

Closed

kevina mentioned this issue Jun 25, 2018

Add initial support for identity hashes (Part 1) #5117

Merged

momack2 added this to In Progress in ipfs/go-ipfs May 9, 2019

magik6k closed this as completed May 14, 2019

ivan386 mentioned this issue Aug 12, 2019

Adding additional file metadata to UnixFSv1 ipfs/specs#217

Closed

ivan386 mentioned this issue Sep 2, 2019

Add file metadata and permissions to UnixFSv1 ipfs/specs#220

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

identity hash #4697

identity hash #4697

ivan386 commented Feb 13, 2018

ghost commented Feb 13, 2018

ivan386 commented Feb 13, 2018 •

edited

Loading

kevina commented Feb 13, 2018

Stebalien commented Feb 13, 2018

kevina commented Feb 13, 2018

jbenet commented Feb 22, 2018

Kubuxu commented Feb 23, 2018

mib-kd743naq commented Apr 1, 2018

Stebalien commented Apr 1, 2018

mib-kd743naq commented Apr 1, 2018

Stebalien commented Apr 1, 2018

kevina commented Apr 1, 2018

mib-kd743naq commented Apr 2, 2018

kevina commented Apr 2, 2018 •

edited

Loading

Stebalien commented Apr 2, 2018

mib-kd743naq commented Apr 3, 2018

Stebalien commented Apr 3, 2018

kevina commented Apr 3, 2018

identity hash #4697

identity hash #4697

Comments

ivan386 commented Feb 13, 2018

Version information:

Type: Feature

Description:

Additional information:

ghost commented Feb 13, 2018

ivan386 commented Feb 13, 2018 • edited Loading

kevina commented Feb 13, 2018

Stebalien commented Feb 13, 2018

kevina commented Feb 13, 2018

jbenet commented Feb 22, 2018

Kubuxu commented Feb 23, 2018

mib-kd743naq commented Apr 1, 2018

Stebalien commented Apr 1, 2018

mib-kd743naq commented Apr 1, 2018

Stebalien commented Apr 1, 2018

kevina commented Apr 1, 2018

mib-kd743naq commented Apr 2, 2018

kevina commented Apr 2, 2018 • edited Loading

Stebalien commented Apr 2, 2018

mib-kd743naq commented Apr 3, 2018

Stebalien commented Apr 3, 2018

kevina commented Apr 3, 2018

ivan386 commented Feb 13, 2018 •

edited

Loading

kevina commented Apr 2, 2018 •

edited

Loading