This repository has been archived by the owner on Apr 29, 2020. It is now read-only.

Why do IPFS hashes start with "Qm"? #22

Closed
1 task done
moreati opened this issue Jul 26, 2015 · 27 comments
@moreati

moreati commented Jul 26, 2015

Answer:

IPFS represents the hash of files and objects using Multihash format and Base58 encoding. The letters Qm happen to correspond with the algorithm (SHA-256) and length (32 bytes) used by IPFS.

TODO:

  • Is the prefix always 'Qm'? Yes, in current builds
@jbenet
Contributor

jbenet commented Jul 27, 2015

Is the prefix always 'Qm'?

No. If objects were hashed with other functions, the prefix would be different. Try out this binary: https://github.com/jbenet/go-multihash/tree/master/multihash

@moreati
Author

moreati commented Jul 27, 2015

Thanks, when does IPFS use another function? So far I've only seen Qm.

@jbenet
Contributor

jbenet commented Jul 27, 2015

@moreati we use sha2-256, but it's not hard for someone to recompile ipfs to use another function as the default, or to change the importer code to add a way to specify the multihash choice.
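
To illustrate the point that the prefix depends on the hash function, here is a minimal Python sketch (not IPFS code) that builds a multihash by hand for three functions and base58-encodes it. The function codes 0x11, 0x12 and 0x13 are the values listed in the multihash table; the helper names `base58btc_encode` and `multihash_b58` are made up for this example.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Multihash function codes (from the multihash table) and matching hashlib names.
FUNCTION_CODES = {"sha1": 0x11, "sha2-256": 0x12, "sha2-512": 0x13}
HASHLIB_NAMES = {"sha1": "sha1", "sha2-256": "sha256", "sha2-512": "sha512"}


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def multihash_b58(name: str, payload: bytes) -> str:
    """Hash payload, prepend <function code><digest length>, base58-encode."""
    digest = hashlib.new(HASHLIB_NAMES[name], payload).digest()
    # Code and length are both below 128 here, so each fits in one byte.
    return base58btc_encode(bytes([FUNCTION_CODES[name], len(digest)]) + digest)


if __name__ == "__main__":
    for name in FUNCTION_CODES:
        print(name, multihash_b58(name, b"hello"))
```

Only the sha2-256 line starts with Qm; the other two start with different characters and have different lengths.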

@RichardLitt
Contributor

I see this question a lot; reopening it and labelling it as 'answered' to increase visibility.

@RichardLitt
Contributor

How is Qm the string? I'm a bit confused on that, because I don't see it in any of the tables on the Multihash repo.

@Kubuxu

Kubuxu commented Oct 6, 2016

It is the base58btc encoding of the two-byte prefix of that multihash.

@JustinDrake

I'm a bit confused by "The letters Qm happen to correspond with the algorithm (SHA-256) and length (32 bytes)"

Is it that Q corresponds to SHA-256, and m corresponds to 32?

@hsanjuan
Member

@JustinDrake If I got it right, multihashes start with a byte (0x12) that indicates the hashing algorithm, followed by another byte (0x20) for the length. The letters "Qm" are the result of encoding those bytes in base58.

Source: https://github.com/multiformats/go-multihash/blob/master/multihash.go#L146
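
One way to see that Q and m do not map one-to-one onto the two bytes: base58 encodes the whole 34-byte value as a single big number, and the leading bytes 0x12 0x20 pin that number into a range whose members all begin with Qm. The sketch below (plain Python, helper names are made up for this example) encodes the smallest and largest possible sha2-256 multihash, that is, the prefix followed by an all-zero and an all-0xff digest.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


# 0x12 = sha2-256, 0x20 = 32-byte digest length.
PREFIX = bytes([0x12, 0x20])

# Smallest and largest values a sha2-256 multihash can take.
lowest = base58btc_encode(PREFIX + b"\x00" * 32)
highest = base58btc_encode(PREFIX + b"\xff" * 32)

if __name__ == "__main__":
    print(lowest)
    print(highest)
```

Both strings are 46 characters long and start with Qm, so every digest in between does too.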

@alexanderattar

Maybe I just haven't put all the pieces together, but I'm wondering if anyone here can explain offhand the steps for precomputing the IPFS hash (under the current build) given some JSON. Thanks in advance!

@RichardLitt
Contributor

@alexanderattar You can do that using ipfs add -n, iirc. It doesn't add it to IPFS, it merely spits out the hash for you.

@alexanderattar

Ah, thanks @RichardLitt! I guess my question is actually about how I would approach this programmatically, by running the JSON through the hashing and encoding steps without necessarily using the IPFS CLI. I am looking to take some JSON I have in JavaScript and precompute the hash before sending it to IPFS, if that helps explain the use case.

@alexanderattar

alexanderattar commented Apr 18, 2017

I just want to add that I have tried hashing the JSON with SHA-256 to get a digest such as 4f72333148622e4ae56e9c65d57aee47186cd6910ca080757ab72cc0c650f6bb, prefixing it with 1220, and then taking the entire string with the prefix:

122000c75938d356b000b34e7f7885f8982f29d89af76c234a8d439486b40fdc5469

After running that through a base58 encoding, I get a hash that resembles an IPFS hash with the Qm prefix, but the hash is not consistent with what is returned from adding the same JSON to IPFS via the command line. I am wondering if I am doing something wrong or missing a step. Thanks again!

@Kubuxu

Kubuxu commented Apr 18, 2017

Try doing the add with the --raw-leaves option instead, but then you have to add the CID prefix at the front. https://github.com/ipld/cid

Where raw block has multicodec of 0x55.

@alexanderattar

Thanks @Kubuxu, but I just want to reiterate that I am not looking to use the ipfs CLI, but rather to go through the necessary hashing and encoding steps to get from JSON to an IPFS hash. So far my method has been:

Take the JSON and hash it with SHA-256 to get the digest, then prefix the digest with 1220 as described here, so that the entire hex string is composed of the prefix plus the digest, and then base58-encode it. Does any of this approach sound incorrect?

@Kubuxu

Kubuxu commented Apr 18, 2017

IPFS by default also wraps the file you give it into some metadata used by ipfs itself. That is why it is different. --raw-leaves utilizes CID to communicate to others that data under the hash has no wrapping.
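
As a rough sketch of what that wrapping looks like, the Python below rebuilds the block by hand for a small, non-empty file and hashes the block instead of the raw content. It rests on assumptions that are not stated in this thread: the default UnixFS-inside-dag-pb layout (a UnixFS message with Type=File, Data and filesize, placed in the Data field of a link-less PBNode), content small enough to fit in a single chunk, and default `ipfs add` options. The helper names are hypothetical, and other options or versions may produce a different hash.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def varint(n: int) -> bytes:
    """Protobuf-style unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def unixfs_file_block(content: bytes) -> bytes:
    """Assumed wrapping for a non-empty, single-chunk file."""
    unixfs = (
        b"\x08\x02"                                  # field 1: Type = File (2)
        + b"\x12" + varint(len(content)) + content   # field 2: Data
        + b"\x18" + varint(len(content))             # field 3: filesize
    )
    # dag-pb PBNode without links: field 1 (Data) carries the UnixFS message.
    return b"\x0a" + varint(len(unixfs)) + unixfs


def wrapped_hash(content: bytes) -> str:
    """sha2-256 multihash of the wrapped block, base58-encoded."""
    digest = hashlib.sha256(unixfs_file_block(content)).digest()
    return base58btc_encode(b"\x12\x20" + digest)


if __name__ == "__main__":
    print(wrapped_hash(b'{"hello": "world"}'))
```

The result still starts with Qm, but it differs from the hash of the bare JSON because the extra framing bytes are part of what gets hashed.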

@alexanderattar

Wow, I was not aware of that, but that does explain why the hash is different. Is there documentation on the metadata IPFS wraps the file in before generating the SHA256 digest? I tried using the --raw-leaves flag which indeed gives me a different hash:

{"Name":"myfile.txt","Hash":"zb2rhnyuQdBJVhb3j7FAL1NRUrQu4TMkb7zED9S5sh2YCKd62"}

but I have not found any documentation on what algorithms and processing the data goes through to produce this hash.
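
For a hash of the zb2rh... shape shown above, the steps can be sketched in Python under a few assumptions: the file fits in a single block, the CID is version 1 with the raw codec (0x55, as mentioned earlier in the thread), the digest is sha2-256, and the text form is base58 with the multibase prefix `z`. The function names are hypothetical, and this is not guaranteed to match every `ipfs add` configuration.

```python
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc_encode(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def cid_v1_raw(content: bytes) -> str:
    """CIDv1 for unwrapped content: <version><codec><multihash>."""
    digest = hashlib.sha256(content).digest()
    cid_bytes = (
        b"\x01"        # CID version 1
        + b"\x55"      # multicodec: raw block
        + b"\x12\x20"  # multihash header: sha2-256, 32 bytes
        + digest
    )
    # 'z' is the multibase prefix for base58btc.
    return "z" + base58btc_encode(cid_bytes)


if __name__ == "__main__":
    print(cid_v1_raw(b'{"hello": "world"}'))
```

Here the content is hashed as-is, with no metadata wrapper; the extra version and codec bytes are what move the prefix from Qm to zb2r.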

@madavieb

This issue has been moved to https://discuss.ipfs.io/t/why-do-ipfs-hashes-start-with-qm/477.

@lautarodragan

@alexanderattar Have you had any luck generating the hash by simulating the metadata?

@alexanderattar

Hi @lautarodragan, I recently revisited this and got some help from someone who was working on something similar using the js-ipfs implementation. Check out this thread: ipfs/js-ipfs#1205. Hope it helps!

@lautarodragan

Thanks @alexanderattar! I don't know why I didn't see your response earlier, but it's a lot of help. That test you wrote sheds some light on the inner workings of IPFS hashes. I'll give it a try!

@alexanderattar

alexanderattar commented Jul 5, 2018 via email

@NiKiZe

NiKiZe commented Jul 17, 2018

It would be good to have an example implementation of an IPFS multihash tool (let's call it ipfsmh).
The goal would be for it to work on a file just as sha256sum or md5 does, so that its output is usable as a digest.

Using ipfs add -n file requires ipfs init, so it is not an option if one only wants to get the hash before adding.

Background:
What I would actually want to use this for is to make packages (source code for programs) downloadable via ipfs, with legacy http as a fallback. This way the content could be populated on the fly without the people creating the packages having to use ipfs in any way.

@Stebalien

@NiKiZe that sounds like a great idea! Currently, that tool would have to depend on the go-ipfs repo itself but we should be able to extract the requisite components into separate repos eventually.

@NiKiZe

NiKiZe commented Jul 17, 2018

@Stebalien thanks for your reply.
There are a few parts to this. The ipfsmh tool could of course be based on the official client as a start. However, as I wrote (but then removed before posting, doh!), having an ipfs client or even go installed where this "needs" to run might not be an option; it needs to be self-contained C or Python, which seems to be doable from what I understand?

The code would preferably be small enough that it can be copied over to a different machine by hand without too much effort.
Another reason would be to document how the hashing actually works.
Understandably, this would not implement the latest and greatest logic, but then again it can't (or at least shouldn't) change either if it is used as a hash in the way sha256 is. (Thinking about that part more, there are other things that I might be confused about, but they are not relevant for this issue.)

@Stebalien

having an ipfs client or even go installed where this "needs" to run might not be an option; it needs to be self-contained C or Python, which seems to be doable from what I understand?

I agree that a full ipfs client is not the solution. However, go builds really portable binaries so I don't really see any reason to go with C (let alone Python).

The code would preferably be small enough that it can be copied over to a different machine by hand without too much effort.

There are a lot of options that can affect the resulting hash so the tool would need to replicate all the options available on ipfs add.


Note: if you want something reasonably portable, you can probably build it with js-ipfs and run it with node (or a browser). However, that's going to be a bit clunky.

@NiKiZe

NiKiZe commented Jul 18, 2018

To run a go binary, go needs to be installed.
To run js, we need an engine to run it.
I agree that it is portable in the sense that one binary runs on anything. However, it is not portable in the sense that it has dependencies that in many cases are not available, and can't be made available.
From an educational point of view, however, many people understand Python.
I will tell you more, but again that part is not relevant for this issue. (I will try to ping you on IRC.)

@Stebalien

To run a go binary, go needs to be installed

Nope, go is a compiled language. It just compiles really fast so you can run go programs with go run myprogram.go. However, you can compile them with go build myprogram.go.

To run js we need an engine to run that

Same with Python. Personally, I wouldn't go with either.
