Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestion to avoid exra layer of indirection by using a more compact representation of Cid #3

Closed
kevina opened this issue Sep 23, 2016 · 12 comments
Labels
need/analysis Needs further analysis before proceeding

Comments

@kevina
Copy link
Contributor

kevina commented Sep 23, 2016

The conversion from plain Key to Cid introduced an extra layer of indirection that can likely be avoided.

Currently Cid takes up the width of 5*64 bits. With it's current definition of:

type Cid struct {
    version uint64
    codec   uint64
    hash    mh.Multihash
}

May I suggest

type Cid struct {
    version uint32
    codec   uint32
    hash    mh.Multihash
}

As 32 bits should be more than enough for version and codec which I don't think will ever get very large (even 16 bits should be enough). If combined with multiformats/go-multihash#29 the size will now be 3*64 bits, which is the same size an array slice.

As the same size of a slice it becomes practical to pass it around by value which avoids an extra layer of indirection. Even though it is passed by value, the internals can still be kept hidden, as is done with many other go datatypes.

Just a suggestion. It might make a difference when you have an array of 1000's of Cids (which I might in some of my maintenance commands for the filestore, ipfs/kubo#2634).

@whyrusleeping
Copy link
Member

@kevina Yeah, this is a pretty good point. I'm not opposed to this.

@kevina
Copy link
Contributor Author

kevina commented Nov 17, 2016

Okay, this is something I will measure then. I will also measure the benefits (and cons) of using a string for a Multihash.

It may be a while before I can get to it as I can't just change the Cid form a pointer to a value type without lots of code change so I will need to isolate the benchmark code to have a few dependencies as possible before I can measure anything.

@kevina
Copy link
Contributor Author

kevina commented Nov 18, 2016

I did some informal test and determined that the overhead for the benchmark in ipfs/kubo#3376 was around 17ms. Most of the time for that benchmark was in the base32 decoding (see whyrusleeping/base32#1). This might still be useful but nothing urgent.

@Stebalien
Copy link
Member

So, I opened a competing issue (#25) because having to call KeyString() every time one wants to use a CID in a map is a pain (and slow). However, making multihash a string would also fix this issue.

@Kubuxu
Copy link
Member

Kubuxu commented Aug 11, 2017

@Stebalien as I am updating GC code that relies heavily on the indexing by Cid having the Cid stored as string possibly could improve it quite a bit.

@Stebalien
Copy link
Member

Taking a look at my heap profile, it looks like the KeyString function is now responsible for ~10% of my allocated memory.

@kevina
Copy link
Contributor Author

kevina commented Oct 19, 2017

@Stebalien making multihash a string is actually a different issue in a different repo, multiformats/go-multihash#29. @whyrusleeping was concerned about the cost of converting the string to a []byte but maybe now its the cost of converting a []byte to a string that is really the problem.

@Stebalien
Copy link
Member

Ah. I see. Either that or just storing CIDs in as byte strings (fully serialized) would work.

@kevina
Copy link
Contributor Author

kevina commented Oct 19, 2017

@Stebalien so if I understand you correctly, what you want is for the official CID representation to be a serialized string. That would be the most efficient in memory representative however they may be a non-trivial cost then if we need information on the Cid itself.

If there is any interest this is something I could look into.

@Stebalien
Copy link
Member

My main concern is that I want it to be usable as a key in a map. We can achieve this by:

  1. Storing the multihash as a string and using the raw Cid (not a pointer to it) as a map key.
  2. Storing the encoded CID inside of the CID as a string (struct { version uint64, codec uint64, cidStr string }). KeyString would return this (slightly more space).
  3. Just using the bare CID as a string.

@kevina
Copy link
Contributor Author

kevina commented Oct 20, 2017

@Stebalien ah, now I see what you are after. In most of these cases we should stop passing around CIDs as pointers (see #38). I am not sure if the Cid type necessary should be directly storable as map keys, but we should definitely aim to reduce the cost of KeyString.

@kevina
Copy link
Contributor Author

kevina commented Aug 31, 2018

We are going with a string representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/analysis Needs further analysis before proceeding
Projects
None yet
Development

No branches or pull requests

4 participants