Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to retrieve blocks even if given using a different CID version #5285

Merged
merged 2 commits into from
Aug 8, 2018

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Jul 26, 2018

This is one way to do it.

Another way is to create (yet another) wrapper blockstore. However, this block store needs to be implemented very carefully to avoid breaking things. In particular only the Get method can be changed.

Although it would seam logical the Has method can not be modified because otherwise that might prevent blocks from getting added that should as we check if a block exists before adding it in the blockservice. This could create a problem if for example a CIDv0 block is in the blockstore and a CIDv1 of the block is added and pinned. If the Has method was modified then the CIDv1 will never get added and if the CIDv0 version is not pinned it would get removed when the garbage collector is ran and both the CIDv0 and CIDv1of the block is will become inaccessible.

I added a test case to test for the above case. Any alternative solutions would need to pass my tests.

This needs to be implemented carefully to avoid breaking pinning. I added a test case to test to show the problem.

See #5285 (comment) for updated method.

Towards https://github.com/ipfs/ipfs/issues/337.

@kevina kevina requested a review from Kubuxu as a code owner July 26, 2018 00:34
@ghost ghost assigned kevina Jul 26, 2018
@ghost ghost added the status/in-progress In progress label Jul 26, 2018
@kevina kevina added the topic/cidv1b32 Topic cidv1b32 label Jul 27, 2018
@magik6k magik6k self-requested a review July 27, 2018 09:58
@magik6k
Copy link
Member

magik6k commented Jul 27, 2018

I'd say that this should be done on the blockstore layer so that bitswap can benefit from that too. We'll need to make GC operate on multihashes to keep it working in that case.

@kevina
Copy link
Contributor Author

kevina commented Jul 27, 2018

I'd say that this should be done on the blockstore layer so that bitswap can benefit from that too.

I forgot about that. I have not tested with bitswap, but for bitswap to work I assume the Has method needs to be implemented, which will create the problem outlined above.

We'll need to make GC operate on multihashes to keep it working in that case.

That will be difficult or at least messy. The problem is if you only have the multihash you won't be able to delete anything until a full migration is done. If the multihash corresponds to a CIDv1 we will not be able to delete the block as we lost the Codec field of the CID, thus making it impossible to reconstruct the CID and find the block in the blockstore to delete.

It may be possible to modify the GC other ways to make this work though...

It may be possible to make this work, but it could be messy.

@kevina
Copy link
Contributor Author

kevina commented Jul 29, 2018

After thinking about this for a while I think I found a way to do this at the blockstore level and not break pinning.

The initial problem with doing this at the blockstore level was that if a CidV0 is already in the blockstore and an identical (content wise) CidV1 block is added, the CidV1 block will not get added due to the blockservice first checking if the block exists using Has. This will cause pinning problems as outlined above.

However, when ever a block is pinned is is also fetched first. So the solution I came up was when a say CidV0 block is in the blockstore but a CidV1 block is fetched, in addition to returning the CidV1 block also add it the to blockstore, so that if the CidV0 is ever deleted the CidV1 can still be retrieved.

This is not optional because it might lead to some unnecessary duplication and if you add a block without pinning the block might not get added, but it seams a fairly clean and easy solution for now.

A more optimized solution is to only store the block if it is fetched by the pinner, but that will involve having two versions of the blockstore and I don't think the duplication will be a problem in practice. Especially since this is a temporary measure until we migrate the blockstore to store blocks my multihash instead of full Cids.

@whyrusleeping @Stebalien what to you think?

@Stebalien
Copy link
Member

Stebalien commented Aug 7, 2018

After discussing this on IRC, I believe that this would be strictly better than the situation we have today so I'm fine with it.

But we should try to get in the codec-agnostic blockstore ASAP.

@kevina
Copy link
Contributor Author

kevina commented Aug 7, 2018

But we should try to get in the codec-agnostic blockstore ASAP.

I agree.

I will rebase this shortly.

…rsion.

This is done via a wrapper blockstore.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@kevina
Copy link
Contributor Author

kevina commented Aug 7, 2018

Rebased.

@kevina kevina changed the title RFC: Add ability to retrieve blocks even if given using a different CID version Add ability to retrieve blocks even if given using a different CID version Aug 7, 2018
@kevina kevina added the need/review Needs a review label Aug 7, 2018
@Stebalien
Copy link
Member

By strictly better I mean that, without this patch, we'd go to the network to fetch the block in question (possibly failing, possibly succeeding). In the success case, we'd store the duplicate block anyways. With this patch, we just store it up-front.

Copy link
Member

@magik6k magik6k left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 small thing, rest lgtm


func tryOtherCidVersion(c *cid.Cid) *cid.Cid {
prefix := c.Prefix()
if prefix.Codec != cid.DagProtobuf {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should check hash alg too

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

License: MIT
Signed-off-by: Kevin Atkinson <k@kevina.org>
@kevina kevina added RFM and removed need/review Needs a review labels Aug 7, 2018
@whyrusleeping whyrusleeping merged commit 324615d into master Aug 8, 2018
@whyrusleeping whyrusleeping deleted the kevina/cidv0v1 branch August 8, 2018 04:19
@ghost ghost removed the status/in-progress In progress label Aug 8, 2018
@rob-deutsch
Copy link
Contributor

Okay, so I have a potentially silly question - why do Bockstores/Blockservices/Bitswap still operate on cid.Cid instead of multihash.Multihash?

I would've thought the multihash is the only important attribute for storage of the block - and that the other fields of CIDv1 were only useful for linking purposes.

@kevina
Copy link
Contributor Author

kevina commented Sep 22, 2018

@rob-deutsch you are correct. The blockservice will be converted as part of the move to Cidv1 base32. See #5231 Bitswap will continue to use Cids for some future plan where the links are important. I do not know anything about this.

In any case this is not the place to discuss this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFM topic/cidv1b32 Topic cidv1b32
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants