You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
add --nocopy should add files to the filestore instead of the blockstore, but it doesn't always do so. Blocks are not added to the filestore if the same blocks happen to be in the blockstore. In this case, the blocks won't end up in the filestore, and so won't be seen in filestore ls. This hinders the user from achieving non-duplicated storage of data: the user ends up with the data in the original files AND in the blockstore, despite having used --nocopy.
To avoid the duplication one might think to unpin the roots existing pior to the add and refering to the leaves that match the blocks that are about to be added, and then repo gc to clear these leaves from the blockstore, before running add --nocopy, but
GC does not always cleanup the blocks in the filestore. I've seen block X in the filestore ls, pin ls X saying 'not pinned', then repo gc, and yet filestore ls X still finds it. I've also seen GC remove unpinned blocks from the filestore, so perhaps the former behavior is just a bug? (UPDATE: Blocks were held by MFS: ipfs files)
This might not be a general solution, because the leaves might be shared with unrelated roots R. As long as R is pinned, this is fine (I'm not looking to re-point R's leaf pointers to the filestore.), but after R is unpinned, we have the problematic duplication: data in raw filesystem file and same data in the blockstore. In the extreme case, this problem can make nocopy a noop (save net zero space after counting the raw filesystem files; instead of adding the negligible cost of intermediate nodes for the benefit of publishing the data to IPFS).
This approach cannot be applied retroactively after the add -- if you realize that the blocks existed in the blockstore after the add, then you have to clean them and then re-run add --nocopy again. This is less intuitive than having to clean up duplicates after finding out that there are duplicates.
Proposal: allow the add to duplicate between filestore and blockstore, but provide tools to manage this. If user requests something be added to the filestore with --nocopy, then add it unconditionally, but also give them a block rm option to remove from blockstore or filestore explicitly. The removal would only be allowed if the block is not referenced by a pinned parent. This check already happens, but with the proposed change the references need to be distinguished by the backend type of the block, not only by the block hash, because obviously the block hash is going to be referenced by the parent that was just added with --nocopy.
Then, to get into a state with only one copy of the data (in the original filesystem files): ipfs filestore dups | xargs -n 1 ipfs block rm --from-blockstore.
Both parts seem doable. The patch to the first part (I'm already using it):
--- a/filestore/filestore.go
+++ b/filestore/filestore.go
@@ -169,15 +169,6 @@ func (f *Filestore) GetSize(c cid.Cid) (int, error) {
// Has returns true if the block with the given Cid is
// stored in the Filestore.
func (f *Filestore) Has(c cid.Cid) (bool, error) {
- has, err := f.bs.Has(c)
- if err != nil {
- return false, err
- }
-
- if has {
- return true, nil
- }
-
return f.fm.Has(c)
}
The text was updated successfully, but these errors were encountered:
GC does not always cleanup the blocks in the filestore. I've seen block X in the filestore ls, pin ls X saying 'not pinned', then repo gc, and yet filestore ls X still finds it. I've also seen GC remove unpinned blocks from the filestore, so perhaps the former behavior is just a bug?
Both parts seem doable. The patch to the first part (I'm already using it):
That's probably not what you want to do. We wrap the blockstore with the filestore so that code will just cause us to think you don't have blocks that aren't in the filestore. I'm not sure if it'll break anything in practice (bitswap uses GetSize and most services use Get) but it's still a bad idea.
Blocks are not added to the filestore if the same blocks happen to be in the blockstore.
We can probably add a flag to the add command to handle this case but this isn't something the core team is likely to look into any time soon. It'll have to be an external contribution.
Both parts seem doable. The patch to the first part (I'm already using it):
That's probably not what you want to do.
Ok, then the consumer of the store API needs to be changed to simply call store.Add instead of if not store.Has then store.Add. The already-present check should be inside the store implementation, because as this case here shows, this logic is not generic and is really within the scope of a particular store implementation. A store may be implemented from composite stores, like in Filestore case here, and the semantics of which sub-store is modified on Add and which is looked up on Has etc should be entirely up to the implementation of the composite store.
Then, the Filestore will be able to implement the desired semantics: upon Add, it will check if the FS store contains the block and if not then add it to the FS store; upon Has, the logic will remain as it is now (check the non-filesystem-based blockstore, then check FS store).
Version information: 0.4.19-dev
Type: enhancement
Description:
add --nocopy
should add files to the filestore instead of the blockstore, but it doesn't always do so. Blocks are not added to the filestore if the same blocks happen to be in the blockstore. In this case, the blocks won't end up in the filestore, and so won't be seen infilestore ls
. This hinders the user from achieving non-duplicated storage of data: the user ends up with the data in the original files AND in the blockstore, despite having used--nocopy
.To avoid the duplication one might think to unpin the roots existing pior to the add and refering to the leaves that match the blocks that are about to be added, and then
repo gc
to clear these leaves from the blockstore, before runningadd --nocopy
, butGC does not always cleanup the blocks in the filestore. I've seen block X in the(UPDATE: Blocks were held by MFS:filestore ls
,pin ls X
saying 'not pinned', thenrepo gc
, and yetfilestore ls X
still finds it. I've also seen GC remove unpinned blocks from the filestore, so perhaps the former behavior is just a bug?ipfs files
)add --nocopy
again. This is less intuitive than having to clean up duplicates after finding out that there are duplicates.Proposal: allow the add to duplicate between filestore and blockstore, but provide tools to manage this. If user requests something be added to the filestore with
--nocopy
, then add it unconditionally, but also give them ablock rm
option to remove from blockstore or filestore explicitly. The removal would only be allowed if the block is not referenced by a pinned parent. This check already happens, but with the proposed change the references need to be distinguished by the backend type of the block, not only by the block hash, because obviously the block hash is going to be referenced by the parent that was just added with--nocopy
.Then, to get into a state with only one copy of the data (in the original filesystem files):
ipfs filestore dups | xargs -n 1 ipfs block rm --from-blockstore
.Both parts seem doable. The patch to the first part (I'm already using it):
The text was updated successfully, but these errors were encountered: