Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

Pinning content #1509

Merged
merged 72 commits into from
Aug 7, 2019
Merged

Pinning content #1509

merged 72 commits into from
Aug 7, 2019

Conversation

jmozah
Copy link
Collaborator

@jmozah jmozah commented Jun 23, 2019

Useful Pinning Links:

ethersphere/user-stories#10
https://swarmresear.ch/t/pinned-content-in-swarm/18
#1274

  • Pin during upload

    • swarm up --pin )
    • Add a pin header in the http API
    • If header present, set the pin Counter to 1 for the file & the manifest in the retrievalDataIndex
    • Store the top level manifest/file hash in the pinIndex
  • Pin after upload
    - swarm pin <top level manifest/file hash>
    - Increase the pin counter for all the chunks in the merkle tree

  • Take care of different types of upload

    • Tar upload
    • Raw upload
    • Multipart upload
  • Unpin pinned content

    • swarm unpin <top level manifest/file hash>
    • decrement the pinCounter by 1 for the entire merkle tree
  • List pinned files
    - display all the top level hashes that are pinned
    - Also display their pin counters and size

  • During GC, don't collect pinned chunks

Copy link
Member

@acud acud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmozah a few notes about this PR:

  1. the delta is huge (61 files so far). we'd have to break this down to smaller units
  2. there's no test coverage whatsoever for the new functionality
  3. the necessity of pinCounter (which is always set to 1 in the api) is probably arguable
  4. pinCounter definitely should not be uint8. Imagine that within a certain pin you limit yourself to 255 instances of the same chunk, which is very low (just imagine all of the .DS_Store files and other possible examples of just 32 byte repetitions)
  5. i'm not sure i understand why we would need to store information about whether to traverse a manifest as part of the localstore index. it seems like to much of high-level information to store somewhere
    i think that a good next step would be to discuss the correct test vectors for this functionality and proceed from there

api/api.go Outdated Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
api/api.go Outdated Show resolved Hide resolved
storage/localstore/localstore.go Outdated Show resolved Hide resolved
storage/localstore/localstore.go Show resolved Hide resolved
storage/localstore/localstore.go Outdated Show resolved Hide resolved
storage/localstore/localstore.go Outdated Show resolved Hide resolved
storage/localstore/localstore.go Outdated Show resolved Hide resolved
@acud acud assigned jmozah and acud Jul 1, 2019
@acud acud requested review from zelig and janos July 3, 2019 06:40
@jmozah jmozah changed the title [WIP] Pinning content Pinning content Jul 16, 2019
@jmozah jmozah requested a review from zelig August 1, 2019 20:44
zelig
zelig previously approved these changes Aug 1, 2019
api/client/client_test.go Outdated Show resolved Hide resolved
state/dbstore.go Show resolved Hide resolved
storage/localstore/gc.go Outdated Show resolved Hide resolved
storage/pin/pin.go Show resolved Hide resolved
storage/pin/pin.go Outdated Show resolved Hide resolved
// 1) Whether the file is a RAW file or not
// 2) Size of the pinned file or collection
// 3) the number of times that particular file or collection is pinned.
func (p *API) ListPinFiles() (map[string]FileInfo, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function is used only in tests, maybe not to export it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No.. This is exposed outside too. This is the only way through which the node owner will know what files are pinned , their size and their type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. In general, exposing such listing functionality should be done with some form of pagination, to prevent very large responses. Maybe that can be considered later if needed.

Copy link
Collaborator Author

@jmozah jmozah Aug 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any other command in swarm done with pagination? Just to see and understand. I agree that we should do a pagination... this will be especially useful for dapps which are pinning and pinning service operators like us. But as you said.. we should take it later.. I want to push this core functionality first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think that there is in classical way. But manifests implement a tree that can be considered as a way not to represent all available data at once.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I get it.

janos
janos previously approved these changes Aug 2, 2019
Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 @jmozah

zelig
zelig previously approved these changes Aug 3, 2019
Copy link
Member

@zelig zelig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well done. this PR defo holds the record for most comments ever :)

@jmozah jmozah dismissed stale reviews from zelig and janos via 1ca63c6 August 5, 2019 07:34
zelig
zelig previously approved these changes Aug 5, 2019
@jmozah
Copy link
Collaborator Author

jmozah commented Aug 5, 2019

@acud @zelig @janos I have finished all your comment and multiple re-comments. I hope i have taken care of all the issues you have raised. Thank you for the review. Please approve and merge this PR if you are okay.

Copy link
Member

@janos janos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that I have approved it, but I've only submitted a comment. Sorry, my last comment was intended to be an approval.

@acud acud merged commit b635061 into ethersphere:master Aug 7, 2019
@acud
Copy link
Member

acud commented Aug 7, 2019

@jmozah thanks for your infinite patience and endless iterations on this. this is commendable and admirable. and once more thank you for the great work on this PR. it has made great distance and I think that the end-product is great.
bravo 🙏 :shipit:

@jmozah
Copy link
Collaborator Author

jmozah commented Aug 7, 2019

@acud Thanks for your persistence and nit picking :-). The end result looks great because of you.

@skylenet skylenet added this to the 0.4.4 milestone Aug 9, 2019
chadsr added a commit to chadsr/swarm that referenced this pull request Sep 23, 2019
* 'master' of github.com:ethersphere/swarm: (54 commits)
  api, chunk, cmd, shed, storage: add support for pinning content (ethersphere#1509)
  docs/swarm-guide: cleanup (ethersphere#1620)
  travis: split jobs into different stages (ethersphere#1615)
  simulation: retry if we hit a collision on tcp/udp ports (ethersphere#1616)
  api, chunk: rename Tag.New to Tag.Create (ethersphere#1614)
  pss: instrumentation and refactor (ethersphere#1580)
  api, cmd, network: add --disable-auto-connect flag (ethersphere#1576)
  changelog: fix typo (ethersphere#1605)
  version: update to v0.4.4 unstable (ethersphere#1603)
  swarm: release v0.4.3 (ethersphere#1602)
  network/retrieve: add bzz-retrieve protocol (ethersphere#1589)
  PoC: Network simulation framework (ethersphere#1555)
  network: structured output for kademlia table (ethersphere#1586)
  client: add bzz client, update smoke tests (ethersphere#1582)
  swarm-smoke: fix check max prox hosts for pull/push sync modes (ethersphere#1578)
  cmd/swarm: allow using a network interface by name for nat purposes (ethersphere#1557)
  pss: disable TestForwardBasic (ethersphere#1544)
  api, network: count chunk deliveries per peer (ethersphere#1534)
  network/newstream: new stream! protocol base implementation (ethersphere#1500)
  swarm: fix bzz_info.port when using dynamic port allocation (ethersphere#1537)
  ...
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants