
Discussion - Onboarding projects with large DataCap requirements #94

Closed
jnthnvctr opened this issue Feb 17, 2021 · 15 comments

Comments

@jnthnvctr
Collaborator

There are a few ongoing projects that have substantial DataCap requirements - over and above what exists in the ecosystem today.

  1. The Shoah Foundation will require petabyte scale allocations to archive their entire data asset
  2. Filecoin Discover will require petabyte scale allocations to onboard their data asset
    ...
    Inevitably there will be more

This issue is to kick off discussion about the ways in which as a community we can plan and support early use cases.

To disentangle two issues that I believe arise here:

  1. Early in this program we have limited amounts of DataCap in the ecosystem - though in a slightly more mature state this may not be a limitation. I believe there are two approaches here:
    a) In the rubric today, over subsequent allocations and elections Notaries increase their DataCap allocation - so it is possible that we simply run the process as it exists today and just hold many rounds of elections successively.
    b) An alternate approach is to define a process (while Notaries have less DataCap than the projects) where these projects can apply to the community to receive an allocation that is purpose built to allocate to this use case (and administered by a set of the existing Notaries). The benefit here is that other use cases that apply in this timeframe for DataCap would not be blocked.

  2. No single Notary would be able to service either of these projects properly (or other large scale ones). My proposal here would be that this is actually fine - and Notaries should collaboratively support large scale efforts (which will also require additional scrutiny to make sure the Client is using the DataCap appropriately).

@s0nik42

s0nik42 commented Feb 17, 2021

My preference tends to be for 1.b. It has the advantage of avoiding a bottleneck when onboarding a large client without impacting the notaries' day-to-day allocations.
Notaries can jointly approve a DataCap allocation plan for that specific client. Then a notary can be selected and given a special allocation for providing the DataCap to that specific client according to the plan.
Advantages over 2:

  • Avoids bottlenecks
  • Keeps the spirit that clients are followed by a preferred notary
  • Less confusion, better tracking, and simplified interaction between the client and the notaries (imagine if the client needed to report CIDs to multiple locations according to the DataCap, answer the same questions 10 times, etc.)

Advantages over 1.a:

  • Ensures the DataCap will be allocated according to the initial plan. That DataCap could be attached to a specific Notary address, making it simpler for the notary and the ecosystem.

Happy to clarify if this isn't clear enough :)

@dkkapur
Collaborator

dkkapur commented Mar 30, 2021

Hi folks - proposing the following for getting the discussion going on potential implementation paths:

(Let's define "large client" as a project/use case/Client needing > 500 TiB of DataCap.)

  • Large clients can apply using a dedicated application process (separate, established application form - similar to a Notary application) through an issue on this repo, on which anyone in the community or any Notary can participate in the due diligence process (ask questions, ask for clarifications etc., via comments).
  • This includes specific details on use-case, ideal DataCap amount, detailed deal strategy (how will they bring their data onto the network, shape of deals, with whom they will make deals, etc.), expected DataCap usage rate, and more.
  • The application needs to stay open for at least 2 weeks; once all open questions have been addressed, we discuss it in the Notary Governance call and try to establish consensus that this particular Client is worth providing DataCap for.
  • At least 7 Notaries need to be in support of the application and volunteer to be signers on the DataCap allocation.
  • Once the application is considered "approved", a DataCap "faucet" is built for them, with a maximum allocation rate of 100 TiB/request and 1 request/day.
  • The Client can make requests to the faucet directly, and at least 4 (a majority) of the Notaries need to "approve" the message (like a multisig) in order for the DataCap to be granted. These Notaries are also responsible for holding the Client accountable for following through on the intended plan.

This specifically allows for:

  • everyone in the community to participate in the due diligence process of the Client
  • there is a dedicated faucet set up for the Client so as not to block allocations for others in the ecosystem
  • multiple "watchers" on the account, with a majority of Notaries needing to approve allocations each time
  • large enough allocations to keep the Client unblocked for substantial amounts of time and make meaningful progress towards getting data onto Filecoin

@jnthnvctr @s0nik42 thoughts?
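The 4-of-7 approval flow proposed above could be sketched roughly as follows. This is a hypothetical illustration only (the names, the `FaucetRequest` structure, and the in-memory approval set are all invented here); in practice this would be an on-chain multisig, not a Python object.

```python
# Hypothetical sketch of the proposed faucet approval flow: a DataCap
# request is granted only once a majority (4 of 7) of the designated
# Notaries have approved it. Structures are illustrative, not real
# Fil+ tooling.
from dataclasses import dataclass, field

TIB = 1024 ** 4               # bytes in a tebibyte

MAX_PER_REQUEST = 100 * TIB   # 100 TiB per request
APPROVALS_REQUIRED = 4        # majority of the 7 signing Notaries

@dataclass
class FaucetRequest:
    client: str
    amount: int                          # bytes of DataCap requested
    approvals: set = field(default_factory=set)

    def approve(self, notary: str, notary_set: set) -> bool:
        """Record a Notary approval; return True once a majority is reached."""
        if notary not in notary_set:
            raise ValueError(f"{notary} is not a signer on this faucet")
        if self.amount > MAX_PER_REQUEST:
            raise ValueError("request exceeds the 100 TiB per-request cap")
        self.approvals.add(notary)
        return len(self.approvals) >= APPROVALS_REQUIRED

notaries = {f"notary-{i}" for i in range(7)}
req = FaucetRequest(client="f1client...", amount=100 * TIB)
for n in ["notary-0", "notary-1", "notary-2"]:
    assert not req.approve(n, notaries)       # still below majority
granted = req.approve("notary-3", notaries)   # 4th approval reaches majority
```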

@s0nik42

s0nik42 commented Apr 7, 2021

@dkkapur, I like the proposal. I think this type of client will need a single point of contact in any case to deal with Fil+. I recommend that we identify one of the 7 notaries to take on that role when the project starts.

Actions could be:

  • Handle client requests
  • Nudge the other notaries when approvals take time
  • Maintain a global view of the client rollout
  • Handle client-specific aspects of the process
  • Etc.

@dkkapur
Collaborator

dkkapur commented Apr 13, 2021

@s0nik42 thanks - agreed, we should have a single notary-lead chosen from the set as well. For an initial version of this type of faucet, I would suggest we scope this to the following to ensure that we are on a safe path to test and unblock projects such as Starling without creating too much of a risk for the Fil+ program:

  • public datasets that are mission aligned with Filecoin and Filecoin Plus
  • between 1-10 PiB
  • no miner receives DataCap > 5% of any allocation; all miners chosen are in good standing in reputation systems like filrep.io
  • Clients need to have used up > 90% of their prior DataCap allocation before coming to the faucet to request more, and we need enough automated systems / dashboards in place to ensure the group of signing Notaries has access to the data required to continue allocating DataCap and to verify that the client is operating in good faith, in accordance with the principles of the program and the allocation strategy outlined in the original application
  • stored data should be readily retrievable on the network and this is regularly verified

What do you think of this? IMO erring on the side of caution early on to ensure we build safe practices for scaling this up with confidence in the future is a good way to proceed.

@s0nik42

s0nik42 commented Apr 14, 2021

Hi @dkkapur, I think this is very good to start with.

@dkkapur
Collaborator

dkkapur commented Apr 27, 2021

@s0nik42 thanks! We've had various conversations in Slack and offline with interested Clients and Notaries on this one in the last two weeks, so in tomorrow's Notary Governance call - let's finalize the approach for the initial proposal!

Recommending that we move forward based on the following (updating the bullets I shared above):

  • public datasets that are mission aligned with Filecoin and Filecoin Plus
  • DataCap request is greater than 500 TiB
  • client outlines a clear deal/DataCap allocation strategy in their application that is reasonable, which includes max % allocation per miner and methodology for identification of miners that are chosen (i.e., using reputation systems like https://filrep.io).
  • Clients need to have used up > 90% of their prior DataCap allocation before coming to the faucet to request more, and we need enough automated systems / dashboards in place to ensure the group of signing Notaries has access to the data required to continue allocating DataCap and to verify that the client is operating in good faith, in accordance with the principles of the program and the allocation strategy outlined in the original application
  • stored data should be readily retrievable on the network, and this is regularly verified (through manual or automated verification that includes retrieving data from various miners over the course of the DataCap allocation timeframe)
  • Instead of having a flat allocation rate throttle across all projects as was initially proposed above, we should instead have a throttle which is also contextualized based on the use case. Proposing that allocation amount and weekly allocation rate is specified in the application, and throttles are as follows:
    First allocation: lesser of 5% of total DataCap requested or 50% of weekly allocation rate
    Second allocation: lesser of 10% of total DataCap requested or 100% of weekly allocation rate
    Third allocation onwards: lesser of 20% of total DataCap requested or 200% of weekly allocation rate
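The tranche schedule above can be expressed as a small function. This is just an illustrative sketch of the arithmetic (the function name and the example numbers are invented for this example), not the actual Fil+ tooling:

```python
# Sketch of the tranche throttle proposed above: each allocation is the
# lesser of a percentage of the total DataCap requested and a multiple
# of the client's stated weekly allocation rate.
def tranche_allocation(n: int, total_requested: float, weekly_rate: float) -> float:
    """DataCap granted for the n-th allocation (n starts at 1)."""
    if n == 1:
        return min(0.05 * total_requested, 0.5 * weekly_rate)   # first
    if n == 2:
        return min(0.10 * total_requested, 1.0 * weekly_rate)   # second
    return min(0.20 * total_requested, 2.0 * weekly_rate)       # third onwards

# Example: a hypothetical 5 PiB application with a stated rate of 100 TiB/week
PIB, TIB = 1024.0 ** 5, 1024.0 ** 4
total, weekly = 5 * PIB, 100 * TIB
first = tranche_allocation(1, total, weekly)   # min(256 TiB, 50 TiB) = 50 TiB
second = tranche_allocation(2, total, weekly)  # min(512 TiB, 100 TiB) = 100 TiB
third = tranche_allocation(3, total, weekly)   # min(1 PiB, 200 TiB) = 200 TiB
```

Note how the percentage-of-total cap only binds for small applications; for a multi-PiB request like this one, the weekly-rate multiple is the effective throttle.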

These updates enable a more nuanced approach to be taken with what is deemed "fair" or "reasonable" by a select set of Notaries that are then comfortable tracking and enforcing this. We should focus efforts towards building tooling that will help bring transparency into the system to ensure DataCap is being used to make Filecoin more useful!

Tactical next steps include:

  • setting up another application process for large clients requesting a faucet of this sort
  • testing and verifying that the multisig notary allocation system works

@dkkapur
Collaborator

dkkapur commented Apr 27, 2021

Draft of some of the questions that need to be included in the client application. Current plan is to manage these applications in a separate repo, i.e., github.com/filecoin-plus-large-clients. Would appreciate feedback on this!

Client Application

Core Information

  • Organization name:
  • Website / social media:
  • Total amount of DataCap being requested:
  • On-chain address to be notarized:

Project details

  • Share a brief history of your project and organization
  • What is the primary source of funding for this work?
  • What other projects/ecosystem stakeholders is this project associated with?

Use-case details

  • Describe the data being stored onto Filecoin
  • Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data)
  • What is the expected retrieval frequency for this data?
  • For how long do you plan to keep this dataset stored on Filecoin? Will this be a permanent archival or a one-time storage deal?

DataCap allocation plan

  • In which geographies do you plan on making storage deals?
  • What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?
  • How will you be distributing your data to miners? Is there an offline data transfer process?
  • How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.
  • How will you be distributing data and DataCap across miners storing data?

@Fenbushi-Filecoin
> [quotes dkkapur's draft client application from the previous comment]

Hey Deep, for the required materials in the application, you can use our guidelines for reference, which we believe are better suited to handling big clients.
https://github.com/filecoin-project/filecoin-plus-client-onboarding/blob/main/Fenbushi%20Capital/Filecoin%20Plus%20Client%20Onboarding%20Guidellines%20-%20Fenbushi.pdf

@dkkapur
Collaborator

dkkapur commented Apr 29, 2021

@Fenbushi-Filecoin - this is great, thank you for sharing! I will look through it and propose some changes to this application structure. If any specific questions have proven valuable in your experience, please let me know.

@dkkapur
Collaborator

dkkapur commented Apr 29, 2021

Per the call this week (2021-04-27 governance call), we're working on getting a v1 implementation of this up and running in the next few weeks! I will keep this issue updated with progress.

@dkkapur
Collaborator

dkkapur commented May 11, 2021

@Fenbushi-Filecoin - thanks again for sharing your comprehensive DataCap allocation writeup! Here are some things I think we should consider incorporating into the application:

  • Source of the dataset (in addition to what the data is, we should ask clients to elaborate where the data was obtained)
  • Sample data in some form (URL, link, IPFS CID or something similar)

@dkkapur
Collaborator

dkkapur commented May 11, 2021

Took a deeper dive today into potential sources of issues in a system of this sort, and would like to propose the following in addition to all the above listed points. This is largely in an effort to serve an initial set of datasets that we can use to prove out the process with as we start to move to a larger scale of DataCap allocation and distribution.

  • Set an explicit upper bound per dataset application at 5 PiB
  • Initially run this application/DataCap allocation flow up to 50 PiB in total - at which point we can circle back with takeaways and improve both the experience and the efficiency of the system
  • (repeating this from an earlier discussion in a governance call) 7 notaries sign onto the multisig, with at least 3 regions represented
  • there should be no open disputes in the Fil+ ecosystem against a client in order for them to qualify for DataCap to be assigned in this manner to them / one of their projects
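Taken together, the constraints above amount to a pre-flight eligibility check. The sketch below combines them in one place; the function, its field names, and the example values are hypothetical (in practice these checks are performed by humans during the GitHub application review):

```python
# Hypothetical pre-flight check combining the constraints listed above:
# a 5 PiB per-dataset cap, a 50 PiB initial program cap, 7 notaries
# spanning at least 3 regions, and no open Fil+ disputes.
PIB = 1024 ** 5

MAX_PER_DATASET = 5 * PIB   # explicit upper bound per dataset application
PROGRAM_CAP = 50 * PIB      # total DataCap for the initial flow

def application_eligible(requested: int, already_allocated: int,
                         notary_regions: list, open_disputes: int) -> bool:
    """Return True if a large-dataset application passes all initial gates."""
    return (
        requested <= MAX_PER_DATASET                  # per-dataset cap
        and already_allocated + requested <= PROGRAM_CAP  # program-wide cap
        and len(notary_regions) >= 7                  # 7 notaries on the multisig
        and len(set(notary_regions)) >= 3             # at least 3 regions
        and open_disputes == 0                        # no open disputes
    )

ok = application_eligible(5 * PIB, 10 * PIB,
                          ["NA", "NA", "EU", "EU", "AS", "AS", "NA"], 0)
too_big = application_eligible(6 * PIB, 0,
                               ["NA", "NA", "EU", "EU", "AS", "AS", "NA"], 0)
```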

@dkkapur
Collaborator

dkkapur commented May 17, 2021

Update: per the conversation in the last notary governance call, https://github.com/filecoin-project/filecoin-plus-large-datasets has been set up to start testing this process out!

@XnMatrixSV

@dkkapur Hi, there may be a problem with the project-details information when submitting a new issue using the large-datasets application template. Please check it.
filecoin-project/filecoin-plus-large-datasets#9
Describe the data being stored onto Filecoin.
Confirm that this is a public data set that can be retrieved by anyone on the Network
What is the expected retrieval frequency for this data?
For how long do you plan to keep this dataset stored on Filecoin? Is this a permanent archival or a temporary storage deal?
filecoin-project/filecoin-plus-large-datasets#7
Share a brief history of your project and organization.
What is the primary source of funding for this project?
What other projects/ecosystem stakeholders is this project associated with?

@dkkapur
Collaborator

dkkapur commented Oct 7, 2022

This topic continues to evolve as part of the broader LDN theme / path to DataCap. As such, closing out this issue for now.

@dkkapur dkkapur closed this as completed Oct 7, 2022