Skip to content
This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

PoSt Committee: Proposal for PoSTs - New challenge methods & removing VDFs #61

Closed
bvohaska opened this issue Jan 20, 2019 · 14 comments
Closed
Assignees
Labels

Comments

@bvohaska
Copy link

Goals

  1. Provide PoST with a reliable and verifiably-random challenge. This challenge must be verifiably random and uninfluenceable by some threshold of participants given by the security parameter s. i.e. a challenge can not be simulated or chosen by the miner generating the PoST or some threshold of miners who might collude with the PoST miner.

  2. Provide PoST a mechanism with which to provide time-boxing without the need for a VDF.

Context:

The goal of a Proof-of-SpaceTime (PoST) is to provide cryptographic assurance that a miner m_i has faithfully stored some data DATA for a period of time time. As part of the current Proof-of-Replication security model, it is assumed that there is no method by which m_i could have generated a PoST for DATA without having continuously stored threshold percentage of that data; ideally this threshold percentage thres_store would be 100%. In practice, thres_store is tuned by some security parameter s and is related to the cryptographic internals of PoST. As such PoST, can be represented as a blackbox function PoST that takes a representation of DATA and a challenge C,

Out_post = PoST(C,DATA)

Either the PoST method or the challenge C should provide a mechanism via which each PoST that miner m_i generates is time-dependent. This should ensure thatm_i should not be able to generate a valid PoST for DATA outside of some time box. This is directly tied to storage contract that miner m_i and some client c_j have agreed to on-chain as well as the amount of reward that m_i will receive over time from the aforementioned contract. An example of this might look like:

Contract-m_i<-->c_j = {
	Store DATA for Z days
	Total FIL for storage = d FIL
	Proving period = Z/10
	FIL paid per PoST which is presented every Z/10 time units
}

Additionally, we require that C be randomly generated. This prevents a miner from exploiting the cryptographic internals of the PoST method.

Current Method:

In order to generate a PoST a miner must

  • Calculate some output Out_vdf of a VDF seeded with some on-chain value called seed
  • Calculate PoST as Out_post = PoST(Out_vdf, DATA)

This method assumes that the VDF can not be cheated and seed was generated fairly.

##Potential Issues:

  • Sole reliance on on-chain randomness for generating a random challenge C. If a miner were able to manipulate C, they might gain some advantage in generating a PoST. This would lead to that miner having an advantage in receiving rewards for successful PoSTs.
  • Concerns surrounding VDF speedup attacks. If it were possible to speedup a VDF then a miner could gain the same reward advantage described above.
  • Need for VDF hardware. There is a significant cost in engineering and deploying VDF hardware. There is a reasonable assumption that such hardware might be needed. If a solution exists that is software-only, this would be preferable.

##Solution:

Suppose,

  • m_i in M : M is the set of miners who have been active in the last w block rounds

  • pk_i in PK : PK is the set of all public keys and pk_i is the public key of m_i

  • seed is a value stored on-chain

  • VRF_ski(pk_i, seed) is a Verifiable Random Function generated from seed and pk_i and committed to m_i through sk_i

  • map_k(x) is a function that maps a string x to a set of miners M_k : |M_k| = k and there exists some canonical ordering of m_i.

  • thresh_k(s) is a function that returns an integer less than k given a security parameter s

  • rounds_min(s)

  • Sign_sk_i(x) is a signature of x using the secret key sk_i of miner m_i

Assume,

  • seed was generated verifiably randomly at block round r-1
  • At least 2/3 of miners in M are honest

Then,

For a given block round r with associated seed and PoST generating miner m_i compute,

  1. (l, pi) = VRF_ski(pk_i, seed) : l is a random string generated by the VRF and pi is the VRF proof

  2. M_k = {m_l0, m_l1, ..., m_lk} = map_k(l)

For each m_li in M_k, request a random string returned as a tuple,

  1. (r_li, Sign_sk_li(r_li))

Once m_i has received greater than t = thresh_k(s) responses, m_i computes

  1. H_i = H(r_l0 || r_l1 || ... || r_lt)

  2. S_i = Sign_sk_li(H_i)

H_i is now the random challenge used to generate a PoST of some data for block round r. Miner m_i will now compute the PoST and compute,

  1. pi_post = PoST(H_i, DATA)
  2. AggSig = BLS Aggregation of S_iwithSign_ski(pi_post)`
  3. msg_onchain = {(r_li, ...,r_lt), Recvover,AggSig, pi_post}

Where Recover is a bit string that maps m_li to a bit in Recover. If a bit in Recover is set then m_li was included in H_i in that order. Size(Recover) = Ceil(log2(t)) . Note that r_li can be small given some security parameter s. M_k can be determined by verifiers by validating (l, pi) and calculating map_k(l).

By using a VRF in (1) to generate M_k via proxy (2), we can ensure that the miner m_i can not choose which group of miners will participate in generating a random challenge H_i. We have stated that at least 2/3 of miners are honest by assumption. We must also state that t be sufficiently close to k and k be sufficiently large to ensure that m_i can not choose t entities that could collude together to give m_i and advantage. Note that it follows that if at lease of of the t miners represented in (4) is honest, then m_i should not gain a sufficient advantage in choosing a PoST random challenge as the PoST could not be generated before the honest r_li was received.

Note on ensuring solution attacks are unsuccessful. Unlike Algorand, we may not wish to sort this partition (sortition in Algorand) via the Filecoin Power Table as high weight miners would receive disproportionately many requests for an r_li. Instead we insist that M be chosen from the set of miners who (1) have nonzero power and (2) have participated in at least one of the last rounds_min(s) block rounds.

Using the above construction for generating PoST challenges, we can utilize most of the security model from Algorand and apply it to a much smaller consensus-like problem for challenge generation. In this new model, we need to ensure with probability p_thresh that least one of the miners used in H_i is honest. Given that 2/3 miners are assumed to be honest and M_k was sampled uniformly across a pool of miners with at least this level of honesty, we require that ,

thresh_k(s) > 2/3*|M_k| + 1

This ensures that for any set of miners M_thresh_k < M_k, there will be at least one miner m_li that is honest with a probability p_min. This probability is to be explored in another document.

Conclusion

  • Using the above method, we can remove VDFs from PoST and add a tunable source of randomness to each PoST. We will no longer need to consider VDF hardware as a function of Filecoin.
  • We can borrow much of the security model and proofs for Algorand and use them to describe the security of this method.
  • As a tradeoff, we will need to store thresh_k(s)*len(r_i) bits on-chain per PoST.
  • There is an extension of this protocol where each r_i is generated via (1). Each miner m_li will use l as r_i. As a result, the block leader can perform compression of each r_i and store each r_i as a pointer to any PoSTs that miner m_li may have generated at block round r. Furthermore, we could require that M_k be extracted from the list of miners who are expected to provide PoSTs at block round r. Though this list could be much smaller than the M_k described in this document and may present issues in tuning the security parameter s.

Related issues:

@bvohaska bvohaska self-assigned this Jan 20, 2019
@lucaniz
Copy link

lucaniz commented Jan 21, 2019

Brian, thanks for writing this up.
Some questions I have:

  1. Assumptions

Assume,
seed was generated verifiably randomly at block round r-1

I think we should think about this assumption. As Jeromy pointed out in this issue https://github.com/filecoin-project/research-private/issues/113

resampling mitigation has some hard problems. One, is that resampling from the latest block is potentially problematic. If the miner selects a block that eventually is not selected as the ‘winner’, they will have to restart that portion of their PoSt. The ‘simple’ fix for this is for miners to complete PoSts on all potential forks, this can get really expensive. Another potential fix would be to re-sample from some point back in the chain, this would greatly reduce the chance that a selected chain is not accepted, however, it trades off timing security, meaning that an attacker can now gain a time advantage up to the look back parameter.

I can be wrong, but it seems to me that as it is the seed assumption (taking the seed from block round r-1) could lead to the first problem mentioned by @whyrusleeping . on the other hand, even going some time back in the chain could be a not that smart alternative, as stated in https://github.com/filecoin-project/research-private/issues/113

  1. Threshold challenge evaluation
  1. M_k = {m_l0, m_l1, ..., m_lk} = map_k(l)
    For each m_li in M_k, request a random string returned as a tuple,
  2. (r_li, Sign_sk_li(r_li))
    Once m_i has received greater than t = thresh_k(s) responses, m_i computes
  3. H_i = H(r_l0 || r_l1 || ... || r_lt)

The miner m_i can basically select by himself the subset of r_li's that countribute to the evaluation of H_i. This means that, among all the r_li's the he receives, he can select the subset of cardinality t that he wants, and somehow influence the challenge H_i. Am I missing something?

  1. Waiting until the end (?)
    Can't a malicious miner just wait until the end and then evaluate all the PoST in the last step after reconstructing all the challenges in one shot?

I say that because one of the reason why we were using VDF (concatenating them) was that we wanted to prevent a malicious miner to be able to wait until the last time-slot and generate all the challenges in one shot.

@nicola
Copy link
Contributor

nicola commented Jan 21, 2019

Hey @bvohaska quick note here:

  • what prevents a miner to wait till the end of the proving period and get the challenges at the very end, compute very quickly the PoSt, and submit it?
  • how does this differ from taking the number from the ticket chain? (my answer would be avoiding the single bit of bias)

I really like the intuition behind this. Look forward to discuss this more!

@sternhenri sternhenri transferred this issue from another repository Jan 21, 2019
@bvohaska
Copy link
Author

Extension to the above solution

The previous solution only works for PoSTs where the total proving period is less than the expected SEAL time. An extended construction is presented here.

Additions to the security model

Suppose a miner enters a contract where they must prove the existence of DATA at least k in for a proving period of n blocks starting with block r. Furthermore, let the contract specify m proving periods for which the miner must submit PoSTs on-chain. Any solution should ensure that,

  1. A miner should not be able to generate PoST for a future proving period.
  2. A miner should not be able to generate a PoST for a time sufficiently far in the past. For example, suppose a miner decides to wait a certain number of block before generating a PoST. After some time ``r - wiggle(s), no valid PoST should be able to be generated. wiggle(s)` is the number of blocks in the past for which a miner should be able to generate a valid PoST; `wiggle(s)` is a function of a security parameter `s` and should consider network latency.
  3. A miner should not be able to generate PoST_2 before first generating PoST_1 and PoST_0 .
  4. A miner m_li will receive some small reward for participating in this protocol.

Clarifications

This protocol uses the term proving period to mean the amount to time between PoSTs. The total amount of time that a miner m_i must prove storage of DATA is denoted as the contract length. Other documentation may use different terms.

In step (1),

(l_m, pi) = VRF_ski(pk_i, seed_r) : l_m is a random string generated by the VRF for proving period m and pi is the VRF proof

l_m and r are linked via the blockchain. A storage contract will specify which blocks will correspond to m. See the example contract below for more details.

seed should be listed as seed_r as seed depends on the current block. Moreover, seed can not be predicted for block r+1 by assumption.

Between steps (2) and (3) there should be three additional steps,

(2b) Request(m_li) = {l_m, pi, 'r'}

(2c) m_li verifies m_li is in map_k(l_m) and rejects if not

(2d) m_li must check that

Updated Protocol

  • seed_r is an unpredictable value stored on-chain. This value should not be easily influenced by any threshold of parties.
  • VRF_ski(pk_i, seed_r) is a Verifiable Random Function generated from seed and pk_i and committed to m_i through sk_i

Additional assumptions:

  • A miner m_i has entered into a storage contract with a client c_j,

    Contract(m_i, c_j) = {
    	Store DATA for Z blocks
    	FIL reward =  Y (FIL)
    	Proving period = Z/10
    	FIL to be paid per PoST (every Z/10 blocks)
    }
    

    10 is arbitrary and used for example. Suppose k = 10 , k must divide Z.

    m_i must provide a PoST every Z/10 blocks which can be submitted to the chain as one large transaction or as 10 separate transactions.

For a given block round r with associated seed_r and PoST generating miner m_i compute,

  1. (l_m, pi) = VRF_ski(pk_i, seed_r) : l_m is a random string generated by the VRF for proving period m and pi is the VRF proof

  2. M_k = {m_l0, m_l1, ..., m_lk} = map_k(l)

For each m_li in M_k, request a random string r_li,

  1. (b) Request(m_li) = {l, pi, r}

  2. (c) m_li must verify m_li in map(l_m) or reject the request.

  3. (d) m_li must verify that proving period m is within wiggle(s) blocks of the intended proving period on the storage contract. If not, reject.

m_li returns the following as a tuple,

  1. (r_li, Sign_sk_li(r_li))

Once m_i has received greater than t = thresh_k(s) responses, m_i computes

  1. H_i = H(r_l0 || r_l1 || ... || r_lt)

  2. S_i = Sign_sk_li(H_i)

H_i is now the random challenge used to generate a PoST of some data for block round r. Miner m_i will now compute the PoST and compute,

  1. pi_post = PoST(H_i, DATA)
  2. AggSig = BLS Aggregation of S_iwithSign_ski(pi_post)`
  3. msg_onchain = {(r_li, ...,r_lt), Recvover,AggSig, pi_post}

If all PoST messages are to be sent together,

  1. Generate Mega_PoST = Compress(msg_onchain_i) for all on-chain messages corresponding to each proving period i. Submit Mega_PoST on-chain.

Compress can be any function that concisely represents a msg_onchain such as a SNARK. SNARKS may be particularly attractive since all functions above can be represented as elliptic curve operations.

Conclusion

For a sufficiently small proving period (see notes above), a miner m_i will gain a negligible advantage in waiting to generate a PoST. Moreover, should that miner wait too long, a PoST will no longer be able to be generated for a given proving period and would result in a penalty.

Since each PoST relies on a pseudo-random sampling of the Filecoin network, m_i will be delayed in generating a PoST until it has seen a threshold of messages from M_k.This results in m_i not being able to generate PoSTs on its own nor can m_i reasonably collude with malicious m_li to generate PoSTs.

This construction is fairly general and should be modifiable to include other requirements and constraints as needed.

##Additional Protocol Notes:

  • If the proving period is smaller than time(SEAL) then there is no incentive for a malicious actor m_bad to perform a SEALSTACKing attack.
  • This construction does not allow for proving periods that are smaller than a block time.
  • There are alternative constructions that strengthen the security model using ZKPs during the 2* steps but these constructions are computationally expensive. As a result, the incentive given to m_li would need to be larger which in turn would raise the total storage cost of DATA.

@bvohaska
Copy link
Author

@lucaniz:

  • seed is a problem but it's a problem for all protocols that don't use a random beacon or MPC. It's a bit out of scope for this protocol but it should always be considered. However, this construction should be robust against a selection/biasing attack as long as the VRF and H operate in a large enough field.
  • Given the above, the attack could test different combinations of H_i. For a k = 15 and a thresh_k(s) = 10 this is something like 2^22 choices. Nevertheless, this shouldn't enable the attacker to gain any advantage as they would need to know H_i a priori to get some sort of economic advantage. Moreover, we assume at least one honest miner contributes to H_i. As long as r_li is sufficently long, then guessing H_i will be impractical.
  • Hopefully, this new construction addresses your last point about waiting to produce all PoSTs.

@nicola:

  • Good point. The new construction should address this.
  • If by number you mean seed then I think it might not matter as long as the VRF and H draw from a sufficiently large space. In the extreme case, suppose one could choose seed. Then one could choose a favorable VRF output. If the VRF space is sufficiently large then the attacker would need to perform a large number of operations to determine a favorable l in order to generate a favorable M_k. If the number of honest miners is large (we assume this), the the attacker would need to find a seed such that the resulting M_k has at least thresh_k(s) + 1 malicious actors. In this model we would need to calculate the expected number of VRF and map(l) calculations Calc needed to find a favorable M_k with probability p_favorable within a block time. For our purposes, as long as seed is mostly fair, Calc will be impractically large.

@lucaniz
Copy link

lucaniz commented Jan 24, 2019

Hi Brian,
thanks for answering.
A couple of additional comments:

  • Given the above, the attack could test different combinations of H_i. For a k = 15 and a thresh_k(s) = 10 this is something like 2^22 choices. Nevertheless, this shouldn't enable the attacker to gain any advantage as they would need to know H_i a priori to get some sort of economic advantage. Moreover, we assume at least one honest miner contributes to H_i. As long as r_li is sufficently long, then guessing H_i will be impractical.

You are 100% right when saying that it's impractical for an attacker to guess H_i. But what I'm saying is different: what i'm saying is that an attacker can discard H_i if it's a challenge he can not answer, and generate a new one. In the end (in your example), he only needs a vector of 10 strings. If the first 10 he receives lead to a challenge he can not answer, he'll try a different combination/a different 10-strings set until he hits a challenge H_i he is confortable with.

This attack would not be viable if any subset of 10 strings out of 15 gives the same result (and so the challenge generation is kind of deterministic, once you have 10 strings [basically no matter which 10 strings you have, you'll get the same H_i]). Nico said that's doable with threshold signatures (as Dfinity does), but I don't know if you were meaning this.

Moreover, it seems to me that either we have a PoST/we go on chain every (T_seal -\epsilon), or we are in trouble.

As an additional note, we could also make Seal-Stamp to go on-chain every (T_seal -\epsilon) (and nto every block, Nico and I were discussing about it), but it still seems undesirable.

I'm wondering if this can be overcome somehow, but it seems kind of inherent (if we don't use VDFs)

Minor details:

Once m_i has received greater than t = thresh_k(s) responses, m_i computes
4. H_i = H(r_l0 || r_l1 || ... || r_lt)
5. S_i = Sign_sk_li(H_i)

I guess in 5. we are using sk_i (like m_i 's secret key) to sign H_i, right?

@lucaniz
Copy link

lucaniz commented Jan 25, 2019

Idea on the updated protocol

@bvohaska and I discussed the updated protocol (find it below)

Updated Protocol

  • seed_r is an unpredictable value stored on-chain. This value should not be easily influenced by any threshold of parties.
  • VRF_ski(pk_i, seed_r) is a Verifiable Random Function generated from seed and pk_i and committed to m_i through sk_i
    Additional assumptions:
  • A miner m_i has entered into a storage contract with a client c_j,
Contract(m_i, c_j) = {
	Store DATA for Z blocks
	FIL reward =  Y (FIL)
	Proving period = Z/10
	FIL to be paid per PoST (every Z/10 blocks)
}

10 is arbitrary and used for example. Suppose k = 10 , k must divide Z.
m_i must provide a PoST every Z/10 blocks which can be submitted to the chain as one large >transaction or as 10 separate transactions.
For a given block round r with associated seed_r and PoST generating miner m_i compute,

  1. (l_m, pi) = VRF_ski(pk_i, seed_r) : l_m is a random string generated by the VRF for proving period m and pi is the VRF proof
  2. M_k = {m_l0, m_l1, ..., m_lk} = map_k(l)
    For each m_li in M_k, request a random string r_li,
  3. (b) Request(m_li) = {l, pi, r}
  4. (c) m_li must verify m_li in map(l_m) or reject the request.
  5. (d) m_li must verify that proving period m is within wiggle(s) blocks of the intended proving period on the storage contract. If not, reject.
    m_li returns the following as a tuple,
  6. (r_li, Sign_sk_li(r_li))
    Once m_i has received greater than t = thresh_k(s) responses, m_i computes
  7. H_i = H(r_l0 || r_l1 || ... || r_lt)
  8. S_i = Sign_sk_li(H_i)
    H_i is now the random challenge used to generate a PoST of some data for block round r. Miner >m_i will now compute the PoST and compute,
  9. pi_post = PoST(H_i, DATA)
  10. AggSig = BLS Aggregation of S_iwithSign_ski(pi_post)`
  11. msg_onchain = {(r_li, ...,r_lt), Recvover,AggSig, pi_post}
    If all PoST messages are to be sent together,
  12. Generate Mega_PoST = Compress(msg_onchain_i) for all on-chain messages corresponding to >each proving period i. Submit Mega_PoST on-chain.
    Compress can be any function that concisely represents a msg_onchain such as a SNARK. >SNARKS may be particularly attractive since all functions above can be represented as elliptic curve >operations.

two considerations:

  1. instead of an hash of strings we could ask for a signature from a threshold signature scheme which allows for coming up with a value once t signatures are collected (and, moreover, no matter which those t signatures are, the result is the same with any t signatures). @nicola was pointing me to this thing.

  2. in 4(c)

  1. (c) m_li must verify m_li in map(l_m) or reject the request.

we would add also a verification of a proof made at the former step (sent along with strings used to get the challenges and their signatures). In this. If the proof does not verify/is missing or m_li \notin map(l_m) m_li rejects.

The intuition we had is that, if thresh_k(s) miners are honest, then prover is getting enough strings (or signatures, according to point 1)) if and only if he did the proof right at the former step.

In this way (we have to double check), it seems that the only proof that need to be verified is the one at the very end.

@lucaniz lucaniz self-assigned this Feb 12, 2019
@lucaniz
Copy link

lucaniz commented Feb 28, 2019

Progress on Committee sampling during research week in Madrid. We came up with a new method for sampling a seed in a committee. Writing up the new construction in the overleaf doc, next step is proving security

@nicola nicola changed the title Proposal for PoSTs - New challenge methods & removing VDFs PoSt Committee: Proposal for PoSTs - New challenge methods & removing VDFs Mar 19, 2019
@nicola nicola mentioned this issue Mar 21, 2019
@nicola
Copy link
Contributor

nicola commented Mar 23, 2019

Issues with this construction so far:

Nodes must be paid or rewarded somehow (this can happen either through a payment or through power), this means that the list of elected committee must be reported on chain and for each there must be some form of status update. Considering ~20 users per committee every 10 mins, it would result in (20624 = 2,880) 2800 state updates per miner per day..

Next steps

  • Find the practical number of state updates per day considered fine
  • Evaluate possible smaller committee sizes that could fit these scaling requirements

@bvohaska
Copy link
Author

bvohaska commented Mar 28, 2019

@nicola: good point. One if the things we've been exploring is letting a committee vote on more than one challenge and reduce the number of updates.

A small technical point, all updates happen at the end of the protocol and only once. So while there are quite a few state updates they happen only at the end of the protocol. This immediately spawns the question: "Then how do I get rewarded with Power if this reward is tied to a specific deal?". The best answer swe have so far are (1) a Power reward from participating in this committee should have a finite and constant TTL OR (2) perhaps the reward is a psudeo-FIL that allows the recipient to use the reward to reduce gas or transaction costs.

ping @zixuanzh and @whyrusleeping for a sanity check.

@bvohaska
Copy link
Author

For updates, @lucaniz, I, at al. were recently inspired to explore new threshold schemes that (1) don't need to particularly scalable beyond 30 participants. We're pretty excited about this.

@nicola
Copy link
Contributor

nicola commented Apr 1, 2019

I don't have clear how your solution (1 and 2) and threshold schemes that can scale beyond 30 participants can solve the problem of having to pay/update power table for thousand of users

@bvohaska
Copy link
Author

bvohaska commented Apr 2, 2019

@nicola: In short, we are trying to reduce the number of participants needed to make such a committee protocol work. In (1), we are making assumptions about the implementation for practicality. In (2), we are making assumptions about security for practicality.

(1) Shrink the number of times an update is needed (but not the total amount of updates). This assumes that it is easier and faster to batch update any table. Given the assumed speed of updates to actors, a single batched update to 2k users should not be particularly expensive. (@frrist, @dignifiedquire: is this true in implementation?). The underlying assumption is that individual updates will be taxing on caching.

(2) Shrink the total number of updates. Thresholding allows us to assume a 1-honest member security model. This changes the total number of participants needed to make the protocol robust against malicious activity. In this case, our new Chernoff bound would represent P(X > 1 honest) > 0.95.

@nicola nicola added proofs and removed proofs labels Apr 3, 2019
@nicola
Copy link
Contributor

nicola commented May 23, 2019

@lucaniz, @bvohaska I guess there is a lot of write up on this, how far are we to put the PDF up online and close this issue?

@bvohaska
Copy link
Author

@nicola hoping to have something published internally by Friday. Publish externally a week or two later to allow for comments.

@lucaniz lucaniz closed this as completed Aug 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

3 participants